Data warehousing - guidelines for planning and development

 Introduction

Data warehousing has established a new middleware layer in corporate application architecture. Such a middleware layer is necessary because direct, individual access by decision support applications to the data of operational, transaction-oriented applications has proved to be technically or economically infeasible. Data quality problems and complex integration requirements usually make it impossible to provide consistent, integrated, real-time data to decision support systems.

 

Figure: Data warehousing system as a middleware layer.

 

This paper outlines the common guidelines followed across data warehousing solutions. It also highlights the risks involved at various stages of data warehousing application development. Each of these guidelines needs to be discussed in detail to arrive at the solution best suited to your business requirements.
 

Planning

Requirement Analysis guidelines:

 

Risks:


 

Architecture and design guidelines:

IT infrastructure.

Development

Guidelines for developing and testing a data warehousing solution:

 

Risks:


 

 

Glossary:

Data Mart: A data mart differs from a data warehouse in the scope of the data that it deals with. Typically, it is focused on a specific department or business process rather than being a corporate data repository. A data mart can be standalone or fed centrally from the corporate data warehouse.
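
The centrally fed case can be pictured as a filter over warehouse data: the mart keeps only the rows that belong to one department. The sketch below is a minimal illustration under that assumption. The sample rows, the `department` field, and the `build_data_mart` helper are hypothetical names chosen for this example and do not come from any particular product.

```python
# Hypothetical corporate warehouse rows covering several departments.
warehouse_rows = [
    {"department": "sales", "account_id": 1, "amount": 250.0},
    {"department": "finance", "account_id": 2, "amount": 900.0},
    {"department": "sales", "account_id": 3, "amount": 75.0},
]


def build_data_mart(rows, department):
    """Return copies of the warehouse rows that belong to one department."""
    return [dict(row) for row in rows if row["department"] == department]


# A department-scoped mart fed from the central warehouse.
sales_mart = build_data_mart(warehouse_rows, "sales")
```

The rows are copied rather than shared so that later changes to the mart do not alter the warehouse; a real deployment would normally handle this through a scheduled load rather than an in-memory copy.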

 

Dimension: A dimension of a piece of data is any part of that data on which it can be usefully viewed, divided or summarized. Typical dimensions are items such as time, date, region, location or account_id.

 

Fact Data: Fact data is the basic core data that is stored in the data warehouse, for example sales, call records or account transactions.
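
As a minimal illustration of how fact data and dimensions relate, the sketch below totals a handful of sales facts along a chosen dimension. The sample records, the field names (`date`, `region`, `amount`), and the `summarize_by` helper are all hypothetical and serve only to make the two definitions concrete.

```python
from collections import defaultdict

# Hypothetical fact data: each row is one sales transaction.
# "date" and "region" act as dimensions; "amount" is the value being summarized.
sales_facts = [
    {"date": "2024-01-05", "region": "North", "amount": 120.0},
    {"date": "2024-01-05", "region": "South", "amount": 80.0},
    {"date": "2024-01-06", "region": "North", "amount": 50.0},
    {"date": "2024-01-06", "region": "South", "amount": 30.0},
]


def summarize_by(facts, dimension, measure="amount"):
    """Total the measure for each distinct value of the given dimension."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# The same facts viewed along two different dimensions.
by_region = summarize_by(sales_facts, "region")
by_date = summarize_by(sales_facts, "date")
```

The same fact rows support both summaries without modification, which is the sense in which a dimension is a way of viewing or dividing the data rather than a separate copy of it.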

 

MPP: An acronym for massively parallel processing. MPP machines are large multi-node machines with a large number of CPUs.

 

NUMA: An acronym for non-uniform memory access (also expanded as non-uniform memory architecture). A NUMA machine is composed of multiple nodes, but unlike cluster or MPP machines, it has a shared memory address space.

 

SMP: An acronym for symmetric multiprocessing. An SMP machine consists of many CPUs that share memory and disk.

Cluster: A tightly coupled group of SMP machines, with shared disks. 
