Institute for Building Intelligent and Performing Enterprises
 Building Intelligent and Performing Enterprises
Business Intelligence Tool Evaluation Kit 
  
Login or Register  
 
Join Professional Network of Business Intelligence and Performance Management

   Data Warehouse Modeling and Analyze Phase OLAP + Data Warehouse Design Phase  

BUY→ BI Tools Evaluation || Data Quality Kit || Consulting

BiPM Encyclopedia  →   Business Intelligence  →  SECTION -  Data-Warehouse/Mart  →  CHAPTER -  Data Warehouse Project Plan- Work Break-Down Structure  → 

Data Warehouse Design Phase

Data Warehouse design phase is hinged around the extraction, transformation and loading phase. It is advisable to do a test run on a simple schema, before getting down to a full design of your enterprise data warehouse.

Detailed Assessment of your existing IT landscape

This is a detailed inventorization of your IT landscape, which could have implication in selection of your BI tool vendor as well as for the Business Intelligence design. This inventorization will include-

Integration platform

  • Any application integration platform, currently in use (like Tibco or MQ Series...).
  • What are the various application integration techniques are used? Most of the organizations, though having a core EAI platform, still use different type of integration techniques ranging from state of the art data exchange bus to point-to-point data file exchange.
  • Any current issues in the integration space.

The source systems (as identified in the analyze stage) inventory-

  • The servers and their location
  • The operating systems
  • The Database management system
  • How these source systems are currently exchanging data

Data Quality and Stability issues

  • Are there any weak spots related to the data quality?
  • What have been major production issues, related to the source systems?
  • Have there been any audit comments related to the source systems?
  • Are users happy with the data accuracy in the reports produced out of the source systems?

Organization's platform direction

  • Is the organization Oracle or IBM or MS centric in terms of its DBMS
  • Which ERP track the organization is going towards Oracle or SAP?

Future IT road-map

  • Are we planning any major IT initiative, which could have an impact on BI platform?
  • Is any of the source system planned to be re-written or significantly changed?

Evaluate and identify the core BI platform

Before you start designing the data warehouse, you should have identified your core BI platform including the ETL, Data Warehouse server, metadata management and OLAP server.

Along with this you would have identified the application server and web server, which will enable the integration of the core BI platform with other applications and with the end-users. In reality, most of the times, you select your core BI platform components (recommended to be taken from a single vendor apart from data Warehouse server...) before you initiate the project. This is because your vendor brings in huge knowledge-base to support you through analyze and design phases. The evaluation criteria are provided in the Business Intelligence Tool Evaluation Kit. The whole process of evaluating and identifying vendors and tools is covered in vendor-tool domain section

Do source system Mapping

This is well covered in the Source systems mapping and Source systems mapping matrix. In-brief, you create the list of source systems, and get the details on the level of stability, quality of data, and which data they can provide. At this stage you will also decide on which system will be the master reference for which data.

Do ETL for first set of schemas

The details of ETL are covered in Extraction Design, Transformation design and Loading Design. If you are working on an enterprise data warehouse and not data-mart (refer Data Warehouse vs Data mart), you will be picking up first 1-2 simpler schemas and do an end to end design of extraction, transformation and loading. This will provide you a significant level of learning on the issues you face, your learning in terms of the configuration and usage of ETL tools (and check if they really can do what they claim to do...)

NOTE- Some part of this effort may be throw-away. This is because, when you create your ETL strategy for all schemas in the data-warehouse, your ETL design will be different vis-à-vis doing it for 1-2 schemas. Many of the Extraction and transformation routines could serve multiple schemas. (As a help note- schema is like a single cube in OLAP...)

Develop and Test ETL for first set of schemas for sample production data

Do the development and testing for the first set of identified schema and check on how it works. This will provide you the last leg of initial learnings you need to have before you go for full-fledged design.  Development and testing should not be that much of an issue, if you have a smart ETL tool.

Do the ETL design of the entire set of schemas

Now is the time to go ahead with the detailed design for the entire enterprise data-warehouse. This means that you will do a holistic design of all the schemas involved in the current scope of data-warehouse initiative. As you do the holistic design, across the schemas, you may use the principles of synergies so that one can-

  • Avoid duplicate extractions- Though three schemas are looking for the customer data, the extraction routine for getting customer master should be run once.
  • Avoid duplicate transformation routines- It’s the same principle as above. If there are 3 schemas looking for customer data, you may run the transformation routine of de-duping customer master only once.
  • Make entity based transformation routines- This is aligned to the principle of conformed and foundation dimensions. A healthy data warehouse design will use standard dimension table design for a given dimension across various schemas. This will mean that even if a given schema is not using specific dimension attribute (like not using the state attribute in the location dimension), it will still be maintained in the dimension table and populated (if the data exists...).You may refer use foundation dimensions for cross-drilling.

Taking this principle into account, an ideal transformation will be the one which will do the end to end transformation of a dimension in one go. For example, it will do all transformation related to customer master, location master, product master, so that the whole process is clean and auditable.

Implement the physical database design

The whole ETL design (which includes data base structures as well as the ETL routines), will be in the logical form to start with. The same has to be translated into the physical design. A complete chapter on Data Warehouse physical design will be released soon.

Configure Metadata for the design elements

Once your design is complete, you will start configuring the data warehouse metadata in the metadata management tool. The metadata tool configuration will be an ongoing process, which primarily starts with design phase.

Sometimes the question comes on 'Metadata also includes dimensional models as you create them in the analyze phase. Why should you not configure those dimensional models in Metadata tools in the analyze phase?’ The answer to this question is that firstly the dimensional models are signed off towards the end of the analyze phase and beginning of the design phase. Secondly, as you do the ETL design, they undergo some level of fine-tuning in the dimensional model. Therefore our suggestion would be to start configuring your metadata tool in the design phase, after you have done the holistic ETL design.

Design and Configure Data Warehouse Job flow and Scheduling

The ETL design includes the sequence of tasks. In these steps, you will translate this sequence into a job flow and control language. This is part of data base management and data centre operations task. ETL scheduler tool is the key in this set-up. All good ETL tools come with the ETL schedulers. You may also decide to use a third party schedule as you may want a single schedule for all your job scheduling activities.

Design and Configure Data Warehouse Access Browsing and Query Servicing

Data Warehouse querying and browsing services are typically part of the standard Data Warehouse server. The only difference here is that you would need to configure them for Data Warehouse kind of design. For example, making query services aggregate aware. This means that if an aggregate/summary level schema/cube exists, query service will not be using the detailed level schema for running a summary level query.

Design and Configure Staging Servers and Loaded Areas;

This is the standard step of configuring the servers with all the attendant components, just like you will do it for any OLTP application.


   Data Warehouse Modeling and Analyze Phase OLAP + Data Warehouse Design Phase  

All Topics in: "Data Warehouse Project Plan- Work Break-Down Structure" Chapter
 Data Warehouse Project Definition →  Data Warehouse Project Initiation Phase →  Data Warehouse Business Requirements Gathering Phase →  Data Warehouse Modeling and Analyze Phase →  Data Warehouse Design Phase →  OLAP + Data Warehouse Design Phase →  Physical Database Design and Implementation → 
 

Was this page helpful?

If you like it ? share it !
Digg
Digg
Reddit
Reddit
Del.icio.us
Delicious
Google
Google
Live
Live
Facebook
Facebook
Slashdot
Slashdot
Netscape
Netscape
Technorati
Technorati
Stumbleupon
Stumbleupon
Spurl
Spurl
Furl
Furl
Blogmarks
Blogmarks
Yahoo
Yahoo
Plugim
Plugim
Squidoo
Squidoo
BlinkBits
BlinkBits

BUY→ BI Tools Evaluation || Data Quality Kit || Consulting

Tags    -     See all

Add to this page
 
   
   
 

 
Back
CONTENT ZONE
Data-Warehouse/Mart
Customize Alerts