Detailed Assessment of your existing IT landscape
This is a detailed inventorization of your IT landscape, which could have implication in selection of your BI tool vendor as well as for the Business Intelligence design. This inventorization will include-
Integration platform
- Any application integration platform, currently in use (like Tibco or MQ Series...).
- What are the various application integration techniques are used? Most of the organizations, though having a core EAI platform, still use different type of integration techniques ranging from state of the art data exchange bus to point-to-point data file exchange.
- Any current issues in the integration space.
The source systems (as identified in the analyze stage) inventory-
- The servers and their location
- The operating systems
- The Database management system
- How these source systems are currently exchanging data
Data Quality and Stability issues
- Are there any weak spots related to the data quality?
- What have been major production issues, related to the source systems?
- Have there been any audit comments related to the source systems?
- Are users happy with the data accuracy in the reports produced out of the source systems?
Organization's platform direction
- Is the organization Oracle or IBM or MS centric in terms of its DBMS
- Which ERP track the organization is going towards Oracle or SAP?
Future IT road-map
- Are we planning any major IT initiative, which could have an impact on BI platform?
- Is any of the source system planned to be re-written or significantly changed?
Evaluate and identify the core BI platform
Before you start designing the data warehouse, you should have identified your core BI platform including the ETL, Data Warehouse server, metadata management and OLAP server.
Along with this you would have identified the application server and web server, which will enable the integration of the core BI platform with other applications and with the end-users. In reality, most of the times, you select your core BI platform components (recommended to be taken from a single vendor apart from data Warehouse server...) before you initiate the project. This is because your vendor brings in huge knowledge-base to support you through analyze and design phases. The evaluation criteria are provided in the Business Intelligence Tool Evaluation Kit. The whole process of evaluating and identifying vendors and tools is covered in vendor-tool domain section
Do source system Mapping
This is well covered in the Source systems mapping and Source systems mapping matrix. In-brief, you create the list of source systems, and get the details on the level of stability, quality of data, and which data they can provide. At this stage you will also decide on which system will be the master reference for which data.
Do ETL for first set of schemas
The details of ETL are covered in Extraction Design, Transformation design and Loading Design. If you are working on an enterprise data warehouse and not data-mart (refer Data Warehouse vs Data mart), you will be picking up first 1-2 simpler schemas and do an end to end design of extraction, transformation and loading. This will provide you a significant level of learning on the issues you face, your learning in terms of the configuration and usage of ETL tools (and check if they really can do what they claim to do...)
NOTE- Some part of this effort may be throw-away. This is because, when you create your ETL strategy for all schemas in the data-warehouse, your ETL design will be different vis-à-vis doing it for 1-2 schemas. Many of the Extraction and transformation routines could serve multiple schemas. (As a help note- schema is like a single cube in OLAP...)
Develop and Test ETL for first set of schemas for sample production data
Do the development and testing for the first set of identified schema and check on how it works. This will provide you the last leg of initial learnings you need to have before you go for full-fledged design. Development and testing should not be that much of an issue, if you have a smart ETL tool.
Do the ETL design of the entire set of schemas
Now is the time to go ahead with the detailed design for the entire enterprise data-warehouse. This means that you will do a holistic design of all the schemas involved in the current scope of data-warehouse initiative. As you do the holistic design, across the schemas, you may use the principles of synergies so that one can-
- Avoid duplicate extractions- Though three schemas are looking for the customer data, the extraction routine for getting customer master should be run once.
- Avoid duplicate transformation routines- It’s the same principle as above. If there are 3 schemas looking for customer data, you may run the transformation routine of de-duping customer master only once.
- Make entity based transformation routines- This is aligned to the principle of conformed and foundation dimensions. A healthy data warehouse design will use standard dimension table design for a given dimension across various schemas. This will mean that even if a given schema is not using specific dimension attribute (like not using the state attribute in the location dimension), it will still be maintained in the dimension table and populated (if the data exists...).You may refer use foundation dimensions for cross-drilling.
Taking this principle into account, an ideal transformation will be the one which will do the end to end transformation of a dimension in one go. For example, it will do all transformation related to customer master, location master, product master, so that the whole process is clean and auditable.
Implement the physical database design
The whole ETL design (which includes data base structures as well as the ETL routines), will be in the logical form to start with. The same has to be translated into the physical design. A complete chapter on Data Warehouse physical design will be released soon.
Configure Metadata for the design elements
Once your design is complete, you will start configuring the data warehouse metadata in the metadata management tool. The metadata tool configuration will be an ongoing process, which primarily starts with design phase.
Sometimes the question comes on 'Metadata also includes dimensional models as you create them in the analyze phase. Why should you not configure those dimensional models in Metadata tools in the analyze phase?’ The answer to this question is that firstly the dimensional models are signed off towards the end of the analyze phase and beginning of the design phase. Secondly, as you do the ETL design, they undergo some level of fine-tuning in the dimensional model. Therefore our suggestion would be to start configuring your metadata tool in the design phase, after you have done the holistic ETL design.
Design and Configure Data Warehouse Job flow and Scheduling
The ETL design includes the sequence of tasks. In these steps, you will translate this sequence into a job flow and control language. This is part of data base management and data centre operations task. ETL scheduler tool is the key in this set-up. All good ETL tools come with the ETL schedulers. You may also decide to use a third party schedule as you may want a single schedule for all your job scheduling activities.
Design and Configure Data Warehouse Access Browsing and Query Servicing
Data Warehouse querying and browsing services are typically part of the standard Data Warehouse server. The only difference here is that you would need to configure them for Data Warehouse kind of design. For example, making query services aggregate aware. This means that if an aggregate/summary level schema/cube exists, query service will not be using the detailed level schema for running a summary level query.
Design and Configure Staging Servers and Loaded Areas;
This is the standard step of configuring the servers with all the attendant components, just like you will do it for any OLTP application.
|