This page is an extract from BIDS Metadata Management Solution authored by Kamlesh Mhashilkar- Head, BIPM Services of Tata Consultancy Services
Back-Room Metadata Overview
Back-Room Metadata spans across the Source Systems and BI Technical Metadata areas. This is primarily utilized by the administrators, developers and designers. The mappings between data source and system technical elements form the base of the ETL metadata, which is used for the data load purpose. Also the end user data usage at various levels (i.e. report, table, column level) is needed for performance monitoring and tuning purpose. These metadata components primarily come under the back-room metadata.
ETL Metadata
In ideal scenario, end-to-end data loading process should be metadata driven. Otherwise a lot of manual intervention is needed to run the data load jobs and carry out the changes in the existing components. ETL metadata principally gets divided into two areas.
- Control Metadata
- Process Metadata
Control Metadata
The metadata deployed for controlling Extraction, Transformation and Loading processing is named as control metadata. It mainly contains the scheduling and data audit / reconciliation information.
Scheduling metadata drives the triggering of the ETL jobs and the check on data receipt. The streamlining of various dependent processes and parallelism of independent process needs to be configured in this metadata. The data audit and reconciliation involves capturing check sums, row counts of source data and the loaded data. In case of mismatch (outside the reasonable ranges), alerts need to be generated. Record rejection and error logging should also be noted in the metadata.
Control metadata also aids in stating the data load status (i.e. data availability in BI system) to the end users. The history of input data errors and rejection also helps administrators to proactively enhance the system or suggest the source system changes.
The following diagram gives a basic control metadata model, which can be enhanced as per requirement.

The entire ETL processing can be managed, to the extent possible, through a single, metadata-driven process control environment. The control process should capture metadata regarding the progress and statistics of the daily job. It can also facilitate maintenance and development of a cohesive set of extract statistics metadata.
Process Metadata
The metadata that is used for processing purpose is called the Process Metadata. The complete ETL data transformation information is placed in this metadata section.
The following are the various processing activities, which are recorded in the Process Metadata.
No. |
Activity |
Description |
1. |
Transformation |
Data type conversion: This involves lower-level transformations converting one data type or format to another. E.g. converting date, numeric, and character representations from one database to another.
Calculation and derivation: These transformations need to apply the business rules identified during the requirements process, which would involve functions including string manipulation, date and time arithmetic, conditional statements, and basic mathematical functions.
Aggregation: These involve summarization of low level data to the required granularity.
Special Transformations: e.g. row to column conversions and column to row conversions |
2. |
Cleansing and Augmentation |
Cleaning up the source data by applying transformations (e.g. conversion, derivation and translation), removing duplicates and merging data from different source systems.
De-duping / match-merging is one of the important activities in the cleaning process, which is valuable for Customer centric applications. It aims at defining the business entity uniquely by removing its duplicate entries or by merging the multiple entries. e.g. duplicate customer records can be removed by applying an intelligent algorithm which can define a unique customer with similar sounding name and address. One can also apply augmentation and enrichment techniques. |
3. |
Validation |
Referential integrity (/lookups) and constraint checks can be either implemented at the database level or during transformation process. These rules should be specified as a part of metadata. |
The ETL development could be in the form of specification through an ETL tool or development of custom built programs or mix of both these methods. The custom built scripts should use the metadata rather than hard coding of validations so that maintenance is possible through metadata. Also the Audit Trail (e.g. ETL process log) generated should display the metadata references in order to improve the readability.
Data Models in backroom metadata
The data models are the backbone of the BI system metadata. These models can encompass
- DW schema (Technical Metadata) and the corresponding Business Metadata (For Data Warehouse schema modeling, refer Data Warehouse dimensional modeling)
- Source system data structures and metadata (Data Source Metadata)
- Mapping between source entities and DW entities (ETL Process Metadata)
NOTE: The availability of data source metadata and ETL process metadata with the data model depends on the functionality offered by the data modeling tool e.g. ERWin. Otherwise data models encompass only the DW schema and corresponding business metadata.
The DW schema and corresponding Business Metadata can be fed to the RDBMS. E.g. tools like ERWin, Designer2000 allow forward engineering to generate database structures and corresponding comments in the database. This business metadata may be imported into the metadata repository of front-end tool using a few custom scripts. This can populate the front-end layer with the business metadata which is an integral part of Front-Room Metadata. The following table shows a basic data structure for storing the metadata related to a data field in DW schema.
No. |
Fields |
1. |
Column_Name |
2. |
Data_Type |
3. |
Length |
4. |
Precision |
5. |
Default_Value |
6. |
Low_Range |
7. |
High_Range |
8. |
Units_of_Measure |
9. |
Level |
11. |
Business_Name |
12. |
Business_Description |
13. |
Table_Name |
14. |
Subject_Area |
Also ETL processes (tool based or custom built) can be manually defined with the mapping information fed in the data modeling tools. The data source metadata can be imported into the ETL repository, which can be utilized during the ETL development.
Back-Room Metadata- Security Profiles
BI system administrators need to set and monitor the system security at various levels to ensure access restriction. The user profiles and their security policies are maintained at various regions, namely Back-End and Front-End Security Regions.
Tool administrators, developers and designers are part of the back-end security. They have privileges to modify the system and the data. Their area of operation is should be confined to the Level 100 security zone. It is recommended that they should not operate from the nodes other that this security zone. This information is maintained in metadata of the database or metadata repositories of various tools.
The end-users, who access the data, come under the Front-End Security Region. The administrators create these users and define their data access policies. ACLs are generated for this purpose and implemented in individual front-end tools of the system. These ACLs are stored in the metadata repositories of the respective tools.
The following are the different levels / categories of users along with their region of control.
Security Region |
Role |
Region of Control |
Back-End Security |
Administrator |
Access to Operating System, system tools |
Back-End Security |
Developer / Designer |
Access to system tools (development environment) |
Back-End Security |
Supervisor |
Access to system tools (production environment) |
Front-End Security |
Executive |
Access to data from multiple departments |
Front-End Security |
Manager |
Access to data from respective department |
Front-End Security |
Analyst |
Access to data from respective division / business function in the Department |
Front-End Security |
Operator |
Access to operational data (mostly from ODS). |
Audit Trail
BI system usage metadata indicating who is accessing which components of the DW, which reports are accessed at what frequency, what are the queries / ad-hoc reports requested, processing time for each report / query etc. should be recorded in the metadata repository.
The following table gives a generic audit trail table design.
No. |
Field |
Description |
1. |
User_ID (PK) |
User identification |
2. |
Action (PK) |
Action specification e.g. Login, Logout, Open Report, Create Report, Save Report, Refresh Report, Publish Report to Corporate, Send Report to another User, Error etc. |
3. |
Timestamp (PK) |
Time when the action was initiated |
4. |
Execution_Time |
Average execution time for the action. This can be NULL if the action was aborted. |
5. |
Module |
Details of the module for which the action is taken e.g. Report Name, Table Name, Subject Area, Department, Error Code etc. |
6. |
User_IP |
The machine identification (IP Address) from which the user is accessing the system |
Note- BIDS Solutions encompass the proprietary solutions from TCS covering Business Intelligence and Data Warehousing landscape.
|