Metadata Transformation is of a different order compared to data transformation in a data warehouse. The level of transformation in data warehouse is more intense compared to a metadata transformation. The reason is that while we can do standardization, augmentation and enrichment of data in DW transformation, we cannot do it in metadata.
Metadata is expected to represent the real world. For example if you have incomplete customer data, you can adopt certain extrapolation rules to fill in the blanks, to enable your customer analysis. However, in case of metadata, you cannot add a dummy location to a file, if that information is missing. You may have to just leave it like that. Typically in metadata, we leave the incomplete metadata as-is or update it manually based upon the real data.
Here are some standards operations you can do on the extracted metadata
Creating common entity identifiers:
Different systems have different names in representing the same entity. As an example, three core systems could be having same customer master table in different names. While you do maintain the original names and have three different records in the master_table table in metadata, you can also add a tag, which identified these tables to be linked to the same entity (that is customer). This helps you in standardizing the customer master table structure at a later data.
Creating common attribute identifier
This is similar to the previous point. Sometimes the same attribute is stored in different names across different systems and also across different tables within the same system. For example- Attribute City where customer lives can be stored as customer_city OR cust_city OR cust_location_city in different customer master tables. One can add a common attribute identifier (say customer_city_standard) along with the actual names to signify it as the same attribute.
TIP- Typically these common entity and attribute identifiers have a naming strategy which helps in tracking them. For example you can add standard at the end of the common identifier.
Adding contextual and conceptual level information
The higher level metadata information is only partially available in the source system, and has to be manually added into the staging area. As most of this information is descriptive and contextual, one generally has a higher end resource to enter it. For example, if you want to enter the objectives of a business application or a data model, you would like to have the master architecture or a senior data modeler to specify it.
TIP- For any manual entry of metadata, we recommend it to be entered into the staging area, before it gets loaded into the repository. A metadata repository (like a data warehouse) should ideally not allow any kind of write-back. This recommendation in many tools is not possible, as they allow the manual entry in the repository area only.
Building consistency around descriptive fields
If you have three customer master tables in three different systems, you will have three records in your metadata specifying the structure and other details. Just like a common entity identifier (as mentioned above), one can also have same or similar description linked to each of these metadata records. This is another form of identifying the commonality.
Tagging the metadata
If you refer to business metadata and technical metadata, you will see that there many different metadata object types. The transformation process links each of the metadata with relevant tags. The examples of tags are:
- Type of object: Data table, application program, business rule, business policy etc.
- Form of the object: Paper, Flat data file, database etc...
- The subject area to which the object belong to: sales function, finance function, treasury, HR...
There is a long list of type of tags and tags themselves. These tags are indispensable to manage, search and analyze the metadata.
Filling in the blanks
If there is a lack of metadata information, one needs to manually fill in the data in the staging area, during the transformation process. |