Sources for Metadata:
Software Applications
Metadata in these software applications exists in different forms, and each form requires a different approach:
Captive Metadata Repository
Many applications maintain their own metadata repositories for internal use. This is perhaps the best form of metadata availability, as the metadata can be expected to be fairly well sanitized. The downside is that the metadata database and its structure are sometimes highly proprietary, which makes them difficult to access. A metadata tool has to be able to connect to these application-centric repositories through a metadata exchange mechanism (say, XML) or through a standard protocol such as ODBC. The last resort is an interface file exchange.
Application Management Tools:
These tools contain information on various application-related aspects, for example:
- Data modeling tools carrying logical and physical data model details
- Application versioning tools carrying the versioning and other details on programs and procedures
- Release management tools carrying the release management flow
- Data Centre Operations tools carrying the job schedules and other details
The business applications
If the above two categories of tools cannot fulfill the purpose, a metadata extraction tool has to discover the metadata itself. It can extract the table structure by inspecting the data tables in the application database. It can find details of the programs within the applications by reading the program headers, which (hopefully) carry the title, description, time of last change and other related information.
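A minimal sketch of the table-structure discovery described above, assuming the application database is SQLite and reachable through Python's standard `sqlite3` module. The function name `discover_table_structure` and the two sample tables are hypothetical and serve only as illustration; other database engines expose their catalogs differently.

```python
import sqlite3

def discover_table_structure(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every user table."""
    structure = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        quoted = table_name.replace('"', '""')
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        structure[table_name] = [(col[1], col[2]) for col in columns]
    return structure

# Hypothetical application database, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, joined DATE)")
conn.execute("CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
catalog = discover_table_structure(conn)
```

The resulting `catalog` dictionary is the kind of technical metadata that could then be loaded into the metadata staging area.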
Unstructured Content
Most business metadata, even in mature organizations, exists in the form of documents and spreadsheets. We categorize spreadsheets as unstructured content because they are typically not governed by the tight rules of data management. Unstructured content is the biggest challenge, as well as an opportunity, for metadata initiatives. Unless you have content management or collaboration tools in place, this kind of metadata is difficult to collect.
TIP- To create metadata for unstructured content efficiently, use content management tools, collaboration platforms, central shared drives, etc.
End-Users
There is always a manual component to metadata. For example, you may not be able to automatically generate the location and sources of physical documents (like legal papers), and you may need to enter them by hand. A good extraction layer should allow people to enter data directly into the metadata staging area (where all the extracted data is placed, just like the staging area of a data warehouse).
Application Integration Tools
Application integration or messaging tools hold a host of data on:
- Source and destination systems for the different messages
- The structure of the data exchange
- The frequency of exchange
- The controls and checks applied to the data exchange
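As a rough illustration, the four items above could be captured in a single record per exchange. The class `ExchangeMetadata`, its field names and the sample values are assumptions made for this sketch, not the format of any particular integration tool.

```python
from dataclasses import dataclass, field

@dataclass
class ExchangeMetadata:
    """Hypothetical record describing one data exchange between two systems."""
    source_system: str
    destination_system: str
    structure: dict      # field name -> data type of the exchanged payload
    frequency: str       # how often the exchange runs
    controls: list = field(default_factory=list)  # checks applied to the exchange

# Illustrative values only.
order_feed = ExchangeMetadata(
    source_system="order_entry",
    destination_system="billing",
    structure={"order_id": "string", "amount": "decimal"},
    frequency="hourly",
    controls=["record count reconciliation", "schema validation"],
)
```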
Web
The public internet or an extranet can provide metadata on the data structures of the forms through which you capture data, and on the change history of updates made to the web content.
Work Management tools
Tools for collaboration, business process management, workflow management, etc. provide a rich source of mainly business metadata. They provide the business process maps, desk instructions, policies and rules.
Independent Metadata repository
As mentioned in the detailed metadata architecture scenarios, metadata typically exists in a distributed model. In this model, the independent metadata repositories (like the data warehouse repository, ERP repository, CRM repository...) feed into a central enterprise-level metadata repository. These repositories are the first place to pick up the metadata. If the metadata is not available in one of these independent repositories, the metadata extraction should go to the source systems.
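The lookup order described above (independent repositories first, source systems as a fallback) can be sketched as follows. The function `fetch_metadata`, the repository names and the dictionary-based stores are hypothetical stand-ins for real repository connections.

```python
def fetch_metadata(key, independent_repositories, source_systems):
    """Return (origin, metadata) for key, preferring the independent repositories."""
    for name, repository in independent_repositories.items():
        if key in repository:
            return name, repository[key]
    # Fall back to the source systems only when no repository holds the key.
    for name, system in source_systems.items():
        if key in system:
            return name, system[key]
    return None, None

# Illustrative stores only: plain dictionaries instead of real repositories.
independent = {
    "data_warehouse_repository": {"sales_fact": {"grain": "order line"}},
    "crm_repository": {"customer": {"owner": "sales operations"}},
}
sources = {"billing_system": {"invoice": {"owner": "finance"}}}

origin_customer, _ = fetch_metadata("customer", independent, sources)
origin_invoice, _ = fetch_metadata("invoice", independent, sources)
```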
Imperatives of good metadata extraction
This list is adapted from data warehouse extraction design, with examples related to metadata.
Reliable Source:
For example, business rules related to sales compensation can reside in different places:
- A paper document held by the sales compensation unit
- The functional specs of a sales compensation system
- A business process map in the business process management application
The correct sales compensation business rules will be in one of these sources or in a combination of them. It is up to the data steward or metadata manager to ensure that the correct and most comprehensive source is identified.
Completeness of Metadata Extraction
Ensure that metadata extraction is well audited. The extraction process should run quality checks to confirm that all the metadata has been extracted from all sources before giving the go-ahead to the metadata transformation activity. For example, one needs to check whether the metadata has been extracted from all applications, content management systems, workflow management systems, CASE tools, etc. A good metadata management tool will have an extraction monitoring mechanism that keeps recording the status.
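A minimal sketch of such a completeness gate, assuming the monitoring mechanism exposes a status per source. The function names, the status value "completed" and the source names are assumptions made for illustration.

```python
def missing_sources(expected_sources, extraction_status):
    """Return the expected sources whose extraction has not completed."""
    return sorted(
        source for source in expected_sources
        if extraction_status.get(source) != "completed"
    )

def ready_for_transformation(expected_sources, extraction_status):
    """Give the go-ahead only when every expected source has been extracted."""
    return not missing_sources(expected_sources, extraction_status)

# Illustrative monitoring record: one source failed, one was never attempted.
expected = ["applications", "content_management", "workflow_management", "case_tools"]
status = {
    "applications": "completed",
    "content_management": "completed",
    "workflow_management": "failed",
}
still_missing = missing_sources(expected, status)
```

In this example the transformation step would be held back until the two outstanding sources have been extracted.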
Preservation of Data
Ensure that the extraction process preserves the metadata. The metadata repository typically needs to maintain historical snapshots, which in source systems are often overwritten, or archived and then purged. A staging area should typically 'append' the extracted information to the existing metadata. This should be maintained at least until the previous transformed data reaches the final 'Presentation/Analysis' area.
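The 'append' behavior can be sketched with a small in-memory staging area. The class `MetadataStagingArea` and its methods are hypothetical; a real staging area would normally be a database table with an extraction timestamp column.

```python
from datetime import datetime, timezone

class MetadataStagingArea:
    """Append-only store: every extraction adds a snapshot, nothing is overwritten."""

    def __init__(self):
        self._snapshots = []

    def append(self, source, key, value, extracted_at=None):
        self._snapshots.append({
            "source": source,
            "key": key,
            "value": value,
            "extracted_at": extracted_at or datetime.now(timezone.utc),
        })

    def history(self, source, key):
        """All values ever extracted for this key, oldest first."""
        return [
            snapshot["value"] for snapshot in self._snapshots
            if snapshot["source"] == source and snapshot["key"] == key
        ]

    def latest(self, source, key):
        versions = self.history(source, key)
        return versions[-1] if versions else None

# Illustrative use: the source system overwrote the column type, staging keeps both.
staging = MetadataStagingArea()
staging.append("crm", "customer.status", "CHAR(1)")
staging.append("crm", "customer.status", "VARCHAR(10)")
```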
In sync with source metadata repositories and the source systems
We always recommend that the metadata extraction tool take the data from the independent metadata repositories (refer to the detailed architecture scenarios and Metadata ETL), as they will already have done the job of sanitizing and integrating metadata from their respective sources. Once the extraction is done, one needs to run a quality check that the extracted data is in sync with the source data.
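One possible form of this quality check is to compare a fingerprint of each source record with the fingerprint of its extracted copy. This is a sketch under the assumption that metadata records are JSON-serializable dictionaries keyed by name; `fingerprint` and `out_of_sync` are hypothetical helper names.

```python
import hashlib
import json

def fingerprint(record):
    """Stable SHA-256 hash of a JSON-serializable metadata record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def out_of_sync(source_records, extracted_records):
    """Return keys that are missing on either side or whose content differs."""
    keys = set(source_records) | set(extracted_records)
    return sorted(
        key for key in keys
        if key not in source_records
        or key not in extracted_records
        or fingerprint(source_records[key]) != fingerprint(extracted_records[key])
    )

# Illustrative records: the extracted copy of "invoice" lost a column.
source = {
    "customer": {"columns": ["id", "name"]},
    "invoice": {"columns": ["id", "total"]},
}
extracted = {
    "customer": {"columns": ["id", "name"]},
    "invoice": {"columns": ["id"]},
}
mismatches = out_of_sync(source, extracted)
```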
The frequency of extraction
Most metadata extraction will happen as the metadata changes. Some kinds of metadata change fast (like job schedules, technical metadata on new applications, changes in the database design, etc.) and some change infrequently (like organizational policies, rules, code of conduct...). As per the BiPMinstitute.com recommendation, all sources and types of metadata (technical metadata and business metadata) should be checked for changes on a daily basis. We do not think there is a need for real-time metadata integration (unlike the need for real-time data integration).
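The daily check can be sketched as a simple scheduler query: given the time each source was last checked, list the sources that are due. The function `sources_due_for_check`, the one-day default interval and the source names are assumptions for illustration.

```python
from datetime import datetime, timedelta

def sources_due_for_check(last_checked, now, interval=timedelta(days=1)):
    """Return sources never checked, or last checked at least `interval` ago."""
    return sorted(
        name for name, checked_at in last_checked.items()
        if checked_at is None or now - checked_at >= interval
    )

# Illustrative timestamps only.
now = datetime(2024, 1, 10, 6, 0)
last_checked = {
    "job_schedules": datetime(2024, 1, 9, 6, 0),      # exactly one day ago: due
    "database_design": datetime(2024, 1, 10, 1, 0),   # five hours ago: not due
    "organizational_policies": None,                  # never checked: due
}
due = sources_due_for_check(last_checked, now)
```

Fast-changing and slow-changing sources are treated the same way here, since the recommendation is to check all of them daily; only the sources that actually changed would then be re-extracted.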