Institute for Building Intelligent and Performing Enterprises

Metadata Repository Extraction Design

Metadata extraction is the first stage in a metadata environment, equivalent to the extraction stage of a data warehouse. This section looks in detail at the various sources of metadata and at the mechanism by which the sourcing layer collects metadata and passes it to the integration layer, which is equivalent to the transformation and loading stages of a data warehouse. The extraction layer picks up metadata from various sources, such as software tools, end users, documents, messaging and transactions, applications, web sites and e-commerce, and third parties.

Sources for Metadata:

Software Applications

Metadata in software applications exists in different forms, and each form requires a different approach:

Captive Metadata Repository

Many applications maintain their own metadata repositories for internal use. This is perhaps the best form of metadata availability, as you can expect the metadata to be fairly clean. The downside is that the metadata database and its structure are sometimes highly proprietary, which makes access difficult. A metadata tool has to be able to connect to these application-centric repositories through a metadata exchange mechanism (say, XML) or through a standard protocol such as ODBC. The last resort is an interface file exchange.

Application Management Tools:

These tools hold information on various application-related aspects, for example:

  • Data modeling tools carrying logical and physical data model details
  • Application versioning tools carrying the versioning and other details on programs and procedures
  • Release management tools carrying the release management flow
  • Data Centre Operations tools carrying the job schedules and other details

The business applications

If the above two categories of tools cannot fulfill the purpose, a metadata extraction tool has to discover the metadata itself. It can extract table structures by inspecting the data tables in the application database. It can find details of the programs within an application by reading the program headers, which (hopefully) carry the title, description, time of change and other related information.
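As a minimal sketch of this kind of self-discovery, the following reads table structures straight from a database catalog. It assumes an SQLite database purely for illustration; the function name `discover_table_metadata` and the sample tables are hypothetical, and other database engines expose their catalogs through different queries.

```python
import sqlite3


def discover_table_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for user tables."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


# Hypothetical application database, created in memory for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, joined DATE)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

discovered = discover_table_metadata(conn)
```

The result maps each table to its columns and declared types, which is the kind of technical metadata a staging area could then hold for transformation.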

Unstructured Content

Even in mature organizations, most business metadata resides in documents and spreadsheets. We categorize spreadsheets as unstructured content because they are typically not governed by the strict rules of data management. Unstructured content is both the biggest challenge and the biggest opportunity for metadata initiatives. Unless you have content management or collaboration tools in place, this kind of metadata is difficult to collect.

TIP: To make metadata creation for unstructured content efficient, use content management tools, collaboration platforms, central shared drives and similar methods.

End-Users

Metadata always has a manual component. For example, you may not be able to automatically capture the location and sources of physical documents (such as legal papers), so this information has to be entered by hand. A good extraction layer should allow people to enter data directly into the metadata staging area (where all the extracted data is placed, just like the staging area of a data warehouse).

Application Integration Tools

Application integration or messaging tools hold a host of metadata on:

  • Source and destination systems for each message exchange
  • The structure of the data exchanged
  • The frequency of exchange
  • The controls and checks applied to the data exchange

Web

The public internet or an extranet can provide metadata on the data structures of the forms in which you capture data, and on the change history of updates to web content.

Work Management tools

Tools for collaboration, business process management, workflow management and the like are a rich source of mainly business metadata. They provide business process maps, desk instructions, rules and management policies.

Independent Metadata repository

As mentioned in the detailed metadata architecture scenarios, metadata typically exists in a distributed model. In this model, the independent metadata repositories (such as the data warehouse repository, ERP repository, CRM repository and so on) feed a central enterprise-level metadata repository. These repositories are the first place to pick up metadata. Only if the metadata is not available in one of these independent repositories should the metadata extraction go to the source systems.
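The lookup order described above (independent repositories first, source systems only as a fallback) can be sketched as follows. This is an illustrative sketch, not a real tool's API: the function `locate_metadata` and the dictionary-based stores are assumptions made for the example.

```python
def locate_metadata(item, repositories, source_systems):
    """Return (origin_kind, origin_name, metadata) for `item`, or None if not found.

    Independent repositories are searched first; source systems are the fallback.
    """
    for kind, stores in (("repository", repositories), ("source_system", source_systems)):
        for name, store in stores.items():
            if item in store:
                return kind, name, store[item]
    return None


# Hypothetical stores, modelled as plain dictionaries for illustration
repositories = {
    "dw_repository": {"sales_fact": {"grain": "order line"}},
    "crm_repository": {"customer": {"owner": "sales operations"}},
}
source_systems = {
    "billing_db": {"invoice": {"columns": ["id", "amount"]}},
}

found_in_repo = locate_metadata("customer", repositories, source_systems)
found_in_source = locate_metadata("invoice", repositories, source_systems)
not_found = locate_metadata("payroll", repositories, source_systems)
```

Returning the origin alongside the metadata keeps a record of where each item came from, which helps later when checking that the extract is in sync with its source.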

Imperatives of good metadata extraction

This list is adapted from the data warehouse extraction design, with examples related to metadata.

Reliable Source:

For example, business rules for sales compensation may reside in several places:

  • A paper document held by the sales compensation unit
  • The functional specs of a sales compensation system
  • Business process map in the Business process management application

The correct sales compensation business rules will reside in one of these sources or in a combination of them. It is up to the data steward or metadata manager to ensure that the correct and most comprehensive source is identified.

Completeness of Metadata Extraction

Ensure that metadata extraction is well audited. The extraction process should run quality checks to confirm that all metadata has been extracted from all sources before giving the go-ahead to the metadata transformation activity. For example, one needs to check that metadata has been extracted from all applications, content management systems, workflow management systems, CASE tools and so on. A good metadata management tool includes an extraction monitoring mechanism that continuously records the status.
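A minimal sketch of such a monitoring mechanism is shown below. The class `ExtractionMonitor`, its method names and the source names are hypothetical, chosen only to illustrate the idea of recording status per source and blocking transformation until every expected source has been extracted.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionMonitor:
    """Records extraction status per source and gates the transformation step."""
    expected_sources: frozenset
    status: dict = field(default_factory=dict)

    def record(self, source, succeeded):
        # Keep the latest status reported for each source
        self.status[source] = "extracted" if succeeded else "failed"

    def outstanding(self):
        # Sources that failed or have not reported at all
        return sorted(s for s in self.expected_sources
                      if self.status.get(s) != "extracted")

    def ready_for_transformation(self):
        return not self.outstanding()


monitor = ExtractionMonitor(
    frozenset({"applications", "content_management", "workflow", "case_tools"})
)
monitor.record("applications", True)
monitor.record("content_management", True)
monitor.record("workflow", False)

blocked_by = monitor.outstanding()
```

In this run the transformation would be held back, because one source failed and another has not reported yet.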

Preservation of Data

Ensure that the extraction process preserves the metadata. The metadata repository typically needs to maintain historical snapshots, which in source systems are often overwritten, or archived and then purged. A staging area should therefore typically append the extracted information to the existing metadata rather than replace it. The appended data should be retained at least until the previously transformed data reaches the final 'Presentation/Analysis' area.
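The append-only behaviour can be sketched as below. The class `MetadataStagingArea` and the sample job-schedule values are assumptions for illustration; a real staging area would normally be a database table rather than an in-memory list.

```python
from datetime import date


class MetadataStagingArea:
    """Append-only store: each extraction adds a snapshot, nothing is overwritten."""

    def __init__(self):
        self._snapshots = []

    def append(self, extracted_on, source, payload):
        # Copy the payload so later changes at the caller do not alter history
        self._snapshots.append(
            {"extracted_on": extracted_on, "source": source, "payload": dict(payload)}
        )

    def history(self, source):
        # All snapshots for one source, in the order they were extracted
        return [s for s in self._snapshots if s["source"] == source]


staging = MetadataStagingArea()
staging.append(date(2024, 1, 1), "job_scheduler", {"nightly_load": "01:00"})
staging.append(date(2024, 1, 2), "job_scheduler", {"nightly_load": "02:30"})

schedule_history = staging.history("job_scheduler")
```

Both versions of the job schedule remain available, even though the source system would only show the latest one.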

In sync with source metadata repositories and source systems

We always recommend that the metadata extraction tool take its data from the independent metadata repositories (refer to the detailed architecture scenarios and Metadata ETL), as they will already have sanitized and integrated the metadata from their respective sources. Once the extraction is done, a quality check is needed to confirm that the extracted data is in sync with the source data.
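One possible form of this quality check is a reconciliation that compares fingerprints of the source records with those of the extracted records. The sketch below is an assumption about how such a check could look; the function names and the sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Stable SHA-256 digest of a JSON-serializable metadata record."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source, extracted):
    """Compare two {name: record} mappings and report where they differ."""
    missing = sorted(source.keys() - extracted.keys())
    unexpected = sorted(extracted.keys() - source.keys())
    mismatched = sorted(
        name for name in source.keys() & extracted.keys()
        if fingerprint(source[name]) != fingerprint(extracted[name])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical source metadata and a deliberately incomplete extract
source = {
    "customer": {"columns": ["id", "name"]},
    "orders": {"columns": ["id", "amount"]},
    "invoice": {"columns": ["id", "total"]},
}
extracted = {
    "customer": {"columns": ["id", "name"]},
    "orders": {"columns": ["id"]},
}

report = reconcile(source, extracted)
```

Fingerprints are used here instead of direct comparison on the assumption that the source side may only be able to supply checksums; where both full records are at hand, plain equality would serve equally well.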

The frequency of extraction

Most metadata extraction happens as the metadata changes. Some kinds of metadata change quickly (job schedules, technical metadata on new applications, database design changes and so on) and some change infrequently (organizational policies, rules, codes of conduct and so on). BiPMinstitute.com recommends that all sources and types of metadata (technical and business) be checked for changes on a daily basis. We do not think there is a need for real-time metadata integration (unlike the need for real-time data integration).
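The daily check can be sketched as a simple test of which sources have not been examined within the last day. The function `sources_due`, the one-day constant and the source names are illustrative assumptions, not part of any specific tool.

```python
from datetime import datetime, timedelta

# Assumed interval, following the daily check recommended above
CHECK_INTERVAL = timedelta(days=1)


def sources_due(last_checked, now, interval=CHECK_INTERVAL):
    """Return the sources never checked, or last checked at least `interval` ago."""
    return sorted(
        name for name, checked_at in last_checked.items()
        if checked_at is None or now - checked_at >= interval
    )


# Hypothetical sources with the time of their last change check
now = datetime(2024, 1, 2, 6, 0)
last_checked = {
    "job_scheduler": datetime(2024, 1, 1, 5, 0),      # 25 hours ago: due
    "policy_documents": datetime(2024, 1, 1, 23, 0),  # 7 hours ago: not due
    "data_model_tool": None,                          # never checked: due
}

due = sources_due(last_checked, now)
```

The same interval is applied to fast-changing and slow-changing sources alike, which matches the recommendation that every source be checked daily regardless of how often it actually changes.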

