Source system documentation: It is generally not possible to have a complete
description of:
the source system tables,
the inter-linkages between the tables,
the same data elements appearing under different field names in different tables.
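The last point can be captured, at least in part, with a hand-maintained catalogue that maps each data element to the field name it goes by in each table. The following is a minimal sketch only: the table names (`crm_customers`, `billing_accounts`), the field names, and the helper functions are hypothetical and not taken from any real source system.

```python
# Hypothetical catalogue: canonical data element -> {table: field name}.
# All table and field names below are invented for illustration.
FIELD_ALIASES = {
    "customer_id": {
        "crm_customers": "cust_id",
        "billing_accounts": "customer_no",
    },
    "customer_type": {
        "crm_customers": "cust_type",
        "billing_accounts": "acct_class",
    },
}


def field_name(element: str, table: str) -> str:
    """Return the field that holds `element` in `table`.

    Raises KeyError when the catalogue has no entry, which makes gaps in
    the source documentation visible instead of silently guessing.
    """
    return FIELD_ALIASES[element][table]


def tables_sharing(element: str) -> list[str]:
    """List the tables known to carry `element`, sorted by name."""
    return sorted(FIELD_ALIASES.get(element, {}))
```

Keeping the catalogue as plain data means it can be reviewed by people who know the source system but do not write ETL code.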
The transformation rules: Many transformation rules are fairly complex. They
cannot be implemented through the standard options a tool provides and have
to be programmed into the system.
The reasoning: While it might be possible for an ETL system to self-document
the simple transformation rules, it cannot document the reasons and objectives
behind a transformation. For example, why are you splitting a customer_ID
(AA302456) into sub-components such as customer_type (AA), customer location (30)
and customer_number (2456)? The reasons for this split could include:
Enabling specific types of queries on customer type.
Two different tables could have different field structures for customer
type (AA vs. XXAA).
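The customer_ID example can be sketched in code with the reasoning recorded next to the rule itself. This is an illustration under stated assumptions: a fixed 8-character layout (2-character type, 2-character location, 4-character number) inferred from the single example AA302456, and the guess that `XXAA` and `AA` denote the same customer type. The names `CustomerKey`, `split_customer_id` and `normalize_customer_type` are hypothetical.

```python
from typing import NamedTuple


class CustomerKey(NamedTuple):
    customer_type: str
    customer_location: str
    customer_number: str


def split_customer_id(customer_id: str) -> CustomerKey:
    """Split a customer_ID such as 'AA302456' into its sub-components.

    Why this rule exists (the part a tool cannot document for itself):
      1. Queries need to filter and group by customer type.
      2. Other tables hold the customer type in a different structure,
         so an explicit type field is needed to match against them.
    Assumed layout: 2-char type, 2-char location, 4-char number.
    """
    if len(customer_id) != 8:
        raise ValueError(f"unexpected customer_ID length: {customer_id!r}")
    return CustomerKey(customer_id[:2], customer_id[2:4], customer_id[4:])


def normalize_customer_type(raw_type: str) -> str:
    """Map a prefixed type such as 'XXAA' to the two-character form 'AA'.

    Assumption: only the last two characters carry the customer type.
    """
    return raw_type[-2:]
```

For example, `split_customer_id("AA302456").customer_type` and `normalize_customer_type("XXAA")` both yield `"AA"`, so the two tables can be compared on the same value.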
The documentation on risks related to the efficacy of ETL: During the design
of an ETL system, one comes to know the limitations of the ETL. For example,
you may not be able to achieve 100% perfect extractions or transformations
given:
the limitations of data quality in the source systems,
the limitations of extraction flexibility,
the performance load due to a complex extraction query.
These limitations should always be documented, because doing so gives a more
realistic view of the accuracy of the data and the output information.
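One possible way to keep such limitations on record is a small risk register stored alongside the ETL code. The sketch below is an assumption-laden illustration: the `KnownLimitation` structure and both helper functions are hypothetical, and the combined estimate assumes the limitations affect rows independently, which real limitations often do not.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class KnownLimitation:
    area: str                 # e.g. "source data quality"
    description: str          # what can go wrong, in plain language
    expected_accuracy: float  # estimated share of rows handled correctly (0.0 to 1.0)


def combined_accuracy(limitations: Iterable[KnownLimitation]) -> float:
    """Rough overall estimate of accuracy across all limitations.

    Assumption: the limitations affect rows independently, so their
    accuracies multiply. Treat the result as an indication only.
    """
    result = 1.0
    for limitation in limitations:
        result *= limitation.expected_accuracy
    return result


def risk_report(limitations: Iterable[KnownLimitation]) -> str:
    """Render the register as plain text for the ETL documentation."""
    return "\n".join(
        f"{lim.area}: {lim.description} "
        f"(expected accuracy {lim.expected_accuracy:.1%})"
        for lim in limitations
    )
```

The accuracy figures entered in such a register are estimates made by the designers, so the report should be read as a statement of known risk rather than a measurement.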
The flow of ETL: A set of data sometimes goes through multiple transformation
routines before it reaches its end state and is ready for loading. Good documentation
should provide an end-to-end view of this entire flow so that
one can understand the purpose behind it. This end-to-end view should
be able to answer the following questions:
Why are we following X steps and not Y steps to do a transformation?
What are the completion criteria for each transformation or extraction step?
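Both questions can be answered in the flow definition itself if each step carries its rationale and a checkable completion criterion. The following is a minimal sketch, not a description of any particular ETL tool: the `Step` structure, the two example steps and the functions `run_flow` and `describe_flow` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Rows = List[dict]


@dataclass
class Step:
    name: str
    rationale: str                             # why this step, and not another
    criterion: str                             # completion criterion, in words
    run: Callable[[Rows], Rows]                # the transformation itself
    is_complete: Callable[[Rows, Rows], bool]  # check on (input, output)


# Hypothetical two-step flow, for illustration only.
FLOW = [
    Step(
        name="drop_missing_ids",
        rationale="Rows without a customer_ID cannot be joined downstream.",
        criterion="no output row has an empty customer_ID",
        run=lambda rows: [r for r in rows if r.get("customer_ID")],
        is_complete=lambda before, after: all(r.get("customer_ID") for r in after),
    ),
    Step(
        name="uppercase_ids",
        rationale="Source systems differ in letter case for the same ID.",
        criterion="row count unchanged and every customer_ID is upper case",
        run=lambda rows: [{**r, "customer_ID": r["customer_ID"].upper()} for r in rows],
        is_complete=lambda before, after: len(before) == len(after)
        and all(r["customer_ID"] == r["customer_ID"].upper() for r in after),
    ),
]


def run_flow(steps: List[Step], rows: Rows) -> Rows:
    """Run each step in order and stop if its completion criterion fails."""
    for step in steps:
        output = step.run(rows)
        if not step.is_complete(rows, output):
            raise RuntimeError(f"{step.name}: criterion not met ({step.criterion})")
        rows = output
    return rows


def describe_flow(steps: List[Step]) -> str:
    """Produce the end-to-end view as numbered plain text."""
    return "\n".join(
        f"{i}. {s.name}: {s.rationale} Done when: {s.criterion}."
        for i, s in enumerate(steps, start=1)
    )
```

Because the description is generated from the same objects that run the flow, the written end-to-end view cannot drift away from the steps that are actually executed.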
Data quality checks: An ETL system generally does not document the data quality checks
that need to be performed, or the reasoning behind them.
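A simple remedy is to store each check together with the reason it exists. The sketch below is illustrative only: the `QualityCheck` structure, the two example checks and the 8-character assumption for customer_ID (taken from the AA302456 example) are hypothetical choices, not rules from any real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class QualityCheck:
    name: str
    reason: str                     # why the check is needed
    passes: Callable[[dict], bool]  # True when the row is acceptable


# Hypothetical checks, each paired with its reasoning.
CHECKS = [
    QualityCheck(
        name="customer_id_present",
        reason="Rows without a customer_ID cannot be linked to a customer.",
        passes=lambda row: bool(row.get("customer_ID")),
    ),
    QualityCheck(
        name="customer_id_length",
        reason="Splitting into type, location and number assumes 8 characters.",
        passes=lambda row: len(row.get("customer_ID") or "") == 8,
    ),
]


def failed_checks(row: dict) -> List[str]:
    """Return the names of all checks that the row fails, in order."""
    return [check.name for check in CHECKS if not check.passes(row)]
```

A row such as `{"customer_ID": "AA30"}` fails only `customer_id_length`, and the stored `reason` tells a later reader why that matters.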