Data lineage


ABSTRACT :

The purpose of this paper is to analyze and implement incremental updates of the data lineage storage in the software tool Manta Flow. The work is based on a study of the current data lineage storage in Manta Flow, research into existing solutions for incremental updates in version control systems, research into incremental backups in databases, the analysis and design of a new incremental update solution for Manta Flow, and a subsequent prototype implementation and performance testing. The resulting prototype can be deployed into the existing Manta Flow product, reducing the time complexity of updates to the data lineage storage by orders of magnitude.
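The core idea of an incremental update can be illustrated with a minimal Python sketch, assuming lineage is stored as a set of directed edges `(source, target)`. Instead of rewriting the whole storage as a full update does, it computes the difference between the stored snapshot and the newly analyzed one and writes only that delta. The names `compute_delta` and `apply_delta`, the edge representation and the table names are hypothetical and do not reflect Manta Flow's actual data model or API.

```python
# Minimal sketch of a diff-based incremental update of lineage storage.
# Assumption: lineage is a set of directed edges (source, target); all names
# here are hypothetical and are not Manta Flow's API.
from typing import Set, Tuple

Edge = Tuple[str, str]


def compute_delta(old: Set[Edge], new: Set[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (added, removed) edges between two lineage snapshots."""
    return new - old, old - new


def apply_delta(store: Set[Edge], added: Set[Edge], removed: Set[Edge]) -> int:
    """Write only the changed edges into `store`; return the number of writes."""
    store.difference_update(removed)
    store.update(added)
    return len(added) + len(removed)


old = {("src.orders", "dwh.fact_orders"), ("src.customers", "dwh.dim_customer")}
new = {("src.orders", "dwh.fact_orders"), ("src.clients", "dwh.dim_customer")}

store = set(old)                      # the persisted lineage before the update
added, removed = compute_delta(old, new)
writes = apply_delta(store, added, removed)
print(writes)  # 2: one edge removed, one added; the unchanged edge is untouched
```

A full update would rewrite all edges regardless of how few changed; here the number of writes depends only on the size of the delta. This sketch ignores revision history, which a real lineage store would also need to maintain.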

EXISTING SYSTEM :

• Our goal is to develop a new way to automatically solve a significant class of existing management and analysis problems in a corporate data warehouse environment.
• The approach is based on scanning, mapping, modelling and analysing the metadata of existing systems without accessing the contents of the database or affecting the behaviour of the data processing system.
• An unlimited number of different data models can exist simultaneously inside our metadata model, with relationships between them.
• Therefore, mappings stored in a repository can exist as objects independent of the transformation process and can be reused by several different processes.
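To illustrate how mappings can live in a repository independently of any single transformation process, here is a hypothetical Python sketch. The classes `Mapping`, `Process` and `MetadataRepository`, and the column names, are illustrative assumptions, not the schema of the described system.

```python
# Hypothetical sketch: mappings stored in a repository as standalone objects,
# referenced by id from several transformation processes.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Mapping:
    mapping_id: str
    source_column: str   # column in one data model
    target_column: str   # column in another data model


@dataclass
class Process:
    name: str
    mapping_ids: List[str] = field(default_factory=list)  # references, not copies


class MetadataRepository:
    def __init__(self) -> None:
        self.mappings: Dict[str, Mapping] = {}
        self.processes: Dict[str, Process] = {}

    def add_mapping(self, mapping: Mapping) -> None:
        self.mappings[mapping.mapping_id] = mapping

    def add_process(self, process: Process) -> None:
        self.processes[process.name] = process

    def processes_using(self, mapping_id: str) -> List[str]:
        """Names of all processes that reuse the given mapping, sorted."""
        return sorted(p.name for p in self.processes.values()
                      if mapping_id in p.mapping_ids)


repo = MetadataRepository()
repo.add_mapping(Mapping("m1", "crm.customer.id", "dwh.dim_customer.customer_key"))
repo.add_process(Process("nightly_load", ["m1"]))
repo.add_process(Process("adhoc_reload", ["m1"]))
print(repo.processes_using("m1"))  # ['adhoc_reload', 'nightly_load']
```

Because processes hold only mapping ids, a mapping is defined once and shared, so changing it in the repository is reflected in every process that references it.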

DISADVANTAGE :

• A special issue on Applications of Provenance is even devoted to discussing the different roles of provenance in information management across a variety of domains.
• Data lineage issues are new neither in practice nor in research. An overview of data lineage and data provenance tracing studies can be found in a book, and a more recent survey is offered in a separate paper.
• Interval search is fundamental to revision validity querying; we cannot use a custom class to avoid the issue of multiple queries when filtering elements by their revision validity.
• The problem lies in the internal representation of the Manta repository, so it is a limitation of this particular implementation rather than a fundamental one. A complete explanation and discussion is out of the scope of this paper.
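Revision validity querying can be sketched as an interval search. Assuming each stored element carries a closed revision interval `[valid_from, valid_to]`, an element belongs to revision `r` exactly when `valid_from <= r <= valid_to`. The Python sketch below, with a hypothetical `Element` schema and `RevisionIndex` class, sorts elements by start revision and uses `bisect` to skip everything created after `r`, then checks the end of each remaining interval in one pass. A production store would more likely rely on an interval tree or a database index.

```python
# Sketch: filtering lineage elements by revision validity with one interval query.
# Assumption (hypothetical schema): each element carries a closed revision
# interval [valid_from, valid_to]; math.inf marks elements that are still valid.
import bisect
import math
from typing import List, NamedTuple


class Element(NamedTuple):
    name: str
    valid_from: float
    valid_to: float


class RevisionIndex:
    def __init__(self, elements: List[Element]) -> None:
        # Sort by start revision so a binary search can skip elements created later.
        self._elements = sorted(elements, key=lambda e: e.valid_from)
        self._starts = [e.valid_from for e in self._elements]

    def valid_at(self, revision: float) -> List[str]:
        """Names of elements whose validity interval contains `revision`."""
        # Only elements with valid_from <= revision can possibly match.
        end = bisect.bisect_right(self._starts, revision)
        return [e.name for e in self._elements[:end] if e.valid_to >= revision]


index = RevisionIndex([
    Element("edge A->B", 1, 3),
    Element("edge A->C", 2, math.inf),
    Element("edge C->D", 4, math.inf),
])
print(index.valid_at(3))  # ['edge A->B', 'edge A->C']
```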

PROPOSED SYSTEM :

• They proposed a business information model (or conceptual business model) as the solution and as a central mapping point to overcome those issues.
• De Santana proposed an approach to data lineage documentation based on integrated metadata and the CWM metamodel.
• Data lineage can help with efforts to analyze how information is used and to track key pieces of information that serve a particular purpose.
• Data warehouse systems collect data from various distributed and heterogeneous data sources, integrating detailed or summarized information into a local database for further processing and analysis by various applications.
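Tracking where a piece of information in a data warehouse comes from amounts to walking the lineage graph upstream toward its sources. A minimal breadth-first sketch in Python follows, assuming a hypothetical dictionary that maps each node to the nodes it is derived from; the function name `upstream` and all node names are illustrative.

```python
# Hypothetical sketch: tracing where a data warehouse column comes from by
# walking a lineage graph upstream, breadth-first.
from collections import deque
from typing import Dict, List, Set


def upstream(lineage: Dict[str, List[str]], target: str) -> Set[str]:
    """Return every node that `target` transitively depends on.

    `lineage` maps each node to the list of nodes it is derived from.
    """
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.add(parent)
                queue.append(parent)
    return seen


lineage = {
    "report.revenue": ["dwh.fact_sales.amount"],
    "dwh.fact_sales.amount": ["stage.sales.amount"],
    "stage.sales.amount": ["erp.orders.total", "shop.orders.total"],
}
print(sorted(upstream(lineage, "report.revenue")))
```

The `seen` set means each node is enqueued at most once, so the traversal stays linear in the number of edges even when several targets share the same heterogeneous sources.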

ADVANTAGE :

• Performance testing was performed to evaluate the implemented incremental update in Manta Flow and to compare its expected effectiveness with that of the original full update.
• Part of a real (anonymized) database provided by the Manta company was used for performance testing.
• Moreover, performance testing showed only a slight slowdown when the same amount of data is updated by an incremental update instead of a full update.
• Manta was used in this paper because the authors took an active part in developing the Manta tools, specifically in designing and implementing the full and incremental updates of its metadata repository.
