A Meta Data Vault Approach for Evolutionary Integration of Big Data Sets: Case Study Using the NCBI Database for Genetic Variation
A data warehouse (DW) integrates data from heterogeneous sources into a consolidated view that is optimized for reporting and analysis. Because business and technology evolve constantly, the data sources change as well: new sources emerge and existing ones become unavailable. A DW or data mart built on these sources must reflect such changes. Various solutions for adapting a data warehouse to changes in data sources and business requirements have been proposed in the literature [1]. However, research on DW evolution has focused mainly on managing changes in the dimensional model, while other aspects, related to the ETL and to maintaining the history of changes, have not been addressed. This paper presents a Meta Data Vault model that includes a Data Vault based data warehouse and master data management (MDM). A major focus of this research is to keep both the history of changes and a "single version of the truth" through an MDM integrated with the DW. The paper also outlines the load patterns used to load data into the data warehouse and the materialized views used to deliver data to end users. To test the proposed model, we used big data sets from the biomedical field; for each modification of the data source schema, we outline the changes that need to be made to the enterprise data warehouse (EDW), the data marts, and the ETL.
Keywords
Data Warehouse (DW), Enterprise Data Warehouse (EDW), Business Intelligence, Data Vault (DV), Business Data Vault, Master Data Vault, Master Data Management (MDM), Data Mart, Materialized View, Schema Evolution, Data Warehouse Evolution, ETL, Metadata Repository, Relational Database Management System (RDBMS), NoSQL.
- D. Subotic, V. Jovanovic, and P. Poscic, Data Warehouse and Master Data Management Evolution – a Meta-Data-Vault Approach. Issues in Information Systems, 15(II), 14–23, 2014.
- D. Linstedt, SuperCharge Your Data Warehouse: Invaluable Data Modeling Rules to Implement Your Data Vault. CreateSpace Independent Publishing Platform, USA, 2011.
- A. Kitts, L. Phan, W. Minghong, and J. B. Holmes, The database of short genetic variation (dbSNP). The NCBI Handbook, 2nd edition, 2013. Available at http://www.ncbi.nlm.nih.gov/books/NBK174586/.
- H. Afify, M. Islam, and M. Wahed, DNA Lossless Differential Compression Algorithm Based on Similarity of Genomic Sequence Database. International Journal of Computer Science & Information Technology (IJCSIT), 2011.
- B. Feldman, E. Martin, and T. Skotnes, Genomics and the role of big data in personalizing the healthcare experience, 2013. Available at https://www.oreilly.com/ideas/genomics-and-the-role-of-big-data-in-personalizing-the-healthcare-experience.
- V. Jovanovic, and I. Bojicic, Conceptual Data Vault Model. Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, 2012.
- D. Linstedt, and M. Olschimke, Building a scalable data warehouse with Data Vault 2.0, 2016.
- Z. Naamane, and V. Jovanovic, Effectiveness of Data Vault compared to Dimensional Data Marts on Overall Performance of a Data Warehouse System. IJCSI International Journal of Computer Science Issues, Volume 13, Issue 4, 2016.
- N. Rahman, An empirical study of data warehouse implementation effectiveness. International Journal of Management Science and Engineering Management, 2017.
- R. Kimball, and M. Ross, The data warehouse toolkit, 3rd edition, John Wiley, 2013.
- B. Shah, K. Ramachandran, and V. Raghavan, A Hybrid Approach for Data Warehouse View Selection. International Journal of Data Warehousing and Mining (IJDWM), 2006.
- M. Teschke, and A. Ulbrichl, Using materialized views to speed up data warehousing. University Erlangen-Nuremberg (IMMD VI), 1–16, 1997. Available at http://www.ceushb.de/forschung/downloads/TeUl97.pdf