
A Survey on the Evolution of Models of Data Integration


Affiliations
1 Department of Computer and Information Systems, University of Aizu, Aizu-Wakamatsu, Japan
     



Abstract

Over time, several models of data integration have been developed to manage and analyze data. With the emergence of big data, the database community has proposed newer and better solutions for managing such large and disparate data. Changes in data storage models and the growth of massive data repositories on the web have also created a need for novel data integration models. In this article, we survey the trends in integrating data through different models. We give a brief overview of Federated Database Systems, Data Warehouses, Mediators, and the newly proposed Polystore Systems, tracing the evolution of their architecture, query processing, distribution, automation, and supported data models. We also present the similarities and differences among these models, discuss the novelty of Polystore Systems through various examples, and highlight the importance of such systems for integrating large-scale heterogeneous data.

Keywords

Data Integration, Multi-database Systems, Polystore Systems




Authors

Shashank Shrestha
Department of Computer and Information Systems, University of Aizu, Aizu-Wakamatsu, Japan
Subhash Bhalla
Department of Computer and Information Systems, University of Aizu, Aizu-Wakamatsu, Japan
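In the mediator model named in the abstract, a mediator sits between the user and several autonomous sources, and each source is accessed through a wrapper that translates requests and maps local data onto a shared (global) schema. The Python sketch below illustrates that idea under simplifying assumptions: the two sources (an in-memory SQLite table and a plain dict standing in for a key-value store), the global schema, and the names `relational_wrapper`, `keyvalue_wrapper` and `mediator_query` are all hypothetical and are not taken from the article or from any particular system.

```python
import sqlite3
from typing import Any, Callable, Dict, List

# Hypothetical global (mediated) schema: every wrapper returns records
# shaped as {"patient_id": int, "name": str or None, "heart_rate": int or None}.
Record = Dict[str, Any]
Wrapper = Callable[[int], List[Record]]


def relational_wrapper(conn: sqlite3.Connection) -> Wrapper:
    """Wrap a relational source: translate a mediator request into SQL."""
    def fetch(patient_id: int) -> List[Record]:
        rows = conn.execute(
            "SELECT id, full_name FROM patients WHERE id = ?", (patient_id,)
        ).fetchall()
        # Map the source's local column names onto the global schema.
        return [{"patient_id": r[0], "name": r[1], "heart_rate": None} for r in rows]
    return fetch


def keyvalue_wrapper(store: Dict[str, Dict[str, Any]]) -> Wrapper:
    """Wrap a key-value source that uses its own key format and field names."""
    def fetch(patient_id: int) -> List[Record]:
        doc = store.get(f"patient:{patient_id}")
        if doc is None:
            return []
        return [{"patient_id": patient_id, "name": None, "heart_rate": doc["hr"]}]
    return fetch


def mediator_query(wrappers: List[Wrapper], patient_id: int) -> Record:
    """Send the same request to every wrapper and merge the partial answers."""
    merged: Record = {"patient_id": patient_id, "name": None, "heart_rate": None}
    for fetch in wrappers:
        for rec in fetch(patient_id):
            for key, value in rec.items():
                # Placeholder conflict rule: keep the first non-null value per attribute.
                if value is not None and merged.get(key) is None:
                    merged[key] = value
    return merged


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO patients VALUES (1, 'Alice')")
    kv = {"patient:1": {"hr": 72}}
    sources = [relational_wrapper(conn), keyvalue_wrapper(kv)]
    print(mediator_query(sources, 1))
```

Running the script prints `{'patient_id': 1, 'name': 'Alice', 'heart_rate': 72}`: neither source holds the full record, but the mediator assembles it from both without copying the data into a central store, which is the main contrast with a data warehouse. A realistic mediator would also decompose queries, push selections down to each source, and resolve conflicting values more carefully than the first-non-null rule assumed here.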

