
Value Proposition and ETL Process in Big Data Environment


Affiliations
1 RVCE, Bangalore, Karnataka, India
     



For any retail company, managing inventory is of prime importance. Every store should hold enough items to meet demand, so stores must be restocked before items go out of stock. Restocked items arrive from a fulfillment center, which distributes them to the various stores, also called distribution centers. Since distribution centers and fulfillment centers are generally far apart, there is a delay between a restock request and the arrival of the items at the distribution center. To prevent out-of-stock conditions, the request must be timed to account for this transit time. The quantity of an item also affects the request time: only a few large items can be sent at once, so several transits may be needed to restock the required number. Other conditions, such as general traffic and seasonal climate variations, can also affect transit time. All of these conditions must be taken into account when deciding when to request an item. The proposed system determines the request time and quantity of items, accounting for these variations, by training on years of data. This allows the system to work more efficiently and to prevent out-of-stock conditions, thereby increasing the company's sales.
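
The timing argument above can be illustrated with a minimal Python sketch. It is not the proposed system, which learns request time and quantity from historical data; it is a fixed-formula approximation under stated assumptions. The function name `reorder_point`, its parameters, and the `delay_factor` multiplier (standing in for traffic and seasonal effects) are all hypothetical, and shipments are assumed to run back to back.

```python
import math


def reorder_point(daily_demand, transit_days, required_units,
                  units_per_shipment, delay_factor=1.0):
    """Estimate when a store should request a restock.

    Hypothetical illustration only: every parameter is an assumption,
    not a quantity defined in the paper.
    Returns (trips, lead_time_days, reorder_level).
    """
    # Large items ship a few at a time, so several transits may be needed.
    trips = math.ceil(required_units / units_per_shipment)

    # Assumption: transits run one after another, and traffic or seasonal
    # conditions stretch each transit by a constant factor.
    lead_time_days = trips * transit_days * delay_factor

    # Request the restock once stock falls to the amount that will be
    # sold while the shipments are still on their way.
    reorder_level = math.ceil(daily_demand * lead_time_days)
    return trips, lead_time_days, reorder_level


# Example: 20 units sold per day, 3-day transit, 100 units needed,
# 40 units per shipment, and a 1.5x seasonal slowdown.
trips, lead_time, level = reorder_point(20, 3, 100, 40, delay_factor=1.5)
print(trips, lead_time, level)
```

In this sketch the request is triggered when stock drops to `level` units. A learned model would replace the constant `delay_factor` and the fixed demand rate with predictions from past data.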

Keywords

Big Data, ETL Process, HDFS, SparkML, SparkSQL, Value Proposition.




Authors

Prateek Kumar
RVCE, Bangalore, Karnataka, India
Veena Gaded
RVCE, Bangalore, Karnataka, India

