A Novel Fragmentation Scheme for Textual Data Using Similarity-Based Threshold Segmentation Method in Distributed Network Environment

Sashi Tarun; Ranbir Singh Batth; Sukhpreet Kaur

doi:10.22247/ijcna/2020/205322

Sashi Tarun ¹, Ranbir Singh Batth ¹, Sukhpreet Kaur ²

Affiliations
1 School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
2 Department of Computer Science and Engineering, Chandigarh Engineering College, Mohali, India

Abstract

Data distribution is central to the architecture of any serving network. How well data can be stored and retrieved depends largely on how it is organized in the distributed environment. With the rapid development of technology, user requirements have also changed: users who were once stationary are now mobile and need access to their data from anywhere in the world. Unorganized data causes output delays in the network and may drive users to migrate from one service provider to another. Data fragmentation is one of the most essential aspects of data storage, since organized data is more convenient to use. With very large data collections, extracting information quickly is difficult. To achieve good performance in a distributed system, an optimal strategy is required that overcomes earlier lapses and serves the maximum number of users across a wide geographical network. This paper proposes a novel relative-based fragmentation method that analyses the attributes of the data in a relative architecture and helps to improve the speed and accuracy of queries. To assess the proposed work, a comparison is drawn between k-means-dependent cosine similarity measurement and a hybridization of cosine and soft-cosine partition methods for data partitioning. The results reported in the article show that the proposed similarity-based threshold segmentation method outperforms the existing methods in terms of partitioning strategy, precision, and recall.

Keywords

Fragmentation, K-Means, Similarity, Data Partitioning, Threshold, Segmentation, Precision, Recall.
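
The abstract does not spell out the segmentation algorithm, so the following is only a minimal sketch of the general idea of similarity-based threshold fragmentation, not the authors' method. It assumes bag-of-words vectors, plain cosine similarity, a greedy single pass, and an arbitrary threshold of 0.5; the function names (`bag_of_words`, `threshold_fragments`, `precision_recall`) and the sample records are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (a deliberately simple stand-in)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_fragments(records, threshold=0.5):
    """Greedy pass: a record joins the first fragment whose seed record is at
    least `threshold` similar to it; otherwise it opens a new fragment.
    The threshold value is an assumption, not taken from the paper."""
    fragments = []  # list of (seed_vector, [record indices])
    for index, text in enumerate(records):
        vector = bag_of_words(text)
        for seed, members in fragments:
            if cosine_similarity(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            fragments.append((vector, [index]))
    return [members for _, members in fragments]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample records, for illustration only.
records = [
    "my phone battery drains fast",
    "phone battery drains too fast",
    "refund for my cancelled flight",
    "flight cancelled need refund",
]
print(threshold_fragments(records))        # [[0, 1], [2, 3]]
print(precision_recall([0, 1, 2], [0, 1]))  # (0.6666666666666666, 1.0)
```

Comparing each record only against a fragment's seed keeps the sketch short; a centroid or soft-cosine comparison, as mentioned in the abstract, would change how fragment membership is decided.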

International Journal of Computer Networks and Applications
