Data-Deduplication in Linux Kernel File-System


Affiliations
1 Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India
     


Authors

Amit Savyanavar
Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India
Sachin Katarnaware
Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India
Pritam Bankar
Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India
Prashant Jadhav
Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India
Nikhil Bagde
Department of Computer Engineering, MIT's College of Engineering, Kothrud, Pune-38, India

Abstract


Data deduplication is a compression technique that eliminates redundant data from a hard disk or other storage so that storage space is used efficiently. In every operating system, storage space is managed by the file system; in other words, it is the file system that stores data on secondary storage. We therefore modify the file system so that it eliminates redundant data blocks before they are written to secondary storage, an approach known as inline data deduplication. Ext4 is the latest file system used in Linux and already offers many new features, so we modify Ext4 to add one more: data deduplication.

In our inline deduplication method, we create a table that stores each hash key together with the number of the block that contains the data for that key. Keys are generated with the SHA-1 algorithm: whenever new data arrives, it is passed to SHA-1 to generate its key before any blocks are allocated for it. The key is then compared with the keys already stored in the table. If the key is already present, only its counter is incremented; this counter tracks how many pointers refer to the block on the physical device. If the key is not present, the key is stored and control passes to the superblock, which allocates free blocks from its free-block list and returns the allocated block numbers to the table, where they are stored against the key and the key's counter is incremented.

This method eliminates redundant allocation of data blocks, which saves space and makes more efficient use of storage. Enterprises and large organizations, whose data is growing exponentially, can use it to save space. Because the method eliminates redundancy at the block level, it also achieves a good elimination ratio and correspondingly good storage savings.
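The write path described above (hash first, look up the key, then either bump the counter or allocate a new block) can be illustrated with a minimal userspace Python sketch. It is not the authors' Ext4 kernel code, and several parts are assumptions: the class and method names (`InlineDedupStore`, `write_block`, `write_file`) are hypothetical, a Python list stands in for the superblock's free-block list, a dict stands in for the physical device, and the last partial block of a file is zero-padded to the block size.

```python
import hashlib


class InlineDedupStore:
    """Hypothetical userspace model of block-level inline deduplication (not ext4 code)."""

    def __init__(self, num_blocks: int, block_size: int = 4096):
        self.block_size = block_size
        # Assumed stand-in for the superblock's list of free block numbers.
        self.free_blocks = list(range(num_blocks))
        # Dedup table: SHA-1 digest -> [physical block number, reference counter].
        self.table: dict[bytes, list[int]] = {}
        # Simulated physical device: block number -> block contents.
        self.disk: dict[int, bytes] = {}

    def write_block(self, data: bytes) -> int:
        """Hash one block before any allocation; allocate only if the key is new."""
        key = hashlib.sha1(data).digest()
        entry = self.table.get(key)
        if entry is not None:
            # Duplicate data: no allocation, just one more pointer to the block.
            entry[1] += 1
            return entry[0]
        if not self.free_blocks:
            raise OSError("no free blocks")
        # New data: take a free block, write it, record the key with counter 1.
        block_no = self.free_blocks.pop(0)
        self.disk[block_no] = data
        self.table[key] = [block_no, 1]
        return block_no

    def write_file(self, data: bytes) -> list[int]:
        """Split data into fixed-size blocks (zero-padding the last) and write each."""
        blocks = []
        for off in range(0, len(data), self.block_size):
            chunk = data[off:off + self.block_size].ljust(self.block_size, b"\0")
            blocks.append(self.write_block(chunk))
        return blocks


# Usage with a tiny 4-byte block size so the blocks are easy to read.
store = InlineDedupStore(num_blocks=8, block_size=4)
a = store.write_file(b"AAAABBBBAAAA")  # "AAAA" repeats inside the same file
b = store.write_file(b"BBBBCCCC")      # "BBBB" repeats across files
print(a)                # [0, 1, 0]
print(b)                # [1, 2]
print(len(store.disk))  # 3 physical blocks back 5 logical blocks
```

As in the abstract, the sketch treats a matching SHA-1 key as proof that the data is identical. Because SHA-1 collisions can be constructed deliberately, a production design might also compare the stored block byte-for-byte before sharing it.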