
Tool for Identification of Duplicate Records Downloaded from Multiple CD-ROMS. A Case Study with SPIRS Based Databases


Affiliations
1 National Centre for Science Information (NCSI), Indian Institute of Science (IISc), Bangalore 560012, India
     



As research becomes increasingly interdisciplinary, literature searches are often run against more than one CD-ROM database. Because the same literature is covered (indexed) in more than one database, such searches retrieve duplicate records, and the retrieval software does not identify them. Three programs have been written to identify the duplicate records; they are executed from a shell script to minimize manual intervention. The fields extracted to identify duplicate records are the article title, year, volume number, issue number and pagination. When executed, the shell script prompts for an input file that may contain duplicate records. The programs then identify the duplicate records and write them to a new file.
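
The field-based matching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the programs' language, the record layout, or the matching rules. The sketch assumes that each downloaded record has already been parsed into a dictionary, and the key names (`title`, `year`, `volume`, `issue`, `pages`) and the normalization rule are hypothetical choices made for illustration only.

```python
import re

# Hypothetical field names. The source lists which fields are compared
# (title, year, volume, issue, pagination) but not how they are tagged.
KEY_FIELDS = ("title", "year", "volume", "issue", "pages")


def normalize(value):
    """Lower-case a field value and keep only ASCII letters and digits.

    Assumption: differences in case, spacing and punctuation between
    databases should not prevent two records from matching.
    """
    return re.sub(r"[^a-z0-9]+", "", str(value).lower())


def match_key(record):
    """Build a comparison key from the fields used to detect duplicates."""
    return tuple(normalize(record.get(field, "")) for field in KEY_FIELDS)


def split_duplicates(records):
    """Return (unique, duplicates), keeping the first record seen per key."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = match_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Invented sample records, standing in for downloads from two databases.
sample = [
    {"title": "Duplicate Detection in CD-ROM Searches.", "year": "1995",
     "volume": "32", "issue": "4", "pages": "101-110", "db": "A"},
    {"title": "duplicate detection in CD-ROM searches", "year": 1995,
     "volume": "32", "issue": "4", "pages": "101-110", "db": "B"},
    {"title": "Another Article", "year": "1995",
     "volume": "32", "issue": "4", "pages": "111-120", "db": "B"},
]

unique, duplicates = split_duplicates(sample)
```

In this sketch the second sample record is reported as a duplicate of the first, because the two differ only in letter case and trailing punctuation. Exact matching on a normalized key is a design choice of the example; records whose titles or page ranges are abbreviated differently across databases would need a looser comparison.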

Keywords

Searching Multiple CD-ROM Databases, Duplicate Records, SPIRS (Silver Platter Information Retrieval System) Databases.
About The Authors

Pradeep P. Kavi
National Centre for Science Information (NCSI), Indian Institute of Science (IISc), Bangalore 560012
India

Francis Jayakant
National Centre for Science Information (NCSI), Indian Institute of Science (IISc), Bangalore 560012
India


