
A Survey and Classification of Publicly Available COVID-19 Datasets


Affiliations
1 Associate Professor, Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, Karnataka, India
2 Project Linked Personnel, Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, Karnataka, India
3 Professor, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India
 

This study curates a list of authentic, open-access sources of alphanumeric COVID-19 pandemic data. We gathered 74 datasets from 42 sources, including sources from 18 countries. The datasets were located by searching the Kaggle and GitHub repositories as well as Google, and they represent a variety of pandemic-related data. We categorize the datasets by source type (primary or secondary) and by geographical distribution. While analyzing the datasets, we identified a set of classes into which they can be grouped. We present this categorization as a taxonomy and highlight current challenges in collecting and using COVID-19 data. The study will help researchers and data curators identify and classify pandemic data.

Keywords

COVID-19, Classification, Curation, Datasets, Metadata.

Authors

Biswanath Dutta
Associate Professor, Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, Karnataka, India
Puranjani Das
Project Linked Personnel, Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, Karnataka, India
Sushmita Mitra
Professor, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India

