December 1, 2021

The Science of Preventing Pandemics With Data


COVID-19 has become one of the deadliest pandemics of the 21st century. As of April 13, 2020, around 1.8 million people had been infected and more than 100,000 had died across the globe from the novel coronavirus disease (WHO Situation Report 84, published April 13, 2020). The scariest part is that the numbers are still growing exponentially. The following graph shows the trend of COVID-19 confirmed cases across different WHO regions.

Global communities are fighting hard to curb this pandemic. On one hand, medical and paramedical workers are spending most of their time caring for those who have tested positive. On the other hand, administrative authorities are on the ground implementing the guidelines issued by their respective governments. Vaccine research labs are pouring their resources into finding a cure. The data science community is also actively participating in this fight for humanity.

The Allen Institute for AI, in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research and the National Library of Medicine (National Institutes of Health), and in coordination with The White House Office of Science and Technology Policy, has prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 51,000 scholarly articles, including over 40,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. The dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. It is freely available on Kaggle under the COVID-19 Open Research Dataset Challenge (CORD-19).

The objective is to call on the global data science and artificial intelligence (AI) community to apply text and data mining tools and techniques to answer nine high-priority scientific questions. The questions are drawn from the research topics of the National Academies of Sciences, Engineering and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats (NASEM’s SCIED) and the WHO R&D Blueprint for COVID-19. The questions are as follows:

  1. What is known about transmission, incubation and environmental stability?
  2. What do we know about COVID-19 risk factors?
  3. What do we know about virus genetics, origin and evolution?
  4. What do we know about vaccines and therapeutics?
  5. What has been published about medical care?
  6. What do we know about non-pharmaceutical interventions?
  7. What do we know about diagnostics and surveillance?
  8. What has been published about ethical and social science considerations?
  9. What has been published about information sharing and inter-sectoral collaboration?

There has been a tremendous response from the data science and AI community to these questions. Many submissions are being made for each question, and the major findings are being summarized on the web portal.
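To give a flavour of what "text mining" against one of these questions can look like, here is a minimal sketch that ranks documents by TF-IDF overlap with a question. It is not the method of any particular submission. The function names (`tokenize`, `rank_documents`) and the three short "abstracts" are hypothetical placeholders written for this example, not records from CORD-19, and no stop-word removal or stemming is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(question, documents):
    """Return (score, index) pairs, highest TF-IDF overlap with the question first."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(question))

    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scored.append((score, index))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Placeholder texts for illustration only; these are NOT CORD-19 abstracts.
docs = [
    "Placeholder abstract about the incubation period and transmission of a respiratory virus.",
    "Placeholder abstract about vaccine candidates and therapeutics under evaluation.",
    "Placeholder abstract about diagnostics and surveillance testing strategies.",
]
question = "What is known about transmission, incubation and environmental stability?"

ranking = rank_documents(question, docs)
best_score, best_index = ranking[0]
print(f"Best match: document {best_index} (score {best_score:.3f})")
```

In practice the documents would be read from the dataset files rather than typed in, and a real submission would likely use a more capable retrieval or language model; the sketch only shows the basic idea of scoring literature against a question.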

The Ministry of Health & Family Welfare, Government of India, has also launched the COVID-19 Solution Challenge, in which it appeals to data scientists to provide insights that can help in preventing the pandemic.

Beyond taking on these challenges, some data scientists have also been generating insights from the available data, such as identifying trends in coronavirus cases and predicting the number of probable cases in subsequent days. This helps authorities arrange, in advance, the resources needed to fight this deadly and invisible enemy. Through the simulations they have produced, the general public now understands the importance of social distancing in flattening the curve and ultimately curbing the outbreak.
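The "flattening the curve" effect can be illustrated with a minimal sketch of the classic SIR (susceptible, infected, recovered) compartmental model, stepped one day at a time. This is a textbook model, not the specific simulation any of those data scientists used. All parameter values below (`beta`, `gamma`, the population size and the number of initial cases) are illustrative assumptions, not estimates fitted to COVID-19 data. Social distancing is represented simply as a lower contact rate `beta`.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000,
                 initial_infected=10, days=500):
    """Discrete-time (daily step) SIR model.

    beta  -- assumed average number of infectious contacts per person per day
    gamma -- assumed daily recovery rate (1 / infectious period in days)
    Returns the number of currently infected people for each day.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered (tracked for completeness)
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


# Illustrative scenarios: distancing is assumed to halve the contact rate.
baseline = simulate_sir(beta=0.30)
distanced = simulate_sir(beta=0.15)

peak_baseline = max(baseline)
peak_distanced = max(distanced)

print(f"Peak infected without distancing: {peak_baseline:,.0f} "
      f"on day {baseline.index(peak_baseline)}")
print(f"Peak infected with distancing:    {peak_distanced:,.0f} "
      f"on day {distanced.index(peak_distanced)}")
```

Under these assumptions, the distanced scenario reaches a lower peak at a later date, which is the behaviour the phrase "flattening the curve" describes: the load on hospitals is spread over a longer period instead of arriving all at once.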

As a data scientist, I believe that the data science community can help the world not only emerge stronger from the current mayhem caused by the COVID-19 pandemic but also signal the probability of any similar outbreak that is likely to happen in the near future. With current infrastructure, tools, advances in algorithms and availability of data, we are in an era of predictive and prescriptive analytics along with more interactive descriptive analytics. COVID-19 has given us the much-needed push, and we should not stop here. It’s high time that we make huge investments in data science research, primarily focusing on predictive models that will enable us to anticipate pandemics and on prescriptive analytics that recommend ways to prevent them.



The Author

Sachin Kumar graduated from Tier-1 educational institutes in India. He has over 5 years of experience in Data Science & Analytics, Management & Consulting, Research & Development across various industries. Today, as a Data Scientist at Games24x7, he helps the company grow through the application of data tools and techniques.