When disruptive events with global reach grow in frequency and threaten our health, our society and the economy, it is crucial to manage and analyze them and to help find solutions that reduce both the events and their impact.





At the time of writing, the worldwide outbreak of COVID-19 has reached more than 81,100 people in China alone, where the outbreak originated, and the number of confirmed cases globally is more than 185,041.

The number of people confirmed to have died as a result of the virus is currently more than 8,000.

The virus has been declared a pandemic by the World Health Organization, meaning that it is spreading rapidly all around the world. So far, 156 countries have confirmed cases.

At the moment the epicenter of the outbreak is Europe, with the largest number of confirmed cases in Italy, approximately 31,500.

The number of deaths in Europe is growing even more rapidly than it did in China at the same stage of the outbreak.

Our main objective in presenting this compared analysis and forecast is to aggregate the existing research, bring together all of the relevant data, and help every reader understand how important early research on the coronavirus outbreak is.

Data for this analysis is obtained from

Timeframe of the collected data

  • from 22 January 2020 to 17 March 2020

Software used in this analysis

  • Python 3.7.1
  • pandas 1.0.1
  • numpy 1.18.1
  • fbprophet 0.6


Import needed libraries

First we import every package needed for the analysis and forecast.

import glob
import pandas as pd
from fbprophet import Prophet


Reading files

Read the comma-separated (CSV) files for each day and concatenate them into one large DataFrame, so that the whole data set can be analysed as a single table.

df_all = pd.DataFrame()
for name in glob.glob("*.csv"):
    df_temp = pd.read_csv(name)
    # The report date is the file name without the ".csv" extension
    df_temp['Date'] = name[-14:-4]
    frames = [df_all, df_temp]
    df_all = pd.concat(frames, sort=False)
    del(df_temp)
df_all['Date'] = pd.to_datetime(df_all['Date'])
df_all.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6438 entries, 0 to 6437
Data columns (total 9 columns):
Province/State    3826 non-null object
Country/Region    6438 non-null object
Last Update       6438 non-null object
Confirmed         6419 non-null float64
Deaths            5997 non-null float64
Recovered         6050 non-null float64
Date              6438 non-null datetime64[ns]
Latitude          3620 non-null float64
Longitude         3620 non-null float64
dtypes: datetime64[ns](1), float64(5), object(3)
memory usage: 503.0+ KB
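The loop above concatenates inside every iteration. A minimal sketch of one alternative collects the daily frames in a list and concatenates once. It assumes each file is named after its report date in `MM-DD-YYYY.csv` form, as the `name[-14:-4]` slice suggests; the function name `load_daily_reports` and its `folder` parameter are hypothetical and used only for illustration.

```python
import glob
import os

import pandas as pd


def load_daily_reports(folder="."):
    """Read every daily CSV in `folder` into one DataFrame with a Date column."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df_day = pd.read_csv(path)
        # Assumption: the file name without its extension is the report date,
        # for example "01-22-2020.csv".
        date_text = os.path.splitext(os.path.basename(path))[0]
        df_day["Date"] = pd.to_datetime(date_text, format="%m-%d-%Y")
        frames.append(df_day)
    if not frames:
        return pd.DataFrame()
    # ignore_index=True gives the combined table one continuous row index.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling `load_daily_reports(".")` from the folder that holds the daily files would return the combined table, and an empty DataFrame if no CSV files are found.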


Mend Discrepancies in Country Names

The 'Country/Region' column contains many inconsistent country names, which should be fixed so that the data are clean and precise afterwards.

df_all['Country/Region'] = df_all.apply(lambda x: 'China' if x['Country/Region'] == 'Mainland China' else x['Country/Region'], axis=1)
df_all['Country/Region'] = df_all.apply(lambda x: 'Iran' if x['Country/Region'] == 'Iran (Islamic Republic of)' else x['Country/Region'], axis=1)
df_all['Country/Region'] = df_all.apply(lambda x: 'South Korea' if x['Country/Region'] == 'Republic of Korea' else x['Country/Region'], axis=1)
df_all['Country/Region'] = df_all.apply(lambda x: 'South Korea' if x['Country/Region'] == 'Korea, South' else x['Country/Region'], axis=1)
df_all['Country/Region'] = df_all.apply(lambda x: 'Taiwan' if x['Country/Region'] == 'Taiwan*' else x['Country/Region'], axis=1)
df_all['Country/Region'] = df_all.apply(lambda x: 'Russia' if x['Country/Region'] == 'Russian Federation' else x['Country/Region'], axis=1)
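The six row-wise `apply` calls above can also be written as one lookup. The sketch below uses `Series.replace` with a dictionary holding the same six spellings; the names `COUNTRY_ALIASES` and `normalize_countries` are hypothetical and not part of the original analysis.

```python
import pandas as pd

# Variant spellings from the code above, each mapped to one canonical name.
COUNTRY_ALIASES = {
    "Mainland China": "China",
    "Iran (Islamic Republic of)": "Iran",
    "Republic of Korea": "South Korea",
    "Korea, South": "South Korea",
    "Taiwan*": "Taiwan",
    "Russian Federation": "Russia",
}


def normalize_countries(df, column="Country/Region"):
    """Return a copy of `df` with known aliases in `column` replaced."""
    out = df.copy()
    # Exact-match replacement; names that are not in the mapping are kept.
    out[column] = out[column].replace(COUNTRY_ALIASES)
    return out
```

A call such as `df_all = normalize_countries(df_all)` would be one way to use it. Keeping the spellings in a dictionary makes it easy to add another alias when a new one turns up in the data.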

Make Distinction between China and other Countries

Mark China and Others separately, since new cases in China have almost stopped.

df_all['Country']=df_all.apply(lambda x: 'China' if x['Country/Region'] =='China' else 'Other',axis=1)

Analysis on China vs Others

  • Group by Country (China/Others) and Date
  • Calculate percentages
df_all_grouped = df_all.groupby(by=['Date', 'Country'])[['Confirmed', 'Recovered', 'Deaths']].sum().reset_index()
df_all_grouped['Percentage_Recovered'] = df_all_grouped['Recovered'] / df_all_grouped['Confirmed'] * 100
df_all_grouped['Percentage_Dead'] = df_all_grouped['Deaths'] / df_all_grouped['Confirmed'] * 100
df_all_grouped.head(10)
         Date  Country  Confirmed  Recovered  Deaths  Percentage_Recovered  Percentage_Dead
0  01-22-2020    China      547.0       28.0    17.0              5.118830         3.107861
1  01-22-2020    Other        8.0        0.0     0.0              0.000000         0.000000
2  01-23-2020    China      639.0       30.0    18.0              4.694836         2.816901
3  01-23-2020    Other       14.0        0.0     0.0              0.000000         0.000000
4  01-24-2020    China      916.0       36.0    26.0              3.930131         2.838428
5  01-24-2020    Other       25.0        0.0     0.0              0.000000         0.000000
6  01-25-2020    China     1399.0       39.0    42.0              2.787706         3.002144
7  01-25-2020    Other       39.0        0.0     0.0              0.000000         0.000000
8  01-26-2020    China     2062.0       49.0    56.0              2.376334         2.715810
9  01-26-2020    Other       56.0        3.0     0.0              5.357143         0.000000
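As a quick check of the percentage columns, the arithmetic for the first China row (547 confirmed, 28 recovered, 17 deaths) can be reproduced in plain Python. The helper `percentage` is hypothetical; returning `None` for a zero total is one possible convention, whereas the pandas division used in the analysis yields NaN or inf in that case.

```python
def percentage(part, total):
    """Return part/total as a percentage, or None when total is zero."""
    if total == 0:
        return None
    return part / total * 100


# First China row of the table above: 547 confirmed, 28 recovered, 17 deaths.
recovered_pct = percentage(28, 547)
dead_pct = percentage(17, 547)
print(round(recovered_pct, 6), round(dead_pct, 6))
```

Rounded to six decimals, the two values agree with the Percentage_Recovered and Percentage_Dead entries in row 0.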


Last Date Analysis

  • Analyze only the last (max) date
  • Group by Country/Region
df_all_last = df_all[df_all.Date == df_all.Date.max()]
df_all_last = df_all_last.groupby(by=['Country/Region'])[['Confirmed', 'Deaths']].sum().reset_index()
df_all_last['Percentage_Dead'] = df_all_last['Deaths'] / df_all_last['Confirmed'] * 100
df_all_last.sort_values(by=['Percentage_Dead'], ascending=False, inplace=True)
df_all_last.head(10)
     Country/Region  Confirmed  Deaths  Percentage_Dead
135           Sudan        1.0     1.0       100.000000
 58       Guatemala        2.0     1.0        50.000000
 61          Guyana        4.0     1.0        25.000000
148         Ukraine        7.0     1.0        14.285714
110     Philippines      142.0    12.0         8.450704
 69            Iraq      124.0    10.0         8.064516
 72           Italy    27980.0  2158.0         7.712652
  2         Algeria       54.0     4.0         7.407407
 10      Azerbaijan       15.0     1.0         6.666667
 90      Martinique       15.0     1.0         6.666667
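With only one or two confirmed cases, a single death produces a percentage of 50% or 100%, as the Sudan and Guatemala rows show. One possible refinement, which is not part of the original analysis, is to rank only countries above a minimum number of confirmed cases. The function name `fatality_ranking` and the threshold `min_confirmed=100` are hypothetical choices for illustration.

```python
import pandas as pd


def fatality_ranking(df_last, min_confirmed=100):
    """Rank countries by death percentage, skipping very small case counts."""
    # min_confirmed is a hypothetical cut-off chosen only for illustration.
    kept = df_last[df_last["Confirmed"] >= min_confirmed].copy()
    kept["Percentage_Dead"] = kept["Deaths"] / kept["Confirmed"] * 100
    return kept.sort_values(by="Percentage_Dead", ascending=False)


# Four rows copied from the table above.
df_last = pd.DataFrame({
    "Country/Region": ["Sudan", "Philippines", "Iraq", "Italy"],
    "Confirmed": [1.0, 142.0, 124.0, 27980.0],
    "Deaths": [1.0, 12.0, 10.0, 2158.0],
})
print(fatality_ranking(df_last))
```

With this cut-off the Sudan row is dropped and the remaining three countries keep their relative order from the table.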



  1. We use a linear model without daily seasonality, since the data contain only dates, and without yearly seasonality
  2. This version of fbprophet raises an error unless the columns are named 'ds' and 'y'
  3. We predict the next 10 days
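# Prophet expects the date column to be named 'ds' and the value column 'y'
df_forecast = df_all[['Date', 'Confirmed']]
df_forecast.columns = ['ds', 'y']
model = Prophet(daily_seasonality=False, yearly_seasonality=False)
model.fit(df_forecast)
# Extend the timeline 10 days beyond the last observed date
future = model.make_future_dataframe(periods=10)
forecast = model.predict(future)
fig1 = model.plot(forecast)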
Covid-19 Forecast
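Because `df_all` holds one row per province or country for each date, it may help to sum the values per date before fitting. The sketch below only prepares the two-column `ds`/`y` frame with pandas and does not call Prophet itself; the function name `to_prophet_frame` is hypothetical, and aggregating per date is an assumption rather than something the text above states.

```python
import pandas as pd


def to_prophet_frame(df, value_column="Confirmed", date_column="Date"):
    """Sum `value_column` per date and rename the columns to ds / y."""
    daily = df.groupby(date_column)[value_column].sum().reset_index()
    daily = daily.rename(columns={date_column: "ds", value_column: "y"})
    return daily[["ds", "y"]]


# Made-up rows (not real data): two regions on the first day, one on the second.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23"]),
    "Confirmed": [1.0, 2.0, 5.0],
})
frame = to_prophet_frame(sample)
print(frame)
```

The resulting frame has one row per date and could then be passed to `model.fit` in place of `df_forecast`.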


Confirmed Cases Trend Model

The trend model for confirmed cases is statistically significant (p-value < 0.0001), and the trend is more than 10,000 new cases per day.

Trend model
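The text does not say how the trend model was fitted. As a sketch of how a per-day trend could be estimated, assuming ordinary least squares against the day index, the slope of a straight line through the daily totals gives the average number of new cases per day. The function `linear_trend` is hypothetical, and the five sample values are the China totals for 22 to 26 January from the grouped table, far too few to reproduce the reported p-value.

```python
def linear_trend(values):
    """Least-squares slope and intercept of values against day index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Confirmed cases in China, 22-26 January 2020, from the grouped table.
china_confirmed = [547, 639, 916, 1399, 2062]
slope, intercept = linear_trend(china_confirmed)
print(slope, intercept)
```

The slope is expressed in cases per day, the same unit as the trend quoted above.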



Visualizations are made with Tableau Desktop Public 2019.2.3

On 15.03, confirmed cases in other countries surpassed those in China (86444 vs 81003).

Confirmed Cases by Country 01.03.2020
Confirmed Cases by Country 17.03.2020

Confirmed cases by day show anomalies on 13.02.2020 (15133) and 13.03.2020 (16837), and there is an evident trend of more than 10000 cases per day over the last 5 days.

Confirmed Cases by day

The difference from the previous day in reported confirmed cases shows a sharp rise in other countries.

Confirmed Cases difference from previous day
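The difference from the previous day is each cumulative total minus the total of the day before. A minimal sketch in plain Python, using the China totals for 22 to 26 January from the grouped table; the helper name `daily_difference` is hypothetical.

```python
def daily_difference(cumulative):
    """Day-over-day change of a cumulative series (one value fewer than the input)."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Confirmed cases in China, 22-26 January 2020, from the grouped table.
china_confirmed = [547, 639, 916, 1399, 2062]
print(daily_difference(china_confirmed))  # prints [92, 277, 483, 663]
```

On a pandas column the same values can be obtained with `Series.diff()`, which additionally returns NaN for the first day.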

Forecasts on confirmed cases show a halt in China and (almost) exponential growth in other countries. The top countries by cases besides China are Italy, Iran, South Korea, Spain, Germany, France and the US. They all show a serious growth rate (>1) on both the linear and the logarithmic scale.

30 Days Linear forecast on Confirmed Cases
10 Days Linear forecast on Confirmed Cases
10 Days Logarithmic Forecast
10 Days Linear Forecast on Top Countries
10 Days Logarithmic Forecast on Confirmed Cases in Top Countries
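The text does not define the growth rate it compares with 1. One common reading, assumed here, is the ratio of each day's cumulative total to the previous day's. A series with a constant ratio grows exponentially, which appears as equal steps on a logarithmic scale. The function names `growth_factors` and `log10_steps` are hypothetical, and the sample series is made up rather than taken from the data.

```python
import math


def growth_factors(cumulative):
    """Ratio of each day's cumulative total to the previous day's total."""
    return [
        today / yesterday if yesterday else None
        for yesterday, today in zip(cumulative, cumulative[1:])
    ]


def log10_steps(cumulative):
    """Day-over-day change of log10(total); constant when growth is exponential."""
    logs = [math.log10(value) for value in cumulative]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical series that doubles every day (not real data).
doubling = [100, 200, 400, 800]
print(growth_factors(doubling))
print(log10_steps(doubling))
```

For the doubling series every growth factor is 2 and every logarithmic step is the same, which is why exponential growth looks like a straight line on a logarithmic axis.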


The share of recovered cases is high in China, probably because the outbreak there started more than a month earlier.

In other countries, recovered cases oscillate around 10% of confirmed cases.

Recovered Cases
Recovered Cases by Day
Recovered Cases % from Confirmed Cases
Recovered Cases by Country 17.03.2020

The forecast on Recovered Cases predicts that China will be “clean” by the end of March 2020

10 Days Linear Forecast on Recovered Cases


Deaths have almost come to a halt in China, while in other countries they are growing quickly, with 3100+ deaths in the last 5 days, especially in Italy.

Deaths by Day
Deaths by Country 17.03.2020

The death percentage is close to 4% overall, but it is considerably higher in some regions (Italy 7.94%, Iran 6.11%).

Deaths % from Confirmed Cases
Deaths % from Confirmed Cases in Top Countries

Deaths forecast is a sad thing to do...

10 Days Linear Forecast on Deaths


What kind of benefit does compared analysis and forecast provide?

  • One of the biggest advantages of comparing events over time is discovering trends, finding patterns, marking key points and influences, and getting “in touch” with the actual situation through the numbers or data.
  • Accurate forecasting helps us reduce negative outcomes, schedule meaningful actions, avoid unnecessary ones, and ultimately manage any situation, case or problem better overall.

Why are the compared analysis and forecast important in situations like this?

In these dire times for humanity as a whole, every (qu)bit of intelligence and wisdom we can find and share with each other, by any means and methods, will help save lives. What is more important than that?


The analysis and forecast are made with our software Crystal Qube™ V2.0.

For more info contact us at