__Could Increases in Influenza Vaccination Rates Give Rise to Exponentially-Related Corresponding Increases in Mortality Rates From Covid-19?__

Dr Grahame Blackwell
**A Pan-European Study incorporating data from over 350 million subjects**

See also this Pan-American Study incorporating data from 200 million subjects

__Summary__

A preliminary analysis based on a document published in the BMJ indicated that overall death rates from Covid-19 in different European countries
could be exponentially linked to those countries' influenza vaccination rates (percentages) for over-65s. [This is consistent with a Jan 2020 study in the journal *Vaccine* linking influenza vaccinations
with 36% increased susceptibility to coronavirus (pre-Covid-19).]

This follow-up in-depth analysis, involving over 350 million subjects, considers all European countries for which reliable vaccination data was available on the OECD website at time of preparation.

Data, links and instructions are provided to enable readers to check this analysis for themselves. The steps are as follows:

(1) Identify which data should and shouldn't be used, in accordance with statistical best practice;
calculate the *Correlation Coefficient*, R;

(2) Test the *null hypothesis* ("no actual connection") by checking the probability that this value for R could occur by chance;

(3) This probability turns out to be less than 0.00001 - so we must reject that hypothesis in favour of the *alternative hypothesis*:

"Overall mortality rates from Covid-19 **do** increase exponentially with increasing percentage vaccination rates of over-65s".

At this point we can add the information given by R^{2}, the *Coefficient of Determination*: the significance of this figure is given at the bottom of this page.

**Conclusion:** Rate of total deaths/million of population from Covid-19 rises exponentially with increasing rate of percentage influenza vaccinations for over-65s across Europe.
This could be cause-and-effect, or it could be due to some as-yet unidentified third factor linking these two.

It's difficult to imagine what that third factor might be, to give an exponential connection.

=============================

[See notes on Statistical Inferencing, in relation to this analysis, here.]

This is the detailed version. Click here for the short version.

BMJ publication: Dr Allan Cunningham has collected data from reliable sources for 20 European countries. He's listed this in a
Rapid-Response document published in the BMJ. [See it here.]

Dr Cunningham suggests analysing these 20 data pairs to see whether there's any connection
between percentage of over-65s receiving the Influenza vaccine in a country and the death rate per million in that country from Covid-19.
That analysis can be seen here, with step-by-step instructions on how to do it yourself.
To avoid any suggestion of **selection bias** (cherry-picking sets of figures to give a preferred result), this analysis
is updated below, again with step-by-step instructions on how to do it yourself (no specialist expertise needed), using all the latest reliable data as at 1st August 2020.

Figures used here are taken from the OECD website (for over-65s vaccination percentages) and the Worldometer site for mortality rates per million population.
First we need to be sure that we have reliable properly-accredited figures. With regard to over-65 percentage vaccination figures,
neither Austria nor Poland has a vaccination percentage documented for later than 2014; Turkey's latest figure is for 2016.
It seems possible, then, that any of these three figures could introduce an element of unreliability, so they should be omitted.

Another consideration is that of **outliers** - data points that lie so far out of the pattern set by the other points that they are clearly being heavily influenced by other factors,
and so can't be regarded as part of that pattern. Outliers can be identified to some degree by eye - looking at the **scattergram** of the graph points -
or more accurately by calculating whether they lie outside statistically-defined boundaries and so are most unlikely to be valid elements of the set.

First we need to identify the straight-line relationship between percentage influenza vaccination of over-65s and logs of mortality rates per million, over different countries
(using logs to base 10 in this case; any base will give equivalent results). That previous study over 20 countries made it quite clear that any relationship between vaccination
rates and death rates will be exponential, so the relationship between vaccinations and **logs** of death rates will be linear - if this assumption is wrong then our results
will make that very clear by not giving a significant correlation.
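The step from "exponential" to "linear in the logs" can be written out explicitly. This is only a sketch of the standard algebra; the symbols D (deaths per million), V (vaccination percentage), A and B are introduced here for illustration and don't appear in the original analysis:

$$
D = A \cdot B^{V} \quad\Longleftrightarrow\quad \log_{10} D = \log_{10} A + V \, \log_{10} B
$$

So if death rates really do follow an exponential curve in V, plotting log(D) against V should give a straight line with slope log(B) and intercept log(A).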

Four points appear by eye to be outside of the general pattern: Greece, Slovakia, Belgium and Iceland (the four furthest-out points on the scattergram below); this may be why three of them were omitted from the earlier analysis.
For this in-depth study this requires further investigation, using well-defined principles by which outliers are identified.

| Vacc % | Deaths/m | Log(d/m) | |
|---|---|---|---|
| 72 | 679 | 2.83187 | |
| 62.7 | 359 | 2.555094 | |
| 68.5 | 357 | 2.552668 | |
| 54.3 | 608 | 2.783904 | |
| 52.7 | 581 | 2.764176 | |
| 60.8 | 170 | 2.230449 | |
| 52.2 | 568 | 2.754348 | |
| 51 | 464 | 2.666518 | |
| 52 | 106 | 2.025306 | |
| 39.8 | 182 | 2.260071 | |
| 49.5 | 59 | 1.770852 | |
| 34.8 | 110 | 2.041393 | |
| 24.1 | 62 | 1.792392 | |
| 38.2 | 47 | 1.672098 | |
| 21.5 | 36 | 1.556303 | |
| 12.5 | 5 | 0.69897 | |
| 12.9 | 57 | 1.755875 | |
| 14.8 | 29 | 1.462398 | |
| 7.7 | 17 | 1.230449 | |
| 10.2 | 47 | 1.672098 | |
| 59.1 | 849 | 2.928908 | |
| 49.4 | 29 | 1.462398 | |
| 56.2 | 20 | 1.30103 | *** |

The columns of figures above give: percentage vaccinations of over-65s (for 2018, or in two cases 2017); deaths per million of total population; logs of those death rates (to base 10).
Stats are given for the 26 European countries listed by the OECD, in the order they're listed (apart from Greece, marked ***), less the three countries noted above as having possibly unreliable figures:
Austria, Poland, Turkey - i.e. 23 sets of figures in all.

We can get a best straight-line fit for these 23 sets of figures here.
Simply copy-and-paste the 'Vacc %' column of figures into the 'XValues' box and the 'Log(d/m)' figures into the 'YValues' box,
skip past the 'Estimate' box and press the 'Calculate' button. You'll get the Regression Equation: **ŷ = 0.022X + 1.11802**.

You'll also see the graph shown here (but without the blue circle).

This shows the 'line of best fit' through the 23 data points.

That line has the equation: Y = 0.022X + 1.11802 .
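For readers who'd rather check the line of best fit offline than with the online tool, here is a minimal sketch in plain Python. It assumes ordinary least squares (the usual method behind such regression calculators, though the tool's internals aren't documented here); the names `DATA` and `fit_line` are made up for this example.

```python
import math

# (vaccination % of over-65s, Covid-19 deaths per million), copied from the
# columns of figures above; the final pair is the one marked *** (Greece).
DATA = [
    (72, 679), (62.7, 359), (68.5, 357), (54.3, 608), (52.7, 581),
    (60.8, 170), (52.2, 568), (51, 464), (52, 106), (39.8, 182),
    (49.5, 59), (34.8, 110), (24.1, 62), (38.2, 47), (21.5, 36),
    (12.5, 5), (12.9, 57), (14.8, 29), (7.7, 17), (10.2, 47),
    (59.1, 849), (49.4, 29), (56.2, 20),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [vacc for vacc, _ in DATA]
ys = [math.log10(deaths) for _, deaths in DATA]  # the 'Log(d/m)' column
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```

The slope and intercept printed should agree with the regression equation quoted above to the precision shown.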

The point circled in blue is significantly further from the best-fit line than any of the other points. You can confirm this simply by holding a ruler up to your computer screen:
the distance from that point to the line is half as much again as the distance from the next-furthest point (the one down by '12' on the baseline).

This suggests that the circled point (for Greece) may be an outlier.

The next step is to check mathematically for any outliers.

[The following text, in maroon, can be skipped unless you're particularly interested in calculations for outliers.]

This involves first calculating distances of **all** the points from the line. It's not intended to cover the maths for that here;
those interested (and slightly mathematical!) can find the necessary info here.

To check for outliers we next have to calculate **upper** and **lower quartiles** for this set of distances, and from these the **interquartile range** (having first put those distances into order, either way round).
Again, info on these can be found here. In simple terms, upper and lower quartiles (UQ and LQ) are the values that mark off the top and bottom of the 'middle half' of the distances - i.e. cutting off the top quarter and the bottom quarter.
The interquartile range (IQR) is UQ - LQ; outliers are defined as values greater than UQ + 1.5 × IQR (or smaller than LQ - 1.5 × IQR, where small values could also show abnormal situations; that can't
apply here, as small values represent data points very close to the regression line). Note that where the number of gaps between lowest and highest point doesn't divide exactly by four,
it's necessary to **interpolate** (work out 'in-between' values) to calculate LQ and UQ.
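The outlier check described above can be sketched in a few lines of Python. Two assumptions are made here that the text doesn't pin down: distances are measured vertically from the line (perpendicular distances differ only by a constant factor, so the same points get flagged), and quartiles are found by linear interpolation between ranked values (other quartile conventions exist). The names `DATA`, `quartile` and `upper_fence` are invented for this example.

```python
import math

# (vaccination %, deaths per million) for all 23 countries; Greece is last.
DATA = [
    (72, 679), (62.7, 359), (68.5, 357), (54.3, 608), (52.7, 581),
    (60.8, 170), (52.2, 568), (51, 464), (52, 106), (39.8, 182),
    (49.5, 59), (34.8, 110), (24.1, 62), (38.2, 47), (21.5, 36),
    (12.5, 5), (12.9, 57), (14.8, 29), (7.7, 17), (10.2, 47),
    (59.1, 849), (49.4, 29), (56.2, 20),
]
xs = [vacc for vacc, _ in DATA]
ys = [math.log10(deaths) for _, deaths in DATA]

# Least-squares line through all 23 points.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Vertical distance of each point from the line (assumption, see lead-in).
distances = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]


def quartile(ranked, q):
    """Quantile of an ascending list, interpolating between neighbours."""
    pos = q * (len(ranked) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ranked) - 1)
    return ranked[lo] + (pos - lo) * (ranked[hi] - ranked[lo])


ranked = sorted(distances)
lq, uq = quartile(ranked, 0.25), quartile(ranked, 0.75)
upper_fence = uq + 1.5 * (uq - lq)  # UQ + 1.5 x IQR

outliers = [DATA[i] for i, d in enumerate(distances) if d > upper_fence]
print(outliers)
```

Under these assumptions the only pair flagged is the last one, (56.2, 20), which is the Greece entry marked ***.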

**Long story short:** the point for Greece (circled) **does** turn out to be an outlier; no other points do.

So our correlation calculations shouldn't include the figures for Greece, as they will distort the result.

To find a **correlation coefficient**, and its level of significance, for the data for the remaining 22 countries, simply copy-and-paste
the 'Vacc %' column of figures into the 'XValues' box and the 'Log(d/m)' figures into the 'YValues' box of
a statistical analysis tool you can find here, omitting the final figures for Greece, marked ***.

[Don't copy the X or Y, and be sure to copy all the numbers (apart from Greece) - preferably in one go for each column.]

Once you've got both sets of 22 figures into the 'X Value' and 'Y Value' boxes (you'll see slider bars on both boxes, that's ok)
just click on the 'Calculate R' button. This will produce the Pearson's Correlation Coefficient, referred to as 'R', for logs of death rates compared against over-65 vaccination percentages:
it should give you a value of 0.7975, which is high for a set of 22 data pairs (the word they use is '**strong**').
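If you'd like to reproduce R without the online calculator, the following is a minimal sketch using the textbook formula for Pearson's correlation coefficient. The names `DATA` and `pearson_r` are made up for this example; the figures are the 22 pairs from the columns above, with Greece left out.

```python
import math

# (vaccination %, deaths per million) for the 22 countries, Greece omitted.
DATA = [
    (72, 679), (62.7, 359), (68.5, 357), (54.3, 608), (52.7, 581),
    (60.8, 170), (52.2, 568), (51, 464), (52, 106), (39.8, 182),
    (49.5, 59), (34.8, 110), (24.1, 62), (38.2, 47), (21.5, 36),
    (12.5, 5), (12.9, 57), (14.8, 29), (7.7, 17), (10.2, 47),
    (59.1, 849), (49.4, 29),
]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [vacc for vacc, _ in DATA]
ys = [math.log10(deaths) for _, deaths in DATA]  # logs of death rates
r = pearson_r(xs, ys)
print(f"R = {r:.3f}, R squared = {r * r:.3f}")
```

The value of R printed should agree with the 0.7975 quoted above to about three decimal places.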

R can vary between +1 and -1. +1 is a 100% positive correlation between X and Y, meaning that as X goes up Y also goes up
in exact correspondence with it; -1 is a 100% negative correlation between X and Y, meaning that as X goes up Y goes **down**
in exact correspondence with it. Values in between show differing degrees of linking between X and Y, values nearer to zero showing
weaker correlation than those nearer +1 or -1.

Scroll down past the calculations (which you don't need - they've done them for you) to get a figure for statistical significance of this result:
click on the link that says: "Click here to calculate a p value". You'll be asked to input the R value (0.7975) and the number of data pairs (22).

[You can also choose a significance level if you like - if so, choose 0.01 - but we'd really need a 0.00001 option for the significance of this result!]

The calculator will give you a probability rating for this R for 22 data pairs: less than 0.00001. That's a less than 1 in 100,000 (less than 10 in a million) likelihood
of a correlation this strong arising by chance if there were no real link - in other words, **the link between over-65s
influenza vaccination rate and logs of Covid-19 death rate per million of population is statistically significant at better than the 0.001% level**.
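The p value can also be checked by hand. The sketch below assumes the standard test for a correlation coefficient: convert R to a t statistic with n - 2 degrees of freedom, then take the two-tailed area under Student's t distribution. Whether the online calculator uses exactly this route isn't stated, so treat this as an independent check; the function names and the numerical integration settings are choices made for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, width=200.0, steps=200_000):
    """Two-tailed p value for Pearson's r calculated from n data pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule over [t, t + width]; beyond that the tail is negligible
    # for the degrees of freedom used here. steps must be even.
    h = width / steps
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    one_tail = total * h / 3
    return 2 * one_tail


p_value = two_tailed_p(0.7975, 22)
print(f"p = {p_value:.2e}")
```

With R = 0.7975 and 22 data pairs, the result comes out below 0.00001, in line with the calculator's answer.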

To put that another way: these figures show that **there's a strong correlation between an increasing percentage of over-65s being given influenza vaccinations and an
exponentially increasing number of deaths per million (of total population) from Covid-19**. The likelihood of this being a coincidence (happening by chance) is less than 1 in 100,000.

We can now re-calculate that straight-line fit of the log values (excluding Greece), and from that the exponential fit of the death rates against percentage vaccinations for those 22 data pairs.
First simply copy-and-paste the above 'Vacc %' column of figures into the 'XValues' box and the 'Log(d/m)' figures into the 'YValues' box here,
as before, but omitting the *** figures for Greece. This gives us the straight-line graph below for death-rate log values against % vaccination figure and tells us that the equation for this graph is: **ŷ = 0.02384X + 1.09086**.

If you wish (and you know how to) you can then use the graph facilities in Excel to produce the scatter-plot shown in the second graph below. Converting the equation above to the equivalent exponential equation (by raising 10 to the power of each side),
we get:

Y = 10^{1.09086} × 10^{0.02384X}, which can be written more tidily as: Y = 12.3271 × 1.0564^{X}.
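The conversion from the straight-line equation to the exponential one is just two powers of 10, as this short sketch shows. The coefficients are the ones quoted above for the 22-country fit; the names `a`, `b` and `fitted_deaths_per_million` are invented for this example.

```python
# Coefficients of the straight-line fit of log10(deaths/m) against Vacc %
# (Greece excluded), as quoted above.
slope, intercept = 0.02384, 1.09086

# log10(Y) = intercept + slope * X   is the same as   Y = a * b ** X
a = 10 ** intercept  # value of the fitted curve at X = 0
b = 10 ** slope      # factor by which the fitted curve rises per extra 1%


def fitted_deaths_per_million(vacc_percent):
    """Value of the fitted exponential curve at a given vaccination %."""
    return a * b ** vacc_percent


print(round(a, 4), round(b, 4))
print(round(fitted_deaths_per_million(50), 1))
```

The two constants printed should match the 12.3271 and 1.0564 given above. The final line simply reads a value off the fitted curve at 50%; it describes the curve, not any individual country.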

You could superimpose this line on the scattergraph (Excel again), as below - **or you could skip all that and just go to here**.
Here you'll find another stats tool where you can copy-&-paste the 'Vacc %' and 'deaths/m' figures in (omitting Greece) to see an identical graph to the one below.
This tool also confirms the equation for the exponential curve and the value for the Correlation Coefficient, R. It also gives the value of R^{2} - see below graphs for relevance of this.

__R^{2}: The Coefficient of Determination__

R-Squared - which is literally the square of the Correlation Coefficient, R - is a measure of how much the variation in the dependent variable (in our case deaths/million) is related to variation in the independent variable (in our case % vaccination of over-65s). In this case R = 0.7975, giving R

[* Giving increased precision.]

Note the careful wording here: The stats do not tell us that higher vaccination rates are the

If this is so, then that