Project 1: Deaths by tuberculosis

by Michel Wermelinger & Rob Garnsey, 14 July 2015

This is the project notebook for Week 1 of The Open University's Learn to code for Data Analysis course.

In 2000, the United Nations set eight Millennium Development Goals (MDGs) to reduce poverty and disease, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB). TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but it is far deadlier. For more information, see the World Health Organisation (WHO) page http://www.who.int/gho/tb/en/.

Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered:

  • What is the total, maximum, minimum and average number of deaths in that year?
  • Which countries have the most and the least deaths?
  • What is the death rate (deaths per 100,000 inhabitants) for each country?
  • Which countries have the lowest and highest death rate?

The death rate allows for a better comparison of countries with widely different population sizes.
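To illustrate, the sketch below computes the rate for two countries whose 2013 figures appear in the tables later in this notebook. It assumes, as in the data file, that population is recorded in thousands; the helper name `death_rate` is hypothetical.

```python
def death_rate(deaths, population_thousands):
    """Deaths per 100,000 inhabitants, with population given in thousands."""
    population = population_thousands * 1000
    return deaths * 100000 / population

# 2013 figures taken from the WHO data used in this notebook
india = death_rate(240000, 1252140)
djibouti = death_rate(870, 873)

print(f"India:    {india:.2f} per 100,000")
print(f"Djibouti: {djibouti:.2f} per 100,000")
```

India has about 276 times as many deaths as Djibouti, yet Djibouti's rate is roughly five times India's.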

The data

The data consists of the total population and the total number of deaths due to TB (excluding HIV) in 2013 for each of the 194 countries in the WHO data.

The data was taken from http://apps.who.int/gho/data/node.main.POP107?lang=en (population) and http://apps.who.int/gho/data/node.main.593?lang=en (deaths). The uncertainty bounds of the number of deaths were ignored.

The data was collected into an Excel file which should be in the same folder as this notebook.

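In [1]:
from pandas import read_excel
# Load the 2013 WHO data; the file must be in the same folder as this notebook
data = read_excel('WHO POP TB all.xls')
# Sort rows by number of TB deaths, smallest first
data.sort_values('TB deaths')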
Out[1]:
     Country                                Population (1000s)  TB deaths
147  San Marino                                             31       0.00
125  Niue                                                    1       0.01
111  Monaco                                                 38       0.03
3    Andorra                                                79       0.26
129  Palau                                                  21       0.36
40   Cook Islands                                           21       0.41
118  Nauru                                                  10       0.67
76   Iceland                                               330       0.93
68   Grenada                                               106       1.10
5    Antigua and Barbuda                                    90       1.20
113  Montenegro                                            621       1.20
152  Seychelles                                             93       1.40
105  Malta                                                 429       1.50
143  Saint Kitts and Nevis                                  54       1.60
11   Bahamas                                               377       1.80
14   Barbados                                              285       2.00
144  Saint Lucia                                           182       2.20
99   Luxembourg                                            530       2.20
44   Cyprus                                               1141       2.30
174  Tonga                                                 105       2.50
50   Dominica                                               72       2.70
137  Qatar                                                2169       2.70
179  Tuvalu                                                 10       2.80
145  Saint Vincent and the Grenadines                      109       3.10
126  Norway                                               5043       4.40
146  Samoa                                                 190       6.10
121  New Zealand                                          4506       6.30
103  Maldives                                              345       7.60
12   Bahrain                                              1332       9.60
164  Suriname                                              539      12.00
...  ...                                                   ...        ...
160  South Sudan                                         11296    4500.00
119  Nepal                                               27797    4600.00
2    Algeria                                             39208    5100.00
193  Zimbabwe                                            14150    5700.00
184  United Republic of Tanzania                         49253    6000.00
181  Ukraine                                             45239    6600.00
46   Democratic People's Republic of Korea               24895    6700.00
4    Angola                                              21472    6900.00
158  Somalia                                             10496    7700.00
31   Cameroon                                            22254    7800.00
170  Thailand                                            67010    8100.00
88   Kenya                                               44354    9100.00
163  Sudan                                               37964    9700.00
30   Cambodia                                            15135   10000.00
100  Madagascar                                          22925   12000.00
0    Afghanistan                                         30552   13000.00
141  Russian Federation                                 142834   17000.00
190  Viet Nam                                            91680   17000.00
115  Mozambique                                          25834   18000.00
159  South Africa                                        52776   25000.00
116  Myanmar                                             53259   26000.00
134  Philippines                                         98394   27000.00
58   Ethiopia                                            94101   30000.00
36   China                                             1393337   41000.00
47   Democratic Republic of the Congo                    67514   46000.00
128  Pakistan                                           182143   49000.00
78   Indonesia                                          249866   64000.00
13   Bangladesh                                         156595   80000.00
124  Nigeria                                            173615  160000.00
77   India                                             1252140  240000.00

194 rows × 3 columns
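The sorted table answers the second question by eye. As a sketch, the countries with the most and fewest deaths can also be picked out directly with `idxmax` and `idxmin`, which return the index label of the extreme value. The small DataFrame below copies a handful of rows from the output above, keeping their index labels, so that the example runs without the Excel file.

```python
import pandas as pd

# A few rows copied from the sorted output above (original index labels kept)
sample = pd.DataFrame(
    {
        "Country": ["San Marino", "Niue", "Nigeria", "India"],
        "Population (1000s)": [31, 1, 173615, 1252140],
        "TB deaths": [0.00, 0.01, 160000.00, 240000.00],
    },
    index=[147, 125, 124, 77],
)

deaths = sample["TB deaths"]
# idxmax/idxmin give the row label, which .loc then uses to fetch the country
most = sample.loc[deaths.idxmax(), "Country"]
fewest = sample.loc[deaths.idxmin(), "Country"]
print("Most deaths:", most)
print("Fewest deaths:", fewest)
```

On the full data, `data.loc[data['TB deaths'].idxmax(), 'Country']` would do the same. If several countries share the extreme value (Antigua and Barbuda and Montenegro both have 1.20, for example), `idxmax` and `idxmin` return only the first one found.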

In [2]:
# Population figures are in thousands
popColumn = data['Population (1000s)']
popMax = popColumn.max()
popMin = popColumn.min()
# Only the value of a cell's last expression is displayed
popRange = popMax - popMin
popRange
Out[2]:
1393336
In [3]:
popMax
Out[3]:
1393337

The range of the problem

The column of interest is the last one.

In [4]:
tbColumn = data['TB deaths']

The total number of deaths in 2013 is:

In [5]:
tbColumn.sum()
Out[5]:
1072677.97

The largest and smallest number of deaths in a single country are:

In [6]:
tbColumn.max()
Out[6]:
240000.0
In [7]:
tbColumn.min()
Out[7]:
0.0
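As a sketch, the total, maximum, minimum, mean and median can also be computed in a single call with `Series.agg`, passing a list of function names. The five values below are copied from the sorted table purely to keep the example self-contained.

```python
import pandas as pd

# TB deaths in 2013 for five countries, copied from the sorted table above
tb = pd.Series(
    [0.0, 4.4, 12.0, 41000.0, 240000.0],
    index=["San Marino", "Norway", "Suriname", "China", "India"],
)

# Passing a list of function names returns one value per function
stats = tb.agg(["sum", "max", "min", "mean", "median"])
print(stats)
```

On the full data, `tbColumn.agg(['sum', 'max', 'min', 'mean', 'median'])` should reproduce the values of cells 5 to 9 together. `tbColumn.describe()` gives a similar summary but omits the total.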

From zero (San Marino) to almost a quarter of a million deaths (India) is a huge range. The average number of deaths over all countries in the data can give a better idea of how serious the problem is in a typical country. The average can be computed as the mean or the median. Given the wide range of deaths, the median is probably the more sensible measure.

In [8]:
tbColumn.mean()
Out[8]:
5529.267886597938
In [9]:
tbColumn.median()
Out[9]:
315.0

The median is far lower than the mean. This indicates that some of the countries had a very high number of TB deaths in 2013, pushing the value of the mean up.
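The pull of a few very large values on the mean can be seen on a small scale. In the sketch below, eight values copied from the sorted table (a small, hand-picked subset, not a random sample) are summarised with and without India, the country with the most deaths.

```python
import pandas as pd

# TB deaths in 2013, copied from the sorted table above
deaths = pd.Series(
    [4.4, 12.0, 4600.0, 9100.0, 9700.0, 10000.0, 41000.0, 240000.0],
    index=["Norway", "Suriname", "Nepal", "Kenya", "Sudan", "Cambodia", "China", "India"],
)
without_india = deaths.drop("India")

# Compare both averages on the full subset and with the largest value removed
for label, values in [("With India", deaths), ("Without India", without_india)]:
    print(f"{label:14} mean {values.mean():10.1f}   median {values.median():8.1f}")
```

Dropping a single country cuts the mean by almost three quarters, while the median only moves from 9,400 to 9,100.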

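Everything else being equal, a country with a large population can be expected to have more deaths, from all causes including TB, than a smaller country. To allow for this, we calculate the TB death rate for each country: the number of deaths per 100,000 inhabitants. Because the population is given in thousands, the rate is deaths × 100,000 / (population × 1,000), which simplifies to deaths × 100 / population.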

In [10]:
populationColumn = data['Population (1000s)']
# Population is in thousands, so deaths * 100 / population gives deaths per 100,000
data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn
data.sort_values('TB deaths (per 100,000)')
Out[10]:
     Country                           Population (1000s)  TB deaths  TB deaths (per 100,000)
147  San Marino                                        31       0.00                 0.000000
111  Monaco                                            38       0.03                 0.078947
126  Norway                                          5043       4.40                 0.087250
120  Netherlands                                    16759      20.00                 0.119339
137  Qatar                                           2169       2.70                 0.124481
166  Sweden                                          9571      13.00                 0.135827
121  New Zealand                                     4506       6.30                 0.139814
185  United States of America                      320051     490.00                 0.153101
16   Belgium                                        11104      18.00                 0.162104
32   Canada                                         35182      62.00                 0.176226
8    Australia                                      23343      45.00                 0.192777
113  Montenegro                                       621       1.20                 0.193237
44   Cyprus                                          1141       2.30                 0.201578
82   Israel                                          7733      16.00                 0.206905
167  Switzerland                                     8078      17.00                 0.210448
45   Czech Republic                                 10702      28.00                 0.261633
76   Iceland                                          330       0.93                 0.281818
60   Finland                                         5426      17.00                 0.313306
43   Cuba                                           11266      37.00                 0.328422
3    Andorra                                           79       0.26                 0.329114
9    Austria                                         8495      29.00                 0.341377
105  Malta                                            429       1.50                 0.349650
65   Germany                                        82727     300.00                 0.362639
81   Ireland                                         4627      18.00                 0.389021
177  Turkey                                         74933     310.00                 0.413703
99   Luxembourg                                       530       2.20                 0.415094
48   Denmark                                         5619      24.00                 0.427122
11   Bahamas                                          377       1.80                 0.477454
86   Jordan                                          7274      35.00                 0.481166
83   Italy                                          60990     310.00                 0.508280
...  ...                                              ...        ...                      ...
29   Cabo Verde                                       499     150.00                30.060120
58   Ethiopia                                       94101   30000.00                31.880639
4    Angola                                         21472    6900.00                32.134873
131  Papua New Guinea                                7321    2400.00                32.782407
31   Cameroon                                       22254    7800.00                35.049879
106  Marshall Islands                                  53      21.00                39.622642
160  South Sudan                                    11296    4500.00                39.837110
193  Zimbabwe                                       14150    5700.00                40.282686
0    Afghanistan                                    30552   13000.00                42.550406
153  Sierra Leone                                    6092    2600.00                42.678923
39   Congo                                           4448    2000.00                44.964029
95   Lesotho                                         2074     960.00                46.287367
159  South Africa                                   52776   25000.00                47.370017
33   Central African Republic                        4616    2200.00                47.660312
116  Myanmar                                        53259   26000.00                48.818040
96   Liberia                                         4294    2100.00                48.905449
13   Bangladesh                                    156595   80000.00                51.087199
100  Madagascar                                     22925   12000.00                52.344602
92   Lao People's Democratic Republic                6770    3600.00                53.175775
62   Gabon                                           1672     910.00                54.425837
117  Namibia                                         2303    1300.00                56.448111
30   Cambodia                                       15135   10000.00                66.072019
47   Democratic Republic of the Congo               67514   46000.00                68.134017
115  Mozambique                                     25834   18000.00                69.675621
71   Guinea-Bissau                                   1704    1200.00                70.422535
158  Somalia                                        10496    7700.00                73.361280
172  Timor-Leste                                     1133     990.00                87.378641
165  Swaziland                                       1250    1100.00                88.000000
124  Nigeria                                       173615  160000.00                92.157936
49   Djibouti                                         873     870.00                99.656357

194 rows × 4 columns
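The sorted table answers the fourth question by eye. As a sketch, `DataFrame.nsmallest` and `DataFrame.nlargest` pick out the lowest and highest rates directly. The rows below are copied from the output above so the example runs without the Excel file, and the rate is recomputed with the same formula as in cell 10.

```python
import pandas as pd

# Rows copied from the output of cell 10; population is in thousands
sample = pd.DataFrame(
    {
        "Country": ["San Marino", "Monaco", "Norway", "Swaziland", "Nigeria", "Djibouti"],
        "Population (1000s)": [31, 38, 5043, 1250, 173615, 873],
        "TB deaths": [0.00, 0.03, 4.40, 1100.00, 160000.00, 870.00],
    }
)

# Same formula as cell 10: deaths * 100 / population (in thousands) = deaths per 100,000
rate_col = "TB deaths (per 100,000)"
sample[rate_col] = sample["TB deaths"] * 100 / sample["Population (1000s)"]

# Two lowest and two highest rates, without sorting the whole table
lowest = sample.nsmallest(2, rate_col)
highest = sample.nlargest(2, rate_col)
print(lowest[["Country", rate_col]])
print(highest[["Country", rate_col]])
```

On the full data, `data.nlargest(5, 'TB deaths (per 100,000)')` would list the five countries with the highest rates.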

In [11]:
tbDRColumn = data['TB deaths (per 100,000)']
tbDRMean = tbDRColumn.mean()
tbDRMean
Out[11]:
13.988003982547117
In [12]:
tbDRMedian = tbDRColumn.median()
tbDRMedian
Out[12]:
4.223028743152806

Conclusions

When viewed on a per-head-of-population basis, it is clear that TB is primarily a disease of developing countries in Africa and South East Asia. Broadly speaking, the worst-afflicted countries are those still emerging from a history of exploitation, colonisation, poor governance and armed conflict. They are among the poorest countries in the world, and their citizens can expect little in the way of government services. Effective health care for patients with TB is essential to halt the spread of the disease within communities.

The countries where TB is least prevalent are those of Western Europe and the Anglosphere, which are among the richest countries in the world, where public order prevails and where standards of housing and health care are high.

There are some cases that do not fit this generalisation. Cuba, for example, has both a colonial past and a history of conflict arising from it. Its people are very poor, but its government has been effective in providing basic services such as education and health. Cuba's TB death rate (about 0.33 per 100,000) is lower than that of several Western European countries, including Austria, Germany, Ireland, Denmark and Italy.
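The comparison with Europe can be checked against the rates in the output of cell 10. As a sketch, the snippet below copies Cuba's rate and those of some Western European countries from that table and lists the ones with a higher rate than Cuba. The choice of countries is illustrative, not exhaustive.

```python
import pandas as pd

# TB deaths per 100,000 in 2013, copied from the output of cell 10
rates = pd.Series(
    {
        "Netherlands": 0.119339,
        "Sweden": 0.135827,
        "Belgium": 0.162104,
        "Switzerland": 0.210448,
        "Cuba": 0.328422,
        "Austria": 0.341377,
        "Germany": 0.362639,
        "Ireland": 0.389021,
        "Denmark": 0.427122,
        "Italy": 0.508280,
    }
)

cuba = rates["Cuba"]
# Boolean mask keeps only countries whose rate strictly exceeds Cuba's
higher_than_cuba = rates[rates > cuba].index.tolist()
print("Higher rate than Cuba:", higher_than_cuba)
```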

This pattern suggests further lines of enquiry about the conditions which contribute to the spread of TB and the strategies for stopping it. A further study could take into account other relevant variables, such as GDP, family income, education levels and health service provision.