by Michel Wermelinger & Rob Garnsey, 14 July 2015
This is the project notebook for Week 1 of The Open University's Learn to code for Data Analysis course.
In 2000, the United Nations set eight Millennium Development Goals (MDGs) to reduce poverty and diseases, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB). TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but is far deadlier. For more information, see the World Health Organisation (WHO) page http://www.who.int/gho/tb/en/.
Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered:
The death rate allows for a better comparison of countries with widely different population sizes.
The data consists of the total population and the total number of deaths due to TB (excluding HIV) in 2013 in each country covered by the WHO data.
The data was taken from http://apps.who.int/gho/data/node.main.POP107?lang=en (population) and http://apps.who.int/gho/data/node.main.593?lang=en (deaths). The uncertainty bounds of the number of deaths were ignored.
The data was collected into an Excel file which should be in the same folder as this notebook.
from pandas import *
data = read_excel('WHO POP TB all.xls')
data.sort_values('TB deaths')
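# Population figures are in thousands.
popColumn = data['Population (1000s)']
popMax = popColumn.max()
popMax
popMin = popColumn.min()
popMin
# Range: the gap between the most and the least populous country.
popRange = popMax - popMin
popRange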
The column of interest is the last one.
tbColumn = data['TB deaths']
The total number of deaths in 2013 is:
tbColumn.sum()
The largest and smallest number of deaths in a single country are:
tbColumn.max()
tbColumn.min()
From fewer than 20 to almost a quarter of a million deaths is a huge range. The average number of deaths, over all countries in the data, can give a better idea of the seriousness of the problem in each country. The average can be computed as the mean or the median. Given the wide range of deaths, the median is probably the more sensible measure, because it is less affected by a few extreme values.
tbColumn.mean()
tbColumn.median()
The median is far lower than the mean. This indicates that some of the countries had a very high number of TB deaths in 2013, pushing the value of the mean up.
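The effect described above can be reproduced on a small scale. The sketch below uses the standard library's `statistics` module rather than pandas, and the five death counts are invented for illustration (they are not WHO figures). It only aims to show how a single very large value pulls the mean far above the median.

```python
from statistics import mean, median

# Invented death counts for five hypothetical countries (not WHO data);
# the last value is deliberately far larger than the rest.
deaths = [20, 150, 900, 4000, 240000]

average = mean(deaths)    # pulled up by the single very large value
middle = median(deaths)   # the middle value once the list is sorted

print("mean:", average)
print("median:", middle)
```

With these invented numbers the mean is more than fifty times the median, which is the same kind of gap seen between `tbColumn.mean()` and `tbColumn.median()`.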
Everything else being equal, a country with a large population can be expected to have a larger number of deaths, from all causes including TB, than a smaller country. To allow for this, we calculate the rate of TB deaths in each country per 100,000 people.
populationColumn = data['Population (1000s)']
# Population is in thousands, so deaths * 100 / population gives deaths per 100,000 people.
data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn
data.sort_values('TB deaths (per 100,000)')
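The factor of 100 in the rate calculation comes from the population being recorded in thousands. As a quick check of the arithmetic, here is a minimal sketch in plain Python. The helper name `deaths_per_100k` and the figures passed to it are hypothetical and chosen only to make the units easy to follow.

```python
def deaths_per_100k(deaths, population_thousands):
    """Return deaths per 100,000 people, given a population in thousands."""
    # population_thousands * 1000 is the head count. Dividing deaths by it
    # and multiplying by 100,000 simplifies to deaths * 100 / population_thousands.
    return deaths * 100 / population_thousands

# Invented figures: 500 deaths in a population of 2,000 thousand (2 million).
rate = deaths_per_100k(500, 2000)
print(rate)
```

Working it through by hand, 500 deaths among 2,000,000 people is 0.00025 deaths per person, or 25 per 100,000, which matches the result of the shortcut formula.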
tbDRColumn = data['TB deaths (per 100,000)']
tbDRMean = tbDRColumn.mean()
tbDRMean
tbDRMedian = tbDRColumn.median()
tbDRMedian
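The conclusions that follow depend on which countries sit at each end of the death-rate ranking. One way to read those off a sorted table is sketched here on an invented four-row table. The `Country` column name and all the figures are assumptions made for illustration, since the layout of the Excel file is not shown in this notebook.

```python
import pandas as pd

# Invented table using the notebook's column names (not WHO data).
# The 'Country' column name is an assumption.
sample = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Population (1000s)': [1000, 5000, 200, 40000],
    'TB deaths': [50, 100, 30, 400],
})

# Same rate formula as in the notebook: population is in thousands.
sample['TB deaths (per 100,000)'] = (
    sample['TB deaths'] * 100 / sample['Population (1000s)']
)

# Sort in ascending order: first row has the lowest rate, last row the highest.
ranked = sample.sort_values('TB deaths (per 100,000)')
lowest = ranked.iloc[0]['Country']
highest = ranked.iloc[-1]['Country']

print(lowest, highest)
```

In this invented table, country D has the most deaths in absolute terms but the lowest rate, while country C has the fewest deaths but the highest rate. This is why the death rate is the better basis for comparing countries of very different sizes.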
When viewed per head of population, it is clear that TB is primarily a disease of developing countries in Africa and South East Asia. Broadly speaking, the worst afflicted countries are those still emerging from a history of exploitation, colonisation, poor governance and armed conflict. They are among the poorest countries in the world and their citizens can expect little in the way of government services. Effective health care for patients with TB is essential to halt the spread of the disease within communities.
The countries where TB is least prevalent are those of Western Europe and the Anglosphere, which are among the richest countries in the world, where public order prevails and where standards of housing and health care are high.
There are some cases that do not fit this generalisation. Cuba, for example, has both a colonial past and a history of conflict arising from that. Its people are very poor but its government has been effective in providing basic services such as education and health. TB is less of a problem in Cuba than in some European countries.
This pattern suggests further lines of enquiry about the conditions which contribute to the spread of TB and the strategies for stopping it. A further study could take into account other relevant variables, such as GDP, family income, education levels and health service provision.