This is an exploratory analysis of demographic data from Vall de Almonacid (https://commons.wikimedia.org/wiki/File:VGeneral_VallAlmonacid.jpg). Vall de Almonacid (Spain) is a small village in the Valencian region. Of Muslim origin, it was conquered, refounded and repopulated with Catholics from the Kingdom of Aragon in 1238.
In the church there are several record books in which births, marriages and funerals were recorded by the church priest. In the 1980s these records were taken by Josep Maria Perez Rodriguez, who copied them by hand with the help of Jose Maria Perez Rodriguez, Maria Rovira Barbera, Pilar Perez Rodriguez, and some others.
The hand-copied files were later digitised (by hand, again) and a copy of these files is the starting point of this analysis. Not all the data was digitised, and the dataset contains many typos and errors.
For this analysis, I opted for Python (v.3.5.2+) as a flexible yet user-friendly tool to explore data.
I have previously done some visualisation experiments with this data using d3.js, which you can find at https://fuyas.github.io/nacimientos/ (in Spanish).
In addition, there is a short study of this demographic data by Josep Maria Perez Rodriguez (http://perezrovira.net/adria/pdfs/icap_demografia_JM.pdf, in Spanish).
All the imported modules are commonly used ones.
In [1]:
# plot figures inline in the IPython notebook
%matplotlib inline
import csv
import numpy as np
import datetime
import matplotlib.pyplot as plt
from collections import Counter
# set width of figures by default
plt.rcParams['figure.figsize'] = (14.0, 6.0)
To import the data I created a small function that reads the CSV files (there are two, one for baptisms and one for funerals). The function returns the header (the names of the fields, stored on the first line of the CSV file), a numpy 2D array with all fields as strings, and a second copy with all fields in upper case. The second copy is used to compare strings, since capitalisation may differ due to the poor quality of the data. In addition to reading the data, the function prints a list of all the fields, detailing the number of entries for each field and the value and frequency of the most common entry.
In [2]:
def readDB( str_file ):
    allData = []
    ALLDATA = []
    # the file is closed automatically when the with block ends
    with open(str_file, "r", newline="") as file:
        reader = csv.reader(file)
        for iRow, row in enumerate(reader):
            # Header row.
            if iRow == 0:
                header = row
            else:
                allData.append( row )
                # upper-case copy, used for string comparisons
                ALLDATA.append( [x.upper() for x in row] )
    # put all data in numpy
    npData = np.array(allData)
    NPDATA = np.array(ALLDATA)
    print ( "\nNumber of Entries: %4d" % npData.shape[0] )
    print ( "Number of Columns: %4d\n" % npData.shape[1] )
    # Check number of entries per column and the most common value
    print ( " # - Unique - Most common" )
    for idx in range(len(header)):
        vec = NPDATA[:,idx]
        vec2 = vec [ vec != '' ]
        if len(vec2) > 0:
            count = Counter(vec2)
            value, freq = count.most_common(1)[0]
            print ( "[%2s] %-15s: %4d - %6d - ( %4d | %s )" % (idx, header[idx], vec2.size, len(count), freq, value) )
        else:
            print ( "[%2s] %-15s: %4d" % (idx, header[idx], vec2.size) )
    return [header, npData, NPDATA]
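To make the printed summary easier to follow, here is a minimal, self-contained sketch of the same per-column idea applied to a tiny in-memory table. The sample rows, the column names and the helper `summarise_columns` are made up for illustration; they are not taken from the real registers.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, NOT from the real registers
SAMPLE_CSV = "NOMBRE,APELLIDO,OFICIO\nManuel,Rodriguez,labrador\nmanuel,Perez,\nMaria,RODRIGUEZ,\n"

def summarise_columns(text):
    """Return {field: (non_empty_count, unique_count, most_common_value, frequency)}."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for idx, name in enumerate(header):
        # upper-case so that 'Manuel' and 'manuel' count as the same entry
        values = [row[idx].upper() for row in data if row[idx] != ""]
        if values:
            value, freq = Counter(values).most_common(1)[0]
            summary[name] = (len(values), len(set(values)), value, freq)
        else:
            summary[name] = (0, 0, None, 0)
    return summary

summary = summarise_columns(SAMPLE_CSV)
for name, stats in summary.items():
    print(name, stats)
```

Comparing on the upper-case copy is what lets the two spellings of "Manuel" be counted together.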
In [3]:
[headerBirth, npBirthData, NPBIRTHDATA] = readDB ('data/allNacimientos.csv')
The births CSV file contains 5237 entries with 42 columns. The number of entries for each column varies widely and 12 contain no data at all. We can already see that Rodriguez is the most common surname and Manuel the most common name. Looking at "[20] NOMMADRE" we can see that the most common name for mothers, hence women, is Maria.
Another interesting, albeit expected, finding is that the most common job "[25] OFICIOPAD" is "LABRADOR" (English: farmer), while the most common job for the padrino (godfather) is CIRUJANO (surgeon). From this it is possible to infer that the godfather role was most commonly taken by someone of a higher social class.
In addition there are 1798 comments "observaciones" with rich, yet unstructured, information, for example:
In [4]:
print( npBirthData[427,39] )
English: I baptised sup condicione a boy with unknown parents who was found at the door of Felipe Torres at midnight.
In [5]:
print( npBirthData[4013,39] )
English: Was baptised at birth by her grandmother Theodora Pitarque and died immediately. I examined the intention and correctness of the baptism and found that it was well performed.
In [6]:
[headerDeath, npDeathData, NPDEATHDATA] = readDB ('data/allDefunciones.csv')
Again, the most common surname is Rodriguez, the most common given names are Manuel and Maria, and the most common job is "labrador". At this point we can see that the data is very incomplete, with only 334 job entries out of the 4330 subjects. However, names of parents are plentiful (~2500, so we can hope to reconstruct some genealogical trees later on) and there are also observaciones (comments) in 2407 entries, which are quite interesting:
In [7]:
print( npDeathData[1051,17] )
English: IG, HF (Hijo de Familia, denotes a teenager), No Sacrament: Died suddenly without remedy because he fell in a wine press full of wine.
In [8]:
print( npDeathData[3794,17] )
English: Was found murdered on the 14th in the "Mojonada" in this municipality. He was from Segore (Segorbe is the largest town in the region, 8 km away from Vall de Almonacid).
In [9]:
print( npDeathData[2825,17] )
English: From Altura (another village, 10 km away) who was in this village fleeing from the French. A regular funeral was performed paid by his children.
And yes, that was during the Spanish-French war (https://en.wikipedia.org/wiki/Peninsular_War):
In [10]:
print( npDeathData[2825,8] )
In [11]:
print( npDeathData[524,17] )
English: Being mortally ill she was found drowned in the watering pond of the fields. The village doubted whether she was to be buried in the graveyard or not (suicide would prevent a person from being buried there), but the decision was to be taken by the bishop and the church. The general official of the bishop then instructed that the body be put in a plank box and left at the "Rincon del pico espadan", at the edge of the municipality, until the witnesses could testify and the investigation was concluded. Witnesses testified that they had seen her confessing with the priest for 2 hours, and the priest stated that she had confessed about her entire life. Theologians consulted the doctor who had attended her, and he explained that her illness induced delirium. With this information the general vicar instructed that the bells be tolled for the dead and that she was to have a sung mass with the deceased's body present. In addition, once the body had been converted, the priest was to go personally to the "Rincon del pico espadan" and perform the funeral duties solemnly, accompanying the body until she was buried in the graveyard with all other Catholics.
Now that we have an idea of what is in the data, let's see what we can learn from it.
To check the most common entries of each field I wrote the following function, which given a list of entries relies on the Counter structure to sort the items in the list and print them in order, with the number of occurrences in brackets:
In [12]:
def printMostCommon( vec, n=float("inf") ):
    # ignore empty entries
    vec2 = vec [ vec != '' ]
    count = Counter(vec2)
    print ("Most common entries:")
    # sort once, then print the first n entries with their number of occurrences
    for idx, (value, freq) in enumerate(count.most_common()):
        if idx > n-1:
            break
        print ( "%4d : %s" % (freq, value) )
A quick look at the jobs shows that something is amiss.
In [13]:
printMostCommon( NPDEATHDATA[:,18], 15 )
So after farmer, the most common job was... farmer (but the female version of it)!
Something else is amiss here, though, as the 3rd, 4th, and 5th most common jobs are a hospital. The most plausible explanation is that these were children who had been taken from a hospital (couples who couldn't have children, or only had girls, usually did that back in the day) and died young, and the priest (who wrote these records) opted to mention that in the job field.
Notice that besides farmer, another "popular" job was "LABRADOR POBRE" (poor farmer).
In [14]:
printMostCommon( NPDEATHDATA[:,13], 15 )
Looking at the 15 most common names we can see that Joseph and Josef add up to 272 occurrences, overtaking Manuel as the most common name. And that is without counting all the Jose entries and the compound names so popular in Spain.
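Merging spelling variants before counting can be sketched as follows. This is only an illustration: the mapping `VARIANTS`, the helper `count_names` and the example list are assumptions, and which spellings really refer to the same name would have to be checked against the records.

```python
from collections import Counter

# Hypothetical mapping of spelling variants to one canonical form.
# Which spellings should be merged is an assumption, not something the data states.
VARIANTS = {"JOSEF": "JOSEPH", "JOSE": "JOSEPH"}

def count_names(names, variants=VARIANTS):
    """Count names after dropping empty entries and merging known variants."""
    cleaned = [variants.get(n, n) for n in names if n != ""]
    return Counter(cleaned)

# Made-up example list, not real data
example = ["JOSEPH", "JOSEF", "MANUEL", "", "JOSE", "MANUEL", "JOSEPH"]
counts = count_names(example)
print(counts.most_common(2))
```

Keeping the mapping explicit makes every merge decision visible and easy to revise.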
It is common in Spain to bury the deceased within a few days of death. That contrasts with other Catholic countries, like Ireland, where burials can take place up to a week after the death.
We can check the delay between death and burial, as we have these two dates in NPDEATHDATA[:,8] and NPDEATHDATA[:,11]. For this we have to use a try statement to avoid errors when a date string is missing or incomplete.
In [15]:
diffBurial = []
for ii, defuncion in enumerate(NPDEATHDATA[:,8]):
    if (defuncion != '') and (NPDEATHDATA[ii,11] != ''):
        try:
            deathDate = datetime.datetime.strptime(defuncion, '%d/%m/%Y').date()
            burialDate = datetime.datetime.strptime(NPDEATHDATA[ii,11], '%d/%m/%Y').date()
            diffBurial.append( (burialDate - deathDate).days )
        except ValueError:
            # skip incomplete or malformed dates
            pass
delay = Counter(diffBurial)
print ( "\nDelay between death and funeral for %d funerals:" % len(diffBurial))
for days, freq in delay.most_common():
    print ( "%4d : %d days" % (freq, days) )
Of 2349 funerals, 2283 (97.2%) happened the day immediately after the death, 25 on the same day, 13 after 2 days, and 7 between 3 and 21 days later.
Keep in mind, however, that of the 4330 entries, around 2000 have no recorded death date, probably because death and funeral happened at the same time and the priests opted to fill in only one field in the book.
In addition, it can clearly be seen that there are numerous typos in the data, as 12 entries report a funeral before the death. In 12 other cases there appears to be a shift that is likely due to a typo (1 month, 3 months, 1 year, etc).
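A minimal sketch of how such suspicious delays could be flagged automatically is shown below. The helpers `burial_delay` and `classify_delay`, the 21-day limit and the example dates are all assumptions made for illustration.

```python
import datetime

def burial_delay(death_str, burial_str, fmt="%d/%m/%Y"):
    """Return the delay in days between death and burial, or None if a date is unusable."""
    try:
        death = datetime.datetime.strptime(death_str, fmt).date()
        burial = datetime.datetime.strptime(burial_str, fmt).date()
    except ValueError:
        return None
    return (burial - death).days

def classify_delay(days, max_plausible=21):
    """Label a delay; the 21-day limit is an assumption, not a rule from the records."""
    if days is None:
        return "unusable"
    if days < 0:
        return "burial before death (likely typo)"
    if days > max_plausible:
        return "implausibly long (likely typo)"
    return "plausible"

# Made-up example pairs (death date, burial date), not real entries
pairs = [("14/01/1803", "15/01/1803"), ("02/03/1790", "01/03/1790"),
         ("10/05/1760", "10/06/1760"), ("", "11/05/1760")]
for death, burial in pairs:
    days = burial_delay(death, burial)
    print(death, burial, days, classify_delay(days))
```

Flagged entries could then be checked by hand against the original books rather than silently dropped.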
The next step is to visualise the distribution of births and deaths by year, and also by month. For that I wrote these 3 functions:
In [16]:
# Hide the right and top axes
def remove2axis(ax):
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    # Only show ticks on the left and bottom spines
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
# Modified z-score
def MADscore(vec):
    median = np.median(vec)
    median_absolute_deviation = np.median([np.abs(v - median) for v in vec])
    modified_z_scores = [0.6745 * (v - median) / median_absolute_deviation for v in vec]
    return modified_z_scores
# Print and plot distributions
def plotDateDistribution ( vec, str_ylabel, showPlot=True, showByYear=True, showByDay=False ):
    # date.weekday() returns 0 for Monday and 6 for Sunday, so the day labels start on Monday
    # str_day = ['Lunes', 'Martes', 'Miercoles', 'Jueves', 'Viernes', 'Sabado', 'Domingo']
    # str_months = ['Enero', 'Febrero', 'Marzo', 'Abril', 'Mayo', 'Junio',
    #               'Julio', 'Agosto', 'Setiembre', 'Octubre', 'Noviembre', 'Diciembre']
    str_day = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    str_months = ['January', 'February', 'March', 'April', 'May', 'June',
                  'July', 'August', 'September', 'October', 'November', 'December']
    distributionDay = [0] * 7
    distributionMonth = [0] * 12
    distributionYear = [0] * 2000
    # put entries in bin for each occurrence
    for d in vec:
        try:
            deathDate = datetime.datetime.strptime(d, '%d/%m/%Y').date()
            distributionDay[deathDate.weekday()] += 1
            distributionMonth[deathDate.month-1] += 1
            distributionYear[deathDate.year] += 1
        except (ValueError, IndexError):
            # skip malformed dates and years outside the 0-1999 bins
            pass
    # compute z-scores to detect outliers
    zScoreDay = MADscore(distributionDay)
    zScoreMonth = MADscore(distributionMonth)
    zScoreYear = MADscore(distributionYear)
    # find first and last year with data
    min_year = next(i for i, v in enumerate(distributionYear) if v > 0)
    max_year_tmp = next(i for i, v in enumerate(reversed(distributionYear)) if v > 0)
    max_year = len(distributionYear) - 1 - max_year_tmp
    yearRange = np.arange(min_year, max_year+1)
    # Print distributions on console
    if showByDay:
        print( "\nBy day: ")
        for i, v in enumerate(distributionDay):
            print ( "%12s : %4d ( %+.2f )" % (str_day[i], v, zScoreDay[i]) )
    print( "\nBy month: ")
    for i, v in enumerate(distributionMonth):
        print ( "%12s : %4d ( %+.2f )" % (str_months[i], v, zScoreMonth[i]) )
    if showPlot:
        fig = plt.figure()
        fig.suptitle("By Month")
        ax1 = plt.gca()
        ax1.bar(range(12), distributionMonth, align='center', linewidth=0, alpha=0.8, color='darkorange')
        ax1.set_xticks( range(12) )
        ax1.set_xticklabels( [v[0] for v in str_months] )
        ax1.set_ylabel( str_ylabel )
        plt.margins(0.05, 0)
        x1,x2,y1,y2 = ax1.axis()
        ax1.axis((x1,x2,y1,y2 + 5))
        remove2axis(ax1)
        if showByDay:
            fig = plt.figure()
            fig.suptitle("By Day")
            ax2 = plt.gca()
            ax2.bar(range(7), distributionDay, align='center', linewidth=0, alpha=0.8, color='darkorange')
            ax2.set_xticks( range(7) )
            ax2.set_xticklabels( [v[0] for v in str_day] )
            plt.ylabel( str_ylabel )
            plt.margins(0.05, 0)
            x1,x2,y1,y2 = ax2.axis()
            ax2.axis((x1,x2,y1,y2 + 5))
            remove2axis(ax2)
        if showByYear:
            values = distributionYear[min_year:max_year+1]
            fig = plt.figure()
            fig.suptitle("By Year")
            ax3 = plt.gca()
            ax3.bar( yearRange, values, align='center', width=1, linewidth=0, alpha=0.8, color='darkorange')
            ax3.set_xticks( [v for v in yearRange if v % 10 == 0] )
            plt.ylabel( str_ylabel )
            plt.margins(0.03, 0)
            x1,x2,y1,y2 = ax3.axis()
            ax3.axis((x1,x2,y1,y2 + 5))
            remove2axis(ax3)
        # fig.subplots_adjust(left=0.06, right=0.99, top= 0.99, bottom=0.05)
        plt.show()
    else:
        fig = None
    return [fig, distributionDay, distributionMonth, distributionYear]
Notice that this function can also be used to explore the distributions of births and deaths across the days of the week. However, I removed that part of the analysis as nothing interesting came out of it (all distributions were basically uniform).
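One detail worth keeping in mind if the by-day analysis is ever revisited: Python's `date.weekday()` numbers the days from Monday (0) to Sunday (6), so any list of labels indexed by it has to start with Monday. A minimal sketch, where the helper `day_name` and the two test dates are chosen only for illustration:

```python
import datetime

# date.weekday() returns 0 for Monday and 6 for Sunday,
# so a list of labels indexed by weekday() has to start with Monday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def day_name(date_str, fmt="%d/%m/%Y"):
    """Return the English day name for a date string in day/month/year format."""
    d = datetime.datetime.strptime(date_str, fmt).date()
    return DAY_NAMES[d.weekday()]

print(day_name("01/01/2024"))
print(day_name("01/01/2017"))
```

`date.isoweekday()` is an alternative that counts from 1 (Monday) to 7 (Sunday).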
In [30]:
[figBirth, birthByDay, birthByMonth, birthByYear] = plotDateDistribution ( npBirthData[:,4], '# Births', True )
The distribution by month shows a higher rate of births in the spring. March, the month with the highest number of births, has a modified z-score of 2.57: a bit of an outlier, but nothing exceptional (a value is usually considered an outlier when its modified z-score is above 3.5).
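For reference, the modified z-score of a value x is 0.6745 * (x - median) / MAD, where MAD is the median of the absolute deviations from the median. The sketch below works through it on made-up counts; the function name `modified_z_scores` and the choice to return NaN when the MAD is zero are assumptions for this example.

```python
import numpy as np

def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD, with MAD the median absolute deviation."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # Assumption: when MAD is zero the score is undefined, so return NaN
        return np.full(x.shape, np.nan)
    return 0.6745 * (x - median) / mad

# Made-up counts, not the real distribution
counts = [10, 12, 11, 13, 12, 40]
scores = modified_z_scores(counts)
outliers = [i for i, s in enumerate(scores) if abs(s) > 3.5]
print(np.round(scores, 2), outliers)
```

Here the median is 12 and the MAD is 1, so only the last value (40) exceeds the 3.5 threshold.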
Although not statistically significant, December shows a reduction in births, probably due to reduced conception during Lent.
In the distribution by year we can see a steady increase in births over time, with a marked boom from 1750 to 1770.
In [18]:
[figDeath, deathByDay, deathByMonth, deathByYear] = plotDateDistribution ( npDeathData[:,8], '# Funerals', True )
From the month distribution we see that August (A) has an abnormal spike in deaths, with a modified z-score of +4.66 (usually anything above 3.5 is considered an outlier). In addition, July and September are the months with the 2nd and 3rd highest number of funerals.
At first we could think that, given the high child mortality back then, a spike in deaths could be explained by a spike in births. However, looking at the distribution of births plotted previously we see that this is not the case, with August and September being the months with the lowest number of births. Could it be that summer and the Spanish heat were harder on the population? Maybe it was due to accidents in the fields, or to wars being more common in the summer? We don't know, but we will come back to this analysis in Section 3.4.
The distribution by year shows no births from 1637 until 1663. This, I have been told, was not due to a lack of births, but because the sheets covering that period were torn out of the record book. Probably to light a fire...
Similarly, no deaths are reported in 1780, but this is also probably due to missing records: birth records exist for that year, so it is likely that the priest would have recorded funerals as well.
Another interesting fact is the huge spike in deaths in 1803. Could it be wars, the plague, a bad crop? We don't know, but we will take a look at that later on in Section 3.5 to see what the data says.
As seen in the distribution of deaths by month of the year, the summer months showed a higher mortality. This was a bit of a surprise for me, as I expected a nearly uniform distribution or an increase during winter, when the conditions are harsh (subzero temperatures are not rare during the winter months in this region).
In order to investigate any possible underlying cause that explains the summer mortality spike, we will first separate adults from children:
The field NPDEATHDATA[:,9] contains an abbreviation of the age of people at their funerals. 'ALB' or 'A' (Albado/a), 'P' (Parvulo), 'N' (Niño/a), 'NIÑO' and 'NIÑA' denote children, while an empty field '' denotes an adult. HF (Hijo de familia) and the remaining indicators are excluded from this analysis.
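The grouping rule just described can be sketched as a small function. The name `age_group` is hypothetical, and stripping whitespace and upper-casing the code before comparing are assumptions about how messy the field might be.

```python
# Codes taken from the description above; everything else is excluded
CHILD_CODES = {"P", "ALB", "A", "N", "NIÑO", "NIÑA"}

def age_group(code):
    """Map the age abbreviation of a funeral record to 'adult', 'child' or 'excluded'."""
    # Assumption: surrounding whitespace and letter case carry no meaning
    code = code.strip().upper()
    if code == "":
        return "adult"
    if code in CHILD_CODES:
        return "child"
    return "excluded"

# Made-up example codes
for code in ["", "P", "niño", "HF", "ALB"]:
    print(repr(code), "->", age_group(code))
```

Keeping the child codes in one set avoids repeating the list, and any typo in it, wherever the split is needed.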
In [19]:
printMostCommon( NPDEATHDATA[:,9], 15 )
In [20]:
# Separate adults and children
adults = []
children = []
for i, v in enumerate(NPDEATHDATA[:,9]):
    if v == '':
        adults.append(NPDEATHDATA[i,8])
    if v in ['P', 'ALB', 'A', 'N', 'NIÑA', 'NIÑO']:
        children.append(NPDEATHDATA[i,8])
print ("Total number of adult deaths: %d" % len(adults))
[figDeathAdults, deathByDayAdults, deathByMonthAdults, deathByYearAdults] = \
    plotDateDistribution ( adults, '# Funerals Adults', True )
When looking only at adult deaths, the distribution by month seems fairly uniform, without the summer spike seen before.
In [21]:
print ("Total number of children deaths: %d" % len(children))
[figDeathChildren, deathByDayChildren, deathByMonthChildren, deathByYearChildren] = \
    plotDateDistribution ( children, '# Funerals Children', True )
When we focus on children, the summer spike becomes even more pronounced (z-score +7.93). This should exclude the explanation that summer deaths were related to an increase in labour activity or warfare, tasks mainly performed by adults or older children.
To exclude violent deaths as the cause of the summer spike in mortality, we can look at deaths separated by gender. A high rate of male deaths could indicate a spike in violent deaths, as men were more involved in warfare and farming labour:
In [22]:
# Separate adult deaths by gender
male = []
female = []
for i, v in enumerate(NPDEATHDATA[:,9]):
    if v == '':
        if NPDEATHDATA[i,22] == 'HOMBRE':
            male.append(NPDEATHDATA[i,8])
        else:
            female.append(NPDEATHDATA[i,8])
print ("Total number of male adult deaths: %d" % len(male))
print ("Total number of female adult deaths: %d" % len(female))
And the ratio by month:
In [23]:
deathByMonthMaleAdults = [0] * 12
deathByMonthFemaleAdults = [0] * 12
for m in male:
    try:
        deathDate = datetime.datetime.strptime(m, '%d/%m/%Y').date()
        deathByMonthMaleAdults[deathDate.month-1] += 1
    except ValueError:
        # skip missing or malformed dates
        pass
for f in female:
    try:
        deathDate = datetime.datetime.strptime(f, '%d/%m/%Y').date()
        deathByMonthFemaleAdults[deathDate.month-1] += 1
    except ValueError:
        # skip missing or malformed dates
        pass
# percentage of adult deaths that were male, per month
MFratio = np.array(deathByMonthMaleAdults) / (np.array(deathByMonthFemaleAdults) + np.array(deathByMonthMaleAdults)) * 100
MFscore = MADscore(MFratio)
print( "\nBy month: ")
for i, v in enumerate(MFratio):
    print ( "%12s : %4.1f ( %+.2f )" % (i, v, MFscore[i]) )
In [24]:
fig = plt.figure()
plt.plot(range(12), MFratio, alpha=0.8, linewidth=4, color='darkorange')
plt.plot([-0, 11], [50, 50], linewidth=2, alpha=0.3, color='gray')
plt.ylabel( 'Percentage of deaths that were male' )
plt.xticks( range(12), ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'] )
plt.margins(0.05)
remove2axis(plt.gca())
The rate of (adult) male deaths stays between 41% (May) and 57% (December), with a rate of 52.2% in August. This roughly constant ratio suggests that the increase in summer deaths was not violent in nature.
So while it is hard to draw conclusions from this limited dataset, the data shows that summers had an increased mortality among children that was not related to violent deaths.
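As a small aside on the ratio itself, the sketch below computes the male share per month while guarding against months with no recorded deaths, which would otherwise cause a division by zero. The helper `male_share` and the counts are made up for illustration.

```python
import numpy as np

def male_share(male_counts, female_counts):
    """Percentage of deaths that were male, per month; NaN where a month has no deaths."""
    male = np.asarray(male_counts, dtype=float)
    total = male + np.asarray(female_counts, dtype=float)
    share = np.full(male.shape, np.nan)
    # only divide where there is at least one death; other entries stay NaN
    np.divide(male * 100.0, total, out=share, where=total > 0)
    return share

# Made-up counts for three months, not the real data
share = male_share([12, 0, 9], [11, 0, 9])
print(np.round(share, 1))
```

With sparse subsets, such as a single year, empty months are quite likely, so the guard is cheap insurance.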
When we compare this data with other demographic studies, we see that "Evolucion Demografica de Cortes de Arenoso desde 1560 a 1660" by Antonio Poveda Ayora (MSc Thesis, Universidad de Valencia, 1982) reports a similar spike in the month of August (a 38% increase over the mean). On page 282 he writes:
"El máximo de defunciones que se registran en Cortes de Arenoso en los meses de agosto y septiembre puede deberse a la incidencia que, sobre estos meses, tendría la mortalidad de niños, pues las causas de la mortalidad infantil eran, fundamentalmente, las diarreas estivales antes de las lluvias de otoño, debidas a la sequedad, altas temperaturas, empleo de aguas contaminadas, etc."
English: "The maximum number of deaths registered in Cortes de Arenoso in August and September may be due to the impact of child mortality on these months, as the main causes of child mortality were summer diarrhoeas before the autumn rains, caused by dryness, high temperatures, the use of contaminated water, etc."
We have seen that the period between 1750 and 1780 showed an increase in births that might be attributed to a higher standard of living. To confirm that we can take a look at child mortality, as there is an association between higher child mortality and poorer living conditions.
In [25]:
# we already have computed deathByYearChildren and deathByYearAdults
#ChildrenMortalityRatio = np.array(deathByYearChildren) / (np.array(deathByYearAdults) + np.array(deathByYearChildren)) * 100
#CRscore = MADscore(ChildrenMortalityRatio)
childrenMortalityRatio = [np.nan] * 2000
for i, v in enumerate(deathByYearChildren):
    if v > 0 or deathByYearAdults[i] > 0:
        childrenMortalityRatio[i] = deathByYearChildren[i] / (deathByYearChildren[i] + deathByYearAdults[i]) * 100
#print( childrenMortalityRatio )
# find first and last year with data
# (NaN never compares equal to anything, so test with np.isnan instead of !=)
min_year = next(i for i, v in enumerate(childrenMortalityRatio) if not np.isnan(v))
max_year_tmp = next(i for i, v in enumerate(reversed(childrenMortalityRatio)) if not np.isnan(v))
max_year = len(childrenMortalityRatio) - 1 - max_year_tmp
yearRange = np.arange(min_year, max_year+1)
fig = plt.figure()
fig.suptitle("Child mortality by year")
plt.bar(yearRange, childrenMortalityRatio[min_year:max_year+1], align='center', linewidth=0, width=1, color='darkorange', alpha=0.8)
#plt.plot([-0, 11], [50, 50])
plt.ylabel( 'Children mortality ratio' )
plt.xticks( [v for v in yearRange if v % 10 == 0] )
plt.margins(0.03, 0)
remove2axis(plt.gca())
It is clear from the graph that the expected decline in child mortality is not present in the period from 1750 to 1770. In addition, we can see that there is virtually no child mortality between 1705 and 1750. This was clearly not the case, and the likely reason for the gap is that child mortality was not reported during this period.
Similarly, the 100% child mortality reported from 1840 to 1850 is probably a consequence of bad encoding of adult deaths, or bad parsing of the data.
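Since years without any records are stored as NaN, it helps to remember that NaN never compares equal to anything, itself included, so missing years have to be detected with `np.isnan`. A minimal sketch, with a hypothetical helper `first_and_last_valid` and made-up ratios:

```python
import numpy as np

def first_and_last_valid(values):
    """Return the indices of the first and last entries that are not NaN, or None if all are NaN."""
    valid = np.flatnonzero(~np.isnan(np.asarray(values, dtype=float)))
    if valid.size == 0:
        return None
    return int(valid[0]), int(valid[-1])

# NaN is never equal to anything, including itself, so '!=' cannot detect it
print(np.nan != np.nan)          # True

# Made-up ratios indexed by position, not real data
ratios = [np.nan, np.nan, 40.0, np.nan, 55.0, np.nan]
print(first_and_last_valid(ratios))
```

The same helper could be used to trim the plotted range to the years that actually have data.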
As seen in the plot of all deaths by year, a huge spike in mortality occurs in 1803. To study possible explanations for that spike, we first look at the distribution by month in that year:
In [26]:
deaths1803 = []
deaths1803Adult = []
deaths1803Child = []
for i, d in enumerate(npDeathData[:,8]):
    try:
        deathDate = datetime.datetime.strptime(d, '%d/%m/%Y').date()
        if deathDate.year == 1803:
            deaths1803.append(d)
            if NPDEATHDATA[i,9] == '':
                deaths1803Adult.append(d)
            if NPDEATHDATA[i,9] in ['P', 'ALB', 'A', 'N', 'NIÑA', 'NIÑO']:
                deaths1803Child.append(d)
    except ValueError:
        # skip missing or malformed dates
        pass
[figDeath1803, deathByDay1803, deathByMonth1803, deathByYear1803] = plotDateDistribution ( deaths1803, '# Funerals in 1803', True, False )
If we look at the distribution separately for children and adults:
In [27]:
[figDeath1803, deathByDay1803, deathByMonth1803, deathByYear1803] = plotDateDistribution ( deaths1803Adult, '# Adult funerals in 1803', True, False )
[figDeath1803, deathByDay1803, deathByMonth1803, deathByYear1803] = plotDateDistribution ( deaths1803Child, '# Children funerals in 1803', True, False )
We can see that an exceptional 33 deaths (of which 32 were children) occurred in the single month of January 1803. Was that a trend carried over from 1802?
In [28]:
deaths1802 = []
for d in npDeathData[:,8]:
    try:
        deathDate = datetime.datetime.strptime(d, '%d/%m/%Y').date()
        if deathDate.year == 1802:
            deaths1802.append(d)
    except ValueError:
        # skip missing or malformed dates
        pass
[figDeath1802, deathByDay1802, deathByMonth1802, deathByYear1802] = plotDateDistribution ( deaths1802, '# Funerals 1802', True, False )
No, January 1803 was an isolated case of high mortality. Maybe a bad crop, a harsh winter, or a severe flu? We don't know.
Or is the spike there because a bunch of deaths were somehow recorded with the same date repeated? Maybe the 1st of January of 1803?
In [29]:
# to dates type
dates1803 = [datetime.datetime.strptime(strDate, '%d/%m/%Y').date() for strDate in deaths1803]
dayOfMonth = [d.day for d in dates1803 if d.month == 1] # only for january
print(dayOfMonth)
# because who needs np.histogram()?
# index 0 is unused; indices 1..31 hold the count for each day of the month
distDay = [0] * 32
for d in dayOfMonth:
    distDay[d] += 1
fig = plt.figure()
plt.bar(range(len(distDay)), distDay, alpha=0.8, linewidth=0, color='darkorange')
plt.ylabel( '# Funerals January 1803, per day' )
remove2axis(plt.gca())
No, the deaths are spread among the days of the month. January 1803 was simply a tough month.
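For completeness, the per-day counting could also be done with NumPy. The sketch below uses `np.bincount`; the helper `deaths_per_day` and the example days are made up and are not the real January 1803 records.

```python
import numpy as np

def deaths_per_day(days_of_month):
    """Count entries for each day of the month; index 0 is unused, indices 1..31 are days."""
    return np.bincount(np.asarray(days_of_month, dtype=int), minlength=32)

# Made-up days of the month, not the real January 1803 records
counts = deaths_per_day([1, 3, 3, 15, 31])
print(counts[1], counts[3], counts[31], counts.sum())
```

A single day holding most of the count would point to a recording artefact, whereas a spread like the one observed points to a genuinely bad month.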
While the data contains numerous inaccuracies and typographical errors, we can already see certain trends.
The first interesting result of this study is that summers, especially August, showed a higher child mortality than the rest of the year. This trend is seen in other demographic studies of the region, where it is attributed to the hotter conditions before the autumn rains. In this study we showed that the increase in mortality was probably not related to violent deaths, as the share of male deaths (associated with warfare and labour accidents) remained close to 50% during this period.
Furthermore, the months of December and January showed a second spike in mortality, more pronounced among children and males. This is also consistent with other studies of the region.
A second result of interest is that January 1803 was a month with very high mortality among children, although we don't know the reason for it.
In addition, we can see an increase in the number of births between 1750 and 1780. However, we do not see the decrease in child mortality during the same period that would suggest an improvement in living standards.
The data is available for download as CSV files at:
http://perezrovira.net/adria/demografiaNotebook/allDefunciones.csv
http://perezrovira.net/adria/demografiaNotebook/allNacimientos.csv
Copyright 2017 Adria Perez-Rovira (MIT License)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.