The exercises:
Create a new grouped dataframe, called 'grouped_new', that groups the dataframe by pregordr and calculates the mean of the other variables. Print this out.
Sort this new grouped dataframe by the age at pregnancy ('agepreg') in descending order, so that the oldest average age at pregnancy is at the top.
Print the subset of the dataframe with only those rows where pregordr is 15 or greater.
In the original dataframe ('df'), create a new column that contains the age at pregnancy divided by the pregnancy order. Print out the dataframe to see the new column. (Note: There is not a sensible mathematical reason for this that I can think of. This is just to practice pandas.)
In [ ]:
import pandas
#First read in data
df = pandas.read_csv("../Data/nsfg_data1.csv.bz2", compression='bz2', index_col=0)
#view our data. Note: 'NaN' indicates the value is missing
df
In [ ]:
#Exercise 1
grouped_new = df.groupby('pregordr').mean()
grouped_new
In [ ]:
#Exercise 2
grouped_new.sort_values(by='agepreg', ascending=False)
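
To see what the group-then-sort steps do without loading the full dataset, here is a minimal sketch on a small made-up dataframe. The name `toy` and all of its values are invented for illustration; only the column names 'pregordr' and 'agepreg' come from the notebook.

```python
import pandas

# Toy data (hypothetical values); only the column names match the notebook
toy = pandas.DataFrame({
    'pregordr': [1, 1, 2, 2, 3],
    'agepreg': [20.0, 24.0, 27.0, 31.0, 26.0],
})

# One row per pregordr value, holding the mean of the remaining columns
toy_grouped = toy.groupby('pregordr').mean()

# sort_values returns a new, sorted dataframe; toy_grouped itself is unchanged
toy_sorted = toy_grouped.sort_values(by='agepreg', ascending=False)
print(toy_sorted)
```

Note that `sort_values` does not modify the dataframe it is called on. In a notebook cell the sorted result is displayed, but to reuse it later it has to be assigned to a name, as with `toy_sorted` here.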
In [ ]:
#Exercise 3
#'15 or greater' includes 15 itself, so compare with >=
df[df['pregordr'] >= 15]
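
The exercise asks for rows where pregordr is 15 or greater, so the choice of comparison operator matters at the boundary. The sketch below uses a small invented dataframe (`toy` and its values are hypothetical) to show how `>=` and `>` differ for a row where pregordr is exactly 15.

```python
import pandas

# Toy data (hypothetical values) with one row exactly at the boundary
toy = pandas.DataFrame({
    'pregordr': [14, 15, 16],
    'agepreg': [38.0, 40.0, 41.0],
})

# The comparison produces a boolean Series; indexing with it keeps the True rows
at_least_15 = toy[toy['pregordr'] >= 15]   # keeps pregordr 15 and 16
more_than_15 = toy[toy['pregordr'] > 15]   # keeps only pregordr 16

print(len(at_least_15), len(more_than_15))
```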
In [ ]:
#Exercise 4
df['age/pregordr'] = df['agepreg']/df['pregordr']
df
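
Dividing one column by another works element by element, and a missing value in either column carries through to the result. This is a minimal sketch on invented data (`toy` and its values are hypothetical), not the real NSFG file.

```python
import pandas

# Toy data (hypothetical values); None becomes NaN in the float column
toy = pandas.DataFrame({
    'agepreg': [30.0, None, 24.0],
    'pregordr': [2, 1, 3],
})

# Row-by-row division; the row with a missing agepreg gets NaN in the new column
toy['age/pregordr'] = toy['agepreg'] / toy['pregordr']
print(toy)
```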