In this notebook, you'll
./data/raw directory using the functions and classes (written during the data gathering phase) in the ./script directory
./data/cleaned directory
./visualiation directory
List the team members contributing to this notebook, along with their responsibilities:
I advise you to work at least in pairs on each project notebook, as you did for the homework assignments. Of course, all team members may contribute to every notebook.
The R script ./script/plant_df-R contains the function create_df_from_plant_xml(file), which I'll use here to load the plant data in XML format into an R data frame.
In [4]:
%load_ext rmagic
To load the function into an R cell, use the source(R_script_file) command in R, which works much like the import module command in Python:
In [5]:
%%R
source('./script/plant_df-R')
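To make the comparison with Python concrete, here is a minimal sketch of loading functions from a script file by path. The file name helpers.py and the function double are hypothetical and only serve the demonstration. One difference worth keeping in mind: by default R's source() evaluates the script in the calling environment, whereas this Python approach keeps the loaded names inside a module object.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical helper script, written to a temporary file for the demo.
script_text = "def double(x):\n    return 2 * x\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "helpers.py"
    script_path.write_text(script_text)

    # Build a module object from the file path and execute the file in it.
    spec = importlib.util.spec_from_file_location("helpers", str(script_path))
    helpers = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(helpers)

# The function stays usable after loading, reached through the module name.
result = helpers.double(21)
print(result)
```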
Now we can create a data frame directly from the XML file using the function contained in the script above.
If you wish to perform the cleaning using Pandas data frames instead of R data frames, you can make the R data frame available to Python cells by using the R magic command:
%%R -d df_name
To learn more about how to pass variables back and forth between R and Python cells, please have a look at the notebook here.
In [7]:
%%R -d data
library(XML)
data = create_df_from_plant_xml('./data/raw/plant.xml')
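If you would rather stay in Python for this step, the same idea (turning XML records into table rows) can be sketched with the standard library. This is not the create_df_from_plant_xml function itself: the element names plants, plant, name and height_cm below are assumptions made for illustration, since the layout of plant.xml is not shown in this notebook.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real plant.xml may be organised differently.
xml_text = """
<plants>
  <plant><name>Fern</name><height_cm>30</height_cm></plant>
  <plant><name>Oak</name><height_cm>2000</height_cm></plant>
</plants>
"""


def records_from_plant_xml(text):
    """Return one dict per <plant> element, keyed by child tag name."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in plant}
        for plant in root.findall("plant")
    ]


records = records_from_plant_xml(xml_text)
print(records)
```

A list of dictionaries like this can be handed to the Pandas DataFrame constructor. Note that every value comes out as a string, so numeric columns would still need converting during cleaning.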
Now let's load our data into a Pandas data frame:
In [8]:
from pandas import DataFrame
df = DataFrame(data)
df.head()
Out[8]:
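As far as I understand the legacy rmagic extension, the -d flag hands the R data frame to Python as a NumPy structured array, which is why the DataFrame constructor can consume it directly. The sketch below imitates that with a hand-built array; the column names and values are made up for illustration and do not come from plant.xml.

```python
import numpy as np
from pandas import DataFrame

# Hypothetical stand-in for the object passed back by %%R -d data.
data = np.array(
    [("Fern", 30.0), ("Oak", 2000.0)],
    dtype=[("name", "U10"), ("height_cm", "f8")],
)

# The field names of the structured array become the column names.
df = DataFrame(data)
print(df.dtypes)
print(df.head())
```

Checking df.dtypes right after the conversion is a cheap way to spot columns that arrived with an unexpected type before any cleaning starts.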
The actual data cleaning can now begin!