Copyright 2015 Enthought, Inc. All Rights Reserved
In this exercise, we compare the distribution of solar power when the sky is clear and when it is cloudy.
First, we need to deal with the missing data in the dataset:
1. Load the data and plot its columns.
2. Drop all the rows with missing data for the cloudy column, since we will not be able to use them in the analysis.
3. Interpolate the missing data from the column containing the power (use the method keyword to select an appropriate interpolation function).
In [6]:
%matplotlib inline
import pandas as pd
# Load the data and plot its columns.
solar = pd.read_csv('solar.csv', index_col=0, parse_dates=True)
solar.plot(subplots=True, figsize=(15, 5));
In [7]:
solar.head(20)
Out[7]:
In [8]:
# 2. Drop all the rows with missing data for the cloudy column.
solar = solar.dropna(subset=['cloudy'])
# Alternative: solar = solar[solar.cloudy.notnull()]
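The two forms of step 2 can be checked against each other on a small frame. The sketch below uses a hypothetical toy dataset (the column names match the exercise, but the values and dates are made up) and shows that only rows with a missing cloudy value are dropped, while missing power values are left in place for the interpolation step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for solar.csv (values are made up).
idx = pd.date_range('2015-01-01', periods=5, freq='D')
toy = pd.DataFrame(
    {'power': [1.0, np.nan, 3.0, 4.0, np.nan],
     'cloudy': [0.0, 1.0, np.nan, 1.0, 0.0]},
    index=idx,
)

# Drop only the rows where 'cloudy' is missing; NaNs in 'power' are kept.
dropped = toy.dropna(subset=['cloudy'])

# Equivalent boolean-mask form.
masked = toy[toy['cloudy'].notnull()]

print(dropped.equals(masked))             # both forms select the same rows
print(len(dropped))                       # rows remaining after the drop
print(dropped['power'].isnull().sum())    # missing power values still present
```

Passing subset matters here: a plain dropna() would also discard every row with a missing power value, leaving nothing to interpolate.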
In [9]:
solar.head(20)
Out[9]:
In [3]:
solar.plot(subplots=True, figsize=(15, 5));
In [11]:
# 3. Interpolate the missing data from the column containing the power.
solar = solar.interpolate(method='time')
solar.plot(subplots=True, figsize=(15, 5));
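To see why method='time' is a reasonable choice for a time-indexed frame, the sketch below compares it with the default method='linear' on a hypothetical series with unevenly spaced timestamps (the values are made up). With 'time', the gap is filled in proportion to the elapsed time; with 'linear', the index is ignored and the rows are treated as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical, irregularly spaced readings (values are made up).
idx = pd.to_datetime(['2015-01-01 00:00',
                      '2015-01-01 01:00',
                      '2015-01-01 04:00'])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

# Weight the fill value by the time elapsed between neighbours.
by_time = s.interpolate(method='time')

# Ignore the index and treat the rows as equally spaced.
by_position = s.interpolate(method='linear')

print(by_time.iloc[1])
print(by_position.iloc[1])
```

The missing reading sits one hour into a four-hour gap, so the time-based fill lands a quarter of the way between the neighbouring values, whereas the position-based fill lands halfway.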
In [12]:
solar.head(20)
Out[12]:
Next, we compare the power statistics for each value of the cloudy flag.
In [13]:
# 1. Group the data by the cloudy flag.
g = solar.groupby('cloudy')
In [14]:
# 2. Compute the mean and standard deviation of each group, in two separate commands.
g.mean()
Out[14]:
In [15]:
g.std()
Out[15]:
In [16]:
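# 3. Create a new dataframe with a column for the mean and one for the standard deviation in a single command.
# String names select pandas' own aggregations; recent pandas versions warn
# when NumPy functions such as np.mean and np.std are passed instead.
g.agg(['mean', 'std'])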
Out[16]:
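The frame returned by agg with a list of functions has two-level column labels, which can be surprising when selecting a single statistic afterwards. A minimal sketch on hypothetical toy data (the values are made up) shows how those columns are laid out and two ways to pick one out.

```python
import pandas as pd

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({'cloudy': [0, 0, 1, 1],
                    'power': [10.0, 14.0, 2.0, 4.0]})

stats = toy.groupby('cloudy').agg(['mean', 'std'])

# Columns form a two-level index: (original column, statistic).
print(stats.columns.tolist())

# Select one statistic with a tuple key ...
mean_power = stats[('power', 'mean')]

# ... or by chaining the two levels.
std_power = stats['power']['std']

print(mean_power)
print(std_power)
```

Both results are Series indexed by the cloudy flag, so the clear and cloudy groups can be compared directly.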