Introduction and relevant resources

This is the first notebook in the Learn Pandas track.
These exercises assume some prior experience with Pandas.
Each page has a list of relevant resources that you can use for reference, and the top item in each list has been chosen specifically to help you with the exercises on that page.
The first step in most data science projects is reading in the data.
In this section, you will be using pandas to create Series and DataFrame objects, both by hand and by reading data files.
The Relevant Resources, as promised:

Setup


In [1]:
import pandas as pd
pd.set_option('max_rows', 5)
from learntools.advanced_pandas.creating_reading_writing import *

Checking Answers

You can check your answers in each of the exercises that follow using the check_qN function provided in the code cell above by replacing N with the number of the exercise.
For example here's how you would check an incorrect answer to exercise 1:


In [2]:
check_q1(pd.DataFrame())


Out[2]:
False

A correct answer would return True.
If you capitulate, run print(answer_qN())).

Exercises

Exercise 1
Create a DataFrame:


In [3]:
data = {'Apples': [30], 'Bananas': [21]}
pd.DataFrame(data=data)


Out[3]:
Apples Bananas
0 30 21

In [4]:
df2 = pd.DataFrame([(30, 21)], columns=['Apples', 'Bananas'])
df2


Out[4]:
Apples Bananas
0 30 21

In [5]:
answer_q1()


pd.DataFrame({'Apples': [30], 'Bananas': [21]})

Exercise 2
Create a 2x2 DataFrame:


In [6]:
df2x2 = pd.DataFrame([[35, 21], [41, 34]], index=['2017 Sales', '2018 Sales'], columns=['Apples', 'Bananas'])
df2x2


Out[6]:
Apples Bananas
2017 Sales 35 21
2018 Sales 41 34

In [7]:
answer_q2()


pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales']
)

Exercise 3
Create a Series:


In [8]:
pd.Series({'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'}, name='Dinner')


Out[8]:
Eggs     2 large
Flour     4 cups
Milk       1 cup
Spam       1 can
Name: Dinner, dtype: object

In [9]:
answer_q3()


pd.Series(['4 cups', '1 cup', '2 large', '1 can'], 
index=['Flour', 'Milk', 'Eggs', 'Spam'], 
name='Dinner')

Exercise 4
Read data from a .csv file into a DataFrame.


In [10]:
wine_reviews = pd.read_csv('inputs/wine-reviews/winemag-data_first150k.csv', index_col=0)
wine_reviews.head()


Out[10]:
country description designation points price province region_1 region_2 variety winery
0 US This tremendous 100% varietal wine hails from ... Martha's Vineyard 96 235.0 California Napa Valley Napa Cabernet Sauvignon Heitz
1 Spain Ripe aromas of fig, blackberry and cassis are ... Carodorum Selección Especial Reserva 96 110.0 Northern Spain Toro NaN Tinta de Toro Bodega Carmen Rodríguez
2 US Mac Watson honors the memory of a wine once ma... Special Selected Late Harvest 96 90.0 California Knights Valley Sonoma Sauvignon Blanc Macauley
3 US This spent 20 months in 30% new French oak, an... Reserve 96 65.0 Oregon Willamette Valley Willamette Valley Pinot Noir Ponzi
4 France This is the top wine from La Bégude, named aft... La Brûlade 95 66.0 Provence Bandol NaN Provence red blend Domaine de la Bégude

In [11]:
wine_reviews.tail()


Out[11]:
country description designation points price province region_1 region_2 variety winery
150925 Italy Many people feel Fiano represents southern Ita... NaN 91 20.0 Southern Italy Fiano di Avellino NaN White Blend Feudi di San Gregorio
150926 France Offers an intriguing nose with ginger, lime an... Cuvée Prestige 91 27.0 Champagne Champagne NaN Champagne Blend H.Germain
150927 Italy This classic example comes from a cru vineyard... Terre di Dora 91 20.0 Southern Italy Fiano di Avellino NaN White Blend Terredora
150928 France A perfect salmon shade, with scents of peaches... Grand Brut Rosé 90 52.0 Champagne Champagne NaN Champagne Blend Gosset
150929 Italy More Pinot Grigios should taste like this. A r... NaN 90 15.0 Northeastern Italy Alto Adige NaN Pinot Grigio Alois Lageder

In [12]:
wine_reviews.shape


Out[12]:
(150930, 10)

In [13]:
wine_reviews.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 150930 entries, 0 to 150929
Data columns (total 10 columns):
country        150925 non-null object
description    150930 non-null object
designation    105195 non-null object
points         150930 non-null int64
price          137235 non-null float64
province       150925 non-null object
region_1       125870 non-null object
region_2       60953 non-null object
variety        150930 non-null object
winery         150930 non-null object
dtypes: float64(1), int64(1), object(8)
memory usage: 12.7+ MB

In [14]:
dir(wine_reviews)


Out[14]:
['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__unicode__',
 '__weakref__',
 '__xor__',
 '_accessors',
 '_add_numeric_operations',
 '_add_series_only_operations',
 '_add_series_or_dataframe_operations',
 '_agg_by_level',
 '_agg_doc',
 '_aggregate',
 '_aggregate_multiple_funcs',
 '_align_frame',
 '_align_series',
 '_apply_broadcast',
 '_apply_empty_result',
 '_apply_raw',
 '_apply_standard',
 '_at',
 '_box_col_values',
 '_box_item_values',
 '_builtin_table',
 '_check_inplace_setting',
 '_check_is_chained_assignment_possible',
 '_check_percentile',
 '_check_setitem_copy',
 '_clear_item_cache',
 '_clip_with_scalar',
 '_combine_const',
 '_combine_frame',
 '_combine_match_columns',
 '_combine_match_index',
 '_combine_series',
 '_combine_series_infer',
 '_compare_frame',
 '_compare_frame_evaluate',
 '_consolidate',
 '_consolidate_inplace',
 '_construct_axes_dict',
 '_construct_axes_dict_for_slice',
 '_construct_axes_dict_from',
 '_construct_axes_from_arguments',
 '_constructor',
 '_constructor_expanddim',
 '_constructor_sliced',
 '_convert',
 '_count_level',
 '_create_indexer',
 '_cython_table',
 '_dir_additions',
 '_dir_deletions',
 '_ensure_valid_index',
 '_expand_axes',
 '_flex_compare_frame',
 '_from_arrays',
 '_from_axes',
 '_get_agg_axis',
 '_get_axis',
 '_get_axis_name',
 '_get_axis_number',
 '_get_axis_resolvers',
 '_get_block_manager_axis',
 '_get_bool_data',
 '_get_cacher',
 '_get_index_resolvers',
 '_get_item_cache',
 '_get_numeric_data',
 '_get_values',
 '_getitem_array',
 '_getitem_column',
 '_getitem_frame',
 '_getitem_multilevel',
 '_getitem_slice',
 '_gotitem',
 '_iat',
 '_iget_item_cache',
 '_iloc',
 '_indexed_same',
 '_info_axis',
 '_info_axis_name',
 '_info_axis_number',
 '_info_repr',
 '_init_dict',
 '_init_mgr',
 '_init_ndarray',
 '_internal_names',
 '_internal_names_set',
 '_is_builtin_func',
 '_is_cached',
 '_is_cython_func',
 '_is_datelike_mixed_type',
 '_is_mixed_type',
 '_is_numeric_mixed_type',
 '_is_view',
 '_ix',
 '_ixs',
 '_join_compat',
 '_loc',
 '_maybe_cache_changed',
 '_maybe_update_cacher',
 '_metadata',
 '_needs_reindex_multi',
 '_obj_with_exclusions',
 '_protect_consolidate',
 '_reduce',
 '_reindex_axes',
 '_reindex_axis',
 '_reindex_columns',
 '_reindex_index',
 '_reindex_multi',
 '_reindex_with_indexers',
 '_repr_data_resource_',
 '_repr_fits_horizontal_',
 '_repr_fits_vertical_',
 '_repr_html_',
 '_repr_latex_',
 '_reset_cache',
 '_reset_cacher',
 '_sanitize_column',
 '_selected_obj',
 '_selection',
 '_selection_list',
 '_selection_name',
 '_series',
 '_set_as_cached',
 '_set_axis',
 '_set_axis_name',
 '_set_is_copy',
 '_set_item',
 '_setitem_array',
 '_setitem_frame',
 '_setitem_slice',
 '_setup_axes',
 '_shallow_copy',
 '_slice',
 '_stat_axis',
 '_stat_axis_name',
 '_stat_axis_number',
 '_try_aggregate_string_function',
 '_typ',
 '_unpickle_frame_compat',
 '_unpickle_matrix_compat',
 '_update_inplace',
 '_validate_dtype',
 '_values',
 '_where',
 '_xs',
 'abs',
 'add',
 'add_prefix',
 'add_suffix',
 'agg',
 'aggregate',
 'align',
 'all',
 'any',
 'append',
 'apply',
 'applymap',
 'as_blocks',
 'as_matrix',
 'asfreq',
 'asof',
 'assign',
 'astype',
 'at',
 'at_time',
 'axes',
 'between_time',
 'bfill',
 'blocks',
 'bool',
 'boxplot',
 'clip',
 'clip_lower',
 'clip_upper',
 'columns',
 'combine',
 'combine_first',
 'compound',
 'consolidate',
 'convert_objects',
 'copy',
 'corr',
 'corrwith',
 'count',
 'country',
 'cov',
 'cummax',
 'cummin',
 'cumprod',
 'cumsum',
 'describe',
 'description',
 'designation',
 'diff',
 'div',
 'divide',
 'dot',
 'drop',
 'drop_duplicates',
 'dropna',
 'dtypes',
 'duplicated',
 'empty',
 'eq',
 'equals',
 'eval',
 'ewm',
 'expanding',
 'ffill',
 'fillna',
 'filter',
 'first',
 'first_valid_index',
 'floordiv',
 'from_csv',
 'from_dict',
 'from_items',
 'from_records',
 'ftypes',
 'ge',
 'get',
 'get_dtype_counts',
 'get_ftype_counts',
 'get_value',
 'get_values',
 'groupby',
 'gt',
 'head',
 'hist',
 'iat',
 'idxmax',
 'idxmin',
 'iloc',
 'index',
 'info',
 'insert',
 'interpolate',
 'is_copy',
 'isin',
 'isnull',
 'items',
 'iteritems',
 'iterrows',
 'itertuples',
 'ix',
 'join',
 'keys',
 'kurt',
 'kurtosis',
 'last',
 'last_valid_index',
 'le',
 'loc',
 'lookup',
 'lt',
 'mad',
 'mask',
 'max',
 'mean',
 'median',
 'melt',
 'memory_usage',
 'merge',
 'min',
 'mod',
 'mode',
 'mul',
 'multiply',
 'ndim',
 'ne',
 'nlargest',
 'notnull',
 'nsmallest',
 'nunique',
 'pct_change',
 'pipe',
 'pivot',
 'pivot_table',
 'plot',
 'points',
 'pop',
 'pow',
 'price',
 'prod',
 'product',
 'province',
 'quantile',
 'query',
 'radd',
 'rank',
 'rdiv',
 'region_1',
 'region_2',
 'reindex',
 'reindex_axis',
 'reindex_like',
 'rename',
 'rename_axis',
 'reorder_levels',
 'replace',
 'resample',
 'reset_index',
 'rfloordiv',
 'rmod',
 'rmul',
 'rolling',
 'round',
 'rpow',
 'rsub',
 'rtruediv',
 'sample',
 'select',
 'select_dtypes',
 'sem',
 'set_axis',
 'set_index',
 'set_value',
 'shape',
 'shift',
 'size',
 'skew',
 'slice_shift',
 'sort_index',
 'sort_values',
 'sortlevel',
 'squeeze',
 'stack',
 'std',
 'style',
 'sub',
 'subtract',
 'sum',
 'swapaxes',
 'swaplevel',
 'tail',
 'take',
 'to_clipboard',
 'to_csv',
 'to_dense',
 'to_dict',
 'to_excel',
 'to_feather',
 'to_gbq',
 'to_hdf',
 'to_html',
 'to_json',
 'to_latex',
 'to_msgpack',
 'to_panel',
 'to_period',
 'to_pickle',
 'to_records',
 'to_sparse',
 'to_sql',
 'to_stata',
 'to_string',
 'to_timestamp',
 'to_xarray',
 'transform',
 'transpose',
 'truediv',
 'truncate',
 'tshift',
 'tz_convert',
 'tz_localize',
 'unstack',
 'update',
 'values',
 'var',
 'variety',
 'where',
 'winery',
 'xs']

In [15]:
print(wine_reviews)


       country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spain         Toro      NaN       Tinta de Toro   
...                    ...          ...      ...                 ...   
150928           Champagne    Champagne      NaN     Champagne Blend   
150929  Northeastern Italy   Alto Adige      NaN        Pinot Grigio   

                         winery  
0                         Heitz  
1       Bodega Carmen Rodríguez  
...                         ...  
150928                   Gosset  
150929            Alois Lageder  

[150930 rows x 10 columns]

In [16]:
wine_reviews.items


Out[16]:
<bound method DataFrame.iteritems of        country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spain         Toro      NaN       Tinta de Toro   
...                    ...          ...      ...                 ...   
150928           Champagne    Champagne      NaN     Champagne Blend   
150929  Northeastern Italy   Alto Adige      NaN        Pinot Grigio   

                         winery  
0                         Heitz  
1       Bodega Carmen Rodríguez  
...                         ...  
150928                   Gosset  
150929            Alois Lageder  

[150930 rows x 10 columns]>

In [17]:
answer_q4()


pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)

Exercise 5
Read data from a .xls sheet into a pandas DataFrame.


In [18]:
wic = pd.read_excel('inputs/publicassistance/xls_files_all/WICAgencies2014ytd.xls',
                    sheetname='Pregnant Women Participating')
wic.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106 entries, 0 to 105
Data columns (total 14 columns):
WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING    103 non-null object
Unnamed: 1                                               99 non-null object
Unnamed: 2                                               99 non-null object
Unnamed: 3                                               99 non-null object
Unnamed: 4                                               99 non-null object
Unnamed: 5                                               99 non-null object
Unnamed: 6                                               99 non-null object
Unnamed: 7                                               99 non-null object
Unnamed: 8                                               99 non-null object
Unnamed: 9                                               99 non-null object
Unnamed: 10                                              99 non-null object
Unnamed: 11                                              99 non-null object
Unnamed: 12                                              99 non-null object
Unnamed: 13                                              100 non-null object
dtypes: object(14)
memory usage: 11.7+ KB

In [19]:
wic.head()


Out[19]:
WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 Unnamed: 11 Unnamed: 12 Unnamed: 13
0 FISCAL YEAR 2014 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Data as of January 05, 2018 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 State Agency or Indian Tribal Organization 2013-10-01 00:00:00 2013-11-01 00:00:00 2013-12-01 00:00:00 2014-01-01 00:00:00 2014-02-01 00:00:00 2014-03-01 00:00:00 2014-04-01 00:00:00 2014-05-01 00:00:00 2014-06-01 00:00:00 2014-07-01 00:00:00 2014-08-01 00:00:00 2014-09-01 00:00:00 Average Participation
4 Connecticut 5847 5476 5274 5360 5056 5319 5500 5717 5703 5905 5754 5624 5544.58

In [20]:
answer_q5()


pd.read_excel("../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls", 
sheetname='Pregnant Women Participating')

Exercise 6
Save a DataFrame as a .csv file.


In [22]:
q6_df = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
q6_df.to_csv('cows_and_goats.csv')

In [23]:
answer_q6()


q6_df.to_csv("cows_and_goats.csv")

Exercise 7
Read SQL data into a DataFrame:


In [ ]: