Introduction and relevant resources

This is the first notebook in the Learn Pandas track.
These exercises assume some prior experience with Pandas.
Each page has a list of relevant resources that you can use for reference, and the top item in each list has been chosen specifically to help you with the exercises on that page.
The first step in most data science projects is reading in the data.
In this section, you will be using pandas to create Series and DataFrame objects, both by hand and by reading data files.
The Relevant Resources, as promised:

Setup



In [1]:

    
import pandas as pd
pd.set_option('max_rows', 5)
from learntools.advanced_pandas.creating_reading_writing import *

Checking Answers

You can check your answers in each of the exercises that follow using the check_qN function provided in the code cell above by replacing N with the number of the exercise.
For example here's how you would check an incorrect answer to exercise 1:



In [2]:

    
check_q1(pd.DataFrame())









    Out[2]:





False

A correct answer would return True.
If you capitulate, run print(answer_qN())).

Exercises

Exercise 1
Create a DataFrame:



In [3]:

    
data = {'Apples': [30], 'Bananas': [21]}
pd.DataFrame(data=data)



In [4]:

    
df2 = pd.DataFrame([(30, 21)], columns=['Apples', 'Bananas'])
df2



In [5]:

    
answer_q1()









    



pd.DataFrame({'Apples': [30], 'Bananas': [21]})

Exercise 2
Create a 2x2 DataFrame:



In [6]:

    
df2x2 = pd.DataFrame([[35, 21], [41, 34]], index=['2017 Sales', '2018 Sales'], columns=['Apples', 'Bananas'])
df2x2



In [7]:

    
answer_q2()









    



pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales']
)

Exercise 3
Create a Series:



In [8]:

    
pd.Series({'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'}, name='Dinner')









    Out[8]:





Eggs     2 large
Flour     4 cups
Milk       1 cup
Spam       1 can
Name: Dinner, dtype: object



In [9]:

    
answer_q3()









    



pd.Series(['4 cups', '1 cup', '2 large', '1 can'], 
index=['Flour', 'Milk', 'Eggs', 'Spam'], 
name='Dinner')

Exercise 4
Read data from a .csv file into a DataFrame.



In [10]:

    
wine_reviews = pd.read_csv('inputs/wine-reviews/winemag-data_first150k.csv', index_col=0)
wine_reviews.head()









    Out[10]:







  
    
      
      country
      description
      designation
      points
      price
      province
      region_1
      region_2
      variety
      winery
    
  
  
    
      0
      US
      This tremendous 100% varietal wine hails from ...
      Martha's Vineyard
      96
      235.0
      California
      Napa Valley
      Napa
      Cabernet Sauvignon
      Heitz
    
    
      1
      Spain
      Ripe aromas of fig, blackberry and cassis are ...
      Carodorum Selección Especial Reserva
      96
      110.0
      Northern Spain
      Toro
      NaN
      Tinta de Toro
      Bodega Carmen Rodríguez
    
    
      2
      US
      Mac Watson honors the memory of a wine once ma...
      Special Selected Late Harvest
      96
      90.0
      California
      Knights Valley
      Sonoma
      Sauvignon Blanc
      Macauley
    
    
      3
      US
      This spent 20 months in 30% new French oak, an...
      Reserve
      96
      65.0
      Oregon
      Willamette Valley
      Willamette Valley
      Pinot Noir
      Ponzi
    
    
      4
      France
      This is the top wine from La Bégude, named aft...
      La Brûlade
      95
      66.0
      Provence
      Bandol
      NaN
      Provence red blend
      Domaine de la Bégude



In [11]:

    
wine_reviews.tail()









    Out[11]:







  
    
      
      country
      description
      designation
      points
      price
      province
      region_1
      region_2
      variety
      winery
    
  
  
    
      150925
      Italy
      Many people feel Fiano represents southern Ita...
      NaN
      91
      20.0
      Southern Italy
      Fiano di Avellino
      NaN
      White Blend
      Feudi di San Gregorio
    
    
      150926
      France
      Offers an intriguing nose with ginger, lime an...
      Cuvée Prestige
      91
      27.0
      Champagne
      Champagne
      NaN
      Champagne Blend
      H.Germain
    
    
      150927
      Italy
      This classic example comes from a cru vineyard...
      Terre di Dora
      91
      20.0
      Southern Italy
      Fiano di Avellino
      NaN
      White Blend
      Terredora
    
    
      150928
      France
      A perfect salmon shade, with scents of peaches...
      Grand Brut Rosé
      90
      52.0
      Champagne
      Champagne
      NaN
      Champagne Blend
      Gosset
    
    
      150929
      Italy
      More Pinot Grigios should taste like this. A r...
      NaN
      90
      15.0
      Northeastern Italy
      Alto Adige
      NaN
      Pinot Grigio
      Alois Lageder



In [12]:

    
wine_reviews.shape









    Out[12]:





(150930, 10)



In [13]:

    
wine_reviews.info()









    



<class 'pandas.core.frame.DataFrame'>
Int64Index: 150930 entries, 0 to 150929
Data columns (total 10 columns):
country        150925 non-null object
description    150930 non-null object
designation    105195 non-null object
points         150930 non-null int64
price          137235 non-null float64
province       150925 non-null object
region_1       125870 non-null object
region_2       60953 non-null object
variety        150930 non-null object
winery         150930 non-null object
dtypes: float64(1), int64(1), object(8)
memory usage: 12.7+ MB



In [14]:

    
dir(wine_reviews)









    Out[14]:





['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__unicode__',
 '__weakref__',
 '__xor__',
 '_accessors',
 '_add_numeric_operations',
 '_add_series_only_operations',
 '_add_series_or_dataframe_operations',
 '_agg_by_level',
 '_agg_doc',
 '_aggregate',
 '_aggregate_multiple_funcs',
 '_align_frame',
 '_align_series',
 '_apply_broadcast',
 '_apply_empty_result',
 '_apply_raw',
 '_apply_standard',
 '_at',
 '_box_col_values',
 '_box_item_values',
 '_builtin_table',
 '_check_inplace_setting',
 '_check_is_chained_assignment_possible',
 '_check_percentile',
 '_check_setitem_copy',
 '_clear_item_cache',
 '_clip_with_scalar',
 '_combine_const',
 '_combine_frame',
 '_combine_match_columns',
 '_combine_match_index',
 '_combine_series',
 '_combine_series_infer',
 '_compare_frame',
 '_compare_frame_evaluate',
 '_consolidate',
 '_consolidate_inplace',
 '_construct_axes_dict',
 '_construct_axes_dict_for_slice',
 '_construct_axes_dict_from',
 '_construct_axes_from_arguments',
 '_constructor',
 '_constructor_expanddim',
 '_constructor_sliced',
 '_convert',
 '_count_level',
 '_create_indexer',
 '_cython_table',
 '_dir_additions',
 '_dir_deletions',
 '_ensure_valid_index',
 '_expand_axes',
 '_flex_compare_frame',
 '_from_arrays',
 '_from_axes',
 '_get_agg_axis',
 '_get_axis',
 '_get_axis_name',
 '_get_axis_number',
 '_get_axis_resolvers',
 '_get_block_manager_axis',
 '_get_bool_data',
 '_get_cacher',
 '_get_index_resolvers',
 '_get_item_cache',
 '_get_numeric_data',
 '_get_values',
 '_getitem_array',
 '_getitem_column',
 '_getitem_frame',
 '_getitem_multilevel',
 '_getitem_slice',
 '_gotitem',
 '_iat',
 '_iget_item_cache',
 '_iloc',
 '_indexed_same',
 '_info_axis',
 '_info_axis_name',
 '_info_axis_number',
 '_info_repr',
 '_init_dict',
 '_init_mgr',
 '_init_ndarray',
 '_internal_names',
 '_internal_names_set',
 '_is_builtin_func',
 '_is_cached',
 '_is_cython_func',
 '_is_datelike_mixed_type',
 '_is_mixed_type',
 '_is_numeric_mixed_type',
 '_is_view',
 '_ix',
 '_ixs',
 '_join_compat',
 '_loc',
 '_maybe_cache_changed',
 '_maybe_update_cacher',
 '_metadata',
 '_needs_reindex_multi',
 '_obj_with_exclusions',
 '_protect_consolidate',
 '_reduce',
 '_reindex_axes',
 '_reindex_axis',
 '_reindex_columns',
 '_reindex_index',
 '_reindex_multi',
 '_reindex_with_indexers',
 '_repr_data_resource_',
 '_repr_fits_horizontal_',
 '_repr_fits_vertical_',
 '_repr_html_',
 '_repr_latex_',
 '_reset_cache',
 '_reset_cacher',
 '_sanitize_column',
 '_selected_obj',
 '_selection',
 '_selection_list',
 '_selection_name',
 '_series',
 '_set_as_cached',
 '_set_axis',
 '_set_axis_name',
 '_set_is_copy',
 '_set_item',
 '_setitem_array',
 '_setitem_frame',
 '_setitem_slice',
 '_setup_axes',
 '_shallow_copy',
 '_slice',
 '_stat_axis',
 '_stat_axis_name',
 '_stat_axis_number',
 '_try_aggregate_string_function',
 '_typ',
 '_unpickle_frame_compat',
 '_unpickle_matrix_compat',
 '_update_inplace',
 '_validate_dtype',
 '_values',
 '_where',
 '_xs',
 'abs',
 'add',
 'add_prefix',
 'add_suffix',
 'agg',
 'aggregate',
 'align',
 'all',
 'any',
 'append',
 'apply',
 'applymap',
 'as_blocks',
 'as_matrix',
 'asfreq',
 'asof',
 'assign',
 'astype',
 'at',
 'at_time',
 'axes',
 'between_time',
 'bfill',
 'blocks',
 'bool',
 'boxplot',
 'clip',
 'clip_lower',
 'clip_upper',
 'columns',
 'combine',
 'combine_first',
 'compound',
 'consolidate',
 'convert_objects',
 'copy',
 'corr',
 'corrwith',
 'count',
 'country',
 'cov',
 'cummax',
 'cummin',
 'cumprod',
 'cumsum',
 'describe',
 'description',
 'designation',
 'diff',
 'div',
 'divide',
 'dot',
 'drop',
 'drop_duplicates',
 'dropna',
 'dtypes',
 'duplicated',
 'empty',
 'eq',
 'equals',
 'eval',
 'ewm',
 'expanding',
 'ffill',
 'fillna',
 'filter',
 'first',
 'first_valid_index',
 'floordiv',
 'from_csv',
 'from_dict',
 'from_items',
 'from_records',
 'ftypes',
 'ge',
 'get',
 'get_dtype_counts',
 'get_ftype_counts',
 'get_value',
 'get_values',
 'groupby',
 'gt',
 'head',
 'hist',
 'iat',
 'idxmax',
 'idxmin',
 'iloc',
 'index',
 'info',
 'insert',
 'interpolate',
 'is_copy',
 'isin',
 'isnull',
 'items',
 'iteritems',
 'iterrows',
 'itertuples',
 'ix',
 'join',
 'keys',
 'kurt',
 'kurtosis',
 'last',
 'last_valid_index',
 'le',
 'loc',
 'lookup',
 'lt',
 'mad',
 'mask',
 'max',
 'mean',
 'median',
 'melt',
 'memory_usage',
 'merge',
 'min',
 'mod',
 'mode',
 'mul',
 'multiply',
 'ndim',
 'ne',
 'nlargest',
 'notnull',
 'nsmallest',
 'nunique',
 'pct_change',
 'pipe',
 'pivot',
 'pivot_table',
 'plot',
 'points',
 'pop',
 'pow',
 'price',
 'prod',
 'product',
 'province',
 'quantile',
 'query',
 'radd',
 'rank',
 'rdiv',
 'region_1',
 'region_2',
 'reindex',
 'reindex_axis',
 'reindex_like',
 'rename',
 'rename_axis',
 'reorder_levels',
 'replace',
 'resample',
 'reset_index',
 'rfloordiv',
 'rmod',
 'rmul',
 'rolling',
 'round',
 'rpow',
 'rsub',
 'rtruediv',
 'sample',
 'select',
 'select_dtypes',
 'sem',
 'set_axis',
 'set_index',
 'set_value',
 'shape',
 'shift',
 'size',
 'skew',
 'slice_shift',
 'sort_index',
 'sort_values',
 'sortlevel',
 'squeeze',
 'stack',
 'std',
 'style',
 'sub',
 'subtract',
 'sum',
 'swapaxes',
 'swaplevel',
 'tail',
 'take',
 'to_clipboard',
 'to_csv',
 'to_dense',
 'to_dict',
 'to_excel',
 'to_feather',
 'to_gbq',
 'to_hdf',
 'to_html',
 'to_json',
 'to_latex',
 'to_msgpack',
 'to_panel',
 'to_period',
 'to_pickle',
 'to_records',
 'to_sparse',
 'to_sql',
 'to_stata',
 'to_string',
 'to_timestamp',
 'to_xarray',
 'transform',
 'transpose',
 'truediv',
 'truncate',
 'tshift',
 'tz_convert',
 'tz_localize',
 'unstack',
 'update',
 'values',
 'var',
 'variety',
 'where',
 'winery',
 'xs']



In [15]:

    
print(wine_reviews)









    



       country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spain         Toro      NaN       Tinta de Toro   
...                    ...          ...      ...                 ...   
150928           Champagne    Champagne      NaN     Champagne Blend   
150929  Northeastern Italy   Alto Adige      NaN        Pinot Grigio   

                         winery  
0                         Heitz  
1       Bodega Carmen Rodríguez  
...                         ...  
150928                   Gosset  
150929            Alois Lageder  

[150930 rows x 10 columns]



In [16]:

    
wine_reviews.items









    Out[16]:





<bound method DataFrame.iteritems of        country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spain         Toro      NaN       Tinta de Toro   
...                    ...          ...      ...                 ...   
150928           Champagne    Champagne      NaN     Champagne Blend   
150929  Northeastern Italy   Alto Adige      NaN        Pinot Grigio   

                         winery  
0                         Heitz  
1       Bodega Carmen Rodríguez  
...                         ...  
150928                   Gosset  
150929            Alois Lageder  

[150930 rows x 10 columns]>



In [17]:

    
answer_q4()









    



pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)

Exercise 5
Read data from a .xls sheet into a pandas DataFrame.



In [18]:

    
wic = pd.read_excel('inputs/publicassistance/xls_files_all/WICAgencies2014ytd.xls',
                    sheetname='Pregnant Women Participating')
wic.info()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106 entries, 0 to 105
Data columns (total 14 columns):
WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING    103 non-null object
Unnamed: 1                                               99 non-null object
Unnamed: 2                                               99 non-null object
Unnamed: 3                                               99 non-null object
Unnamed: 4                                               99 non-null object
Unnamed: 5                                               99 non-null object
Unnamed: 6                                               99 non-null object
Unnamed: 7                                               99 non-null object
Unnamed: 8                                               99 non-null object
Unnamed: 9                                               99 non-null object
Unnamed: 10                                              99 non-null object
Unnamed: 11                                              99 non-null object
Unnamed: 12                                              99 non-null object
Unnamed: 13                                              100 non-null object
dtypes: object(14)
memory usage: 11.7+ KB



In [19]:

    
wic.head()









    Out[19]:







  
    
      
      WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING
      Unnamed: 1
      Unnamed: 2
      Unnamed: 3
      Unnamed: 4
      Unnamed: 5
      Unnamed: 6
      Unnamed: 7
      Unnamed: 8
      Unnamed: 9
      Unnamed: 10
      Unnamed: 11
      Unnamed: 12
      Unnamed: 13
    
  
  
    
      0
      FISCAL YEAR 2014
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      Data as of January 05, 2018
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      2
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      State Agency or Indian Tribal Organization
      2013-10-01 00:00:00
      2013-11-01 00:00:00
      2013-12-01 00:00:00
      2014-01-01 00:00:00
      2014-02-01 00:00:00
      2014-03-01 00:00:00
      2014-04-01 00:00:00
      2014-05-01 00:00:00
      2014-06-01 00:00:00
      2014-07-01 00:00:00
      2014-08-01 00:00:00
      2014-09-01 00:00:00
      Average Participation
    
    
      4
      Connecticut
      5847
      5476
      5274
      5360
      5056
      5319
      5500
      5717
      5703
      5905
      5754
      5624
      5544.58



In [20]:

    
answer_q5()









    



pd.read_excel("../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls", 
sheetname='Pregnant Women Participating')

Exercise 6
Save a DataFrame as a .csv file.



In [22]:

    
q6_df = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
q6_df.to_csv('cows_and_goats.csv')



In [23]:

    
answer_q6()









    



q6_df.to_csv("cows_and_goats.csv")

Exercise 7
Read SQL data into a DataFrame:



In [ ]:

	country	description	designation	points	price	province	region_1	region_2	variety	winery
0	US	This tremendous 100% varietal wine hails from ...	Martha's Vineyard	96	235.0	California	Napa Valley	Napa	Cabernet Sauvignon	Heitz
1	Spain	Ripe aromas of fig, blackberry and cassis are ...	Carodorum Selección Especial Reserva	96	110.0	Northern Spain	Toro	NaN	Tinta de Toro	Bodega Carmen Rodríguez
2	US	Mac Watson honors the memory of a wine once ma...	Special Selected Late Harvest	96	90.0	California	Knights Valley	Sonoma	Sauvignon Blanc	Macauley
3	US	This spent 20 months in 30% new French oak, an...	Reserve	96	65.0	Oregon	Willamette Valley	Willamette Valley	Pinot Noir	Ponzi
4	France	This is the top wine from La Bégude, named aft...	La Brûlade	95	66.0	Provence	Bandol	NaN	Provence red blend	Domaine de la Bégude

	country	description	designation	points	price	province	region_1	region_2	variety	winery
150925	Italy	Many people feel Fiano represents southern Ita...	NaN	91	20.0	Southern Italy	Fiano di Avellino	NaN	White Blend	Feudi di San Gregorio
150926	France	Offers an intriguing nose with ginger, lime an...	Cuvée Prestige	91	27.0	Champagne	Champagne	NaN	Champagne Blend	H.Germain
150927	Italy	This classic example comes from a cru vineyard...	Terre di Dora	91	20.0	Southern Italy	Fiano di Avellino	NaN	White Blend	Terredora
150928	France	A perfect salmon shade, with scents of peaches...	Grand Brut Rosé	90	52.0	Champagne	Champagne	NaN	Champagne Blend	Gosset
150929	Italy	More Pinot Grigios should taste like this. A r...	NaN	90	15.0	Northeastern Italy	Alto Adige	NaN	Pinot Grigio	Alois Lageder

	WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING	Unnamed: 1	Unnamed: 2	Unnamed: 3	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Unnamed: 9	Unnamed: 10	Unnamed: 11	Unnamed: 12	Unnamed: 13
0	FISCAL YEAR 2014	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Data as of January 05, 2018	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	State Agency or Indian Tribal Organization	2013-10-01 00:00:00	2013-11-01 00:00:00	2013-12-01 00:00:00	2014-01-01 00:00:00	2014-02-01 00:00:00	2014-03-01 00:00:00	2014-04-01 00:00:00	2014-05-01 00:00:00	2014-06-01 00:00:00	2014-07-01 00:00:00	2014-08-01 00:00:00	2014-09-01 00:00:00	Average Participation
4	Connecticut	5847	5476	5274	5360	5056	5319	5500	5717	5703	5905	5754	5624	5544.58

Creating, reading, and writing workbook

Introduction and relevant resources

Setup

Checking Answers

Exercises