https://www.kaggle.com/residentmario/creating-reading-and-writing
In [0]:
import pandas as pd
Creating data
There are two core objects in pandas: the DataFrame and the Series.
DataFrame
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.
For example, consider the following simple DataFrame:
In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
Out[2]:
In this example, the "0, No" entry has the value of 131. The "0, Yes" entry has a value of 50, and so on.
DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:
In [3]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})
Out[3]:
The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.
The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an index parameter in our constructor:
In [4]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])
Out[4]:
Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:
In [5]:
pd.Series([1, 2, 3, 4, 5])
Out[5]:
A Series is, in essence, a single column of a DataFrame. So you can assign column values to the Series the same way as before, using an index parameter. However, a Series does not have a column name, it only has one overall name:
In [6]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')
Out[6]:
The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". We'll see more of this in the next section of this tutorial.
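To make the "glued together" picture concrete, here is a minimal sketch (not part of the original notebook): two Series sharing an index can be assembled into a DataFrame, and pulling a column back out returns a Series again.
```python
import pandas as pd

# Two Series with the same index, reusing the toy review data from above.
bob = pd.Series(['I liked it.', 'It was awful.'],
                index=['Product A', 'Product B'], name='Bob')
sue = pd.Series(['Pretty good.', 'Bland.'],
                index=['Product A', 'Product B'], name='Sue')

# "Glue" them together into a DataFrame...
df = pd.DataFrame({'Bob': bob, 'Sue': sue})

# ...and extracting a single column gives us a Series back.
type(df['Bob'])  # <class 'pandas.core.series.Series'>
```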
Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.
Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:
Product A,Product B,Product C,
30,21,9,
35,34,1,
41,11,11
So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.
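As a quick round-trip sketch (the file name toy.csv is just an illustration, not part of the original exercise), we can write that small table to disk by hand and read it back with pd.read_csv():
```python
import pandas as pd

# Write the toy CSV shown above to a file (trailing commas dropped)...
csv_text = "Product A,Product B,Product C\n30,21,9\n35,34,1\n41,11,11\n"
with open("toy.csv", "w") as f:
    f.write(csv_text)

# ...and read it back into a DataFrame.
pd.read_csv("toy.csv")
```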
Let's now set aside our toy datasets and see what a real dataset looks like when we read it into a DataFrame. We'll use the pd.read_csv() function to read the data into a DataFrame. This goes thusly:
In [0]:
# wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")
To download the data:
https://www.kaggle.com/luisrivera/exercise-creating-reading-and-writing/edit
To upload the data to Google Colaboratory: D:\kaggle\Cursos\Panda
In [0]:
wine_reviews = pd.read_csv("winemag-data-130k-v2.csv")
We can use the shape attribute to check how large the resulting DataFrame is:
In [9]:
wine_reviews.shape
Out[9]:
So our new DataFrame has 130,000 records split across 14 different columns. That's almost 2 million entries!
We can examine the contents of the resultant DataFrame using the head() command, which grabs the first five rows:
In [10]:
wine_reviews.head()
Out[10]:
The pd.read_csv() function is well-endowed, with over 30 optional parameters you can specify. For example, you can see in this dataset that the CSV file has a built-in index, which pandas did not pick up on automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an index_col.
In [0]:
# wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
# wine_reviews.head()
In [0]:
wine_reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
Out[0]:
To practice directly on Kaggle:
https://www.kaggle.com/luisrivera/exercise-creating-reading-and-writing/edit
In [0]:
import pandas as pd
pd.set_option('display.max_rows', 5)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.creating_reading_and_writing import *
# print("Setup complete.")
In [12]:
# Your code goes here. Create a dataframe matching the above diagram and assign it to the variable fruits.
fruits = pd.DataFrame({'Apples': [30],
                       'Bananas': [21]})
#q1.check()
fruits
Out[12]:
In [14]:
fruits = pd.DataFrame([[30, 21]], columns=['Apples', 'Bananas'])
fruits
Out[14]:
In [15]:
fruit_sales = pd.DataFrame({'Apples': [35, 41],
                            'Bananas': [21, 34]},
                           index=['2017 Sales', '2018 Sales'])
fruit_sales
Out[15]:
In [16]:
quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
recipe = pd.Series(quantities, index=items, name='Dinner')
recipe
Out[16]:
Read the following csv dataset of wine reviews into a DataFrame called reviews:
The filepath to the csv file is ../input/wine-reviews/winemag-data_first150k.csv. The first few lines look like:
,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,"This tremendous 100% varietal wine[...]",Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and[...]",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
In [18]:
#reviews = pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)
reviews = pd.read_csv("winemag-data_first150k.csv", index_col=0)
reviews
Out[18]:
In [19]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]},
                       index=['Year 1', 'Year 2'])
animals
Out[19]:
In the cell below, write code to save this DataFrame to disk as a csv file with the name cows_and_goats.csv.
In [0]:
animals.to_csv("cows_and_goats.csv")
https://www.kaggle.com/residentmario/indexing-selecting-assigning
Native Python objects provide good ways of indexing data. Pandas carries all of these over, which helps make it easy to start with.
Consider this DataFrame:
In [21]:
reviews
Out[21]:
In Python, we can access the property of an object by accessing it as an attribute. A book object, for example, might have a title property, which we can access by calling book.title. Columns in a pandas DataFrame work in much the same way.
Hence to access the country property of reviews we can use:
In [22]:
reviews.country
Out[22]:
If we have a Python dictionary, we can access its values using the indexing ([]) operator. We can do the same with columns in a DataFrame:
In [23]:
reviews['country']
Out[23]:
These are the two ways of selecting a specific Series out of a DataFrame. Neither of them is more or less syntactically valid than the other, but the indexing operator [] does have the advantage that it can handle column names with reserved characters in them (e.g. if we had a country providence column, reviews.country providence wouldn't work).
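As a toy illustration (the column name here is hypothetical, not part of the wine dataset), attribute access breaks down as soon as the column name contains a space, while the indexing operator still works:
```python
import pandas as pd

df = pd.DataFrame({'country providence': ['US', 'Spain']})

df['country providence']   # works fine
# df.country providence    # SyntaxError: not valid Python
```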
Doesn't a pandas Series look kind of like a fancy dictionary? It pretty much is, so it's no surprise that, to drill down to a single specific value, we need only use the indexing operator [] once more:
In [24]:
reviews['country'][0]
Out[24]:
The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, loc and iloc. For more advanced operations, these are the ones you're supposed to be using.
Index-based selection Pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data. iloc follows this paradigm.
To select the first row of data in a DataFrame, we may use the following:
In [0]:
reviews.iloc[0]
Out[0]:
Both loc and iloc are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.
This means that it's marginally easier to retrieve rows, and marginally harder to retrieve columns. To get a column with iloc, we can do the following:
In [0]:
reviews.iloc[:, 0]
Out[0]:
On its own, the : operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the country column from just the first, second, and third row, we would do:
In [0]:
reviews.iloc[:3, 0]
Out[0]:
Or, to select just the second and third entries, we would do:
In [0]:
reviews.iloc[1:3, 0]
Out[0]:
It's also possible to pass a list:
In [0]:
reviews.iloc[[0, 1, 2], 0]
Out[0]:
Finally, it's worth knowing that negative numbers can be used in selection. This will start counting backwards from the end of the values. So, for example, here are the last five elements of the dataset.
In [0]:
reviews.iloc[-5:]
Out[0]:
Label-based selection The second paradigm for attribute selection is the one followed by the loc operator: label-based selection. In this paradigm, it's the data index value, not its position, which matters.
For example, to get the first entry in reviews, we would now do the following:
In [0]:
reviews.loc[0, 'country']
Out[0]:
iloc is conceptually simpler than loc because it ignores the dataset's indices. When we use iloc we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. loc, by contrast, uses the information in the indices to do its work. Since your dataset usually has meaningful indices, it's usually easier to do things using loc instead. For example, here's one operation that's much easier using loc:
In [0]:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
Out[0]:
Choosing between loc and iloc When choosing or transitioning between loc and iloc, there is one "gotcha" worth keeping in mind, which is that the two methods use slightly different indexing schemes.
iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9. loc, meanwhile, indexes inclusively. So 0:10 will select entries 0,...,10.
Why the change? Remember that loc can index any stdlib type: strings, for example. If we have a DataFrame with index values Apples, ..., Potatoes, ..., and we want to select "all the alphabetical fruit choices between Apples and Potatoes", then it's a lot more convenient to index df.loc['Apples':'Potatoes'] than it is to index something like df.loc['Apples':'Potatoet'] (t coming after s in the alphabet).
This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
Otherwise, the semantics of using loc are the same as those for iloc.
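As a small sketch of the gotcha above (assuming reviews still carries its default 0, 1, 2, ... integer index), the two accessors give differently sized results for the same-looking range:
```python
# iloc slices exclusively: positions 0 through 999.
len(reviews.iloc[0:1000])   # 1000 rows

# loc slices inclusively on labels, so the same range gives one extra row.
len(reviews.loc[0:1000])    # 1001 rows

# To get exactly 1000 rows with loc, stop one label earlier.
len(reviews.loc[0:999])     # 1000 rows
```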
In [0]:
# reviews.set_index("title") # da error, no existe esa columna
In [0]:
reviews.set_index("variety")
Out[0]:
This is useful if you can come up with an index for the dataset which is better than the current one.
So far we've been indexing various strides of data, using structural properties of the DataFrame itself. To do interesting things with the data, however, we often need to ask questions based on conditions.
For example, suppose that we're interested specifically in better-than-average wines produced in Italy.
We can start by checking if each wine is Italian or not:
In [0]:
reviews.country == 'Italy'
Out[0]:
This operation produced a Series of True/False booleans based on the country of each record. This result can then be used inside of loc to select the relevant data:
In [0]:
reviews.loc[reviews.country == 'Italy']
Out[0]:
This DataFrame has ~20,000 rows. The original had ~130,000. That means that around 15% of wines originate from Italy.
We also wanted to know which ones are better than average. Wines are reviewed on an 80-to-100 point scale, so this could mean wines that accrued at least 90 points.
We can use the ampersand (&) to bring the two questions together:
In [0]:
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
Out[0]:
Suppose we'll buy any wine that's made in Italy or which is rated above average. For this we use a pipe (|):
In [0]:
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]
Out[0]:
Pandas comes with a few built-in conditional selectors, two of which we will highlight here.
The first is isin. isin lets you select data whose value "is in" a list of values. For example, here's how we can use it to select wines only from Italy or France:
In [0]:
reviews.loc[reviews.country.isin(['Italy', 'France'])]
Out[0]:
The second is isnull (and its companion notnull). These methods let you highlight values which are (or are not) empty (NaN). For example, to filter out wines lacking a price tag in the dataset, here's what we would do:
In [0]:
reviews.loc[reviews.price.notnull()]
Out[0]:
Assigning data
Going the other way, assigning data to a DataFrame is easy. You can assign either a constant value:
In [0]:
reviews['critic'] = 'everyone'
reviews['critic']
Out[0]:
Or with an iterable of values:
In [0]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
Out[0]:
Run the following cell to load your data and some utility functions (including code to check your answers).
In [0]:
import pandas as pd
# reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.indexing_selecting_and_assigning import *
# print("Setup complete.")
Look at an overview of your data by running the following line.
In [0]:
reviews.head()
Out[0]:
In [0]:
desc = reviews['description']
In [0]:
desc = reviews.description
#or
#desc = reviews["description"]
# desc is a pandas Series object, with an index matching the reviews DataFrame. In general, when we select a single column from a DataFrame, we'll get a Series.
desc.head(10)
Out[0]:
In [0]:
first_description = reviews["description"][0]
# q2.check()
first_description
Out[0]:
In [0]:
# Solution:
first_description = reviews.description.iloc[0]
# Note that while this is the preferred way to obtain the entry in the DataFrame, many other options will return a valid result,
# such as reviews.description.loc[0], reviews.description[0], and more!
first_description
Out[0]:
In [0]:
first_row = reviews.iloc[0]
# q3.check()
first_row
Out[0]:
In [0]:
# Solution:
first_row = reviews.iloc[0]
In [0]:
first_descriptions = reviews.iloc[:10, 1]
# first_descriptions = reviews.description.iloc[0:10]
# first_descriptions = reviews.loc[0:9, 'description']
# q4.check()
first_descriptions
Out[0]:
In [0]:
# Solution:
first_descriptions = reviews.description.iloc[:10]
# Note that many other options will return a valid result, such as desc.head(10) and reviews.loc[:9, "description"].
first_descriptions
Out[0]:
In [0]:
sample_reviews = reviews.iloc[[1, 2, 3, 5, 8]]
# q5.check()
sample_reviews
Out[0]:
In [0]:
# Solution:
indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]
sample_reviews
Out[0]:
In [0]:
df = reviews.loc[[0,1,10,100],['country', 'province', 'region_1', 'region_2']]
# q6.check()
df
Out[0]:
In [0]:
# Solution:
cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]
df
Out[0]:
Create a variable df containing the country and variety columns of the first 100 records.
Hint: you may use loc or iloc. When answering this question and several of the ones that follow, keep in mind the following "gotcha" described in the tutorial:
iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. loc, meanwhile, indexes inclusively.
This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
In [0]:
df = reviews.loc[0:99,['country', 'variety']]
# q7.check()
df
Out[0]:
In [0]:
# Solution:
cols = ['country', 'variety']
df = reviews.loc[:99, cols]
# or
# cols_idx = [0, 11]
# df = reviews.iloc[:100, cols_idx]
df
Out[0]:
In [0]:
italian_wines = reviews.loc[reviews.country == 'Italy']
# q8.check()
italian_wines
Out[0]:
In [0]:
# Solution:
italian_wines = reviews[reviews.country == 'Italy']
italian_wines
Out[0]:
In [0]:
top_oceania_wines = reviews.loc[reviews.country.isin(['Australia', 'New Zealand'])
& (reviews.points >= 95)]
# q9.check()
top_oceania_wines
Out[0]:
In [0]:
# Solution:
top_oceania_wines = reviews.loc[
(reviews.country.isin(['Australia', 'New Zealand']))
& (reviews.points >= 95)
]
https://www.kaggle.com/residentmario/summary-functions-and-maps
In [0]:
reviews
Out[0]:
Summary functions
Pandas provides many simple "summary functions" (not an official name) which restructure the data in some useful way. For example, consider the describe() method:
In [0]:
reviews.points.describe()
Out[0]:
This method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input. The output above only makes sense for numerical data; for string data here's what we get:
In [0]:
reviews.taster_name.describe()
Out[0]:
If you want to get some particular simple summary statistic about a column in a DataFrame or a Series, there is usually a helpful pandas function that makes it happen.
For example, to see the mean of the points allotted (e.g. how well an averagely rated wine does), we can use the mean() function:
In [0]:
reviews.points.mean()
Out[0]:
To see a list of unique values we can use the unique() function:
In [0]:
# reviews.taster_name.unique()  # takes a long time??
To see a list of unique values and how often they occur in the dataset, we can use the value_counts() method:
In [0]:
reviews.taster_name.value_counts()
Out[0]:
A map is a term, borrowed from mathematics, for a function that takes one set of values and "maps" them to another set of values. In data science we often have a need for creating new representations from existing data, or for transforming data from the format it is in now to the format that we want it to be in later. Maps are what handle this work, making them extremely important for getting your work done!
There are two mapping methods that you will use often.
map() is the first, and slightly simpler one. For example, suppose that we wanted to remean the scores the wines received to 0. We can do this as follows:
In [0]:
review_points_mean = reviews.points.mean()
reviews.points.map(lambda p: p - review_points_mean)
Out[0]:
The function you pass to map() should expect a single value from the Series (a point value, in the above example), and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.
apply() is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row.
In [0]:
def remean_points(row):
    row.points = row.points - review_points_mean
    return row

reviews.apply(remean_points, axis='columns')
Out[0]:
If we had called reviews.apply() with axis='index', then instead of passing a function to transform each row, we would need to give a function to transform each column.
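As a minimal sketch (not from the original notebook) of that column-wise variant: with axis='index' the function receives one whole column at a time, so it has to decide per column what to do.
```python
def remean_points_column(col):
    # Shift only the numeric 'points' column; pass every other column through unchanged.
    if col.name == 'points':
        return col - review_points_mean
    return col

reviews.apply(remean_points_column, axis='index')
```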
Note that map() and apply() return new, transformed Series and DataFrames, respectively. They don't modify the original data they're called on. If we look at the first row of reviews, we can see that it still has its original points value.
In [0]:
reviews.head(1)
Out[0]:
Pandas provides many common mapping operations as built-ins. For example, here's a faster way of remeaning our points column:
In [0]:
review_points_mean = reviews.points.mean()
reviews.points - review_points_mean
Out[0]:
In this code we are performing an operation between a lot of values on the left-hand side (everything in the Series) and a single value on the right-hand side (the mean value). Pandas looks at this expression and figures out that we must mean to subtract that mean value from every value in the dataset.
Pandas will also understand what to do if we perform these operations between Series of equal length. For example, an easy way of combining country and region information in the dataset would be to do the following:
In [0]:
reviews.country + " - " + reviews.region_1
Out[0]:
These operators are faster than map() or apply() because they use speed-ups built into pandas. All of the standard Python operators (>, <, ==, and so on) work in this manner.
However, they are not as flexible as map() or apply(), which can do more advanced things, like applying conditional logic, which cannot be done with addition and subtraction alone.
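As a small sketch (not from the original notebook) of the kind of conditional logic that plain operators can't express but map() can: labelling each wine by its score band.
```python
# Arithmetic operators can subtract a mean, but they can't branch; map() with a lambda can.
reviews.points.map(lambda p: 'good' if p >= 90 else 'ordinary')
```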
https://www.kaggle.com/residentmario/grouping-and-sorting
One function we've been using heavily thus far is the value_counts() function. We can replicate what value_counts() does by doing the following:
In [0]:
reviews.groupby('points').points.count()
Out[0]:
In [0]:
reviews.groupby('points').price.min()
Out[0]:
In [0]:
reviews.groupby('winery').apply(lambda df: df.title.iloc[0])
Out[0]:
In [0]:
reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])
Out[0]:
In [0]:
reviews.groupby(['country']).price.agg([len, min, max])
Out[0]:
In all of the examples we've seen thus far we've been working with DataFrame or Series objects with a single-label index. groupby() is slightly different in the fact that, depending on the operation we run, it will sometimes result in what is called a multi-index.
A multi-index differs from a regular index in that it has multiple levels. For example:
In [0]:
countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])
countries_reviewed
Out[0]:
In [0]:
mi = countries_reviewed.index
type(mi)
Out[0]:
In [0]:
countries_reviewed.reset_index()
Out[0]:
Looking again at countries_reviewed we can see that grouping returns data in index order, not in value order. That is to say, when outputting the result of a groupby, the order of the rows is dependent on the values in the index, not in the data.
To get data in the order we want it in, we can sort it ourselves. The sort_values() method is handy for this.
In [0]:
countries_reviewed = countries_reviewed.reset_index()
countries_reviewed.sort_values(by='len')
Out[0]:
In [0]:
countries_reviewed.sort_values(by='len', ascending=False)
Out[0]:
In [0]:
countries_reviewed.sort_index()
Out[0]:
In [0]:
countries_reviewed.sort_values(by=['country', 'len'])
Out[0]:
https://www.kaggle.com/luisrivera/exercise-grouping-and-sorting/edit
In [0]:
import pandas as pd
# reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
#pd.set_option("display.max_rows", 5)
reviews = pd.read_csv("./winemag-data-130k-v2.csv", index_col=0)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.grouping_and_sorting import *
# print("Setup complete.")
In [0]:
reviews_written = reviews.groupby('taster_twitter_handle').size()
#or
reviews_written
Out[0]:
In [0]:
reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()
reviews_written
Out[0]:
What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values are the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).
In [0]:
best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()
best_rating_per_price
Out[0]:
In [0]:
price_extremes = reviews.groupby('variety').price.agg([min, max])
price_extremes
Out[0]:
In [0]:
sorted_varieties = price_extremes.sort_values(by=['min', 'max'], ascending=False)
sorted_varieties
Out[0]:
In [0]:
reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()
reviewer_mean_ratings
Out[0]:
In [0]:
reviewer_mean_ratings.describe()
Out[0]:
In [0]:
country_variety_counts = reviews.groupby(['country', 'variety']).size().sort_values(ascending=False)
country_variety_counts
Out[0]:
https://www.kaggle.com/residentmario/data-types-and-missing-values
In [0]:
reviews.price.dtype
Out[0]:
Alternatively, the dtypes property returns the dtype of every column in the DataFrame:
In [0]:
reviews.dtypes
Out[0]:
Data types tell us something about how pandas is storing the data internally. float64 means that it's using a 64-bit floating point number; int64 means a similarly sized integer instead, and so on.
One peculiarity to keep in mind (and on display very clearly here) is that columns consisting entirely of strings do not get their own type; they are instead given the object type.
It's possible to convert a column of one type into another wherever such a conversion makes sense by using the astype() function. For example, we may transform the points column from its existing int64 data type into a float64 data type:
In [0]:
reviews.points.astype('float64')
Out[0]:
A DataFrame or Series index has its own dtype, too:
In [0]:
reviews.index.dtype
Out[0]:
Pandas also supports more exotic data types, such as categorical data and timeseries data. Because these data types are more rarely used, we will omit them until a much later section of this tutorial.
Entries missing values are given the value NaN, short for "Not a Number". For technical reasons these NaN values are always of the float64 dtype.
Pandas provides some methods specific to missing data. To select NaN entries you can use pd.isnull() (or its companion pd.notnull()). This is meant to be used thusly:
In [0]:
reviews[pd.isnull(reviews.country)]
Out[0]:
Replacing missing values is a common operation. Pandas provides a really handy method for this problem: fillna(). fillna() provides a few different strategies for mitigating such data. For example, we can simply replace each NaN with an "Unknown":
In [0]:
reviews.region_2.fillna("Unknown")
Out[0]:
Or we could fill each missing value with the first non-null value that appears sometime after the given record in the database. This is known as the backfill strategy.
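A minimal sketch of that backfill strategy (not shown in the original notebook): bfill() fills each missing value with the next non-null value that appears later in the column, and is equivalent to fillna(method='bfill').
```python
# Fill missing region_2 values with the next non-null value below them.
reviews.region_2.bfill()
```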
Alternatively, we may have a non-null value that we would like to replace. For example, suppose that since this dataset was published, reviewer Kerin O'Keefe has changed her Twitter handle from @kerinokeefe to @kerino. One way to reflect this in the dataset is using the replace() method:
In [0]:
reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")
Out[0]:
The replace() method is worth mentioning here because it's handy for replacing missing data which is given some kind of sentinel value in the dataset: things like "Unknown", "Undisclosed", "Invalid", and so on.
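As a hypothetical illustration (toy data, not from the wine dataset), replace() can turn such sentinel strings back into real missing values:
```python
import numpy as np
import pandas as pd

prices = pd.Series(['15.0', 'Undisclosed', '22.0'])
prices.replace('Undisclosed', np.nan)   # 'Undisclosed' becomes NaN
```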
https://www.kaggle.com/luisrivera/exercise-data-types-and-missing-values/edit
In [0]:
# import pandas as pd
# reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.data_types_and_missing_data import *
# print("Setup complete.")
In [0]:
# Your code here
dtype = reviews.points.dtype
In [0]:
point_strings = reviews.points.astype(str)
point_strings
Out[0]:
In [0]:
missing_price_reviews = reviews[reviews.price.isnull()]
n_missing_prices = len(missing_price_reviews)
n_missing_prices
Out[0]:
In [0]:
# Cute alternative solution: if we sum a boolean series, True is treated as 1 and False as 0
n_missing_prices = reviews.price.isnull().sum()
n_missing_prices
Out[0]:
In [0]:
# or equivalently:
n_missing_prices = pd.isnull(reviews.price).sum()
n_missing_prices
Out[0]:
## 4.
What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the `region_1` field. This field is often missing data, so replace missing values with `Unknown`. Sort in descending order. Your output should look something like this:
```
Unknown                    21247
Napa Valley                 4480
                           ...
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64
```
In [0]:
reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
reviews_per_region
Out[0]:
Oftentimes data will come to us with column names, index names, or other naming conventions that we are not satisfied with. In that case, you'll learn how to use pandas functions to change the names of the offending entries to something better.
You'll also explore how to combine data from multiple DataFrames and/or Series.
In [0]:
import pandas as pd
# reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.renaming_and_combining import *
# print("Setup complete.")
In [0]:
reviews.head()
Out[0]:
In [0]:
reviews.rename(columns={'points': 'score'})
Out[0]:
rename() lets you rename index or column values by specifying an index or columns keyword parameter, respectively. It supports a variety of input formats, but usually a Python dictionary is the most convenient. Here is an example using it to rename some elements of the index.
In [0]:
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})
Out[0]:
You'll probably rename columns very often, but rename index values very rarely. For that, set_index() is usually more convenient.
Both the row index and the column index can have their own name attribute. The complementary rename_axis() method may be used to change these names. For example:
In [0]:
reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns')
Out[0]:
When performing operations on a dataset, we will sometimes need to combine different DataFrames and/or Series in non-trivial ways. Pandas has three core methods for doing this. In order of increasing complexity, these are concat(), join(), and merge(). Most of what merge() can do can also be done more simply with join(), so we will omit it and focus on the first two functions here.
The simplest combining method is concat(). Given a list of elements, this function will smush those elements together along an axis.
This is useful when we have data in different DataFrame or Series objects but having the same fields (columns). One example: the YouTube Videos dataset, which splits the data up based on country of origin (e.g. Canada and the UK, in this example). If we want to study multiple countries simultaneously, we can use concat() to smush them together:
In [0]:
# canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
# british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")
canadian_youtube = pd.read_csv("CAvideos.csv")
british_youtube = pd.read_csv("GBvideos.csv")
pd.concat([canadian_youtube, british_youtube])
Out[0]:
The middlemost combiner in terms of complexity is join(). join() lets you combine different DataFrame objects which have an index in common. For example, to pull down videos that happened to be trending on the same day in both Canada and the UK, we could do the following:
In [0]:
left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])
left.join(right, lsuffix='_CAN', rsuffix='_UK')
Out[0]:
The lsuffix and rsuffix parameters are necessary here because the data has the same column names in both British and Canadian datasets. If this wasn't true (because, say, we'd renamed them beforehand) we wouldn't need them.
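A sketch of that alternative (the suffix choices here are assumptions): renaming the overlapping columns before the join removes the need for lsuffix and rsuffix.
```python
# Give every column a country-specific name up front...
left_renamed = left.rename(columns=lambda c: c + '_CAN')
right_renamed = right.rename(columns=lambda c: c + '_UK')

# ...so the joined result has no name collisions to resolve.
left_renamed.join(right_renamed)
```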
https://www.kaggle.com/luisrivera/exercise-renaming-and-combining/edit
In [0]:
# import pandas as pd
# reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.renaming_and_combining import *
# print("Setup complete.")
In [0]:
reviews.head()
Out[0]:
In [0]:
renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))
# q1.check()
renamed
Out[0]:
In [0]:
reindexed = reviews.rename_axis('wines', axis='rows')
reindexed
Out[0]:
The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.
In [0]:
# gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products = pd.read_csv("gaming.csv")
gaming_products['subreddit'] = "r/gaming"
# movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products = pd.read_csv("movies.csv")
movie_products['subreddit'] = "r/movies"
Create a DataFrame of products mentioned on either subreddit.
In [0]:
combined_products = pd.concat([gaming_products, movie_products])
# q3.check()
combined_products.head()
Out[0]:
The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Run the cell below to load these datasets into dataframes:
In [0]:
# powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
# powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
powerlifting_meets = pd.read_csv("meets.csv")
powerlifting_meets.head()
Out[0]:
In [0]:
powerlifting_competitors = pd.read_csv("openpowerlifting.csv")
powerlifting_competitors.head()
Out[0]:
Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.
In [0]:
powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
powerlifting_combined.head()
Out[0]: