Craftcans.com: scraping with Pandas

Craftcans.com provides a database of 2692 crafted canned beers. The data on beers includes the following variables:

  • Name
  • Style
  • Size
  • Alcohol by volume (ABV)
  • IBU’s
  • Brewer name
  • Brewer location

This is very popular, nice database used for various kinds of analytical tasks. The database can be obtained in three ways:

  1. Manually copying and pasting into MS Excel (easy).
  2. Using pandas.read_html() function to scrape, as it is given inside table (medium).
  3. Using BeautifulSoup to scrape the content and get the data (hard).

In [1]:
import pandas

In [2]:
url = "http://craftcans.com/db.php?search=all&sort=beerid&ord=desc&view=text"

Option 2: pandas.read_html()


In [3]:
scraped_data = pandas.read_html(url)

In [4]:
len(scraped_data)


Out[4]:
8

In [5]:
data = scraped_data[-1]

In [6]:
data.head()


Out[6]:
0 1 2 3 4 5 6 7
0 ENTRY BEER BREWERY LOCATION STYLE SIZE ABV IBUs
1 2692. Get Together NorthGate Brewing Minneapolis, MN American IPA 16 oz. 4.5% 50
2 2691. Maggie's Leap NorthGate Brewing Minneapolis, MN Milk / Sweet Stout 16 oz. 4.9% 26
3 2690. Wall's End NorthGate Brewing Minneapolis, MN English Brown Ale 16 oz. 4.8% 19
4 2689. Pumpion NorthGate Brewing Minneapolis, MN Pumpkin Ale 16 oz. 6.0% 38

In [7]:
data.to_excel("craftcans.xlsx")