scraping_craigslist



In [1]:
import requests
import lxml.html  # the .cssselect() calls below also need the separate `cssselect` package
import pandas as pd
from IPython.display import display, Image

pd.set_option('display.max_colwidth', 100)

In [2]:
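offset = 0  # index of the first result to show; the `s` query parameter pages through results
base_url = "http://washingtondc.craigslist.org"
response = requests.get("%s/search/apa?s=%s" % (base_url, offset))
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
doc = lxml.html.fromstring(response.content)

# Each search result is a <p class="row"> element carrying the posting ID in data-pid
# (a data-repost-of attribute is also available but is not used here).
rows = []
for result in doc.cssselect("div.content p.row"):
    item_id = result.get('data-pid')
    link = base_url + result.cssselect('a')[0].get('href')
    rows.append([item_id, link])
df = pd.DataFrame(rows, columns=['item_id', 'link'])
df.head()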


Out[2]:
      item_id                                                        link
0  5549206236  http://washingtondc.craigslist.org/doc/apa/5549206236.html
1  5549168290  http://washingtondc.craigslist.org/doc/apa/5549168290.html
2  5549112878  http://washingtondc.craigslist.org/doc/apa/5549112878.html
3  5549194740  http://washingtondc.craigslist.org/nva/apa/5549194740.html
4  5549187428  http://washingtondc.craigslist.org/nva/apa/5549187428.html
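
The `offset` variable in the cell above suggests that search results are paged through the `s` query parameter. A minimal sketch of building the URLs for several result pages follows. The helper name `search_page_urls` and the page size of 100 are assumptions for illustration, not something this notebook confirms, so check the live site before relying on them.

```python
BASE_URL = "http://washingtondc.craigslist.org"


def search_page_urls(n_pages, page_size=100, base_url=BASE_URL):
    """Return search URLs for the first `n_pages` result pages.

    `page_size` is an assumed number of results per page (hypothetical).
    """
    return ["%s/search/apa?s=%d" % (base_url, page * page_size)
            for page in range(n_pages)]


urls = search_page_urls(3)
for url in urls:
    print(url)
```

Each URL could then be fetched and parsed with the same `requests` and `lxml` loop as above, ideally with a short pause between requests.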

In [3]:
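response = requests.get("http://washingtondc.craigslist.org/doc/apa/5549206236.html")
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
doc = lxml.html.fromstring(response.content)

# Posting title line: price, bedrooms, size, headline and neighborhood
print(doc.cssselect("section.body h2.postingtitle span.postingtitletext")[0].text_content().strip())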
print("--------------------------------------------------")

first_image_in_carousel = doc.cssselect("section.body section.userbody div.slide.visible img")
if first_image_in_carousel:
    img_url = first_image_in_carousel[0].get('src')
    display(Image(url=img_url))

print("--------------------------------------------------")

# Full text of the posting body
print(doc.cssselect("#postingbody")[0].text_content())


$3250 / 2br - 1100ft2 - UNIQUE CUSTOM BUILD IN PRIME LOCATION w/PARKING (U St/Shaw)
--------------------------------------------------
--------------------------------------------------

Tired of living in a box that looks like all the other boxes? If so, you don't want to miss this. Asthetic: think industrial meets modern with a little wabi sabi and steampunk thrown in! If you don't know what that means...come have a look. First tenants post construction moving out of state-run don't walk!

-2BR
-2BA
-2 floors
-Private outdoor patio
-ONE CAR OFF STREET PARKING optional @250/mo additional
-Approx 1100 sf
-A wall of pennies and backsplash of nickels painstakingly applied by hand
-Acid stained concrete main level and bath floors, gleaming HWF on upper
-Exposed brick wall
-Exposed beams
-Industrial fixtures
-Concrete and copper bathroom sinks
-Spray foam insulated building & new Pella windows =lower utility costs
-Reclaimed woods used throughout for various design elements
-Handmade towel bars, tp holders, hooks from industrial piping
-Stainless appliances
-Double closets in one BR and walk in closet in other
-Washer/Dryer
-TV/Cable/Sound prewired
-Built in speaker system
-Central heat/cooling
-Gated rear entry to unit from well lit alleys
-Electronic keyless door entry system
AND MORE...

Less than a block from Compass Coffee, Glen's Garden Market, Kit n Ace, Lettie Gooch, around the corner from the entire 9th st development (Mockingbird Hill, Eat the Rich, CVS, etc), a block from u st (too much to list), Howard Theatre, Atlantic Plumbing movie theatre and 2 blocks from metro. Nightlife and restaurants galore. Walk score of 98, transit score 81, and a bike score of 96.

Available 5/1. One month's security deposit. Rent includes water, recycling, trash pickup. Tenant pays electric/phone/internet/cable.

One owner is licensed Realtor(disclosure required by law).
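
The title line printed above packs several fields into one string: price, bedrooms, square footage, headline and neighborhood. The sketch below shows one way to split it into structured values with a regular expression. The pattern and the helper name `parse_title` are assumptions based only on this single title, so other postings may need a looser pattern.

```python
import re

# Pattern inferred from one example title; treat it as a starting point only.
TITLE_RE = re.compile(
    r"^\$(?P<price>\d+)"                         # "$3250"
    r"(?:\s*/\s*(?P<bedrooms>\d+)br)?"           # optional "/ 2br"
    r"(?:\s*-\s*(?P<sqft>\d+)ft2)?"              # optional "- 1100ft2"
    r"\s*-\s*(?P<headline>.*?)"                  # free-text headline
    r"(?:\s*\((?P<neighborhood>[^()]*)\))?\s*$"  # optional trailing "(area)"
)


def parse_title(title):
    """Return a dict of title fields, or None if the title does not match."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields that were present in the title
    for key in ("price", "bedrooms", "sqft"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


title = ("$3250 / 2br - 1100ft2 - UNIQUE CUSTOM BUILD IN PRIME LOCATION "
         "w/PARKING (U St/Shaw)")
parsed = parse_title(title)
print(parsed)
```

Returning `None` for titles that do not match keeps the caller in control of how to handle unusual postings, for example by skipping them or logging them for review.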