facebook-scraper

This is a short introduction to using the scraper to fully scrape a public Facebook page

Requirements

  • Register yourself as a developer on Facebook
  • Create an App on your Facebook developer page
  • Go to the Graph API Explorer and generate an Access Token with the permissions you want (I recommend selecting all of them for this purpose, to avoid errors later)

Notes

You will absolutely need to provide the ACCESS_TOKEN, but APP_ID and APP_ID_SECRET are only required in order to extend your ACCESS_TOKEN. If you are fine working with a short-lived ACCESS_TOKEN and renewing it manually on your Facebook developers page, you can leave APP_ID and APP_ID_SECRET empty.

PAGE_ID: The ID of the public page you will scrape (for instance: '1889414787955466'). You will usually see this in the URL in your browser. Sometimes, however, a name is shown instead. The name WILL NOT work; you need to figure out the ID. (There are plenty of websites that do this; I use https://www.wallflux.com/facebook_id/)
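
If you already have a valid token, you can also resolve a page name to its ID with a direct Graph API call. The sketch below assumes the requests library; resolve_page_id is a hypothetical helper, not part of fb_scraper, and whether name lookups work can depend on the Graph API version and your token's permissions.

In [ ]:
import requests

def resolve_page_id(page_name, access_token):
    # Hypothetical helper: look up a page by name and return its numeric ID
    resp = requests.get(
        'https://graph.facebook.com/' + page_name,
        params={'access_token': access_token, 'fields': 'id,name'},
    )
    resp.raise_for_status()
    return resp.json()['id']

# resolve_page_id('SomePublicPage', ACCESS_TOKEN)  # -> e.g. '1889414787955466'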


In [ ]:
import fb_scraper.prodcons

# Fill in your own credentials (see Notes above); APP_ID and APP_ID_SECRET
# may stay empty if you will not extend the ACCESS_TOKEN
APP_ID = ''
APP_ID_SECRET = ''
ACCESS_TOKEN = ''

Producer/Consumer Manager

The prodcons module builds on a Producer/Consumer multithreaded approach to issue batch requests to the FB API and process the corresponding responses, saving them to the respective CSV files


In [ ]:
mgr = fb_scraper.prodcons.Manager(
    access_token=ACCESS_TOKEN,
    api_key=APP_ID,
    api_secret=APP_ID_SECRET
    )
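
For intuition, the pattern looks roughly like the sketch below. This is a generic illustration of producer/consumer threads sharing a thread-safe queue, not fb_scraper's actual internals; all names and the dummy data are made up.

In [ ]:
import csv
import queue
import threading

q = queue.Queue()

def producer(responses):
    # Stand-in for threads that issue batch requests and enqueue the responses
    for resp in responses:
        q.put(resp)
    q.put(None)  # sentinel: no more work

def consumer(csv_path):
    # Drains the queue and appends each response as a row in a CSV file
    with open(csv_path, 'a', newline='') as f:
        writer = csv.writer(f)
        while True:
            item = q.get()
            if item is None:
                break
            writer.writerow([item['id'], item['type']])

fake = [{'id': '1_2', 'type': 'post'}]  # dummy data for illustration
t1 = threading.Thread(target=producer, args=(fake,))
t2 = threading.Thread(target=consumer, args=('example.csv',))
t1.start(); t2.start()
t1.join(); t2.join()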

Extending ACCESS_TOKEN

(You must have APP_ID and APP_ID_SECRET set up)

This function extends the ACCESS_TOKEN and automatically replaces it in the mgr object.

NOTE: Copy-paste the extended token into your application setup so you can use it directly in the future


In [ ]:
mgr.graph.extend_token()
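
If you prefer to do the exchange by hand, the extension presumably corresponds to Facebook's documented fb_exchange_token OAuth grant. Below is a sketch of the raw request, assuming the requests library; the exact response format varies between Graph API versions.

In [ ]:
import requests

# Manual long-lived token exchange (sketch of the documented OAuth grant,
# not fb_scraper internals); response format varies by API version
resp = requests.get(
    'https://graph.facebook.com/oauth/access_token',
    params={
        'grant_type': 'fb_exchange_token',
        'client_id': APP_ID,
        'client_secret': APP_ID_SECRET,
        'fb_exchange_token': ACCESS_TOKEN,
    },
)
print(resp.text)  # contains the long-lived access token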

Start scraping threads

Just call the start() method of the Manager and wait until it completes.

A line is printed to indicate how far the scraping has progressed (i.e. how many posts, reactions, comments, etc. have been received and stored in the CSV file structure)


In [ ]:
mgr.start()

Add scraping jobs

From the mgr object, just add the group or post (the job types available at the moment) that you would like to scrape


In [ ]:
mgr.scrape_post('XXXXXXXXXXXXXX')   # Where 'XXXXXXXXXXXXXX' is the FULL post ID, i.e. GROUPID_POSTID
mgr.scrape_group('XXXXXXXXXXXXXX')  # Where 'XXXXXXXXXXXXXX' is the Group ID
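
If you have the group ID and the post ID separately, the full post ID is simply the two joined by an underscore (the post ID below is an illustrative value):

In [ ]:
group_id = '1889414787955466'
post_id = '123456789012345'  # illustrative value
mgr.scrape_post('{}_{}'.format(group_id, post_id))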