gdeltPyR retrieves Global Database of Events, Language, and Tone (GDELT) data (version 1.0 or version 2.0) via parallel HTTP GET requests and is an alternative to accessing GDELT data via Google BigQuery.

Performance will vary based on the number of available cores (i.e. CPUs), internet connection speed, and available RAM. For systems with limited RAM, later iterations of gdeltPyR will include an option to store the output directly to disk.

Memory Considerations

Take your system's specifications into consideration when running large or complex queries. gdeltPyR loads each temporary file only long enough to convert it into a pandas dataframe (each file covers 15 minutes for 2.0 and a full day for 1.0 events tables), but GDELT data can be very large and can exhaust a computer's RAM. For example, Global Knowledge Graph (gkg) table queries can consume large amounts of RAM when pulling data for only a few days. Before trying month-long queries, try single-day queries, or create a pipeline that pulls several days' worth of data, writes it to disk, flushes globals, and continues to pull more data.

It's best to use a system with at least 8 GB of RAM.
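The batch-and-flush pipeline described above could be sketched roughly as follows. This is a minimal illustration, not part of gdeltPyR: `chunk`, `pull_in_batches`, the `fetch` callable, and the `batch_NNNN.csv` file naming are all hypothetical names chosen for the example. `fetch` is assumed to return a pandas dataframe (anything with a compatible `to_csv` method will do).

```python
import gc
import os


def chunk(dates, size):
    """Split a list of date strings into consecutive batches of `size`."""
    return [dates[i:i + size] for i in range(0, len(dates), size)]


def pull_in_batches(dates, fetch, out_dir, batch_size=2):
    """Fetch `dates` batch by batch, writing each batch to disk.

    `fetch` (hypothetical) is any callable that takes a list of date
    strings and returns an object with a pandas-style `to_csv` method.
    Returns the list of CSV paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for n, batch in enumerate(chunk(dates, batch_size)):
        frame = fetch(batch)
        path = os.path.join(out_dir, "batch_{:04d}.csv".format(n))
        frame.to_csv(path, index=False)
        written.append(path)
        # Drop the only reference to this batch and collect garbage
        # before pulling the next one, so batches never accumulate in RAM.
        del frame
        gc.collect()
    return written
```

In practice `fetch` would presumably be a thin wrapper around a gdeltPyR query; how that wrapper maps a batch of dates onto a query is left to the caller. Smaller batches lower peak memory at the cost of more files on disk.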


Installation

pip install gdeltPyR

You can also install directly from

pip install git+

Basic Usage

gdeltPyR queries revolve around four concepts:

| Name | Description | Input Possibilities/Examples |
|------|-------------|------------------------------|
| version | (integer) - Selects the version of GDELT data to query; defaults to version 2. | 1 or 2 |
| date | (string or list of strings) - Dates to query | "2016 10 23" or "2016 Oct 23" |
| coverage | (bool) - For GDELT 2.0, pulls every 15 minute interval in the dates passed in the 'date' parameter. Default coverage is False or None. gdeltPyR will pull the latest 15 minute interval for the current day or the last 15 minute interval for a historic day. | True or False or None |
| tables | (string) - The specific GDELT table to pull. The default table is the 'events' table. See the GDELT documentation page for more information | 'events' or 'mentions' or 'gkg' |

With these basic concepts, you can run any number of GDELT queries.

# Import the package
import gdelt

# Instantiate the gdelt object
gd = gdelt.gdelt(version=2)

To launch your query, pass in your dates. When passing multiple dates, pass as a list of strings. We will time the multi-day query.

Important Date Details for GDELT 1.0 and 2.0

For GDELT 2.0, every 15-minute interval is a zipped CSV file, and gdeltPyR makes concurrent HTTP GET requests to each file. When the coverage parameter is set to True, each full day of data has 96 15-minute interval files to pull. If you are pulling the current day and coverage is set to True, gdeltPyR pulls all the intervals leading up to the latest 15-minute interval. When coverage is False, the package pulls the last 15-minute interval when querying a historical date and the latest 15-minute interval when querying the current date. Additionally, GDELT 2.0 data only goes back as far as Feb 2015. The additional features of GDELT 2.0 are discussed here.
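To make the interval arithmetic concrete, here is a small sketch of which 15-minute timestamps the rules above select. It is an illustration only and not gdeltPyR's internal code: `interval_starts` and `INTERVAL` are hypothetical names, `now` is passed in explicitly, and the sketch assumes `day` is not in the future and that the "latest" interval is the most recent one that has started.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)


def interval_starts(day, now, coverage=False):
    """Return the 15-minute interval timestamps to pull for `day`.

    `day` and `now` are datetimes; only the date part of `day` is used.
    """
    midnight = datetime(day.year, day.month, day.day)
    if midnight.date() == now.date():
        # Current day: stop at the latest interval that has started.
        count = (now - midnight) // INTERVAL + 1
    else:
        # Historic day: all 96 intervals, 00:00 through 23:45.
        count = 96
    stamps = [midnight + i * INTERVAL for i in range(count)]
    # Without coverage, only the last (or latest) interval is pulled.
    return stamps if coverage else stamps[-1:]
```

For a historic day with `coverage=True` this yields 24 x 4 = 96 timestamps; with `coverage=False` it yields only the 23:45 one.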

GDELT 1.0 releases the previous day's data at 6 AM EST the next day (for example, if today is 23 Oct, the 22 Oct results become available at 6 AM Eastern on 23 Oct).
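The release rule for GDELT 1.0 can be turned into a quick check of the most recent day worth querying. This is a sketch under stated assumptions: `latest_v1_day` is a hypothetical helper, not a gdeltPyR function, and it expects a naive datetime that is already expressed in US Eastern time.

```python
from datetime import datetime, timedelta


def latest_v1_day(now_eastern):
    """Most recent day whose GDELT 1.0 data should already be released.

    `now_eastern` is a naive datetime expressed in US Eastern time.
    """
    # From 6 AM onward yesterday's data is out; before that, the newest
    # release is still the one for the day before yesterday.
    days_back = 1 if now_eastern.hour >= 6 else 2
    return now_eastern.date() - timedelta(days=days_back)
```

Checking this before a query avoids requesting a day whose data has not been published yet.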

The Query

To launch your query, just pass in dates. When passing multiple dates, pass as a list of strings. First, some information on my OS.

import platform
import multiprocessing

# Operating system and platform details
print(platform.platform())

# Number of CPU cores available
print(multiprocessing.cpu_count())


And now the query.

%time results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)

CPU times: user 6.6 s, sys: 2.1 s, total: 8.7 s
Wall time: 36.8 s

Let's get an idea for the number of results we returned.

In about 36 seconds, gdeltPyR returned a pandas dataframe of nearly 900,000 rows by 61 columns that consumes only 407.2 MB of memory. With the data in a tidy format, it can be analyzed with any number of pandas data analysis pipelines and techniques.
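One way to obtain figures like the ones above is to read the dataframe's shape and memory footprint directly. The helper below is a hypothetical sketch, not necessarily how the numbers above were produced; it relies only on the standard pandas `shape` attribute and `memory_usage` method.

```python
def summarize(frame):
    """Return (rows, columns, size in MB) for a pandas dataframe."""
    rows, cols = frame.shape
    # deep=True also counts the Python strings held in object columns,
    # so this can be larger than a shallow memory estimate.
    n_bytes = frame.memory_usage(deep=True).sum()
    # Report in units of 1024 * 1024 bytes.
    return rows, cols, n_bytes / (1024 * 1024)
```

Calling `summarize(results)` on the query output returns the row count, column count, and approximate size in one tuple.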