In [1]:
import json
import pandas as pd
from pandas.io.json import json_normalize  # deprecated since pandas 1.0; pd.json_normalize is the modern spelling
The data is a JSON file containing ratings and reviews for a coffee shop.
In [9]:
with open("blue-bottle-coffee-san-francisco.json", "r") as f:
data = json.load(f)
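If the file's layout is unfamiliar, it helps to inspect the top-level keys before normalizing. Here is a minimal sketch using an inline sample that mimics the assumed shape of the file (the field names are illustrative, not taken from the actual data):

```python
import json

# Inline sample mimicking the assumed structure: a top-level "reviewList"
# key holding one dict per review, with ratings stored as strings.
sample = """
{
  "name": "Blue Bottle Coffee",
  "reviewList": [
    {"ratings": "5", "review": "Great pour-over."},
    {"ratings": "4", "review": "Busy but worth it."}
  ]
}
"""
data = json.loads(sample)
print(list(data))                # top-level keys
print(len(data["reviewList"]))  # number of reviews
```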
Let's extract the contents of reviewList (i.e., the ratings and review text) into a pandas DataFrame.
In [3]:
df = json_normalize(data, "reviewList")
df.head(3)
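To make the record-path argument concrete, here is json_normalize applied to a small inline document (the field names are assumptions for illustration; in pandas 1.0 and later the function is available as pd.json_normalize):

```python
import pandas as pd

# "reviewList" is the record path: each element of that list becomes one row.
doc = {
    "name": "Blue Bottle Coffee",
    "reviewList": [
        {"ratings": "5", "review": "Great espresso."},
        {"ratings": "3", "review": "Long line on weekends."},
    ],
}
df = pd.json_normalize(doc, "reviewList")
print(df.shape)  # two reviews, two columns
```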
Out[3]:
Since the ratings were loaded as strings, let's convert them to numeric values so that we can run analyses on them.
In [4]:
df.dtypes
Out[4]:
In [5]:
df["ratings"] = pd.to_numeric(df["ratings"])
df.dtypes
Out[5]:
In [7]:
df["ratings"].describe()
Out[7]:
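One caveat worth knowing: pd.to_numeric raises on strings it cannot parse. If scraped ratings ever contain malformed values, the errors="coerce" option converts them to NaN instead. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["5", "4", "not rated"])
# errors="coerce" maps unparseable strings to NaN rather than raising
nums = pd.to_numeric(s, errors="coerce")
print(nums.dtype)              # NaN forces a float dtype
print(int(nums.isna().sum()))  # count of coerced values
```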
Now that our pandas DataFrame is in the correct format, let's write it to BigQuery. The df.to_gbq() call below creates a dataset named mydataset and a table named mytable whose schema is inferred from df.dtypes. You can check that this dataset is present in the BigQuery UI.
In [6]:
project_id = "your-project-ID"
df.to_gbq("mydataset.mytable", project_id=project_id, if_exists="replace")  # verbose was deprecated and later removed from pandas-gbq
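Under the hood, to_gbq derives the BigQuery column types from the DataFrame's dtypes. Here is a rough, illustrative sketch of that mapping (pandas-gbq performs the real conversion; the dictionary below is an assumption for demonstration, not the library's actual table):

```python
import pandas as pd

# Illustrative dtype -> BigQuery type mapping (hypothetical, for demonstration)
DTYPE_TO_BQ = {"int64": "INTEGER", "float64": "FLOAT", "object": "STRING", "bool": "BOOLEAN"}

df = pd.DataFrame({"ratings": [5, 4], "review": ["a", "b"]})
schema = [{"name": col, "type": DTYPE_TO_BQ[str(dtype)]} for col, dtype in df.dtypes.items()]
print(schema)
```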
You can also query this table from within pandas; read_gbq returns a DataFrame with the query results.
In [8]:
query = "SELECT * FROM mydataset.mytable LIMIT 5"
pd.read_gbq(query=query, dialect="standard", project_id=project_id)
Out[8]: