Below is the command to install package pymysql from the command line:
pip install pymysql
To run shell commands from inside JupyterLab notebooks, prefix the command with an exclamation mark:
!pip install pymysql
In [ ]:
!pip install pymysql
The JupyterLab notebook VMs created using the initialization script come pre-installed with a wide range of Python packages. Please refer to the TDD document, Section 3.1.2, for the full list.
Authentication calls to GCS, BigQuery, and Cloud SQL can be made using a valid Service Account JSON key. If you do not have a Service Account yet, follow the steps given here to retrieve a Service Account JSON key. Retrieval involves the steps below:
1. Create a Service Account
2. Assign it the required IAM permissions
3. Download the Service Account JSON key
4. Upload the key to where it can be used
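The steps above can also be scripted with the gcloud CLI. This is a hedged sketch: the service account name (`sa-notebook`), the placeholder `PROJECT_ID`, and the example roles are assumptions to be replaced with your own values, and the granted roles should be scoped down to what your notebook actually needs.

```shell
# 1. Create a Service Account (name "sa-notebook" is an example)
gcloud iam service-accounts create sa-notebook \
    --display-name="Notebook service account"

# 2. Assign it the required IAM permissions (example roles; adjust as needed)
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:sa-notebook@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:sa-notebook@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"

# 3. Download the Service Account JSON key
gcloud iam service-accounts keys create service_account.json \
    --iam-account=sa-notebook@PROJECT_ID.iam.gserviceaccount.com
```

Step 4 (uploading the key to the notebook VM) can then be done through the JupyterLab file browser or with `gcloud compute scp`.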
Once the key has been obtained, code snippets provided below can be used to authenticate the user and connect to the services via Python APIs.
In [ ]:
from google.cloud import storage

# TODO(developer):
# Set key_path to the path to the service account key file.
# Set bucket_name to the name of the bucket you want to read from.
# Set file_name to the path of the file (including any folder structure) inside the bucket.
# bucket_name and file_name are used to create the GCS object URL: gs://bucket_name/file_name
key_path = "path/to/service_account.json"
bucket_name = "name-of-your-bucket"
file_name = "filename.extension"

storage_client = storage.Client.from_service_account_json(key_path)
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(file_name)

# Below command downloads the full contents of the specified GCS object as bytes
contents = blob.download_as_string()
print(contents)
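Note that `download_as_string()` returns raw bytes, not a Python string. A minimal sketch of decoding those bytes and loading them into pandas, using a hypothetical CSV payload in place of a real GCS object:

```python
import io

import pandas as pd

# Hypothetical bytes payload standing in for blob.download_as_string()
contents = b"name,total\nJames,5000\nMary,4500\n"

# Decode the bytes to text, then parse the CSV into a DataFrame
text = contents.decode("utf-8")
df = pd.read_csv(io.StringIO(text))
```

The same decode-and-parse pattern applies to any text-based object downloaded from GCS.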
In [ ]:
from google.cloud import bigquery
from google.oauth2 import service_account
import pandas
# TODO(developer):
# Set key_path to the path to the service account key file.
# key_path = "path/to/service_account.json"
credentials = service_account.Credentials.from_service_account_file(
key_path,
scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(
credentials=credentials,
project=credentials.project_id,
)
The Cloud SQL proxy connection can be authenticated at the time the connection is created. Please refer to this guide to learn more about Cloud SQL authentication.
Download the Cloud SQL proxy
In [ ]:
!wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
!chmod +x cloud_sql_proxy
Run the command below from the terminal to start the Cloud SQL proxy:
In [ ]:
# TODO(developer):
# Set PATH_TO_KEY_FILE to the path to the service account key file.
# Example: PATH_TO_KEY_FILE = "path/to/service_account.json"
!./cloud_sql_proxy -instances=<INSTANCE_NAME>=tcp:3306 -credential_file=<PATH_TO_KEY_FILE> &
In [ ]:
import pymysql

# TODO(developer):
# Change USERNAME and PASSWORD to the user and password created on the Cloud SQL instance.
# Set DB to the name of the database to be connected to.
connection = pymysql.connect(host='127.0.0.1',
                             user='USERNAME',
                             password='PASSWORD',
                             db='DB')
mycursor = connection.cursor()
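PyMySQL cursors follow the standard Python DB-API (execute, fetchone, fetchall, commit). Since the pattern requires a live database to run against, here is a self-contained sketch of the same cursor interface using the stdlib `sqlite3` driver; the table and values are illustrative only.

```python
import sqlite3

# In-memory database stands in for the Cloud SQL connection
connection = sqlite3.connect(":memory:")
mycursor = connection.cursor()

# Create a sample table and insert a row (parameterized, per DB-API)
mycursor.execute("CREATE TABLE users (name TEXT)")
mycursor.execute("INSERT INTO users VALUES (?)", ("alice",))
connection.commit()

# Run a query and fetch the results as a list of tuples
mycursor.execute("SELECT name FROM users")
rows = mycursor.fetchall()
```

With PyMySQL the calls are the same, except that placeholders are written `%s` instead of `?`.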
Use the Client.query method to run the query, and the QueryJob.to_dataframe method to return the results as a pandas DataFrame.
In [ ]:
from google.cloud import bigquery
import pandas
client = bigquery.Client(location="US")
print("Client created using default project: {}".format(client.project))
query = """
SELECT name, SUM(number) as total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""
query_job = client.query(
query,
# Location must match that of the dataset(s) referenced in the query.
location="US",
) # API request - starts the query
df = query_job.to_dataframe()
df
pandas' read_csv method supports reading a DataFrame directly from a GCS file URL (this requires the gcsfs package to be installed).
In [ ]:
import pandas as pd
df = pd.read_csv('gs://BUCKET/your_file.csv')
pandas' read_sql method can be used to read a table from a Cloud SQL database into a DataFrame.
In [ ]:
import pymysql
import pandas as pd

# TODO(developer):
# Change USERNAME and PASSWORD to the user and password created on the Cloud SQL instance.
# Set DB to the name of the database to be connected to.
# Set start_date and end_date to the bounds of the date range to query.
connection = pymysql.connect(host='127.0.0.1',
                             user='USERNAME',
                             password='PASSWORD',
                             db='DB')

# PyMySQL uses %s placeholders (not ?) for query parameters
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
df = pd.read_sql(query, connection, params=(start_date, end_date))
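Placeholder syntax depends on the driver: PyMySQL expects `%s`, while sqlite3 expects `?`. A runnable sketch of the parameterized read_sql pattern using the stdlib `sqlite3` driver, with an illustrative `orders` table and made-up dates:

```python
import sqlite3

import pandas as pd

# In-memory database stands in for the Cloud SQL connection
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER, date_time TEXT)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-01-05"), (2, "2023-02-10")],
)
connection.commit()

# sqlite3 uses ? placeholders; with PyMySQL the same query would use %s
query = "SELECT * FROM orders WHERE date_time BETWEEN ? AND ?"
df = pd.read_sql(query, connection, params=("2023-01-01", "2023-01-31"))
```

Only the first row falls inside the January range, so `df` contains a single order.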