Run this to start from scratch

Both commands should return an error if no credentials were previously set and you are using the instance's service account.


In [34]:
!gcloud auth revoke --quiet


ERROR: (gcloud.auth.revoke) Cannot revoke GCE-provided credentials.

In [35]:
!gcloud auth application-default revoke --quiet


ERROR: (gcloud.auth.application-default.revoke) Application Default Credentials have not been set up, nothing was revoked.

Authentication

  1. As a developer, I want to interact with GCP via gcloud.

    gcloud auth login (run from a Notebook terminal)

    This obtains your credentials via a web flow and stores them in /root/.config/gcloud/credentials.db and, for backward compatibility, in /root/.config/gcloud/legacy_credentials/[YOUR_EMAIL]/adc.json.


  2. As a developer, I want my code to interact with GCP via SDK.

    gcloud auth application-default login (run using the GCP option in the navigation menu)

    This obtains your credentials via a web flow and stores them in /root/.config/gcloud/application_default_credentials.json.

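To confirm what each flow stored, you can check both credential locations from a notebook cell (a minimal sketch using the paths mentioned above):

In [ ]:
# List the accounts known to gcloud (stored in credentials.db)
!gcloud auth list

# Check that the Application Default Credentials file was written
!ls -l /root/.config/gcloud/application_default_credentials.json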

For more information, you can read the Google documentation or this excellent blog post.

Authenticate code running in the Notebook


In [45]:
# General: resolve Application Default Credentials explicitly
import google.auth

credentials, project_id = google.auth.default()
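
The returned pair tells you which credential type and project were picked up; printing them is a quick sanity check (sketch):

In [ ]:
# On a GCE-backed notebook without ADC set up, this typically resolves to
# Compute Engine credentials; after gcloud auth application-default login,
# to an authorized-user credential.
print(type(credentials).__name__, project_id)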

In [48]:
# google.cloud client libraries pick up the default credentials transparently
from google.cloud import bigquery

client = bigquery.Client()
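
If you prefer to make the dependency explicit instead of relying on the ambient defaults, the client also accepts the credentials and project resolved above (sketch):

In [ ]:
# Equivalent to bigquery.Client(), with the credentials passed explicitly
client = bigquery.Client(credentials=credentials, project=project_id)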

In [49]:
# Peek at the underlying credentials; token and expiry are populated
# lazily, e.g. once you run a query.
vars(client._credentials)


Out[49]:
{'token': None,
 'expiry': None,
 '_scopes': None,
 '_service_account_email': 'default'}
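
To see token and expiry populated without issuing a query, you can refresh the credentials by hand (a sketch using google.auth's transport helpers):

In [ ]:
from google.auth.transport.requests import Request

# Force a token fetch; token and expiry are no longer None afterwards
client._credentials.refresh(Request())
vars(client._credentials)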

NOTE

For BigQuery, to run a query as shown below, your identity needs IAM roles similar to the following:

  • roles/bigquery.jobUser (lowest resource level: project), which includes the bigquery.jobs.create permission.
  • roles/bigquery.dataViewer (lowest resource level: dataset), which includes the bigquery.tables.getData permission.

QUERY = "SELECT 1"  # placeholder; replace with your own SQL
query_job = client.query(QUERY)
rows = query_job.result()
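
If the query fails with a permission error, the project-level role above can be granted with gcloud (a sketch; the project ID and email are placeholders):

In [ ]:
# Hypothetical project and user; substitute your own values
!gcloud projects add-iam-policy-binding my-project --member="user:you@example.com" --role="roles/bigquery.jobUser"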

Authenticate Spark

You can authenticate Spark using the credentials file or its content. Although you could point Spark at the file directly, the workers would not have it locally: gcloud auth application-default login runs only on the Master, so the application_default_credentials.json file exists on the Master node only.

We have 3 options:

  • Option 1 [Recommended]: Read the file and pass the value as a string.
  • Option 2: Have the add-on write the file to the master and workers. Requires proper permissions.
  • Option 3: Manually copy the file to each worker, for example with gcloud compute scp (see the sketch after this list). Requires proper firewall access.
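
For Option 3, the copy could look like the following (a sketch; the worker name and zone are placeholders, and the command must be repeated for each worker):

In [ ]:
# Hypothetical worker instance and zone; substitute your own values
!gcloud compute scp /root/.config/gcloud/application_default_credentials.json spark-worker-0:/root/.config/gcloud/ --zone=us-central1-a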

In [44]:
import base64
import os

CREDENTIALS_FILE = "/root/.config/gcloud/application_default_credentials.json"

def get_credentials_text():
    if not os.path.isfile(CREDENTIALS_FILE):
        print("\x1b[31m\nNo credentials defined. Run gcloud auth application-default login.\n\x1b[0m")
        return None

    with open(CREDENTIALS_FILE, "r") as f:
        return f.read()

credentials_txt = get_credentials_text()
credentials_b64 = base64.b64encode(credentials_txt.encode('utf-8')).decode('utf-8')

# Cannot have both credentialsFile and credentials set.
spark.conf.unset("credentialsFile")
spark.conf.set("credentials", credentials_b64)

print("\x1b[32m\nSpark is now authenticated on this Master node.\n\x1b[0m")



Spark is now authenticated on this Master node.
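
With the credentials option set, reads through the spark-bigquery-connector should now authenticate. A minimal sketch (the table name is a placeholder and the connector must be available on the classpath):

In [ ]:
# Hypothetical public table; any table your identity can read works
df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.samples.shakespeare") \
    .load()
df.show(5)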