Run this to start from scratch

Both commands should return an error if no credentials were previously set and you are using the instance's service account.


In [34]:
!gcloud auth revoke --quiet


ERROR: (gcloud.auth.revoke) Cannot revoke GCE-provided credentials.

In [35]:
!gcloud auth application-default revoke --quiet


ERROR: (gcloud.auth.application-default.revoke) Application Default Credentials have not been set up, nothing was revoked.

Authentication

  1. As a developer, I want to interact with GCP via gcloud.

    gcloud auth login (run from a Notebook terminal)

    This obtains your credentials via a web flow and stores them in /root/.config/gcloud/credentials.db and, for backward compatibility, in /root/.config/gcloud/legacy_credentials/[YOUR_EMAIL]/adc.json.


  2. As a developer, I want my code to interact with GCP via SDK.

    gcloud auth application-default login (run using the GCP option in the navigation menu)

    This obtains your credentials via a web flow and stores them in /root/.config/gcloud/application_default_credentials.json.

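To confirm what each flow stored, you can check both credential locations from a notebook cell (a minimal sketch using the paths mentioned above):

In [ ]:
# List the accounts known to gcloud (stored in credentials.db)
!gcloud auth list

# Check that the Application Default Credentials file was written
!ls -l /root/.config/gcloud/application_default_credentials.json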

For more information, you can read the Google documentation or this excellent blog post.

Authenticate code running in the Notebook


In [45]:
# General: resolve Application Default Credentials explicitly
import google.auth

credentials, project_id = google.auth.default()
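
The returned pair tells you which credential type and project were picked up; printing them is a quick sanity check (sketch):

In [ ]:
# On a GCE-backed notebook without ADC set up, this typically resolves to
# Compute Engine credentials; after gcloud auth application-default login,
# to an authorized-user credential.
print(type(credentials).__name__, project_id)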

In [48]:
# google.cloud client libraries pick up the default credentials transparently
from google.cloud import bigquery

client = bigquery.Client()
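
If you prefer to make the dependency explicit instead of relying on the ambient defaults, the client also accepts the credentials and project resolved above (sketch):

In [ ]:
# Equivalent to bigquery.Client(), with the credentials passed explicitly
client = bigquery.Client(credentials=credentials, project=project_id)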

In [49]:
# Peek at the underlying credentials; token and expiry are populated
# lazily, e.g. once you run a query.
vars(client._credentials)


Out[49]:
{'token': None,
 'expiry': None,
 '_scopes': None,
 '_service_account_email': 'default'}
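
To see token and expiry populated without issuing a query, you can refresh the credentials by hand (a sketch using google.auth's transport helpers):

In [ ]:
from google.auth.transport.requests import Request

# Force a token fetch; token and expiry are no longer None afterwards
client._credentials.refresh(Request())
vars(client._credentials)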

NOTE

For BigQuery, to run a query as shown below, your identity needs IAM roles similar to the following:

  • roles/bigquery.jobUser (lowest resource level: project), which includes the bigquery.jobs.create permission.
  • roles/bigquery.dataViewer (lowest resource level: dataset), which includes the bigquery.tables.getData permission.

QUERY = "SELECT 1"  # placeholder; replace with your own SQL
query_job = client.query(QUERY)
rows = query_job.result()
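
If the query fails with a permission error, the project-level role above can be granted with gcloud (a sketch; the project ID and email are placeholders):

In [ ]:
# Hypothetical project and user; substitute your own values
!gcloud projects add-iam-policy-binding my-project --member="user:you@example.com" --role="roles/bigquery.jobUser"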

Authenticate Spark

You can authenticate Spark using the credentials file or its content. Although you could point Spark at the file directly, the workers would not have it locally: gcloud auth application-default login runs only on the Master, so the application_default_credentials.json file exists on the Master node only.

We have 3 options:

  • Option 1 [Recommended]: Read the file and pass the value as a string.
  • Option 2: Have the add-on write the file to the master and workers. Requires proper permissions.
  • Option 3: Manually copy the file to each worker, for example with gcloud compute scp (see the sketch after this list). Requires proper firewall access.
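
For Option 3, the copy could look like the following (a sketch; the worker name and zone are placeholders, and the command must be repeated for each worker):

In [ ]:
# Hypothetical worker instance and zone; substitute your own values
!gcloud compute scp /root/.config/gcloud/application_default_credentials.json spark-worker-0:/root/.config/gcloud/ --zone=us-central1-a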

In [44]:
import base64
import os

CREDENTIALS_FILE = "/root/.config/gcloud/application_default_credentials.json"

def get_credentials_text():
    if not os.path.isfile(CREDENTIALS_FILE):
        print("\x1b[31m\nNo credentials defined. Run gcloud auth application-default login.\n\x1b[0m")
        return None

    with open(CREDENTIALS_FILE, "r") as f:
        return f.read()

credentials_txt = get_credentials_text()
credentials_b64 = base64.b64encode(credentials_txt.encode('utf-8')).decode('utf-8')

# Cannot have both credentialsFile and credentials set.
spark.conf.unset("credentialsFile")
spark.conf.set("credentials", credentials_b64)

print("\x1b[32m\nSpark is now authenticated on this Master node.\n\x1b[0m")



Spark is now authenticated on this Master node.
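
With the credentials option set, reads through the spark-bigquery-connector should now authenticate. A minimal sketch (the table name is a placeholder and the connector must be available on the classpath):

In [ ]:
# Hypothetical public table; any table your identity can read works
df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.samples.shakespeare") \
    .load()
df.show(5)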