Step 0: Authenticate & Fetch Code


In [ ]:
GCP_PROJECT_ID = "" #@param {type:"string"}

In [ ]:
!gcloud auth login
!gcloud config set project $GCP_PROJECT_ID
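
A quick sanity check (optional) that the login succeeded and the right project is active:

In [ ]:
!gcloud auth list
!gcloud config get-value project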

In [ ]:
!rm -rf modem && git clone https://github.com/google/modem.git
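
To verify the clone, you can list the pipeline directory that the cells below write into:

In [ ]:
!ls modem/bqml/pipeline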

Step 1: Paste the service account key file contents into the cell below.

Replace the existing content except the %%writefile header (line 1 of the cell).


In [ ]:
%%writefile modem/bqml/pipeline/svc_key.json
{
  "TODO": "Replace file." 
}
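
After pasting the key, a minimal check that the file parses as JSON (it raises if the paste was malformed; "type" is a standard field in service account keys):

In [ ]:
import json

# Raises json.JSONDecodeError if the pasted key is not valid JSON.
with open("modem/bqml/pipeline/svc_key.json") as f:
    key = json.load(f)
print("Service key parsed OK; type:", key.get("type", "<missing>"))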

Step 2: Fill out the parameters

GA_ACCOUNT_ID, GA_PROPERTY_ID, GA_DATASET_ID, and BQML_PREDICT_QUERY are required.
Note: Please keep the surrounding string quotes in place.


In [ ]:
%%writefile modem/bqml/pipeline/params.py
# -------------------------MANDATORY SECTION------------------------

# GA Details 
GA_ACCOUNT_ID = ""
GA_PROPERTY_ID = ""
GA_DATASET_ID = "" 
GA_IMPORT_METHOD = "di" # "di" - Data Import or "mp" - Measurement Protocol 

# BQML Details -
# Ensure that the BQ result headers mirror the Data Import schema,
# with ":" replaced by "_".
# E.g. if the Data Import schema is ga:clientId, ga:dimension1, etc.,
# the BQ result headers should look like ga_clientId, ga_dimension1, etc.
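# A hypothetical illustration of the expected shape (model and table names
# below are placeholders, not part of this repo):
# BQML_PREDICT_QUERY = """
#     SELECT
#       clientId AS ga_clientId,
#       predicted_label AS ga_dimension1
#     FROM ML.PREDICT(MODEL `my_project.my_dataset.my_model`,
#                     TABLE `my_project.my_dataset.my_features`)
#     """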
BQML_PREDICT_QUERY = """
                     """

# Options for logging & error monitoring
# LOGGING: Create a BQ table for logs with the following schema -
# time TIMESTAMP, status STRING, error STRING
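# A hypothetical example of creating such a table with the bq CLI
# (dataset/table names are placeholders):
#   bq mk --table my_project:my_dataset.bqml_pipeline_logs \
#       time:TIMESTAMP,status:STRING,error:STRING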
ENABLE_BQ_LOGGING = False
# ERROR MONITORING: Sign up for the free Sendgrid API.
ENABLE_SENDGRID_EMAIL_REPORTING = False

# --------------------------OPTIONAL SECTION-------------------------

# (OPTIONAL) Workflow Logging - BQ details, if enabled
GCP_PROJECT_ID = ""
BQ_DATASET_NAME = ""
BQ_TABLE_NAME = ""


# (OPTIONAL) Email Reporting - Sendgrid details, if enabled
SENDGRID_API_KEY = ""
TO_EMAIL = ""


# (OPTIONAL) Email Reporting - Additional Parameters
FROM_EMAIL = "workflow@example.com"
SUBJECT = "FAILED: Audience Upload to GA"
HTML_CONTENT = """
               <p>
               Hi WorkflowUser, <br>
               Your BQML Custom Audience Upload has failed- <br>
               Time: {0} UTC <br>
               Reason: {1}
               </p>
               """

Step 3: Deploy the cloud function

Run the cell below. It takes 2-3 minutes and prompts for the following -

  • GCP PROJECT ID
  • FUNCTION NAME (any name you like)
  • Allow unauthenticated invocations of new function (y/N)? -> N

In [ ]:
%%shell
cd modem/bqml/pipeline
sh deploy.sh > upload.txt
cat upload.txt
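
To confirm the deployment, you can list the project's functions (the function name is whatever you entered at the prompt):

In [ ]:
!gcloud functions list --project=$GCP_PROJECT_ID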

Step 4: Test the function

Run the cell below. It extracts the function's trigger URL from the deploy output (upload.txt) and sends an authenticated test request.


In [ ]:
import re

functions_ui_url = "https://console.cloud.google.com/functions/list?project=" + GCP_PROJECT_ID
print("Cloud Functions UI: ", functions_ui_url)

# Pull the deployed function's trigger URL out of the deploy output.
FUNCTION_URL = re.findall(r'https://.*', open("modem/bqml/pipeline/upload.txt").read())[0]
print("Testing: ", FUNCTION_URL)

# Send an authenticated test request with the caller's identity token.
!curl $FUNCTION_URL -H "Authorization: Bearer $(gcloud auth print-identity-token)"
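
If you prefer testing from Python instead of curl, a minimal sketch using the requests library (preinstalled in Colab) and the same gcloud identity token:

In [ ]:
import subprocess
import requests

# Mint an identity token with the same credential curl used above.
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token"]).decode().strip()
resp = requests.get(FUNCTION_URL, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code, resp.text[:500])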

Step 5: Schedule the function using Cloud Scheduler

Specify the params and run the cell below -

  • JOB_NAME: Any name you like, e.g. "schedule_model_upload"
  • SCHEDULE: The schedule in cron format, e.g. "45 23 * * *" to run the job every day at 11:45 pm
  • TIMEZONE: The timezone, e.g. "EST", "PST", "CST" etc. for US time zones

In [ ]:
JOB_NAME="" #@param {type:"string", description:"ad"}
SCHEDULE="" #@param {type:"string"}
TIMEZONE="EST" #@param {type:"string"}
SERVICE_ACCOUNT_EMAIL=GCP_PROJECT_ID+"@appspot.gserviceaccount.com"

!gcloud scheduler jobs create http $JOB_NAME --schedule="$SCHEDULE" --uri="$FUNCTION_URL" --time-zone=$TIMEZONE --oidc-service-account-email=$SERVICE_ACCOUNT_EMAIL --attempt-deadline="540s"
scheduler_url = "https://console.cloud.google.com/cloudscheduler?project="+GCP_PROJECT_ID
print("Job scheduled. Check in the Cloud Scheduler UI: ", scheduler_url)