Step 0: Authenticate & Fetch Code


In [ ]:
GCP_PROJECT_ID = "" #@param {type:"string"}
INSTANCE_NAME = "" #@param {type:"string"}

In [ ]:
!gcloud auth login
!gcloud config set project $GCP_PROJECT_ID
!rm -rf modem && git clone https://github.com/google/modem.git

In [ ]:
import re
email=!gcloud config get-value account
SSH_DESTINATION = re.findall(r".*@",email[0])[0]+INSTANCE_NAME
SSH_DESTINATION = SSH_DESTINATION.replace('.','_')
print("SSH Destination: ", SSH_DESTINATION)
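The cell above builds the SSH destination by taking the local part of the signed-in account and replacing dots with underscores (gcloud's username convention). The same transformation can be sketched in plain shell; the account and instance name below are hypothetical:

```shell
email="john.doe@example.com"   # hypothetical account
INSTANCE_NAME="bqml-vm"        # hypothetical instance name
user="${email%%@*}"            # local part of the email
SSH_DESTINATION="$(echo "$user" | tr '.' '_')@$INSTANCE_NAME"
echo "$SSH_DESTINATION"        # → john_doe@bqml-vm
```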

In [ ]:
%%writefile enable_firewall.sh
exists_rule=$(gcloud compute firewall-rules list | grep "default-allow-ssh" | wc -l)
if [ "$exists_rule" -eq 0 ]
then
  gcloud compute firewall-rules create default-allow-ssh --allow tcp:22
  echo "New firewall rule created."
else
  echo "The correct firewall rules exist."
fi

Step 1: Copy the service key file text into the cell below.

Replace the existing content except the %%writefile header (line 1).


In [ ]:
%%writefile modem/bqml/pipeline/svc_key.json
{
  "TODO": "Replace file." 
}

Step 2: Fill out the parameters

GA_ACCOUNT_ID, GA_PROPERTY_ID, GA_DATASET_ID, BQML_PREDICT_QUERY (required)
Note: Please don't remove the string quotes.


In [ ]:
%%writefile modem/bqml/pipeline/params.py
# -------------------------MANDATORY SECTION------------------------

# GA Details 
GA_ACCOUNT_ID = ""
GA_PROPERTY_ID = ""
GA_DATASET_ID = "" 
GA_IMPORT_METHOD = "di" # "di" - Data Import or "mp" - Measurement Protocol 

# BQML Details -
# Ensure that the BQ result headers resemble the data import schema
# E.g. If data import schema looks like  - ga:clientId, ga:dimension1, etc.
# BQ result headers should look like ga_clientId, ga_dimension1, etc.
BQML_PREDICT_QUERY = """
                     """

# Options for logging & error monitoring
# LOGGING: Create BQ Table for logs with schema as follows -
# time TIMESTAMP, status STRING, error STRING
ENABLE_BQ_LOGGING = False
# ERROR MONITORING: Sign up for the free Sendgrid API.
ENABLE_SENDGRID_EMAIL_REPORTING = False

# --------------------------OPTIONAL SECTION-------------------------

# (OPTIONAL) Workflow Logging - BQ details, if enabled
GCP_PROJECT_ID = ""
BQ_DATASET_NAME = ""
BQ_TABLE_NAME = ""


# (OPTIONAL) Email Reporting - Sendgrid details, if enabled
SENDGRID_API_KEY = ""
TO_EMAIL = ""


# (OPTIONAL) Email Reporting - Additional Parameters
FROM_EMAIL = "workflow@example.com"
SUBJECT = "FAILED: Audience Upload to GA"
HTML_CONTENT = """
               <p>
               Hi WorkflowUser, <br>
               Your BQML Custom Audience Upload has failed- <br>
               Time: {0} UTC <br>
               Reason: {1}
               </p>
               """
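As the comments in params.py note, each GA Data Import column name maps to a BQ result header by swapping ':' for '_'. A quick shell check of that mapping, using the example column from the comment above:

```shell
# ga:clientId -> ga_clientId, ga:dimension1 -> ga_dimension1, etc.
mapped=$(echo "ga:clientId" | tr ':' '_')
echo "$mapped"    # → ga_clientId
```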

Change the cron schedule (line 1) to the desired frequency.


In [ ]:
%%writefile modem/bqml/pipeline/shell_scheduler.sh
cron_schedule="*/2 * * * *" 
echo "$cron_schedule cd ~/modem/bqml/pipeline && python main.py >> logs.csv" | crontab -
echo "Your workflow has been scheduled with the cron schedule of $cron_schedule. Enjoy!"
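The default schedule above runs every 2 minutes, which is handy for verifying the pipeline; for production you would typically lower the frequency. A few common five-field cron schedules (these exact values are illustrative, not required):

```shell
# minute hour day-of-month month day-of-week
every_2_min="*/2 * * * *"   # default above; good for testing
hourly="0 * * * *"          # top of every hour
daily_6am="0 6 * * *"       # 06:00 UTC every day
weekly_mon="0 6 * * 1"      # 06:00 UTC every Monday
# A valid schedule always has exactly five fields:
set -f                      # disable globbing so '*' is not expanded
set -- $daily_6am
echo "field count: $#"      # → field count: 5
```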

Step 3: Deploy the code

Run the cell below.


In [ ]:
!sh enable_firewall.sh
!gcloud compute scp --recurse modem $SSH_DESTINATION:~/
!echo "Code deployed."

Step 4: Test and schedule the code

When you run the cell below, an interactive shell opens up. Use the following commands -

  1. Test the code:
    cd modem/bqml/pipeline
    sh shell_deploy.sh
    If successful, you should see a SUCCESS message along with a timestamp; proceed to command 2.
    If there are errors, either work back through the colab from Step 1, fixing the parameters, or log into the Compute Engine instance and fix the errors there.
  2. Once the test succeeds, schedule the workflow:
    sh shell_scheduler.sh
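For reference, shell_scheduler.sh (from Step 2) installs a single crontab entry; assuming the default schedule, `crontab -l` inside the SSH session should afterwards show a line like:

```shell
*/2 * * * * cd ~/modem/bqml/pipeline && python main.py >> logs.csv
```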

In [ ]:
!gcloud compute ssh $SSH_DESTINATION