R Guide

The purpose of this guide is to give data scientists and engineers the resources they need to get started with R on Google Cloud Platform data products such as GCS, Cloud SQL, and BigQuery.

Google Cloud Storage

Refer to the googleCloudStorageR package introduction to learn more about the R package used to access GCS resources.


BigQuery

Refer to the bigrquery R package documentation to learn more about accessing BigQuery resources from R. The documentation also provides examples of making authenticated calls to the BigQuery service and of reading data from and writing data to BigQuery.
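Once authenticated (see the authentication section below), a typical bigrquery workflow runs a query, downloads the results as a data frame, and optionally uploads a data frame back to a table. The sketch below uses a placeholder billing project ID (my-gcp-project) and dataset name (my_dataset), and queries a BigQuery public dataset.


In [ ]:
library(bigrquery)

# Project that will be billed for the query (placeholder ID)
billing <- "my-gcp-project"

# Run a query against a public dataset and download the results
tb <- bq_project_query(
  billing,
  "SELECT name, SUM(number) AS total
     FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10"
)
df <- bq_table_download(tb)

# Write the data frame back to a table in your own dataset (placeholder names)
bq_table_upload(bq_table(billing, "my_dataset", "top_names"), values = df)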


Cloud SQL

Refer to GCP's Using R with Google Cloud SQL for MySQL guide for an introduction to accessing MySQL on a Cloud SQL instance.


Installing R packages

  • R packages can be installed from a JupyterLab notebook or from the command line. Below is an example of each using the standard CRAN repository.

  • To install an R package from the command line, use:
    R -e "install.packages('abind', dependencies=TRUE, repos='http://cran.rstudio.com/')"

  • To install an R package from within a JupyterLab notebook, run:
    install.packages('abind', dependencies=TRUE, repos='http://cran.rstudio.com/')
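The packages referenced in this guide can be installed in a single call using either method; for example, from a notebook:


In [ ]:
# Install the GCP-related packages used throughout this guide
install.packages(
  c('googleCloudStorageR', 'bigrquery', 'DBI', 'RMySQL'),
  dependencies = TRUE,
  repos = 'http://cran.rstudio.com/'
)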


Authenticating R connections to GCS, BigQuery and Cloud SQL

Google Cloud Storage

The best method for authentication is to use your own Google Cloud project. You can specify the location of a service account JSON file taken from that project:


In [ ]:
Sys.setenv("GCS_AUTH_FILE" = "/fullpath/to/auth.json")

This file will then be used for authentication via gcs_auth() when you load the library:


In [ ]:
## GCS_AUTH_FILE is set, so the package auto-authenticates on load
library(googleCloudStorageR)

gcs_get_bucket("your-bucket")
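Once authenticated, the same library can be used to list, upload, and download objects. The sketch below uses placeholder names for the bucket, the local file, and the object.


In [ ]:
# List the objects in the bucket
objects <- gcs_list_objects("your-bucket")

# Upload a local file to the bucket (placeholder file and object names)
gcs_upload("data.csv", bucket = "your-bucket", name = "data/data.csv")

# Download the object back to disk
gcs_get_object("data/data.csv",
               bucket = "your-bucket",
               saveToDisk = "data_copy.csv")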

BigQuery

When using bigrquery interactively, you’ll be prompted to authorize bigrquery in the browser. By default, your token is cached across sessions in the folder ~/.R/gargle/gargle-oauth/. For non-interactive usage, it is preferred to use a service account token and put it into force via bq_auth(path = "/path/to/your/service-account.json"); a minimal sketch follows the list below. More places to learn about auth:

  • Help for bigrquery::bq_auth().
  • How gargle gets tokens.
    • bigrquery obtains a token with gargle::token_fetch(), which supports a variety of token flows. This article provides full details, such as how to take advantage of Application Default Credentials or service accounts on GCE VMs.
  • Non-interactive auth. Explains how to set up a project when code must run without any user interaction.
  • How to get your own API credentials. Instructions for getting your own OAuth client (or “app”) or service account token. Note that bigrquery requests permission to modify your data, but it will never do so unless you explicitly request it (e.g. by calling bq_table_delete() or bq_table_upload()). See the bigrquery privacy policy for more information.
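For a non-interactive job (for example, a scheduled script), the service-account approach described above can be used as follows. This is a minimal sketch; the key path, project ID, and query are placeholders.


In [ ]:
library(bigrquery)

# Authenticate with a service account key instead of the browser OAuth flow
bq_auth(path = "/path/to/your/service-account.json")

# Subsequent calls run as the service account (placeholder project and query)
tb <- bq_project_query("my-gcp-project", "SELECT 1 AS ok")
bq_table_download(tb)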

Cloud SQL

Refer to the R with Cloud SQL for MySQL documentation.


In [ ]:
# Load the DBI library
library(DBI)
# Helper for getting new connection to Cloud SQL
getSqlConnection <- function(){
  con <-
    dbConnect(
      RMySQL::MySQL(),
      username = 'username',
      password = 'password',
      host = '127.0.0.1',   # assumes the Cloud SQL Auth Proxy is listening locally
      dbname = 'example'
    ) # TODO: move credentials into a DBI configuration group, e.g. group = "my-db"
  return(con)
}
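With the helper above, a typical session opens a connection, runs a query, and closes the connection when finished. This is a minimal sketch; the users table is a placeholder.


In [ ]:
con <- getSqlConnection()

# Inspect the schema and run a simple query (placeholder table name)
dbListTables(con)
users <- dbGetQuery(con, "SELECT * FROM users LIMIT 10")

# Release the connection when done
dbDisconnect(con)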