This notebook provides recipes for loading and saving data from external sources.

Local file system

Uploading files from your local file system

files.upload returns a dictionary of the files that were uploaded. The dictionary is keyed by file name; each value is the uploaded data as bytes.


In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
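Because the returned values are raw bytes, a common follow-up step is writing them to the VM's local disk so other tools can read them by path. A minimal sketch, using a stand-in dictionary in place of a real upload (in Colab you would use the `uploaded` dictionary from the cell above):

```python
import pathlib

# Stand-in for the dictionary returned by files.upload():
# file name -> raw bytes.
uploaded = {'notes.txt': b'hello from the browser'}

for name, data in uploaded.items():
  # Persist each uploaded file so it is readable by path.
  pathlib.Path(name).write_bytes(data)
  print('Saved {} ({} bytes)'.format(name, len(data)))
```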

Downloading files to your local file system

files.download will invoke a browser download of the file to the user's local computer.


In [0]:
from google.colab import files

with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')
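files.download fetches one file per call; when you need several files, one convenient approach is to bundle them into a single zip archive first and download that. A sketch using the standard-library zipfile module (the file names here are illustrative; in Colab you would follow this with files.download('results.zip')):

```python
import zipfile

# Create a couple of files to bundle (illustrative names).
for name in ['a.txt', 'b.txt']:
  with open(name, 'w') as f:
    f.write('contents of ' + name)

# Bundle them into one archive so a single files.download()
# call fetches everything.
with zipfile.ZipFile('results.zip', 'w') as zf:
  for name in ['a.txt', 'b.txt']:
    zf.write(name)

print(zipfile.ZipFile('results.zip').namelist())
```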

Google Drive

You can access files in Drive in a number of ways, including:

  1. Using the native REST API;
  2. Using a wrapper around the API such as PyDrive; or
  3. Mounting your Google Drive in the runtime's virtual machine.

Examples of each are below.

Mounting Google Drive locally

The example below shows how to mount your Google Drive in your virtual machine using an authorization code, and shows a couple of ways to write & read files there. Once executed, observe that the new file (foo.txt) is visible in https://drive.google.com/

Note that this only supports reading and writing files; to programmatically change sharing settings or other metadata, use one of the other options below.


In [0]:
from google.colab import drive
drive.mount('/content/gdrive')


Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code
Enter your authorization code:
··········
Mounted at /content/gdrive

In [0]:
with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/gdrive/My\ Drive/foo.txt


Hello Google Drive!
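Because the mount behaves like an ordinary directory, reading the file back requires nothing Drive-specific; plain Python file I/O works. A sketch, using a local stand-in directory for the mount point so it runs outside Colab (in Colab the prefix would be '/content/gdrive/My Drive'):

```python
import os

# Stand-in for the Drive mount point; in Colab this would be
# '/content/gdrive/My Drive'.
mount_dir = 'gdrive_stand_in'
os.makedirs(mount_dir, exist_ok=True)

path = os.path.join(mount_dir, 'foo.txt')
with open(path, 'w') as f:
  f.write('Hello Google Drive!')

# Reading back uses ordinary file I/O -- no Drive API calls needed.
with open(path) as f:
  print(f.read())
```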

PyDrive

The example below shows 1) authentication, 2) file upload, and 3) file download. More examples are available in the PyDrive documentation.


In [0]:
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html

# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))


Uploaded file with ID 14vDAdqp7BSCQnoougmgylBexIr2AQx2T
Downloaded content "Sample upload file content"

Drive REST API

The first step is to authenticate.


In [0]:
from google.colab import auth
auth.authenticate_user()

Now we can construct a Drive API client.


In [0]:
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

With the client created, we can use any of the functions in the Google Drive API reference. Examples follow.
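For instance, files().list accepts a q parameter written in the Drive search syntax, which quotes string values with single quotes and escapes embedded ones with a backslash. A small helper to build such a query — an illustration of the documented syntax, not part of the client library:

```python
def drive_name_query(name):
  # Drive's search syntax quotes string values with single
  # quotes and escapes embedded quotes with a backslash.
  escaped = name.replace("\\", "\\\\").replace("'", "\\'")
  return "name = '{}'".format(escaped)

print(drive_name_query("Bob's report.txt"))
# In Colab this would be used as, e.g.:
#   drive_service.files().list(
#       q=drive_name_query("Bob's report.txt"),
#       fields='files(id, name)').execute()
```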

Creating a new Drive file with data from Python


In [0]:
# Create a local file to upload.
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt


/tmp/to_upload.txt contains:
my sample file

In [0]:
# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt', 
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))


File ID: 1Cw9CqiyU6zbXFD9ViPZu_3yX-sYF4W17

After executing the cell above, a new file named 'Sample file' will appear in your drive.google.com file list. Your file ID will differ since you will have created a new, distinct file from the example above.

Downloading data from a Drive file into Python


In [0]:
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while not done:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))


Downloaded file contents are: my sample file
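In practice, file IDs are often copied out of a Drive sharing link rather than typed by hand. A small helper for pulling the ID out of the two common link shapes (.../file/d/&lt;ID&gt;/... and ...?id=&lt;ID&gt;) — a convenience sketch, not part of the Drive API:

```python
import re

def drive_file_id(url):
  # Handles the two common Drive link shapes:
  #   .../file/d/<ID>/...   and   ...?id=<ID>
  m = (re.search(r'/file/d/([-\w]+)', url)
       or re.search(r'[?&]id=([-\w]+)', url))
  return m.group(1) if m else None

print(drive_file_id(
    'https://drive.google.com/file/d/1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz/view'))
```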

Google Sheets

Our examples below will use the existing open-source gspread library for interacting with Sheets.

First, we'll install the package using pip.


In [0]:
!pip install --upgrade -q gspread

Next, we'll import the library, authenticate, and create the interface to sheets.


In [0]:
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

Below is a small set of gspread examples. Additional examples are shown on the gspread GitHub page.

Creating a new sheet with data from Python


In [0]:
sh = gc.create('A new spreadsheet')

After executing the cell above, a new spreadsheet will be shown in your sheets list on sheets.google.com.


In [0]:
# Open our new sheet and add some data.
worksheet = gc.open('A new spreadsheet').sheet1

cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

After executing the cell above, the sheet will be populated with random numbers in the assigned range.
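Note that worksheet.range('A1:C2') returns the six cells as a flat, row-major list; reshaping such a flat list back into rows is a handy one-liner. A sketch with plain values standing in for gspread Cell objects:

```python
# Flat, row-major values standing in for the six Cell objects
# of an 'A1:C2' range (2 rows x 3 columns).
flat = [10, 5, 6, 9, 6, 2]
ncols = 3

# Group the flat list into consecutive rows of ncols values.
rows = [flat[i:i + ncols] for i in range(0, len(flat), ncols)]
print(rows)
# -> [[10, 5, 6], [9, 6, 2]]
```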

Downloading data from a sheet into Python as a Pandas DataFrame

We'll read back the data that we inserted above and convert the result into a Pandas DataFrame.

(The data you observe will differ since the contents of each cell are random numbers.)


In [0]:
# Open our new sheet and read some data.
worksheet = gc.open('A new spreadsheet').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)


[['10', '5', '6'], ['9', '6', '2']]
Out[0]:
0 1 2
0 10 5 6
1 9 6 2
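Note that get_all_values returns every cell as a string; if the cells are numeric, convert them after loading. A sketch with the rows shown above hard-coded (in the notebook they would come from get_all_values):

```python
import pandas as pd

# Rows as returned by get_all_values(): every cell is a string.
rows = [['10', '5', '6'], ['9', '6', '2']]

# pd.to_numeric is applied column-wise, converting each
# string column to a numeric dtype.
df = pd.DataFrame.from_records(rows).apply(pd.to_numeric)
print(df.iloc[0, 0] + df.iloc[1, 0])
# -> 19
```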

Google Cloud Storage (GCS)

We'll start by authenticating to GCS and creating the service client.


In [0]:
from google.colab import auth
auth.authenticate_user()

Upload a file from Python to a GCS bucket

We'll start by creating the sample file to be uploaded.


In [0]:
# Create a local file to upload.
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt


/tmp/to_upload.txt contains:
my sample file

Next, we'll upload the file using the gsutil command, which is included by default on Colab backends.


In [0]:
# First, we need to set our project. Replace the assignment below
# with your project ID.
project_id = 'Your_project_ID_here'

In [0]:
!gcloud config set project {project_id}


Updated property [core/project].

In [0]:
import uuid

# Make a unique bucket to which we'll upload the file.
# (GCS buckets are part of a single global namespace.)
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/mb
!gsutil mb gs://{bucket_name}


Creating gs://colab-sample-bucket-44971372-baaf-11e7-ae30-0242ac110002/...
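Because gsutil mb fails on an invalid name, a quick client-side sanity check can save a round trip. A simplified sketch based on the documented GCS naming rules — 3-63 characters; lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit (dotted names have extra rules this check ignores):

```python
import re

def plausible_bucket_name(name):
  # Simplified check of the documented GCS bucket naming rules:
  # 3-63 chars of lowercase letters, digits, '-', '_' and '.',
  # starting and ending with a letter or digit.
  return bool(re.fullmatch(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]', name))

print(plausible_bucket_name('colab-sample-bucket-1234'))  # True
print(plausible_bucket_name('Bad_Bucket'))                # False: uppercase
```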

In [0]:
# Copy the file to our new bucket.
# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/cp
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/


Copying file:///tmp/to_upload.txt [Content-Type=text/plain]...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       

In [0]:
# Finally, dump the contents of our newly copied file to make sure everything worked.
!gsutil cat gs://{bucket_name}/to_upload.txt


my sample file

Using Python

This section demonstrates how to upload files using the native Python API rather than gsutil.

This snippet is based on a larger example with additional uses of the API.


In [0]:
# The first step is to create a bucket in your cloud project.
#
# Replace the assignment below with your cloud project ID.
#
# For details on cloud projects, see:
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'Your_project_ID_here'

In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()

# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

# Generate a random bucket name to which we'll upload the file.
import uuid
bucket_name = 'colab-sample-bucket' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')


Done

The cell below uploads the file to our newly created bucket.


In [0]:
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('/tmp/to_upload.txt', 
                        mimetype='text/plain',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name, 
                                       name='to_upload.txt',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')


Upload complete

Once the upload has finished, the data will appear in the cloud console storage browser for your project:

https://console.cloud.google.com/storage/browser?project=YOUR_PROJECT_ID_HERE

Downloading a file from GCS to Python

Next, we'll download the file we just uploaded in the example above. It's as simple as reversing the order of the arguments to the gsutil cp command.


In [0]:
# Download the file.
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt
  
# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt


Copying gs://colab-sample-bucket483f20dc-baaf-11e7-ae30-0242ac110002/to_upload.txt...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       
my sample file

Using Python

We repeat the download example above using the native Python API.


In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()

# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

from googleapiclient.http import MediaIoBaseDownload

with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')


Download complete

In [0]:
# Inspect the file we downloaded to /tmp
!cat /tmp/downloaded_from_gcs.txt
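A quick way to confirm the round trip preserved the data is to compare the source and the downloaded copy byte for byte. A sketch using the standard-library filecmp module, with stand-in files so it runs anywhere (in the notebook the paths would be /tmp/to_upload.txt and /tmp/downloaded_from_gcs.txt):

```python
import filecmp

# Stand-ins for the uploaded source and the downloaded copy.
with open('source.txt', 'w') as f:
  f.write('my sample file')
with open('copy.txt', 'w') as f:
  f.write('my sample file')

# shallow=False forces a byte-for-byte comparison instead of
# a stat()-based one.
print(filecmp.cmp('source.txt', 'copy.txt', shallow=False))  # True
```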