This notebook provides recipes for loading and saving data from external sources.
In [0]:
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
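The `uploaded` dict maps each filename to its raw bytes, so you can also write any uploaded file back out to the local filesystem. A minimal sketch, using a stand-in dict (hypothetical contents) in place of the real `files.upload()` result:

```python
# Stand-in for the result of files.upload(); the real dict has the same shape.
uploaded = {'data.csv': b'a,b\n1,2\n'}

# Persist each uploaded file to the local disk.
for name, data in uploaded.items():
  with open(name, 'wb') as f:
    f.write(data)

# Verify the round trip.
with open('data.csv', 'rb') as f:
  on_disk = f.read()
print(on_disk == uploaded['data.csv'])  # True
```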
In [0]:
from google.colab import files
with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')
You can access files in Drive in a number of ways, including:
- Mounting your Google Drive in the runtime's virtual machine
- Using a wrapper around the API such as PyDrive
- Using the native REST API
Examples of each are below.
The example below shows how to mount your Google Drive in your virtual machine using an authorization code, and shows a couple of ways to write & read files there. Once executed, observe that the new file (foo.txt) is visible in https://drive.google.com/
Note that this only supports reading and writing files; to programmatically change sharing settings, etc., use one of the other options below.
In [0]:
from google.colab import drive
drive.mount('/content/gdrive')
In [0]:
with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/gdrive/My\ Drive/foo.txt
The example below shows 1) authentication, 2) file upload, and 3) file download. More examples are available in the PyDrive documentation.
In [0]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
In [0]:
from google.colab import auth
auth.authenticate_user()
Now we can construct a Drive API client.
In [0]:
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
With the client created, we can use any of the functions in the Google Drive API reference. Examples follow.
In [0]:
# Create a local file to upload.
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt
In [0]:
# Upload the file to Drive. See:
#
# https://developers.google.com/drive/v3/reference/files/create
# https://developers.google.com/drive/v3/web/manage-uploads
from googleapiclient.http import MediaFileUpload
file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))
After executing the cell above, a new file named 'Sample file' will appear in your drive.google.com file list. Your file ID will differ since you will have created a new, distinct file from the example above.
In [0]:
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while not done:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
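Note that `downloaded.read()` returns raw bytes, so the printed value carries a `b'...'` prefix; decode it to get a string. A minimal sketch using an in-memory `BytesIO` buffer in place of the actual Drive download:

```python
import io

# Simulate the buffer that MediaIoBaseDownload fills.
downloaded = io.BytesIO()
downloaded.write(b'my sample file')

# Rewind and decode the bytes into a str.
downloaded.seek(0)
text = downloaded.read().decode('utf-8')
print(text)  # my sample file
```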
Our examples below will use the existing open-source gspread library for interacting with Sheets.
First, we'll install the package using pip.
In [0]:
!pip install --upgrade -q gspread
Next, we'll import the library, authenticate, and create the interface to sheets.
In [0]:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
Below is a small set of gspread examples. Additional examples are shown on the gspread GitHub page.
In [0]:
sh = gc.create('A new spreadsheet')
After executing the cell above, a new spreadsheet will be shown in your sheets list on sheets.google.com.
In [0]:
# Open our new sheet and add some data.
worksheet = gc.open('A new spreadsheet').sheet1
cell_list = worksheet.range('A1:C2')
import random
for cell in cell_list:
  cell.value = random.randint(1, 10)
worksheet.update_cells(cell_list)
After executing the cell above, the sheet will be populated with random numbers in the assigned range.
We'll read back the data that we inserted above and convert the result into a Pandas DataFrame.
(The data you observe will differ since the contents of each cell are random numbers.)
In [0]:
# Open our new sheet and read some data.
worksheet = gc.open('A new spreadsheet').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
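One caveat: `get_all_values` returns every cell as a string, so the resulting DataFrame columns hold strings rather than numbers. A minimal sketch of converting them, using hypothetical values in the same shape gspread returns:

```python
# Hypothetical rows in the shape returned by worksheet.get_all_values().
rows = [['3', '7', '1'], ['9', '2', '5']]

# Convert each cell from str to int before doing arithmetic.
numeric = [[int(v) for v in row] for row in rows]
print(numeric)  # [[3, 7, 1], [9, 2, 5]]
```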
In [0]:
from google.colab import auth
auth.authenticate_user()
In [0]:
# Create a local file to upload.
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')
print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt
Next, we'll upload the file using the gsutil command, which is included by default on Colab backends.
In [0]:
# First, we need to set our project. Replace the assignment below
# with your project ID.
project_id = 'Your_project_ID_here'
In [0]:
!gcloud config set project {project_id}
In [0]:
import uuid
# Make a unique bucket to which we'll upload the file.
# (GCS buckets are part of a single global namespace.)
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/mb
!gsutil mb gs://{bucket_name}
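Because the bucket namespace is global, the `uuid` suffix is what keeps our bucket name from colliding with buckets in other projects. A quick sketch showing that `uuid.uuid1()` yields a distinct suffix on each call:

```python
import uuid

# Two names generated the same way as above; the uuid suffix differs each time.
a = 'colab-sample-bucket-' + str(uuid.uuid1())
b = 'colab-sample-bucket-' + str(uuid.uuid1())
print(a != b)  # True
```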
In [0]:
# Copy the file to our new bucket.
# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/cp
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/
In [0]:
# Finally, dump the contents of our newly copied file to make sure everything worked.
!gsutil cat gs://{bucket_name}/to_upload.txt
This section demonstrates how to upload files using the native Python API rather than gsutil.
This snippet is based on a larger example with additional uses of the API.
In [0]:
# The first step is to create a bucket in your cloud project.
#
# Replace the assignment below with your cloud project ID.
#
# For details on cloud projects, see:
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'Your_project_ID_here'
In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()
# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')
# Generate a random bucket name to which we'll upload the file.
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')
The cell below uploads the file to our newly created bucket.
In [0]:
from googleapiclient.http import MediaFileUpload
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
request = gcs_service.objects().insert(bucket=bucket_name,
                                       name='to_upload.txt',
                                       media_body=media)
response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()
print('Upload complete')
Once the upload has finished, the data will appear in the cloud console storage browser for your project:
https://console.cloud.google.com/storage/browser?project=YOUR_PROJECT_ID_HERE
In [0]:
# Download the file.
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt
# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt
We repeat the download example above using the native Python API.
In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()
# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')
from googleapiclient.http import MediaIoBaseDownload
with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)
  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()
print('Download complete')
In [0]:
# Inspect the file we downloaded to /tmp
!cat /tmp/downloaded_from_gcs.txt