In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Caution: In addition to Python packages, this notebook uses npm install --user to install packages. Be careful when running locally.
This tutorial shows how to read and write files on Azure Blob Storage with TensorFlow, through TensorFlow IO's Azure file system integration.
An Azure storage account is needed to read and write files on Azure Blob Storage. The Azure Storage Key should be provided through the environment variable:
os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
The storage account name and container name are part of the filename URI:
az://<storage-account-name>/<container-name>/<path>
In this tutorial, for demo purposes we also provide an optional setup of Azurite, which is an Azure Storage emulator. With the Azurite emulator it is possible to read and write files through the Azure Blob Storage interface with TensorFlow.
In [2]:
try:
  %tensorflow_version 2.x
except Exception:
  pass

!pip install tensorflow-io
In [3]:
!npm install azurite@2.7.0
In [4]:
# The path for npm-installed binaries might not be exposed in the PATH env;
# you can find it through the 'npm bin' command.
npm_bin_path = get_ipython().getoutput('npm bin')[0]
print('npm bin path: ', npm_bin_path)
# Run `azurite-blob -s` as a background process.
# IPython doesn't recognize `&` in inline bash cells.
get_ipython().system_raw(npm_bin_path + '/' + 'azurite-blob -s &')
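Azurite may take a moment to start accepting connections. A quick sanity check (a sketch, assuming Azurite's default blob endpoint of 127.0.0.1:10000) is to poll the port before moving on:
In [0]:
import socket
import time

# Poll Azurite's default blob port until it accepts connections,
# or give up after roughly 10 seconds.
for _ in range(20):
  with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    if s.connect_ex(('127.0.0.1', 10000)) == 0:
      print('Azurite is listening on port 10000')
      break
  time.sleep(0.5)
else:
  print('Azurite does not appear to be listening on port 10000')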
The following is an example of reading and writing files to Azure Storage with TensorFlow's API. It behaves the same way as other file systems (e.g., POSIX or GCS) in TensorFlow once the tensorflow-io package is imported, as tensorflow-io will automatically register the az scheme for use.

The Azure Storage Key should be provided through the TF_AZURE_STORAGE_KEY environment variable. Otherwise, TF_AZURE_USE_DEV_STORAGE can be set to 1 to use the Azurite emulator instead:
In [0]:
import os
import tensorflow as tf
import tensorflow_io as tfio

# Switch to False to use Azure Storage instead:
use_emulator = True

if use_emulator:
  os.environ['TF_AZURE_USE_DEV_STORAGE'] = '1'
  account_name = 'devstoreaccount1'
else:
  # Replace <key> with Azure Storage Key, and <account> with Azure Storage Account
  os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
  account_name = '<account>'
In [6]:
pathname = 'az://{}/aztest'.format(account_name)
tf.io.gfile.mkdir(pathname)

filename = pathname + '/hello.txt'
with tf.io.gfile.GFile(filename, mode='w') as w:
  w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
  print(r.read())
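Beyond reading and writing, other tf.io.gfile operations work against az:// paths in the same way. The following sketch (assuming the pathname and filename from the cell above are still available) lists the directory, inspects the file, and then cleans up:
In [0]:
# List the directory and check the file we just wrote.
print(tf.io.gfile.listdir(pathname))
print(tf.io.gfile.exists(filename))
print(tf.io.gfile.stat(filename).length)

# Clean up the demo file and directory.
tf.io.gfile.remove(filename)
tf.io.gfile.rmtree(pathname)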
Configurations of Azure Blob Storage in TensorFlow are always done through environment variables. Below is a complete list of available configurations:

- TF_AZURE_USE_DEV_STORAGE: Set to 1 to use the local development storage emulator for connections like 'az://devstoreaccount1/container/file.txt'. This will take precedence over all other settings, so unset it to use any other connection.
- TF_AZURE_STORAGE_KEY: Account key for the storage account in use.
- TF_AZURE_STORAGE_USE_HTTP: Set to any value if you don't want to use https transfer. Unset it to use the default of https.
- TF_AZURE_STORAGE_BLOB_ENDPOINT: Set to the endpoint of blob storage - the default is .core.windows.net.
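As an illustration, a hypothetical configuration for a non-default deployment might look like the following; the key and endpoint values are placeholders, not real settings:
In [0]:
import os

# Make sure the emulator setting is unset, since it takes precedence.
os.environ.pop('TF_AZURE_USE_DEV_STORAGE', None)

# Placeholder values -- replace with your own storage account's settings.
os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
os.environ['TF_AZURE_STORAGE_USE_HTTP'] = '1'  # any value disables https
os.environ['TF_AZURE_STORAGE_BLOB_ENDPOINT'] = '<custom-endpoint>'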