In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
This tutorial demonstrates the tfio.genome package, which provides commonly used genomics I/O functionality: reading several genomics file formats and performing common data-preparation operations (for example, one-hot encoding sequences or converting Phred quality scores into probabilities).
This package uses the Google Nucleus library to provide some of the core functionality.
In [0]:
try:
  %tensorflow_version 2.x
except Exception:
  pass
!pip install tensorflow-io
In [0]:
import tensorflow_io as tfio
import tensorflow as tf
In [0]:
# Download some sample data:
!curl -OL https://raw.githubusercontent.com/tensorflow/io/master/tests/test_genome/test.fastq
In [0]:
fastq_data = tfio.genome.read_fastq(filename="test.fastq")
print(fastq_data.sequences)
print(fastq_data.raw_quality)
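For orientation, a FASTQ file conventionally stores each read as a four-line record: a header starting with `@`, the base sequence, a separator starting with `+`, and a quality string with one character per base. The sketch below is a minimal pure-Python reader for that layout. It is not how tfio.genome.read_fastq is implemented, and the helper name `parse_fastq` and the sample records are hypothetical.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), sequence, quality


# Hypothetical two-record sample, not taken from test.fastq.
sample = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!!!\n"
records = list(parse_fastq(io.StringIO(sample)))
print(records)
```

The two records have different lengths, which is why a reader of such a file has to cope with variable-length sequences and quality strings.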
The returned fastq_data has two fields: fastq_data.sequences, a string tensor holding every sequence in the FASTQ file (each sequence can have a different length), and fastq_data.raw_quality, which holds the Phred-encoded quality of each base read in the corresponding sequence.
A helper op converts this quality information into probabilities when needed.
In [0]:
quality = tfio.genome.phred_sequences_to_probability(fastq_data.raw_quality)
print(quality.shape)
print(quality.row_lengths().numpy())
print(quality)
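To make the conversion concrete, here is a minimal pure-Python sketch of the usual Phred arithmetic. It assumes the common Sanger convention (Phred+33), in which a quality character `c` encodes the score Q = ord(c) - 33 and the error probability of the base call is 10^(-Q/10). Whether the tfio op uses exactly this offset and definition is an assumption here, and the helper name `phred_to_probabilities` is hypothetical.

```python
def phred_to_probabilities(quality, offset=33):
    """Convert one Phred-encoded quality string into per-base error probabilities.

    Assumes Phred+33 (Sanger) encoding unless another offset is passed.
    """
    return [10.0 ** (-(ord(ch) - offset) / 10.0) for ch in quality]


# '!', '+', '5', 'I' encode Q = 0, 10, 20, 40 under Phred+33,
# i.e. error probabilities of about 1, 0.1, 0.01 and 0.0001.
probs = phred_to_probabilities("!+5I")
print(probs)
```

Because each quality string yields one probability per base, reads of different lengths produce rows of different lengths, consistent with the row_lengths() call in the cell above.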
In [0]:
one_hot = tfio.genome.sequences_to_onehot(fastq_data.sequences)
print(one_hot)
print(one_hot.shape)
In [0]:
print(tfio.genome.sequences_to_onehot.__doc__)
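The docstring printed above is the authoritative description of the library's mapping. As a rough illustration of the idea only, the sketch below one-hot encodes a single sequence in pure Python. The channel order `ACGT`, the choice to map unrecognized characters to an all-zero row, and the helper name `one_hot_sequence` are assumptions made for this example and may differ from what tfio.genome.sequences_to_onehot does.

```python
BASES = "ACGT"  # assumed channel order for this sketch


def one_hot_sequence(sequence, bases=BASES):
    """Return one row per base, with a 1 in the column of the matching nucleotide."""
    index = {base: i for i, base in enumerate(bases)}
    encoded = []
    for ch in sequence.upper():
        row = [0] * len(bases)
        if ch in index:
            row[index[ch]] = 1
        # Characters outside `bases` keep an all-zero row (a choice made for this sketch).
        encoded.append(row)
    return encoded


encoded = one_hot_sequence("GATN")
print(encoded)
```

A sequence of length n becomes an n x 4 matrix, so a batch of reads with different lengths produces matrices with different numbers of rows.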