In [0]:
import pandas as pd
import tensorflow as tf
from sklearn.utils import shuffle
from google.cloud import bigquery
In [0]:
# Setting a random seed in TensorFlow
# Do this before you run training to ensure reproducible evaluation metrics
# You can use whatever value you'd like for the seed
tf.random.set_seed(2)
You also need to consider randomness when splitting your data into training, validation, and test sets. To make the split consistent across runs, shuffle the dataset with a fixed random seed before splitting it.
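Here is a minimal sketch of that idea, assuming your examples are already in a pandas DataFrame. It shuffles once with a fixed seed and then slices the shuffled frame into training, validation, and test sets. The helper name `split_with_seed` and the 80/10/10 fractions are illustrative assumptions. It also uses pandas' `DataFrame.sample(frac=1, random_state=...)` in place of the `sklearn.utils.shuffle` call used elsewhere in this notebook, so the example depends only on pandas.

```python
import pandas as pd


def split_with_seed(df, seed, train_frac=0.8, val_frac=0.1):
    """Hypothetical helper: shuffle df reproducibly, then split it three ways.

    The same seed always yields the same shuffled order, so the same rows
    land in the same split every time the notebook runs.
    """
    # frac=1 returns every row, in a random order determined by random_state
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    test = shuffled.iloc[n_train + n_val:]  # whatever rows remain
    return train, val, test


# Toy data standing in for a real dataset
toy = pd.DataFrame({"event_id": range(100), "value": [i * 0.5 for i in range(100)]})

# Two calls with the same seed produce identical splits
train_a, val_a, test_a = split_with_seed(toy, seed=2)
train_b, val_b, test_b = split_with_seed(toy, seed=2)
print(train_a.equals(train_b), len(train_a), len(val_a), len(test_a))
```

Shuffling once and then slicing guarantees the three sets never overlap, and changing only the seed gives a different, but equally repeatable, split.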
First, let's look at an example of shuffling without a random seed. We'll grab some data from the NOAA historic severe storms public dataset in BigQuery. You'll need a Google Cloud account to run the cells that use this dataset.
In [0]:
from google.colab import auth
auth.authenticate_user()
Replace your-cloud-project below with the ID of your Google Cloud project.
In [0]:
%%bigquery storms_df --project your-cloud-project
SELECT
*
FROM
`bigquery-public-data.noaa_historic_severe_storms.storms_*`
LIMIT 1000
Run the cell below multiple times, and notice that the order of the data changes each time.
In [9]:
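# No seed: the row order differs on every run.
# Assign to a new name so storms_df keeps the order it was loaded in.
unseeded_df = shuffle(storms_df)
unseeded_df.head()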
Next, repeat the shuffle with a fixed random seed. This time the row order stays the same no matter how many times you run the cell.
In [16]:
shuffled_df = shuffle(storms_df, random_state=2)
shuffled_df.head()
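To see why the seeded cell is repeatable, it helps to think of a seeded shuffle as a fixed permutation of row positions. The sketch below uses a hypothetical `seeded_shuffle` helper built on NumPy's `RandomState`; it is an illustration of the idea, not scikit-learn's actual implementation. It shows that the same seed produces the same permutation, and that the result therefore also depends on the order of the input. For a repeatable result, apply the seed to an input whose order is itself fixed, for example the frame as loaded rather than one an unseeded cell has already reshuffled. Note that in BigQuery, a query without ORDER BY doesn't guarantee row order either.

```python
import numpy as np
import pandas as pd


def seeded_shuffle(df, seed):
    """Hypothetical helper: reorder rows by a permutation drawn from a seeded RNG."""
    rng = np.random.RandomState(seed)
    positions = rng.permutation(len(df))  # same seed -> same permutation
    return df.iloc[positions]


toy = pd.DataFrame({"event_id": list(range(10))})

first = seeded_shuffle(toy, seed=2)
second = seeded_shuffle(toy, seed=2)
print(first.equals(second))  # same input + same seed -> identical order

# Same seed, but the input arrives in a different (reversed) order.
reversed_toy = toy.iloc[::-1]
third = seeded_shuffle(reversed_toy, seed=2)
# The permutation of positions is unchanged, so different rows end up
# at each position when the input order differs.
print(first["event_id"].tolist())
print(third["event_id"].tolist())
```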
Next, count the number of storms recorded in each year to see whether the data changes over time.
In [0]:
%%bigquery storm_trends --project your-cloud-project
SELECT
SUBSTR(CAST(event_begin_time AS string), 1, 4) AS year,
COUNT(*) AS num_storms
FROM
`bigquery-public-data.noaa_historic_severe_storms.storms_*`
GROUP BY
year
ORDER BY
year ASC
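If the raw rows are already in a DataFrame, the same per-year count can be sketched in pandas. This assumes an `event_begin_time` column that pandas can parse as datetimes (the sample timestamps below are made up). It extracts the year with the `.dt.year` accessor instead of taking the first four characters of a string as the SQL does, so `year` comes out as an integer rather than a string.

```python
import pandas as pd

# Made-up rows standing in for storm records; only the timestamp matters here
rows = pd.DataFrame({
    "event_begin_time": pd.to_datetime([
        "1996-05-01 14:00", "1996-07-12 03:30", "2005-03-08 18:45",
        "2005-06-21 09:10", "2005-08-30 22:05", "2019-04-14 11:00",
    ])
})

trends = (
    rows.assign(year=rows["event_begin_time"].dt.year)
    .groupby("year")
    .size()                          # rows per year, like COUNT(*) ... GROUP BY year
    .reset_index(name="num_storms")
    .sort_values("year")             # like ORDER BY year ASC
)
print(trends)
```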
In [18]:
storm_trends.head()
As the plot below shows, the number of storms recorded per year has changed substantially over time, so a model trained only on data from before 2000 would likely make poor predictions about storms today.
In [22]:
storm_trends.plot(title='Storm trends over time', x='year', y='num_storms')
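To back up the visual impression with a number, one option is to compare the average yearly count before and after 2000. The sketch below assumes a frame shaped like storm_trends (a string `year` column and a `num_storms` column, as the query above produces). The helper `mean_storms_before_after` is hypothetical, and the counts in the example are invented placeholders, not real NOAA figures. A large gap between the two averages is one concrete sign that a model fit only on the earlier years would see a very different distribution from today's.

```python
import pandas as pd


def mean_storms_before_after(trends, cutoff_year=2000):
    """Hypothetical helper: average yearly storm count before vs. from cutoff_year on."""
    valid = trends.dropna(subset=["year"])  # skip rows whose year couldn't be derived
    years = valid["year"].astype(int)       # the SQL returns year as a string
    before = valid.loc[years < cutoff_year, "num_storms"].mean()
    after = valid.loc[years >= cutoff_year, "num_storms"].mean()
    return before, after


# Invented placeholder counts, only to show the shape of the data
example = pd.DataFrame({
    "year": ["1990", "1995", "2005", "2010"],
    "num_storms": [100, 200, 1000, 1200],
})
before, after = mean_storms_before_after(example)
print(f"mean per year before 2000: {before:.0f}, from 2000 on: {after:.0f}")
```

On the real data you would call `mean_storms_before_after(storm_trends)`.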
Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.