Fairness Indicators on TF-Hub Text Embeddings

In this colab, you will learn how to use Fairness Indicators to evaluate embeddings from TF Hub. Fairness Indicators is a suite of tools that facilitates evaluation and visualization of fairness metrics on machine learning models. Fairness Indicators is built on top of TensorFlow Model Analysis, TensorFlow's official model evaluation library.

Imports


In [0]:
!pip install fairness-indicators \
  "absl-py==0.8.0" \
  "pyarrow==0.15.1" \
  "apache-beam==2.17.0" \
  "avro-python3==1.9.1"

In [0]:
import os
import tempfile
import apache_beam as beam
from datetime import datetime
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.view import widget_view
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from fairness_indicators import example_model
from fairness_indicators.examples import util

Defining Constants

TensorFlow parses features from data using tf.io.FixedLenFeature and tf.io.VarLenFeature. To let TensorFlow parse our data, we need to define a feature map covering our input feature, our output feature, and any slicing features that we want to analyze via Fairness Indicators. A small sketch after the next cell shows how this feature map is used to parse a serialized example.


In [0]:
BASE_DIR = tempfile.gettempdir()

# The input and output features of the classifier
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'

FEATURE_MAP = {
    # input and output features
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    # slicing features
    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string)
}

IDENTITY_TERMS = ['gender', 'sexual_orientation', 'race', 'religion', 'disability']
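
To see the feature map in action, here is a small, optional sketch that builds a single tf.train.Example by hand and parses it with FEATURE_MAP. The feature values are invented purely for illustration. Note that FixedLenFeature entries come back as dense tensors, while VarLenFeature entries come back as SparseTensors (and are simply empty when an identity feature is absent).

# Optional sketch: parse one hand-built example with FEATURE_MAP.
# The feature values here are made up for illustration only.
example = tf.train.Example(features=tf.train.Features(feature={
    TEXT_FEATURE: tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'this is a comment'])),
    LABEL: tf.train.Feature(
        float_list=tf.train.FloatList(value=[0.0])),
    'gender': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'female'])),
}))

parsed = tf.io.parse_single_example(example.SerializeToString(), FEATURE_MAP)
print(parsed[LABEL])      # dense scalar tensor (FixedLenFeature)
print(parsed['gender'])   # SparseTensor (VarLenFeature)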

Data

In this exercise, we'll work with the Civil Comments dataset: approximately 2 million comments made public by the Civil Comments platform in 2017 to support ongoing research. The effort was sponsored by Jigsaw, who have hosted competitions on Kaggle to help classify toxic comments and to minimize unintended model bias.

Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

You can choose to download the original dataset and process it in the colab, which may take several minutes, or you can download the preprocessed data.


In [0]:
download_original_data = True

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms will be grouped into their categories using a
  # threshold of 0.5. Only the identity term columns, the text column,
  # and the label column are kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
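
As a quick sanity check, we can parse a single record from the validation file using the FEATURE_MAP defined earlier. This optional sketch assumes eager execution (the TF 2.x default) so the tf.data pipeline can be iterated directly in Python.

# Optional sketch: inspect one parsed example from the validation split.
# Assumes eager execution so the dataset can be iterated in a Python loop.
raw_dataset = tf.data.TFRecordDataset([validate_tf_file])

for serialized in raw_dataset.take(1):
  parsed = tf.io.parse_single_example(serialized, FEATURE_MAP)
  print('toxicity:', parsed[LABEL].numpy())
  print('comment_text:', parsed[TEXT_FEATURE].numpy())
  # The slicing features parse to SparseTensors.
  print('gender:', tf.sparse.to_dense(parsed['gender'], default_value=b'').numpy())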

Creating a TensorFlow Model Analysis Pipeline

The Fairness Indicators library operates on TensorFlow Model Analysis (TFMA) models. TFMA models wrap TensorFlow models with additional functionality to evaluate and visualize their results. The actual evaluation occurs inside of an Apache Beam pipeline.

So we need to do the following (steps 2 and 3 are sketched right after this list):

  1. Build a TensorFlow model.
  2. Build a TFMA model on top of the TensorFlow model.
  3. Run the model analysis in a Beam pipeline.
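
Roughly, steps 2 and 3 look like the sketch below. In this colab the example_model helpers take care of this for us, so treat the sketch as illustrative only: tfma_export_dir and eval_output_path are placeholder paths, and the exact name of the fairness indicators callback and the run_model_analysis signature may differ across TFMA versions.

# Illustrative sketch only -- the example_model helpers below do the real work.
# `tfma_export_dir` is a placeholder for the EvalSavedModel exported in step 2.
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir,
    add_metrics_callbacks=[
        # Attach Fairness Indicators metrics at several decision thresholds.
        tfma.post_export_metrics.fairness_indicators(
            thresholds=[0.1, 0.3, 0.5, 0.7, 0.9], labels_key=LABEL),
    ])

# Evaluate on the "Overall" slice and sliced by a single identity column.
slice_spec = [
    tfma.slicer.SingleSliceSpec(),
    tfma.slicer.SingleSliceSpec(columns=['gender']),
]

# Step 3: run the evaluation; TFMA drives an Apache Beam pipeline internally.
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location=validate_tf_file,
    file_format='tfrecords',
    slice_spec=slice_spec,
    output_path=eval_output_path)  # `eval_output_path` is a placeholder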

Putting it all Together


In [0]:
def embedding_fairness_result(embedding, identity_term='gender'):
  """Trains a classifier using the given TF Hub embedding, then evaluates it
  with Fairness Indicators sliced by `identity_term`."""

  model_dir = os.path.join(BASE_DIR, 'train',
                           datetime.now().strftime('%Y%m%d-%H%M%S'))

  print("Training classifier for " + embedding)
  classifier = example_model.train_model(model_dir,
                                         train_tf_file,
                                         LABEL,
                                         TEXT_FEATURE,
                                         FEATURE_MAP,
                                         embedding)

  # We need to create a unique path to store our results for this embedding.
  embedding_name = embedding.split('/')[-2]
  eval_result_path = os.path.join(BASE_DIR, 'eval_result', embedding_name)

  example_model.evaluate_model(classifier,
                               validate_tf_file,
                               eval_result_path,
                               identity_term,
                               LABEL,
                               FEATURE_MAP)
  return tfma.load_eval_result(output_path=eval_result_path)

Run TFMA & Fairness Indicators

Fairness Indicators Metrics

Fairness Indicators computes commonly used fairness metrics for classification models, such as false positive rate and false negative rate, at multiple classification thresholds and sliced by the identity groups defined above.

Text Embeddings

TF-Hub provides several text embeddings. These embeddings will serve as the feature column for our different models. For this colab, we use the following embeddings:

  1. random-nnlm-en-dim128: random text embeddings, which serve as a convenient baseline.
  2. nnlm-en-dim128: a text embedding based on a neural network language model.
  3. universal-sentence-encoder: the Universal Sentence Encoder, a sentence-level text embedding.

Fairness Indicator Results

For each of the embeddings above, we will compute Fairness Indicators metrics with our embedding_fairness_result helper, and then render the results in the Fairness Indicators UI widget with widget_view.render_fairness_indicator.

Note that the widget_view.render_fairness_indicator cells may need to be run twice for the visualization to be displayed.

Random NNLM


In [0]:
eval_result_random_nnlm = embedding_fairness_result('https://tfhub.dev/google/random-nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result=eval_result_random_nnlm)

NNLM


In [0]:
eval_result_nnlm = embedding_fairness_result('https://tfhub.dev/google/nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result=eval_result_nnlm)

Universal Sentence Encoder


In [0]:
eval_result_use = embedding_fairness_result('https://tfhub.dev/google/universal-sentence-encoder/2')

In [0]:
widget_view.render_fairness_indicator(eval_result=eval_result_use)

Comparing Embeddings

We can also use Fairness Indicators to compare embeddings directly. Let's compare the models generated from the NNLM and USE embeddings.


In [0]:
widget_view.render_fairness_indicator(multi_eval_results={'nnlm': eval_result_nnlm, 'use': eval_result_use})
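
If you prefer to read the numbers directly rather than through the widget, the loaded eval results can also be inspected programmatically. The sketch below simply dumps the sliced metrics behind the widget; slicing_metrics is a list of (slice key, metrics) pairs, but the exact nesting of the metrics dictionaries depends on the TFMA version, so adapt the printing as needed.

# Rough sketch: print the raw sliced metrics for the USE model.
# The structure of `metrics` varies across TFMA versions, so we just print it.
for slice_key, metrics in eval_result_use.slicing_metrics:
  print('Slice:', slice_key)
  print(metrics)
  print()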

Exercises

  1. Pick an identity category, such as religion or sexual orientation, and look at False Positive Rate for the Universal Sentence Encoder. How do different slices compare to each other? How do they compare to the Overall baseline?
  2. Now pick a different identity category. Compare the results of this category with the previous one. Does the model weigh one category as more "toxic" than the other? Does this change with the embedding used?
  3. Does the model generally tend to overestimate or underestimate the number of toxic comments?
  4. Look at the graphs for different fairness metrics. Which metrics seem most informative? Which embeddings perform best and worst for that metric?