True score evaluations

    markdown_strs.append("The tables in this section show how well system scores can "
                         "predict *true* scores. According to classical test theory, a "
                         "*true* score is the score that would have been obtained if there "
                         "were no measurement error. While true scores cannot be observed, "
                         "the variance of true scores and the prediction error can be "
                         "estimated from observed human scores when multiple human ratings "
                         "are available for a subset of responses. In this notebook, these "
                         "quantities are estimated using the human scores for responses in "
                         "the evaluation set.")
    markdown_strs.append("The table shows the variance of human rater errors, "
                         "the true score variance, the mean squared error (MSE), and "
                         "the proportional reduction in mean squared error (PRMSE) for "
                         "predicting the true score with the system score.")
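The four quantities in the table can be illustrated with a minimal pure-Python sketch. It assumes the human scores are stored as one list of ratings per response, with `None` marking a missing rating, and it uses the standard one-way random-effects (ANOVA) estimators: the rater error variance is the pooled within-response variance of double-scored responses, the true score variance is the between-response variance component, and the true-score MSE subtracts the expected rater-error contribution from the observed squared error. The function names (`variance_of_errors`, `true_score_variance`, `mse_true`, `prmse_true`) are hypothetical, and this is not necessarily the exact implementation used to generate this notebook.

```python
from statistics import mean


def _ratings(row):
    """Drop missing ratings (None) from one response's list of human scores."""
    return [h for h in row if h is not None]


def variance_of_errors(human_scores):
    """Pooled within-response variance of human ratings (rater error variance).

    Only responses with two or more ratings contribute; each contributes its
    sum of squared deviations from its own mean, with k_i - 1 degrees of freedom.
    """
    ss, df = 0.0, 0
    for row in human_scores:
        r = _ratings(row)
        if len(r) > 1:
            m = mean(r)
            ss += sum((h - m) ** 2 for h in r)
            df += len(r) - 1
    if df == 0:
        raise ValueError("need at least one response with multiple ratings")
    return ss / df


def true_score_variance(human_scores, var_e):
    """Between-response variance component (unbalanced one-way ANOVA estimator)."""
    rows = [r for r in (_ratings(row) for row in human_scores) if r]
    k = [len(r) for r in rows]
    n, N = len(rows), sum(k)
    grand_mean = sum(sum(r) for r in rows) / N  # mean over all individual ratings
    between = sum(len(r) * (mean(r) - grand_mean) ** 2 for r in rows)
    return (between - (n - 1) * var_e) / (N - sum(ki ** 2 for ki in k) / N)


def mse_true(system_scores, human_scores, var_e):
    """MSE of system scores against true scores, weighted by number of ratings.

    Each k_i * (mean human score - system score)^2 overestimates the true-score
    squared error by var_e on average, so n * var_e is subtracted from the sum.
    """
    total, n, N = 0.0, 0, 0
    for s, row in zip(system_scores, human_scores):
        r = _ratings(row)
        if not r:
            continue  # responses without any human rating are skipped
        total += len(r) * (mean(r) - s) ** 2
        n += 1
        N += len(r)
    return (total - n * var_e) / N


def prmse_true(system_scores, human_scores):
    """Proportional reduction in MSE for predicting the true score."""
    var_e = variance_of_errors(human_scores)
    var_t = true_score_variance(human_scores, var_e)
    return 1 - mse_true(system_scores, human_scores, var_e) / var_t


# Toy data (made up for illustration): three double-scored responses, one single-scored.
human = [[1, 2], [3, 3], [4, 5], [2, None]]
system = [2, 3, 4, 3]
print(round(prmse_true(system, human), 3))  # 0.945
```

Because these are unbiased estimates, on small samples the MSE can come out negative and PRMSE can exceed 1. Weighting each response by its number of human ratings in the MSE is one design choice among several.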
