Survival Volume

Author: Matthew Wakefield

SurvivalVolume is a small python package containing a set of convenience functions for parsing tumour volume time course data and plotting line graphs with Matplotlib and Kaplan Meier curves with the package Lifelines

This is a quick guide to usage

Installation

For a simple install using the release version of mpld3 you can either install from the Python Package Index (PyPI)

pip3 install survivalvolume

or from the github repository

pip3 install git+https://github.com/genomematt/survivalvolume.git

To run the tests use

python3 -m survivalvolume.tests.test_all

It is also recommended to run the user_guide.ipynb file and visually compare it to the html version

Parsing

There are currently no full example files in the repository for StudyLog; however, there is a very minimal file used for testing which demonstrates how this function works



In [1]:

    
from survivalvolume.parse import studylog_prism_to_tv_tables
tvts = studylog_prism_to_tv_tables('survivalvolume/tests/data/Studylogish.xlsx')



In [2]:

    
tvts['Treat']

Importing all files from a folder

The following example shows how to import a collection of related tsv (tab separated value) files from a folder and read them into a dictionary of data frames.



In [3]:

    
import pandas
import pathlib
p = pathlib.Path('survivalvolume/tests/data').glob('*.tsv')

parsed_data = {filename.stem:pandas.read_table(filename.as_posix(), sep='\t',index_col=0) for filename in p}
parsed_data['good_treatment']

If you have parsed data from a clumsy or proprietary format you may wish to export your data to tab separated value files as a clean and future proof backup option



In [4]:

    
from survivalvolume.tests.test_data import test_data

def save_dict_of_dataframes(dodf, dir_name='.'):
    for key in dodf:
        dodf[key].to_csv('{0}/{1}.tsv'.format(dir_name,key),sep='\t')
    pass

save_dict_of_dataframes(test_data, 'survivalvolume/tests/data')



In [5]:

    
from survivalvolume.tests.test_data import test_data
test_data['vehicle']



In [6]:

    
test_data['vehicle'].to_csv()









    Out[6]:





',6661,6663,6664,6665,6666,6667,6668,6669\n1,181.22,261.01,204.29,219.89,189.2,211.23,209.51,207.047\n4,277.63,552.57,344.84,243.41,278.13,199.2,317.32,270.52\n8,300.63,414.3,346.97,592.57,274.69,154.58,276.04,265.35\n11,570.15,557.21,351.16,416.23,347.82,317.42,244.11,276.19\n15,584.07,725.49,575.32,725.89,507.35,326.65,225.95,300.78\n18,841.32,,647.39,,322.61,594.09,338.84,530.2\n22,,,618.77,,729.18,411.46,492.96,342.98\n25,,,856.63,,,,337.02,488.18\n29,,,,,,,579.1,364.79\n32,,,,,,,836.56,919.84\n'

Biweekly measurement can sometimes result in inconsistent data due to the alternating offset of days. There is a convenience function in survivalvolume.parse standardise_days that allows easy renumbering to an alternating pattern



In [7]:

    
from survivalvolume.parse import standardise_days
standardise_days(test_data['vehicle'])



In [8]:

    
help(standardise_days)









    



Help on function standardise_days in module survivalvolume.parse:

standardise_days(dataframe, first_interval=3, second_interval=4)
    Renumber days in study log files by changing
    day numbers to series incrementing by alternating periods.
    (eg 3 day and 4 day periods) to adjust for individuals going
    on study on different days of the week.
    
    Arguments:
    
        dataframe          -  a pandas data frame with a
                              day based row index
        first_interval     -  the magnitude of odd numbered steps
        second_interval    -  the magnitude of even numbered steps
    
    Returns:
    
        a pandas dataframe with standardised days as index.



In [9]:

    
standardise_days(test_data['vehicle'],first_interval=4,second_interval=3)

Interactive Line Plots

The plotting functions all use dataframe objects. As a study log sheet contains multiple tables the parser returns a dictionary of dataframes. To mimic this result object we will import a pre-canned set of test data from SurvivalVolume's tests.



In [10]:

    
# Tell the drawing package we want to see the output in the notebook
import matplotlib
%matplotlib inline
# Lots of warnings are being generated which gets a bit distracting so we will turn them off
import warnings
warnings.filterwarnings('ignore')



In [11]:

    
from survivalvolume.tests.test_data import test_data
test_data['vehicle']

Our line plots are based around a TumourVolumePlot object that we will progressively update with data and plotting instructions



In [12]:

    
from survivalvolume.plot import TumourVolumePlot
tvp = TumourVolumePlot(figsize=(12,10))

There are three data types for a TumourVolumePlot, and you can build up a plot by adding instances of any combination of these data types. Each time you add data the plot will update, and you can display each step as you build up to your final figure.



In [13]:

    
tvp.add_mean('Vehicle',test_data['vehicle'],color='blue')
tvp.display()









    Out[13]:



In [14]:

    
tvp.add_individuals('Vehicle',test_data['vehicle'],color='blue')
tvp.display()









    Out[14]:

Note that this plot is interactive. When you hover your cursor over the lines for each individual you will see the line highlighted and additional information (the name of the individual). You can also use the magnifying glass to zoom in on a section, the arrow tool to move the canvas, and the home button to return to the original view.



In [15]:

    
tvp.add_interval('Vehicle',test_data['vehicle'],color='blue')
tvp.display()









    Out[15]:

The interval that is plotted is the 95% confidence interval of the mean. This is interpreted as the range in which the observed mean would lie 95% of the time with a random sample from the underlying normally distributed population that is estimated from the observed data. It is not the same thing as the 95% confidence interval based on standard deviation - this shows the range in which 95% of the observations would lie given the estimate from the observed data.

When interpreting these graphs the spread of the individual observations provides the information on the range of observed values you would expect to see, and the interval the region in which we are confident the mean line would lie if we repeated the experiment.

Tests of differences in means are likely to be significant if the 95% confidence intervals of the means do not overlap, while significantly different means may overlap a two standard deviation range of observations.

If you add additional data the plot will automatically rescale



In [16]:

    
tvp.add_mean('A treatment',test_data['other_treatment'],color='orange')
tvp.add_interval('A treatment',test_data['other_treatment'],color='orange')
tvp.add_individuals('A treatment',test_data['other_treatment'],color='orange')
tvp.display()









    Out[16]:



In [17]:

    
tvp.add_mean('Good treatment',test_data['good_treatment'],color='green')
tvp.add_interval('Good treatment',test_data['good_treatment'],color='green',alpha=0.1)
tvp.add_individuals('Good treatment',test_data['good_treatment'],color='green',dashes=[3,2,1,2])
tvp.display()









    Out[17]:

You can set the tite and axis labels on the TumourVolumePlot object



In [18]:

    
tvp.title = 'My Great Experiment'

You can work directly on the underlying matplotlib axis object if you need to adjust styles or set display limits



In [19]:

    
tvp.ax.set_xlim([0,100])
tvp.ax.set_ylim([0,1000])
tvp.ax.set_autoscaley_on(False)
tvp.display()









    Out[19]:

If you have multiple plots with the same legend you may wish to turn it off, which is simply achieved with an argument to display



In [20]:

    
tvp.display(legend=False)









    Out[20]:



In [21]:

    
tvp.xlim = [0,120]
tvp.display(legend=False)









    Out[21]:

You can save your output as interactive html, or as static pdf files



In [22]:

    
tvp.save_html('my great experiment.html')
tvp.save_pdf('my great experiment.pdf')

Threshold for display of the mean and intervals

The confidence interval based on the standard error of the mean is only defined when there is more than one individual. The mean of n=1 is valid; however, graphing a single individual as the mean can be visually misleading. We set the threshold of display for the mean so that it is only displayed when n > 2. When n is small the range of the confidence interval will also increase dramatically, and be default it is also only displayed when n > 2. Both the mean and interval can be altered by setting a threshold value, however a minimum of 2 data points is required to display the confidence interval.



In [23]:

    
from survivalvolume.plot import TumourVolumePlot
tvp = TumourVolumePlot(figsize=(12,10))

tvp.add_mean('A treatment',test_data['other_treatment'],color='orange',threshold=0)

tvp.add_interval('A treatment',test_data['other_treatment'],color='orange',threshold=0)
tvp.add_individuals('A treatment',test_data['other_treatment'],color='orange')

tvp.display()









    Out[23]:



In [24]:

    
tvp = TumourVolumePlot(figsize=(12,10))

tvp.add_mean('A treatment',test_data['other_treatment'],color='orange')
tvp.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
tvp.add_interval('A treatment',test_data['other_treatment'],color='orange')

tvp.add_mean('Vehicle',test_data['vehicle'],color='blue')
tvp.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
tvp.add_interval('Vehicle',test_data['vehicle'],color='blue')

tvp.add_mean('Good Treatment',test_data['good_treatment'],color='green')
tvp.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
tvp.add_interval('Good Treatment',test_data['good_treatment'],color='green')

tvp.add_mean('Good Treatment',test_data['good_treatment'],color='green')
tvp.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
tvp.add_interval('Good Treatment',test_data['good_treatment'],color='green')

tvp.xlim = [0,100]
tvp.title = 'My Great Treatment Model'
tvp.ylabel = 'TV mm$^{3}$'
tvp.km_ylabel = 'Survival Probability'

tvp.display()









    Out[24]:

Creating Kaplan Meier survival curves

We use lifelines, a package for drawing Kaplan Meier graphs and calculating statistical significance. Helper functions convert the tumour volume measurements to survival tables that can then be plotted



In [25]:

    
# Lifelines uses matplotlib and to get it to show in a notebook we need to use the following commands
import matplotlib
%matplotlib inline



In [26]:

    
from survivalvolume.plot import make_km
kmfs = {}
for name in test_data:
    kmfs[name] = make_km(test_data[name], label=name)

print(kmfs.keys())









    



dict_keys(['other_treatment', 'vehicle', 'good_treatment'])



In [27]:

    
kmfs['other_treatment'].plot(color='orange')









    Out[27]:





<matplotlib.axes._subplots.AxesSubplot at 0x115be6cc0>



In [28]:

    
ax = matplotlib.pyplot.subplot(111)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

kmfs['other_treatment'].plot(show_censors=True,ax=ax,color='orange')
kmfs['vehicle'].plot(show_censors=True,ax=ax,color='blue')
kmfs['good_treatment'].plot(show_censors=True,ax=ax,color='green')









    Out[28]:





<matplotlib.axes._subplots.AxesSubplot at 0x115bbee80>

Statistics with lifelines

The lifelines package has two built in statistical tests, the log rank test and a multivariate log rank test. The log rank test assesses the probability of observing two data sets (or more extremely different data) if all data is drawn from populations with the same endpoint generation process. The multivariate test assess the probability of the observed or more extreme data if all observations are from the same endpoint generation process.

We need to convert the tumour volume data into a table of last observed time and whether the endpoint has been observed. In the example above this was done 'under the hood' by the survivalvolume.plot.make_km function.



In [29]:

    
import pandas
from survivalvolume.plot import volume_to_survival

volume_to_survival(test_data['vehicle'],endpoint=700)



In [30]:

    
from lifelines.statistics import logrank_test, pairwise_logrank_test, multivariate_logrank_test

vehicle_survival = volume_to_survival(test_data['vehicle'],endpoint=700)
good_treatment_survival = volume_to_survival(test_data['good_treatment'],endpoint=700)
other_treatment_survival = volume_to_survival(test_data['other_treatment'],endpoint=700)

result = logrank_test(list(vehicle_survival['Time']),
                      list(other_treatment_survival['Time']),
                      list(vehicle_survival['Observed']),
                      list(other_treatment_survival['Observed']),
                     alpha=0.95)
result.print_summary()









    



Results
   t 0: -1
   test: logrank
   alpha: 0.95
   null distribution: chi squared
   df: 1

   __ p-value ___|__ test statistic __|____ test result ____|__ is significant __
         0.00059 |             11.808 |      Reject Null    |        True

Building Dual Volume and Survival Plots

VolumeSurvivalPlot is a class where additions of means are applied to both a tumour volume plot and a Kaplan-Meier plot.

Note that drawstyle steps required for correct rendering of the Kaplan-Meier chart was not implemented until mpld3/mpld3#350 in mpld3 0.3git (August 2016). Full support will not be available until mpld3 v0.3 is released in late 2016. Until this time only static matplotlib plots will render correctly.



In [31]:

    
from survivalvolume.plot import VolumeSurvivalPlot

dual = VolumeSurvivalPlot(figsize=(12,5))
dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.display(use_mpld3=False)

Using the matplotlib display back end you can't re-display a figure so the following constructs the whole plot again rather than adding to the plot like we did in previous examples. Saving the figure as a pdf will also hide the figure, so you can display html then save a pdf, but not the reverse.



In [32]:

    
dual = VolumeSurvivalPlot(figsize=(15,5),vertical=False)
dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.xlim = [0,100]

dual.display(legend=False,use_mpld3=False)



In [33]:

    
dual = VolumeSurvivalPlot(figsize=(15,10), km_size=0.5)
dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.xlim = [0,100]
dual.title = 'My Great Treatment Model'
dual.ylabel = 'TV mm'
dual.km_ylabel = 'Survival Probability'


dual.display(legend=False)









    Out[33]:



In [34]:

    
dual.save_pdf('dual_plot.pdf')



In [35]:

    
dual = VolumeSurvivalPlot(figsize=(15,10), km_size=0.5)
dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.xlim = [0,100]
dual.title = 'My Great Treatment Model'
dual.ylabel = 'TV mm3'
dual.km_ylabel = 'Survival Probability'


dual.display(use_mpld3=True)









    Out[35]:



In [36]:

    
dual.save_html('dual_plot.html')

In the vertically stacked plot we usually hide the x axis tick labels on the volume plot to get a cleaner plot. You can toggle this behaviour with the hide_volume_labels display option



In [37]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.title = 'My Great Experiment'
dual.xlim = [0,100]
dual.display(legend=False, hide_volume_labels = False, use_mpld3=False)

km_ci_show, the flag for displaying confidence intervals in the Kaplan-Meier plot, and km_show_censors, the flag for displaying censored individuals, both need to be set prior to the addition of data. Data is plotted in the order it is added, which will determine which line overplots at the top of the Kaplan-Meier plot.



In [38]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))

dual.km_show_censors = False
dual.km_ci_show = False

dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange')

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue')

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green')

dual.title = 'My Great Experiment'
dual.xlim = [0,100]


dual.display(legend=False, hide_volume_labels = False)









    Out[38]:

Statistics on dual plot object

You can do a pairwise log rank test on the dual plot object referencing the name given to the mean. We do not do all possible pairwise as this would incur a greater multiple testing penalty. The results of selected tests are calculated manually and then subsequently will need to be corrected for multiple testing



In [39]:

    
print('Available to test are:', dual.volume_data.keys())

a = dual.logrank_test('Vehicle', 'Good Treatment')









    



Available to test are: dict_keys(['A treatment', 'Vehicle', 'Good Treatment'])
Results
   t 0: -1
   test: logrank
   alpha: 0.95
   null distribution: chi squared
   df: 1

   __ p-value ___|__ test statistic __|____ test result ____|__ is significant __
         0.01330 |              6.129 |      Reject Null    |        True

Advanced tricks

When you are doing lots of plots you may wish to set defaults that you can reuse and change once, rather than having to update line styles across lots of individual plots. This can be acchieved with dictionary unpacking using the **dict notation



In [40]:

    
#You can change these definitions here and all the following plots will update in style
other_treatment_mean = {'color':'red','lw':3}
other_treatment_interval = {'color':'red', 'alpha':0.1}
other_treatment_individuals = {'color':'red','lw':1,'dashes':[2,1,1,1,5,1,1,1]}

dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.display(legend=False, use_mpld3=False)

Annotations



In [41]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.display(legend=False)

dual.ax.annotate("Look Here!",
            xy=(25, 540), xycoords='data',
            xytext=(22, 900), textcoords='data',
            arrowprops=dict(arrowstyle="->",color='green'),
            color='blue')

#arrows are not supported by mpld3 v0.3git
dual.display(use_mpld3=False)



In [42]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.ax.annotate("Look Here!",
            xy=(25, 540), xycoords='data',
            xytext=(22, 900), textcoords='data',
            arrowprops=dict(arrowstyle="->",color='green'),
            color='blue')

dual.ax.annotate('',
            xy=(22, 140), xycoords='data',
            xytext=(22, 40), textcoords='data',
            arrowprops=dict(arrowstyle="fancy",color='orange'),
            color='blue')


dual.ax.text(0, 700, "Threshold", ha="right", va="center", rotation=0,
            size=6,
            bbox=dict(boxstyle="rarrow,pad=0.3", fc="cyan", ec="b", lw=2))

#arrows are not supported by mpld3 v0.3git
dual.display(legend=False, use_mpld3=False)



In [43]:

    
from matplotlib.patches import Rectangle
from matplotlib.collections import PatchCollection

dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.xlim = [0,100]


dual.ax.add_patch(Rectangle((0, 0), 21, 20, facecolor="lightgrey",lw=0))
dual.display(legend=False, use_mpld3=False)



In [44]:

    
from matplotlib.patches import Rectangle
from matplotlib.collections import PatchCollection

dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.xlim = [0,100]

dual.shade_interval(0,25)
dual.display(legend=False, use_mpld3=False)



In [45]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.xlim = [0,100]

dual.show_treatment_days([1,3,5,8,10,12,], facecolor="lightgrey")
dual.show_treatment_days([15,17,19], facecolor="lightblue")

dual.display(legend=False, use_mpld3=False)

Annotate with horizontal and vertical lines



In [46]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.xlim = [0,100]

dual.ax.axhline(700,color='lightgrey',lw=1,alpha=0.3,dashes=[5,5])
dual.ax.axhline(200,color='lightgrey',lw=1,alpha=0.3)

dual.ax.axvline(21,color='lightgrey',lw=1,alpha=0.3)

dual.show_treatment_days([1,3,5,8,10,12,], facecolor="lightgrey")
dual.show_treatment_days([15,17,19], facecolor="lightblue")

dual.display(legend=False, use_mpld3=False)

Sizes in cm, fontsize and exporting to Illustrator

You can save as an adobe Illustrator compatible pdf and open in Illustrator.

Note that opening in Illustrator will not preserve line dashes. You can place the pdf in Illustrator or Indesign, or open in other pdf viewers and maintain the dashes.



In [47]:

    
def cm2inch(cm):
    return cm/2.54

dual = VolumeSurvivalPlot(figsize=(cm2inch(9),cm2inch(8)))
dual.add_mean('A treatment',test_data['other_treatment'], **other_treatment_mean)
dual.add_individuals('A treatment',test_data['other_treatment'], **other_treatment_individuals)
dual.add_interval('A treatment',test_data['other_treatment'], **other_treatment_interval)

dual.ylim = [0,1000]
dual.xlim = [0,100]

dual.fontsize = 8

dual.title = 'Sized for 6 per page'

#save as an adobe illustrator compatible pdf and open in Illustrator
dual.save_pdf('sized6perpage.ai', legend=False)

#!open sized6perpage.ai

Fine control of Legend placement

Legend placement can be manually controlled by calling the add legend method of the plot and providing the legend='custom' argument when calling display to prevent deletion of the existing legend. The most common use for this function is to over ride the default 'best' placement. Additional arguments such as bbox_to_anchor will be passed to matplotlib's legend function. For PDF output this can be used to move the legend outside the plot area; however, this currently will result in unpredicatable results when using the mpld3 html output.



In [48]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))

dual.km_show_censors = False
dual.km_ci_show = False

dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange',alpha=0.05)

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue',alpha=0.05)

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green',alpha=0.05)

dual.title = 'My Great Experiment'
dual.xlim = [0,100]

#set the legend location to 1.01 plot widths right, and 1.0 up from the bottom left of the plot.
dual.add_legend(loc='lower left',bbox_to_anchor=(1.01,1.0))


dual.display(legend='custom', hide_volume_labels = False, use_mpld3=False)

Displaying the number of individuals in the legend

A frequent requirement of plots comparing groups is to be able to rapidly asses the number of individual that are included in a group. Although this information is visually presented as lines on the volume plot and steps in the Kaplain-Meier plot it is often useful to annotate the legend with the number of individual in each group.



In [49]:

    
dual = VolumeSurvivalPlot(figsize=(15,8))

#Display the number of individuals in the legend
dual.n_in_legend = True

dual.km_show_censors = False
dual.km_ci_show = False

dual.add_mean('A treatment',test_data['other_treatment'],color='orange')
dual.add_individuals('A treatment',test_data['other_treatment'],color='orange', dashes = [1,1])
dual.add_interval('A treatment',test_data['other_treatment'],color='orange',alpha=0.05)

dual.add_mean('Vehicle',test_data['vehicle'],color='blue')
dual.add_individuals('Vehicle',test_data['vehicle'],color='blue', dashes = [3,3])
dual.add_interval('Vehicle',test_data['vehicle'],color='blue',alpha=0.05)

dual.add_mean('Good Treatment',test_data['good_treatment'],color='green')
dual.add_individuals('Good Treatment',test_data['good_treatment'],color='green', dashes = [1,2,3,2])
dual.add_interval('Good Treatment',test_data['good_treatment'],color='green',alpha=0.05)

dual.title = 'My Great Experiment'
dual.xlim = [0,100]

#set the legend location to 1.01 plot widths right, and 1.0 up from the bottom left of the plot.
dual.add_legend(loc='lower left',bbox_to_anchor=(1.01,1.0))


dual.display(legend=True, use_mpld3=True)









    Out[49]:

18	T
Day
1	230.79
4	194.62
8	161.48

	1231	1232	1233	1234	1235
1	196.651127	229.118066	183.526092	234.028491	258.094495
4	153.771686	194.616465	258.153781	207.218985	262.370527
8	256.298156	361.942687	206.069331	237.849554	310.116141
11	233.545130	NaN	201.558526	219.750465	NaN
15	206.667291	NaN	221.677633	233.408585	NaN
18	150.125937	NaN	177.526944	233.310291	NaN
22	172.975654	NaN	231.332705	288.396148	NaN
25	205.492510	NaN	136.546338	245.922816	NaN
29	126.566408	NaN	121.945818	146.358947	NaN
32	145.558378	NaN	106.513885	314.243293	NaN
36	184.172373	NaN	45.938737	259.168495	NaN
39	157.568297	NaN	54.397156	217.832192	NaN
43	110.095886	NaN	37.497863	165.379661	NaN
46	139.743084	NaN	31.704332	NaN	NaN
50	129.820337	NaN	23.684161	NaN	NaN
53	123.313983	NaN	12.292366	NaN	NaN
57	42.009149	NaN	21.762810	NaN	NaN
60	65.194730	NaN	34.298148	NaN	NaN
64	28.644473	NaN	27.230834	NaN	NaN
67	59.836170	NaN	52.369844	NaN	NaN
71	46.884158	NaN	47.554727	NaN	NaN
74	19.530721	NaN	60.934294	NaN	NaN
78	6.346032	NaN	62.221407	NaN	NaN
81	NaN	NaN	112.782661	NaN	NaN
314	NaN	NaN	131.746885	NaN	NaN
315	NaN	NaN	167.288307	NaN	NaN
316	NaN	NaN	137.528416	NaN	NaN

	6661	6663	6664	6665	6666	6667	6668	6669
1	181.22	261.01	204.29	219.89	189.20	211.23	209.51	207.047
4	277.63	552.57	344.84	243.41	278.13	199.20	317.32	270.520
8	300.63	414.30	346.97	592.57	274.69	154.58	276.04	265.350
11	570.15	557.21	351.16	416.23	347.82	317.42	244.11	276.190
15	584.07	725.49	575.32	725.89	507.35	326.65	225.95	300.780
18	841.32	NaN	647.39	NaN	322.61	594.09	338.84	530.200
22	NaN	NaN	618.77	NaN	729.18	411.46	492.96	342.980
25	NaN	NaN	856.63	NaN	NaN	NaN	337.02	488.180
29	NaN	NaN	NaN	NaN	NaN	NaN	579.10	364.790
32	NaN	NaN	NaN	NaN	NaN	NaN	836.56	919.840

	Time	Observed
6661	19	True
6663	15	True
6664	26	True
6665	15	True
6666	22	True
6667	22	False
6668	33	True
6669	33	True