Supplemental Information:

"Clonal heterogeneity influences the fate of new adaptive mutations"

Ignacio Vázquez-García, Francisco Salinas, Jing Li, Andrej Fischer, Benjamin Barré, Johan Hallin, Anders Bergström, Elisa Alonso-Pérez, Jonas Warringer, Ville Mustonen, Gianni Liti

Figure 4 (+ Supp. Figs.)

This IPython notebook reproduces Figures 4 and S5 of the paper. To view it, copy its URL into nbviewer; to run it, open it in Binder.



In [1]:

    
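# Load external dependencies
# (`setup` is expected to provide the names used below without further imports,
# e.g. pd, np, plt, gridspec, patches, lines, path_effects, dir_data, dir_supp)
from setup import *
# Load internal dependencies
import config, plot, utils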

%load_ext autoreload
%autoreload 2

%matplotlib inline

Data import

Chromosome length



In [2]:

    
chr_coords = utils.chr_coords()

chr_coords = chr_coords[~chr_coords['chr_arabic'].isin([17,18])]

chr_coords.head()
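
The `chr_start` and `chr_end` columns returned by `utils.chr_coords()` place every chromosome on a single genome-wide axis. A minimal sketch of how such offsets can be derived from chromosome lengths is shown here. The lengths are the values for chromosomes I to V from the `chr_coords.head()` output of this notebook. The helper names `cumulative_coords` and `to_pos_cum` are hypothetical, and the assumption that a cumulative position equals `chr_start` plus the within-chromosome position is not confirmed by the source.

```python
import pandas as pd

# Lengths of chromosomes I-V, copied from the chr_coords.head() output
chr_length = pd.Series([230218, 813184, 316620, 1531933, 576874],
                       index=['I', 'II', 'III', 'IV', 'V'], name='chr_length')

def cumulative_coords(lengths):
    """Return genome-wide start/end offsets for consecutive chromosomes."""
    chr_end = lengths.cumsum()        # running total of lengths
    chr_start = chr_end - lengths     # each chromosome starts where the previous one ends
    return pd.DataFrame({'chr_length': lengths,
                         'chr_start': chr_start,
                         'chr_end': chr_end})

def to_pos_cum(coords, chr_roman, pos):
    """Map a within-chromosome position to the genome-wide axis (assumed convention)."""
    return int(coords.loc[chr_roman, 'chr_start'] + pos)

coords = cumulative_coords(chr_length)
print(coords)
```

With these offsets, tick labels can be centred on each chromosome at `chr_start + (chr_end - chr_start) / 2`, which is what the plotting cells in this notebook do.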

Ancestral and evolved isolate genomes

De novo SNVs and indels in ancestral and evolved isolate genomes



In [3]:

    
# Load data
seq_mut_df = pd.read_csv(dir_data+'seq/de-novo/seq_de_novo_snv_indel.csv', encoding='utf-8', keep_default_na=False)

seq_mut_df = seq_mut_df[(seq_mut_df.clone!='')].reset_index(drop=True)

# Remap genotype values
seq_mut_df['genotype'] = seq_mut_df['genotype'].map({'0': 0, '1': 2,'0/0': 0, '0/1': 1, '1/0': 1, '1/1': 2})

seq_mut_df['type'] = 'snv_indel'

seq_mut_df.head()









    Out[3]:

	ac	af	alt	an	allele	amino_acids	biotype	baseqranksum	cds_position	chr_arabic	...	coverage	substitution	assignment	mutation_type	chr_length	chr_roman	pos_cum	variant_type	consequence_short	type
0	1	0.5	[A]	2.0	A	R/I	protein_coding	1.723	101	7	...	28.795032	C>A	bulk	driver	1090940	VII	4595191.0	snv	non-synonymous	snv_indel
1	1	0.5	[C]	2.0			non-coding	-0.34		4	...	28.795032	A>C	bulk	passenger	1531933	IV	2849276.0	snv	non-coding	snv_indel
2	1	0.5	[A]	2.0	A	C/F	protein_coding	-1.488	1883	10	...	28.795032	C>A	bulk	passenger	745751	X	5980969.0	snv	non-synonymous	snv_indel
3	1	0.5	[T]	2.0			non-coding	-0.653		13	...	28.795032	C>T	bulk	passenger	924431	XIII	8949297.0	snv	non-coding	snv_indel
4	1	0.5	[C]	2.0	C	C/W	protein_coding	1.674	477	13	...	28.795032	A>C	bulk	passenger	924431	XIII	9157827.0	snv	non-synonymous	snv_indel

5 rows × 84 columns
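
The genotype remapping in the cell above converts VCF-style call strings into a count of alternate alleles (0, 1 or 2), with the haploid call `'1'` treated like a homozygous call. The sketch below applies the same dictionary to a small hypothetical series of calls. The input values, including the missing call `'./.'`, are illustrative and not taken from the data. It shows one property of `Series.map` worth keeping in mind: any value absent from the dictionary becomes `NaN` rather than raising an error.

```python
import pandas as pd

# Same mapping as in the notebook: call string -> number of alternate alleles
GENOTYPE_DOSAGE = {'0': 0, '1': 2, '0/0': 0, '0/1': 1, '1/0': 1, '1/1': 2}

# Hypothetical calls; './.' is not in the mapping
calls = pd.Series(['0/1', '1/1', '1', '0', './.'])

dosage = calls.map(GENOTYPE_DOSAGE)   # unmapped values become NaN
unmapped = calls[dosage.isna()]       # inspect what was silently dropped

print(dosage.tolist())
print(unmapped.tolist())
```

Checking `unmapped` after the remapping is a cheap way to catch unexpected genotype encodings before plotting.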

Background (consensus) and de novo SNVs and indels
Copy number variation
Loss-of-heterozygosity



In [4]:

    
# Load data
seq_cn_loh_df = pd.read_csv(dir_data+'seq/de-novo/seq_de_novo_cn_loh.csv.gz', encoding='utf-8', keep_default_na=False)

seq_cn_loh_df['genotype'] = pd.to_numeric(seq_cn_loh_df['genotype'], errors='coerce')
seq_cn_loh_df['pos_cum'] = pd.to_numeric(seq_cn_loh_df['pos_cum'], errors='coerce')

Join all variants (SNVs, indels, LOH, copy number) into a single dataframe:



In [5]:

    
columns=['selection','population','clone']
seq_mut_fixed_df = pd.merge(
    seq_mut_df, 
    seq_cn_loh_df[['selection','population','clone','set','lineage']].drop_duplicates(columns), 
    on=columns, how='left'
)
seq_cn_loh_fixed_df = pd.merge(
    seq_cn_loh_df, 
    seq_mut_df[['selection','population','clone']].drop_duplicates(columns), 
    on=columns, how='left'
)

seq_dn_df = pd.concat([seq_cn_loh_fixed_df, seq_mut_fixed_df], axis=0).reset_index(drop=True)

seq_dn_df.head()









    Out[5]:

	ac	af	allele	alt	amino_acids	an	assignment	background	baseqranksum	biotype	...	start	str	strand	substitution	symbol_source	tag_1	tag_2	time	type	variant_type
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	1.0	NaN	NaN	NaN	NaN	NaN	NaN	32	copy_number	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	1.0	NaN	NaN	NaN	NaN	NaN	NaN	32	copy_number	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	2.0	NaN	NaN	NaN	NaN	NaN	NaN	32	copy_number	NaN
3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	2.0	NaN	NaN	NaN	NaN	NaN	NaN	32	copy_number	NaN
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	2.0	NaN	NaN	NaN	NaN	NaN	NaN	32	copy_number	NaN

5 rows × 89 columns
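
The join above works in two steps: a left merge attaches per-clone metadata to the SNV/indel calls, and `pd.concat` then stacks the two tables. Because the tables do not share all columns, the stacked result holds the union of columns, and cells with no source value are `NaN`. This is why the copy-number rows printed above are mostly `NaN`. The sketch below reproduces the pattern on toy data. All population names, gene names and values are hypothetical.

```python
import pandas as pd

keys = ['selection', 'population', 'clone']

# Toy SNV/indel calls (hypothetical values)
mut = pd.DataFrame({'selection': ['HU', 'HU'], 'population': ['P1', 'P1'],
                    'clone': ['1', '2'], 'gene': ['geneA', 'geneB'],
                    'type': 'snv_indel'})

# Toy copy-number segments carrying the per-clone lineage label
cn = pd.DataFrame({'selection': ['HU'] * 3, 'population': ['P1'] * 3,
                   'clone': ['1', '1', '2'], 'lineage': ['L1', 'L1', 'L2'],
                   'start': [1.0, 2.0, 1.0], 'type': 'copy_number'})

# One metadata row per clone, then a left merge so every mutation is kept
clone_info = cn[keys + ['lineage']].drop_duplicates(subset=keys)
mut_annot = mut.merge(clone_info, on=keys, how='left')

# Stack both tables; columns missing from one side are filled with NaN
combined = pd.concat([cn, mut_annot], axis=0).reset_index(drop=True)
print(combined)
```

Deduplicating the metadata on the merge keys before merging keeps the left table from gaining rows when a clone has several segments.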

Figure S5 - Sequences of ancestral clones



In [6]:

    
# Keep ancestral haploid isolates only
# (.copy() makes the slice independent, so the column assignment below does not act on a view)
seq_dn_df_sliced = seq_dn_df[(seq_dn_df['type'].isin(['genotype','snv_indel'])) & \
                             (seq_dn_df['time']==0) & \
                             (seq_dn_df['ploidy']=='haploid')].copy()

seq_dn_df_sliced['clone'] = seq_dn_df_sliced['clone'].astype(int)
seq_dn_df_sliced = seq_dn_df_sliced.sort_values('clone')

seq_dn_df_sliced = pd.pivot_table(
    seq_dn_df_sliced, 
    index=['selection','population','time','clone','lineage','type','ploidy'], 
    columns=['chr_arabic','chr_roman','pos_cum','gene'],
    values='genotype'
)

fig = plt.figure(figsize=(8, 6))

shape = pd.DataFrame({k: x.shape for k,x in seq_dn_df_sliced.groupby(level=['population'])})

grid = gridspec.GridSpec(1, 2, wspace=0.2)
gs = {}

for ii, (p, sdf) in enumerate(seq_dn_df_sliced.groupby(level=['population'])):

    nrows = 1
    ncols = 1
    gs[p] = gridspec.GridSpecFromSubplotSpec(nrows, ncols,
                                             subplot_spec=grid[ii],
                                             hspace=0, wspace=0)
    
    # Retrieve axis
    ax = plt.subplot(gs[p][:])
    
    ### Background genotype ###
    indexer = [slice(None)]*len(sdf.index.names)
    indexer[sdf.index.names.index('type')] = ['genotype']
    gt = sdf.loc[tuple(indexer),:].dropna(axis=1, how='all')
            
    x = gt.columns.get_level_values('pos_cum').values
    y = np.arange(len(gt.index))
    data = gt.ffill(axis=1).values  # forward-fill missing calls along the genome axis
        
    # Make a color map of fixed colors
    cmap = plt.cm.colors.ListedColormap([config.background['color']['NA'], 
                                         config.background['color']['WA']])
    bounds = [0,1]
    norm = plt.cm.colors.BoundaryNorm(bounds, cmap.N)
    
    title = ''
    xlabel= ''
    ylabel= ''
    xticklabels = []
    yticklabels = []
        
    plot.heatmap(np.r_[x, x.max()+1], np.r_[y, y.max()+1], data,
                 ax, title, xlabel, ylabel, xticklabels, yticklabels, cmap=cmap, 
                 vmin=np.nanmin(data), vmax=np.nanmax(data), speed='fast')

    # Set title
    ax.set_title(p.replace('_',' ')+' T0', fontsize=6, weight='bold')
    
    # Set ticks
    lineage = gt.index.get_level_values('lineage').unique()[0]
    ax.tick_params(axis='y', colors=config.lineages[lineage]['fill'], width=5, which='both')
    
    ### De novo genotype ###
    indexer = [slice(None)]*len(sdf.index.names)
    indexer[sdf.index.names.index('type')] = ['snv_indel']
    gn = sdf.loc[tuple(indexer),:].dropna(axis=1, how='all')
        
    if len(gn) > 0:
                
        for i,(k,g) in enumerate(gn.groupby(level='clone')):

            g = g.dropna(axis=1)
            x = g.columns.get_level_values('pos_cum').values
            y = np.repeat(g.index.get_level_values('clone').values+.5, len(x))
            data = g.values.flatten()
            ax.scatter(x, y,
                       facecolors=[config.genotype['color'][d] for d in data],
                       edgecolors='lightgray', s=8, rasterized=False, zorder=3)
            
            genes = g.columns.get_level_values('gene')
            loc = zip(x, y-.4)
            
            for l, xy in zip(genes, loc):

                ann = ax.annotate(l, xy=xy, xytext=(0, 0), textcoords='offset points',
                                  bbox=dict(boxstyle="round,pad=0", fc="w", alpha=0.5),
                                  fontsize=4.5, va='bottom', ha='center',
                                  color='k', style=('italic' if l!='non-coding' else 'normal'),
                                  path_effects=[path_effects.withStroke(linewidth=0.35, foreground='w')])

    # Set grid
    xlabels = chr_coords.chr_start + (chr_coords.chr_end - chr_coords.chr_start)/2.
    xstart = chr_coords.chr_start
    ax.set_xticks(xlabels, minor=False)
    ax.set_xticklabels(chr_coords.chr_roman, minor=False)
    ax.set_xlabel('Chromosome')
    xgrid = [xst+1. for xst in set(xstart)]
    for g in xgrid:
        ax.axvline(g, lw=0.5, ls="-", color="lightgray")
    
    ylabels = gt.index.get_level_values('clone').map(str)
    ax.set_yticks(np.arange(len(ylabels)) + 0.5, minor=False)
    ax.set_yticklabels('C' + ylabels, fontweight='bold', va='center', minor=False)
    for g in np.arange(len(ylabels)):
        ax.axhline(g, lw=0.5, ls="-", color="lightgray")

# Set legend
for k, xy in zip(['WAxNA_F12_2_YPD'],[(2,0),(2,0)]):
    
    ax = plt.subplot(gs[k][:])
    
    wt_artist = patches.Rectangle((0,0), width=1, height=1, color=config.background['color']['WA'])
    hom_artist = patches.Rectangle((0,0), width=1, height=1, color=config.background['color']['NA'])
    
    leg1 = ax.legend([wt_artist,hom_artist], 
                     ['WA','NA'], 
                     bbox_to_anchor=(0.7, -0.1), ncol=1,
                     frameon=False, loc='center',
                     borderaxespad=0, handlelength=0.75, 
                     title='Background\ngenotype', prop={'size':5})
    
    hom_artist = lines.Line2D((0,1),(0,0), mec='k', mfc=config.genotype['color'][2],
                              marker='o', markersize=3, linestyle='')
    
    leg2 = ax.legend([hom_artist], 
                     ['hom.'], 
                     bbox_to_anchor=(0.9, -0.1), ncol=1,
                     frameon=False, loc='center',
                     borderaxespad=0, handlelength=0.75, 
                     title='De novo\ngenotype', prop={'size':5})
    
    ax.add_artist(leg1)
        
    for leg in [leg1,leg2]:
        plt.setp(leg.get_title(),fontsize=6, multialignment='center')

for ax in fig.get_axes():
    ax.xaxis.label.set_size(6)
    ax.yaxis.label.set_size(6)
    ax.tick_params(axis='x', which='major', size=2, labelsize=6)
    ax.tick_params(axis='y', which='major', size=2, labelsize=4.5)
    ax.tick_params(axis='both', which='minor', size=1, labelsize=4)

plot.save_figure(dir_supp+'figures/supp_figure_seq_clones/supp_figure_seq_ancestral_clones')
plt.show()

Fig. S5: Sequences of ancestral clones sampled from the WAxNA $\text{F}_\text{12}$ founder populations, which were obtained by bulk crossing between the WA and NA parents. Pre-existing and de novo SNVs and indels were detected by whole-genome sequencing in single-cell derived clones from ancestral populations at ${t\,{=}\,0}$ days. Chromosomes are shown on the $x$-axis; clone isolates are listed on the left. WA (in blue) and NA (in red) represent haploid genotypes. Individual cells with unique background genotypes carry private de novo SNVs and indels (circles).
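
The plotting cell for Fig. S5 separates background genotypes from de novo variants by building a per-level indexer for the row `MultiIndex`: every level is set to `slice(None)` except `type`, which is restricted to a list of values. A minimal sketch of that selection on toy data follows. The wrapper `select_level` and the toy values are hypothetical, and only the indexing pattern is taken from the notebook.

```python
import numpy as np
import pandas as pd

def select_level(df, level, values):
    """Keep rows whose index `level` is in `values`; drop columns left fully empty."""
    indexer = [slice(None)] * df.index.nlevels          # select everything on each level...
    indexer[df.index.names.index(level)] = list(values)  # ...except the requested one
    return df.loc[tuple(indexer), :].dropna(axis=1, how='all')

# Toy matrix: rows are (population, clone, type), columns are genome positions
index = pd.MultiIndex.from_tuples(
    [('P1', 1, 'genotype'), ('P1', 1, 'snv_indel'), ('P1', 2, 'genotype')],
    names=['population', 'clone', 'type'])
toy = pd.DataFrame([[0, 1, np.nan],
                    [np.nan, np.nan, 2],
                    [1, 1, np.nan]],
                   index=index, columns=[100, 200, 300])

background = select_level(toy, 'type', ['genotype'])   # heatmap layer
de_novo = select_level(toy, 'type', ['snv_indel'])     # scatter layer
print(background)
print(de_novo)
```

Dropping all-`NaN` columns after the selection leaves each layer with only the positions where it has calls, so the heatmap and the scatter points can use different position grids.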

Figure 4 - Genome instability in sequences of evolved clones



In [7]:

    
# Keep evolved isolates only
seq_dn_df_sliced = seq_dn_df[(seq_dn_df['type'].isin(['consensus','snv_indel','copy_number','loh'])) & \
                             (seq_dn_df['time']==32)]

seq_dn_df_sliced = pd.pivot_table(
    seq_dn_df_sliced, 
    index=['selection','population','time','set','clone','lineage','type','ploidy'], 
    columns=['chr_arabic','chr_roman','pos_cum','gene'],
    values='genotype'
)

fig = plt.figure(figsize=(4, 18))

indexer = [slice(None)]*len(seq_dn_df_sliced.index.names)
indexer[seq_dn_df_sliced.index.names.index('type')] = ['consensus','loh']
seq_dn_df_shape = seq_dn_df_sliced.loc[tuple(indexer),:].dropna(axis=1, how='all')

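# Rows (clones) and columns (positions) of each population's consensus/LOH block
shape = pd.DataFrame({
    k: x.shape for k,x in seq_dn_df_shape.groupby(level=['selection','population'])
})

# First row of `shape` holds the row counts, used as relative panel heights
ygrid = shape.iloc[0].values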
grid = gridspec.GridSpec(len(ygrid), 1, height_ratios=ygrid, hspace=0.9)

gs = {}

for ii, ((e,p), data) in enumerate(seq_dn_df_sliced.groupby(level=['selection','population'])):

    idx = 0
    
    nrows = ygrid[ii]
    ncols = 1
    gs[(e,p)] = gridspec.GridSpecFromSubplotSpec(nrows, ncols,
                                                 subplot_spec=grid[ii],
                                                 hspace=0, wspace=0)
    
    ax = gs[(e,p)]
    
    print(e, p, data.shape)
    plot.genome_instability(data, ax=ax, title=p.replace('_',' ')+' T32')

# Set panel labels
ax1 = plt.subplot(gs['HU','WAxNA_F12_1_HU_1'][(0,0)])
ax1.text(-0.05, 4,  chr(ord('A')), transform=ax1.transAxes,
         fontsize=9, fontweight='bold', va='center', ha='right')
ax1.text(0, 4, 'Selection: %s' % config.selection['long_label']['HU'], 
         transform=ax1.transAxes, fontsize=8, va='center', ha='left')

ax1 = plt.subplot(gs['RM','WAxNA_F12_1_RM_1'][(0,0)])
ax1.text(-0.05, 4,  chr(ord('B')), transform=ax1.transAxes,
         fontsize=9, fontweight='bold', va='center', ha='right')
ax1.text(0, 4, 'Selection: %s' % config.selection['long_label']['RM'],
         transform=ax1.transAxes, fontsize=8, va='center', ha='left')

# Set legend
for k, xy in zip([('HU','WAxNA_F12_2_HU_1'),('RM','WAxNA_F12_1_RM_3')],[(2,0),(2,0)]):
    
    ax3 = plt.subplot(gs[k][(0,0)])
    
    wt_artist = patches.Rectangle((0,0), width=1, height=1, color=config.background['color']['WA/WA'])
    het_artist = patches.Rectangle((0,0), width=1, height=1, color=config.background['color']['WA/NA'])
    hom_artist = patches.Rectangle((0,0), width=1, height=1, color=config.background['color']['NA/NA'])
    
    leg1 = ax3.legend([wt_artist,hom_artist,het_artist], 
                      ['WA/WA','NA/NA','WA/NA'], 
                      bbox_to_anchor=(1.1, 2.5), ncol=1,
                      frameon=False, loc='center',
                      borderaxespad=0, handlelength=0.75, 
                      title='Background', prop={'size':5})
    
    ax4 = plt.subplot(gs[k][(2,0)])
    
    snv_indel_artist = lines.Line2D((0,1),(0,0), mec='k', mfc='w', marker='o', markersize=3, linestyle='')
    loh_artist = patches.Rectangle((0,0), width=1, height=1, color='k')
    cn_gain_artist = patches.Rectangle((0,0), width=1, height=1, color='k', hatch='---', fill=False)
    cn_loss_artist = patches.Rectangle((0,0), width=1, height=1, color='k', hatch='xxx', fill=False)
    
    leg2 = ax4.legend([snv_indel_artist,loh_artist,cn_gain_artist,cn_loss_artist], 
                      ['SNV/indel','LOH','Copy gain','Copy loss'], 
                      bbox_to_anchor=(1.1, 0.0), ncol=1,
                      frameon=False, loc='center',
                      borderaxespad=0, handlelength=0.75, 
                      title='Variant type', prop={'size':5})
    
    het_artist = lines.Line2D((0,1),(0,0), mec='k', mfc=config.genotype['color'][1], 
                              marker='o', markersize=3, linestyle='')
    hom_artist = lines.Line2D((0,1),(0,0), mec='k', mfc=config.genotype['color'][2],
                              marker='o', markersize=3, linestyle='')
    
    leg3 = ax4.legend([het_artist,hom_artist], 
                      ['het.','hom.'], 
                      bbox_to_anchor=(1.1, -4), ncol=1,
                      frameon=False, loc='center',
                      borderaxespad=0, handlelength=0.75, 
                      title='Genotype', prop={'size':5})
    
    ax4.add_artist(leg2)
    
    for leg in [leg1,leg2,leg3]:
        plt.setp(leg.get_title(),fontsize=6)
    
# Axes limits
for ax in fig.get_axes():
    ax.xaxis.label.set_size(6)
    ax.yaxis.label.set_size(6)
    ax.tick_params(axis='both', which='major', size=2, labelsize=6)
    ax.tick_params(axis='both', which='minor', size=1, labelsize=4)

plot.save_figure(dir_supp+'figures/supp_figure_seq_clones/supp_figure_seq_evolved_clones')
plt.show()









    



(u'HU', u'WAxNA_F12_1_HU_1', (8, 10701))
(u'HU', u'WAxNA_F12_1_HU_2', (7, 10701))
(u'HU', u'WAxNA_F12_1_HU_3', (19, 10701))
(u'HU', u'WAxNA_F12_2_HU_1', (8, 10701))
(u'HU', u'WAxNA_F12_2_HU_2', (14, 10701))
(u'HU', u'WAxNA_F12_2_HU_3', (20, 10701))
(u'RM', u'WAxNA_F12_1_RM_1', (11, 10701))
(u'RM', u'WAxNA_F12_1_RM_2', (22, 10701))
(u'RM', u'WAxNA_F12_1_RM_3', (11, 10701))
(u'RM', u'WAxNA_F12_2_RM_1', (11, 10701))
(u'RM', u'WAxNA_F12_2_RM_2', (22, 10701))
(u'RM', u'WAxNA_F12_2_RM_3', (4, 10701))
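
Each printed shape is the (rows, columns) size of one population's block in the pivoted matrix: rows are isolate-by-variant-type entries and columns are genome positions. The sketch below shows the long-to-wide reshaping with `pd.pivot_table` and the per-population row count on toy data. The table contents are hypothetical. In the notebook, row counts of this kind (computed there on the consensus and LOH rows only) are passed to `GridSpec` as `height_ratios`.

```python
import pandas as pd

# Toy long-format calls: one row per (population, clone, position)
long_df = pd.DataFrame({
    'population': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'clone':      [1, 1, 2, 1, 1],
    'pos_cum':    [100, 200, 100, 100, 200],
    'genotype':   [0, 1, 2, 1, 1],
})

# Wide matrix: one row per clone, one column per position; missing calls are NaN
wide = pd.pivot_table(long_df, index=['population', 'clone'],
                      columns='pos_cum', values='genotype')

# Number of rows per population, usable as relative panel heights
rows_per_population = wide.groupby(level='population').size()
print(wide)
print(rows_per_population)
```

Note that `pivot_table` aggregates duplicate entries with the mean by default, so the long table should hold at most one call per row and column combination if genotypes are to stay integer-valued.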

Fig. 4: Genome instability in sequences of evolved clones. SNVs, indels and chromosome-level aberrations were detected by whole-genome sequencing in single-cell derived clones from evolved populations, after ${t\,{=}\,32}$ days in (A) hydroxyurea or (B) rapamycin (see Table S1). Chromosomes are shown on the $x$-axis; clone isolates are listed on the left, colored by lineage (see Fig. S7). The consensus shows the majority genotype across population isolates with sequence identity greater than 80%. WA/WA (in blue) and NA/NA (in red) represent homozygous diploid genotypes and WA/NA (in purple) represents a heterozygous genotype. Individual cells with shared background genotype carry de novo SNVs and indels (circles), de novo mis-segregations with loss-of-heterozygosity (solid segments) and de novo gains or losses in copy number (hatched segments). Driver and passenger mutations are listed along the bottom (drivers are in boldface). Populations marked by $\otimes$ were cross-contaminated during the selection phase, but any derived events are independent.
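
The caption describes the consensus row as the majority genotype across isolates. As an illustration only, the sketch below takes a per-site majority vote over a few hypothetical isolates. It is not the authors' procedure: the source does not state how the 80% sequence-identity criterion is computed, so that filter is left out, and the treatment of ties and missing calls is an assumption.

```python
from collections import Counter

def majority_genotype(calls):
    """Return the most common non-missing call at one site, or None on a tie or no data."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None
    ranked = Counter(observed).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most common calls (assumed behaviour)
    return ranked[0][0]

def consensus(isolates):
    """Apply the majority vote site by site across equally long isolate call lists."""
    return [majority_genotype(site) for site in zip(*isolates)]

# Hypothetical background calls for three isolates at four sites
isolates = [
    ['WA/WA', 'WA/NA', 'NA/NA', None],
    ['WA/WA', 'WA/NA', 'WA/NA', None],
    ['WA/WA', 'NA/NA', 'WA/NA', 'WA/WA'],
]
result = consensus(isolates)
print(result)
```

Segments where an individual isolate differs from such a consensus are the candidates that the figure displays as loss-of-heterozygosity events.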

Output of chr_coords.head() (In [2]):

	chr_arabic	chr_length	chr_roman	chr_start	chr_end
0	1	230218	I	0	230218
1	2	813184	II	230218	1043402
2	3	316620	III	1043402	1360022
3	4	1531933	IV	1360022	2891955
4	5	576874	V	2891955	3468829
