Maren Equations

Sergey and Lauren developed a set of equations found here:

Nuzhdin, S. V, Friesen, M. L., & McIntyre, L. M. (2012). Genotype-phenotype mapping in a post-GWAS world. Trends in Genetics : TIG, 28(9), 421–6. doi:10.1016/j.tig.2012.06.003

which potentially allow for the identification of cis- and trans-effects. Here I try using these equations and test whether they give reasonable results.

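Basics: For a given gene, $E_{ii}$ is the expression level of the line allele i in F1 genotype i, and $E_{ti}$ is the expression level of the tester allele t in the same F1. Here $\mu$ is the population mean, $C$ is a cis-effect, and $T$ is a trans-effect.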

$E_{ii} = \mu + C_i + (T_i + T_t)/2$

$E_{ti} = \mu + C_t + (T_i + T_t)/2$

For each allele the cis- and trans-effects are deviations from the population means, so we expect them to sum to zero:

$\sum^n_{i=1}C_i = 0$

$\sum^n_{i=1}T_i = 0$

Then the expected difference in expression between the Line and Tester allele over the entire population is:

$\sum^n_{i=1} \frac{E_{ti} - E_{ii}}{n}$

Which can be re-written as

$\sum^n_{i=1} \frac{C_{t} - C_{i}}{n} = C_t$

And

$T_t = 2(\frac{\sum^n_{i=1}E_{ti}}{n} - \mu - C_t)$

The cis-effect of allele i can be estimated by:

$\hat C_i = \hat E_{ii} - \hat E_{ti} + \hat C_t$

and trans-effects of allele i can be estimated by:

$\hat T_i = 2(\hat E_{ti} - \hat \mu - \hat C_t - \frac{\hat T_{t}}{2})$
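As a sanity check on the algebra, the estimators can be applied to noise-free data simulated from the model $E_{ti} = \mu + C_t + (T_i + T_t)/2$. This is a minimal sketch, not the project's `marenEq` implementation: the function name `estimate_effects`, the simulated parameter values, and the choice to pass $\mu$ in as a known constant are all assumptions made for illustration.

```python
import numpy as np


def estimate_effects(e_ii, e_ti, mu):
    """Estimate cis/trans effects for one gene.

    e_ii: line-allele expression per genotype, e_ti: tester-allele expression.
    mu is treated as known here (an assumption of this sketch).
    """
    e_ii = np.asarray(e_ii, dtype=float)
    e_ti = np.asarray(e_ti, dtype=float)

    c_t = np.mean(e_ti - e_ii)              # C_t = sum(E_ti - E_ii) / n
    t_t = 2 * (np.mean(e_ti) - mu - c_t)    # T_t = 2 (mean(E_ti) - mu - C_t)
    c_i = e_ii - e_ti + c_t                 # C_i = E_ii - E_ti + C_t
    t_i = 2 * (e_ti - mu - c_t - t_t / 2)   # T_i = 2 (E_ti - mu - C_t - T_t / 2)
    return c_t, t_t, c_i, t_i


# Hypothetical population of 6 lines; line effects are chosen to sum to zero
true_mu, true_c_t, true_t_t = 100.0, 5.0, -4.0
true_c_i = np.array([3.0, -1.0, -2.0, 4.0, -6.0, 2.0])
true_t_i = np.array([-2.0, 1.0, 5.0, -3.0, 2.0, -3.0])

# Generate expression values directly from the model equations
e_ii = true_mu + true_c_i + (true_t_i + true_t_t) / 2
e_ti = true_mu + true_c_t + (true_t_i + true_t_t) / 2

c_t, t_t, c_i, t_i = estimate_effects(e_ii, e_ti, mu=true_mu)
```

With noise-free input every parameter is recovered exactly. The sketch also makes one limitation visible: the averages of $E_{ii}$ and $E_{ti}$ only determine $\mu + T_t/2$ and $C_t$, so $\mu$ has to come from somewhere else before $T_t$ can be separated out.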



In [1]:

    
# Set-up default environment
%run '../ipython_startup.py'

# Import additional libraries
import sas7bdat as sas
import cPickle as pickle

from ase_cisEq import marenEq
from ase_cisEq import marenPrintTable

from ase_normalization import meanStd

from ase_plotting import dfPanelScatter









    



Importing commonly used libraries: os, sys, numpy as np, scipy as sp, pandas as pd, matplotlib as mp, matplotlib.pyplot as plt, datetime as dt, mclib_Python/flagging as fg
Creating project level variables: MCLAB = /home/jfear/mclab, PROJ = /home/jfear/mclab/cegs_ase_paper, TODAY = 20150908
Adding ['scripts/mclib_Python', 'scripts/ase_Python'] to PYTHONPATH

Import clean data set

This data set was created by: ase_summarize_ase_filters.sas

The data has had the following dropped:

regions that were always biased in the 100-genome simulation
regions with APN $\le 25$
regions not in at least 10% of genotypes
regions not in both mated and virgin
genotypes with extreme bias in median(q5_mean_theta)
genotypes with $\le 500$ regions



In [2]:

    
# Import clean dataset
with sas.SAS7BDAT(os.path.join(PROJ, 'sas_data/clean_ase_stack.sas7bdat')) as FH:
    df = FH.to_data_frame()
    
dfClean = df[['line', 'mating_status', 'fusion_id', 'flag_AI_combined', 'q5_mean_theta', 'sum_both', 'sum_line', 'sum_tester', 'sum_total', 'mean_apn']]









    



[clean_ase_stack.sas7bdat] header length 65536 != 8192
WARNING:/home/jfear/mclab/cegs_ase_paper/sas_data/clean_ase_stack.sas7bdat:[clean_ase_stack.sas7bdat] header length 65536 != 8192

Additional cleaning

For the Maren equations, I am also going to drop exonic regions with fewer than 10 genotypes. The Maren equations make some assumptions about the population-level sums. Obviously, the more genotypes that are present for each fusion the better, but I am comfortable with as few as 10 genotypes.



In [3]:

    
# Drop groups with less than 10 lines per fusion
grp = dfClean.groupby(['mating_status', 'fusion_id'])
dfGt10 = grp.filter(lambda x: x['line'].count() >= 10).copy()
print 'Rows ' + str(dfGt10.shape[0])
print 'Columns ' + str(dfGt10.shape[1])









    



Rows 131700
Columns 10



In [4]:

    
# Function to make a panel plot of kde for the maren equation output
def marenKDE(df, value1='cis_line', value2='trans_line', label1='cis_line', label2='trans_line'):
    # pivot for easy plotting
    line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
    tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)

    # Plot only the first 30 fusions
    axes = line.iloc[:, :30].plot(kind='kde', subplots=True, layout=(6, 5), figsize=(20, 15), sharex=False, rot=90, color='b')
    tester.iloc[:, :30].plot(kind='kde', subplots=True, ax=axes, sharex=False, color='g', rot=90)

    # Add a vline to the plots and remove legend
    for ax in axes.ravel():
        ax.axvline(0, color='r', lw=2)
        handles, labels = ax.get_legend_handles_labels()
        ax.set_title(labels[0])
        ax.legend().remove()
        ax.get_yaxis().set_visible(False)

    fig = plt.gcf()
    plt.legend(handles, [label1, label2], bbox_transform=fig.transFigure, bbox_to_anchor=(0.65, 1.03), ncol=2, fontsize=18)
    plt.tight_layout()
    return fig

Raw Counts

Raw counts seem to have some issues. The magnitudes of the line cis and tester trans effects are very different, while the tester cis and trans effects are the same number with opposite signs. Also of concern is that the estimated cis and trans effects for the line are not centered at 0, which is a major assumption of the equations.
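One possible explanation for the tester pattern is sketched below. It rests on an assumption that this notebook does not confirm: that `marenEq` estimates $\hat\mu$ as the grand mean of the line and tester values within each group. Writing $\bar E_{ii}$ and $\bar E_{ti}$ for the averages over the n genotypes:

```latex
\begin{align*}
\hat\mu &= \tfrac{1}{2}\left(\bar E_{ii} + \bar E_{ti}\right), \qquad
\hat C_t = \bar E_{ti} - \bar E_{ii} \\
\hat T_t &= 2\left(\bar E_{ti} - \hat\mu - \hat C_t\right)
          = 2\left(\tfrac{1}{2}\left(\bar E_{ti} - \bar E_{ii}\right) - \hat C_t\right)
          = 2\left(\tfrac{1}{2}\hat C_t - \hat C_t\right)
          = -\hat C_t
\end{align*}
```

If that assumption holds, the tester cis and trans estimates are forced to be equal in magnitude and opposite in sign whatever the data look like, so the pattern would be a property of the estimator and not of the biology.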



In [5]:

    
# Calculate Maren TIG equations by mating status and exonic region
marenRawCounts = marenEq(dfGt10, Eii='sum_line', Eti='sum_tester', group=['mating_status', 'fusion_id'])
marenRawCounts['mag_cis'] = abs(marenRawCounts['cis_line'])
marenPrintTable(marenRawCounts)









    






  
    
      
      
      
line  mating_status  fusion_id  flag_AI_combined  sum_both  sum_line  sum_tester     cis_line  cis_tester    mean_apn
r101  M              F10005_SI                 0      1274       155         133    28.666667    6.666667   29.523922
r280  M              F10005_SI                 0      1208       126         143   -10.333333    6.666667   27.917307
r315  M              F10005_SI                 0      1231       237         218    25.666667    6.666667   31.867690
r324  M              F10005_SI                 1      2554       349         276    79.666667    6.666667   60.087419
r335  M              F10005_SI                 0      1339       215         217     4.666667    6.666667   33.474306
r340  M              F10005_SI                 1      1897       332         102   236.666667    6.666667   44.059067
r357  M              F10005_SI                 1      2797       358         408   -43.333333    6.666667   67.345540
r358  M              F10005_SI                 0      1494       185         162    29.666667    6.666667   34.797401
r365  M              F10005_SI                 0      2313       385         386     5.666667    6.666667   58.291790
r373  M              F10005_SI                 0      1546       146         113    39.666667    6.666667   34.116952
r374  M              F10005_SI                 0      1703       231         155    82.666667    6.666667   39.484938
r380  M              F10005_SI                 0      1584       295         254    47.666667    6.666667   40.316598
r427  M              F10005_SI                 0      4246       522         510    18.666667    6.666667   99.761370
r491  M              F10005_SI                 0      3827       657         599    64.666667    6.666667   96.075605
r517  M              F10005_SI                 0      2591       580         520    66.666667    6.666667   69.764914
r732  M              F10005_SI                 0      1281       276         237    45.666667    6.666667   33.909037
r737  M              F10005_SI                 0      1576       126         156   -23.333333    6.666667   35.118724
r799  M              F10005_SI                 0      2798       520         480    46.666667    6.666667   71.787360
r820  M              F10005_SI                 0      1880       236         180    62.666667    6.666667   43.397519
r85   M              F10005_SI                 0      1086       131         191   -53.333333    6.666667   26.613113
w114  M              F10005_SI                 0      1042       230         243    -6.333333    6.666667   28.635558
w38   M              F10005_SI                 1      2046       444         501   -50.333333    6.666667   56.533963
w47   M              F10005_SI                 0      5492      1425        1417    14.666667    6.666667  157.523922
w52   M              F10005_SI                 0      3848       407         397    16.666667    6.666667   87.929120
w55   M              F10005_SI                 1      4191       252         882  -623.333333    6.666667  100.649734
w59   M              F10005_SI                 0      1429       367         329    44.666667    6.666667   40.165387
w64   M              F10005_SI                 0      2153       284         239    51.666667    6.666667   50.580035
w68   M              F10005_SI                 0      1915       371         305    72.666667    6.666667   48.973420
w76   M              F10005_SI                 0      1757       286         376   -83.333333    6.666667   45.722386
w79   M              F10005_SI                 0      2596       384         583  -192.333333    6.666667   67.345540
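The cis columns in this table can be reproduced by hand from the two count columns, which is a useful check that the output matches $\hat C_t = \sum (E_{ti} - E_{ii})/n$ and $\hat C_i = \hat E_{ii} - \hat E_{ti} + \hat C_t$. The sketch below copies `sum_line` and `sum_tester` for the 30 mated genotypes of F10005_SI from the table; the variable names are chosen for illustration and are not taken from `marenEq`.

```python
import numpy as np

# sum_line and sum_tester for F10005_SI (mated), in table order r101 ... w79
sum_line = np.array([155, 126, 237, 349, 215, 332, 358, 185, 385, 146,
                     231, 295, 522, 657, 580, 276, 126, 520, 236, 131,
                     230, 444, 1425, 407, 252, 367, 284, 371, 286, 384],
                    dtype=float)
sum_tester = np.array([133, 143, 218, 276, 217, 102, 408, 162, 386, 113,
                       155, 254, 510, 599, 520, 237, 156, 480, 180, 191,
                       243, 501, 1417, 397, 882, 329, 239, 305, 376, 583],
                      dtype=float)

# Tester cis-effect: mean difference between tester and line counts
cis_tester = np.mean(sum_tester - sum_line)

# Line cis-effect for each genotype
cis_line = sum_line - sum_tester + cis_tester

print(round(cis_tester, 6))    # 6.666667, the constant cis_tester column
print(round(cis_line[0], 6))   # 28.666667, the cis_line value for r101
```

Because $\hat C_t$ is the mean of $E_{ti} - E_{ii}$, the `cis_line` values sum to zero by construction within a group. Their mean is therefore zero even when the bulk of the distribution is not: here most values are positive and are offset by a few large negative ones such as w55.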



In [6]:

    
fig = marenKDE(marenRawCounts)









    



---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-6-b6ec4c277051> in <module>()
----> 1 fig = marenKDE(marenRawCounts)

<ipython-input-4-7fb5c1c3fab2> in marenKDE(df, value1, value2, label1, label2)
      3     # pivot for easy plotting
      4     line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
----> 5     tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)
      6 
      7     # Plot only the first 30 fusions

/home/jfear/.local/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    109 
    110     grouped = data.groupby(keys)
--> 111     agged = grouped.agg(aggfunc)
    112 
    113     table = agged

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    675     @Appender(_agg_doc)
    676     def agg(self, func, *args, **kwargs):
--> 677         return self.aggregate(func, *args, **kwargs)
    678 
    679     def _iterate_slices(self):

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2601     def aggregate(self, arg, *args, **kwargs):
   2602         if isinstance(arg, compat.string_types):
-> 2603             return getattr(self, arg)(*args, **kwargs)
   2604 
   2605         result = OrderedDict()

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    690         """
    691         try:
--> 692             return self._cython_agg_general('mean')
    693         except GroupByError:
    694             raise

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   2524 
   2525     def _cython_agg_general(self, how, numeric_only=True):
-> 2526         new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   2527         return self._wrap_agged_blocks(new_items, new_blocks)
   2528 

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   2571 
   2572         if len(new_blocks) == 0:
-> 2573             raise DataError('No numeric types to aggregate')
   2574 
   2575         return data.items, new_blocks

DataError: No numeric types to aggregate

Mean Centered Counts

I am concerned about the calculation of $\mu$. Mean centering the raw counts will allow $\mu = 0$ and I can effectively ignore it. For each fusion_id I take the mean of all the raw counts (line and tester), then subtract this mean value from each count.

This had no effect on the results.
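For the cis estimates this is expected from the algebra: subtracting the same constant from both allele counts cancels in $E_{ii} - E_{ti}$, and both $\hat C_t$ and $\hat C_i$ depend on the counts only through that difference. A small sketch using the first four genotypes of F10005_SI (the helper `cis_estimates` is hypothetical and only mirrors the equations at the top of the notebook):

```python
import numpy as np

# First four mated genotypes of F10005_SI (r101, r280, r315, r324)
line = np.array([155.0, 126.0, 237.0, 349.0])
tester = np.array([133.0, 143.0, 218.0, 276.0])


def cis_estimates(e_ii, e_ti):
    """Return (C_t, C_i) from line and tester expression values."""
    c_t = np.mean(e_ti - e_ii)
    c_i = e_ii - e_ti + c_t
    return c_t, c_i


# Center both alleles on the mean of all line and tester counts
grand_mean = np.mean(np.concatenate([line, tester]))

raw = cis_estimates(line, tester)
centered = cis_estimates(line - grand_mean, tester - grand_mean)

# The shared constant drops out, so both sets of estimates agree
print(np.isclose(raw[0], centered[0]), np.allclose(raw[1], centered[1]))
```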



In [7]:

    
# Mean Center raw counts
def meanCenter(x):
    """Mean centering.
    
    Mean center allele specific counts by using both line and tester counts.
    
    """
    # Get the mean for combined counts
    cntMean = np.mean(x[['sum_line', 'sum_tester']].values)
    
    # Center values using mean
    x['sum_line_center'] = x['sum_line'] - cntMean
    x['sum_tester_center'] = x['sum_tester'] - cntMean
    
    return x

# Group by mating status and fusions id
grp = dfGt10.groupby(['mating_status', 'fusion_id'])

# For each ms*fusion_id do the mean centering
meanCentered = grp.apply(meanCenter)
meanCentered.reset_index(inplace=True)



In [8]:

    
# Calculate Maren TIG equations by mating status and exonic region for mean centered data.
marenMeanCenter = marenEq(meanCentered, Eii='sum_line_center', Eti='sum_tester_center', group=['mating_status', 'fusion_id'])
marenMeanCenter['mag_cis'] = abs(marenMeanCenter['cis_line'])
marenPrintTable(marenMeanCenter, line='sum_line_center', tester='sum_tester_center')









    






  
    
      
      
      
line  mating_status  fusion_id  flag_AI_combined  sum_both  sum_line_center  sum_tester_center     cis_line  cis_tester    mean_apn
r101  M              F10005_SI                 0      1274      -198.733333        -220.733333    28.666667    6.666667   29.523922
r280  M              F10005_SI                 0      1208      -227.733333        -210.733333   -10.333333    6.666667   27.917307
r315  M              F10005_SI                 0      1231      -116.733333        -135.733333    25.666667    6.666667   31.867690
r324  M              F10005_SI                 1      2554        -4.733333         -77.733333    79.666667    6.666667   60.087419
r335  M              F10005_SI                 0      1339      -138.733333        -136.733333     4.666667    6.666667   33.474306
r340  M              F10005_SI                 1      1897       -21.733333        -251.733333   236.666667    6.666667   44.059067
r357  M              F10005_SI                 1      2797         4.266667          54.266667   -43.333333    6.666667   67.345540
r358  M              F10005_SI                 0      1494      -168.733333        -191.733333    29.666667    6.666667   34.797401
r365  M              F10005_SI                 0      2313        31.266667          32.266667     5.666667    6.666667   58.291790
r373  M              F10005_SI                 0      1546      -207.733333        -240.733333    39.666667    6.666667   34.116952
r374  M              F10005_SI                 0      1703      -122.733333        -198.733333    82.666667    6.666667   39.484938
r380  M              F10005_SI                 0      1584       -58.733333         -99.733333    47.666667    6.666667   40.316598
r427  M              F10005_SI                 0      4246       168.266667         156.266667    18.666667    6.666667   99.761370
r491  M              F10005_SI                 0      3827       303.266667         245.266667    64.666667    6.666667   96.075605
r517  M              F10005_SI                 0      2591       226.266667         166.266667    66.666667    6.666667   69.764914
r732  M              F10005_SI                 0      1281       -77.733333        -116.733333    45.666667    6.666667   33.909037
r737  M              F10005_SI                 0      1576      -227.733333        -197.733333   -23.333333    6.666667   35.118724
r799  M              F10005_SI                 0      2798       166.266667         126.266667    46.666667    6.666667   71.787360
r820  M              F10005_SI                 0      1880      -117.733333        -173.733333    62.666667    6.666667   43.397519
r85   M              F10005_SI                 0      1086      -222.733333        -162.733333   -53.333333    6.666667   26.613113
w114  M              F10005_SI                 0      1042      -123.733333        -110.733333    -6.333333    6.666667   28.635558
w38   M              F10005_SI                 1      2046        90.266667         147.266667   -50.333333    6.666667   56.533963
w47   M              F10005_SI                 0      5492      1071.266667        1063.266667    14.666667    6.666667  157.523922
w52   M              F10005_SI                 0      3848        53.266667          43.266667    16.666667    6.666667   87.929120
w55   M              F10005_SI                 1      4191      -101.733333         528.266667  -623.333333    6.666667  100.649734
w59   M              F10005_SI                 0      1429        13.266667         -24.733333    44.666667    6.666667   40.165387
w64   M              F10005_SI                 0      2153       -69.733333        -114.733333    51.666667    6.666667   50.580035
w68   M              F10005_SI                 0      1915        17.266667         -48.733333    72.666667    6.666667   48.973420
w76   M              F10005_SI                 0      1757       -67.733333          22.266667   -83.333333    6.666667   45.722386
w79   M              F10005_SI                 0      2596        30.266667         229.266667  -192.333333    6.666667   67.345540



In [9]:

    
fig = marenKDE(marenMeanCenter)









    



---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-9-1dda2499c620> in <module>()
----> 1 fig = marenKDE(marenMeanCenter)

<ipython-input-4-7fb5c1c3fab2> in marenKDE(df, value1, value2, label1, label2)
      3     # pivot for easy plotting
      4     line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
----> 5     tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)
      6 
      7     # Plot only the first 30 fusions

/home/jfear/.local/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    109 
    110     grouped = data.groupby(keys)
--> 111     agged = grouped.agg(aggfunc)
    112 
    113     table = agged

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    675     @Appender(_agg_doc)
    676     def agg(self, func, *args, **kwargs):
--> 677         return self.aggregate(func, *args, **kwargs)
    678 
    679     def _iterate_slices(self):

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2601     def aggregate(self, arg, *args, **kwargs):
   2602         if isinstance(arg, compat.string_types):
-> 2603             return getattr(self, arg)(*args, **kwargs)
   2604 
   2605         result = OrderedDict()

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    690         """
    691         try:
--> 692             return self._cython_agg_general('mean')
    693         except GroupByError:
    694             raise

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   2524 
   2525     def _cython_agg_general(self, how, numeric_only=True):
-> 2526         new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   2527         return self._wrap_agged_blocks(new_items, new_blocks)
   2528 

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   2571 
   2572         if len(new_blocks) == 0:
-> 2573             raise DataError('No numeric types to aggregate')
   2574 
   2575         return data.items, new_blocks

DataError: No numeric types to aggregate

Upper Quartile Normalization

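For most of the CEGS projects, we have used a q3 (upper quartile) normalization. Here I divide each count by the upper quartile for its line and multiply by the median of the upper quartiles across all lines.

$\frac{\text{sum\_line}_{gf}}{q3_g} \times \widetilde{q3}$

Where g indexes genotypes 1...G, f indexes exonic regions 1...F, $q3_g$ is the upper quartile of genotype g, and $\widetilde{q3}$ is the median of the $q3_g$.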
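To make the formula concrete, here is a minimal sketch on made-up data with two genotypes whose counts differ only by a factor of two. The data frame `toy` and its values are hypothetical, and mating status is left out for brevity; the real data would be grouped by mating status as well.

```python
import pandas as pd

# Hypothetical toy data: two genotypes (lines), four exonic regions each
toy = pd.DataFrame({
    'line': ['r1'] * 4 + ['r2'] * 4,
    'fusion_id': ['F1', 'F2', 'F3', 'F4'] * 2,
    'sum_line': [10.0, 20.0, 30.0, 40.0, 20.0, 40.0, 60.0, 80.0],
})

# q3_g: upper quartile of the counts within each genotype, repeated on every row
q3 = toy.groupby('line')['sum_line'].transform(lambda x: x.quantile(0.75))

# Median of the genotype-level upper quartiles
med_q3 = toy.groupby('line')['sum_line'].quantile(0.75).median()

# value / q3_g * median(q3)
toy['q3_norm_sum_line'] = toy['sum_line'] / q3 * med_q3
```

After normalization the two genotypes have identical values, because the only difference between them was a scale factor that the per-genotype upper quartile removes.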



In [ ]:

    
# Function to calculate the upper quartile normalization
def calcQ3(df, column):
    """Add an upper quartile normalized copy of `column` to df in place."""
    # Upper quartile of each genotype (q3_g in the formula above),
    # i.e. each line within a mating status
    keys = ['line', 'mating_status']
    q3 = df.groupby(keys)[column].quantile(q=0.75)
    q3.name = 'q3'
    q3 = q3.reset_index()

    # Median of the genotype-level q3 values within each mating status
    medQ3 = q3.groupby('mating_status')['q3'].median()
    medQ3.name = 'medQ3'
    q3 = q3.merge(medQ3.reset_index(), on='mating_status')

    # Attach q3 and medQ3 to every row. A left merge keeps the row order of df,
    # so restoring df's index makes the assignment below align row by row.
    merged = df[keys + [column]].merge(q3, on=keys, how='left')
    merged.index = df.index

    # Calculate q3 norm and add to original dataset
    # value / q3 * med(q3)
    df['q3_norm_' + column] = merged[column] / merged['q3'] * merged['medQ3']



In [ ]:

    
# run q3 norm
dfQ3 = dfGt10.copy()
calcQ3(dfQ3, 'sum_line')
calcQ3(dfQ3, 'sum_tester')



In [ ]:

    
# Calculate Maren TIG equations by mating status and exonic region
marenQ3Norm = marenEq(dfQ3, Eii='q3_norm_sum_line', Eti='q3_norm_sum_tester', group=['mating_status', 'fusion_id'])
marenQ3Norm['mag_cis'] = abs(marenQ3Norm['cis_line'])
marenPrintTable(marenQ3Norm, line='q3_norm_sum_line', tester='q3_norm_sum_tester')



In [ ]:

    
fig = marenKDE(marenQ3Norm)

Mean Standardization

For mean standardization, for each exonic region I subtract the mean value of that exonic region across genotypes and divide by the standard deviation. Note that I am doing the environments (mated and virgin) separately.
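The project function `meanStd` is not shown in this notebook, so the following is only a stand-in sketch of the calculation described above, using a pandas `groupby` transform on made-up data. The data frame `toy`, its values, and the use of the sample standard deviation (pandas default, `ddof=1`) are assumptions.

```python
import pandas as pd

# Hypothetical toy data: one exonic region, three genotypes, two environments
toy = pd.DataFrame({
    'mating_status': ['M', 'M', 'M', 'V', 'V', 'V'],
    'fusion_id': ['F1'] * 6,
    'line': ['r1', 'r2', 'r3'] * 2,
    'q3_norm_sum_line': [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

# Group by environment and exonic region so mated and virgin are handled separately
grp = toy.groupby(['mating_status', 'fusion_id'])['q3_norm_sum_line']

# (value - group mean) / group standard deviation
toy['mean_std_q3_norm_sum_line'] = (
    (toy['q3_norm_sum_line'] - grp.transform('mean')) / grp.transform('std')
)
```

Grouping on both `mating_status` and `fusion_id` is what keeps the environments separate; grouping on `fusion_id` alone would pool mated and virgin values into one mean and standard deviation.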



In [ ]:

    
# Mean standardization
meanStd(df=q3Line, colName='q3_norm_sum_line', group='fusion_id')
meanStd(df=q3Tester, colName='q3_norm_sum_tester', group='fusion_id')

# Merge everything together
q3Merge = q3Line.merge(q3Tester, how='inner', on=['line', 'mating_status', 'fusion_id'])
dfQ3Std = dfClean.merge(q3Merge, how='left', on=['line', 'mating_status', 'fusion_id'])



In [ ]:

    
# Calculate Maren TIG equations by mating status and exonic region
marenQ3NormStd = marenEq(dfQ3Std, Eii='mean_std_q3_norm_sum_line', Eti='mean_std_q3_norm_sum_tester', group=['mating_status', 'fusion_id'])
marenQ3NormStd['mag_cis'] = abs(marenQ3NormStd['cis_line'])
marenQ3NormStd.head()



In [ ]:

    
grp = marenQ3NormStd.groupby(['mating_status', 'fusion_id'])
tp = grp.get_group(('M', 'F10005_SI'))



In [ ]:

    
tp[['cis_line', 'trans_line']].plot(kind='kde')

Summary Plots

Now I am going to do a variety of summary plots and see how the different normalization methods compare.



In [ ]:

    
# Figure out the 25 most highly expressed fusions
## group fusion by id and env
fusGrp = dfGt10.groupby(['mating_status', 'fusion_id'])

## Calculate the mean apn for each fusion across genotypes
mApn = fusGrp['mean_apn'].mean()
mApnI = mApn.reset_index()
mApnI.set_index('fusion_id', inplace=True)

# Get the 25 highest expressed fusions for mated and virgin
fusGrp2 = mApnI.groupby('mating_status')

m = fusGrp2.get_group('M').rank(ascending=False)
mFusHi = m[m['mean_apn'] <= 25].index

v = fusGrp2.get_group('V').rank(ascending=False)
vFusHi = v[v['mean_apn'] <= 25].index

# Get the 25 lowest expressed fusions for mated and virgin
m = fusGrp2.get_group('M').rank(ascending=True)
mFusLow = m[m['mean_apn'] <= 25].index

v = fusGrp2.get_group('V').rank(ascending=True)
vFusLow = v[v['mean_apn'] <= 25].index



In [ ]:

    
rawMHi = marenRawCounts[marenRawCounts['fusion_id'].isin(mFusHi) & (marenRawCounts['mating_status'] == 'M')]
rawVHi = marenRawCounts[marenRawCounts['fusion_id'].isin(vFusHi) & (marenRawCounts['mating_status'] == 'V')]
dfPanelScatter(df=rawMHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=rawMHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')



In [ ]:

    
rawMLow = marenRawCounts[marenRawCounts['fusion_id'].isin(mFusLow) & (marenRawCounts['mating_status'] == 'M')]
rawVLow = marenRawCounts[marenRawCounts['fusion_id'].isin(vFusLow) & (marenRawCounts['mating_status'] == 'V')]
dfPanelScatter(df=rawMLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=rawMLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')



In [ ]:

    
q3MHi = marenQ3Norm[marenQ3Norm['fusion_id'].isin(mFusHi) & (marenQ3Norm['mating_status'] == 'M')]
q3VHi = marenQ3Norm[marenQ3Norm['fusion_id'].isin(vFusHi) & (marenQ3Norm['mating_status'] == 'V')]
dfPanelScatter(df=q3MHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=q3MHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')



In [ ]:

    
q3MLow = marenQ3Norm[marenQ3Norm['fusion_id'].isin(mFusLow) & (marenQ3Norm['mating_status'] == 'M')]
q3VLow = marenQ3Norm[marenQ3Norm['fusion_id'].isin(vFusLow) & (marenQ3Norm['mating_status'] == 'V')]
dfPanelScatter(df=q3MLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=q3MLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')



In [ ]:

    
q3StdMHi = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(mFusHi) & (marenQ3NormStd['mating_status'] == 'M')]
q3StdVHi = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(vFusHi) & (marenQ3NormStd['mating_status'] == 'V')]
dfPanelScatter(df=q3StdMHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=q3StdMHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')



In [ ]:

    
q3StdMLow = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(mFusLow) & (marenQ3NormStd['mating_status'] == 'M')]
q3StdVLow = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(vFusLow) & (marenQ3NormStd['mating_status'] == 'V')]
dfPanelScatter(df=q3StdMLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')



In [ ]:

    
dfPanelScatter(df=q3StdMLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')


