Maren Equations

Sergey and Lauren developed a set of equations, described in:

Nuzhdin, S. V., Friesen, M. L., & McIntyre, L. M. (2012). Genotype-phenotype mapping in a post-GWAS world. Trends in Genetics, 28(9), 421–426. doi:10.1016/j.tig.2012.06.003

which potentially allow for the identification of cis- and trans-effects. Here I apply these equations and test whether they give reasonable results.

Basics: For a given gene, $E_{ii}$ is the expression level of line allele i in F1 genotype i, and $E_{ti}$ is the expression level of the tester allele t in the same F1 genotype.

$E_{ii} = \mu + C_i + (T_i + T_t)/2$

$E_{ti} = \mu + C_t + (T_i + T_t)/2$

For each allele, the cis- and trans-effects are deviations from the population means, so we expect them to sum to zero:

$\sum^n_{i=1}C_i = 0$

$\sum^n_{i=1}T_i = 0$

Then the expected difference in expression between the Line and Tester allele over the entire population is:

$\sum^n_{i=1} \frac{E_{ti} - E_{ii}}{n}$

Because the trans terms cancel in each difference and $\sum^n_{i=1}C_i = 0$, this can be rewritten as

$\sum^n_{i=1} \frac{C_{t} - C_{i}}{n} = C_t$

And

$T_t = 2(\frac{\sum^n_{i=1}E_{ti}}{n} - \mu - C_t)$

The cis-effect of allele i can be estimated by:

$\hat C_i = \hat E_{ii} - \hat E_{ti} + \hat C_t$

and trans-effects of allele i can be estimated by:

$\hat T_i = 2(\hat E_{ti} - \hat \mu - \hat C_t - \frac{\hat T_t}{2})$
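
The estimators above can be sketched in a few lines of NumPy. This is only a minimal illustration and not the `marenEq` implementation: the function name `maren_estimates` and the simulated values are made up, and $\mu$ is passed in as a known value because the equations above do not say how it is estimated.

```python
import numpy as np


def maren_estimates(e_ii, e_ti, mu):
    """Estimate cis/trans effects for one gene (hypothetical helper).

    e_ii : expression of the line allele in each F1 genotype
    e_ti : expression of the tester allele in each F1 genotype
    mu   : population mean, assumed known for this sketch
    """
    e_ii = np.asarray(e_ii, dtype=float)
    e_ti = np.asarray(e_ti, dtype=float)

    # C_t is the mean difference between tester and line alleles
    c_t = np.mean(e_ti - e_ii)
    # T_t = 2 * (mean(E_ti) - mu - C_t)
    t_t = 2.0 * (np.mean(e_ti) - mu - c_t)
    # C_i = E_ii - E_ti + C_t
    c_i = e_ii - e_ti + c_t
    # T_i = 2 * (E_ti - mu - C_t - T_t / 2)
    t_i = 2.0 * (e_ti - mu - c_t - t_t / 2.0)
    return c_t, t_t, c_i, t_i


# Simulate noise-free data from the model (all values are made up)
mu, C_t, T_t = 100.0, 5.0, -4.0
C = np.array([3.0, -1.0, -2.0, 0.0])   # line cis effects, sum to zero
T = np.array([1.0, 2.0, -3.0, 0.0])    # line trans effects, sum to zero
E_ii = mu + C + (T + T_t) / 2.0
E_ti = mu + C_t + (T + T_t) / 2.0

c_t_hat, t_t_hat, c_i_hat, t_i_hat = maren_estimates(E_ii, E_ti, mu)
```

With noise-free input the simulated effects are recovered exactly, which is a quick sanity check on the algebra before applying the equations to real counts.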


In [1]:
# Set-up default environment
%run '../ipython_startup.py'

# Import additional libraries
import sas7bdat as sas
import cPickle as pickle

from ase_cisEq import marenEq
from ase_cisEq import marenPrintTable

from ase_normalization import meanStd

from ase_plotting import dfPanelScatter


Importing commonly used libraries: os, sys, numpy as np, scipy as sp, pandas as pd, matplotlib as mp, matplotlib.pyplot as plt, datetime as dt, mclib_Python/flagging as fg
Creating project level variables: MCLAB = /home/jfear/mclab, PROJ = /home/jfear/mclab/cegs_ase_paper, TODAY = 20150908
Adding ['scripts/mclib_Python', 'scripts/ase_Python'] to PYTHONPATH

Import clean data set

This data set was created by: ase_summarize_ase_filters.sas

The following have been dropped from the data:

  • regions that were always biased in the 100 genome simulation
  • regions with APN $\le 25$
  • regions not in at least 10% of genotypes
  • regions not in mated and virgin
  • genotypes with extreme bias in median(q5_mean_theta)
  • genotypes with $\le500$ regions

In [2]:
# Import clean dataset
with sas.SAS7BDAT(os.path.join(PROJ, 'sas_data/clean_ase_stack.sas7bdat')) as FH:
    df = FH.to_data_frame()
    
dfClean = df[['line', 'mating_status', 'fusion_id', 'flag_AI_combined', 'q5_mean_theta', 'sum_both', 'sum_line', 'sum_tester', 'sum_total', 'mean_apn']]


[clean_ase_stack.sas7bdat] header length 65536 != 8192
WARNING:/home/jfear/mclab/cegs_ase_paper/sas_data/clean_ase_stack.sas7bdat:[clean_ase_stack.sas7bdat] header length 65536 != 8192

Additional cleaning

For the Maren equations, I am also going to drop exonic regions with fewer than 10 genotypes. The Maren equations make some assumptions about the population level sums. Obviously, the more genotypes that are present for each fusion the better, but I am comfortable with as few as 10 genotypes.


In [3]:
# Drop groups with fewer than 10 lines per fusion
grp = dfClean.groupby(['mating_status', 'fusion_id'])
dfGt10 = grp.filter(lambda x: x['line'].count() >= 10).copy()
print('Rows ' + str(dfGt10.shape[0]))
print('Columns ' + str(dfGt10.shape[1]))


Rows 131700
Columns 10
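
The group filter used above can be tried on a small made-up table to see which rows survive. This is a sketch under assumptions: the toy values and the helper name `keep_min_lines` are hypothetical, and only the column names come from the data set.

```python
import pandas as pd

# Toy data: M/F1 has three lines, M/F2 and V/F1 have one line each
toy = pd.DataFrame({
    'mating_status': ['M', 'M', 'M', 'M', 'V'],
    'fusion_id': ['F1', 'F1', 'F1', 'F2', 'F1'],
    'line': ['r101', 'r280', 'r315', 'r101', 'r101'],
})


def keep_min_lines(df, min_lines):
    """Keep mating_status*fusion_id groups with at least min_lines rows."""
    grp = df.groupby(['mating_status', 'fusion_id'])
    return grp.filter(lambda x: x['line'].count() >= min_lines).copy()


kept = keep_min_lines(toy, min_lines=3)
```

Note that `count()` counts non-missing rows, not distinct lines. If a line could appear more than once within a group, `nunique()` would be the safer choice.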

In [4]:
# Function to make a panel plot of kde for the maren equation output
def marenKDE(df, value1='cis_line', value2='trans_line', label1='cis_line', label2='trans_line'):
    # pivot for easy plotting
    line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
    tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)

    # Plot only the first 30 fusions
    axes = line.iloc[:, :30].plot(kind='kde', subplots=True, layout=(6, 5), figsize=(20, 15), sharex=False, rot=90, color='b')
    tester.iloc[:, :30].plot(kind='kde', subplots=True, ax=axes, sharex=False, color='g', rot=90)

    # Add a vline to the plots and remove legend
    for ax in axes.ravel():
        ax.axvline(0, color='r', lw=2)
        handles, labels = ax.get_legend_handles_labels()
        ax.set_title(labels[0])
        ax.legend().remove()
        ax.get_yaxis().set_visible(False)

    fig = plt.gcf()
    plt.legend(handles, [label1, label2], bbox_transform=fig.transFigure, bbox_to_anchor=(0.65, 1.03), ncol=2, fontsize=18)
    plt.tight_layout()
    return fig

Raw Counts

Raw counts seem to have some issues. The magnitudes of the line cis and tester trans effects are very different, while the tester cis and trans effects have the same magnitude but opposite signs. Also of concern is that the estimated cis and trans effects for the line are not centered at 0, which is a major assumption of the equations.


In [5]:
# Calculate Maren TIG equations by mating status and exonic region
marenRawCounts = marenEq(dfGt10, Eii='sum_line', Eti='sum_tester', group=['mating_status', 'fusion_id'])
marenRawCounts['mag_cis'] = abs(marenRawCounts['cis_line'])
marenPrintTable(marenRawCounts)


line mating_status fusion_id flag_AI_combined sum_both sum_line sum_tester cis_line cis_tester mean_apn
r101 M F10005_SI 0 1274 155 133 28.666667 6.666667 29.523922
r280 M F10005_SI 0 1208 126 143 -10.333333 6.666667 27.917307
r315 M F10005_SI 0 1231 237 218 25.666667 6.666667 31.867690
r324 M F10005_SI 1 2554 349 276 79.666667 6.666667 60.087419
r335 M F10005_SI 0 1339 215 217 4.666667 6.666667 33.474306
r340 M F10005_SI 1 1897 332 102 236.666667 6.666667 44.059067
r357 M F10005_SI 1 2797 358 408 -43.333333 6.666667 67.345540
r358 M F10005_SI 0 1494 185 162 29.666667 6.666667 34.797401
r365 M F10005_SI 0 2313 385 386 5.666667 6.666667 58.291790
r373 M F10005_SI 0 1546 146 113 39.666667 6.666667 34.116952
r374 M F10005_SI 0 1703 231 155 82.666667 6.666667 39.484938
r380 M F10005_SI 0 1584 295 254 47.666667 6.666667 40.316598
r427 M F10005_SI 0 4246 522 510 18.666667 6.666667 99.761370
r491 M F10005_SI 0 3827 657 599 64.666667 6.666667 96.075605
r517 M F10005_SI 0 2591 580 520 66.666667 6.666667 69.764914
r732 M F10005_SI 0 1281 276 237 45.666667 6.666667 33.909037
r737 M F10005_SI 0 1576 126 156 -23.333333 6.666667 35.118724
r799 M F10005_SI 0 2798 520 480 46.666667 6.666667 71.787360
r820 M F10005_SI 0 1880 236 180 62.666667 6.666667 43.397519
r85 M F10005_SI 0 1086 131 191 -53.333333 6.666667 26.613113
w114 M F10005_SI 0 1042 230 243 -6.333333 6.666667 28.635558
w38 M F10005_SI 1 2046 444 501 -50.333333 6.666667 56.533963
w47 M F10005_SI 0 5492 1425 1417 14.666667 6.666667 157.523922
w52 M F10005_SI 0 3848 407 397 16.666667 6.666667 87.929120
w55 M F10005_SI 1 4191 252 882 -623.333333 6.666667 100.649734
w59 M F10005_SI 0 1429 367 329 44.666667 6.666667 40.165387
w64 M F10005_SI 0 2153 284 239 51.666667 6.666667 50.580035
w68 M F10005_SI 0 1915 371 305 72.666667 6.666667 48.973420
w76 M F10005_SI 0 1757 286 376 -83.333333 6.666667 45.722386
w79 M F10005_SI 0 2596 384 583 -192.333333 6.666667 67.345540
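
One way to check the centering assumption is to average the estimated effects within each mating status and exonic region and compare the means with 0. The sketch below uses made-up values, and the helper name `effect_means` is hypothetical.

```python
import numpy as np
import pandas as pd


def effect_means(df, columns, group):
    """Mean of each effect column within each group."""
    return df.groupby(group)[columns].mean()


# Toy estimates: the M group is centered at 0, the V group is not
toy = pd.DataFrame({
    'mating_status': ['M'] * 3 + ['V'] * 3,
    'fusion_id': ['F1'] * 6,
    'cis_line': [10.0, -4.0, -6.0, 2.0, 3.0, 4.0],
})

means = effect_means(toy, ['cis_line'], ['mating_status', 'fusion_id'])
centered = np.isclose(means['cis_line'], 0.0)
```

Groups where `centered` is False are the ones that violate the sum-to-zero assumption and are worth a closer look.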

In [6]:
fig = marenKDE(marenRawCounts)


---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-6-b6ec4c277051> in <module>()
----> 1 fig = marenKDE(marenRawCounts)

<ipython-input-4-7fb5c1c3fab2> in marenKDE(df, value1, value2, label1, label2)
      3     # pivot for easy plotting
      4     line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
----> 5     tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)
      6 
      7     # Plot only the first 30 fusions

/home/jfear/.local/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    109 
    110     grouped = data.groupby(keys)
--> 111     agged = grouped.agg(aggfunc)
    112 
    113     table = agged

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    675     @Appender(_agg_doc)
    676     def agg(self, func, *args, **kwargs):
--> 677         return self.aggregate(func, *args, **kwargs)
    678 
    679     def _iterate_slices(self):

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2601     def aggregate(self, arg, *args, **kwargs):
   2602         if isinstance(arg, compat.string_types):
-> 2603             return getattr(self, arg)(*args, **kwargs)
   2604 
   2605         result = OrderedDict()

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    690         """
    691         try:
--> 692             return self._cython_agg_general('mean')
    693         except GroupByError:
    694             raise

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   2524 
   2525     def _cython_agg_general(self, how, numeric_only=True):
-> 2526         new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   2527         return self._wrap_agged_blocks(new_items, new_blocks)
   2528 

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   2571 
   2572         if len(new_blocks) == 0:
-> 2573             raise DataError('No numeric types to aggregate')
   2574 
   2575         return data.items, new_blocks

DataError: No numeric types to aggregate
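
The DataError above is raised when `pivot_table` finds no numeric column to aggregate. One possible cause, which is only an assumption here because the dtypes returned by `marenEq` are not shown, is that `trans_line` is stored with an object dtype. A minimal sketch for checking and coercing the column, on made-up data with a hypothetical helper `coerce_numeric`:

```python
import pandas as pd


def coerce_numeric(df, columns):
    """Return a copy of df with the given columns converted to numbers.

    Values that cannot be parsed become NaN.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors='coerce')
    return out


# Toy frame where trans_line is stored as strings (object dtype)
toy = pd.DataFrame({
    'line': ['r101', 'r280', 'r101', 'r280'],
    'fusion_id': ['F1', 'F1', 'F2', 'F2'],
    'trans_line': ['1.5', '-2.0', None, '0.5'],
})
print(toy.dtypes)

fixed = coerce_numeric(toy, ['trans_line'])
table = pd.pivot_table(fixed, columns='fusion_id', index='line', values='trans_line')
```

If the dtype turns out to be numeric already, the problem lies elsewhere, for example in a column that is entirely missing.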

Mean Centered Counts

I am concerned about the calculation of $\mu$. Mean centering the raw counts sets $\mu = 0$, so I can effectively ignore it. For each fusion_id (within mating status), I take the mean of all the raw counts (line and tester), then subtract this mean from each count.

This had no effect on the results.
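
For the cis estimates this is expected: both $\hat C_t$ and $\hat C_i$ depend only on the difference between the line and tester counts, so subtracting the same constant from both leaves them unchanged. A small sketch, using the first three line/tester pairs from the raw counts table above (with only three genotypes the value of $\hat C_t$ differs from the one in the table) and a hypothetical helper `cis_estimates`:

```python
import numpy as np


def cis_estimates(e_ii, e_ti):
    """Return (C_t, C_i) from line (e_ii) and tester (e_ti) counts."""
    e_ii = np.asarray(e_ii, dtype=float)
    e_ti = np.asarray(e_ti, dtype=float)
    c_t = np.mean(e_ti - e_ii)
    return c_t, e_ii - e_ti + c_t


line = np.array([155.0, 126.0, 237.0])
tester = np.array([133.0, 143.0, 218.0])

# Mean of all counts (line and tester), as in the mean centering step
shift = np.mean(np.concatenate([line, tester]))

raw_ct, raw_ci = cis_estimates(line, tester)
cen_ct, cen_ci = cis_estimates(line - shift, tester - shift)
```

Here `raw_ct` equals `cen_ct` and `raw_ci` equals `cen_ci`, so any change from centering can only show up in the terms that involve $\mu$.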


In [7]:
# Mean Center raw counts
def meanCenter(x):
    """Mean centering.
    
    Mean center allele specific counts by using both line and tester counts.
    
    """
    # Get the mean for combined counts
    cntMean = np.mean(x[['sum_line', 'sum_tester']].values)
    
    # Center values using mean
    x['sum_line_center'] = x['sum_line'] - cntMean
    x['sum_tester_center'] = x['sum_tester'] - cntMean
    
    return x

# Group by mating status and fusions id
grp = dfGt10.groupby(['mating_status', 'fusion_id'])

# For each ms*fusion_id do the mean centering
meanCentered = grp.apply(meanCenter)
meanCentered.reset_index(inplace=True)

In [8]:
# Calculate Maren TIG equations by mating status and exonic region for mean centered data.
marenMeanCenter = marenEq(meanCentered, Eii='sum_line_center', Eti='sum_tester_center', group=['mating_status', 'fusion_id'])
marenMeanCenter['mag_cis'] = abs(marenMeanCenter['cis_line'])
marenPrintTable(marenMeanCenter, line='sum_line_center', tester='sum_tester_center')


line mating_status fusion_id flag_AI_combined sum_both sum_line_center sum_tester_center cis_line cis_tester mean_apn
r101 M F10005_SI 0 1274 -198.733333 -220.733333 28.666667 6.666667 29.523922
r280 M F10005_SI 0 1208 -227.733333 -210.733333 -10.333333 6.666667 27.917307
r315 M F10005_SI 0 1231 -116.733333 -135.733333 25.666667 6.666667 31.867690
r324 M F10005_SI 1 2554 -4.733333 -77.733333 79.666667 6.666667 60.087419
r335 M F10005_SI 0 1339 -138.733333 -136.733333 4.666667 6.666667 33.474306
r340 M F10005_SI 1 1897 -21.733333 -251.733333 236.666667 6.666667 44.059067
r357 M F10005_SI 1 2797 4.266667 54.266667 -43.333333 6.666667 67.345540
r358 M F10005_SI 0 1494 -168.733333 -191.733333 29.666667 6.666667 34.797401
r365 M F10005_SI 0 2313 31.266667 32.266667 5.666667 6.666667 58.291790
r373 M F10005_SI 0 1546 -207.733333 -240.733333 39.666667 6.666667 34.116952
r374 M F10005_SI 0 1703 -122.733333 -198.733333 82.666667 6.666667 39.484938
r380 M F10005_SI 0 1584 -58.733333 -99.733333 47.666667 6.666667 40.316598
r427 M F10005_SI 0 4246 168.266667 156.266667 18.666667 6.666667 99.761370
r491 M F10005_SI 0 3827 303.266667 245.266667 64.666667 6.666667 96.075605
r517 M F10005_SI 0 2591 226.266667 166.266667 66.666667 6.666667 69.764914
r732 M F10005_SI 0 1281 -77.733333 -116.733333 45.666667 6.666667 33.909037
r737 M F10005_SI 0 1576 -227.733333 -197.733333 -23.333333 6.666667 35.118724
r799 M F10005_SI 0 2798 166.266667 126.266667 46.666667 6.666667 71.787360
r820 M F10005_SI 0 1880 -117.733333 -173.733333 62.666667 6.666667 43.397519
r85 M F10005_SI 0 1086 -222.733333 -162.733333 -53.333333 6.666667 26.613113
w114 M F10005_SI 0 1042 -123.733333 -110.733333 -6.333333 6.666667 28.635558
w38 M F10005_SI 1 2046 90.266667 147.266667 -50.333333 6.666667 56.533963
w47 M F10005_SI 0 5492 1071.266667 1063.266667 14.666667 6.666667 157.523922
w52 M F10005_SI 0 3848 53.266667 43.266667 16.666667 6.666667 87.929120
w55 M F10005_SI 1 4191 -101.733333 528.266667 -623.333333 6.666667 100.649734
w59 M F10005_SI 0 1429 13.266667 -24.733333 44.666667 6.666667 40.165387
w64 M F10005_SI 0 2153 -69.733333 -114.733333 51.666667 6.666667 50.580035
w68 M F10005_SI 0 1915 17.266667 -48.733333 72.666667 6.666667 48.973420
w76 M F10005_SI 0 1757 -67.733333 22.266667 -83.333333 6.666667 45.722386
w79 M F10005_SI 0 2596 30.266667 229.266667 -192.333333 6.666667 67.345540

In [9]:
fig = marenKDE(marenMeanCenter)


---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-9-1dda2499c620> in <module>()
----> 1 fig = marenKDE(marenMeanCenter)

<ipython-input-4-7fb5c1c3fab2> in marenKDE(df, value1, value2, label1, label2)
      3     # pivot for easy plotting
      4     line = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value1)
----> 5     tester = pd.pivot_table(df, columns='fusion_id', index=('line'), values=value2)
      6 
      7     # Plot only the first 30 fusions

/home/jfear/.local/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    109 
    110     grouped = data.groupby(keys)
--> 111     agged = grouped.agg(aggfunc)
    112 
    113     table = agged

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    675     @Appender(_agg_doc)
    676     def agg(self, func, *args, **kwargs):
--> 677         return self.aggregate(func, *args, **kwargs)
    678 
    679     def _iterate_slices(self):

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2601     def aggregate(self, arg, *args, **kwargs):
   2602         if isinstance(arg, compat.string_types):
-> 2603             return getattr(self, arg)(*args, **kwargs)
   2604 
   2605         result = OrderedDict()

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    690         """
    691         try:
--> 692             return self._cython_agg_general('mean')
    693         except GroupByError:
    694             raise

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   2524 
   2525     def _cython_agg_general(self, how, numeric_only=True):
-> 2526         new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   2527         return self._wrap_agged_blocks(new_items, new_blocks)
   2528 

/home/jfear/.local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   2571 
   2572         if len(new_blocks) == 0:
-> 2573             raise DataError('No numeric types to aggregate')
   2574 
   2575         return data.items, new_blocks

DataError: No numeric types to aggregate

Upper Quartile Normalization

For most of the CEGS projects, we have used a q3 (upper quartile) normalization. Here I divide each count by the upper quartile for the line and multiply by the median of the overall upper quartiles.

$\frac{\text{sum_line}_{gf}}{q3_g} \times \widetilde {q3}$

where g indexes genotypes 1...G and f indexes exonic regions 1...F.
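
A minimal sketch of the formula as written, with $q3_g$ computed within each genotype and $\widetilde{q3}$ taken as the median of those per-genotype values. The helper name `q3_normalize` and the toy counts are made up, and the sketch assumes it is applied to one mating status at a time.

```python
import numpy as np
import pandas as pd


def q3_normalize(df, column, genotype='line'):
    """Upper quartile normalization: value / q3_g * median(q3)."""
    # q3_g: upper quartile of the counts within each genotype
    per_genotype = df.groupby(genotype)[column].quantile(0.75)
    # Median of the per-genotype upper quartiles
    med_q3 = per_genotype.median()
    # Map each row to the q3 of its genotype
    q3 = df[genotype].map(per_genotype)
    return df[column] / q3 * med_q3


# Toy data: genotype B has exactly twice the counts of genotype A
toy = pd.DataFrame({
    'line': ['A'] * 5 + ['B'] * 5,
    'fusion_id': ['F1', 'F2', 'F3', 'F4', 'F5'] * 2,
    'sum_line': [1, 2, 3, 4, 5, 2, 4, 6, 8, 10],
})
norm = q3_normalize(toy, 'sum_line')
```

Because the two toy genotypes differ only by a scale factor, their normalized counts come out identical, which is the behavior the normalization is meant to provide.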


In [ ]:
# Function to calculate the upper quartile normalization
def calcQ3(df, column):
    # Calculate upper quartile for each mating_status*fusion
    q3 = df.groupby(['mating_status', 'fusion_id'])[column].quantile(q=0.75)
    q3.name = 'q3'
    q3 = q3.reset_index()

    # Calculate median q3 by mating_status
    medQ3 = q3.groupby('mating_status')['q3'].median().to_frame('medQ3')

    # Merge q3 and medQ3 together
    dfQ3 = q3.merge(medQ3, left_on='mating_status', right_index=True)

    # Combine with original data; a left merge keeps the row order of df
    merged = df.merge(dfQ3, how='left', on=['mating_status', 'fusion_id'])

    # Calculate q3 norm and add to original dataset
    # value / q3 * med(q3)
    # Assign by position (.values) because merge resets the index
    df['q3_norm_' + column] = (merged[column] / merged['q3'] * merged['medQ3']).values

In [ ]:
# run q3 norm
dfQ3 = dfGt10.copy()
calcQ3(dfQ3, 'sum_line')
calcQ3(dfQ3, 'sum_tester')

In [ ]:
# Calculate Maren TIG equations by mating status and exonic region
marenQ3Norm = marenEq(dfQ3, Eii='q3_norm_sum_line', Eti='q3_norm_sum_tester', group=['mating_status', 'fusion_id'])
marenQ3Norm['mag_cis'] = abs(marenQ3Norm['cis_line'])
marenPrintTable(marenQ3Norm, line='q3_norm_sum_line', tester='q3_norm_sum_tester')

In [ ]:
fig = marenKDE(marenQ3Norm)

Mean Standardization

For mean standardization, for each exonic region, I subtract the mean value of that exonic region across genotypes and divide by the standard deviation. Note that I am doing environments (mated and virgin) separately.
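
The standardization described here can be sketched with a grouped transform. This is not the `meanStd` function from `ase_normalization`, whose internals are not shown: `mean_std_sketch` is a hypothetical stand-in, the toy values are made up, and grouping by both mating status and fusion_id is an assumption based on the note about environments.

```python
import numpy as np
import pandas as pd


def mean_std_sketch(df, colName, group):
    """Add a 'mean_std_<colName>' column: (value - group mean) / group std."""
    grp = df.groupby(group)[colName]
    # transform returns values aligned with the original rows
    centered = df[colName] - grp.transform('mean')
    df['mean_std_' + colName] = centered / grp.transform('std')
    return df


toy = pd.DataFrame({
    'mating_status': ['M'] * 6,
    'fusion_id': ['F1'] * 3 + ['F2'] * 3,
    'q3_norm_sum_line': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
toy = mean_std_sketch(toy, 'q3_norm_sum_line', ['mating_status', 'fusion_id'])
```

Both toy regions end up on the same scale even though their raw values differ tenfold. The standard deviation here is the sample standard deviation (pandas default, ddof=1).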


In [ ]:
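# Split the q3 normalized data into line and tester frames
# (assumption: q3Line and q3Tester are views of dfQ3 created above)
keys = ['line', 'mating_status', 'fusion_id']
q3Line = dfQ3[keys + ['q3_norm_sum_line']].copy()
q3Tester = dfQ3[keys + ['q3_norm_sum_tester']].copy()

# Mean standardization
meanStd(df=q3Line, colName='q3_norm_sum_line', group='fusion_id')
meanStd(df=q3Tester, colName='q3_norm_sum_tester', group='fusion_id')

# Merge everything together
q3Merge = q3Line.merge(q3Tester, how='inner', on=keys)
dfQ3Std = dfClean.merge(q3Merge, how='left', on=keys)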

In [ ]:
# Calculate Maren TIG equations by mating status and exonic region
marenQ3NormStd = marenEq(dfQ3Std, Eii='mean_std_q3_norm_sum_line', Eti='mean_std_q3_norm_sum_tester', group=['mating_status', 'fusion_id'])
marenQ3NormStd['mag_cis'] = abs(marenQ3NormStd['cis_line'])
marenQ3NormStd.head()

In [ ]:
grp = marenQ3NormStd.groupby(['mating_status', 'fusion_id'])
tp = grp.get_group(('M', 'F10005_SI'))

In [ ]:
tp[['cis_line', 'trans_line']].plot(kind='kde')

Summary Plots

Now I am going to do a variety of summary plots and see how the different normalization methods compare.


In [ ]:
# Figure out the 25 highest expressed fusions
## group fusion by id and env
fusGrp = dfGt10.groupby(['mating_status', 'fusion_id'])

## Calculate the mean apn for each fusion across genotypes
mApn = fusGrp['mean_apn'].mean()
mApnI = mApn.reset_index()
mApnI.set_index('fusion_id', inplace=True)

# Get the 25 highest expressed fusions for mated and virgin
fusGrp2 = mApnI.groupby('mating_status')

m = fusGrp2.get_group('M').rank(ascending=False)
mFusHi = m[m['mean_apn'] <= 25].index

v = fusGrp2.get_group('V').rank(ascending=False)
vFusHi = v[v['mean_apn'] <= 25].index

# Get the 25 lowest expressed fusions for mated and virgin
m = fusGrp2.get_group('M').rank(ascending=True)
mFusLow = m[m['mean_apn'] <= 25].index

v = fusGrp2.get_group('V').rank(ascending=True)
vFusLow = v[v['mean_apn'] <= 25].index
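
The same selection can also be written with `nlargest` and `nsmallest`. This is an alternative sketch on made-up data, not the code used for the plots. The helper name `pick_fusions` is hypothetical, and ties are handled differently from the rank-based version, which can return more or fewer than n fusions when mean APN values are tied.

```python
import pandas as pd


def pick_fusions(df, n=25, largest=True):
    """Return {mating_status: Index of fusion_ids} ranked by mean APN."""
    mean_apn = df.groupby(['mating_status', 'fusion_id'])['mean_apn'].mean()
    out = {}
    for status, s in mean_apn.groupby(level='mating_status'):
        # Drop the mating_status level so the index is fusion_id only
        s = s.reset_index(level='mating_status', drop=True)
        picked = s.nlargest(n) if largest else s.nsmallest(n)
        out[status] = picked.index
    return out


toy = pd.DataFrame({
    'mating_status': ['M', 'M', 'M', 'M', 'V', 'V', 'V', 'V'],
    'fusion_id': ['F1', 'F1', 'F2', 'F3', 'F1', 'F2', 'F2', 'F3'],
    'mean_apn': [10.0, 20.0, 50.0, 5.0, 7.0, 1.0, 3.0, 9.0],
})
hi = pick_fusions(toy, n=2, largest=True)
low = pick_fusions(toy, n=1, largest=False)
```

The returned indexes can be passed to `isin` in the same way as `mFusHi` and `vFusLow`.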

In [ ]:
rawMHi = marenRawCounts[marenRawCounts['fusion_id'].isin(mFusHi) & (marenRawCounts['mating_status'] == 'M')]
rawVHi = marenRawCounts[marenRawCounts['fusion_id'].isin(vFusHi) & (marenRawCounts['mating_status'] == 'V')]
dfPanelScatter(df=rawMHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=rawMHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')

In [ ]:
rawMLow = marenRawCounts[marenRawCounts['fusion_id'].isin(mFusLow) & (marenRawCounts['mating_status'] == 'M')]
rawVLow = marenRawCounts[marenRawCounts['fusion_id'].isin(vFusLow) & (marenRawCounts['mating_status'] == 'V')]
dfPanelScatter(df=rawMLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=rawMLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=rawVLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nRaw Counts', colorCol='flag_AI_combined')

In [ ]:
q3MHi = marenQ3Norm[marenQ3Norm['fusion_id'].isin(mFusHi) & (marenQ3Norm['mating_status'] == 'M')]
q3VHi = marenQ3Norm[marenQ3Norm['fusion_id'].isin(vFusHi) & (marenQ3Norm['mating_status'] == 'V')]
dfPanelScatter(df=q3MHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=q3MHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')

In [ ]:
q3MLow = marenQ3Norm[marenQ3Norm['fusion_id'].isin(mFusLow) & (marenQ3Norm['mating_status'] == 'M')]
q3VLow = marenQ3Norm[marenQ3Norm['fusion_id'].isin(vFusLow) & (marenQ3Norm['mating_status'] == 'V')]
dfPanelScatter(df=q3MLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=q3MLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3VLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized Counts', colorCol='flag_AI_combined')

In [ ]:
q3StdMHi = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(mFusHi) & (marenQ3NormStd['mating_status'] == 'M')]
q3StdVHi = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(vFusHi) & (marenQ3NormStd['mating_status'] == 'V')]
dfPanelScatter(df=q3StdMHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVHi, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=q3StdMHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVHi, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nHigh Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')

In [ ]:
q3StdMLow = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(mFusLow) & (marenQ3NormStd['mating_status'] == 'M')]
q3StdVLow = marenQ3NormStd[marenQ3NormStd['fusion_id'].isin(vFusLow) & (marenQ3NormStd['mating_status'] == 'V')]
dfPanelScatter(df=q3StdMLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVLow, x='q5_mean_theta', y='cis_line', group='fusion_id', vline=0.5, plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')

In [ ]:
dfPanelScatter(df=q3StdMLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Mated\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
dfPanelScatter(df=q3StdVLow, x='mean_apn', y='mag_cis', group='fusion_id', plot_title='Virgin\nLow Expression Exonic regions\nQ3 Normalized and Mean Standardized Counts', colorCol='flag_AI_combined')
