The exonic region F40168_SI showed allelic imbalance (AI) in the majority of genotypes, both mated and virgin. In Mated, 31 genotypes showed AI towards the line, while 7 showed AI towards the tester. In Virgin, 29 genotypes showed AI towards the line, while 9 showed AI towards the tester.
I want to see whether I can group genotypes based on cis-effects from the maren equations and then identify variants based on these similarities. This may not work and may be a stupid idea, but if it does work it would be cool.
In [1]:
%run '../ipython_startup.py'
In [137]:
# Additional libraries
from sas7bdat import SAS7BDAT as SAS
from clustering import cluster
In [123]:
# Import AI calls and determine which genotypes group
with SAS(os.path.join(PROJ, 'sas_data/clean_ase_sbs.sas7bdat')) as FH:
    ai = FH.to_data_frame()
# mated indicator showing direction
ai['mated_ind'] = 'No AI'
ai.loc[(ai['flag_AI_combined_m'] == 1) & (ai['q5_mean_theta_m'] > 0.5), 'mated_ind'] = 'Tester'
ai.loc[(ai['flag_AI_combined_m'] == 1) & (ai['q5_mean_theta_m'] < 0.5), 'mated_ind'] = 'Line'
# virgin indicator showing direction
ai['virgin_ind'] = 'No AI'
ai.loc[(ai['flag_AI_combined_v'] == 1) & (ai['q5_mean_theta_v'] > 0.5), 'virgin_ind'] = 'Tester'
ai.loc[(ai['flag_AI_combined_v'] == 1) & (ai['q5_mean_theta_v'] < 0.5), 'virgin_ind'] = 'Line'
# Combine indicators and come up with groups
ai['group'] = ai.apply(lambda x: x['mated_ind'] if x['mated_ind'] == x['virgin_ind'] else 'ambig', axis=1)
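To sanity-check the grouping logic above, here is a minimal sketch on a toy table. It reuses the flag and theta column names from the cell above, but the values are made up for illustration and the `direction` helper is hypothetical, not part of the project code.

```python
import pandas as pd

# Toy data: column names match the AI table above, values are invented.
toy = pd.DataFrame({
    'flag_AI_combined_m': [1, 1, 0, 1],
    'q5_mean_theta_m':    [0.30, 0.70, 0.50, 0.35],
    'flag_AI_combined_v': [1, 1, 0, 1],
    'q5_mean_theta_v':    [0.40, 0.65, 0.50, 0.60],
})


def direction(flag, theta):
    """Hypothetical helper: label AI direction using the same rules as above."""
    out = pd.Series('No AI', index=flag.index)
    out[(flag == 1) & (theta > 0.5)] = 'Tester'
    out[(flag == 1) & (theta < 0.5)] = 'Line'
    return out


toy['mated_ind'] = direction(toy['flag_AI_combined_m'], toy['q5_mean_theta_m'])
toy['virgin_ind'] = direction(toy['flag_AI_combined_v'], toy['q5_mean_theta_v'])

# Keep the shared label when mated and virgin agree, otherwise mark as ambiguous.
toy['group'] = toy['mated_ind'].where(toy['mated_ind'] == toy['virgin_ind'], 'ambig')

print(toy['group'].tolist())  # ['Line', 'Tester', 'No AI', 'ambig']
```

Note that with these rules a genotype flagged for AI whose theta is exactly 0.5 stays labelled 'No AI', because neither inequality matches.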
In [160]:
ai['group'].value_counts()
Out[160]:
In [139]:
# Import cis-effects from maren eq
maren = pd.read_csv(os.path.join(PROJ, 'pipeline_output/cis_effects/cis_line_effects.csv'))
In [159]:
fig = cluster(ai, maren, 'F40168_SI')
fig.savefig('../../pipeline_output/similarity/clustermap_f40168_si.png')
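The `cluster` helper is project-specific and its internals are not shown here. As a rough illustration of the general idea of grouping genotypes by cis-effect similarity, the sketch below computes pairwise distances between genotypes on a toy table. The column names (`line`, `mating_status`, `cis_line`), the genotype labels, and the values are all assumptions made for the example; the real `cis_line_effects.csv` layout and the real clustering method may differ.

```python
import numpy as np
import pandas as pd

# Toy cis-effect estimates; column names, genotype labels, and values are hypothetical.
toy_cis = pd.DataFrame({
    'line': ['r101', 'r101', 'r109', 'r109', 'r136', 'r136'],
    'mating_status': ['M', 'V', 'M', 'V', 'M', 'V'],
    'cis_line': [0.9, 1.0, 1.0, 0.9, -0.8, -1.0],
})

# One row per genotype, one column per mating status.
wide = toy_cis.pivot(index='line', columns='mating_status', values='cis_line')

# Pairwise Euclidean distance between genotypes.
vals = wide.to_numpy()
diff = vals[:, None, :] - vals[None, :, :]
dist = pd.DataFrame(np.sqrt((diff ** 2).sum(axis=2)),
                    index=wide.index, columns=wide.index)

# Mask the diagonal (self-distance) and find each genotype's closest neighbour.
off_diag = dist.where(~np.eye(len(dist), dtype=bool))
nearest = off_diag.idxmin()

print(nearest)
```

In this toy example r101 and r109 have similar cis-effects in both conditions, so they end up as each other's nearest neighbour, while r136 sits far from both. A distance matrix like `dist` is the kind of input a hierarchical clustering or clustermap could be built on.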