Here's a template for the kind of exploration, preprocessing, model construction, and evaluation I want to do with each project

Outline

Data exploration

  • Distributions (broken down by class)

    • Intensity distributions per sample (boxplots)
      • Make sure no samples are weird
    • Mean Feature Intensity (broken down by class)
      • Make sure classes don't look obviously wonky (batch effect)
    • Feature Sparsity distributions per sample (distplot)
      • Look for batch effect
    • Mean Feature Sparsity distribution (distplot)
      • Batch effects?
  • Covariates

    • intensity x variance
    • rt x intensity
    • mz x intensity
    • rt-width x intensity
    • mz-width x intensity
  • Statistics

    • MW-pvals distributions, stratified by
      • Variance
      • mean feature Intensity
      • mean feature Sparsity
      • rt
      • mz
  • Unsupervised viz (visualize by class)

    • PCA
    • Clustering
    • cosine similarity matrices?

Data Preprocessing

  • Filtering
    • Variance filter
  • Transformation (a sketch of these steps follows below)
    • Dilution correction (PQN normalization)
    • Missing value imputation (fill zeros with 1/2 the minimum intensity detected)
    • log transformation and a no-log branch (branchpoint for downstream analyses)
    • Robust scaler (subtract median, scale by IQR)
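Roughly the transformation chain I have in mind, as a minimal sketch (this is not the src.data.preprocessing implementation; imputing before PQN to avoid zero quotients and using sklearn's RobustScaler are my assumptions):

import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

def impute_half_min(df):
    # Fill zeros with half the smallest nonzero intensity in the table
    half_min = df[df > 0].min().min() / 2.0
    return df.mask(df <= 0, half_min)

def pqn_normalize(df):
    # Reference spectrum = per-feature median across samples; each
    # sample's dilution factor = median of its feature-wise quotients
    # against the reference
    reference = df.median(axis=0)
    dilution = (df / reference).median(axis=1)
    return df.div(dilution, axis=0)

def transform(df, log=True):
    # Impute first so the PQN quotients never divide by zero
    df = pqn_normalize(impute_half_min(df))
    if log:  # branchpoint: log10-transformed vs. raw intensities
        df = np.log10(df)
    # Robust scaling: subtract median, scale by IQR, per feature
    scaled = RobustScaler().fit_transform(df)
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)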

Add in pos/neg mode features (branchpoint)

  • Process separately, and also together
  • After the preprocessing steps have been reviewed and filtering performed, combine the positive and negative mode data (see the sketch below)
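A minimal sketch of the combination step, assuming each mode's table is a samples x features DataFrame and both share sample IDs (the pos_/neg_ prefixes are hypothetical, just to keep feature names unambiguous):

import pandas as pd

def combine_modes(pos_df, neg_df):
    # Keep only samples measured in both acquisition modes, then
    # concatenate the feature columns side by side
    shared = pos_df.index.intersection(neg_df.index)
    return pd.concat([pos_df.loc[shared].add_prefix('pos_'),
                      neg_df.loc[shared].add_prefix('neg_')],
                     axis=1)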

Feature Selection

  • Dimensionality reduction (branchpoint)
    • PCA
    • Hierarchical clustering
    • Problem: these features won't generalize, so they aren't extensible beyond the current study (sketch below)
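To make the generalization problem concrete: with PCA the "selected features" are component scores, and the loadings are tied to this study's exact feature set, so they can't be applied to data with different features. A minimal sketch (X_train/X_test are hypothetical splits):

from sklearn.decomposition import PCA

pca = PCA(n_components=10)
X_train_scores = pca.fit_transform(X_train)  # scores become the new features
# transform() only works if X_test has exactly the same feature columns,
# which a new study (different mz/rt features) won't have
X_test_scores = pca.transform(X_test)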

Classification

  • RF
  • Logistic regression
    • with L1 penalty for sparsity (sketch below)
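A minimal sketch of the two classifiers, assuming X (samples x features) and y (encoded labels) as built later in this notebook; the hyperparameters are placeholders, not tuned values:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(n_estimators=500, random_state=1)
# The L1 penalty zeros out most coefficients, giving a sparse feature set
l1_logit = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
for name, clf in [('RF', rf), ('L1 logistic', l1_logit)]:
    scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
    print(name, scores.mean())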

Load the data


In [73]:
import subprocess
subprocess.call('pip install mwtab', shell=True)
%load_ext autoreload
import src.project_fxns.organize_xcms as xcms_fxns
import src.data.preprocessing as preproc
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import src.data.data_exploration as explore
import pandas as pd
import scipy.stats as stats

# scikit
from sklearn.decomposition import PCA
from sklearn import preprocessing
%matplotlib inline
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [2]:
mtbls315_pos = xcms_fxns.Xcms_organize(
        '/home/data/processed/MTBLS315/uhplc_pos/xcms_result.tsv'
)
mtbls315_pos.remove_column_prefix(prefix='X')

# get class to samplename mapping
class_col = 'Factor Value[patient group]'
mtbls315_pos.mtbls_class_mapping(
'/home/data/raw/MTBLS315/a_UPLC_POS_nmfi_and_bsi_diagnosis.txt',
'/home/data/raw/MTBLS315/s_NMFI and BSI diagnosis.txt', 
class_col)

# encode classes as numeric
mtbls315_pos.class_encoder()


Were you expecting 3 classes? because thats how many you have. look at self.sample_classes if this is unexpected

In [3]:
# Samples come from 3 classes; collapse to two: malaria vs. everything else
no_mal = np.concatenate([mtbls315_pos.class_dict['bacterial bloodstream infection'],
                mtbls315_pos.class_dict['non-malarial febrile illness']])
mal = mtbls315_pos.class_dict['malaria']
new_class_dict = {'malaria': mal, 'non-malarial fever': no_mal}

Data Exploration


In [4]:
(mtbls315_pos.feature_table < 1e-15).sum().sum()


Out[4]:
121007


In [247]:
num_feats = mtbls315_pos.feature_table.shape[1] * mtbls315_pos.feature_table.shape[0]
zeros = (mtbls315_pos.feature_table < 1e-15).sum().sum()
print('Original number of zeros %s, %.2f%% ' % (zeros, 100*float(zeros)/num_feats)
     )
print('Original number of NaNs %s' % mtbls315_pos.feature_table.isnull().sum().sum()
     )
zero_filled = explore.fill_zero_nan(mtbls315_pos.feature_table)
print('Zero-filled number of zeros %s' % (zero_filled < 1e-15).sum().sum()
     )
print('Zero-filled number of NaNs %s' % zero_filled.isnull().sum().sum()
     )
print('Zero-filled min: %s' % zero_filled.min().min())


Original number of zeros 121007, 8.98% 
Original number of NaNs 0
Zero-filled number of zeros 0
Zero-filled number of NaNs 0
Zero-filled min: 217.13836028


In [14]:
# Sample Intensity distributions
tidy = explore.tidy(zero_filled)
tidy['value'] = np.log10(tidy['value'])
tidy


Out[14]:
index variable value
0 1001_P 100.075432918:130.5724 4.615378
1 1002_P 100.075432918:130.5724 5.206816
2 1003_P 100.075432918:130.5724 5.004543
... ... ... ...
1347914 1066_P 998.48957211:982.2369 4.407336
1347915 1067_P 998.48957211:982.2369 3.921203
1347916 1068_P 998.48957211:982.2369 4.082472

1347917 rows × 3 columns


In [55]:
axes = explore.sample_feature_intensity(tidy, 
                                new_class_dict, ylabel='log10(Intensity)',
                                       xlabel='Samples')



In [70]:
# Mean intensity per feature
axes = explore.distplot_classes(np.log10(zero_filled),
    class_dict=new_class_dict, fxn=np.mean,
    axlabel='log10(Mean Intensity)', bins=100,
                               kde=False)
plt.show()

# Mean intensity per feature per class
df_mean = pd.DataFrame({k: np.mean(zero_filled.loc[v], axis=0)
                       for k, v in new_class_dict.items()}
                      )
sns.boxplot(data=np.log10(df_mean))


non-malarial fever
malaria
Out[70]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f262424ef10>

In [71]:
explore.distplot_classes(
    np.log10(zero_filled), class_dict=new_class_dict,
    fxn=np.var, axlabel='$log_{10}$(Variance)', bins=50, kde=False)

plt.show()

df_var = pd.DataFrame({k: np.var(zero_filled.loc[v], axis=0)
                 for k, v in new_class_dict.items()}
            )
sns.boxplot(data=np.log10(df_var))
plt.show()


non-malarial fever
malaria

In [10]:
# Sparsity per sample
sample_sparsity = ((mtbls315_pos.feature_table < 1e-15).sum(axis=1) 
                   / mtbls315_pos.feature_table.shape[1])
ax = explore.distplot_classes(sample_sparsity, new_class_dict,
                             kde=False, bins=10)
for graph in ax:
    graph.set_title('Feature sparsity per sample')
    graph.set_xlabel('Sparsity ratio')


non-malarial fever
malaria

In [39]:
# Distribution of mean feature sparsity
for k, v in new_class_dict.items():
    print(v)
    feature_sparsity = ((mtbls315_pos.feature_table.loc[v] < 1e-15).sum(axis=0)
                        / mtbls315_pos.feature_table.loc[v].shape[0])
    ax = explore.distplot_classes(feature_sparsity,
                        kde=False)
    ax.set_title('Class: %s' % k)
    plt.show()


['1007_P' '1011_P' '1012_P' '1019_P' '1026_P' '1033_P' '1045_P' '1049_P'
 '1050_P' '1053_P' '1057_P' '1061_P' '1005_P' '1010_P' '1020_P' '1021_P'
 '1023_P' '1024_P' '1032_P' '1034_P' '1036_P' '1043_P' '1052_P' '1054_P'
 '1055_P' '1060_P' '1062_P']
['1001_P' '1002_P' '1003_P' '1004_P' '1006_P' '1009_P' '1013_P' '1014_P'
 '1016_P' '1017_P' '1018_P' '1027_P' '1029_P' '1030_P' '1031_P' '1035_P'
 '1037_P' '1038_P' '1039_P' '1040_P' '1041_P' '1042_P' '1044_P' '1046_P'
 '1048_P' '1051_P' '1056_P' '1058_P' '1059_P' '1064_P' '1065_P' '1066_P'
 '1067_P' '1068_P']

Stats


In [122]:
mw_output = explore.two_group_stat(zero_filled, new_class_dict,
                       stats.mannwhitneyu, alternative='two-sided')

In [123]:
# pval distribution (two-sided alternative was specified above)
mw_pvals = np.array([i[1] for i in mw_output])
sns.distplot(mw_pvals, bins=50, kde=False,
            axlabel='Raw Mann-Whitney pval')


Out[123]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fca90c4e350>

In [260]:
g = sns.jointplot(np.log10(np.mean(zero_filled, axis=0)),
             np.log10(np.var(zero_filled, axis=0)),
             s=3, alpha=0.3)
g.set_axis_labels('$log_{10}$(Mean Feature Intensity)', 
                  '$log_{10}$(Feature Variance)')


Out[260]:
<seaborn.axisgrid.JointGrid at 0x7fca78ccb790>

Stratify by covariates


In [241]:
# Stratify by covariates
covariates = mtbls315_pos.all_data.T.loc[['mz', 'rt',
                            'mzmin', 'mzmax', 'rtmin', 'rtmax']].T
# Get just the pvalue as a series
mw_pval_series = mw_output.apply(lambda x: x[1])

In [264]:
def pval_strat_hist_and_scatter(covar, pvals, covar_name='Covariate',
    **kwargs):
    # (second arg renamed from `stats` so it doesn't shadow scipy.stats)
    # First plot stratified pval histograms
    axes = explore.plot_pvals_stratified(covar, pvals, covar_name,
                                        **kwargs)
    plt.show()
    # Next plot a scatter plot against the real covariate values
    df = pd.concat([covar, pvals], axis=1, keys=[covar_name, 'pval'])
    g = sns.jointplot(df[covar_name], -np.log10(df['pval']),
                     s=10, alpha=0.3)
    g.set_axis_labels(covar_name, '$-log_{10}$(pval)')
    plt.show()
    # Then a scatter plot against the covariate's rank
    df = df.sort_values(covar_name)
    g = sns.jointplot(range(0, df.shape[0]), -np.log10(df['pval']),
                      dropna=False,
                     s=10, alpha=0.3)
    g.set_axis_labels('Rank %s' % covar_name, '$-log_{10}$(pval)')
    plt.show()

In [265]:
# Variance
pval_strat_hist_and_scatter(np.log10(np.var(zero_filled, axis=0)),
                           mw_pval_series,
                           covar_name='$log_{10}$(Variance)',
                            ngroups=8,
                           title_fxn=None)


Looks like feature variance is:

  • Informative of power/prior probability (the peak of low pvals grows as variance increases)
  • Independent of the pvals under the null (pvals > 0.5 are roughly uniform)

(See the IHW vignette, section 3.4.2, for more - https://www.bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html )
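For reference, a minimal sketch of the kind of stratified histograms explore.plot_pvals_stratified draws (quantile binning of the covariate via pd.qcut is my assumption; the real implementation may differ):

import matplotlib.pyplot as plt
import pandas as pd

def stratified_pval_hists(covar, pvals, ngroups=8):
    # Bin features into covariate quantile groups, then histogram the
    # p-values within each group. An informative covariate shifts the
    # height of the low-pval peak across groups; independence under the
    # null shows up as a flat right tail in every group.
    groups = pd.qcut(covar, ngroups, labels=False)
    fig, axes = plt.subplots(1, ngroups, figsize=(2.5 * ngroups, 2.5),
                             sharey=True)
    for g, ax in zip(range(ngroups), axes):
        ax.hist(pvals[groups == g].dropna(), bins=20)
        ax.set_title('group %d' % g)
    axes[0].set_ylabel('count')
    plt.show()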


In [266]:
# Mean Intensity
pval_strat_hist_and_scatter(
    np.log10(np.mean(zero_filled, axis=0)),
    mw_pval_series,
    covar_name='$log_{10}$(Mean Intensity)',
    ngroups=8,
    title_fxn=None
    )


Looks like mean feature intensity is:

  • NOT informative of power (similar-height peaks for the alt-hypothesis in all strata)
  • independent of the p-vals under the null hypothesis (uniform-ish for pvals > 0.5)

This is odd, though, given how tightly correlated variance and mean intensity are (quick check below)
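A quick way to quantify that coupling (a sketch; Spearman avoids assuming the relationship is linear):

rho, p = stats.spearmanr(np.log10(np.mean(zero_filled, axis=0)),
                         np.log10(np.var(zero_filled, axis=0)))
print('Spearman rho, log-mean vs log-variance: %.3f (p=%.2g)' % (rho, p))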


In [269]:
# Feature sparsity
feature_sparsity = ((mtbls315_pos.feature_table < 1e-15).sum(axis=0)
                        / mtbls315_pos.feature_table.shape[0])

pval_strat_hist_and_scatter(
    feature_sparsity,
    mw_pval_series,
    covar_name='Sparsity',
    ngroups=8,
    title_fxn=np.log10
    )


Looks like feature sparsity:

  • isn't informative of power (no clear trend between increasing/decreasing sparsity and the pval distribution)
  • is independent of pvals under the null (uniform pvals for p > 0.5)

In [272]:
pval_strat_hist_and_scatter(covariates['rt'], mw_pval_series,
                           covar_name='RT', ngroups=8,
                           title_fxn=None)


Looks like RT is:

  • not informative of power for Mann-Whitney
  • independent of pvals under the null (uniform-ish for pvals > 0.5)

In [273]:
pval_strat_hist_and_scatter(covariates['mz'],
                           mw_pval_series,
                           covar_name='mz',
                           ngroups=8)


Looks like mz:

  • is not informative of power for Mann-Whitney
  • is independent under the null hypothesis (uniform for p > 0.5)

In [276]:
rt_width = covariates['rtmax'] - covariates['rtmin']
pval_strat_hist_and_scatter(rt_width, 
                            mw_pval_series, 
                            covar_name='rt width',
                           ngroups=9,
                           title_fxn=None)


Looks like rt width is:

  • not informative of power for Mann-Whitney
  • independent of pvals under the null hypothesis

In [282]:
# Check out mz-width
mz_wid = covariates['mzmax'] - covariates['mzmin']
pval_strat_hist_and_scatter(mz_wid, 
                            mw_pval_series,
                            covar_name='mz-width',
                            ngroups=6,
                            title_fxn=None
                           )


Looks like mz-width is:

  • not informative of power for Mann-Whitney
  • independent under the null hypothesis


In [150]:
# Get class for each sample
new_class_labels = xcms_fxns.encode_new_classes(new_class_dict)
# Make sure orders are right
assert(
    (new_class_labels.index == zero_filled.index).all())
le = preprocessing.LabelEncoder()
y = le.fit_transform(new_class_labels.values)
X = zero_filled.values
pca = PCA(n_components=2, whiten=True)
pca_out = pca.fit_transform(X)


In [159]:
le.inverse_transform([0,1])


Out[159]:
array(['malaria', 'non-malarial fever'], dtype=object)


In [386]:
def plot_PCA(data, class_labels):
    '''Scatter plot of the first two principal components, colored by class.

    data - samples x feats (DataFrame or array)
    class_labels - pd.Series, string or numeric, same row order as data
    '''
    # Index validation stays disabled so plain numpy arrays (e.g. scaled
    # data) can be passed in; callers must keep rows and labels aligned:
    # try:
    #     assert((data.index == class_labels.index).all())
    # except AssertionError:
    #     data = data.sort_index()
    #     class_labels = class_labels.sort_index()
    #     assert((data.index == class_labels.index).all())
    le = preprocessing.LabelEncoder()
    y = le.fit_transform(class_labels.values)
    pca = PCA(n_components=2, whiten=False)
    pca_out = pca.fit_transform(data)
    colors = ['darkorange', 'turquoise']
    encoded_classes = np.unique(y)
    target_names = le.inverse_transform(encoded_classes)
    for color, i, target_name in zip(colors, encoded_classes, target_names):
        print(color, i, target_name)
        plt.scatter(pca_out[y==i, 0], pca_out[y==i, 1],
                    color=color, alpha=0.7, label=target_name,
                    s=20)
    plt.legend(loc='best', scatterpoints=1)
    plt.title('PCA')
    return pca_out

pca_out = plot_PCA(zero_filled, new_class_labels)


('darkorange', 0, 'malaria')
('turquoise', 1, 'non-malarial fever')

In [265]:
X_scaled = preprocessing.scale(zero_filled, axis=0)

In [266]:
pca_out = plot_PCA(X_scaled, new_class_labels)


('darkorange', 0, 'malaria')
('turquoise', 1, 'non-malarial fever')

In [359]:
# try t-sne
from sklearn import manifold
from sklearn import feature_selection


In [404]:
# Do some preprocessing - variance filter and stuff
assert( (new_class_labels.index == zero_filled.index).all())
# Keep only the highest-variance octile of features
X = zero_filled
x_var = np.var(X, axis=0)
sort_idx = x_var.argsort()
num_splits = 8
splits = np.array_split(sort_idx, num_splits)
# Threshold = max variance within the 7th octile, so only the
# top ~12.5% of features survive the filter below
var_thresh = x_var[splits[num_splits-2]].max()
print('Variance Threshold', var_thresh)

# preprocess
selector = feature_selection.VarianceThreshold(var_thresh)
X = selector.fit_transform(X)
print('X shape', X.shape)
X = preprocessing.scale(X)
print('X shape', X.shape)


('Variance Threshold', 74167754603.651352)
('X shape', (61, 2762))
('X shape', (61, 2762))

In [406]:
for perplex in [2, 5, 10, 20, 30, 40, 50, 75, 100, 200]:
    tsne = manifold.TSNE(n_components=2, perplexity=perplex,
                  init='random', random_state=1)
    Y = tsne.fit_transform(X)
    colors = ['navy', 'turquoise']
    classes = le.inverse_transform(np.unique(y))
    for color, i, name in zip(colors, np.unique(y), classes):
        print(color, i, name)
        plt.scatter(Y[y==i, 0], Y[y==i, 1], color=color,
                   label=name)
    plt.legend(bbox_to_anchor=(1.01, 1.01))
    plt.title('Perplexity: %s' % perplex)
    plt.show()


('navy', 0, 'malaria')
('turquoise', 1, 'non-malarial fever')
(the same label pair prints for each of the 10 perplexity values)

In [407]:
pca_out = plot_PCA(X, new_class_labels)
X.shape


('darkorange', 0, 'malaria')
('turquoise', 1, 'non-malarial fever')
Out[407]:
(61, 2762)

Now do the preprocessing and some model building

  • Also, for the artificial sample enrichment, features should be randomly drawn from any sample in the class (healthy or sick) - this should eliminate the concern that simply adding noise to an existing sample will mess with the cross validation (sketch below)
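A minimal sketch of that enrichment scheme (a hypothetical helper, not yet in src): each synthetic sample draws every feature value from a randomly chosen real member of the class, so it isn't a noised copy of any single sample. Synthetic samples would still need to be generated inside each training fold only, so the test fold stays untouched.

import numpy as np

def enrich_class(df, sample_names, n_new, seed=1):
    # df: samples x features; sample_names: the members of one class
    rng = np.random.RandomState(seed)
    class_data = df.loc[sample_names].values
    n_real, n_feats = class_data.shape
    synthetic = np.empty((n_new, n_feats))
    for j in range(n_feats):
        # For each feature, pick a random donor sample per synthetic row
        donors = rng.randint(0, n_real, size=n_new)
        synthetic[:, j] = class_data[donors, j]
    return synthetic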