Procedural programming in Python

Topics

Flow control, part 2
- Functions
- In class exercise:
  - Functionalize this!
- From nothing to something:
  - Pairwise correlation between rows in a pandas dataframe
  - Sketch of the process
  - In class exercise:
    - Write the code!
  - Rejoining, sharing ideas, problems, thoughts

Flow control

Flow control figure

Flow control refers to how programs handle loops, conditional execution, and the order of operations.

If

If statements can be used to execute a line or block of code when a particular condition is satisfied. E.g., let's print something based on the entries in a list.



In [ ]:

    
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

if 'Dorkus the Clown' in instructors:
    print('#fakeinstructor')

There is a special do-nothing keyword, pass, that fills an arm of a conditional where nothing should happen, e.g.



In [ ]:

    
if 'Jim' in instructors:
    print("Congratulations!  Jim is teaching, your class won't stink!")
else:
    pass

For

For loops are the standard loop, though while loops are also common. A for loop has the general form:

for item in collection:
    do stuff

For loops and collections like tuples, lists and dictionaries are natural friends.



In [ ]:

    
for instructor in instructors:
    print(instructor)

You can combine loops and conditionals:



In [ ]:

    
for instructor in instructors:
    if instructor.endswith('Clown'):
        print(instructor + " doesn't sound like a real instructor name!")
    else:
        print(instructor + " is so smart... all those gooey brains!")

range()

Since for loops operate over collections such as lists, it is common to want the equivalent of a counting loop:

NOTE: C-like
for (i = 0; i < 3; ++i) {
    print(i);
}

The Python equivalent is:

for i in [0, 1, 2]:
    do something with i

What happens when the range you want to sample is big, e.g.

NOTE: C-like
for (i = 0; i < 1000000000; ++i) {
    print(i);
}

It would be a real pain in the rear to write out the entire list from 0 to 999999999.

Enter the range() function. E.g. range(3) yields 0, 1, 2. In Python 3 it is a lazy range object rather than a list, so list(range(3)) is [0, 1, 2].



In [1]:

    
total = 0  # named total because sum would shadow the built-in sum()
for i in range(10):
    total += i
print(total)
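To see why range() scales to a billion iterations, it may help to look at what it actually returns. This is a small sketch, assuming Python 3 (where range() is a lazy sequence rather than a list); the variable names are only for illustration.

```python
# range() produces its values on demand instead of building a list up front.
small = range(3)
print(small)          # range(0, 3)
print(list(small))    # [0, 1, 2]

# Even a huge range is cheap to create; nothing is stored per element.
big = range(1000000000)
print(len(big))       # 1000000000
print(big[-1])        # 999999999
```

Because the values are generated as the loop asks for them, looping over a large range does not require memory for a billion-element list.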

Functions

For loops let you repeat some code for every item in a list. Functions are similar in that they run the same lines of code for new values of some variable. They are different in that functions are not limited to looping over items.

Functions are a critical part of writing easy to read, reusable code.

Create a function like:

def function_name(parameters):
    """
    docstring
    """
    function expressions
    return [variable]

Note: Sometimes I use the word argument in place of parameter.

Here is a simple example. It prints the string that was passed in, then prints its characters one at a time up to and including the first 'r', and returns nothing.



In [20]:

    
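def print_string(text):
    """Print a string, then its characters up to and including the first 'r'."""
    # The parameter is named text because str would shadow the built-in str type.
    print(text)
    for c in text:
        print(c)
        if c == 'r':
            break
    print("done")
    return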



In [21]:

    
print_string("string")









    



string
s
t
r
done

To call the function, use:

print_string("Dave is awesome!")

Note: The function has to be defined before you can call it!



In [ ]:

    
print_string("Dave is awesome!")

If you provide no argument, or too many, you get an error (a TypeError).



In [7]:

    
#print_string()

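Parameters (or arguments) in Python are all passed as references to objects. This means that if you modify a mutable argument (such as a list) in place inside the function, the change is visible outside of the function. Assigning a new value to the parameter name, however, only rebinds the local name and leaves the caller's object untouched.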

See the following example:

def change_list(my_list):
    """Append an item to the list that was passed in."""
    my_list.append('four')
    print('list inside the function: ', my_list)
    return

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)



In [23]:

    
def change_list(my_list):
    """Append an item to the list that was passed in."""
    my_list.append('four')
    print('list inside the function: ', my_list)
    return

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)









    



list before the function:  [1, 2, 3]
list inside the function:  [1, 2, 3, 'four']
list after the function:  [1, 2, 3, 'four']
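The output above shows an in-place change reaching the caller. A complementary sketch, using two hypothetical helper functions that are not part of the lesson, contrasts that with rebinding the parameter name, which the caller never sees:

```python
def append_item(items):
    """Mutate the list object the caller passed in."""
    items.append('four')

def rebind_name(items):
    """Rebind the local name only; the caller's list is untouched."""
    items = ['a', 'brand', 'new', 'list']
    return items

my_list = [1, 2, 3]
append_item(my_list)
print(my_list)                 # [1, 2, 3, 'four']

new_list = rebind_name(my_list)
print(my_list)                 # [1, 2, 3, 'four']
print(new_list)                # ['a', 'brand', 'new', 'list']
```

If a function should not alter its input, one option is to work on a copy, e.g. items = list(items), before changing anything.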

Variables have scope: global and local

In a function, new variables that you create are not saved when the function returns - these are local variables. Variables defined outside of the function can be read inside it but not reassigned - these are global variables. Assigning to a global's name inside a function creates a new local variable instead, unless you first declare the name with the global keyword. Generally, the use of global variables is discouraged; use parameters instead.

my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    global my_global_3
    my_global_3 = 'still a better idea'
    return

my_function()
print(my_global_2)
print(my_global_3)



In [25]:

    
my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    print(my_global_2)
    global my_global_3
    my_global_3 = 'still a better idea'
    return

my_function()
print(my_global_2)
print(my_global_3)









    



bad idea
broke your global, man!
another bad one
still a better idea

In general, you want to use parameters to provide data to a function and hand back a result with the return statement. E.g.

def add(x, y):
    my_sum = x + y
    return my_sum

If you are going to return multiple objects, what data structure that we talked about can be used? Give an example below.



In [30]:

    
def a_function(parameter):
    return None



In [31]:

    
foo = a_function('bar')
print(foo)









    



None
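One possible answer to the question about returning multiple objects is a tuple. The sketch below uses a hypothetical function, min_and_max, purely for illustration; a list or dictionary would also work.

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple (hypothetical example)."""
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])
print(result)                  # (1, 5)

# Tuple unpacking gives each returned object its own name.
smallest, largest = min_and_max([3, 1, 4, 1, 5])
print(smallest, largest)       # 1 5
```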

Parameters have three different types:

type	behavior
required	positional, must be present or error, e.g. `my_func(first_name, last_name)`
keyword	position independent, e.g. `my_func(first_name, last_name)` can be called `my_func(first_name='Dave', last_name='Beck')` or `my_func(last_name='Beck', first_name='Dave')`
default	keyword params that default to a value if not provided



In [32]:

    
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))
    return

Take a minute and play around with the above function. Which are required? Keyword? Default?
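As a starting point for experimenting, here are a few ways print_name can be called. The function is repeated so the sketch runs on its own; the try/except wrapper is an addition used only to show the error without stopping the cell.

```python
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))
    return

print_name('Dorkus')                    # Your name is Dorkus the Clown
print_name('Dave', 'Beck')              # Your name is Dave Beck
print_name(last='Beck', first='Dave')   # Your name is Dave Beck

# Leaving out the required parameter is an error.
try:
    print_name(last='Beck')
except TypeError as err:
    print('TypeError:', err)
```

Here first is required, last has a default, and both can be passed by keyword in any order.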



In [34]:

    
def massive_correlation_analysis(data, method='pearson'):
    pass
    return

Functions can contain any code that you put anywhere else including:

if...elif...else
for...else
while
other function calls
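Of the constructs listed, for...else is probably the least familiar: the else arm runs only when the loop finishes without hitting break. A minimal sketch inside a function, using a hypothetical helper named describe_instructor:

```python
def describe_instructor(name, instructors):
    """Return a message saying whether name is in instructors (hypothetical helper)."""
    for instructor in instructors:
        if instructor == name:
            message = name + ' is teaching'
            break
    else:
        # The else arm of a for loop runs only if the loop never hit break.
        message = name + ' is not teaching'
    return message

instructors = ['Dave', 'Jim', 'Dorkus the Clown']
print(describe_instructor('Jim', instructors))    # Jim is teaching
print(describe_instructor('Bob', instructors))    # Bob is not teaching
```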



In [39]:

    
def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % (age))
    print('Your age is ' + str(age))
    if age > 35:
        print('You are really old.')
    return



In [40]:

    
print_name_age(age=40, last='Beck', first='Dave')









    



Your name is Dave Beck
Your age is 40
Your age is 40
You are really old.

Once you have some code that is functionalized and not going to change, you can move it to a file that ends in .py, check it into version control, import it into your notebook and use it!

Let's do this now for the above two functions.

...

See you after the break!

Import the function...



In [ ]:

Call them!



In [ ]:

Hacky Hack Time with Functions!

Notes from last class:

The os package has tools for checking if a file exists: os.path.exists

import os
filename = 'HCEPDB_moldata.zip'
if os.path.exists(filename):
  print("wahoo!")

Use the requests package to get the file given a url (got this from the requests docs)

import requests
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
req = requests.get(url)
assert req.status_code == 200 # if the download failed, this line will generate an error
with open(filename, 'wb') as f:
  f.write(req.content)

Use the zipfile package to decompress the file while reading it into pandas

import pandas as pd
import zipfile
csv_filename = 'HCEPDB_moldata.csv'
zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))

Here was my solution

import os
import requests
import pandas as pd
import zipfile

filename = 'HCEPDB_moldata.zip'
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
csv_filename = 'HCEPDB_moldata.csv'

if os.path.exists(filename):
    pass
else:
    req = requests.get(url)
    assert req.status_code == 200 # if the download failed, this line will generate an error
    with open(filename, 'wb') as f:
        f.write(req.content)

zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))

My solution:



In [4]:

    
def download_if_not_exists(url, filename):
    if os.path.exists(filename):
        pass
    else:
        req = requests.get(url)
        assert req.status_code == 200 # if the download failed, this line will generate an error
        with open(filename, 'wb') as f:
            f.write(req.content)



In [5]:

    
def load_HCEPDB_data(url, zip_filename, csv_filename):
    download_if_not_exists(url, zip_filename)
    zf = zipfile.ZipFile(zip_filename)
    data = pd.read_csv(zf.open(csv_filename))
    return data



In [6]:

    
import os
import requests
import pandas as pd
import zipfile

load_HCEPDB_data('http://faculty.washington.edu/dacb/HCEPDB_moldata_set1.zip', 'HCEPDB_moldata_set1.zip', 'HCEPDB_moldata_set1.csv')









    Out[6]:

	id	SMILES_str	stoich_str	mass	pce	voc	jsc	e_homo_alpha	e_gap_alpha	e_lumo_alpha	tmp_smiles_str
0	655365	C1C=CC=C1c1cc2[se]c3c4occc4c4nsnc4c3c2cn1	C18H9N3OSSe	394.3151	5.161953	0.867601	91.567575	-5.467601	2.022944	-3.444656	C1=CC=C(C1)c1cc2[se]c3c4occc4c4nsnc4c3c2cn1
1	1245190	C1C=CC=C1c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH2]...	C22H15NSeSi	400.4135	5.261398	0.504824	160.401549	-5.104824	1.630750	-3.474074	C1=CC=C(C1)c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH...
2	65553	[SiH2]1C=CC2=C1C=C([SiH2]2)C1=Cc2[se]ccc2[SiH2]1	C12H12SeSi3	319.4448	6.138294	0.630274	149.887545	-5.230274	1.682250	-3.548025	C1=CC2=C([SiH2]1)C=C([SiH2]2)C1=Cc2[se]ccc2[Si...
...	...	...	...	...	...	...	...	...	...	...	...
1106495	1943455	C1C=c2cccc(-c3cc4ncc5c6sccc6c6nsnc6c5c4c4cscc3...	C26H13N3S3	463.6077	5.619779	0.591400	146.246545	-5.191400	1.699286	-3.492114	c1cc2c3nsnc3c3c(cnc4cc(-c5cccc6=CCC=c56)c5cscc...
1106496	1779616	[SiH2]1C(=Cc2c1c1c3nsnc3c3cc[se]c3c1c1ccccc21)...	C26H15N3SSeSi	508.5375	4.886015	0.426423	176.344363	-5.026423	1.555438	-3.470986	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4ccccc4c3c2[se]1)...
1106497	239522	[SiH2]1C(=Cc2c1c1c3nsnc3c3ccsc3c1c1ccccc21)c1c...	C23H13N3S2Si	423.5947	6.313634	1.019342	95.325067	-5.619342	1.997782	-3.621560	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4ccccc4c3c2s1)c1c...

1106498 rows × 11 columns

How many functions did you use?

Why did you choose to use functions for these pieces?

From nothing to something

Task: Compute the pairwise Pearson correlation between rows in a dataframe.

Let's say we have three molecules (A, B, C) with three measurements each (v1, v2, v3). So for each molecule we have a vector of measurements:

$$X=\begin{bmatrix} X_{v_{1}} \\ X_{v_{2}} \\ X_{v_{3}} \\ \end{bmatrix} $$

Where X is a molecule and the components are the values for each of the measurements. These make up the rows in our matrix.

Often, we want to compare molecules to determine how similar or different they are. One measure is the Pearson correlation.

Pearson correlation:
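For two molecules X and Y with n measurements each, the standard definition of the Pearson correlation coefficient is:

$$r_{XY} = \frac{\sum_{i=1}^{n} (X_{v_{i}} - \bar{X})(Y_{v_{i}} - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_{v_{i}} - \bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n} (Y_{v_{i}} - \bar{Y})^{2}}}$$

Here $\bar{X}$ and $\bar{Y}$ are the means of each molecule's measurements. The value ranges from -1 (perfectly anti-correlated) through 0 (uncorrelated) to 1 (perfectly correlated).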

Expressed graphically, when you plot the paired measurements for two samples (in this case molecules) against each other, you can see positive correlation, no correlation, or negative correlation.

Simple input dataframe (note: when you are writing code, it is always a good idea to have a simple test case whose output you can readily compute by hand or already know):

index	v1	v2	v3
A	-1	0	1
B	1	0	-1
C	.5	0	.5

If the above is a dataframe what shape and size is the output?

What are some unique features of the output?

For our test case, what will the output be?

	A	B	C
A	1	-1	0
B	-1	1	0
C	0	0	1
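As a check on the expected output, the A-B entry can be worked by hand from the standard Pearson formula, taking the three-measurement rows A = (-1, 0, 1) and B = (1, 0, -1), both of which have mean 0:

$$r_{AB} = \frac{(-1)(1) + (0)(0) + (1)(-1)}{\sqrt{(-1)^{2} + 0^{2} + 1^{2}}\,\sqrt{1^{2} + 0^{2} + (-1)^{2}}} = \frac{-2}{\sqrt{2}\,\sqrt{2}} = -1$$

For C = (0.5, 0, 0.5) the mean is 1/3, so the centered values are (1/6, -1/3, 1/6). The numerator against A is (-1)(1/6) + (0)(-1/3) + (1)(1/6) = 0, which gives $r_{AC} = 0$. By the same arithmetic $r_{BC} = 0$, and every molecule correlates perfectly with itself, so the diagonal is 1.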

Let's sketch the idea...



In [ ]:

In class exercise

20-30 minutes

Objectives:

Write code using functions to compute the pairwise Pearson correlation between rows in a pandas dataframe. You will have to use for and possibly if.
Use a cell to test each function with an input that yields an expected output. Think about the shape and values of the outputs.
Put the code in a .py file in the directory with the Jupyter notebook, import and run!

To help you get started...

To create the sample dataframe:

df = pd.DataFrame([[-1, 0, 1], [1, 0, -1], [.5, 0, .5]])

To loop over rows in a dataframe, check out (Google is your friend):

DataFrame.iterrows



In [11]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

How do we know it is working?

Use the test case!

Our three row example is a useful tool for checking that our code is working. We can write some tests that compare the output of our functions to our expectations.

E.g. The diagonals should be 1, and corr(A, B) = -1, ...

But first, let's talk `assert` and `raise`

We've already briefly been exposed to assert in this code:

if os.path.exists(filename):
    pass
else:
    req = requests.get(url)
    # if the download failed, next line will raise an error
    assert req.status_code == 200
    with open(filename, 'wb') as f:
        f.write(req.content)

What is the assert doing there?

Let's play with assert. What should the following asserts do?

assert True == False, "You assert wrongly, sir!"
assert 'Dave' in instructors
assert function_that_returns_True_or_False(parameters)



In [ ]:

So when an assert statement is true, the code keeps executing and when it is false, it raises an exception (also known as an error).
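Both behaviors can be seen in one cell. The sketch below wraps the failing assert in try/except, which is an addition used here only so the cell keeps running after the error:

```python
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

# A true condition: nothing happens and execution continues.
assert 'Dave' in instructors
print('still running')

# A false condition: AssertionError is raised with the message attached.
message = None
try:
    assert True == False, "You assert wrongly, sir!"
except AssertionError as err:
    message = str(err)
print('AssertionError:', message)   # AssertionError: You assert wrongly, sir!
```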

We've all probably seen lots of exceptions. E.g.

def some_function(parameter):
    return

some_function()

some_dict = { }
print(some_dict['invalid key'])

'forty' + 2
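Each of these snippets raises a different built-in exception. The sketch below wraps each one in a small function (the wrapper names are hypothetical) so that all three can be run and caught in turn:

```python
def some_function(parameter):
    return

some_dict = {}

# Each snippet lives in its own function so it can be called and caught below.
def missing_argument():
    some_function()

def missing_key():
    print(some_dict['invalid key'])

def mixed_types():
    'forty' + 2

raised = []
for snippet in (missing_argument, missing_key, mixed_types):
    try:
        snippet()
    except Exception as err:
        raised.append(type(err).__name__)
        print(snippet.__name__, '->', type(err).__name__ + ':', err)

print(raised)    # ['TypeError', 'KeyError', 'TypeError']
```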

Like C++ and other languages, Python lets you raise your own exceptions. You can do it with raise (surprise!). Exceptions are special objects and you can create your own types of exceptions. For now, we are going to look at the simplest one, Exception.

We create an Exception object by calling its constructor:

Exception()

This isn't very helpful. We really want to supply a description. The Exception constructor takes any number of arguments, usually strings. One good form if you are using the generic Exception object is:

Exception('Short description', 'Long description')



In [ ]:

Creating an exception object isn't useful alone, however. We need to send it up the call stack so that the calling code, or ultimately the Python interpreter, can handle the exception condition. We do this with raise.

raise Exception("An error has occurred.")

Now you can create your own error messages like a pro!
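A minimal sketch of raising a generic Exception with a short and a long description, then inspecting what was raised. The function check_download and its messages are hypothetical, loosely modeled on the status code check from the download example:

```python
def check_download(status_code):
    """Raise a generic Exception for anything other than HTTP 200 (hypothetical helper)."""
    if status_code != 200:
        raise Exception('Download failed',
                        'Server returned status %d' % status_code)
    return True

caught_args = None
try:
    check_download(404)
except Exception as err:
    # The arguments given to the constructor are stored on the exception.
    caught_args = err.args

print(caught_args)    # ('Download failed', 'Server returned status 404')
```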



In [ ]:

DETOUR!

There are lots of types of exceptions beyond the generic class Exception. You can use them in your own code if they make sense. E.g.

import math
my_variable = math.inf
if my_variable == math.inf:
    raise ValueError('my_variable cannot be infinity')
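Choosing a specific exception type lets the caller tell different failures apart. The sketch below uses a hypothetical function, safe_log, that raises TypeError for the wrong kind of input and ValueError for a number that is out of range:

```python
import math

def safe_log(x):
    """Natural log that rejects bad input with specific exception types (hypothetical helper)."""
    if not isinstance(x, (int, float)):
        raise TypeError('x must be a number, got %s' % type(x).__name__)
    if x <= 0 or x == math.inf:
        raise ValueError('x must be positive and finite, got %r' % x)
    return math.log(x)

for value in (math.e, -1, 'ten'):
    try:
        print(safe_log(value))
    except (TypeError, ValueError) as err:
        # -1 triggers ValueError, 'ten' triggers TypeError
        print(type(err).__name__ + ':', err)
```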

List of Standard Exceptions −

EXCEPTION NAME	DESCRIPTION
Exception	Base class for all exceptions
StopIteration	Raised when the next() method of an iterator does not point to any object.
SystemExit	Raised by the sys.exit() function.
StandardError	Base class for all built-in exceptions except StopIteration and SystemExit.
ArithmeticError	Base class for all errors that occur for numeric calculation.
OverflowError	Raised when a calculation exceeds maximum limit for a numeric type.
FloatingPointError	Raised when a floating point calculation fails.
ZeroDivisonError	Raised when division or modulo by zero takes place for all numeric types.
AssertionError	Raised in case of failure of the Assert statement.
AttributeError	Raised in case of failure of attribute reference or assignment.
EOFError	Raised when there is no input from either the raw_input() or input() function and the end of file is reached.
ImportError	Raised when an import statement fails.
KeyboardInterrupt	Raised when the user interrupts program execution, usually by pressing Ctrl+c.
LookupError	Base class for all lookup errors.
IndexError KeyError	Raised when an index is not found in a sequence. Raised when the specified key is not found in the dictionary.
NameError	Raised when an identifier is not found in the local or global namespace.
UnboundLocalError EnvironmentError	Raised when trying to access a local variable in a function or method but no value has been assigned to it. Base class for all exceptions that occur outside the Python environment.
IOError IOError	Raised when an input/ output operation fails, such as the print statement or the open() function when trying to open a file that does not exist. Raised for operating system-related errors.
SyntaxError IndentationError	Raised when there is an error in Python syntax. Raised when indentation is not specified properly.
SystemError	Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.
SystemExit	Raised when Python interpreter is quit by using the sys.exit() function. If not handled in the code, causes the interpreter to exit.
TypeError	Raised when an operation or function is attempted that is invalid for the specified data type.
ValueError	Raised when the built-in function for a data type has the valid type of arguments, but the arguments have invalid values specified.
RuntimeError	Raised when a generated error does not fall into any category.
NotImplementedError	Raised when an abstract method that needs to be implemented in an inherited class is not actually implemented.
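
Several exceptions in the table are related by inheritance, which matters when writing an except clause: catching a base class also catches its subclasses. The cell below is a minimal sketch of that idea, assuming Python 3. The describe_lookup helper is hypothetical and exists only for this illustration.

```python
def describe_lookup(container, key):
    """Return container[key], or the name of the lookup error that was raised."""
    try:
        return container[key]
    except LookupError as err:
        # LookupError is the base class of both IndexError and KeyError,
        # so this one clause handles a bad list index and a missing dict key.
        return type(err).__name__


print(describe_lookup(['Dave', 'Jim'], 5))       # a list index that does not exist
print(describe_lookup({'Dave': 1}, 'Dorkus'))    # a dictionary key that does not exist
print(describe_lookup(['Dave', 'Jim'], 0))       # a valid lookup
print(issubclass(ZeroDivisionError, ArithmeticError))
```

Catching the narrowest class that covers the failure you expect is usually the safer choice, since a broad clause can hide errors you did not anticipate.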



In [ ]:

Put it all together... `assert` and `raise`

Breaking assert down, it is really just an if test followed by a raise. So the code below:

assert <some_test>, <message>

is shorthand for:

if not <some_test>:
    raise AssertionError(<message>)

Prove it? OK.

instructors = ['Dorkus the Clown', 'Jim']
assert 'Dave' in instructors, "Dave isn't in the instructor list!"

instructors = ['Dorkus the Clown', 'Jim']
if 'Dave' not in instructors:
    raise AssertionError("Dave isn't in the instructor list!")
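
One way to check that the two forms really behave the same is to catch the error each one raises and compare the results. This is a sketch for illustration only; the variable names assert_result and raise_result are made up here.

```python
instructors = ['Dorkus the Clown', 'Jim']
message = "Dave isn't in the instructor list!"

assert_result = None
raise_result = None

# Form 1: the assert statement.
try:
    assert 'Dave' in instructors, message
except AssertionError as err:
    assert_result = (type(err).__name__, str(err))

# Form 2: the explicit if test followed by a raise.
try:
    if 'Dave' not in instructors:
        raise AssertionError(message)
except AssertionError as err:
    raise_result = (type(err).__name__, str(err))

print(assert_result)
print(assert_result == raise_result)
```

One difference worth knowing: when Python is started with the -O flag, assert statements are removed, while the explicit if and raise still run. For that reason assert suits tests and sanity checks better than validation that must always happen.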

Questions?

All of this was in preparation for some testing...

Can we write some quick tests that make sure our code is doing what we think it is? Something of the form:

corr_matrix = pairwise_row_correlations(my_sample_dataframe)
assert corr_matrix looks like what we expect, "The function is broken!"

What are the smallest units of code that we can test?

What asserts can we make for these pieces of code?

Remember, floating point numbers have limited precision, so a result that should be 1.0 is not necessarily stored as exactly 1.

Put the following in an empty cell:

.99999999999999999999

How can we test for two floating point numbers being (almost) equal? Pro tip: Google!
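
One possible answer, sketched with the standard library: math.isclose compares two floats within a tolerance instead of testing exact equality. The values below are chosen only for illustration.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)               # exact comparison fails: the values differ in the last bits
print(math.isclose(total, 0.3))   # comparison within a relative tolerance succeeds

# rel_tol defaults to 1e-09. abs_tol defaults to 0.0, so it has to be set
# explicitly when one of the values being compared is zero.
assert math.isclose(total, 0.3), "0.1 + 0.2 should be almost equal to 0.3"
assert not math.isclose(1e-12, 0.0)
assert math.isclose(1e-12, 0.0, abs_tol=1e-9)
```

For whole arrays rather than single numbers, NumPy provides numpy.isclose and numpy.allclose, which follow the same idea.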



In [ ]:

From nothing to something wrap up

Here we created some functions from just a short description of our needs.

Before we wrote any code, we walked through the flow control and decided on the parts that were necessary.
Before we wrote any code, we created a simple test example with simple predictable output.
We wrote some code according to our specifications.
We wrote tests using assert to verify our code against the simple test example.
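
The four steps above can be sketched in miniature. This is a pure-Python stand-in that works on lists of numbers, not the pandas version written in class: the pearson helper is hypothetical, and the body of pairwise_row_correlations is an assumption about what such a function might look like.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant row has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_row_correlations(rows):
    """Correlate every row with every other row; returns a list of lists."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


# A simple test example with predictable output.
sample = [
    [1.0, 2.0, 3.0],   # baseline row
    [2.0, 4.0, 6.0],   # perfectly correlated with the baseline
    [3.0, 2.0, 1.0],   # perfectly anti-correlated with the baseline
]
expected = [
    [1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]

corr_matrix = pairwise_row_correlations(sample)

# Verify the code against the simple example, entry by entry.
for i, row in enumerate(expected):
    for j, value in enumerate(row):
        assert math.isclose(corr_matrix[i][j], value), "The function is broken!"
print("All tests passed")
```

Testing the small pearson helper separately from the function that loops over rows makes it easier to tell which piece is at fault when an assert fails.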

Next: errors, part 2; unit tests; debugging.

QUESTIONS?



In [ ]:

	id	SMILES_str	stoich_str	mass	pce	voc	jsc	e_homo_alpha	e_gap_alpha	e_lumo_alpha	tmp_smiles_str
0	655365	C1C=CC=C1c1cc2[se]c3c4occc4c4nsnc4c3c2cn1	C18H9N3OSSe	394.3151	5.161953	0.867601	91.567575	-5.467601	2.022944	-3.444656	C1=CC=C(C1)c1cc2[se]c3c4occc4c4nsnc4c3c2cn1
1	1245190	C1C=CC=C1c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH2]...	C22H15NSeSi	400.4135	5.261398	0.504824	160.401549	-5.104824	1.630750	-3.474074	C1=CC=C(C1)c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH...
2	65553	[SiH2]1C=CC2=C1C=C([SiH2]2)C1=Cc2[se]ccc2[SiH2]1	C12H12SeSi3	319.4448	6.138294	0.630274	149.887545	-5.230274	1.682250	-3.548025	C1=CC2=C([SiH2]1)C=C([SiH2]2)C1=Cc2[se]ccc2[Si...
3	720918	C1C=c2c3ccsc3c3[se]c4cc(oc4c3c2=C1)C1=CC=CC1	C20H12OSSe	379.3398	1.991366	0.242119	126.581347	-4.842119	1.809439	-3.032680	C1=CC=C(C1)c1cc2[se]c3c4sccc4c4=CCC=c4c3c2o1
4	1310744	C1C=CC=C1c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2ccc...	C24H13N3SSe	454.4137	5.605135	0.951911	90.622776	-5.551911	2.029717	-3.522194	C1=CC=C(C1)c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2c...
5	196637	C1C=CC=C1c1cc2[se]c3cc4ccsc4cc3c2[se]1	C17H10SSe2	404.2520	2.644436	0.587932	69.223461	-5.187932	2.201106	-2.986827	C1=CC=C(C1)c1cc2[se]c3cc4ccsc4cc3c2[se]1
6	262174	C1C=CC=C1c1cc2[se]c3c4occc4c4cscc4c3c2[se]1	C19H10OSSe2	444.2730	2.523057	0.397670	97.645325	-4.997670	1.982122	-3.015548	C1=CC=C(C1)c1cc2[se]c3c4occc4c4cscc4c3c2[se]1
7	393249	C1C=CC=C1c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12	C24H15NSe	396.3495	3.115895	0.869140	55.174815	-5.469140	2.331815	-3.137325	C1=CC=C(C1)c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12
8	35	C1C2=C([SiH2]C=C2)C=C1c1cc2occc2c2cscc12	C17H12OSSi	292.4328	2.743214	0.387106	109.062905	-4.987106	1.909966	-3.077141	C1=CC2=C([SiH2]1)C=C(C2)c1cc2occc2c2cscc12
9	1048612	C1C=CC=C1C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1	C18H14SSi	290.4606	2.408411	0.431315	85.937708	-5.031315	2.065850	-2.965465	C1=CC=C(C1)C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1
10	917542	C1C=c2ccc3[se]c4c5[se]c(cc5[se]c4c3c2=C1)C1=CC...	C20H12Se3	489.1948	2.843278	0.302591	144.614366	-4.902591	1.708198	-3.194393	C1=CC=C(C1)c1cc2[se]c3c([se]c4ccc5=CCC=c5c34)c...
11	1441831	C1C=CC=C1C1=Cc2ncc3c4[se]ccc4cnc3c2C1	C18H12N2Se	335.2668	2.687240	0.675497	61.225278	-5.275497	2.270953	-3.004544	C1=CC=C(C1)C1=Cc2ncc3c4[se]ccc4cnc3c2C1
12	1376296	C1C=CC=C1C1=Cc2c(C1)c1[se]c3ccc4cscc4c3c1c1=C[...	C24H16SSeSi	443.5024	2.844637	0.189206	231.387394	-4.789206	1.312334	-3.476872	C1=CC=C(C1)C1=Cc2c(C1)c1[se]c3ccc4cscc4c3c1c1=...
13	1638442	C1C=c2ccc3cnc4c5[SiH2]C(=Cc5c5nsnc5c4c3c2=C1)C...	C23H15N3SSi	393.5445	6.462512	0.602405	165.105179	-5.202405	1.603165	-3.599240	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1ncc3ccc4=CCC=c4c3...
14	98350	C1C=CC=C1C1=Cc2ccc3c4CC=Cc4c4cscc4c3c2[SiH2]1	C22H16SSi	340.5204	2.631463	0.410851	98.573546	-5.010851	1.975707	-3.035144	C1=CC=C(C1)C1=Cc2ccc3c4CC=Cc4c4cscc4c3c2[SiH2]1
15	2162747	C1C=CC=C1C1=Cc2c([SiH2]1)c1c3c[nH]cc3c3ccc4=C[...	C27H19NOSi2	429.6251	2.039158	0.140744	222.981280	-4.740744	1.361137	-3.379607	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c3c[nH]cc3c3ccc4=...
16	557119	C1C=c2c3C=C(Cc3c3occc3c2=C1)C1=CC=CC1	C19H14O	258.3186	0.237205	0.024962	146.246545	-4.624962	1.700415	-2.924547	C1=CC=C(C1)C1=Cc2c(C1)c1occc1c1=CCC=c21
17	753728	C1C=CC=C1C1=Cc2c([SiH2]1)c1cc3ncccc3cc1c1c[nH]...	C22H16N2Si	336.4684	3.103831	0.409504	116.650708	-5.009504	1.863416	-3.146088	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1cc3ncccc3cc1c1c[n...
18	819265	C1C=CC=C1C1=Cc2c([SiH2]1)c1c(c3cscc23)c2[se]cc...	C23H16SSeSi2	459.5774	5.385253	0.368606	224.848916	-4.968606	1.352309	-3.616298	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c(c3cscc23)c2[se]...
19	1278019	C1C=CC=C1C1=Cc2c([SiH2]1)c1c(c3[SiH2]C=Cc3c3=C...	C23H18OSi3	394.6522	5.489489	0.301242	280.455932	-4.901242	1.135619	-3.765623	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c(c3[SiH2]C=Cc3c3...
20	2096063	C1C=CC=C1c1cc2[se]c3c(c2c2cscc12)c1ccccc1c1ccc...	C27H14N2S2Se	509.5136	6.204093	0.570055	167.497914	-5.170055	1.593078	-3.576977	C1=CC=C(C1)c1cc2[se]c3c(c2c2cscc12)c1ccccc1c1c...
21	1572945	C1C=CC=C1C1=Cc2[se]c3c4sccc4c4ccccc4c3c2C1	C22H14SSe	389.3786	2.167252	0.330623	100.884304	-4.930623	1.961253	-2.969370	C1=CC=C(C1)C1=Cc2[se]c3c4sccc4c4ccccc4c3c2C1
22	2359381	C1C=CC=C1C1=Cc2c(C1)c1c3cscc3c3ccc4nsnc4c3c1c1...	C26H14N2OS2	434.5416	4.112982	0.299549	211.318161	-4.899549	1.409229	-3.490319	C1=CC=C(C1)C1=Cc2c(C1)c1c3cscc3c3ccc4nsnc4c3c1...
23	1540183	C1C=CC=C1c1cc2[se]c3c([se]c4ccc5cscc5c34)c2cn1	C20H11NSSe2	455.2999	3.212565	0.683568	72.329945	-5.283568	2.174712	-3.108856	C1=CC=C(C1)c1cc2[se]c3c([se]c4ccc5cscc5c34)c2cn1
24	1638500	C1C=CC=C1c1cc2[se]c3ccc4ccccc4c3c2c2cocc12	C23H14OSe	385.3226	3.088844	0.482262	98.573546	-5.082262	1.977235	-3.105027	C1=CC=C(C1)c1cc2[se]c3ccc4ccccc4c3c2c2cocc12
25	2621542	C1C=c2c3ccccc3c3c4ccccc4c4C=C(Cc4c3c2=C1)C1=CC...	C29H20	368.4770	2.552886	0.341115	115.180406	-4.941115	1.872759	-3.068355	C1=CC=C(C1)C1=Cc2c(C1)c1c(c3ccccc23)c2ccccc2c2...
26	98411	C1C=CC=C1c1cc2[se]c3cc4cccnc4cc3c2c2cscc12	C22H13NSSe	402.3777	4.247356	0.653960	99.957476	-5.253960	1.967245	-3.286715	C1=CC=C(C1)c1cc2[se]c3cc4cccnc4cc3c2c2cscc12
27	524398	C1C=c2c3C=C([SiH2]c3c3ncc4ccc5nsnc5c4c3c2=C1)C...	C23H15N3SSi	393.5445	5.860942	0.497394	181.348711	-5.097394	1.533947	-3.563447	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1ncc3ccc4nsnc4c3c1...
28	131187	C1C=c2c3ccc4nsnc4c3c3cnc4C=C(Cc4c3c2=C1)C1=CC=CC1	C24H15N3S	377.4695	6.517681	0.691659	145.026911	-5.291659	1.706854	-3.584805	C1=CC=C(C1)C1=Cc2ncc3c(c2C1)c1=CCC=c1c1ccc2nsn...
29	163960	C1C=CC=C1C1=Cc2ncc3c4CC=Cc4ccc3c2[SiH2]1	C19H15NSi	285.4205	3.235009	0.585638	85.014628	-5.185638	2.071184	-3.114454	C1=CC=C(C1)C1=Cc2ncc3c4CC=Cc4ccc3c2[SiH2]1
...	...	...	...	...	...	...	...	...	...	...	...
1106468	1779493	c1cc2c3nsnc3c3c(ncc4cc(-c5cccc6c[nH]cc56)c5csc...	C25H12N4S2Se	511.4898	4.404175	0.608078	111.468683	-5.208078	1.893235	-3.314843	NaN
1106469	2860840	c1cc2c3nsnc3c3c(ncc4cc(-c5cccc6nsnc56)c5nsnc5c...	C21H7N7S3Se	532.4933	6.515421	1.336547	75.024985	-5.936547	2.152054	-3.784493	NaN
1106470	1222442	C1C(=Cc2[se]c3c4occc4c4nsnc4c3c12)c1cccc2ccccc12	C23H12N2OSSe	443.3868	4.398127	0.683511	99.030648	-5.283511	1.973899	-3.309613	c1cc2c3nsnc3c3c4CC(=Cc4[se]c3c2o1)c1cccc2ccccc12
1106471	3090232	[SiH2]1C=Cc2c1csc2-c1cc2ccc3c4occc4c4nsnc4c3c2...	C24H12N2O2S2Si	452.5888	4.193127	0.839972	76.828261	-5.439972	2.136325	-3.303646	c1cc2c3nsnc3c3c(ccc4cc(-c5scc6[SiH2]C=Cc56)c5c...
1106472	206659	c1csc(n1)-c1cc2ccc3c4occc4c4nsnc4c3c2c2cscc12	C21H9N3OS3	415.5201	4.589759	0.887008	79.636066	-5.487008	2.114699	-3.372309	c1cc2c3nsnc3c3c(ccc4cc(-c5nccs5)c5cscc5c34)c2o1
1106473	2434889	c1occ2c(cccc12)-c1cc2oc3c(c4nsnc4c4ccncc34)c2c...	C25H11N3O2S2	449.5129	5.665579	0.809923	107.658433	-5.409923	1.917514	-3.492409	c1cc2c3nsnc3c3c(oc4cc(-c5cccc6cocc56)c5cscc5c3...
1106474	960331	c1cc2c3nsnc3c3c(ccc4cc(cnc34)-c3scc4[se]ccc34)...	C21H9N3S3Se	478.4811	5.081765	0.914993	85.475998	-5.514993	2.067645	-3.447349	NaN
1106475	1681228	[SiH2]1C(=Cc2c1c1c3nsnc3c3ccoc3c1c1ccccc21)c1c...	C23H11N5OS2Si	465.5919	10.033001	0.953904	161.872709	-5.553904	1.617546	-3.936358	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4ccccc4c3c2o1)c1c...
1106476	1517392	C1C(=Cc2sc3c4sccc4c4nsnc4c3c12)c1scc2C=C[SiH2]c12	C19H10N2S4Si	422.6520	5.013859	0.701342	110.024660	-5.301342	1.904229	-3.397113	c1cc2c3nsnc3c3c4CC(=Cc4sc3c2s1)c1scc2C=C[SiH2]c12
1106477	2598739	C1C=c2cccc(C3=Cc4c([SiH2]3)c3c5nsnc5c5ccoc5c3c...	C25H15N3OSSi	433.5655	3.112296	0.268163	178.619700	-4.868163	1.545688	-3.322475	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4c[nH]cc4c3c2o1)c...
1106478	763733	[SiH2]1C(=Cc2[se]c3c(c12)c1nsnc1c1ccc2cscc2c31...	C19H9N5S2SeSi	478.4931	9.147375	1.082907	130.002865	-5.682907	1.788135	-3.894772	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4[se]c3c2c2cscc12)c...
1106479	42846	c1cc2csc(-c3cc4sc5c6[se]ccc6c6nsnc6c5c4c4cscc3...	C22H8N2S5Se	539.6092	5.518285	0.632665	134.238713	-5.232665	1.763726	-3.468940	c1cc2c3nsnc3c3c(sc4cc(-c5scc6ccsc56)c5cscc5c34...
1106480	272226	[SiH2]1C=c2c(cc3sc4c5occc5c5nsnc5c4c3c2=C1)-c1...	C20H10N2OS2SeSi	465.4900	6.291725	0.642023	150.822679	-5.242023	1.676692	-3.565331	c1cc2c3nsnc3c3c(sc4cc(-c5ccc[se]5)c5=C[SiH2]C=...
1106481	2271076	c1cc2c3nsnc3c3c(sc4cc(-c5scc6cc[se]c56)c5ccccc...	C24H10N2OS3Se	517.5140	4.768060	0.797364	92.030673	-5.397364	2.021933	-3.375431	NaN
1106482	1124198	C1C(=Cc2c1c1c3nsnc3c3ccoc3c1c1ccccc21)c1ccccn1	C24H13N3OS	391.4527	4.974625	0.765936	99.957476	-5.365936	1.964935	-3.401001	c1cc2c3nsnc3c3c4CC(=Cc4c4ccccc4c3c2o1)c1ccccn1
1106483	1582951	[SiH2]1C=c2cccc(C3=Cc4cnc5c6cnccc6c6nsnc6c5c4[...	C22H14N4SSi2	422.6186	8.389379	0.910843	141.753503	-5.510843	1.725547	-3.785296	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4cnc3c2cn1)c1cccc2=...
1106484	1058666	[SiH2]1C=Cc2c1csc2-c1cc2ncc3c4occc4c4nsnc4c3c2cn1	C20H10N4OS2Si	414.5440	7.680649	1.171715	100.884304	-5.771715	1.960875	-3.810840	c1cc2c3nsnc3c3c(cnc4cc(ncc34)-c3scc4[SiH2]C=Cc...
1106485	370546	[SiH2]1C=c2c(cc3sc4c5sccc5c5nsnc5c4c3c2=C1)-c1...	C22H12N2S3Si	428.6348	6.067688	0.676091	138.123000	-5.276091	1.744040	-3.532051	c1cc2c3nsnc3c3c(sc4cc(-c5ccccc5)c5=C[SiH2]C=c5...
1106486	894837	C1C=c2cccc(-c3cc4ccc5c6sccc6c6nsnc6c5c4cn3)c2=C1	C24H13N3S2	407.5197	5.638177	0.479687	180.895837	-5.079687	1.535536	-3.544151	c1cc2c3nsnc3c3c(ccc4cc(ncc34)-c3cccc4=CCC=c34)...
1106487	2205559	[SiH2]1C=c2cccc(-c3cc4ncc5c6occc6c6nsnc6c5c4c4...	C23H11N5OS2Si	465.5919	6.928005	0.823054	129.547072	-5.423054	1.789620	-3.633434	c1cc2c3nsnc3c3c(cnc4cc(-c5cccc6=C[SiH2]C=c56)c...
1106488	141179	[SiH2]1C=Cc2csc(c12)-c1cc2ncc3c4sccc4c4nsnc4c3...	C21H9N5S4Si	487.6871	6.762845	1.172927	88.737230	-5.772927	2.044499	-3.728428	c1cc2c3nsnc3c3c(cnc4cc(-c5scc6C=C[SiH2]c56)c5n...
1106489	1091453	C1C(=Cc2cnc3c4sccc4c4nsnc4c3c12)c1cccc2nsnc12	C20H9N5S3	415.5241	7.230940	1.128969	98.573546	-5.728969	1.974714	-3.754254	c1cc2c3nsnc3c3c4CC(=Cc4cnc3c2s1)c1cccc2nsnc12
1106490	2303876	[SiH2]1C=Cc2csc(C3=Cc4c([SiH2]3)c3c5nsnc5c5cc[...	C22H12N2S3SeSi2	535.6808	7.524076	0.713512	162.292795	-5.313512	1.615859	-3.697653	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4cscc4c3c2[se]1)c...
1106491	1648533	[SiH2]1C=c2c(cc3ccc4c5[se]ccc5c5nsnc5c4c3c2=C1...	C23H11N5S2SeSi	528.5529	10.055248	0.886720	174.523481	-5.486720	1.562081	-3.924639	c1cc2c3nsnc3c3c(ccc4cc(-c5cncc6nsnc56)c5=C[SiH...
1106492	829339	c1cc2c3nsnc3c3c(ccc4cc(-c5scc6sccc56)c5cocc5c3...	C24H10N2O2S3	454.5530	4.369276	0.695215	96.724891	-5.295215	1.989505	-3.305709	NaN
1106493	2729884	c1cc2c3nsnc3c3c(sc4cc(-c5cccc6nsnc56)c5ccccc5c...	C24H10N4S3Se	529.5290	5.785201	1.025143	86.852379	-5.625143	2.056575	-3.568567	NaN
1106494	1779614	[SiH2]1C=Cc2csc(c12)-c1cc2cnc3c4[se]ccc4c4nsnc...	C21H9N5S3SeSi	534.5811	7.293623	1.213582	92.495763	-5.813582	2.018996	-3.794586	c1cc2c3nsnc3c3c(ncc4cc(-c5scc6C=C[SiH2]c56)c5n...
1106495	1943455	C1C=c2cccc(-c3cc4ncc5c6sccc6c6nsnc6c5c4c4cscc3...	C26H13N3S3	463.6077	5.619779	0.591400	146.246545	-5.191400	1.699286	-3.492114	c1cc2c3nsnc3c3c(cnc4cc(-c5cccc6=CCC=c56)c5cscc...
1106496	1779616	[SiH2]1C(=Cc2c1c1c3nsnc3c3cc[se]c3c1c1ccccc21)...	C26H15N3SSeSi	508.5375	4.886015	0.426423	176.344363	-5.026423	1.555438	-3.470986	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4ccccc4c3c2[se]1)...
1106497	239522	[SiH2]1C(=Cc2c1c1c3nsnc3c3ccsc3c1c1ccccc21)c1c...	C23H13N3S2Si	423.5947	6.313634	1.019342	95.325067	-5.619342	1.997782	-3.621560	c1cc2c3nsnc3c3c4[SiH2]C(=Cc4c4ccccc4c3c2s1)c1c...