Procedural programming in Python

Topics

Flow control, part 2
- Functions
- In class exercise:
  - Functionalize this!
- From nothing to something:
  - Pairwise correlation between rows in a pandas dataframe
  - Sketch of the process
  - In class exercise:
    - Write the code!
  - Rejoining, sharing ideas, problems, thoughts

Flow control

Flow control figure

Flow control refers to how programs handle loops, conditional execution, and the order of function operations.

If

If statements can be used to execute a line or block of code when a particular condition is satisfied. E.g., let's print something based on the entries in a list.



In [ ]:

    
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

if 'Dorkus the Clown' in instructors:
    print('#fakeinstructor')

There is a special do-nothing keyword, pass, that lets you leave one arm of a conditional empty, e.g.



In [ ]:

    
if 'Jim' in instructors:
    print("Congratulations!  Jim is teaching, your class won't stink!")
else:
    pass

For

For loops are the standard loop, though while is also common. For has the general form:

for items in list:
    do stuff

For loops and collections like tuples, lists and dictionaries are natural friends.



In [ ]:

    
for instructor in instructors:
    print(instructor)
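Since while is mentioned above but not shown, here is a minimal sketch for comparison. It walks the same list by index, which is the bookkeeping that for does for you. The list is repeated so the snippet runs on its own, and the names `visited` and `i` are only illustrative.

```python
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

visited = []
i = 0
while i < len(instructors):      # keep looping until the condition is false
    visited.append(instructors[i])
    i += 1                       # forgetting this line would loop forever

print(visited)
```

For walking a collection, for is usually the simpler choice; while is more natural when you do not know in advance how many iterations you need.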

You can combine loops and conditionals:



In [ ]:

    
for instructor in instructors:
    if instructor.endswith('Clown'):
        print(instructor + " doesn't sound like a real instructor name!")
    else:
        print(instructor + " is so smart... all those gooey brains!")

range()

Since for operates over lists, it is common to want to do something like:

NOTE: C-like
for (i = 0; i < 3; ++i) {
    print(i);
}

The Python equivalent is:

for i in [0, 1, 2]:
    do something with i

What happens when the range you want to sample is big, e.g.

NOTE: C-like
for (i = 0; i < 1000000000; ++i) {
    print(i);
}

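It would be a real pain in the rear to have to write out the entire list from 0 to 999999999.

Enter the range() function. E.g. range(3) yields 0, 1, 2. In Python 3 it produces the numbers on demand rather than building a list; list(range(3)) gives [0, 1, 2].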



In [1]:

    
total = 0  # named total so the built-in sum() is not shadowed
for i in range(10):
    total += i
print(total)
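A short sketch of how range() behaves in Python 3 (the version the notebook output further down suggests). The variable names are illustrative only. Because range() produces values lazily, even a very large range costs almost no memory.

```python
# range() is lazy: it yields numbers on demand instead of storing them all.
small = list(range(3))            # materialize it to see the values
stepped = list(range(2, 11, 4))   # start, stop (exclusive), step

big = range(1000000000)           # created instantly; no billion-element list
big_length = len(big)

# Summing with a loop, using a name that does not shadow the built-in sum().
total = 0
for i in range(10):
    total += i

print(small, stepped, big_length, total)
```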



In [ ]:

    
data.head()

Now, use your code from above for the following URLs and filenames

URL	filename	csv_filename
http://faculty.washington.edu/dacb/HCEPDB_moldata_set1.zip	HCEPDB_moldata_set1.zip	HCEPDB_moldata_set1.csv
http://faculty.washington.edu/dacb/HCEPDB_moldata_set2.zip	HCEPDB_moldata_set2.zip	HCEPDB_moldata_set2.csv
http://faculty.washington.edu/dacb/HCEPDB_moldata_set3.zip	HCEPDB_moldata_set3.zip	HCEPDB_moldata_set3.csv

What pieces of the data structures and flow control that we talked about earlier can you use?
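One possible approach, sketched under the assumption that the three files differ only by the set number: build each URL and filename in a for loop and collect them in a list of tuples. The names `base_url`, `stem`, and `datasets` are made up for this sketch, and nothing is downloaded here.

```python
base_url = 'http://faculty.washington.edu/dacb/'

datasets = []                      # list of (url, filename, csv_filename) tuples
for n in [1, 2, 3]:
    stem = 'HCEPDB_moldata_set' + str(n)
    filename = stem + '.zip'
    csv_filename = stem + '.csv'
    datasets.append((base_url + filename, filename, csv_filename))

# Tuple unpacking in the for statement gives each piece its own name.
for url, filename, csv_filename in datasets:
    print(url, '->', filename, '->', csv_filename)
```

Inside the second loop you would place your download and read code from above, using the three loop variables in place of the hard-coded names.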



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

How did you solve this problem?

Functions

For loops let you repeat some code for every item in a list. Functions are similar in that they run the same lines of code for new values of some variable. They are different in that functions are not limited to looping over items.

Functions are a critical part of writing easy to read, reusable code.

Create a function like:

def function_name (parameters):
    """
    docstring
    """
    function expressions
    return [variable]

Note: Sometimes I use the word argument in place of parameter.

Here is a simple example. It prints the string that was passed in, then prints its characters one at a time, stopping after the first 'r', prints "done", and returns nothing.



In [20]:

    
def print_string(str):
    """This prints out a string passed as the parameter."""
    print(str)
    for c in str:
        print(c)
        if c == 'r':
            break
    print("done")
    return



In [21]:

    
print_string("string")









    



string
s
t
r
done

To call the function, use:

print_string("Dave is awesome!")

Note: The function has to be defined before you can call it!



In [ ]:

    
print_string("Dave is awesome!")

If you provide no argument, or too many, you get an error.



In [22]:

    
print_string()









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-ad26026057f7> in <module>()
----> 1 print_string()

TypeError: print_string() missing 1 required positional argument: 'str'

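Parameters (or arguments) in Python are passed by object reference: the function receives a reference to the same object the caller has, not a copy. This means that if you modify a mutable parameter (such as a list) in place inside the function, the change is visible outside of the function. Assigning a new value to the parameter name, on the other hand, only rebinds the local name and does not affect the caller.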

See the following example:



In [23]:

    
def change_list(my_list):
    """Append an item to the list passed into this function."""
    my_list.append('four')
    print('list inside the function: ', my_list)
    return

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)









    



list before the function:  [1, 2, 3]
list inside the function:  [1, 2, 3, 'four']
list after the function:  [1, 2, 3, 'four']
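A small contrast that may help here. The function names `append_item` and `rebind_list` are made up for illustration. Mutating the object a parameter refers to is visible to the caller, but assigning a new object to the parameter name is not.

```python
def append_item(items):
    """Mutate the caller's list in place."""
    items.append('four')

def rebind_list(items):
    """Rebind the local name only; the caller's list is untouched."""
    items = ['something', 'else']
    return items

original = [1, 2, 3]
append_item(original)
after_append = list(original)      # snapshot taken after the in-place change

rebound = rebind_list(original)
print(after_append)
print(original)                    # unchanged by rebind_list
print(rebound)
```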

Variables have scope: global and local

In a function, new variables that you create are not saved when the function returns; these are local variables. Variables defined outside of the function can be read inside it, but assigning to one creates a new local variable instead of changing the outer one; these are global variables. The global keyword lets a function assign to a global variable. Generally, the use of global variables is discouraged; use parameters instead.

my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    global my_global_3
    my_global_3 = 'still a better idea'
    return

my_function()
print(my_global_2)
print(my_global_3)



In [25]:

    
my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    print(my_global_2)
    global my_global_3
    my_global_3 = 'still a better idea'
    return

my_function()
print(my_global_2)
print(my_global_3)









    



bad idea
broke your global, man!
another bad one
still a better idea

In general, you want to use parameters to provide data to a function and return a result with the return statement. E.g.

def add(x, y):
    my_sum = x + y
    return my_sum

If you are going to return multiple objects, what data structure that we talked about can be used? Give an example below.



In [30]:

    
def a_function(parameter):
    return None



In [31]:

    
foo = a_function('bar')
print(foo)









    



None
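One possible answer to the earlier question about returning multiple objects, as a sketch: pack the results into a tuple and unpack them at the call site. `min_and_max` is a hypothetical example function, not part of the class code.

```python
def min_and_max(values):
    """Return the smallest and largest item as a (low, high) tuple."""
    low = values[0]
    high = values[0]
    for v in values:
        if v < low:
            low = v
        if v > high:
            high = v
    return low, high          # the comma builds a tuple

result = min_and_max([3, -1, 7, 2])
low, high = result            # tuple unpacking
print(result, low, high)
```

A dictionary is another reasonable choice when the returned pieces benefit from having names.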

Parameters have three different types:

type	behavior
required	positional, must be present or error, e.g. `my_func(first_name, last_name)`
keyword	position independent, e.g. `my_func(first_name, last_name)` can be called `my_func(first_name='Dave', last_name='Beck')` or `my_func(last_name='Beck', first_name='Dave')`
default	keyword params that default to a value if not provided



In [32]:

    
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))
    return

Take a minute and play around with the above function. Which are required? Keyword? Default?
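If it helps to get started, here is a sketch of the three calling styles. To make the results easy to inspect it uses `format_name`, a hypothetical variant of `print_name` that returns the string instead of printing it.

```python
def format_name(first, last='the Clown'):
    """Like print_name above, but returns the text instead of printing it."""
    return 'Your name is %s %s' % (first, last)

by_position = format_name('Dave', 'Beck')              # required, positional
by_keyword = format_name(last='Beck', first='Dave')    # keyword, any order
with_default = format_name('Dorkus')                   # last falls back to its default

for line in (by_position, by_keyword, with_default):
    print(line)

# Omitting the required parameter raises a TypeError.
try:
    format_name()
except TypeError as err:
    print('TypeError:', err)
```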



In [34]:

    
def massive_correlation_analysis(data, method='pearson'):
    pass
    return

Functions can contain any code that you put anywhere else including:

if...elif...else
for...else
while
other function calls



In [39]:

    
def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % (age))
    print('Your age is ' + str(age))
    if age > 35:
        print('You are really old.')
    return



In [40]:

    
print_name_age(age=40, last='Beck', first='Dave')









    



Your name is Dave Beck
Your age is 40
Your age is 40
You are really old.

Once you have some code that is functionalized and not going to change, you can move it to a file that ends in .py, check it into version control, import it into your notebook and use it!

Let's do this now for the above two functions.

...

See you after the break!

Import the function...



In [ ]:

Call them!



In [ ]:
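A self-contained sketch of the idea, with assumptions stated up front: the module name `name_utils` is hypothetical, and the file is written to a temporary directory from Python only so that the snippet runs anywhere. In class you would save the .py file next to the notebook with an editor, and a plain `import name_utils` would then be enough.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module contents: the two functions from the cells above.
module_source = '''
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))


def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % age)
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'name_utils.py'), 'w') as f:
    f.write(module_source)

sys.path.insert(0, module_dir)        # make the directory importable
importlib.invalidate_caches()         # make sure the new file is noticed
name_utils = importlib.import_module('name_utils')

# Functions are reached through the module name.
name_utils.print_name_age('Dave', 'Beck', 40)
```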

Hacky Hack Time with Functions!

Notes from last class:

The os package has tools for checking if a file exists: os.path.exists

import os
filename = 'HCEPDB_moldata.zip'
if os.path.exists(filename):
  print("wahoo!")

Use the requests package to get the file given a url (got this from the requests docs)

import requests
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
req = requests.get(url)
assert req.status_code == 200 # if the download failed, this line will generate an error
with open(filename, 'wb') as f:
  f.write(req.content)

Use the zipfile package to decompress the file while reading it into pandas

import pandas as pd
import zipfile
csv_filename = 'HCEPDB_moldata.csv'
zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))

Here was my solution

import os
import requests
import pandas as pd
import zipfile

filename = 'HCEPDB_moldata.zip'
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
csv_filename = 'HCEPDB_moldata.csv'

if os.path.exists(filename):
    pass
else:
    req = requests.get(url)
    assert req.status_code == 200 # if the download failed, this line will generate an error
    with open(filename, 'wb') as f:
        f.write(req.content)

zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))

In class exercise

5-10 minutes

Objective: How would you functionalize the code for downloading, unzipping, and making a dataframe?

Bonus! Add the code to a file `HCEPDB_utils.py` and import it!



In [ ]:

    
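import os
import requests

def download_if_not_exists(url, filename):
    """Download url to filename unless the file is already on disk."""
    # url is now a parameter; the original relied on a global variable
    if not os.path.exists(filename):
        req = requests.get(url)
        assert req.status_code == 200 # if the download failed, this line will generate an error
        with open(filename, 'wb') as f:
            f.write(req.content)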



In [ ]:



In [ ]:



In [ ]:



In [ ]:

How many functions did you use?

Why did you choose to use functions for these pieces?

From nothing to something

Task: Compute the pairwise Pearson correlation between rows in a dataframe.

Let's say we have three molecules (A, B, C) with three measurements each (v1, v2, v3). So for each molecule we have a vector of measurements:

$$X=\begin{bmatrix} X_{v_{1}} \\ X_{v_{2}} \\ X_{v_{3}} \\ \end{bmatrix} $$

Where X is a molecule and the components are the values for each of the measurements. These make up the rows in our matrix.

Often, we want to compare molecules to determine how similar or different they are. One measure is the Pearson correlation.

Pearson correlation:
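For reference, the standard definition of the Pearson correlation between two molecules X and Y with n measurements each, written with the same component notation as above. The symbols r, n, and the bars for the mean of each vector are introduced here.

$$r_{XY}=\frac{\sum_{i=1}^{n}\left(X_{v_{i}}-\bar{X}\right)\left(Y_{v_{i}}-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{v_{i}}-\bar{X}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_{v_{i}}-\bar{Y}\right)^{2}}}$$

The value ranges from -1 (perfectly anti-correlated) to 1 (perfectly correlated). It is undefined when either vector is constant, because the denominator is then zero.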

Expressed graphically, when you plot the paired measurements for two samples (in this case molecules) against each other, you can see whether they are positively correlated, uncorrelated, or negatively correlated.

Simple input dataframe (when writing code, it is always a good idea to have a simple test case whose output you already know or can readily compute by hand):

index	v1	v3
A	-1	1
B	1	-1
C	.5	.5

If the above is a dataframe, what shape and size is the output?

What are some unique features of the output?

For our test case, what will the output be?

	A	B	C
A	1	-1	0
B	-1	1	0
C	0	0	1

Let's sketch the idea...



In [ ]:



In [ ]:

In class exercise

20-30 minutes

Objectives:

Write code using functions to compute the pairwise Pearson correlation between rows in a pandas dataframe. You will have to use for and possibly if.
Use a cell to test each function with an input that yields an expected output. Think about the shape and values of the outputs.
Put the code in a .py file in the directory with the Jupyter notebook, import and run!
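A minimal pure-Python sketch of one way to structure this, not the class solution. The function names `pearson` and `pairwise_correlation` are hypothetical, and rows are passed as a dict of lists rather than a dataframe. The test rows assume a middle measurement v2 = 0 for every molecule. That value is an assumption made here so that row C is not constant, since a constant row has an undefined correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float('nan')        # undefined for a constant row
    return cov / math.sqrt(var_x * var_y)

def pairwise_correlation(rows):
    """Return {name_i: {name_j: r}} for every pair of rows."""
    result = {}
    for name_i in rows:
        result[name_i] = {}
        for name_j in rows:
            result[name_i][name_j] = pearson(rows[name_i], rows[name_j])
    return result

# Assumed test case: v2 = 0 is a made-up middle column (see the note above).
rows = {'A': [-1, 0, 1], 'B': [1, 0, -1], 'C': [0.5, 0, 0.5]}
corr = pairwise_correlation(rows)
for name in rows:
    print(name, [round(corr[name][other], 6) for other in rows])
```

The output is square and symmetric with ones on the diagonal, so a faster version only needs to compute one triangle. As a cross-check on a real dataframe of numeric columns, pandas' built-in `DataFrame.corr` works column-wise, so `df.T.corr()` gives the row-wise result.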



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [1]:

    
import pandas as pd
import math



In [7]:

    
df = pd.read_csv('HCEPDB_moldata.csv')









    



/Users/Mandy/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (0,3,4,5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)



In [8]:

    
df









Out[8]:

	0	1	2	3	4	5	6	7	8	9	10
0	id	SMILES_str	stoich_str	mass	pce	voc	jsc	e_homo_alpha	e_gap_alpha	e_lumo_alpha	tmp_smiles_str
1	655365	C1C=CC=C1c1cc2[se]c3c4occc4c4nsnc4c3c2cn1	C18H9N3OSSe	394.3151	5.16195320211971	0.86760078740294	91.5675749599	-5.46760078740294	2.02294443593306	-3.44465635146988	C1=CC=C(C1)c1cc2[se]c3c4occc4c4nsnc4c3c2cn1
2	1245190	C1C=CC=C1c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH2]...	C22H15NSeSi	400.4135	5.2613977233692	0.50482419467609	160.40154923845	-5.10482419467609	1.63075003826037	-3.47407415641572	C1=CC=C(C1)c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH...
3	21847	C1C=c2ccc3c4c[nH]cc4c4c5[SiH2]C(=Cc5oc4c3c2=C1...	C24H17NOSi	363.4903	0	0	197.47477990435	-4.53952567287262	1.46215815756611	-3.07736751530651	C1=CC=C(C1)C1=Cc2oc3c(c2[SiH2]1)c1c[nH]cc1c1cc...
4	65553	[SiH2]1C=CC2=C1C=C([SiH2]2)C1=Cc2[se]ccc2[SiH2]1	C12H12SeSi3	319.4448	6.13829369542661	0.63027445338351	149.88754514825	-5.23027445338351	1.6822495770534	-3.54802487633011	C1=CC2=C([SiH2]1)C=C([SiH2]2)C1=Cc2[se]ccc2[Si...
...	...	...	...	...	...	...	...	...	...	...	...
2322845	2543603	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1cnc(...	C22H14N4S3Si2	486.751	0	0	0	-5.63251	1.45408	-4.17843	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1cnc(...
2322846	2304057	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C22H14N4S3Si2	486.751	9.33549	1.12074	128.197	-5.72074	1.7986	-3.92214	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322847	2007035	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C26H18S3Si2	482.798	2.49821	0.834995	46.0461	-5.435	2.43316	-3.00184	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322848	1961981	C1ccc2c1c(sc2-c1scc2cc[SiH2]c12)-c1ccc(cc1)-c1...	C25H16S3SeSi	519.645	2.67907	0.659243	62.544	-5.25924	2.25847	-3.00077	c1sc(c2[SiH2]ccc12)-c1sc(c2Cccc12)-c1ccc(cc1)-...
2322849	2754558	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccsc34)c...	C24H13NOS5Si	519.789	1.2724	0.102802	190.49	-4.7028	1.49095	-3.21185	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccsc34)c...

2322850 rows × 11 columns



In [ ]:



In [4]:

    
def typ(x,y):
    sol = x.mean() + y.mean()
    return sol



In [7]:

    
typ(df['mass'],df['pce'])









    Out[7]:





419.48866167769404



In [ ]:

	0	1	2	3	4	5	6	7	8	9	10
0	id	SMILES_str	stoich_str	mass	pce	voc	jsc	e_homo_alpha	e_gap_alpha	e_lumo_alpha	tmp_smiles_str
1	655365	C1C=CC=C1c1cc2[se]c3c4occc4c4nsnc4c3c2cn1	C18H9N3OSSe	394.3151	5.16195320211971	0.86760078740294	91.5675749599	-5.46760078740294	2.02294443593306	-3.44465635146988	C1=CC=C(C1)c1cc2[se]c3c4occc4c4nsnc4c3c2cn1
2	1245190	C1C=CC=C1c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH2]...	C22H15NSeSi	400.4135	5.2613977233692	0.50482419467609	160.40154923845	-5.10482419467609	1.63075003826037	-3.47407415641572	C1=CC=C(C1)c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH...
3	21847	C1C=c2ccc3c4c[nH]cc4c4c5[SiH2]C(=Cc5oc4c3c2=C1...	C24H17NOSi	363.4903	0	0	197.47477990435	-4.53952567287262	1.46215815756611	-3.07736751530651	C1=CC=C(C1)C1=Cc2oc3c(c2[SiH2]1)c1c[nH]cc1c1cc...
4	65553	[SiH2]1C=CC2=C1C=C([SiH2]2)C1=Cc2[se]ccc2[SiH2]1	C12H12SeSi3	319.4448	6.13829369542661	0.63027445338351	149.88754514825	-5.23027445338351	1.6822495770534	-3.54802487633011	C1=CC2=C([SiH2]1)C=C([SiH2]2)C1=Cc2[se]ccc2[Si...
5	720918	C1C=c2c3ccsc3c3[se]c4cc(oc4c3c2=C1)C1=CC=CC1	C20H12OSSe	379.3398	1.99136566470237	0.242119009470801	126.58134716045	-4.8421190094708	1.80943882203271	-3.03268018743809	C1=CC=C(C1)c1cc2[se]c3c4sccc4c4=CCC=c4c3c2o1
6	1310744	C1C=CC=C1c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2ccc...	C24H13N3SSe	454.4137	5.60513478857347	0.95191087183926	90.62277586765	-5.55191087183926	2.02971670891245	-3.52219416292681	C1=CC=C(C1)c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2c...
7	196637	C1C=CC=C1c1cc2[se]c3cc4ccsc4cc3c2[se]1	C17H10SSe2	404.252	2.64443641930939	0.587932414406801	69.2234614721	-5.1879324144068	2.20110577558483	-2.98682663882197	C1=CC=C(C1)c1cc2[se]c3cc4ccsc4cc3c2[se]1
8	262174	C1C=CC=C1c1cc2[se]c3c4occc4c4cscc4c3c2[se]1	C19H10OSSe2	444.273	2.52305655873057	0.39767026257405	97.64532544975	-4.99767026257405	1.98212181531837	-3.01554844725568	C1=CC=C(C1)c1cc2[se]c3c4occc4c4cscc4c3c2[se]1
9	393249	C1C=CC=C1c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12	C24H15NSe	396.3495	3.1158951050846	0.86913959183236	55.174814587685	-5.46913959183236	2.33181477476568	-3.13732481706668	C1=CC=C(C1)c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12
10	35	C1C2=C([SiH2]C=C2)C=C1c1cc2occc2c2cscc12	C17H12OSSi	292.4328	2.74321377891055	0.38710624740493	109.06290475405	-4.98710624740493	1.90996574187542	-3.07714050552951	C1=CC2=C([SiH2]1)C=C(C2)c1cc2occc2c2cscc12
11	1048612	C1C=CC=C1C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1	C18H14SSi	290.4606	2.40841131373757	0.43131491941631	85.9377076701	-5.03131491941631	2.06584966433715	-2.96546525507916	C1=CC=C(C1)C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1
12	917542	C1C=c2ccc3[se]c4c5[se]c(cc5[se]c4c3c2=C1)C1=CC...	C20H12Se3	489.1948	2.84327790532769	0.3025906196108	144.6143656087	-4.9025906196108	1.70819762918304	-3.19439299042776	C1=CC=C(C1)c1cc2[se]c3c([se]c4ccc5=CCC=c5c34)c...
13	1441831	C1C=CC=C1C1=Cc2ncc3c4[se]ccc4cnc3c2C1	C18H12N2Se	335.2668	2.68724019638341	0.67549682028117	61.225277938305	-5.27549682028117	2.27095328753055	-3.00454353275062	C1=CC=C(C1)C1=Cc2ncc3c4[se]ccc4cnc3c2C1
14	1376296	C1C=CC=C1C1=Cc2c(C1)c1[se]c3ccc4cscc4c3c1c1=C[...	C24H16SSeSi	443.5024	2.8446368983132	0.18920592502927	231.38739350415	-4.78920592502927	1.31233370868624	-3.47687221634303	C1=CC=C(C1)C1=Cc2c(C1)c1[se]c3ccc4cscc4c3c1c1=...
15	1638442	C1C=c2ccc3cnc4c5[SiH2]C(=Cc5c5nsnc5c4c3c2=C1)C...	C23H15N3SSi	393.5445	6.46251246238048	0.60240460581576	165.1051792767	-5.20240460581576	1.60316496595707	-3.59923963985869	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1ncc3ccc4=CCC=c4c3...
16	98350	C1C=CC=C1C1=Cc2ccc3c4CC=Cc4c4cscc4c3c2[SiH2]1	C22H16SSi	340.5204	2.63146328874209	0.410851163619401	98.57354638625	-5.0108511636194	1.97570703051256	-3.03514413310684	C1=CC=C(C1)C1=Cc2ccc3c4CC=Cc4c4cscc4c3c2[SiH2]1
17	2162747	C1C=CC=C1C1=Cc2c([SiH2]1)c1c3c[nH]cc3c3ccc4=C[...	C27H19NOSi2	429.6251	2.03915811352424	0.14074406290405	222.981280483	-4.74074406290405	1.36113723091331	-3.37960683199074	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c3c[nH]cc3c3ccc4=...
18	557119	C1C=c2c3C=C(Cc3c3occc3c2=C1)C1=CC=CC1	C19H14O	258.3186	0.237204563447386	0.0249623237532005	146.24654523115	-4.6249623237532	1.70041519990913	-2.92454712384407	C1=CC=C(C1)C1=Cc2c(C1)c1occc1c1=CCC=c21
19	753728	C1C=CC=C1C1=Cc2c([SiH2]1)c1cc3ncccc3cc1c1c[nH]...	C22H16N2Si	336.4684	3.10383123118601	0.409504148061471	116.65070843205	-5.00950414806147	1.8634156621733	-3.14608848588817	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1cc3ncccc3cc1c1c[n...
20	819265	C1C=CC=C1C1=Cc2c([SiH2]1)c1c(c3cscc23)c2[se]cc...	C23H16SSeSi2	459.5774	5.38525291629117	0.368606419249421	224.8489157226	-4.96860641924942	1.35230882836041	-3.61629759088901	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c(c3cscc23)c2[se]...
21	1278019	C1C=CC=C1C1=Cc2c([SiH2]1)c1c(c3[SiH2]C=Cc3c3=C...	C23H18OSi3	394.6522	5.48948942078778	0.30124157478828	280.45593203485	-4.90124157478828	1.13561905617574	-3.76562251861254	C1=CC=C(C1)C1=Cc2c([SiH2]1)c1c(c3[SiH2]C=Cc3c3...
22	2096063	C1C=CC=C1c1cc2[se]c3c(c2c2cscc12)c1ccccc1c1ccc...	C27H14N2S2Se	509.5136	6.20409348575883	0.570054683857091	167.49791375135	-5.17005468385709	1.59307772190982	-3.57697696194727	C1=CC=C(C1)c1cc2[se]c3c(c2c2cscc12)c1ccccc1c1c...
23	2752585	C1C=CC=C1C1=Cc2c(C1)c1c(c3c[nH]cc23)c2c3c[nH]c...	C28H20N2Si	412.566	0	0	198.7499142499	-4.49944701123171	1.45720787683099	-3.04223913440072	C1=CC=C(C1)C1=Cc2c(C1)c1c(c3c[nH]cc23)c2c3c[nH...
24	1572945	C1C=CC=C1C1=Cc2[se]c3c4sccc4c4ccccc4c3c2C1	C22H14SSe	389.3786	2.16725162664095	0.33062319718781	100.8843043156	-4.93062319718781	1.96125330337018	-2.96936989381763	C1=CC=C(C1)C1=Cc2[se]c3c4sccc4c4ccccc4c3c2C1
25	2359381	C1C=CC=C1C1=Cc2c(C1)c1c3cscc3c3ccc4nsnc4c3c1c1...	C26H14N2OS2	434.5416	4.11298236915351	0.29954882002972	211.3181606601	-4.89954882002972	1.40922933873152	-3.4903194812982	C1=CC=C(C1)C1=Cc2c(C1)c1c3cscc3c3ccc4nsnc4c3c1...
26	1540183	C1C=CC=C1c1cc2[se]c3c([se]c4ccc5cscc5c34)c2cn1	C20H11NSSe2	455.2999	3.21256529851421	0.68356751158616	72.32994545335	-5.28356751158616	2.17471153849587	-3.10885597309029	C1=CC=C(C1)c1cc2[se]c3c([se]c4ccc5cscc5c34)c2cn1
27	1638500	C1C=CC=C1c1cc2[se]c3ccc4ccccc4c3c2c2cocc12	C23H14OSe	385.3226	3.08884387902497	0.48226213429058	98.57354638625	-5.08226213429058	1.97723538425119	-3.10502675003939	C1=CC=C(C1)c1cc2[se]c3ccc4ccccc4c3c2c2cocc12
28	2621542	C1C=c2c3ccccc3c3c4ccccc4c4C=C(Cc4c3c2=C1)C1=CC...	C29H20	368.477	2.55288566054067	0.34111463011233	115.18040593475	-4.94111463011233	1.87275914683688	-3.06835548327545	C1=CC=C(C1)C1=Cc2c(C1)c1c(c3ccccc23)c2ccccc2c2...
29	98411	C1C=CC=C1c1cc2[se]c3cc4cccnc4cc3c2c2cscc12	C22H13NSSe	402.3777	4.24735634211112	0.65395973081254	99.9574755777	-5.25395973081254	1.96724506127822	-3.28671466953432	C1=CC=C(C1)c1cc2[se]c3cc4cccnc4cc3c2c2cscc12
...	...	...	...	...	...	...	...	...	...	...	...
2322820	2705444	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C25H17NS3Si2	483.786	2.97681	0.892533	51.3304	-5.49253	2.37349	-3.11904	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322821	2925216	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4occc34)c...	C24H12O2S5Si	520.773	3.68731	0.323482	175.432	-4.92348	1.55837	-3.36511	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4occc34)c...
2322822	2742210	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccoc34)c...	C24H12O2S5Si	520.773	3.03641	0.280599	166.541	-4.8806	1.59642	-3.28418	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccoc34)c...
2322823	3092419	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C23H15N3S3Si2	485.762	5.76643	1.00011	88.7372	-5.60011	2.04536	-3.55475	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322824	1253317	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C23H17NS2Si2	427.698	2.56918	1.02184	38.6953	-5.62184	2.52339	-3.09845	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322825	1841096	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C25H17NOS2Si2	467.719	3.65147	0.838712	67.0043	-5.43871	2.22052	-3.21819	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322826	2770889	C1ccc2c1c(sc2-c1scc2cc[SiH2]c12)-c1ccc(-c2cccc...	C26H17NS3Si	467.711	3.2944	0.667854	75.9176	-5.26785	2.14341	-3.12444	c1sc(c2[SiH2]ccc12)-c1sc(c2Cccc12)-c1ccc(-c2cc...
2322827	1816522	C1ccc2c1c(sc2-c1scc2cc[SiH2]c12)-c1sc(-c2ccccc...	C25H16S4Si	472.751	3.29743	0.473489	107.18	-5.07349	1.92114	-3.15235	c1sc(c2[SiH2]ccc12)-c1sc(c2Cccc12)-c1sc(-c2ccc...
2322828	1810382	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C25H17NOS2Si2	467.719	3.58162	0.762095	72.3299	-5.3621	2.17184	-3.19025	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322829	1648591	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccoc34)c...	C24H12O3S4Si	504.706	2.78056	0.264955	161.513	-4.86495	1.61888	-3.24608	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccoc34)c...
2322830	2705360	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccoc34)c...	C24H13NO2S4Si	503.722	1.0633	0.0871941	187.68	-4.68719	1.50298	-3.18421	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccoc34)c...
2322831	2349009	C1ccc2csc(c12)-c1ccc(cn1)-c1sc(-c2scc3cc[SiH2]...	C24H17NS3Si2	471.775	2.8029	0.911719	47.3144	-5.51172	2.42118	-3.09054	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322832	3091107	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccsc34)c...	C24H14OS5Si2	534.876	3.77035	0.412894	140.537	-5.01289	1.73206	-3.28083	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccsc34)c...
2322833	8152	[SiH2]1ccc2csc(c12)-c1sc(-c2scc3cc[se]c23)c2[s...	C18H10S3Se2Si	508.481	2.88742	0.549016	80.9417	-5.14902	2.10191	-3.0471	c1sc(c2[SiH2]ccc12)-c1sc(-c2scc3cc[se]c23)c2[s...
2322834	1781722	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C23H16N2S3Si2	472.763	2.81402	0.556938	77.7621	-5.15694	2.1271	-3.02984	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322835	2470223	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4sccc34)c...	C24H13NS6Si	535.856	2.44574	0.20756	181.349	-4.80756	1.5331	-3.27446	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4sccc34)c...
2322836	2469856	C1ccc2c1c(sc2-c1sc(-c2scc3cc[SiH2]c23)c2ccoc12...	C25H15NOS4Si	501.75	2.14342	0.22746	145.027	-4.82746	1.70726	-3.1202	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(c3Cccc23)-c2scc...
2322837	1912803	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccoc34)c...	C24H12O3S4Si	504.706	2.6569	0.274521	148.952	-4.87452	1.68676	-3.18776	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccoc34)c...
2322838	1216485	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1cccc...	C18H12N2S3Si2	408.677	7.59421	0.993521	117.64	-5.59352	1.85748	-3.73604	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1cccc...
2322839	2619366	C1cc2c(ccc(-c3ccccc3)c2c1)-c1sc(-c2scc3cc[SiH2...	C28H20S2Si	448.684	3.74322	0.466049	123.612	-5.06605	1.824	-3.24204	c1sc(c2[SiH2]ccc12)-c1sc(c2Cccc12)-c1ccc(-c2cc...
2322840	1703911	C1cc2c(ccc(-c3cccnc3)c2c1)-c1sc(-c2scc3cc[SiH2...	C26H19NS2Si2	465.747	4.88105	0.657693	114.219	-5.25769	1.87628	-3.38141	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322841	1814506	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(c3[SiH2]ccc23)-...	C23H16N2S3Si2	472.763	3.35318	0.461167	111.904	-5.06117	1.892	-3.16917	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(c3[SiH2]ccc23)-...
2322842	2559314	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(c3[SiH2]ccc23)-...	C23H15NOS3Si2	473.748	4.26338	0.688326	95.3251	-5.28833	1.99871	-3.28961	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(c3[SiH2]ccc23)-...
2322843	2351086	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C24H16N2S3Si2	484.774	6.66266	0.85006	120.627	-5.45006	1.83969	-3.61037	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322844	1712111	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccsc34)c...	C24H12OS6Si	536.84	2.95171	0.279912	162.293	-4.87991	1.61514	-3.26477	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccsc34)c...
2322845	2543603	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1cnc(...	C22H14N4S3Si2	486.751	0	0	0	-5.63251	1.45408	-4.17843	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1cnc(...
2322846	2304057	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C22H14N4S3Si2	486.751	9.33549	1.12074	128.197	-5.72074	1.7986	-3.92214	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322847	2007035	[SiH2]1ccc2csc(c12)-c1sc(c2[SiH2]ccc12)-c1ccc(...	C26H18S3Si2	482.798	2.49821	0.834995	46.0461	-5.435	2.43316	-3.00184	c1sc(c2[SiH2]ccc12)-c1sc(c2[SiH2]ccc12)-c1ccc(...
2322848	1961981	C1ccc2c1c(sc2-c1scc2cc[SiH2]c12)-c1ccc(cc1)-c1...	C25H16S3SeSi	519.645	2.67907	0.659243	62.544	-5.25924	2.25847	-3.00077	c1sc(c2[SiH2]ccc12)-c1sc(c2Cccc12)-c1ccc(cc1)-...
2322849	2754558	[SiH2]1ccc2csc(c12)-c1sc(-c2sc(-c3scc4ccsc34)c...	C24H13NOS5Si	519.789	1.2724	0.102802	190.49	-4.7028	1.49095	-3.21185	c1sc(c2[SiH2]ccc12)-c1sc(-c2sc(-c3scc4ccsc34)c...
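
The numeric columns in the preview above (mass, pce, voc, jsc, e_homo_alpha, e_gap_alpha, e_lumo_alpha) are the ones a row-by-row comparison would use. What follows is only a minimal sketch of a pairwise Pearson correlation between rows, not the notebook's own solution. It assumes a tiny hand-built frame holding the first three rows of the preview, with values rounded to four decimals. The names `pairwise_row_correlation`, `NUMERIC_COLUMNS`, and `sample` are hypothetical and do not come from the original material.

```python
import pandas as pd

# Numeric columns taken from the header of the preview above.
NUMERIC_COLUMNS = ["mass", "pce", "voc", "jsc",
                   "e_homo_alpha", "e_gap_alpha", "e_lumo_alpha"]


def pairwise_row_correlation(df, columns=NUMERIC_COLUMNS):
    """Return a square DataFrame of Pearson correlations between rows of df."""
    numeric = df[columns].astype(float)
    # DataFrame.corr correlates columns, so transpose to turn each row into a column.
    return numeric.T.corr(method="pearson")


# First three rows of the preview, rounded to four decimals for brevity.
sample = pd.DataFrame(
    {
        "id": [655365, 1245190, 21847],
        "mass": [394.3151, 400.4135, 363.4903],
        "pce": [5.1620, 5.2614, 0.0],
        "voc": [0.8676, 0.5048, 0.0],
        "jsc": [91.5676, 160.4015, 197.4748],
        "e_homo_alpha": [-5.4676, -5.1048, -4.5395],
        "e_gap_alpha": [2.0229, 1.6308, 1.4622],
        "e_lumo_alpha": [-3.4447, -3.4741, -3.0774],
    }
)

corr = pairwise_row_correlation(sample)
print(corr.round(3))
```

The result has one row and one column per input row, so its size grows with the square of the row count. On the full table, which has more than two million rows, that matrix would not fit in memory, so a sketch like this only makes sense on a small subset.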
