Extracting dipole moment data from the NBO output

This notebook describes code that parses .nbo or .out files generated by the Gaussian09/GENNBO6 modules. Lines containing the NLMO dipole components are extracted and saved to a *.csv file.
The complete blog entry discussing this notebook is at the chemgplus.blogger.com site.



In [1]:

    
'''Custom styling. The contents of css_file are injected into the
header of the notebook's HTML file. Other css files are in the /css directory.'''
# To toggle line numbering, press Esc while in the cell,
# followed by l (lowercase L)

'''Sanity check: later cells change directories, so the relative .css file path
would be incorrect when this cell is re-run'''
from IPython.display import HTML
import string,sys,os,os.path,re
css_file = './css/blog.css'
# If the css file is not visible from here, step back up to the notebook root
if not os.path.isfile(css_file):
    %cd ..
HTML(open(css_file, "r").read())









    Out[1]:




@import url("custom.css");



In [2]:

    
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 14)
pd.set_option('display.width', 300)
pd.set_option('mode.chained_assignment', None)  # suppress warnings for web publishing



In [3]:

    
%%capture  
# Suppress output; remove %%capture for debugging
# Enter subdirectory and the input filename
%cd dipoles
filename = 'form.nbo'



In [4]:

    
# Save the directory, file name, base name and extension of the input file
fullpath = os.path.abspath(filename)
path, fname = os.path.split(fullpath)  # 'fname' avoids shadowing the built-in 'file'
basename, extension = os.path.splitext(filename)
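
As an illustration of what these `os.path` calls return, the sketch below applies them to the `form.nbo` name used above. The directory `/work/dipoles` is a hypothetical placeholder, and `posixpath` (the POSIX flavour of `os.path`) is used only so that the result does not depend on the operating system or the current directory.

```python
import posixpath  # POSIX flavour of os.path, so the demo behaves the same everywhere

# Hypothetical absolute path; only the name 'form.nbo' comes from the notebook
fullpath = "/work/dipoles/form.nbo"

path, fname = posixpath.split(fullpath)          # directory and file name
basename, extension = posixpath.splitext(fname)  # name without extension, and extension
outname = basename + "_dip.csv"                  # name given to the exported table

print(path, fname, basename, extension, outname)
```

The `basename` value is what the later cell reuses to name the `*_dip.csv` file.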



In [5]:

    
# Parse the text section of the Dipole Analysis into the list 'capture'
start = 0  # becomes 1 once the 'DIPOLE MOMENT ANALYSIS:' title has been seen
begin = 0  # becomes 1 once the '=====' rule that follows the title has been seen
end = 1    # stays 1 while capturing; set to 0 at the closing '-------' rule
capture = []
with open(filename, 'r') as f:
    for line in f:
        # Condition to end parsing
        if begin == 1 and '-------' in line:
            end = 0
        # Parse the chunk, leaving out lines that mention "deloc"
        if start == 1 and begin == 1 and end == 1 and not ("deloc" in line):
            if not line.strip(): continue  # skip blank lines
            capture += [line.lstrip()]
        # First condition to initiate capture
        if 'DIPOLE MOMENT ANALYSIS:' in line:
            start = 1
        # Second condition to initiate capture
        if start == 1 and '==============' in line:
            begin = 1
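
The three flags above act as a small state machine: wait for the title, wait for the `=====` rule, then collect lines until the `-------` rule. A minimal sketch of the same logic as a generator is given below, written for Python 3. The function name `extract_dipole_lines` and the sample text are hypothetical; the sample only imitates the markers the parser looks for and is not real NBO output.

```python
def extract_dipole_lines(lines):
    """Yield the table lines of the first DIPOLE MOMENT ANALYSIS section."""
    state = "seek_title"
    for line in lines:
        if state == "seek_title":
            if "DIPOLE MOMENT ANALYSIS:" in line:
                state = "seek_rule"
        elif state == "seek_rule":
            if "==============" in line:
                state = "in_table"
        else:  # state == "in_table"
            if "-------" in line:
                return  # closing rule: stop reading
            if line.strip() and "deloc" not in line:
                yield line.lstrip()


# Synthetic input that only mimics the markers used by the parser
sample = """\
 preamble that must be ignored
 DIPOLE MOMENT ANALYSIS:
 (column headers, ignored)
 ===============================
   1. CR ( 1) N 1   (synthetic row)

        deloc  (synthetic row, skipped)
   2. CR ( 1) C 2   (synthetic row)
 -------------------------------
   text after the table, ignored
"""

rows = list(extract_dipole_lines(sample.splitlines(True)))
print(rows)
```

Because the generator accepts any iterable of lines, it could also be fed an open file object directly.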



In [6]:

    
# Extract values
def getdipvalues(lines):
    orbnum = []
    orbtype = []
    dipX = []
    dipY = []
    dipZ = []
    dipTot = []
    # Regex with capturing groups to parse lines in the dipole section
    pattern = re.compile(r"([0-9]{1,3})\.\s([A-Z]{2}.+)\s{7,13}(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(\d\.\d\d)\s?\s.+")
    try:
        # Iterate over the argument rather than the global 'capture'
        for item in lines:
            match = pattern.search(item)
            if match:
                orbnum.append(match.group(1).strip())
                orbtype.append(match.group(2).strip())
                dipX.append(match.group(3))
                dipY.append(match.group(4))
                dipZ.append(match.group(5))
                dipTot.append(match.group(6))
        return orbnum, orbtype, dipX, dipY, dipZ, dipTot
    except TypeError as err:
        # Raised when the argument is not iterable or does not hold strings
        print("The argument is not a list of strings.\n" + str(err))
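
To show what the capturing groups pick up, the sketch below applies the same regular expression to a single line. The line is synthetic: it was built by hand to fit the column spacing the pattern expects, with the first four numbers taken from the `LP ( 1) N 1` row of Table 1 and a placeholder standing in for the remaining columns. The name `DIPOLE_ROW` is hypothetical.

```python
import re

# Same regular expression as in getdipvalues, split over two lines for readability
DIPOLE_ROW = re.compile(
    r"([0-9]{1,3})\.\s([A-Z]{2}.+)\s{7,13}(-?\d\.\d\d)\s?\s(-?\d\.\d\d)"
    r"\s?\s(-?\d\.\d\d)\s?\s(\d\.\d\d)\s?\s.+"
)

# Synthetic line (not copied from an NBO file); the tail is a placeholder
line = "4. LP ( 1) N 1          -0.20  0.00 -1.64  1.65  <remaining columns>"

match = DIPOLE_ROW.search(line)
fields = tuple(group.strip() for group in match.groups())
print(fields)

# The pattern needs 7 to 13 whitespace characters before the X value,
# so the same line with a narrower gap is not recognised
narrow = "4. LP ( 1) N 1   -0.20  0.00 -1.64  1.65  <remaining columns>"
print(DIPOLE_ROW.search(narrow))
```

The second call illustrates that the pattern is tied to the fixed-width layout of the output, so a file with different column spacing would likely need an adjusted expression.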



In [7]:

    
%%capture  
# Suppress output; remove %%capture for debugging
# Collect the parsed columns
orbnum, orbtype, dipX, dipY, dipZ, dipTot = getdipvalues(capture)

# Create Pandas DataFrame
df = pd.DataFrame({'NLMO': orbnum,'Type': orbtype,'X': dipX,'Y': dipY,'Z': dipZ,'Tot_Dip': dipTot},columns=['NLMO','Type','X','Y','Z','Tot_Dip'])
df[['X', 'Y','Z', 'Tot_Dip']] = df[['X', 'Y','Z', 'Tot_Dip']].astype(float)
df[['NLMO']] = df[['NLMO']].astype(int)

# Write dataframe to .csv file
outfile = basename + "_dip.csv"
try:
    df.to_csv(outfile, index=False, encoding='utf-8')
except IOError:
    print("Error: can't write the file " + outfile)
else:
    print("\n" + ('-'*80) + "\n")
    print(">> Contents of the dataframe were written to " + os.path.abspath(outfile))

The NLMO part of the DIPOLE MOMENT ANALYSIS: section is shown in Table 1. The corresponding *_dip.csv file was saved to the path reported by the previous cell.
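
For readers without pandas, the layout of the *_dip.csv file can be reproduced with the standard library. The following is a minimal sketch for Python 3, not the notebook's own method: it assumes six parallel lists in the shape returned by `getdipvalues`, uses two rows copied from Table 1 (with the Y values written as `0.00`, which is an assumption about their text form), and writes to an in-memory buffer instead of a file.

```python
import csv
import io

# Parallel column lists, in the shape returned by getdipvalues
orbnum = ["4", "5"]
orbtype = ["LP ( 1) N 1", "LP ( 1) O 3"]
dipX = ["-0.20", "-2.37"]
dipY = ["0.00", "0.00"]  # assumed text form of the Y column
dipZ = ["-1.64", "-1.64"]
dipTot = ["1.65", "2.88"]

buffer = io.StringIO()  # in-memory stand-in for the *_dip.csv file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["NLMO", "Type", "X", "Y", "Z", "Tot_Dip"])
# zip turns the six column lists into one tuple per orbital
writer.writerows(zip(orbnum, orbtype, dipX, dipY, dipZ, dipTot))

csv_text = buffer.getvalue()
print(csv_text)
```

The header matches the column order passed to `pd.DataFrame` in the cell above, so a file written this way should load with `pd.read_csv` in the same shape.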



In [8]:

    
# Render the DataFrame as an HTML table styled by the loaded css file
HTML(df.to_html(classes = 'grid', escape=False))









    Out[8]:





  
    
      
        NLMO  Type                  X  Y      Z  Tot_Dip
     0     1  CR ( 1) N 1        0.00  0   0.00     0.00
     1     2  CR ( 1) C 2        0.00  0   0.00     0.00
     2     3  CR ( 1) O 3        0.00  0   0.00     0.01
     3     4  LP ( 1) N 1       -0.20  0  -1.64     1.65
     4     5  LP ( 1) O 3       -2.37  0  -1.64     2.88
     5     6  LP ( 2) O 3        0.89  0   0.75     1.16
     6     7  BD ( 1) N 1- C 2   0.02  0   0.85     0.85
     7     8  BD ( 1) N 1- H 4  -0.68  0   0.47     0.82
     8     9  BD ( 1) N 1- H 5   0.75  0   0.46     0.88
     9    10  BD ( 1) C 2- O 3  -1.90  0  -1.44     2.38
    10    11  BD ( 2) C 2- O 3  -0.67  0  -0.46     0.81
    11    12  BD ( 1) C 2- H 6   1.72  0  -0.58     1.82

Table 1. NLMO index, orbital type, Cartesian (X, Y, Z) dipole components, and total dipole value for each orbital, taken from the NLMO dipole output.
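
The values in Table 1 appear consistent with Tot_Dip being the magnitude of the (X, Y, Z) vector, i.e. sqrt(X² + Y² + Z²). The sketch below checks this for all twelve rows. The numbers are copied from Table 1; the tolerance `TOL` is an assumed allowance for the two-decimal rounding of the printed values.

```python
import math

# (X, Y, Z, Tot_Dip) for NLMOs 1 to 12, copied from Table 1
rows = [
    (0.00, 0.0, 0.00, 0.00),
    (0.00, 0.0, 0.00, 0.00),
    (0.00, 0.0, 0.00, 0.01),
    (-0.20, 0.0, -1.64, 1.65),
    (-2.37, 0.0, -1.64, 2.88),
    (0.89, 0.0, 0.75, 1.16),
    (0.02, 0.0, 0.85, 0.85),
    (-0.68, 0.0, 0.47, 0.82),
    (0.75, 0.0, 0.46, 0.88),
    (-1.90, 0.0, -1.44, 2.38),
    (-0.67, 0.0, -0.46, 0.81),
    (1.72, 0.0, -0.58, 1.82),
]

TOL = 0.02  # assumed allowance for rounding to two decimals

# Difference between the vector magnitude and the tabulated total, per row
residuals = [abs(math.sqrt(x * x + y * y + z * z) - tot) for x, y, z, tot in rows]
print(max(residuals) < TOL)
```

A check of this kind can serve as a quick test that the columns were not shifted or mislabeled during parsing.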

IPython Notebook **ReadNboDip.ipynb**:
version 1.0 created on Dec 23, 2014
version 1.1 updated on Jan 4, 2015
