Extracting dipole moment data from the NBO output

This notebook describes code that parses the .nbo or .out files generated by Gaussian09 or GENNBO6. The lines holding the NLMO dipole components are extracted and saved to a *.csv file.
The complete blog entry discussing this notebook is on the chemgplus.blogger.com site.


In [1]:
'''Custom styling: the contents of css_file are injected into the
header of the notebook's HTML. Other .css files are in the /css directory.'''
# To toggle line numbering, press Esc while in the cell,
# followed by l (lowercase L)

# Sanity check: a later cell changes into the data subdirectory, so the
# relative .css path is wrong when this cell is re-run; step up one level then.
from IPython.display import HTML
import os, os.path, re
css_file = './css/blog.css'
if not os.path.isfile(css_file):
    %cd ..
HTML(open(css_file, "r").read())


Out[1]:
@import url("custom.css");

In [2]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 14)
pd.set_option('display.width', 300)
pd.set_option('mode.chained_assignment', None)  # suppress SettingWithCopy warnings for web publishing

In [3]:
%%capture  
# Suppress output; remove %%capture for debugging.
# Change into the data subdirectory and set the input filename
%cd dipoles
filename = 'form.nbo'

In [4]:
# Save the file path, name and extension
fullpath = os.path.abspath(filename)
path, fname = os.path.split(fullpath)
basename, extension = os.path.splitext(filename)

In [5]:
# Parse the text of the DIPOLE MOMENT ANALYSIS section into the list 'capture'
start = False   # True once the 'DIPOLE MOMENT ANALYSIS:' header is found
begin = False   # True once the '=====' rule below the header is found
capture = []
with open(filename, 'r') as f:
    for line in f:
        # A '-------' rule after the start of the table ends parsing
        if begin and '-------' in line:
            break
        # Keep table lines, skipping blank lines and 'deloc' lines
        if begin and "deloc" not in line:
            if not line.strip():
                continue
            capture.append(line.lstrip())
        # First condition to initiate capture
        if 'DIPOLE MOMENT ANALYSIS:' in line:
            start = True
        # Second condition to initiate capture
        if start and '==============' in line:
            begin = True
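The flag logic of the cell above is easier to test when it runs on any iterable of lines instead of an open file. The sketch below is one such refactoring; the function name `extract_dipole_block` is hypothetical, and the sample text is an abridged stand-in assumed for illustration, not an exact copy of real NBO output.

```python
import io

def extract_dipole_block(lines):
    """Collect the lines between the '=====' rule that follows
    'DIPOLE MOMENT ANALYSIS:' and the next '-------' rule.
    Blank lines and 'deloc' lines are skipped (hypothetical helper)."""
    in_section = False   # header line has been seen
    in_table = False     # '=====' rule after the header has been seen
    block = []
    for line in lines:
        if in_table and '-------' in line:
            break        # end of the NLMO table
        if in_table and 'deloc' not in line and line.strip():
            block.append(line.lstrip())
        if 'DIPOLE MOMENT ANALYSIS:' in line:
            in_section = True
        if in_section and '==============' in line:
            in_table = True
    return block

# Hypothetical, abridged excerpt; real NBO output has more columns and lines.
sample = """ DIPOLE MOMENT ANALYSIS:
 ------------------------------
 ==============================
   1. CR ( 1) N 1   0.00  0.00  0.00  0.00
                    deloc  0.01  0.00  0.00
   4. LP ( 1) N 1  -0.20  0.00 -1.64  1.65

 ------------------------------
 Net dipole moment
"""
block = extract_dipole_block(io.StringIO(sample))
print(block)
```

With a real file, `extract_dipole_block(open(filename))` would replace the loop in the cell above.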

In [6]:
# Extract values
def getdipvalues(lines):
    orbnum = []
    orbtype = []
    dipX = []
    dipY = []
    dipZ = []
    dipTot = []
    # Regex with capturing groups to parse lines in the dipole section:
    # NLMO number, orbital type, then the x, y, z and total NLMO dipoles
    pattern = re.compile(r"([0-9]{1,3})\.\s([A-Z]{2}.+)\s{7,13}(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(\d\.\d\d)\s?\s.+")
    try:
        for item in lines:
            match = pattern.search(item)
            if match:
                orbnum.append(match.group(1))
                orbtype.append(match.group(2).strip())
                dipX.append(match.group(3))
                dipY.append(match.group(4))
                dipZ.append(match.group(5))
                dipTot.append(match.group(6))
        return orbnum, orbtype, dipX, dipY, dipZ, dipTot
    except TypeError as err:
        print("The argument is not a list of strings.\n" + str(err))
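To see which fields the regular expression in getdipvalues() captures, it can be applied to a single line. The helper `parse_dipole_line` is hypothetical, and the test line is assumed: its first four numbers are the NLMO values of NLMO 4 in Table 1, while the trailing numbers are made up to stand in for the remaining columns of an NBO line.

```python
import re

# Same pattern as in getdipvalues(): NLMO number, orbital type, then the
# x, y, z and total NLMO dipoles (one digit before the decimal point).
DIP_PATTERN = re.compile(
    r"([0-9]{1,3})\.\s([A-Z]{2}.+)\s{7,13}"
    r"(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(-?\d\.\d\d)\s?\s(\d\.\d\d)\s?\s.+"
)

def parse_dipole_line(line):
    """Return (nlmo, type, x, y, z, total) for one table line, or None."""
    m = DIP_PATTERN.search(line)
    if m is None:
        return None
    # The greedy '.+' in group 2 keeps some trailing spaces, hence strip()
    return (int(m.group(1)), m.group(2).strip(),
            float(m.group(3)), float(m.group(4)),
            float(m.group(5)), float(m.group(6)))

# Hypothetical line: nine spaces separate the orbital label from the numbers
line = "4. LP ( 1) N 1" + " " * 9 + "-0.20  0.00 -1.64  1.65    -0.21  0.00 -1.60  1.61"
print(parse_dipole_line(line))
print(parse_dipole_line("Orbital    x     y     z   Total"))
```

Because each value is matched with a single digit before the decimal point, a line carrying a component of 10 D or more in magnitude would likely not match and would be silently skipped.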

In [7]:
%%capture  
# Suppress output; remove %%capture for debugging
orbnum, orbtype, dipX, dipY, dipZ, dipTot = getdipvalues(capture)

# Create Pandas DataFrame and convert the parsed strings to numbers
df = pd.DataFrame({'NLMO': orbnum, 'Type': orbtype, 'X': dipX, 'Y': dipY, 'Z': dipZ, 'Tot_Dip': dipTot},
                  columns=['NLMO', 'Type', 'X', 'Y', 'Z', 'Tot_Dip'])
df[['X', 'Y', 'Z', 'Tot_Dip']] = df[['X', 'Y', 'Z', 'Tot_Dip']].astype(float)
df['NLMO'] = df['NLMO'].astype(int)

# Write dataframe to .csv file
csvfile = basename + "_dip.csv"
try:
    df.to_csv(csvfile, index=False, encoding='utf-8')
except IOError:
    print("Error: can't write the .csv file")
else:
    print("\n" + ('-' * 80) + "\n")
    print(">> Contents of the dataframe were written to " + os.path.abspath(csvfile))
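The saved *_dip.csv file can be loaded back with `pandas.read_csv`. The sketch below assumes two rows copied from Table 1 as strings (the form getdipvalues() returns) and writes to an in-memory buffer instead of a file, so the round trip and the integer/float column types can be checked without touching disk.

```python
import io
import pandas as pd

# Two rows from Table 1, as the strings getdipvalues() would return
orbnum = ['4', '5']
orbtype = ['LP ( 1) N 1', 'LP ( 1) O 3']
dipX, dipY = ['-0.20', '-2.37'], ['0.00', '0.00']
dipZ, dipTot = ['-1.64', '-1.64'], ['1.65', '2.88']

cols = ['NLMO', 'Type', 'X', 'Y', 'Z', 'Tot_Dip']
df = pd.DataFrame(dict(zip(cols, [orbnum, orbtype, dipX, dipY, dipZ, dipTot])),
                  columns=cols)
df = df.astype({'NLMO': int, 'X': float, 'Y': float, 'Z': float, 'Tot_Dip': float})

# Write to an in-memory buffer instead of basename + "_dip.csv", then read back
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(df2.dtypes)
```

For the real output, `pd.read_csv(basename + "_dip.csv")` reads the file written above.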

The NLMO part of the DIPOLE MOMENT ANALYSIS: section is shown in Table 1. The corresponding *_dip.csv file was saved in the same directory as the input file; its full path is printed by the previous cell once %%capture is removed.



In [8]:
# Display the DataFrame as an HTML table styled by the 'grid' class of the loaded .css file
HTML(df.to_html(classes = 'grid', escape=False))


Out[8]:
    NLMO  Type                   X    Y       Z  Tot_Dip
0      1  CR ( 1) N 1         0.00    0    0.00     0.00
1      2  CR ( 1) C 2         0.00    0    0.00     0.00
2      3  CR ( 1) O 3         0.00    0    0.00     0.01
3      4  LP ( 1) N 1        -0.20    0   -1.64     1.65
4      5  LP ( 1) O 3        -2.37    0   -1.64     2.88
5      6  LP ( 2) O 3         0.89    0    0.75     1.16
6      7  BD ( 1) N 1- C 2    0.02    0    0.85     0.85
7      8  BD ( 1) N 1- H 4   -0.68    0    0.47     0.82
8      9  BD ( 1) N 1- H 5    0.75    0    0.46     0.88
9     10  BD ( 1) C 2- O 3   -1.90    0   -1.44     2.38
10    11  BD ( 2) C 2- O 3   -0.67    0   -0.46     0.81
11    12  BD ( 1) C 2- H 6    1.72    0   -0.58     1.82
Table 1. Orbital types with the x, y, z components and total values of the NLMO bond dipoles (in Debye) from the NBO dipole output.
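The Tot_Dip column is read here as the length of the NLMO dipole vector, |μ| = √(X² + Y² + Z²). That reading is an assumption rather than something the notebook states, but the rounded values in Table 1 are consistent with it, as this quick check on the table's own numbers shows. The helper `magnitude` and the tolerance are illustrative choices.

```python
import math

# (NLMO, X, Y, Z, Tot_Dip) copied from Table 1, in Debye; Y is 0 for every row
rows = [
    (1, 0.00, 0.0, 0.00, 0.00),
    (2, 0.00, 0.0, 0.00, 0.00),
    (3, 0.00, 0.0, 0.00, 0.01),
    (4, -0.20, 0.0, -1.64, 1.65),
    (5, -2.37, 0.0, -1.64, 2.88),
    (6, 0.89, 0.0, 0.75, 1.16),
    (7, 0.02, 0.0, 0.85, 0.85),
    (8, -0.68, 0.0, 0.47, 0.82),
    (9, 0.75, 0.0, 0.46, 0.88),
    (10, -1.90, 0.0, -1.44, 2.38),
    (11, -0.67, 0.0, -0.46, 0.81),
    (12, 1.72, 0.0, -0.58, 1.82),
]

def magnitude(x, y, z):
    """Euclidean length of a dipole vector from its x, y, z components."""
    return math.sqrt(x * x + y * y + z * z)

# Every printed value is rounded to 0.01 D, so allow a small tolerance
TOL = 0.015
mismatches = [n for n, x, y, z, tot in rows
              if abs(magnitude(x, y, z) - tot) > TOL]
print(mismatches)  # → []
```

With the notebook's DataFrame, the same check can be done in one line as `np.sqrt((df[['X', 'Y', 'Z']] ** 2).sum(axis=1))`, compared against `df['Tot_Dip']`.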

IPython Notebook **ReadNboDip.ipynb**:
version 1.0 created on Dec 23, 2014
version 1.1 updated on Jan 4, 2015