Reading QM outputs


In [1]:
# This is to prevent stale code from being executed in those pesky *.pyc files, just in case.
%load_ext autoreload
%autoreload 2

There are a few ways of opening text files, reading them, and parsing their contents. I'll use one of our CO2 frequency outputs as an example.


In [2]:
import os

In [3]:
os.listdir()


Out[3]:
['.ipynb_checkpoints',
 'Reading QM outputs.ipynb',
 'Reading QM outputs - EDA and COVP.ipynb',
 'Plotting.ipynb',
 'Using cclib.ipynb',
 'Frequency Calculations.ipynb']

In [4]:
help(os.listdir)


Help on built-in function listdir in module posix:

listdir(...)
    listdir(path='.') -> list_of_filenames
    
    Return a list containing the names of the files in the directory.
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.
    
    path can be specified as either str or bytes.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    On some platforms, path may also be specified as an open file descriptor;
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.


In [5]:
os.listdir(path="../qm_files")


Out[5]:
['drop_0375_0qm_0mm.out', 'drop_0001_1qm_2mm_eda_covp.out']

In [6]:
filename = "../qm_files/drop_0375_0qm_0mm.out"
print(filename)


../qm_files/drop_0375_0qm_0mm.out

Opening a file 1

Python has a built-in function called open that takes a filename as a string and returns a handle to it that we can work with, so it can be read, looped over, and closed.


In [7]:
help(open)


Help on built-in function open in module io:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file object
    
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In text mode, if encoding is not specified the encoding used is platform
    dependent: locale.getpreferredencoding(False) is called to get the
    current locale encoding. (For reading and writing raw bytes use binary
    mode and leave encoding unspecified.) The available modes are:
    
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
    
    The default mode is 'rt' (open for reading text). For binary random
    access, the mode 'w+b' opens and truncates the file to 0 bytes, while
    'r+b' opens the file without truncation. The 'x' mode implies 'w' and
    raises an `FileExistsError` if the file already exists.
    
    Python distinguishes between files opened in binary and text modes,
    even when the underlying operating system doesn't. Files opened in
    binary mode (appending 'b' to the mode argument) return contents as
    bytes objects without any decoding. In text mode (the default, or when
    't' is appended to the mode argument), the contents of the file are
    returned as strings, the bytes having been first decoded using a
    platform-dependent encoding or using the specified encoding if given.
    
    'U' mode is deprecated and will raise an exception in future versions
    of Python.  It has no effect in Python 3.  Use newline to control
    universal newlines mode.
    
    buffering is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
    line buffering (only usable in text mode), and an integer > 1 to indicate
    the size of a fixed-size chunk buffer.  When no buffering argument is
    given, the default buffering policy works as follows:
    
    * Binary files are buffered in fixed-size chunks; the size of the buffer
      is chosen using a heuristic trying to determine the underlying device's
      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
      On many systems, the buffer will typically be 4096 or 8192 bytes long.
    
    * "Interactive" text files (files for which isatty() returns True)
      use line buffering.  Other text files use the policy described above
      for binary files.
    
    encoding is the name of the encoding used to decode or encode the
    file. This should only be used in text mode. The default encoding is
    platform dependent, but any encoding supported by Python can be
    passed.  See the codecs module for the list of supported encodings.
    
    errors is an optional string that specifies how encoding errors are to
    be handled---this argument should not be used in binary mode. Pass
    'strict' to raise a ValueError exception if there is an encoding error
    (the default of None has the same effect), or pass 'ignore' to ignore
    errors. (Note that ignoring encoding errors can lead to data loss.)
    See the documentation for codecs.register or run 'help(codecs.Codec)'
    for a list of the permitted encoding error strings.
    
    newline controls how universal newlines works (it only applies to text
    mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
    follows:
    
    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.
    
    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.
    
    If closefd is False, the underlying file descriptor will be kept open
    when the file is closed. This does not work when a file name is given
    and must be True in that case.
    
    A custom opener can be used by passing a callable as *opener*. The
    underlying file descriptor for the file object is then obtained by
    calling *opener* with (*file*, *flags*). *opener* must return an open
    file descriptor (passing os.open as *opener* results in functionality
    similar to passing None).
    
    open() returns a file object whose type depends on the mode, and
    through which the standard file operations such as reading and writing
    are performed. When open() is used to open a file in a text mode ('w',
    'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
    a file in a binary mode, the returned class varies: in read binary
    mode, it returns a BufferedReader; in write binary and append binary
    modes, it returns a BufferedWriter, and in read/write mode, it returns
    a BufferedRandom.
    
    It is also possible to use a string or bytearray as a file for both
    reading and writing. For strings StringIO can be used like a file
    opened in a text mode, and for bytes a BytesIO can be used like a file
    opened in a binary mode.


In [8]:
handle = open(filename)
print(handle)


<_io.TextIOWrapper name='../qm_files/drop_0375_0qm_0mm.out' mode='r' encoding='UTF-8'>

In [9]:
print(type(handle))


<class '_io.TextIOWrapper'>

These will say something slightly different in Python 2 vs. 3, but we work with them in exactly the same way. Here are all the methods that are defined on the handle to our file.


In [10]:
dir(handle)


Out[10]:
['_CHUNK_SIZE',
 '__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_checkClosed',
 '_checkReadable',
 '_checkSeekable',
 '_checkWritable',
 '_finalizing',
 'buffer',
 'close',
 'closed',
 'detach',
 'encoding',
 'errors',
 'fileno',
 'flush',
 'isatty',
 'line_buffering',
 'mode',
 'name',
 'newlines',
 'read',
 'readable',
 'readline',
 'readlines',
 'seek',
 'seekable',
 'tell',
 'truncate',
 'writable',
 'write',
 'writelines']

This list of strings represents all the methods or member variables that can be called on the handle.

If our file handle is called handle, and we see name is a member of that list, we want to know what it does.


In [11]:
help(handle.name)


no Python documentation found for '../qm_files/drop_0375_0qm_0mm.out'


In [12]:
type(handle.name)


Out[12]:
str

In [13]:
handle.name


Out[13]:
'../qm_files/drop_0375_0qm_0mm.out'

In [14]:
handle.name()


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-ea90735ffa6e> in <module>()
----> 1 handle.name()

TypeError: 'str' object is not callable

So, through a little experimentation, we've figured out that it isn't a function, it's a variable. If it was a function, we'd be able to call it like above.

You're probably wondering what all the names that begin with __ or _ are. These are methods or member variables that aren't meant to be used directly by the user; they're for "under-the-hood" operations only. Let's look only at the parts of handle we're supposed to use.


In [15]:
print([m for m in dir(handle) if m[0] != '_'])


['buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

We can look at the type of all these too:


In [16]:
for m in dir(handle):
    if m[0] != '_':
        print(m, type(eval("handle.{}".format(m))))


buffer <class '_io.BufferedReader'>
close <class 'builtin_function_or_method'>
closed <class 'bool'>
detach <class 'builtin_function_or_method'>
encoding <class 'str'>
errors <class 'str'>
fileno <class 'builtin_function_or_method'>
flush <class 'builtin_function_or_method'>
isatty <class 'builtin_function_or_method'>
line_buffering <class 'bool'>
mode <class 'str'>
name <class 'str'>
newlines <class 'NoneType'>
read <class 'builtin_function_or_method'>
readable <class 'builtin_function_or_method'>
readline <class 'builtin_function_or_method'>
readlines <class 'builtin_function_or_method'>
seek <class 'builtin_function_or_method'>
seekable <class 'builtin_function_or_method'>
tell <class 'builtin_function_or_method'>
truncate <class 'builtin_function_or_method'>
writable <class 'builtin_function_or_method'>
write <class 'builtin_function_or_method'>
writelines <class 'builtin_function_or_method'>

Since we're interested in reading from a file, there are a few methods that sound like they can read.


In [17]:
help(handle.readable)


Help on built-in function readable:

readable(...) method of _io.TextIOWrapper instance


In [18]:
handle.readable()


Out[18]:
True

We can read from the file. I should hope so, we just opened it!


In [19]:
help(handle.read)


Help on built-in function read:

read(...) method of _io.TextIOWrapper instance

Well that isn't very helpful...let's look at the official documentation.


In [20]:
# This will let us do some neat stuff with the notebook, like embed webpages and videos.
import IPython

In [21]:
website = "https://docs.python.org/3.5/tutorial/inputoutput.html#reading-and-writing-files"
IPython.lib.display.IFrame(website, width=800, height=800)


Out[21]:

After a bit of light reading, it looks like we can do contents = handle.read() and contents will be a giant string that contains all of the file contents. Only one way to find out...


In [22]:
contents = handle.read()

In [23]:
print(contents)


                  Welcome to Q-Chem
     A Quantum Leap Into The Future Of Chemistry


 Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)

 Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  
 J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  
 M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  
 T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  
 M. A. Rohrdanz,  R. P. Steele,  E. J. Sundstrom,  H. L. Woodcock III,  
 P. M. Zimmerman,  D. Zuev,  B. Albrecht,  E. Alguire,  B. Austin,  
 S. A. Baeppler,  G. J. O. Beran,  Y. A. Bernard,  E. Berquist,  
 K. Brandhorst,  K. B. Bravaya,  S. T. Brown,  D. Casanova,  C.-M. Chang,  
 Y. Chen,  S. H. Chien,  K. D. Closser,  D. L. Crittenden,  M. Diedenhofen,  
 R. A. DiStasio Jr.,  H. Do,  A. D. Dutoi,  R. G. Edgar,  P.-T. Fang,  
 S. Fatehi,  Q. Feng,  L. Fusti-Molnar,  A. Ghysels,  
 A. Golubeva-Zadorozhnaya,  J. Gomes,  A. Gunina,  M. W. D. Hanson-Heine,  
 P. H. P. Harbach,  A. W. Hauser,  E. G. Hohenstein,  Z. C. Holden,  K. Hui,  
 T.-C. Jagau,  H. Ji,  B. Kaduk,  K. Khistyaev,  Jaehoon Kim,  Jihan Kim,  
 R. A. King,  P. Klunzinger,  D. Kosenkov,  T. Kowalczyk,  C. M. Krauter,  
 K. U. Lao,  A. Laurent,  K. V. Lawler,  S. Lehtola,  S. V. Levchenko,  
 C. Y. Lin,  Y.-S. Lin,  F. Liu,  E. Livshits,  R. C. Lochan,  A. Luenser,  
 P. Manohar,  S. F. Manzer,  S.-P. Mao,  N. Mardirossian,  A. V. Marenich,  
 L. A. Martinez-Martinez,  S. A. Maurer,  N. J. Mayhall,  K. Nanda,  
 C. M. Oana,  R. Olivares-Amaya,  D. P. O'Neill,  J. A. Parkhill,  
 T. M. Perrine,  R. Peverati,  P. A. Pieniazek,  F. Plasser,  A. Prociuk,  
 D. R. Rehn,  E. Rosta,  N. J. Russ,  N. Sergueev,  S. M. Sharada,  
 S. Sharma,  D. W. Small,  A. Sodt,  T. Stauch,  T. Stein,  D. Stuck,  
 Y.-C. Su,  A. J. W. Thom,  T. Tsuchimochi,  L. Vogt,  O. Vydrov,  T. Wang,  
 M. A. Watson,  J. Wenzel,  A. White,  C. F. Williams,  V. Vanovschi,  
 S. Yeganeh,  S. R. Yost,  Z.-Q. You,  I. Y. Zhang,  X. Zhang,  Y. Zhao,  
 B. R. Brooks,  G. K. L. Chan,  D. M. Chipman,  C. J. Cramer,  
 W. A. Goddard III,  M. S. Gordon,  W. J. Hehre,  A. Klamt,  
 H. F. Schaefer III,  M. W. Schmidt,  C. D. Sherrill,  D. G. Truhlar,  
 A. Warshel,  X. Xu,  A. Aspuru-Guzik,  R. Baer,  A. T. Bell,  N. A. Besley,  
 J.-D. Chai,  A. Dreuw,  B. D. Dunietz,  T. R. Furlani,  S. R. Gwaltney,  
 C.-P. Hsu,  Y. Jung,  J. Kong,  D. S. Lambrecht,  W. Liang,  C. Ochsenfeld,  
 V. A. Rassolov,  L. V. Slipchenko,  J. E. Subotnik,  T. Van Voorhis,  
 J. M. Herbert,  A. I. Krylov,  P. M. W. Gill,  M. Head-Gordon

 Contributors to earlier versions of Q-Chem not listed above: 
 R. D. Adamson,  J. Baker,  E. F. C. Byrd,  A. K. Chakraborty,  C.-L. Cheng,  
 H. Dachsel,  R. J. Doerksen,  G. Hawkins,  A. Heyden,  S. Hirata,  
 G. Kedziora,  F. J. Keil,  C. Kelley,  P. P. Korambath,  W. Kurlancheek,  
 A. M. Lee,  M. S. Lee,  D. Liotard,  I. Lotan,  P. E. Maslen,  N. Nair,  
 D. Neuhauser,  R. Olson,  B. Peters,  J. Ritchie,  N. E. Schultz,  
 N. Shenvi,  A. C. Simmonett,  K. S. Thanthiriwatte,  Q. Wu,  W. Zhang

 Please cite Q-Chem as follows:
 Y. Shao et al., Mol. Phys. 113, 184-215 (2014)
 DOI: 10.1080/00268976.2014.952696

 Q-Chem 4.3.0 for Intel X86 EM64T Linux

 Parts of Q-Chem use Armadillo 4.550.3 (Singapore Sling Deluxe).
 http://arma.sourceforge.net/

 Q-Chem begins on Tue May 26 01:32:26 2015  

Host: copper
0

     Scratch files written to /media/Backup/qchem3611//
 Mar1115 |home|eric|opt|apps|qchem|trunk_RNUM -1
Processing $rem in /home/eric/.qchemrc.
         symmetry = false
       sym_ignore = true
      cc_symmetry = false
  scf_convergence = 8
           thresh = 14
        varthresh = 0
           incdft = 0
          xc_grid = 000100000302 ! (100,302)
 integrals_buffer = 3000
       mem_static = 1000
        mem_total = 4000
        cc_memory = 4000
       ao2mo_disk = 4000
    n_frozen_core = fc
    molden_format = true
   scf_print_frgm = true
           chelpg = true

Checking the input file for inconsistencies...	...done.

--------------------------------------------------------------
User input:
--------------------------------------------------------------
$molecule
0 1
C      0.0000000    0.0000000    0.0000000
O     -0.8361998    0.6690202    0.4378004
O      0.7377005   -0.7639198   -0.4848003
$end

$external_charges
$end

$rem
 jobtype = freq
 method = b3lyp
 basis = 6-31g**
 chelpg = false
 molden_format = false
$end

--------------------------------------------------------------
 ----------------------------------------------------------------
             Standard Nuclear Orientation (Angstroms)
    I     Atom           X                Y                Z
 ----------------------------------------------------------------
    1      C       0.0000000000     0.0000000000     0.0000000000
    2      O      -0.8361998000     0.6690202000     0.4378004000
    3      O       0.7377005000    -0.7639198000    -0.4848003000
 ----------------------------------------------------------------
 Nuclear Repulsion Energy =    58.3123843069 hartrees
 There are       11 alpha and       11 beta electrons
 Requested basis set is 6-31G(d,p)
 There are 12 shells and 45 basis functions

Total QAlloc Memory Limit   4000 MB
Mega-Array Size       994 MB
MEM_STATIC part      1000 MB

                       Distance Matrix (Angstroms)
             C (  1)   O (  2)
   O (  2)  1.156930
   O (  3)  1.167393  2.319843
 
 A cutoff of  1.0D-14 yielded     77 shell pairs
 There are      1115 function pairs
 
 -------------------------------------------------------
 OpenMP Integral Computing Module                       
 Release: version 1.0, May 2013, Q-Chem Inc. Pittsburgh 
 -------------------------------------------------------
 Integral Job Info:
 Integral job number is                      11
 Integral operator is                         1
 short-range coefficients               -999999
 long-range coefficients                -999999
 Omega coefficients                           0
 if combine SR and LR in K              -999999
 Integral screening is                        0
 Integral computing path is                   2
 max size of driver memory is            800000
 size of driver memory is                593474
 size of scratch memory is              2260544
 max col of scratch BK array               1296
 max len of scratch array in speh3          155
 max len of scratch index in speh4           18
 max int batch size is                      520
 min int batch size is                       52
 fixed nKL is                                52
 max L of basis functions is                  2
 order of int derivative is                   0
 number of shells is                         12
 number of basis is                          45
 number of cartesian basis is                45
 number of contracted shell pairs            77
 number of primitive shell pairs            489
 maxK2 (contraction) of shell pair           36
 max number of K2 of shell pair               1
 max number of CS2 of shell pair             11
 max number of PS2 of shell pair            108
 mem total for path MDJ                   12910
 -------------------------------------------------------
 Smallest overlap matrix eigenvalue = 2.91E-03
 Guess from superposition of atomic densities
 Warning:  Energy on first SCF cycle will be non-variational
 A restricted hybrid HF-DFT SCF calculation will be
 performed using Pulay DIIS extrapolation
 Exchange:     0.2000 Hartree-Fock + 0.0800 Slater + 0.7200 Becke88
 Correlation:  0.8100 LYP + 0.1900 VWN1RPA
 Using Euler-Maclaurin-Lebedev (100,302) quadrature formula
 SCF converges when DIIS error is below 1.0E-08
 CPU 0.00 s  wall 0.00 s
 CPU 0.22 s  wall 0.22 s

 Total DFTman time = 0.22 CPUs 0.22 Wall
 ---------------------------------------
  Cycle       Energy         DIIS Error
 ---------------------------------------
    1    -189.1154962325      1.28E-01
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    2    -188.4993475650      2.09E-02
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    3    -188.3404425374      3.66E-02
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    4    -188.5791308470      9.56E-04
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    5    -188.5791543119      9.36E-04
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.25 s

 Total DFTman time = 0.24 CPUs 0.25 Wall
    6    -188.5792981965      3.71E-04
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    7    -188.5793253204      2.42E-05
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
    8    -188.5793254162      3.23E-06
 CPU 0.00 s  wall 0.00 s
 CPU 0.25 s  wall 0.24 s

 Total DFTman time = 0.25 CPUs 0.24 Wall
    9    -188.5793254182      1.09E-07
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
   10    -188.5793254182      3.56E-08
 CPU 0.00 s  wall 0.00 s
 CPU 0.24 s  wall 0.24 s

 Total DFTman time = 0.24 CPUs 0.24 Wall
   11    -188.5793254182      2.59E-09 Convergence criterion met
 ---------------------------------------
 SCF time:  CPU 2.83 s  wall 2.83 s
 SCF   energy in the final basis set = -188.5793254182
 Total energy in the final basis set = -188.5793254182
 Analysis of SCF Wavefunction
 
 --------------------------------------------------------------
 
                    Orbital Energies (a.u.)
 --------------------------------------------------------------
 
 Alpha MOs
 -- Occupied --
-19.236 -19.234 -10.383  -1.165  -1.122  -0.561  -0.517  -0.515
 -0.515  -0.370  -0.370
 -- Virtual --
  0.020   0.035   0.105   0.365   0.471   0.472   0.590   0.740
  0.782   0.875   0.877   1.040   1.040   1.042   1.385   1.392
  1.393   1.412   1.429   1.726   1.726   1.819   2.020   2.021
  2.141   2.148   2.753   2.917   2.961   2.970   3.062   3.755
  4.394   4.429
 --------------------------------------------------------------
 
          Ground-State Mulliken Net Atomic Charges

     Atom                 Charge (a.u.)
  ----------------------------------------
      1 C                     0.714405
      2 O                    -0.352207
      3 O                    -0.362198
  ----------------------------------------
  Sum of atomic charges =     0.000000

 -----------------------------------------------------------------
                    Cartesian Multipole Moments
 -----------------------------------------------------------------
    Charge (ESU x 10^10)
                 0.0000
    Dipole Moment (Debye)
         X       0.0962      Y       0.1470      Z       0.0767
       Tot       0.1917
    Quadrupole Moments (Debye-Ang)
        XX     -16.4748     XY       1.8690     YY     -16.1241
        XZ       1.2035     YZ      -1.0972     ZZ     -15.1260
    Octopole Moments (Debye-Ang^2)
       XXX       1.5072    XXY       0.4784    XYY       0.4427
       YYY       1.5836    XXZ       0.2324    XYZ      -0.0153
       YYZ       0.2965    XZZ       0.4654    YZZ       0.5293
       ZZZ       0.7735
    Hexadecapole Moments (Debye-Ang^3)
      XXXX     -50.7101   XXXY      18.5130   XXYY     -16.0469
      XYYY      18.4257   YYYY     -43.7461   XXXZ      11.9281
      XXYZ      -3.8351   XYYZ       4.1298   YYYZ     -10.8659
      XXZZ     -12.5519   XYZZ       6.1750   YYZZ     -11.3791
      XZZZ      11.7235   YZZZ     -10.7361   ZZZZ     -24.0657
 -----------------------------------------------------------------
 Calculating MO derivatives via CPSCF
 CPU 0.00 s  wall 0.00 s
 CPU 1.83 s  wall 1.83 s

 Total DFTman time = 1.83 CPUs 1.83 Wall
 CPU 0.00 s  wall 0.00 s
 CPU 1.55 s  wall 1.55 s

 Total DFTman time = 1.55 CPUs 1.55 Wall
 CPU 0.00 s  wall 0.00 s
 CPU 1.81 s  wall 1.82 s

 Total DFTman time = 1.81 CPUs 1.82 Wall
   1         3           9      0.167052    0.046142
 CPU 0.00 s  wall 0.00 s
 CPU 1.82 s  wall 1.82 s

 Total DFTman time = 1.82 CPUs 1.82 Wall
   2         3           9      0.020551    0.005543
 CPU 0.00 s  wall 0.00 s
 CPU 1.86 s  wall 1.86 s

 Total DFTman time = 1.86 CPUs 1.86 Wall
   3         3           9      0.001111    0.000255
 CPU 0.00 s  wall 0.00 s
 CPU 1.82 s  wall 1.82 s

 Total DFTman time = 1.82 CPUs 1.82 Wall
   4         3           9      0.000063    0.000014
 CPU 0.00 s  wall 0.00 s
 CPU 1.82 s  wall 1.82 s

 Total DFTman time = 1.82 CPUs 1.82 Wall
   5        12           0      0.000000    0.000000    Roots Converged
 CPU 0.00 s  wall 0.00 s
 CPU 2.29 s  wall 2.29 s

 Total DFTman time = 2.29 CPUs 2.29 Wall
 Calculating analytic Hessian of the SCF energy
 Polarizability Matrix (a.u.)
            1           2           3
    1 -14.1737885   5.8736715   3.7867806
    2   5.8736715 -13.0664354  -3.4819575
    3   3.7867806  -3.4819575  -9.8895272
 CPU 0.00 s  wall 0.00 s
 CPU 5.54 s  wall 5.54 s

 Total DFTman time = 5.54 CPUs 5.54 Wall
 
 Direct stationary perturbation theory relativistic correction:
 
 rels  =       0.068935909736
 relv  =      -0.274784382948
 rel2e =       0.090968803792
 E_rel =      -0.114879669420
 
 Hessian of the SCF Energy
            1           2           3           4           5           6
    1   1.0247744  -0.8068231  -0.5202480  -0.5826944   0.4191537   0.2742436
    2  -0.8068231   0.8607577   0.4729366   0.4286992  -0.4016804  -0.2244742
    3  -0.5202480   0.4729366   0.4306138   0.2796608  -0.2238104  -0.2051113
    4  -0.5826944   0.4286992   0.2796608   0.6023285  -0.4739293  -0.3091596
    5   0.4191537  -0.4016804  -0.2238104  -0.4739293   0.4119744   0.2535997
    6   0.2742436  -0.2244742  -0.2051113  -0.3091596   0.2535997   0.1885765
    7  -0.4420800   0.3781240   0.2405872  -0.0196341   0.0547757   0.0349160
    8   0.3876694  -0.4590773  -0.2491262   0.0452302  -0.0102940  -0.0291255
    9   0.2460043  -0.2484624  -0.2255025   0.0294988  -0.0297893   0.0165348
            7           8           9
    1  -0.4420800   0.3876694   0.2460043
    2   0.3781240  -0.4590773  -0.2484624
    3   0.2405872  -0.2491262  -0.2255025
    4  -0.0196341   0.0452302   0.0294988
    5   0.0547757  -0.0102940  -0.0297893
    6   0.0349160  -0.0291255   0.0165348
    7   0.4617141  -0.4328996  -0.2755032
    8  -0.4328996   0.4693714   0.2782517
    9  -0.2755032   0.2782517   0.2089677
 Gradient time:  CPU 21.70 s  wall 21.71 s
 **********************************************************************
 **                                                                  **
 **                       VIBRATIONAL ANALYSIS                       **
 **                       --------------------                       **
 **                                                                  **
 **        VIBRATIONAL FREQUENCIES (CM**-1) AND NORMAL MODES         **
 **     FORCE CONSTANTS (mDYN/ANGSTROM) AND REDUCED MASSES (AMU)     **
 **                  INFRARED INTENSITIES (KM/MOL)                   **
 **                                                                  **
 **********************************************************************
 

 Mode:                 1                      2                      3
 Frequency:       621.29                1410.25                2498.02
 Force Cnst:      2.9340                18.6957                47.3623
 Red. Mass:      12.9010                15.9552                12.8822
 IR Active:          YES                    YES                    YES
 IR Intens:       30.852                  0.431                554.265
 Raman Active:       YES                    YES                    YES
               X      Y      Z        X      Y      Z        X      Y      Z
 C          0.644  0.540  0.262   -0.050 -0.077 -0.040    0.601 -0.543 -0.350
 O         -0.201 -0.241 -0.123    0.495 -0.406 -0.265   -0.254  0.203  0.133
 O         -0.282 -0.164 -0.073   -0.458  0.463  0.295   -0.198  0.205  0.130
 TransDip   0.130  0.109  0.053    0.008 -0.017 -0.010    0.512 -0.465 -0.300

 STANDARD THERMODYNAMIC QUANTITIES AT   298.15 K  AND     1.00 ATM

   This Molecule has  0 Imaginary Frequencies
   Zero point vibrational energy:        6.475 kcal/mol

   Atom    1 Element C  Has Mass   12.00000
   Atom    2 Element O  Has Mass   15.99491
   Atom    3 Element O  Has Mass   15.99491
   Molecular Mass:    43.989820 amu
   Principal axes and moments of inertia in amu*Bohr^2:
                             1           2           3
    Eigenvalues --        0.16210   153.69832   153.86042
          X               0.67840     0.73221     0.06038
          Y              -0.61774     0.61296    -0.49263
          Z              -0.39772     0.29690     0.86814
   Rotational Symmetry Number is   1
   The Molecule is an Asymmetric Top
   Translational Enthalpy:        0.889 kcal/mol
   Rotational Enthalpy:           0.889 kcal/mol
   Vibrational Enthalpy:          6.573 kcal/mol
   gas constant (RT):             0.592 kcal/mol
   Translational Entropy:        37.270  cal/mol.K
   Rotational Entropy:           16.001  cal/mol.K
   Vibrational Entropy:           0.432  cal/mol.K

   Total Enthalpy:                8.943 kcal/mol
   Total Entropy:                53.703  cal/mol.K
Archival summary:
1\1\copper\FREQ\ProcedureUnspecified\6-31G**\21\eric\TueMay2601:32:512015TueMay2601:32:512015\0\\#,FREQ,ProcedureUnspecified,6-31G**,\\0,1\C\O,1,1.15693\O,1,1.16739,2,172.884\\\@

 Total job time:  24.77s(wall), 24.75s(cpu) 
 Tue May 26 01:32:51 2015

        *************************************************************
        *                                                           *
        *  Thank you very much for using Q-Chem.  Have a nice day.  *
        *                                                           *
        *************************************************************




In [24]:
print(len(contents))


17970

Can you see why we don't normally print entire files to the screen? This isn't even a big one.

I wonder if the handle is still readable...


In [25]:
handle.readable()


Out[25]:
True

What if I try and read from it again?


In [26]:
contents2 = handle.read()
print(contents2)



Nothing! So, the end of the file's been reached, and we might as well close it, since we'll be working with contents, not the file (handle) itself.


In [27]:
handle.close()
handle.closed


Out[27]:
True

Just to reiterate, here's what we actually did to open a file, read it into a string, then close it:


In [28]:
handle = open(filename)
contents = handle.read()
handle.close()

Opening a file 2

There were a few other methods that we could call on our handle that had to do with reading, specifically handle.readline() and handle.readlines().

readline will read a single line from an open file handle up until a newline (which is the character '\n'). Basically, every time you see a linebreak or hit return, this invisible character is present.


In [29]:
handle = open(filename)

first_line = handle.readline()
second_line = handle.readline()

print(first_line)
print(second_line)
handle.close()


                  Welcome to Q-Chem

     A Quantum Leap Into The Future Of Chemistry

Notice that the newlines are being interpreted and printed. I think this is because of the print() function.


In [30]:
help(print)


Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.


In [31]:
print(first_line, end='')
print(second_line, end='')


                  Welcome to Q-Chem
     A Quantum Leap Into The Future Of Chemistry

handle.readlines() does the same thing as readline() looped over the entire file, so it returns a list of strings.


In [32]:
handle = open(filename)

contents3 = handle.readlines()

handle.close()

In [33]:
print(contents3[:10])


['                  Welcome to Q-Chem\n', '     A Quantum Leap Into The Future Of Chemistry\n', '\n', '\n', ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)\n', '\n', ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  \n', ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  \n', ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  \n', ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  \n']

In [34]:
contents3[:10]


Out[34]:
['                  Welcome to Q-Chem\n',
 '     A Quantum Leap Into The Future Of Chemistry\n',
 '\n',
 '\n',
 ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)\n',
 '\n',
 ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  \n',
 ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  \n',
 ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  \n',
 ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  \n']

There's another convenient method for strings that lets us do the same thing to a large string; rather than call split(), which will split on spaces, we call splitlines(), which will split on newlines.


In [35]:
contents.splitlines()[:10]


Out[35]:
['                  Welcome to Q-Chem',
 '     A Quantum Leap Into The Future Of Chemistry',
 '',
 '',
 ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)',
 '',
 ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  ',
 ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  ',
 ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  ',
 ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  ']

Notice that the newlines have been removed in this case. Hopefully that doesn't bite us in the future; it may or may not be important for what we do.

Opening a file 3

I think I've shown all the ways contents of files can be read, but what about the opening and closing? There's an easier way, one where the file will be closed for us automatically.


In [36]:
with open(filename) as handle2:
    contents4 = handle2.read()

In [37]:
handle2.closed


Out[37]:
True

This is what we call "syntactic sugar", it's something convenient. Using either is fine. I personally prefer doing it this way, because if you open a bunch of files and forget to close them, over and over again, eventually your memory usage will grow and things might get unbearably slow.

Looping over a file 1

Just like everything else in Python, there are a couple of ways to do this. We can either loop over the contents of the file we've stored in our contents variable, or we can loop over the file directly. Yes, file handles are iterable, just like lists and tuples!

Here's directly looping over the file:


In [38]:
with open(filename) as handle:
    for line in handle:
        if 'Albrecht' in line:
            print(line)
        if 'Berquist' in line:
            print(line)
        if 'Lambrecht' in line:
            print(line)


 P. M. Zimmerman,  D. Zuev,  B. Albrecht,  E. Alguire,  B. Austin,  

 S. A. Baeppler,  G. J. O. Beran,  Y. A. Bernard,  E. Berquist,  

 C.-P. Hsu,  Y. Jung,  J. Kong,  D. S. Lambrecht,  W. Liang,  C. Ochsenfeld,  

and here's looping over our stored variable:


In [39]:
for line in contents.splitlines():
    if 'time' in line:
        print(line)


 Total DFTman time = 0.22 CPUs 0.22 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.25 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.25 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 SCF time:  CPU 2.83 s  wall 2.83 s
 Total DFTman time = 1.83 CPUs 1.83 Wall
 Total DFTman time = 1.55 CPUs 1.55 Wall
 Total DFTman time = 1.81 CPUs 1.82 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 1.86 CPUs 1.86 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 2.29 CPUs 2.29 Wall
 Total DFTman time = 5.54 CPUs 5.54 Wall
 Gradient time:  CPU 21.70 s  wall 21.71 s
 Total job time:  24.77s(wall), 24.75s(cpu) 

Notice that I called contents.splitlines(); this way, we make a list of strings, so iterating will give us one string at a time.

We can't loop over contents directly. Why not?


In [40]:
for line in contents[2000:2500]:
    print(line)


u
,
 
 
I
.
 
Y
.
 
Z
h
a
n
g
,
 
 
X
.
 
Z
h
a
n
g
,
 
 
Y
.
 
Z
h
a
o
,
 
 


 
B
.
 
R
.
 
B
r
o
o
k
s
,
 
 
G
.
 
K
.
 
L
.
 
C
h
a
n
,
 
 
D
.
 
M
.
 
C
h
i
p
m
a
n
,
 
 
C
.
 
J
.
 
C
r
a
m
e
r
,
 
 


 
W
.
 
A
.
 
G
o
d
d
a
r
d
 
I
I
I
,
 
 
M
.
 
S
.
 
G
o
r
d
o
n
,
 
 
W
.
 
J
.
 
H
e
h
r
e
,
 
 
A
.
 
K
l
a
m
t
,
 
 


 
H
.
 
F
.
 
S
c
h
a
e
f
e
r
 
I
I
I
,
 
 
M
.
 
W
.
 
S
c
h
m
i
d
t
,
 
 
C
.
 
D
.
 
S
h
e
r
r
i
l
l
,
 
 
D
.
 
G
.
 
T
r
u
h
l
a
r
,
 
 


 
A
.
 
W
a
r
s
h
e
l
,
 
 
X
.
 
X
u
,
 
 
A
.
 
A
s
p
u
r
u
-
G
u
z
i
k
,
 
 
R
.
 
B
a
e
r
,
 
 
A
.
 
T
.
 
B
e
l
l
,
 
 
N
.
 
A
.
 
B
e
s
l
e
y
,
 
 


 
J
.
-
D
.
 
C
h
a
i
,
 
 
A
.
 
D
r
e
u
w
,
 
 
B
.
 
D
.
 
D
u
n
i
e
t
z
,
 
 
T
.
 
R
.
 
F
u
r
l
a
n
i
,
 
 
S
.
 
R
.
 
G
w
a
l
t
n
e
y
,
 
 


 
C
.
-
P
.
 
H
s
u
,
 
 
Y
.
 
J
u
n
g
,
 
 
J
.
 
K
o
n
g
,
 
 
D
.
 
S
.
 
L
a
m
b
r
e
c
h
t
,
 
 
W
.
 
L
i
a
n
g
,
 
 
C
.
 
O
c
h
s
e
n
f
e
l
d
,
 
 


 
V
.
 
A
.
 
R
a
s
s
o
l
o
v
,
 
 
L
.
 
V
.
 
S
l
i
p
c

contents is a string; iterating over a string will give you its characters. Clearly this is nonsense. Be careful!

Looping over a file 2

Extracting frequencies

Now we can use our file opening/closing/looping knowledge to extract useful information from files.

Let's say I want to extract all of the vibrational frequencies from an output file, and store them in a list called frequencies as floating-point numbers.

The key to extracting information from QM outputs (or any text file, really) is to understand the context in which the information appears. What's the file structured like? How do we actually get the information we want?

I'll split the contents on newlines to make it easier to work with.


In [41]:
contents_splitlines = contents.splitlines()

The information I want occurs near the end.


In [42]:
contents_splitlines[370:]


Out[42]:
[' **                                                                  **',
 ' **                       VIBRATIONAL ANALYSIS                       **',
 ' **                       --------------------                       **',
 ' **                                                                  **',
 ' **        VIBRATIONAL FREQUENCIES (CM**-1) AND NORMAL MODES         **',
 ' **     FORCE CONSTANTS (mDYN/ANGSTROM) AND REDUCED MASSES (AMU)     **',
 ' **                  INFRARED INTENSITIES (KM/MOL)                   **',
 ' **                                                                  **',
 ' **********************************************************************',
 ' ',
 '',
 ' Mode:                 1                      2                      3',
 ' Frequency:       621.29                1410.25                2498.02',
 ' Force Cnst:      2.9340                18.6957                47.3623',
 ' Red. Mass:      12.9010                15.9552                12.8822',
 ' IR Active:          YES                    YES                    YES',
 ' IR Intens:       30.852                  0.431                554.265',
 ' Raman Active:       YES                    YES                    YES',
 '               X      Y      Z        X      Y      Z        X      Y      Z',
 ' C          0.644  0.540  0.262   -0.050 -0.077 -0.040    0.601 -0.543 -0.350',
 ' O         -0.201 -0.241 -0.123    0.495 -0.406 -0.265   -0.254  0.203  0.133',
 ' O         -0.282 -0.164 -0.073   -0.458  0.463  0.295   -0.198  0.205  0.130',
 ' TransDip   0.130  0.109  0.053    0.008 -0.017 -0.010    0.512 -0.465 -0.300',
 '',
 ' STANDARD THERMODYNAMIC QUANTITIES AT   298.15 K  AND     1.00 ATM',
 '',
 '   This Molecule has  0 Imaginary Frequencies',
 '   Zero point vibrational energy:        6.475 kcal/mol',
 '',
 '   Atom    1 Element C  Has Mass   12.00000',
 '   Atom    2 Element O  Has Mass   15.99491',
 '   Atom    3 Element O  Has Mass   15.99491',
 '   Molecular Mass:    43.989820 amu',
 '   Principal axes and moments of inertia in amu*Bohr^2:',
 '                             1           2           3',
 '    Eigenvalues --        0.16210   153.69832   153.86042',
 '          X               0.67840     0.73221     0.06038',
 '          Y              -0.61774     0.61296    -0.49263',
 '          Z              -0.39772     0.29690     0.86814',
 '   Rotational Symmetry Number is   1',
 '   The Molecule is an Asymmetric Top',
 '   Translational Enthalpy:        0.889 kcal/mol',
 '   Rotational Enthalpy:           0.889 kcal/mol',
 '   Vibrational Enthalpy:          6.573 kcal/mol',
 '   gas constant (RT):             0.592 kcal/mol',
 '   Translational Entropy:        37.270  cal/mol.K',
 '   Rotational Entropy:           16.001  cal/mol.K',
 '   Vibrational Entropy:           0.432  cal/mol.K',
 '',
 '   Total Enthalpy:                8.943 kcal/mol',
 '   Total Entropy:                53.703  cal/mol.K',
 'Archival summary:',
 '1\\1\\copper\\FREQ\\ProcedureUnspecified\\6-31G**\\21\\eric\\TueMay2601:32:512015TueMay2601:32:512015\\0\\\\#,FREQ,ProcedureUnspecified,6-31G**,\\\\0,1\\C\\O,1,1.15693\\O,1,1.16739,2,172.884\\\\\\@',
 '',
 ' Total job time:  24.77s(wall), 24.75s(cpu) ',
 ' Tue May 26 01:32:51 2015',
 '',
 '        *************************************************************',
 '        *                                                           *',
 '        *  Thank you very much for using Q-Chem.  Have a nice day.  *',
 '        *                                                           *',
 '        *************************************************************',
 '',
 '']

We can see the frequencies all occur on a single line:

' Frequency:       621.29                1410.25                2498.02',

and maybe we can check for whether Frequency: is in a line to get frequencies. First, we need a place to store our results.

Check all the lines for a match, and print out a match if it exists:


In [43]:
for line in contents_splitlines:
    if 'Frequency:' in line:
        print(line)


 Frequency:       621.29                1410.25                2498.02

It worked! You'll have to take my word for it that if this occurred on multiple lines of an output (say, if there were more than 3 vibrational frequencies), this would catch every instance. That's the beauty of looping.

The only problem with the code above is that once we match the line, we don't actually do anything with it other than print it. Let's try storing it in a variable s so we can manipulate it after our loop is complete.


In [44]:
for line in contents_splitlines:
    if 'Frequency:' in line:
        s = line

In [45]:
print(s)


 Frequency:       621.29                1410.25                2498.02

Now, to make it into a list of floats:


In [46]:
s.split()


Out[46]:
['Frequency:', '621.29', '1410.25', '2498.02']

In [47]:
s.split()[1:]


Out[47]:
['621.29', '1410.25', '2498.02']

In [48]:
map(float, s.split()[1:])


Out[48]:
<map at 0x7f1ae92a9518>

In [49]:
list(map(float, s.split()[1:]))


Out[49]:
[621.29, 1410.25, 2498.02]

Ok, now we know how to turn a frequency line into a list of numbers we can work with in some other piece of code.

But, now we have another problem.

What if we have more than one line that contains Frequency:? This will only catch the very last one! We need to do all this work inside the loop.


In [50]:
frequencies = []

for line in contents_splitlines:
    if 'Frequency:' in line:
        frequencies_oneline = list(map(float, line.split()[1:]))
        frequencies.extend(frequencies_oneline)

In [51]:
print(frequencies)


[621.29, 1410.25, 2498.02]

We can't use list.append(); that would append a list to a list, so we'd end up with a list-of-lists. We just want a single list, so we extend() it.