This notebook collects several practical examples of regular expressions. An online regex tester and debugger: regex101
In [1]:
import re
Some notes
In [59]:
a = " SCF Done: E(RB3LYP) = -599.864175717 A.U. after 16 cycles"
# Raw string so that \s reaches the regex engine unchanged
if re.search(r"^\sSCF Done:", a):
    print("ok")
In [40]:
a = " CCl 1.70000 1.60000 1.60000 1.70000 1.80000\n"
b = " D1 148.47288 140.38227\n"
c = " D3 -116.60811-112.89609-109.06468-104.97240-100.75211\n"
d = " D1 148.47288\n"
In [33]:
# Unsigned decimal number; raw string avoids invalid-escape warnings
pattern = re.compile(r"(\d+\.\d+)")
res = pattern.search(b)
print(res.group(0))
In [41]:
# Optional sign lets findall split columns that are fused together, as in c
pattern = re.compile(r"\s*([+-]?\d+\.\d+)")
print("a ", pattern.findall(a))
print("b ", pattern.findall(b))
print("c ", pattern.findall(c))
print("d ", pattern.findall(d))
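The optional sign in `[+-]?` is what lets `findall` separate values printed without a space between them, as in line `c`. A minimal sketch of converting such a line to floats follows. The helper name `extract_floats` is hypothetical and not part of the original notebook, and the leading `\s*` is dropped here on the assumption that `findall` skipping unmatched characters is enough.

```python
import re

# Signed decimal number; findall skips whatever lies between matches,
# so no explicit whitespace handling is needed.
FLOAT_RE = re.compile(r"[+-]?\d+\.\d+")

def extract_floats(line):
    """Return every decimal number in `line` as a float (hypothetical helper)."""
    return [float(tok) for tok in FLOAT_RE.findall(line)]

# Same sample lines as in the cell above
fused = " D3 -116.60811-112.89609-109.06468-104.97240-100.75211\n"
single = " D1 148.47288\n"

print(extract_floats(fused))
print(extract_floats(single))
```

The label `D3` is not picked up because the pattern requires a decimal point.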
In [64]:
eigentest = " Eigenvalues -- -547.05077-547.86712-548.29237-548.49474-548.57146 "
pattern = re.compile(r"\s*([+-]?\d+\.\d+)")
pattern.findall(eigentest)
In [42]:
soup = """ Rotational constants (GHZ): 142.1344479 0.4743210 0.4743149
Standard basis: 6-31+G(d) (6D, 7F)
There are 67 symmetry adapted cartesian basis functions of A symmetry.
There are 67 symmetry adapted basis functions of A symmetry.
67 basis functions, 132 primitive gaussians, 67 cartesian basis functions
18 alpha electrons 18 beta electrons
nuclear repulsion energy 45.1237784897 Hartrees.
NAtoms= 6 NActive= 6 NUniq= 6 SFac= 1.00D+00 NAtFMM= 60 NAOKFM=F Big=F
Integral buffers will be 131072 words long.
Raffenetti 2 integral format.
Two-electron integral symmetry is turned on.
One-electron integrals computed using PRISM.
NBasis= 67 RedAO= T EigKep= 7.02D-03 NBF= 67
NBsUse= 67 1.00D-06 EigRej= -1.00D+00 NBFU= 67
Initial guess from the checkpoint file: "/scratch/183547/Gau-16335.chk"
B after Tr= 0.000000 0.000000 0.000000
Rot= 1.000000 0.000065 0.000000 0.000022 Ang= 0.01 deg.
ExpMin= 4.38D-02 ExpMax= 2.52D+04 ExpMxC= 3.78D+03 IAcc=2 IRadAn= 4 AccDes= 0.00D+00
Harris functional with IExCor= 402 and IRadAn= 4 diagonalized for initial guess.
HarFok: IExCor= 402 AccDes= 0.00D+00 IRadAn= 4 IDoV= 1 UseB2=F ITyADJ=14
ICtDFT= 3500011 ScaDFX= 1.000000 1.000000 1.000000 1.000000
FoFCou: FMM=F IPFlag= 0 FMFlag= 100000 FMFlg1= 0
NFxFlg= 0 DoJE=T BraDBF=F KetDBF=T FulRan=T
wScrn= 0.000000 ICntrl= 500 IOpCl= 0 I1Cent= 200000004 NGrid= 0
NMat0= 1 NMatS0= 1 NMatT0= 0 NMatD0= 1 NMtDS0= 0 NMtDT0= 0
Petite list used in FoFCou.
Keep R1 ints in memory in canonical form, NReq=3514379.
Requested convergence on RMS density matrix=1.00D-08 within 128 cycles.
Requested convergence on MAX density matrix=1.00D-06.
Requested convergence on energy=1.00D-06.
No special actions if energy rises.
EnCoef did 2 forward-backward iterations
EnCoef did 2 forward-backward iterations
EnCoef did 100 forward-backward iterations
EnCoef did 2 forward-backward iterations
SCF Done: E(RB3LYP) = -599.864175717 A.U. after 16 cycles
NFock= 16 Conv=0.99D-09 -V/T= 2.0046
Calling FoFJK, ICntrl= 2127 FMM=F ISym2X=0 I1Cent= 0 IOpClX= 0 NMat=1 NMatS=1 NMatT=0.
***** Axes restored to original set *****
-------------------------------------------------------------------
Center Atomic Forces (Hartrees/Bohr)
Number Number X Y Z
-------------------------------------------------------------------
1 6 -0.000049868 0.001084800 -0.000123630
2 1 0.000003801 0.000176939 0.000127406
3 1 0.000110364 0.000154941 -0.000056689
4 1 -0.000090176 0.000143240 -0.000052491
5 9 -0.000046616 0.003735651 -0.000031395
6 17 0.000072495 -0.005295572 0.000136799
-------------------------------------------------------------------"""
In [55]:
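# \s* tolerates a missing leading space; the decimal point is escaped
pattern = re.compile(r"^(\s*SCF Done:).*([+-]\d+\.\d+)")
for line in soup.split("\n"):
    m = pattern.match(line)  # match once and reuse the result
    if m:
        # group 0: whole match, group 1: label, group 2: energy
        for i in range(3):
            print(i, m.group(i))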
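The numbered groups above can also be given names, which makes the extracted fields easier to read. The following is a sketch only: the pattern `SCF_RE`, the helper `parse_scf_line`, and the group names are hypothetical, and the pattern assumes the `SCF Done:` line always has the layout shown in the sample.

```python
import re

# Hypothetical pattern with named groups for method, energy, and cycle count
SCF_RE = re.compile(
    r"^\s*SCF Done:\s+E\((?P<method>[^)]+)\)\s*=\s*(?P<energy>[+-]?\d+\.\d+)"
    r"\s+A\.U\. after\s+(?P<cycles>\d+) cycles"
)

def parse_scf_line(line):
    """Return (method, energy in A.U., cycles), or None if the line does not match."""
    m = SCF_RE.match(line)
    if m is None:
        return None
    return m.group("method"), float(m.group("energy")), int(m.group("cycles"))

sample = " SCF Done: E(RB3LYP) = -599.864175717 A.U. after 16 cycles"
print(parse_scf_line(sample))
```

Returning `None` for non-matching lines lets the helper be applied to every line of a log without a separate test.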
In [56]:
scan = """ SCF Done: E(RB3LYP) = -548.021019862 A.U. after 19 cycles
NFock= 19 Conv=0.27D-08 -V/T= 2.0040
Scan completed.
Summary of the potential surface scan:
N DSO SCF
---- --------- -----------
1 1.0000 -547.05077
2 1.1000 -547.86712
3 1.2000 -548.29237
4 1.3000 -548.49474
5 1.4000 -548.57146
6 1.5000 -548.57912
7 1.6000 -548.55048
8 1.7000 -548.50429
9 1.8000 -548.45102
10 1.9000 -548.39653
11 2.0000 -548.34390
12 2.1000 -548.29464
13 2.2000 -548.24935
14 2.3000 -548.20822
15 2.4000 -548.17114
16 2.5000 -548.13792
17 2.6000 -548.10833
18 2.7000 -548.08209
19 2.8000 -548.05895
20 2.9000 -548.03866
21 3.0000 -548.02102
---- --------- -----------
"""
In [57]:
# \s* tolerates a missing leading space before the header text
scan_patt = re.compile(r"^\s*Summary of the potential surface scan:")
for line in scan.split("\n"):
    if scan_patt.match(line):
        print(line.strip())
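Once the summary header is located, the rows below it can be read with a second pattern. The sketch below is one possible approach, not the notebook's own method: `ROW_RE`, `HEADER_RE`, and `parse_scan_summary` are hypothetical names, and `demo` is a shortened excerpt of the scan text with assumed column spacing.

```python
import re

# Hypothetical row pattern: integer index, scanned coordinate, SCF energy
ROW_RE = re.compile(r"^\s*(\d+)\s+([+-]?\d+\.\d+)\s+([+-]?\d+\.\d+)\s*$")
HEADER_RE = re.compile(r"^\s*Summary of the potential surface scan:")

def parse_scan_summary(text):
    """Collect (index, coordinate, energy) rows that follow the summary header."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if HEADER_RE.match(line):
            in_table = True
            continue
        if not in_table:
            continue
        m = ROW_RE.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
        elif rows and line.strip().startswith("----"):
            break  # closing rule after the last data row
    return rows

demo = """ Summary of the potential surface scan:
   N       DSO         SCF
 ----  ---------  -----------
    1     1.0000  -547.05077
    2     1.1000  -547.86712
 ----  ---------  -----------
"""
rows = parse_scan_summary(demo)
print(rows)
# Row with the lowest SCF energy
print(min(rows, key=lambda r: r[2]))
```

Keeping the rows as tuples makes it straightforward to pick out the minimum-energy point or to pass the columns to a plotting routine.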