script available on GitHub
Refer to this how-to.
Python has its own package manager "pip" to keep Python self-contained. pip also allows access to new packges even if your OS is out-of-date. If you installed Python using Anaconda, then your package manager will be "conda", which also has a nice documentation.
Open a terminal and type "python"
print("hello world")) in a file, say "hello.py".python hello.py
If you plan to do computational research in the future, please pick either emacs or vim. They are excellent command-line editors with many academic users. Command-line editors have steeper learning curves than GUI-based ones, but you can do way more with them over time (good luck using GUI editors on supercomputers). Many excellent online tutorials exist.
In [4]:
# I don't know how to write a program but I am charming,
# so I will write down the equations to be implemented
# and find a friend to write it :)
"""
It is annoying to have to start each comment with a #,
triple quotation allows multi-line comments.
It is always a good idea to write lots of comment to lay out the
cohesive idea you had while starting to write a piece of code.
More often than not, we forget that impressive grand plan we started
with as we fight with syntax error and other nitty-gritty of
talking to a computer.
""";
An excellent paper by Greg Wilson et. al. concisely summarizes the best practices of scientific computing. I will steal the most relavant section from the summary paragraph here:
Much of the following can be found on "A Beginner's Python Tutorial"
basic arithmetics are built in
In [5]:
1 + 1
Out[5]:
In [6]:
2*3
Out[6]:
In [7]:
2**3
Out[7]:
In [8]:
7/2 # gotcha !
Out[8]:
In [9]:
7./2
Out[9]:
In [10]:
5%2 # modulo
Out[10]:
more advanced functions can be accessed using the numpy package
In [11]:
import numpy as np
In [12]:
np.exp(1j)
Out[12]:
In [13]:
np.cos(1) + 1j*np.sin(1)
Out[13]:
In [14]:
np.sqrt(144)
Out[14]:
In [15]:
for my_index in [1,2,3]:
print(my_index)
# end for
In [16]:
for my_index in range(3):
print(my_index)
# end for
In [17]:
# while loop may not terminate
my_index = 1
while my_index < 4:
print(my_index)
my_index += 1 # try comment this out... JK don't do it!
# end while
python uses indentation to determine blocks, you could easily have done
my_index = 1
while my_index < 4:
print(my_index)
my_index += 1
that would be a big oopsy
In [18]:
# for loop always terminates, thus it is preferred
for my_index in range(10):
if (my_index>0) and (my_index<=3):
print(my_index)
elif (my_index>3):
break
# end if
# end for
define modular functions to maximize code reusability and readability
In [19]:
def boltzmann_factor(energy_in_J,temperature_in_K):
# 1 joule = 7.243e22 K *kB
kB = 1.38e-23 # m^2 kg/ s^2 K
return np.exp(-float(energy_in_J)/kB/temperature_in_K)
# end def
In [20]:
def fermi_dirac_dist(energy_in_J,temperature_in_K,chemical_pot_in_J):
denomimator = 1.0/boltzmann_factor(
energy_in_J-chemical_pot_in_J
,temperature_in_K
) + 1.0
return 1.0/denomimator
# end def
In [21]:
def bose_einstein_dist(energy_in_J,temperature_in_K,chemical_pot_in_J):
denomimator = 1.0/boltzmann_factor(
energy_in_J-chemical_pot_in_J
,temperature_in_K
) - 1.0
return 1.0/denomimator
# end def
In [22]:
# 50% occupation near chemical potential
fermi_dirac_dist(1.01e-22,300,1e-22)
Out[22]:
In [23]:
# divergent occupation near chemical potential
bose_einstein_dist(1.01e-22,300,1e-22)
Out[23]:
list: iterable, extendable, mutable and ordered array of elements
tuple: immutable list
dictionary: iterable, extendable, mutable and un-ordered key-value pairs
In [24]:
mylist = [5,4,2,3,1]
for item in mylist:
print(item)
# end for
In [25]:
mylist[2] = 100
mylist.insert(0,50)
In [26]:
for i in range(len(mylist)):
print( mylist[i] )
# end for
In [27]:
mytuple = (5,4,2,3,1)
for item in mytuple:
print(item)
# end for
In [28]:
mytuple[2] = 100
# oopsy-daisies
In [29]:
mydict = {0:5,1:4,2:2,3:3,4:1}
for i in range(len(mydict)):
print( mydict[i] )
# end for
In [30]:
mydict = {
"name":"Paul"
,"favorite number":42
,"where abouts":"elusive"
,"hobbies":["coffee","code"]
}
mydict.keys()
Out[30]:
In [31]:
mydict["where abouts"]
Out[31]:
In [32]:
mydict["new entry"] = False
In [33]:
for key,value in mydict.iteritems():
print( "%s : %s" % (str(key),str(value)) )
# end for
In [34]:
mylist = [5,4,2,3,1]
[item**2 for item in mylist]
Out[34]:
In [35]:
square_and_shift = lambda x,y:x**2+y
[square_and_shift(item,50) for item in mylist]
Out[35]:
In [36]:
# from index 1 to -2 (wrap around)
mylist[1:-2]
Out[36]:
In [37]:
# all even indices
mylist[::2]
Out[37]:
In [38]:
# all odd indices
mylist[1::2]
Out[38]:
In [39]:
mylist = [5,4,2,3,1]
entry = [1,2]
mylist.append(entry)
# only a reference to entry is saved, NOT a copy, which means ...
# entry can be changed elsewhere without mylist knowing
In [40]:
mylist
Out[40]:
In [41]:
entry[0] = 10
mylist
Out[41]:
In [42]:
# use a deep copy to avoid the above problem
from copy import deepcopy
mylist = [5,4,2,3,1]
entry = [1,2]
mylist.append( deepcopy(entry) )
entry[0] = 10
mylist
Out[42]:
In [43]:
demon_burn_my_soul = 50.0
def firey_hell(demon_burn_my_soul):
demon_burn_my_soul += 10.
firey_hell(20)
print(demon_burn_my_soul)
In [44]:
# you can use a global variable, but this is NOT recommended
# see classes for better solution
global demon_burn_my_soul
demon_burn_my_soul = 50.0
def firey_hell():
# side effect! bad! bad! bad!
global demon_burn_my_soul
demon_burn_my_soul += 10.
firey_hell()
print(demon_burn_my_soul)
In [45]:
class RockStar:
def __init__(self):
self.demon_burn_my_soul = 50.0
# end def init
def firey_hell(self):
self.demon_burn_my_soul += 10.0
# end def
def cry_my_veins(self):
return self.demon_burn_my_soul
# end def cry_my_veins
# end class RockStar
In [46]:
me = RockStar()
In [47]:
me.cry_my_veins()
Out[47]:
In [48]:
me.firey_hell()
me.cry_my_veins()
Out[48]:
In [49]:
trace_text = """-7.436823 -7.290942 -7.271528 -7.282786 -7.283622 -7.268156 -7.401003
-7.304412 -7.211659 -7.231061 -7.27238 -7.287718 -7.240896 -7.121189
-7.098841 -7.169402 -7.16689 -7.161854 -7.204029 -7.284694 -7.260288
-7.368507 -7.472383 -7.442443 -7.448409 -7.409199 -7.353145 -7.242572
-7.277459 -7.24589 -7.159036 -7.268178 -7.234837 -7.165567 -7.165357
-7.137534 -7.231942 -7.225935 -7.16142 -7.183465 -7.257877 -7.279006
-7.284249 -7.306481 -7.240192 -7.286245 -7.316336 -7.251441 -7.192566
-7.191351 -7.065362 -7.050815 -7.116456 -7.186705 -7.242357 -7.240123
-7.284564 -7.385903 -7.468834 -7.427641 -7.378051 -7.315574 -7.287397
-7.262906 -7.197077 -7.187754 -7.136347 -7.149802 -7.301047 -7.281932
-7.353314 -7.434607 -7.375526 -7.397572 -7.433974 -7.477175 -7.471739
-7.474228 -7.51791 -7.525722 -7.52028 -7.534158 -7.539559 -7.53915
-7.533163 -7.426446 -7.417031 -7.475554 -7.41521 -7.377752 -7.319138
-7.20372 -7.294216 -7.290163 -7.310827 -7.302531 -7.339285 -7.252367
-7.232718 -7.275662"""
In [50]:
trace = map(float,trace_text.split())
In [51]:
import matplotlib.pyplot as plt
%matplotlib inline
# Jupyter-specific magic command, ignore for regular script
In [52]:
stuff = plt.plot(trace)
# plt.show() needed for regular script
In [53]:
# suppose the entries have error
err = np.std(trace) * np.random.rand(len(trace))
plt.errorbar(range(len(trace)),trace,err)
Out[53]:
In [54]:
# see trend (correlation) with exponential smoothing
import pandas as pd
data = pd.Series(trace)
plt.plot( trace )
plt.plot( data.ewm(span=5).mean(),ls="--",lw=2 )
Out[54]:
python for loops are VERY slow
numpy vectorized operations are about as fast as fortran (LAPACK under the hood)
In [1]:
import numpy as np
def get_mat_vec(nsize):
mat = np.random.rand(nsize,nsize)
vec = np.random.rand(nsize)
return mat,vec
# end def
def mat_vec_np(mat,vec):
prod = np.dot(mat,vec)
return prod
# end def
def mat_vec_naive(mat,vec):
prod = np.zeros(nsize)
for i in range(nsize):
for j in range(nsize):
prod[i] += mat[i,j]*vec[j]
# end for j
# end for i
return prod
# end def
In [2]:
# verify correctness
nsize = 100
mat,vec = get_mat_vec(nsize)
p1 = mat_vec_np(mat,vec)
p2 = mat_vec_naive(mat,vec)
np.allclose(p1,p2)
Out[2]:
In [3]:
# time it
nsize = 1000
mat,vec = get_mat_vec(nsize)
%timeit mat_vec_np(mat,vec)
In [4]:
%timeit -n 10 mat_vec_naive(mat,vec)
3 orders of magnitude speed difference!
In [1]:
import numpy as np
from copy import deepcopy
import matplotlib.pyplot as plt
%matplotlib inline
def rastrigin2d(rvec,A=10.):
ndim = len(rvec)
const = A * ndim
tosum = rvec**2. - A*np.cos(2*np.pi*rvec)
return const + tosum.sum()
# end def
# put function on a grid for visualization
minx = -5.12
maxx = 5.15
nx = 100
x = np.linspace(minx,maxx,nx)
grid = np.apply_along_axis(rastrigin2d,1
,[np.array([myx,myy]) for myx in x for myy in x] ) # vectorized
grid = grid.reshape(nx,nx) # reshape for plotting
# visualize
fig = plt.figure()
ax = fig.add_subplot(111,aspect=1)
ax.set_xlabel("x")
ax.set_ylabel("y")
cs = ax.contourf(x,x,grid.T,cmap=plt.cm.magma)
# transpose is needed because matrix index direction and plot axes
# directions are opposite of one another.
plt.colorbar(cs)
# below I will use pso to find the minimum of this function
Out[1]:
In [18]:
# initialize population
pop_size = 20
dim = 2
pop = (maxx-minx) * np.random.rand(pop_size,dim) + minx
# find personal best
individual_best = np.apply_along_axis(rastrigin2d,1,pop) # vectorized
individual_best_pos = deepcopy(pop) # deep copy for array of arrays
# find population best
min_idx = np.argmin(individual_best) # find minimum index
global_best = individual_best[min_idx].copy() # find minimum
global_best_pos = pop[min_idx].copy() # shalow copy sufficient for array
# initialize hopping sizes and directions
max_hop = 0.3
hop = max_hop * np.random.rand(pop_size,dim)
In [19]:
background = plt.figure()
ax = background.add_subplot(111,aspect=1)
ax.set_xlabel("x")
ax.set_ylabel("y")
cs = ax.contourf(x,x,grid.T,alpha=0.3,cmap=plt.cm.magma)
ax.scatter(pop.T[0],pop.T[1],label="current positions")
ax.scatter(individual_best_pos.T[0],individual_best_pos.T[1]
,c="g",alpha=0.7,label="individual best",marker='^',s=40)
ax.scatter(global_best_pos[0],global_best_pos[1],color="r"
,label="global best",marker="*",s=80)
ax.legend(scatterpoints = 1,fontsize=10,loc="best")
background.colorbar(cs)
Out[19]:
In [49]:
c1 = 2
c2 = 2
max_it = 5
for istep in range(max_it):
# evaluate fitness of population
fitness = np.apply_along_axis(rastrigin2d,1,pop)
# calculate global best
min_idx = np.argmin(fitness)
current_best = fitness[min_idx]
if current_best < global_best:
global_best = current_best
global_best_pos = pop[min_idx].copy()
# end if
# update individual best
idx = np.where( np.array(fitness) < np.array(individual_best) )
individual_best[idx] = fitness[idx]
individual_best_pos[idx] = deepcopy( pop[idx] )
# update hopping
hop += c1*np.random.rand()*(individual_best_pos-pop) + \
c2*np.random.rand()*(global_best_pos-pop)
idx = np.where( abs(hop) > max_hop )
hop[idx] = np.sign(hop[idx])*max_hop
# update populaton
pop += hop
# end for istep
In [50]:
background = plt.figure()
ax = background.add_subplot(111,aspect=1)
ax.set_xlabel("x")
ax.set_ylabel("y")
cs = ax.contourf(x,x,grid.T,alpha=0.3,cmap=plt.cm.magma)
ax.scatter(pop.T[0],pop.T[1],label="current positions")
ax.scatter(individual_best_pos.T[0],individual_best_pos.T[1]
,c="g",alpha=0.7,label="individual best",marker='^',s=40)
ax.scatter(global_best_pos[0],global_best_pos[1],color="r"
,label="global best",marker="*",s=80)
ax.legend(scatterpoints = 1,fontsize=10,loc="best")
background.colorbar(cs)
Out[50]:
In [51]:
global_best
Out[51]:
In [52]:
global_best_pos
Out[52]:
plain text file
In [16]:
%%writefile output.txt
# I am an ugly output file, but I have many hidden treasures
BEGIN PASSWORDS OF EVERYONE
test123
1234567890
abcde
hello
passwd
password
END PASSWORDS OF EVERYONE
data follows
3
1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0
In [17]:
# one text block
fhandle = open("output.txt",'r')
# now you have to parse this ugly text
fhandle.read()
Out[17]:
In [18]:
# line by line
fhandle = open("output.txt",'r')
for line in fhandle:
print line
In [23]:
# smart search
from mmap import mmap
fhandle = open("output.txt",'r+')
mm = mmap(fhandle.fileno(),0) # 0 means read from beginning
In [24]:
# read block
begin_idx = mm.find("BEGIN")
end_idx = mm.find("END")
good_lines= mm[begin_idx:end_idx].split("\n")[1:-1]
good_lines
Out[24]:
In [27]:
# read data section
mm.seek(0) # go to beginning of file
idx = mm.find("data follows")
mm.seek(idx) # goto data line
mm.readline() # skip header
ndata = int(mm.readline())
data = []
for idata in range(ndata):
data.append( map(float,mm.readline().split()) )
# end for idata
data
Out[27]:
In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [5]:
# loaded databases is easy
dft = pd.read_json("dft.json")
qmc = pd.read_json("qmc.json")
In [10]:
# first thing to do is look at it? not so useful
dft
Out[10]:
In [11]:
# look at columns
dft.columns
Out[11]:
In [30]:
# access interesting columns
dft[["energy","pressure"]]
Out[30]:
In [21]:
# plot energy vs. displacement
xlabel = "istep"
ylabel = "energy"
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.scatter(dft[xlabel],dft[ylabel])
plt.ylim(-3.8665,-3.865)
Out[21]:
In [23]:
dmc = qmc[qmc["iqmc"]==4]
vmc = qmc[qmc["iqmc"]==0]
In [24]:
xlabel = "istep"
ylabel = "LocalEnergy_mean"
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.scatter(dmc[xlabel],dmc[ylabel])
Out[24]:
In [31]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel("displacement (bohr)")
ax.set_ylabel("Varaince (Ha$^2$)")
marker_style = {0:"s",1:"^"}
colors = {0:"g",1:"r"}
rjas = 1 # use reference jastrow
for rorb in [0,1]:
mydf = vmc[ vmc["rorb"] == rorb ]
ax.errorbar(mydf["disp"],mydf["Variance_mean"],mydf["Variance_error"].values,ls="",marker=marker_style[rorb]
,color=colors[rorb],label="ref. orb %d"%rorb)
# end for
ax.legend(loc="best",scatterpoints=1)
#ax.set_ylim(1.1,1.5)
fig.tight_layout()
#plt.savefig("variance_vs_disp-rjas1.eps")
In [34]:
# drop bad runs
sel1 = (qmc["imode"]==5) & (qmc["istep"]==-2)
qmc = qmc.drop(qmc[sel1].index)
sel2 = (qmc["imode"]==10) & (qmc["istep"]==2)
qmc = qmc.drop(qmc[sel2].index)
dmc = qmc[qmc["iqmc"]==4]
vmc = qmc[qmc["iqmc"]==0]
In [35]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel("displacement (bohr)")
ax.set_ylabel("Varaince (Ha$^2$)")
marker_style = {0:"s",1:"^"}
colors = {0:"g",1:"r"}
rjas = 1 # use reference jastrow
for rorb in [0,1]:
mydf = vmc[ vmc["rorb"] == rorb ]
ax.errorbar(mydf["disp"],mydf["Variance_mean"],mydf["Variance_error"].values,ls="",marker=marker_style[rorb]
,color=colors[rorb],label="ref. orb %d"%rorb)
# end for
ax.legend(loc="best",scatterpoints=1)
#ax.set_ylim(1.1,1.5)
fig.tight_layout()
#plt.savefig("variance_vs_disp-rjas1.eps")
In [38]:
dmc.groupby(["rorb","rjas"]).apply(np.mean)[["LocalEnergy_mean","Variance_mean"]]
Out[38]:
In [ ]: