This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).
In [1]:
import string
dir(string)
Out[1]:
In [2]:
s = "HeLLo tHEre MY FriEND"
In [3]:
s.upper()
Out[3]:
In [4]:
s.lower()
Out[4]:
In [5]:
s.title()
Out[5]:
In [6]:
s.capitalize()
Out[6]:
In [7]:
s.swapcase()
Out[7]:
In [8]:
s.split()
Out[8]:
In [9]:
L = s.capitalize().split()
print(L)
In [10]:
s = '_'.join(L)
print(s)
In [11]:
s.split('_')
Out[11]:
In [12]:
''.join(s.split('_'))
Out[12]:
In [13]:
s = " Too many spaces! "
s.strip()
Out[13]:
In [14]:
s = "*~*~*~*Super!!**~*~**~*~**~"
s.strip('*~')
Out[14]:
In [15]:
s.rstrip('*~')
Out[15]:
In [16]:
s.lstrip('*~')
Out[16]:
In [17]:
s.replace('*', '')
Out[17]:
In [18]:
s.replace('*', '').replace('~', '')
Out[18]:
In [19]:
s = "The quick brown fox jumped"
s.find("fox")
Out[19]:
In [20]:
s[16:]
Out[20]:
In [21]:
s.find('booyah')
Out[21]:
In [22]:
s.startswith('The')
Out[22]:
In [23]:
s.endswith('jumped')
Out[23]:
In [24]:
s.endswith('fox')
Out[24]:
In [25]:
'1234'.isdigit()
Out[25]:
In [26]:
'123.45'.isdigit()
Out[26]:
In [27]:
'ABC'.isalpha()
Out[27]:
In [28]:
'ABC123'.isalpha()
Out[28]:
In [29]:
"ABC123".isalnum()
Out[29]:
In [30]:
'ABC easy as 123'.isalnum()
Out[30]:
In [31]:
'hello'.islower()
Out[31]:
In [32]:
'HELLO'.isupper()
Out[32]:
In [33]:
'Hello'.istitle()
Out[33]:
In [34]:
' '.isspace()
Out[34]:
In [35]:
from math import pi
"my favorite integer is %d, but my favorite float is %f." % (42, pi)
Out[35]:
In [36]:
"in exponential notation it's %e" % pi
Out[36]:
In [37]:
"to choose smartly if exponential is needed: %g" % pi
Out[37]:
In [38]:
"or with a bigger number: %g" % 123456787654321.0
Out[38]:
In [39]:
"rounded to three decimal places it's %.3f" % pi
Out[39]:
In [40]:
"an integer padded with spaces: %10d" % 42
Out[40]:
In [41]:
"an integer padded on the right: %-10d" % 42
Out[41]:
In [42]:
"an integer padded with zeros: %010d" % 42
Out[42]:
In [43]:
"we can also name our arguments: %(value)d" % dict(value=3)
Out[43]:
In [44]:
"Escape the percent sign with an extra symbol: the %d%%" % 99
Out[44]:
Read more about formats in the Python docs
In [45]:
"{}{}".format("ABC", 123)
Out[45]:
In [46]:
"{0}{1}".format("ABC", 123)
Out[46]:
In [47]:
"{0}{0}".format("ABC", 123)
Out[47]:
In [48]:
"{1}{0}".format("ABC", 123)
Out[48]:
The format specification comes after the colon (`:`):
In [49]:
("%.2f" % 3.14159) == "{:.2f}".format(3.14159)
Out[49]:
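As a rough sketch of what can follow the colon: besides a precision and a type code, the specification can carry an optional fill character, an alignment flag, and a minimum width. The names `value` and `examples` below are illustrative and not part of the original notebook.

```python
# General layout of a format spec: {field:[fill][align][width][.precision][type]}
value = 3.14159  # illustrative value only

examples = [
    "{:8.2f}".format(value),    # minimum width 8, two decimals (numbers right-align by default)
    "{:<8.2f}".format(value),   # '<' left-aligns within the width
    "{:^8.2f}".format(value),   # '^' centers within the width
    "{:*>8.2f}".format(value),  # '*' is the fill character, '>' right-aligns
    "{:08.2f}".format(value),   # a leading 0 pads with zeros
]

for text in examples:
    # repr() makes the padding spaces visible
    print(repr(text))
```

Each string has the same total width of eight characters; only the padding and its placement differ.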
In [50]:
"{0:d} is an integer; {1:.3f} is a float".format(42, pi)
Out[50]:
In [51]:
"{the_answer:010d} is an integer; {pi:.5g} is a float".format(the_answer=42,
                                                              pi=pi)
Out[51]:
In [52]:
'{desire} to {place}'.format(desire='Fly me',
                             place='The Moon')
Out[52]:
In [53]:
# using a pre-defined dictionary
f = {"desire": "Won't you take me",
     "place": "funky town?"}
'{desire} to {place}'.format(**f)
Out[53]:
In [54]:
# format also supports binary numbers
"int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42)
Out[54]:
In [55]:
%%file inout.dat
Here is a nice file
with a couple lines of text
it is a haiku
In [56]:
f = open('inout.dat')
print(f.read())
f.close()
In [57]:
f = open('inout.dat')
print(f.readlines())
f.close()
In [58]:
for line in open('inout.dat'):
    print(line.split())
In [59]:
# write() is the opposite of read()
contents = open('inout.dat').read()
out = open('my_output.dat', 'w')
out.write(contents.replace(' ', '_'))
out.close()
In [60]:
!cat my_output.dat
In [61]:
# writelines() is the opposite of readlines()
lines = open('inout.dat').readlines()
out = open('my_output.dat', 'w')
out.writelines(lines)
out.close()
In [62]:
!cat my_output.dat
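The cells above close each file by hand with `close()`. One common alternative, sketched here, is the `with` statement, which closes the file automatically when the block ends. The scratch path `with_demo.dat` is a hypothetical name used only so that this sketch does not touch the notebook's own files.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not one of the notebook's files)
path = os.path.join(tempfile.mkdtemp(), 'with_demo.dat')

# Writing: the file is closed automatically when the block exits
with open(path, 'w') as out:
    out.write('Here is a nice file\n')
    out.writelines(['with a couple lines of text\n', 'it is a haiku\n'])

# Reading: same pattern, no explicit close() needed
with open(path) as f:
    lines = f.readlines()

print(lines)
print(f.closed)  # the file object reports that it is closed after the block
```

This form also closes the file if an error occurs inside the block, which the explicit `close()` calls above would not do.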
In [63]:
# Don't modify this: it simply writes the example file
f = open('messy_data.dat', 'w')
import random
for i in range(100):
    for j in range(5):
        f.write(' ' * random.randint(0, 6))
        f.write('%0*.*g' % (random.randint(8, 12),
                            random.randint(5, 10),
                            100 * random.random()))
        if j != 4:
            f.write(',')
    f.write('\n')
f.close()
In [64]:
# Look at the first four lines of the file:
!head -4 messy_data.dat
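The `'%0*.*g'` format in the file-writing cell may look unfamiliar: in `%` formatting, each `*` takes its value (first the width, then the precision) from the argument tuple instead of from the format string. A small sketch with fixed, illustrative numbers in place of the random ones:

```python
width, precision, value = 10, 5, 12.3456789  # illustrative values only

# Each '*' pulls its number from the tuple: width first, then precision
starred = '%0*.*g' % (width, precision, value)

# The same specification with the numbers written out in the format string
literal = '%010.5g' % value

print(starred)
print(starred == literal)
```

Because the width and precision are drawn at random for every number, the columns in `messy_data.dat` do not line up.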
Your task: Write a program that reads in the contents of "messy_data.dat" and extracts the numbers from each line, using the string manipulations we used above (remember that `float()` will convert a suitable string to a floating-point number).
Next, write out a new file named "clean_data.dat". The new file should contain the same data as the old file, but with uniform formatting and aligned columns.
In [65]:
# your solution here
In [66]:
import numpy as np
data = np.loadtxt("messy_data.dat", delimiter=',')
np.savetxt("clean_data.dat", data,
           delimiter=',', fmt="%8.4f")
In [67]:
!head -5 clean_data.dat
Still, text manipulation is a very good skill to have under your belt!
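For comparison with the numpy version, here is a minimal sketch of one way to do the same job with only the string methods covered above. It is one possible approach rather than the course's reference solution. The helper names `parse_line`, `format_row`, and `clean_file` are hypothetical, and the `'%8.4f'` column format is simply borrowed from the `np.savetxt` call.

```python
def parse_line(line):
    """Split one comma-separated line and convert each field to a float."""
    # strip() drops the padding spaces and the trailing newline around each field
    return [float(field.strip()) for field in line.strip().split(',')]


def format_row(values, fmt='%8.4f'):
    """Join the values with commas, using the same fixed-width format for each."""
    return ','.join(fmt % v for v in values)


def clean_file(infile='messy_data.dat', outfile='clean_data.dat'):
    """Read infile, parse every non-blank line, and write aligned rows to outfile."""
    with open(infile) as src:
        rows = [parse_line(line) for line in src if line.strip()]
    with open(outfile, 'w') as dst:
        for row in rows:
            dst.write(format_row(row) + '\n')
    return rows


# Try the helpers on a made-up line in the style of messy_data.dat
sample = '   0005.3412,00000028.7,  0000091.25\n'
print(format_row(parse_line(sample)))
```

Calling `clean_file()` with its default arguments assumes that "messy_data.dat" already exists in the working directory.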