This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).
In [1]:
    
import string
dir(string)
    
    Out[1]:
In [2]:
    
s = "HeLLo tHEre MY FriEND"
    
In [3]:
    
s.upper()
    
    Out[3]:
In [4]:
    
s.lower()
    
    Out[4]:
In [5]:
    
s.title()
    
    Out[5]:
In [6]:
    
s.capitalize()
    
    Out[6]:
In [7]:
    
s.swapcase()
    
    Out[7]:
In [8]:
    
s.split()
    
    Out[8]:
In [9]:
    
L = s.capitalize().split()
print(L)
    
    
In [10]:
    
s = '_'.join(L)
print(s)
    
    
In [11]:
    
s.split('_')
    
    Out[11]:
In [12]:
    
''.join(s.split('_'))
    
    Out[12]:
In [13]:
    
s = "    Too many spaces!    "
s.strip()
    
    Out[13]:
In [14]:
    
s = "*~*~*~*Super!!**~*~**~*~**~"
s.strip('*~')
    
    Out[14]:
In [15]:
    
s.rstrip('*~')
    
    Out[15]:
In [16]:
    
s.lstrip('*~')
    
    Out[16]:
In [17]:
    
s.replace('*', '')
    
    Out[17]:
In [18]:
    
s.replace('*', '').replace('~', '')
    
    Out[18]:
In [19]:
    
s = "The quick brown fox jumped"
s.find("fox")
    
    Out[19]:
In [20]:
    
s[16:]
    
    Out[20]:
In [21]:
    
s.find('booyah')
    
    Out[21]:
In [22]:
    
s.startswith('The')
    
    Out[22]:
In [23]:
    
s.endswith('jumped')
    
    Out[23]:
In [24]:
    
s.endswith('fox')
    
    Out[24]:
In [25]:
    
'1234'.isdigit()
    
    Out[25]:
In [26]:
    
'123.45'.isdigit()
    
    Out[26]:
In [27]:
    
'ABC'.isalpha()
    
    Out[27]:
In [28]:
    
'ABC123'.isalpha()
    
    Out[28]:
In [29]:
    
"ABC123".isalnum()
    
    Out[29]:
In [30]:
    
'ABC easy as 123'.isalnum()
    
    Out[30]:
In [31]:
    
'hello'.islower()
    
    Out[31]:
In [32]:
    
'HELLO'.isupper()
    
    Out[32]:
In [33]:
    
'Hello'.istitle()
    
    Out[33]:
In [34]:
    
'   '.isspace()
    
    Out[34]:
In [35]:
    
from math import pi
"my favorite integer is %d, but my favorite float is %f." % (42, pi)
    
    Out[35]:
In [36]:
    
"in exponential notation it's %e" % pi
    
    Out[36]:
In [37]:
    
"to choose smartly if exponential is needed: %g" % pi
    
    Out[37]:
In [38]:
    
"or with a bigger number: %g" % 123456787654321.0
    
    Out[38]:
In [39]:
    
"rounded to three decimal places it's %.3f" % pi
    
    Out[39]:
In [40]:
    
"an integer padded with spaces: %10d" % 42
    
    Out[40]:
In [41]:
    
"an integer padded on the right: %-10d" % 42
    
    Out[41]:
In [42]:
    
"an integer padded with zeros: %010d" % 42
    
    Out[42]:
In [43]:
    
"we can also name our arguments: %(value)d" % dict(value=3)
    
    Out[43]:
In [44]:
    
"Escape the percent sign with an extra symbol: the %d%%" % 99
    
    Out[44]:
Read more about these format codes in the Python docs, under printf-style string formatting.
In [45]:
    
"{}{}".format("ABC", 123)
    
    Out[45]:
In [46]:
    
"{0}{1}".format("ABC", 123)
    
    Out[46]:
In [47]:
    
"{0}{0}".format("ABC", 123)
    
    Out[47]:
In [48]:
    
"{1}{0}".format("ABC", 123)
    
    Out[48]:
Inside the braces, the format specification comes after the colon (`:`).
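As a brief illustration of what can follow the colon, the sketch below combines fill, alignment, width, precision, and sign fields. The variable names are illustrative only and do not come from the original notebook.

```python
# Illustrative value (not from the original notebook)
value = 3.14159

right = "{:>10.2f}".format(value)    # right-aligned in a field of width 10
left = "{:<10.2f}|".format(value)    # left-aligned; '|' marks the end of the field
center = "{:*^11}".format("mid")     # centered in width 11, padded with '*'
signed = "{:+d}".format(42)          # always show the sign

print(repr(right))
print(repr(left))
print(repr(center))
print(repr(signed))
```

The general pattern is fill, then alignment, then sign, width, precision, and finally the type code, all of them optional.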
In [49]:
    
("%.2f" % 3.14159) ==  "{:.2f}".format(3.14159)
    
    Out[49]:
In [50]:
    
"{0:d} is an integer; {1:.3f} is a float".format(42, pi)
    
    Out[50]:
In [51]:
    
"{the_answer:010d} is an integer; {pi:.5g} is a float".format(the_answer=42,
                                                              pi=pi)
    
    Out[51]:
In [52]:
    
'{desire} to {place}'.format(desire='Fly me',
                             place='The Moon')
    
    Out[52]:
In [53]:
    
# using a pre-defined dictionary
f = {"desire": "Won't you take me",
     "place": "funky town?"}
'{desire} to {place}'.format(**f)
    
    Out[53]:
In [54]:
    
# format also supports binary numbers
"int: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(42)
    
    Out[54]:
In [55]:
    
%%file inout.dat
Here is a nice file
with a couple lines of text
it is a haiku
    
    
In [56]:
    
f = open('inout.dat')
print(f.read())
f.close()
    
    
In [57]:
    
f = open('inout.dat')
print(f.readlines())
f.close()
    
    
In [58]:
    
for line in open('inout.dat'):
    print(line.split())
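The cells above open files and either close them by hand or leave them to be closed later. A common alternative is the `with` statement, which closes the file when the block ends. The sketch below is a minimal example under the assumption that a scratch copy of the haiku file is acceptable; the temporary path and the name `words_per_line` are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file standing in for inout.dat
path = os.path.join(tempfile.mkdtemp(), "inout.dat")

# 'with' closes the file automatically, even if an error occurs inside the block
with open(path, "w") as out:
    out.write("Here is a nice file\nwith a couple lines of text\nit is a haiku\n")

# Iterating over a file object yields one line at a time
with open(path) as f:
    words_per_line = [line.split() for line in f]

print(words_per_line)
print(f.closed)  # the file is already closed at this point
```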
    
    
In [59]:
    
# write() is the opposite of read()
contents = open('inout.dat').read()
out = open('my_output.dat', 'w')
out.write(contents.replace(' ', '_'))
out.close()
    
In [60]:
    
!cat my_output.dat
    
    
In [61]:
    
# writelines() is the opposite of readlines()
lines = open('inout.dat').readlines()
out = open('my_output.dat', 'w')
out.writelines(lines)
out.close()
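One detail worth knowing: `writelines()` does not add line endings. The round trip above works because `readlines()` keeps the trailing `'\n'` on each line. The sketch below shows the difference, assuming a hypothetical scratch path and a made-up list of strings.

```python
import os
import tempfile

# Hypothetical scratch path; any writable location would do
path = os.path.join(tempfile.mkdtemp(), "my_output.dat")

lines = ["first", "second", "third"]   # note: no trailing newlines

# writelines() writes the strings back to back; it does not add '\n'
with open(path, "w") as out:
    out.writelines(lines)
with open(path) as f:
    joined = f.read()

# Adding the newlines explicitly gives one item per line
with open(path, "w") as out:
    out.writelines(line + "\n" for line in lines)
with open(path) as f:
    separated = f.readlines()

print(repr(joined))
print(separated)
```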
    
In [62]:
    
!cat my_output.dat
    
    
In [63]:
    
# Don't modify this: it simply writes the example file
f = open('messy_data.dat', 'w')
import random
for i in range(100):
    for j in range(5):
        f.write(' ' * random.randint(0, 6))
        f.write('%0*.*g' % (random.randint(8, 12),
                            random.randint(5, 10),
                            100 * random.random()))
        if j != 4:
            f.write(',')
    f.write('\n')
f.close()
    
In [64]:
    
# Look at the first four lines of the file:
!head -4 messy_data.dat
    
    
Your task: Write a program that reads in the contents of "messy_data.dat" and extracts the numbers from each line, using the string manipulations we used above (remember that float() will convert a suitable string to a floating-point number).
Next, write out a new file named "clean_data.dat". The new file should contain the same data as the old file, but with uniform formatting and aligned columns.
In [65]:
    
# your solution here
    
In [66]:
    
import numpy as np
data = np.loadtxt("messy_data.dat", delimiter=',')
np.savetxt("clean_data.dat", data,
           delimiter=',', fmt="%8.4f")
    
In [67]:
    
!head -5 clean_data.dat
    
    
Still, text manipulation is a very good skill to have under your belt!
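For comparison, here is one possible way to do the same cleanup with string methods alone. It is a minimal sketch, not the intended solution: the sample lines are made up in the style of "messy_data.dat", and `clean_line` is a hypothetical helper name.

```python
# Hypothetical sample in the style of messy_data.dat (not real output)
messy_lines = [
    "  0031.411,     000012.5,0078.25, 00.5,   0099.125\n",
    "005.75,  0000042, 01.25,    0088.5,00063.0\n",
]

def clean_line(line):
    """Return one messy line reformatted with fixed-width columns."""
    # split(',') separates the fields; strip() drops padding spaces and the newline;
    # float() accepts the leading zeros
    values = [float(field.strip()) for field in line.split(",")]
    # Same format code as the np.savetxt call above
    return ",".join("%8.4f" % v for v in values)

clean_lines = [clean_line(line) for line in messy_lines]
for line in clean_lines:
    print(line)
```

To produce "clean_data.dat", each cleaned line could be written out with a trailing `'\n'` using `write()` as shown earlier.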