Strings and Stuff in Python



In [ ]:

    
import numpy as np

Strings are sequences of characters, so you can index and slice them like arrays



In [ ]:

    
s = 'spam'

s,len(s),s[0],s[0:2]



In [ ]:

    
s[::-1]
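Unlike a NumPy array or a list, a Python string is immutable. You can read characters by index, but you cannot change one in place. A minimal sketch of what happens if you try, and the usual workaround of building a new string:

```python
s = 'spam'

# Indexing and slicing work the same way they do for lists
first = s[0]        # 's'
last_two = s[-2:]   # 'am'

# Strings are immutable, so item assignment raises a TypeError
try:
    s[0] = 'S'
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead of modifying the old one
capitalized = 'S' + s[1:]
print(capitalized)  # Spam
```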

Arithmetic with Strings



In [ ]:

    
s = 'spam'
e = "eggs"

s + e



In [ ]:

    
s + " " + e



In [ ]:

    
4 * (s + " ") + e



In [ ]:

    
print(4 * (s + " ") + s + " and\n" + e)     # use \n to get a newline with the print function

You can compare strings



In [ ]:

    
"spam" == "good"



In [ ]:

    
"spam" != "good"



In [ ]:

    
"spam" == "spam"



In [ ]:

    
"sp" < "spam"



In [ ]:

    
"spam" < "eggs"
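String comparison is lexicographic: Python compares the strings character by character using each character's Unicode code point (`ord`). That is why `"spam" < "eggs"` is `False`, and why uppercase letters sort before lowercase ones. A short sketch:

```python
# Each character is compared by its Unicode code point
codes = (ord('s'), ord('e'), ord('S'))
print(codes)                                  # (115, 101, 83)

spam_before_eggs = "spam" < "eggs"            # False: 's' (115) comes after 'e' (101)
upper_spam_before_eggs = "Spam" < "eggs"      # True: uppercase 'S' (83) sorts before 'e'
prefix_first = "sp" < "spam"                  # True: a prefix sorts before the longer string
print(spam_before_eggs, upper_spam_before_eggs, prefix_first)

# For a case-insensitive comparison, compare casefolded copies
print("Spam".casefold() < "eggs".casefold())  # False
```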

Python supports `Unicode` characters

You can enter Unicode characters directly from the keyboard (depending on your operating system), or you can use an escape sequence with the character's Unicode code point.

A list of Unicode code points can be found here.

For example, the Unicode code point for the Greek capital letter omega is U+03A9, so you can create the character with `\U000003A9`.



In [ ]:

    
print("This resistor has a value of 100 k\U000003A9")
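`\U000003A9` is only one of several ways to write the same character. The sketch below shows the shorter 4-digit escape, the named escape, and `chr`, plus the standard-library `unicodedata` module for going back from a character to its code point and name. It also shows a look-alike, the OHM SIGN (U+2126), which is a different code point that Unicode normalization maps to omega:

```python
import unicodedata

# Four equivalent ways to write the Greek capital omega (code point U+03A9)
omega_long = "\U000003A9"                       # 8-hex-digit escape
omega_short = "\u03A9"                          # 4-hex-digit escape
omega_named = "\N{GREEK CAPITAL LETTER OMEGA}"  # escape by Unicode name
omega_chr = chr(0x03A9)                         # from the integer code point

print(omega_long == omega_short == omega_named == omega_chr)   # True

# Going the other way: code point and official name of a character
print(hex(ord(omega_long)))           # 0x3a9
print(unicodedata.name(omega_long))   # GREEK CAPITAL LETTER OMEGA

# The OHM SIGN looks identical but is a different code point
ohm_sign = "\u2126"
print(ohm_sign == omega_long)                                  # False
print(unicodedata.normalize("NFC", ohm_sign) == omega_long)    # True
```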



In [ ]:

    
Ω = 1e3

Ω + np.pi

Emoji are Unicode characters, so you can use them as well (not all operating systems will display every character!)



In [ ]:

    
radio_active = "\U00002622"
wink = "\U0001F609"

print(radio_active + wink)

Emoji cannot be used as variable names (at least not yet ...)



In [ ]:

    
☢ = 2.345       # raises SyntaxError: ☢ is not a valid identifier character

☢ ** 2
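Rather than triggering a `SyntaxError`, you can ask Python whether a string would be a legal variable name with `str.isidentifier()`. Letters from any alphabet (such as Ω) are allowed; symbols such as ☢ are not:

```python
# str.isidentifier() reports whether a string is a valid Python identifier
checks = {
    "Ω": "Ω".isidentifier(),                         # True: Greek letters are allowed
    "☢": "\U00002622".isidentifier(),                # False: ☢ is a symbol, not a letter
    "radio_active": "radio_active".isidentifier(),   # True
    "2spam": "2spam".isidentifier(),                 # False: cannot start with a digit
}
print(checks)
```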

Watch out for variable types!



In [ ]:

    
n = 4

print("I would like " + n + " orders of spam")     # raises TypeError: a str cannot be concatenated with an int



In [ ]:

    
print("I would like " + str(n) + " orders of spam")

Use explicit formatting to avoid these errors

Python string formatting has the form:

`"{variable_index:format_type}".format(variable)`

Format types:

- `d` = integer decimal
- `f` = floating point decimal
- `e` = floating point exponential
- `g` = general floating point format (uses exponential notation if the exponent is less than -4, or greater than or equal to the precision, which defaults to 6)
- `s` = string
- `x` = hexadecimal
- `o` = octal
- `b` = binary
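The built-in `format()` function uses the same format specifications as the `{0:...}` fields in `str.format`, which makes it handy for trying out each type on its own. A quick sketch, including the two cases where `g` switches to exponential notation:

```python
A = 42
B = 1.23456

# Integer presentation types
print(format(A, 'd'), format(A, 'x'), format(A, 'o'), format(A, 'b'))  # 42 2a 52 101010

# Floating point presentation types (default precision is 6)
print(format(B, 'f'))   # 1.234560
print(format(B, 'e'))   # 1.234560e+00

# 'g' uses exponential notation for exponents below -4 or at least the precision (6)
print(format(0.0001, 'g'))      # 0.0001
print(format(0.00001, 'g'))     # 1e-05
print(format(123456.0, 'g'))    # 123456
print(format(1234567.0, 'g'))   # 1.23457e+06
```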



In [ ]:

    
A = 42
B = 1.23456
C = 1.23456e10
D = 'Forty Two'



In [ ]:

    
"I like the number {0:d}".format(A)



In [ ]:

    
"I like the number {0:s}".format(D)



In [ ]:

    
"The number {0:f} is fine, but not as cool as {1:d}".format(B,A)



In [ ]:

    
"The number {0:.3f} is fine, but not as cool as {1:d}".format(C,A)       # 3 places after decimal



In [ ]:

    
"The number {0:.3e} is fine, but not as cool as {1:d}".format(C,A)       # sci notation



In [ ]:

    
"{0:g} and {1:g} are the same format but different results".format(B,C)

Nice trick to convert a number to a different base



In [ ]:

    
"Representation of the number {1:s} - dec: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(A,D)

Formatting is way better than piecing strings together



In [ ]:

    
import pandas as pd



In [ ]:

    
planet_table = pd.read_csv('Planets.csv')



In [ ]:

    
for name, a in zip(planet_table['Name'], planet_table['a']):

    # Planets closer than 3 AU are labelled as part of the inner solar system
    if a < 3.0:
        place = "Inner"
    else:
        place = "Outer"

    my_string = ("The planet {0:s}, at a distance of {1:.1f} AU, is in the {2:s} solar system"
                 .format(name, a, place))

    print(my_string)

Really long strings



In [ ]:

    
long_string = (
"""
The planets {0:s} and {1:s} are at a distance
of {2:.1f} AU and {3:.1f} AU from the Sun.
"""
.format(planet_table['Name'][1],planet_table['Name'][4],
        planet_table['a'][1],planet_table['a'][4])
)



In [ ]:

    
print(long_string)

You can also use the `textwrap` module



In [ ]:

    
import textwrap



In [ ]:

    
lots_of_spam = (s + " ") * 100



In [ ]:

    
print(lots_of_spam)



In [ ]:

    
textwrap.wrap(lots_of_spam, width=70)
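`textwrap.wrap` returns a list of lines. If you want a single string ready to print, `textwrap.fill` is documented as shorthand for joining those lines with newlines. A small sketch:

```python
import textwrap

lots_of_spam = ("spam" + " ") * 100

# wrap() returns a list of lines; fill() joins them into one string with newlines
lines = textwrap.wrap(lots_of_spam, width=70)
filled = textwrap.fill(lots_of_spam, width=70)

print(filled == "\n".join(lines))         # True
print(max(len(line) for line in lines))   # never more than 70
print(filled)
```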

Working with strings



In [ ]:

    
line = "My hovercraft is full of eels"

Find and Replace



In [ ]:

    
line.replace('eels', 'wheels')
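The "find" half of find and replace uses a few other string methods: `in` tests for a substring, `find` returns the index of the first match (or -1), and `count` counts matches. `replace` also accepts an optional maximum number of replacements. A short sketch:

```python
line = "My hovercraft is full of eels"

# Membership test and searching for a substring
print('eels' in line)       # True
print(line.find('eels'))    # 25  (index of the first match)
print(line.find('spam'))    # -1  (not found; line.index('spam') would raise ValueError)
print(line.count('e'))      # 3

# replace() changes every match unless you pass a maximum count
print(line.replace('e', 'E'))      # My hovErcraft is full of EEls
print(line.replace('e', 'E', 1))   # My hovErcraft is full of eels
```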

Justification and Cleaning



In [ ]:

    
line.center(100)



In [ ]:

    
line.ljust(100)



In [ ]:

    
line.rjust(100, "*")



In [ ]:

    
line2 = "            My hovercraft is full of eels      "



In [ ]:

    
line2.strip()



In [ ]:

    
line3 = "*$*$*$*$*$*$*$*$My hovercraft is full of eels*$*$*$*$"



In [ ]:

    
line3.strip('*$')



In [ ]:

    
line3.lstrip('*$'), line3.rstrip('*$')
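One thing to watch: the argument to `strip` is treated as a *set* of characters, not as a fixed substring, so `strip('*$')` removes any run of `*` and `$` in any order. To remove an exact prefix or suffix, Python 3.9 and later provide `removeprefix` and `removesuffix`. A sketch (it assumes Python 3.9+):

```python
# strip() removes any combination of the given characters from both ends
print("spam".strip("mas"))          # p
print("*$*$eels$*".strip("*$"))     # eels

# removeprefix()/removesuffix() remove an exact substring (Python 3.9+)
print("spam_and_eggs".removeprefix("spam_"))   # and_eggs
print("spam_and_eggs".removesuffix("_eggs"))   # spam_and
print("mass".removeprefix("spam"))             # mass  (unchanged: not a prefix)
```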

Splitting and Joining



In [ ]:

    
line.split()



In [ ]:

    
'_*_'.join(line.split())



In [ ]:

    
' '.join(line.split()[::-1])
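`split()` with no argument splits on runs of whitespace. Passing an explicit separator changes the behavior: every separator counts, so empty fields are kept, which matters for comma-separated data. `maxsplit` limits the number of splits. A sketch with a made-up line of text:

```python
csv_line = "spam,eggs,,ham"

# With no argument, split() breaks on runs of whitespace and drops empty strings
print("  spam   eggs  ".split())            # ['spam', 'eggs']

# With an explicit separator, empty fields between separators are kept
print(csv_line.split(","))                  # ['spam', 'eggs', '', 'ham']

# maxsplit limits how many splits are made
print(csv_line.split(",", maxsplit=1))      # ['spam', 'eggs,,ham']
```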

Changing Case



In [ ]:

    
anotherline = "mY hoVErCRaft iS fUlL oF eEELS"



In [ ]:

    
anotherline.upper()



In [ ]:

    
anotherline.lower()



In [ ]:

    
anotherline.title()



In [ ]:

    
anotherline.capitalize()



In [ ]:

    
anotherline.swapcase()

Regular Expressions in Python (`re`)



In [ ]:

    
import re



In [ ]:

    
myline = "This is a test, this is only a test."



In [ ]:

    
print(myline)

Raw strings begin with the prefix `r` and tell Python not to interpret backslash escape sequences in the string, so the backslashes are passed through unchanged to the regular expression engine.
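Here is why the `r` prefix matters. In a normal string `"\b"` is a single backspace character, but in a raw string it stays as backslash plus `b`, which the regex engine reads as a word boundary. A short sketch:

```python
import re

# Normal string: "\b" is one backspace character. Raw string: two characters.
print(len("\b"), len(r"\b"))    # 1 2

line = "This is a test, this is only a test."

print(re.findall(r"is", line))       # ['is', 'is', 'is', 'is']  (also inside "This" and "this")
print(re.findall(r"\bis\b", line))   # ['is', 'is']  only the whole word "is"
print(re.findall("\bis\b", line))    # []  the pattern contains backspaces, which never match
```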



In [ ]:

    
regex1 = r"test"



In [ ]:

    
match1 = re.search(regex1, myline)



In [ ]:

    
match1



In [ ]:

    
myline[10:14]
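The slice above works because the match object returned by `re.search` records where the match is. Rather than typing the indices by hand, you can ask the match object directly. Note that `re.search` returns `None` when nothing matches, so it is worth checking before using the result:

```python
import re

myline = "This is a test, this is only a test."
match1 = re.search(r"test", myline)

# A match object records what matched and where
print(match1.group())   # test
print(match1.span())    # (10, 14)
print(myline[match1.start():match1.end()])   # test

# re.search() returns None when there is no match
no_match = re.search(r"spam", myline)
print(no_match is None)   # True
```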



In [ ]:

    
match3 = re.findall(regex1, myline)



In [ ]:

    
match3

One of the most useful things about regular expressions in Python is using them to search and replace parts of a string (`re.sub`).



In [ ]:

    
mynewline = re.sub(regex1, "*TEST*", myline)



In [ ]:

    
mynewline
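`re.findall` and `re.sub` also take optional arguments. `flags=re.IGNORECASE` makes the match case-insensitive, and `count` limits how many replacements `re.sub` makes. A sketch:

```python
import re

myline = "This is a test, this is only a test."

# Without flags the search is case-sensitive
print(re.findall(r"this", myline))                         # ['this']
print(re.findall(r"this", myline, flags=re.IGNORECASE))    # ['This', 'this']

# count limits how many replacements re.sub() makes
first_only = re.sub(r"test", "*TEST*", myline, count=1)
print(first_only)   # This is a *TEST*, this is only a test.
```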

RegEx Golf!



In [ ]:

    
with open("./GOLF/golf_00") as golf:    # the with-block closes the file when done
    golf_file = golf.read().splitlines()



In [ ]:

    
golf_file



In [ ]:

    
for i in golf_file:
    print(i)



In [ ]:

    
def regex_test_list(mylist, myregex):
    """Print each line in mylist followed by YES if myregex matches it, NOPE otherwise."""

    for line in mylist:

        mytest = re.search(myregex, line)

        if mytest:
            print(line + " YES")
        else:
            print(line + " NOPE")



In [ ]:

    
regex = r"one"



In [ ]:

    
regex_test_list(golf_file, regex)



In [ ]:

    
regex = r"t|n"



In [ ]:

    
regex_test_list(golf_file, regex)
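The `|` in `r"t|n"` means OR: the pattern matches a line containing a `t` or an `n` anywhere. Two other metacharacters that come up quickly in regex golf are the anchors `^` (start of the string) and `$` (end of the string). A sketch using a hypothetical word list (not the contents of `golf_00`):

```python
import re

# Hypothetical word list, just for illustration
words = ["one", "two", "three", "ten", "six"]

# "|" means OR: match a "t" or an "n" anywhere in the string
has_t_or_n = [w for w in words if re.search(r"t|n", w)]
print(has_t_or_n)       # ['one', 'two', 'three', 'ten']

# "^" and "$" anchor the match to the start and end of the string
starts_with_t = [w for w in words if re.search(r"^t", w)]
ends_with_e = [w for w in words if re.search(r"e$", w)]
print(starts_with_t)    # ['two', 'three', 'ten']
print(ends_with_e)      # ['one', 'three']
```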

Working with Files and Directories (OS agnostic)

The `os` module lets you work with files and directories without worrying about which operating system you are using



In [ ]:

    
import os



In [ ]:

    
os.chdir("./MyData")



In [ ]:

    
my_data_dir = os.listdir()



In [ ]:

    
my_data_dir



In [ ]:

    
for file in my_data_dir:
    
    if file.endswith(".txt"):
        print(file)



In [ ]:

    
for file in my_data_dir:
    
    if file.endswith(".txt"):
        print(os.path.abspath(file))
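`os.chdir` changes the working directory for the rest of the session, so re-running that cell fails. An alternative is to pass the directory to `os.listdir` and build paths with `os.path.join`, which inserts the right separator for the current OS. This sketch creates a throwaway directory with `tempfile` (standing in for `./MyData`) and uses `os.path.splitext` to pick out `.txt` files:

```python
import os
import tempfile

# Throwaway directory, standing in for ./MyData in this example
with tempfile.TemporaryDirectory() as data_dir:
    for name in ["a.txt", "b.csv", "c.txt"]:
        # os.path.join builds the path with the correct separator for this OS
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("spam\n")

    # List the directory by path instead of changing into it
    txt_files = sorted(
        name for name in os.listdir(data_dir)
        if os.path.splitext(name)[1] == ".txt"
    )
    print(txt_files)   # ['a.txt', 'c.txt']
```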

You can also find files with `glob`



In [ ]:

    
import glob



In [ ]:

    
my_files = glob.glob('02_*.fits')



In [ ]:

    
my_files



In [ ]:

    
for file in my_files:
    
    file_size = os.stat(file).st_size
    out_string = "The file {0} has a size of {1} bytes".format(file,file_size)
    
    print(out_string)
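`glob.glob` returns matches in arbitrary order, so it is usually worth sorting the result. The pattern can also include a directory built with `os.path.join`, and `os.path.getsize` is a shortcut for `os.stat(file).st_size`. A self-contained sketch with hypothetical file names created in a temporary directory:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical file names, created only for this example
    for name in ["02_b.fits", "02_a.fits", "03_a.fits"]:
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(b"\x00" * 2880)

    # Sort the matches so the output order is repeatable
    my_files = sorted(glob.glob(os.path.join(data_dir, "02_*.fits")))
    names = [os.path.basename(path) for path in my_files]

    for path in my_files:
        file_size = os.path.getsize(path)   # same value as os.stat(path).st_size
        print("The file {0} has a size of {1} bytes".format(os.path.basename(path), file_size))
```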