Strings and Stuff in Python


In [ ]:
import numpy as np

Strings are sequences of characters that you can index and slice like arrays


In [ ]:
s = 'spam'

s,len(s),s[0],s[0:2]

In [ ]:
s[::-1]
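Indexing and slicing work as they do for lists and arrays, but a string cannot be changed in place. A minimal sketch (the variable names are only for illustration):

```python
s = 'spam'

# Negative indices count from the end; slices return new strings
last_char = s[-1]        # 'm'
middle = s[1:3]          # 'pa'

# Strings are immutable, so item assignment raises a TypeError
try:
    s[0] = 'S'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Build a new string instead of modifying the old one
capital_s = 'S' + s[1:]  # 'Spam'
```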

Arithmetic with Strings


In [ ]:
s = 'spam'
e = "eggs"

s + e

In [ ]:
s + " " + e

In [ ]:
4 * (s + " ") + e

In [ ]:
print(4 * (s + " ") + s + " and\n" + e)     # use \n to get a newline with the print function

You can compare strings


In [ ]:
"spam" == "good"

In [ ]:
"spam" != "good"

In [ ]:
"spam" == "spam"

In [ ]:
"sp" < "spam"

In [ ]:
"spam" < "eggs"

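The ordering comparisons above go character by character and compare code points, which is why "spam" is not less than "eggs". A small sketch of the consequences (the word list is made up for illustration):

```python
# Each character is compared by its code point
code_s, code_e = ord('s'), ord('e')      # 115 and 101

spam_before_eggs = "spam" < "eggs"       # False: 's' comes after 'e'
prefix_first = "sp" < "spam"             # True: a prefix sorts before the longer string
upper_first = "Zebra" < "apple"          # True: 'Z' (90) comes before 'a' (97)

# For a case-insensitive ordering, sort on lowercased copies
words = sorted(["spam", "Eggs", "bacon"], key=str.lower)
```
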
Python supports Unicode characters

You can enter Unicode characters directly from the keyboard (how depends on your operating system), or you can use the character's Unicode code point in an escape sequence.

Code points are listed in the Unicode character code charts.

For example, the code point for the Greek capital letter omega is U+03A9, so you can create the character with \U000003A9


In [ ]:
print("This resistor has a value of 100 k\U000003A9")
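There are several equivalent ways to write the same character. The sketch below uses only the standard library; the variable names are illustrative:

```python
import unicodedata

omega_long = "\U000003A9"                        # \U takes 8 hex digits
omega_short = "\u03A9"                           # \u takes 4 hex digits
omega_named = "\N{GREEK CAPITAL LETTER OMEGA}"   # official Unicode name
omega_chr = chr(0x03A9)                          # from the integer code point

code_point = hex(ord(omega_long))                # '0x3a9'
name = unicodedata.name(omega_long)              # 'GREEK CAPITAL LETTER OMEGA'
```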

In [ ]:
Ω = 1e3

Ω + np.pi

Emoji are Unicode characters, so you can use them as well (not all OSs will show all characters!)


In [ ]:
radio_active = "\U00002622"
wink = "\U0001F609"

print(radio_active + wink)

Emoji cannot be used as variable names (at least not yet ...)


In [ ]:
😉 = 2.345     # SyntaxError: an emoji is not a valid identifier character

😉 ** 2

Watch out for variable types!


In [ ]:
n = 4

print("I would like " + n + " orders of spam")     # raises a TypeError: n is an int, not a str

In [ ]:
print("I would like " + str(n) + " orders of spam")
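Python does not convert numbers to strings implicitly when you use +. A short sketch that catches the error instead of stopping the notebook (the names message and fixed are illustrative):

```python
n = 4

# Adding a str and an int raises a TypeError
try:
    message = "I would like " + n + " orders of spam"
except TypeError as err:
    message = None
    error_text = str(err)

# Converting explicitly works
fixed = "I would like " + str(n) + " orders of spam"
```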

Use explicit formatting to avoid these errors

Python string formatting has the form:

"{Variable Index:Format Type}".format(Variable)

Format Types:

  • d = Integer decimal
  • f = Floating point decimal
  • e = Floating point exponential
  • g = Floating point general format (uses exponential format if the exponent is less than -4 or not less than the precision, which is 6 by default)
  • s = String
  • x = hex
  • o = octal
  • b = binary

In [ ]:
A = 42
B = 1.23456
C = 1.23456e10
D = 'Forty Two'

In [ ]:
"I like the number {0:d}".format(A)

In [ ]:
"I like the number {0:s}".format(D)

In [ ]:
"The number {0:f} is fine, but not as cool as {1:d}".format(B,A)

In [ ]:
"The number {0:.3f} is fine, but not as cool as {1:d}".format(C,A)       # 3 places after decimal

In [ ]:
"The number {0:.3e} is fine, but not a cool as {1:d}".format(C,A)       # sci notation

In [ ]:
"{0:g} and {1:g} are the same format but different results".format(B,C)

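The g format switches between fixed and exponential notation depending on the size of the number. With the default precision of 6, the switch happens like this (example values chosen for illustration):

```python
small = "{0:g}".format(0.0001)       # '0.0001'       exponent -4: fixed notation
smaller = "{0:g}".format(0.00001)    # '1e-05'        exponent -5: exponential
large = "{0:g}".format(123456.0)     # '123456'       exponent 5: fixed notation
larger = "{0:g}".format(1234567.0)   # '1.23457e+06'  exponent 6: exponential
```
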
A nice trick to convert a number to a different base


In [ ]:
"Representation of the number {1:s} - dec: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(A,D)

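The same format codes also work in f-strings (Python 3.6 and later), where the expression goes inside the braces. This is an alternative to .format(), sketched here with the same values as above:

```python
A = 42
B = 1.23456
D = 'Forty Two'

# Same format codes, with the variable name in place of the index
as_float = f"The number {B:.3f} is fine, but not as cool as {A:d}"
bases = f"{D:s} - dec: {A:d};  hex: {A:x};  oct: {A:o};  bin: {A:b}"

# Width and alignment work too: right-align in 8 columns
padded = f"{A:>8d}"
```
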
Formatting is way better than piecing strings together


In [ ]:
import pandas as pd

In [ ]:
planet_table = pd.read_csv('Planets.csv')

In [ ]:
for idx,val in enumerate(planet_table['Name']):
    
    a = planet_table['a'][idx]
    
    if (a < 3.0):
        Place = "Inner"
        
    else:
        Place = "Outer"
    
    my_string = ("The planet {0:s}, at a distance of {1:.1f} AU, is in the {2:s} solar system"
                .format(val,a,Place))
   
    print(my_string)
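If you do not have Planets.csv at hand, the same loop can be tried on a plain list. The three entries below are an illustrative stand-in for the table, not its actual contents:

```python
# Stand-in data: (name, semi-major axis in AU)
planets = [("Mercury", 0.39), ("Earth", 1.00), ("Jupiter", 5.20)]

sentences = []
for name, a in planets:
    # A conditional expression replaces the if/else block
    place = "Inner" if a < 3.0 else "Outer"
    sentences.append(
        "The planet {0:s}, at a distance of {1:.1f} AU, is in the {2:s} solar system"
        .format(name, a, place)
    )

print("\n".join(sentences))
```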

Really long strings


In [ ]:
long_string = (
"""
The planets {0:s} and {1:s} are at a distance
of {2:.1f} AU and {3:.1f} AU from the Sun.
"""
.format(planet_table['Name'][1],planet_table['Name'][4],
        planet_table['a'][1],planet_table['a'][4])
)

In [ ]:
print(long_string)

You can also use the textwrap module


In [ ]:
import textwrap

In [ ]:
lots_of_spam = (s + " ") * 100

In [ ]:
print(lots_of_spam)

In [ ]:
textwrap.wrap(lots_of_spam, width=70)
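textwrap.wrap returns a list of lines. If you want a single string ready for print, textwrap.fill does the joining for you. A short sketch:

```python
import textwrap

lots_of_spam = ("spam" + " ") * 100

# wrap() returns a list of lines; fill() joins them with newlines
wrapped_lines = textwrap.wrap(lots_of_spam, width=70)
wrapped_text = textwrap.fill(lots_of_spam, width=70)

print(wrapped_text)
```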

Working with strings


In [ ]:
line = "My hovercraft is full of eels"

Find and Replace


In [ ]:
line.replace('eels', 'wheels')

Justification and Cleaning


In [ ]:
line.center(100)

In [ ]:
line.ljust(100)

In [ ]:
line.rjust(100, "*")

In [ ]:
line2 = "            My hovercraft is full of eels      "

In [ ]:
line2.strip()

In [ ]:
line3 = "*$*$*$*$*$*$*$*$My hovercraft is full of eels*$*$*$*$"

In [ ]:
line3.strip('*$')

In [ ]:
line3.lstrip('*$'), line3.rstrip('*$')
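Watch out: the argument to strip is a set of characters to remove from the ends, not an exact prefix or suffix. A sketch of the difference (the strings are made up for illustration):

```python
url = "www.example.com"

# Every leading/trailing character found in "w.com" is removed
stripped = url.strip("w.com")        # 'example'

# To remove an exact suffix, check for it and slice it off
filename = "spam.txt"
if filename.endswith(".txt"):
    stem = filename[:-len(".txt")]   # 'spam'
```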

Splitting and Joining


In [ ]:
line.split()

In [ ]:
'_*_'.join(line.split())

In [ ]:
' '.join(line.split()[::-1])
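split also takes an explicit separator and a maximum number of splits, and it behaves differently with and without an argument. A small sketch with a made-up record:

```python
record = "spam,eggs,,bacon"

fields = record.split(",")             # ['spam', 'eggs', '', 'bacon']
head_and_rest = record.split(",", 1)   # ['spam', 'eggs,,bacon']

# With no argument, runs of whitespace count as one separator
words = "  spam   eggs ".split()       # ['spam', 'eggs']

# join() undoes split() when the separator is fixed
rebuilt = ",".join(fields)             # 'spam,eggs,,bacon'
```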

Line Formatting


In [ ]:
anotherline = "mY hoVErCRaft iS fUlL oF eEELS"

In [ ]:
anotherline.upper()

In [ ]:
anotherline.lower()

In [ ]:
anotherline.title()

In [ ]:
anotherline.capitalize()

In [ ]:
anotherline.swapcase()
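Two details worth knowing: title() treats an apostrophe as a word boundary, and lowercasing both sides is the usual way to compare strings without regard to case. A sketch with illustrative strings:

```python
title_quirk = "they're full of eels".title()     # "They'Re Full Of Eels"
capitalized = "mY hoVErCRaft".capitalize()       # 'My hovercraft'

# Compare without regard to case by normalizing both sides
same_word = "EELS".casefold() == "eels".casefold()   # True
```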

Regular Expressions in Python (re)


In [ ]:
import re

In [ ]:
myline = "This is a test, this is only a test."

In [ ]:
print(myline)

Raw strings begin with a special prefix (r) and signal Python not to interpret backslashes and other special metacharacters in the string, allowing you to pass them through directly to the regular expression engine.
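The difference is easy to see with a backslash sequence. In this sketch the sample text and names are illustrative:

```python
import re

normal = "\n"      # one character: a newline
raw = r"\n"        # two characters: a backslash and the letter n

lengths = (len(normal), len(raw))    # (1, 2)

# In a regex, r"\bis\b" matches the whole word "is". Without the r prefix,
# "\b" would be a backspace character instead of a word boundary.
whole_words = re.findall(r"\bis\b", "This is a test")   # ['is']
```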


In [ ]:
regex1 = r"test"

In [ ]:
match1 = re.search(regex1, myline)

In [ ]:
match1

In [ ]:
myline[10:14]

In [ ]:
match3 = re.findall(regex1, myline)

In [ ]:
match3
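re.search returns a match object (or None), while re.findall returns only the matched text. To get the position of every match, re.finditer is one option. A sketch on a similar test sentence:

```python
import re

text = "This is a test, this is only a test."

match = re.search(r"test", text)
first_span = match.span()        # (10, 14)
first_text = match.group()       # 'test'

# finditer() yields a match object for every occurrence
all_spans = [m.span() for m in re.finditer(r"test", text)]   # [(10, 14), (31, 35)]

# search() returns None when nothing matches
no_match = re.search(r"spam", text)
```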

One of the useful things about regular expressions in Python is using them to search and replace parts of a string (re.sub)


In [ ]:
mynewline = re.sub(regex1, "*TEST*", myline)

In [ ]:
mynewline
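re.sub takes a few optional arguments that are handy in practice. The sketch below shows count, flags, and the related re.subn on a similar test sentence (variable names are illustrative):

```python
import re

text = "This is a test, this is only a test."

# Replace only the first occurrence
first_only = re.sub(r"test", "*TEST*", text, count=1)

# Match "this" regardless of case
replaced_this = re.sub(r"this", "That", text, flags=re.IGNORECASE)

# subn() also reports how many replacements were made
new_text, n_subs = re.subn(r"test", "*TEST*", text)
```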

RegEx Golf!


In [ ]:
with open("./GOLF/golf_00") as f:           # the with block closes the file for you
    golf_file = f.read().splitlines()

In [ ]:
golf_file

In [ ]:
for i in golf_file:
    print(i)

In [ ]:
def regex_test_list(mylist, myregex):
    
    for line in mylist:
        
        mytest = re.search(myregex, line)
        
        if (mytest):
            print(line + " YES")
        else:
            print(line + " NOPE")

In [ ]:
regex = r"one"

In [ ]:
regex_test_list(golf_file, regex)

In [ ]:
regex = r"t|n"

In [ ]:
regex_test_list(golf_file, regex)
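A variant that returns the matching lines instead of printing them makes it easier to check a pattern. The word list here is hypothetical and only stands in for the contents of a golf file, and regex_matches is an illustrative helper name:

```python
import re

# Hypothetical word list standing in for the contents of a golf file
words = ["one", "two", "three", "four"]

def regex_matches(mylist, myregex):
    """Return the entries of mylist that myregex matches anywhere."""
    pattern = re.compile(myregex)
    return [line for line in mylist if pattern.search(line)]

matches_one = regex_matches(words, r"one")      # ['one']
matches_t_or_n = regex_matches(words, r"t|n")   # ['one', 'two', 'three']
anchored = regex_matches(words, r"^t")          # ['two', 'three']
```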

Working with Files and Directories (OS agnostic)

  • The os package allows you to do operating system stuff without worrying about what system you are using

In [ ]:
import os

In [ ]:
os.chdir("./MyData")

In [ ]:
my_data_dir = os.listdir()

In [ ]:
my_data_dir

In [ ]:
for file in my_data_dir:
    
    if file.endswith(".txt"):
        print(file)

In [ ]:
for file in my_data_dir:
    
    if file.endswith(".txt"):
        print(os.path.abspath(file))
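You can also list a directory without changing into it, and build paths with os.path.join so the separator is right on any OS. This sketch creates a throwaway directory with made-up file names so it runs anywhere:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Create a few empty files to list (hypothetical names)
    for name in ("notes.txt", "image.fits", "log.txt"):
        open(os.path.join(data_dir, name), "w").close()

    # listdir() takes the directory as an argument, so no chdir() is needed
    txt_files = sorted(f for f in os.listdir(data_dir) if f.endswith(".txt"))

    # join() builds the path with the right separator for the current OS
    txt_paths = [os.path.join(data_dir, f) for f in txt_files]
    all_exist = all(os.path.isfile(p) for p in txt_paths)
```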

You can also find files with glob


In [ ]:
import glob

In [ ]:
my_files = glob.glob('02_*.fits')

In [ ]:
my_files

In [ ]:
for file in my_files:
    
    file_size = os.stat(file).st_size
    out_string = "The file {0} has a size of {1} bytes".format(file,file_size)
    
    print(out_string)
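The same idea works on files in another directory if you put the directory into the glob pattern. The files below are created on the fly with made-up names and sizes, and os.path.getsize is used as a shortcut for os.stat(...).st_size:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical files: two match the pattern, one does not
    for name, n_bytes in (("02_a.fits", 10), ("02_b.fits", 20), ("03_c.fits", 30)):
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(b"\0" * n_bytes)

    pattern = os.path.join(data_dir, "02_*.fits")
    sizes = {os.path.basename(p): os.path.getsize(p)
             for p in sorted(glob.glob(pattern))}

for name, size in sizes.items():
    print("The file {0} has a size of {1} bytes".format(name, size))
```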