Strings, files combined with lists, dicts and sets with inspection

T.N.Olsthoorn, Feb 27, 2017

Last week we focused on the essential container types in Python, i.e. tuples, lists, dicts and sets. Strings are another such tool, but they become really effective when combined with reading files and with lists, dicts and sets. That's why we postponed this subject to this week, which gives us more room to work with them.

We've already mentioned that strings are immutable sequences of characters. So we can't replace individual characters in place, change them from uppercase to lowercase, or append characters. We can, however, replace an old string by a new one and apply the required changes while doing so. Immutability is therefore rarely a problem. On the contrary, it allows strings to be used as keys in dicts, which is a game-changing advantage.
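The two points above can be sketched as follows. This is only an illustration; the names `word` and `counts` are made up for this example. Item assignment on a string fails, building a new string works, and strings can serve as dict keys.

```python
word = 'python'

# Item assignment fails because strings are immutable.
try:
    word[0] = 'P'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Instead, build a new string and bind the name to it.
word = 'P' + word[1:]

# Immutable objects are hashable, so strings can be used as dict keys.
counts = {}
for w in ['well', 'river', 'well']:
    counts[w] = counts.get(w, 0) + 1

print(assignment_worked)  # False
print(word)               # Python
print(counts)
```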

Let's see some strings:



In [172]:

    
from pprint import pprint



In [82]:

    
s1 = 'This is a string'
s2 = "This too is a string; the `quotes` don't matter as long as you are consistent, and you can use quotes inside quotes"
s3 = """This is a multiline
string, mostly used for doc
strings in functions and classes
"""

print(s1)
print(s2)
print()
print(s3)













This is a string
This too is a string; the `quotes` don't matter as long as you are consistent, and you can use quotes inside quotes

This is a multiline
string, mostly used for doc
strings in functions and classes

The escape character \ (backslash) introduces special characters that cannot otherwise be typed in a string, like the newline \n and the tab \t. There are many more, but these two are the most important. To prevent the backslash from being interpreted as the start of a special character, use a double backslash \\.

This is often necessary when typing or using strings that represent directories in Windows.

For example C:\users\system\python\IHEcourse



In [190]:

    
print("This prints with tabs \t\t and newlines\n\n")
print("A windows directory: C\\users\\system\\python\\IHEcourse")









    



This prints with tabs 		 and newlines


A windows directory: C\users\system\python\IHEcourse

If you don't want the \ to be interpreted by Python, use "raw" strings, which you get by putting the lowercase letter r in front of the string.



In [191]:

    
print(r"This prints with tabs \t\t and newlines\n\n")
print(r"A windows directory: C\\users\\system\\python\\IHEcourse")









    



This prints with tabs \t\t and newlines\n\n
A windows directory: C\\users\\system\\python\\IHEcourse
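A small sketch to compare the two notations. The names `normal` and `raw` are chosen for this illustration only. It shows that a normal string with doubled backslashes and a raw string with single backslashes hold the same characters, and that "\n" is one character while r"\n" is two. Note that "C:\users" written as a normal string would not even compile in Python 3, because \u starts a Unicode escape.

```python
normal = "C:\\users\\system"   # escape processing on, so backslashes are doubled
raw = r"C:\users\system"       # raw string, backslashes are taken literally

print(normal == raw)           # True
print(len("\n"), len(r"\n"))   # 1 2
print(raw)
```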

String addition with + is concatenation and multiplication with * means multiple concatenation:



In [19]:

    
"This is " + 'a `so-called` concatenated' + ' ' +  "string" + (', ha') *3 + '!'









    Out[19]:





'This is a `so-called` concatenated string, ha, ha, ha!'

Or bind the string to a variable, such as s, and print it:



In [20]:

    
s = "This is " + 'a `so-called` concatenated' + ' ' +  "string" + (', ha') *3 + '!'
print(s)









    



This is a `so-called` concatenated string, ha, ha, ha!

Strings can contain replacement fields { } and can then be formatted with the method format(...).

In the example below, the values in myList are placed in the replacement fields { }. Before doing this, a string is built like so

" {}," * len(myList)

to get the string with the required number of such fields. The list is then unpacked into separate arguments with *myList.



In [13]:

    
myList = [3.0, 2.279345, 1.9823, -3.4, 1e3]

(" {}," * len(myList)).format(*myList)









    Out[13]:





' 3.0, 2.279345, 1.9823, -3.4, 1000.0,'
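As an alternative sketch (not the approach used in the cell above), the same list can be turned into a string with join(), which puts the separator only between the items and so avoids the trailing comma. The names `joined` and `fixed` are made up for this example.

```python
myList = [3.0, 2.279345, 1.9823, -3.4, 1e3]

# join() needs strings, so convert each value first.
joined = ", ".join(str(v) for v in myList)
print(joined)   # 3.0, 2.279345, 1.9823, -3.4, 1000.0

# The same idea, with each value formatted to two decimals.
fixed = ", ".join("{:.2f}".format(v) for v in myList)
print(fixed)    # 3.00, 2.28, 1.98, -3.40, 1000.00
```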

Notice that format() is a method of the string class. It's used intensively in print statements to put values into the strings to be printed. There are many options to control how values are shown once they have been placed in these fields; format() is said to have its own mini-language. You'll get acquainted with it along the way, but it is very useful to read the documentation, or at least to be aware of it.

Some examples for using the replacement fields:



In [72]:

    
from math import pi, e


print("Just using the replacement fields:\n{0}, {1}, {1}, {1}, {2}, {2}, {2}\n".format(2, pi, e))
print("The number in them is the index of the parameter in the format list.\n")
print("Just using the replacement fields, with a different order of the printed variables:\n\
            {2}, {0}, {2}, {1}, {0}, {2}, {1}\n".format(2, pi, e))
print("You don't need the variable number specifier if you use the order of the variables in the format")
print("{}, {}, {}, {}\n".format(pi, e, 314, pi/e))


print("Using d, f, e, and g format specifiers:\n{0:d}, {1:.2f}, {1:.4f}, {1:.2e}, {2:.5e}, {2:.3g}, {2:.5g}\n".format(2, pi, e))
print("Using d, f, e, and g format specifiers with field width:\n\
    {0:5d}, {1:8.2f}, {1:8.4f}, {1:10.2e}, {2:10.5e}, {2:10.3g}, {2:10.5g}\n".format(2, pi, e))

print('d format is integer (whole number), with field width specified\n\
            {0:d}, {0:4d}, {0:10d}\n'.format(314))
print('f format is floating point with field width and decimals specified\n\
            {0:10.0f}, {0:10.2f}, {0:10.6f}\n'.format(pi))
print('e format is floating scientific form with field width and decimals specified\n\
            {0:10.0e}, {0:10.2e}, {0:10.6e}\n'.format(pi))
print('g format is floating general form with field width and significant digits specified\n\
            {0:10.0g}, {0:10.2g}, {0:10.6g}\n'.format(pi))

print('You can set the alignment within the specified field width\n\
            {0:>10.0g}, {0:<10.2g}, {0:<10.6g}\n'.format(pi))

print('Pad integers with leading zeros:\n\
            {0:4d}, {0:04d}, {0:10d}, {0:010d}\n'.format(314))

print('You don\'t even need the `d` when printing integers:\n\
            {0:4}, {0:04}, {0:10}, {0:010}\n'.format(314))

print('The most general replacement is with strings, using s-format:\n\
   {0:s}, {0:10s}, {0:<10s}, {0:>10s}\n\
   you may also here drop the letter s of the format:\n\
   {0}, {0:10}, {0:<10}, {0:>10}'.format('Hello!'))













Just using the replacement fields:
2, 3.141592653589793, 3.141592653589793, 3.141592653589793, 2.718281828459045, 2.718281828459045, 2.718281828459045

The number in them is the index of the parameter in the format list.

Just using the replacement fields, with a different order of the printed variables:
            2.718281828459045, 2, 2.718281828459045, 3.141592653589793, 2, 2.718281828459045, 3.141592653589793

You don't need the variable number specifier if you use the order of the variables in the format
3.141592653589793, 2.718281828459045, 314, 1.1557273497909217

Using d, f, e, and g format specifiers:
2, 3.14, 3.1416, 3.14e+00, 2.71828e+00, 2.72, 2.7183

Using d, f, e, and g format specifiers with field width:
        2,     3.14,   3.1416,   3.14e+00, 2.71828e+00,       2.72,     2.7183

d format is integer (whole number), with field width specified
            314,  314,        314

f format is floating point with field width and decimals specified
                     3,       3.14,   3.141593

e format is floating scientific form with field width and decimals specified
                 3e+00,   3.14e+00, 3.141593e+00

g format is floating general form with field width and significant digits specified
                     3,        3.1,    3.14159

You can set the alignment within the specified field width
                     3, 3.1       , 3.14159   

Pad integers with leading zeros:
             314, 0314,        314, 0000000314

You don't even need the `d` when printing integers:
             314, 0314,        314, 0000000314

The most general replacement is with strings, using s-format:
   Hello!, Hello!    , Hello!    ,     Hello!
   you may also here drop the letter s of the format:
   Hello!, Hello!    , Hello!    ,     Hello!
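Field widths and alignment are most useful when printing tables. The sketch below uses invented data (`rows` with a name, a depth and a count) purely for illustration; the format specifiers themselves are the ones shown above.

```python
# Hypothetical data: (name, depth, count)
rows = [('well_1', 12.5, 3), ('well_22', 7.25, 14), ('w3', 103.0, 120)]

# Left-align the text column, right-align the numeric columns.
header = "{:<10s}{:>10s}{:>8s}".format('name', 'depth', 'count')
lines = [header, '-' * len(header)]
for name, depth, count in rows:
    lines.append("{:<10s}{:>10.2f}{:>8d}".format(name, depth, count))

table = "\n".join(lines)
print(table)
```

Because every field has a fixed width, all lines have the same length and the columns line up.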

String indexing and slicing

We can index and slice strings to get parts of them. The following is one more compound example, combining +, slices of the string s built above, and a replacement field:



In [34]:

    
s1 = "ok, according to {}, this is '" + s[8:20] + s[34:41] + "'?"
print(s1.format('you'))









    



ok, according to you, this is 'a `so-called string'?

Use slicing with a negative step size to get a reversed copy of the string:



In [75]:

    
print(s[::-1])









    



!ah ,ah ,ah ,gnirts detanetacnoc `dellac-os` a si sihT
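A few more slices, as a minimal sketch on a short string of our own (the function `is_palindrome` is a made-up helper, not part of the course code). A slice is written [start:stop:step], where each part may be omitted and negative indices count from the end.

```python
s = "This is a string"

first_word = s[:4]       # from the start up to, not including, index 4
last_word = s[-6:]       # the last six characters
every_second = s[::2]    # every second character

def is_palindrome(text):
    """Return True if the letters of text read the same in both directions."""
    letters = [c.lower() for c in text if c.isalpha()]
    return letters == letters[::-1]

print(first_word, last_word, every_second)
print(is_palindrome("A man, a plan, a canal: Panama"))
print(is_palindrome(s))
```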

String methods

The string class has a number of useful and important methods associated with it, which can be inspected in the notebook by typing a dot immediately after the string and pressing the Tab key. You can then scroll up and down through the list of available attributes and press Return to accept one, or type a question mark after it to see its docstring.



In [84]:

    
s1.upper?



In [86]:

    
# dir(s1) shows all the attributes of s1 (in fact of the class str)
[k for k in dir(s1) if not k.startswith('_')] # use this comprehension to see only the public ones









    Out[86]:





['capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
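The same kind of inspection also works from plain Python code. The sketch below is only an illustration with names of our own (`text`, `checks`, `results`). It selects the methods whose name starts with `is`, looks each one up with getattr() and calls it on a sample string. It also prints the first line of a docstring, which is what the question mark shows in the notebook.

```python
text = "Python3"

# All public test methods of str have names starting with 'is'.
checks = [name for name in dir(text) if name.startswith('is')]

# getattr(text, name) returns the bound method; the () calls it.
results = {name: getattr(text, name)() for name in checks}

for name in checks:
    print("{:<14s}{}".format(name, results[name]))

# The docstring is available as the __doc__ attribute.
print(str.upper.__doc__.splitlines()[0])
```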

Some examples:



In [188]:

    
s = "This is a string with Upper and lower case characters"
print("=" * 80)
print("Below the results of applying about all methods of str:\n")
print(s.capitalize()) # make the first character uppercase and the rest lowercase
print(s.casefold()) # returns a copy of s suitable for caseless comparisons
print(s.center(70, '#')) # return s centered in a string of the given width, padded with the fill character
print('How often does the letter `a` occur in s? ', s.count('a')) # number of times the given substring is found in s
print('Does s end with `ters` ?       ', s.endswith('ters'))
print("Does s start with `This is a` ?", s.startswith('This is a'))
print("\tThis\tis\ta\tstring\twith\ttabs\tinstead\tof\ta\tspace.\t\t".expandtabs()) # return a copy where all tab characters are expanded using spaces
print('The word `Upper` is found at position {}'.format(s.find('Upper'))) # return the lowest index in s where the substring is found
print('The word `Upper` is found at position {}'.format(s.index('Upper', 10, -1))) # like s.find() but raises ValueError when the substring is not found
print(s.split(' '))  # splits s at the specified separator, a space in this case; yields a list of strings
print('^'.join(s.split(' '))) # joins a list of strings, putting the specified character between the words
print(s.lower()) # lowercase copy of s
print(s.upper()) # uppercase copy of s
print(s.replace('characters', 'alphanumeric symbols'))
print(s.title(), '  (All words now capitalized)') # returns a copy with all words capitalized
print(s.title().swapcase(), '  (Case first titled and then swapped)') # title case, then the case of every character swapped
s2 = " \t  string with\twhitespace\t "
print("'" + s2 + "'", "(String with whitespace, (with tabs and spaces))")
print("'" + s2.strip() + "'", "(Left and right whitespace removed)")  # whitespace removed on both sides
print("'" + s2.lstrip() + "'", '(Left whitespace removed)') # left whitespace removed
print("'" + s2.rstrip() + "'", "(Right whitespace removed)") # right whitespace removed
print("'" + s2.ljust(40, '=') + "'", "(str is left justified and padded with '=')")
print("'" + s2.rjust(40, '+') + "'", "(str is right justified and padded with '+')")
s2 = " 'a a a a a"
print('First index of `{}` in `{}` is {}'.format('a', s2, s2.find('a')))
print('Last  index of `{}` in `{}` is {}'.format('a', s2, s2.rfind('a')))
s3 = 'This/is/the/day and That\\was\\yesterday'
print(s3.partition('/'))  # split at the first occurrence of the separator; returns (part before, separator, part after)
print('First part is `{}`, separator is `{}` and last part is `{}`'.format(*s3.partition('/')))

print(s3.rpartition('\\'))  # the same, but split at the last occurrence of the separator
print('First part is `{}`, separator is `{}` and last part is `{}`'.format(*s3.rpartition('\\')))

# format_map is like format but takes its values from a dict whose keys are used in the replacement fields
print()
pprint("Using format_map, replace keys in {} by values from dict:\n")
horse={'name' : 'duke',
       'age' : 2,
       'color': 'brown',
       'likes' : 'hay'}
print()
pprint(horse, width=40)
print()
print('My {color} horse named {name} is {age} years old and especially likes {likes} on Sundays'.format_map(horse))
print()
print("This is about all on the methods of str.")
print("=" * 80)













================================================================================
Below the results of applying about all methods of str:

This is a string with upper and lower case characters
this is a string with upper and lower case characters
########This is a string with Upper and lower case characters#########
How often does the letter `a` occur in s?  5
Does s end with `ters` ?        True
Does s start with `This is a` ? True
        This    is      a       string  with    tabs    instead of      a       space.          
The word `Upper` is found at position 22
The word `Upper` is found at position 22
['This', 'is', 'a', 'string', 'with', 'Upper', 'and', 'lower', 'case', 'characters']
This^is^a^string^with^Upper^and^lower^case^characters
this is a string with upper and lower case characters
THIS IS A STRING WITH UPPER AND LOWER CASE CHARACTERS
This is a string with Upper and lower case alphanumeric symbols
This Is A String With Upper And Lower Case Characters   (All words now capitalized)
tHIS iS a sTRING wITH uPPER aND lOWER cASE cHARACTERS   (Case first titled and then swapped)
' 	  string with	whitespace	 ' (String with whitespace, (with tabs and spaces))
'string with	whitespace' (Left and right whitespace removed)
'string with	whitespace	 ' (Left whitespace removed)
' 	  string with	whitespace' (Right whitespace removed)
' 	  string with	whitespace	 ============' (str is left justified and padded with '=')
'++++++++++++ 	  string with	whitespace	 ' (str is right justified and padded with '+')
First index of `a` in ` 'a a a a a` is 2
Last  index of `a` in ` 'a a a a a` is 10
('This', '/', 'is/the/day and That\\was\\yesterday')
First part is `This`, separator is `/` and last part is `is/the/day and That\was\yesterday`
('This/is/the/day and That\\was', '\\', 'yesterday')
First part is `This/is/the/day and That\was`, separator is `\` and last part is `yesterday`

'Using format_map, replace keys in {} by values from dict:\n'

{'age': 2,
 'color': 'brown',
 'likes': 'hay',
 'name': 'duke'}

My brown horse named duke is 2 years old and especially likes hay on Sundays

This is about all on the methods of str.
================================================================================
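To show how these methods work together with lists and dicts, here is a minimal sketch that parses lines of the form `key = value` into a dict. The list `raw_lines` is invented for this example and stands in for lines read from a text file; only strip(), startswith() and partition() from the list above are used.

```python
# Hypothetical input, as it might come from reading a small text file.
raw_lines = [
    "  name = duke \n",
    "age=2\n",
    "# a comment line\n",
    "\n",
    "likes = hay and oats\n",
]

settings = {}
for line in raw_lines:
    line = line.strip()                     # remove surrounding whitespace and the newline
    if not line or line.startswith('#'):    # skip empty lines and comments
        continue
    key, sep, value = line.partition('=')   # split at the first '=' only
    settings[key.strip()] = value.strip()

print(settings)
```

All values are still strings here; convert them with int() or float() where numbers are needed.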