In many other languages, an unhandled error crashes the program, so one checks the input before doing anything that could fail. This style is called "look before you leap", or LBYL.

In Python, one often has another option: just try something, and if it blows up, do something else without crashing. This style is called "easier to ask for forgiveness than permission", or EAFP. Python's try/except statement makes this easy.
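
To make the contrast concrete, here is a minimal sketch of the same task written both ways. The function names to_int_lbyl() and to_int_eafp() are made up for this illustration and are not part of the notebook's code.

```python
def to_int_lbyl(s):
    # LBYL: inspect the input first, convert only if it looks safe.
    # The check is deliberately simple, so it rejects valid input like '-5'.
    if s.isdigit():
        return int(s)
    return None


def to_int_eafp(s):
    # EAFP: just try the conversion and handle the failure if it happens.
    try:
        return int(s)
    except ValueError:
        return None


for text in ('42', '-5', 'hello'):
    print(repr(text), to_int_lbyl(text), to_int_eafp(text))
```

The LBYL version has to predict everything int() will accept, and the simple check here gets '-5' wrong. The EAFP version lets int() make that decision itself.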

This notebook plays with EAFP on some code that converts input to various types.

Hopefully Amazing Grace would approve.



In [1]:

    
import csv
import io

Note that in the code below:

The try clauses contain only the bare minimum of code.
Each except clause names a specific exception.

Both practices are recommended by PEP 8 -- Style Guide for Python Code.
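
As a hedged illustration of why both points matter, the sketch below contrasts a broad try/except with a narrow one. The helper names parse_loose() and parse_strict() are hypothetical, and the misspelled variable is a bug planted on purpose for the demonstration.

```python
def parse_loose(s):
    # Too much code in the try clause plus a catch-all except clause:
    # the typo below raises NameError, which is silently swallowed.
    try:
        value = int(s)
        return valeu  # deliberate typo for the demonstration
    except Exception:
        return None


def parse_strict(s):
    # Only the call that can fail sits in the try clause,
    # and only the expected exception is caught.
    try:
        value = int(s)
    except ValueError:
        return None
    else:
        return value


print(parse_loose('42'))   # the planted bug hides behind a None result
print(parse_strict('42'))
print(parse_strict('x'))
```

If the same typo were planted in parse_strict(), it would sit outside the try clause and surface as a NameError instead of being mistaken for bad input.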



In [2]:

    
def convert(s):
    # First tries to convert input string to an integer.
    # If that does not work, then tries to convert it to a float.
    # If that does not work, leaves it as a string.

    try:
        value = int(s)
    except ValueError:
        pass
    else:
        return value

    try:
        value = float(s)
    except ValueError:
        pass
    else:
        return value

    return s

The following version of convert() is refactored to use a loop.



In [3]:

    
def convert(s):
    converters = (int, float)
    
    for converter in converters:
        try:
            value = converter(s)
        except ValueError:
            pass
        else:
            return value
        
    return s
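
A few inputs are worth trying by hand, because int() and float() accept more than plain digits. The sketch below repeats the loop version of convert() so that it runs on its own. The sample strings are chosen for illustration and do not come from the notebook's data.

```python
def convert(s):
    # Same logic as the loop version above, repeated so this runs alone.
    for converter in (int, float):
        try:
            value = converter(s)
        except ValueError:
            pass
        else:
            return value
    return s


samples = (' 42 ', '1e3', 'nan', '1,234', '')
for s in samples:
    value = convert(s)
    print('{!r} -> {!r} ({})'.format(s, value, type(value).__name__))
```

Surrounding whitespace is tolerated, exponent notation and 'nan' become floats, and a number with a thousands separator or an empty field stays a string.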



In [4]:

    
data = '''Saeger Buick,123.456,Moosetang
Bobb Ford,234234,Rustang
Mario Fiat,987432.9832,127
'''



In [5]:

    
print(io.StringIO(data).read(), end='')









    



Saeger Buick,123.456,Moosetang
Bobb Ford,234234,Rustang
Mario Fiat,987432.9832,127



In [6]:

    
with io.StringIO(data) as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        items = [convert(s) for s in row]
        print('row: %r becomes:' % row)
        for item in items:
            print('    %r (%s)' % (item, type(item)))









    



row: ['Saeger Buick', '123.456', 'Moosetang'] becomes:
    'Saeger Buick' (<class 'str'>)
    123.456 (<class 'float'>)
    'Moosetang' (<class 'str'>)
row: ['Bobb Ford', '234234', 'Rustang'] becomes:
    'Bobb Ford' (<class 'str'>)
    234234 (<class 'int'>)
    'Rustang' (<class 'str'>)
row: ['Mario Fiat', '987432.9832', '127'] becomes:
    'Mario Fiat' (<class 'str'>)
    987432.9832 (<class 'float'>)
    127 (<class 'int'>)

Chris prefers .format() to C-style % formatting, so the cell above is redone below with .format().



In [7]:

    
with io.StringIO(data) as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        items = [convert(s) for s in row]
        print('row: {!r} becomes:'.format(row))
        for item in items:
            print('    {0!r} ({1})'.format(item, type(item)))









    



row: ['Saeger Buick', '123.456', 'Moosetang'] becomes:
    'Saeger Buick' (<class 'str'>)
    123.456 (<class 'float'>)
    'Moosetang' (<class 'str'>)
row: ['Bobb Ford', '234234', 'Rustang'] becomes:
    'Bobb Ford' (<class 'str'>)
    234234 (<class 'int'>)
    'Rustang' (<class 'str'>)
row: ['Mario Fiat', '987432.9832', '127'] becomes:
    'Mario Fiat' (<class 'str'>)
    987432.9832 (<class 'float'>)
    127 (<class 'int'>)

In the data, the third field is a car model. Most model values are names and stay as strings, but 127 is converted to an integer. Is that OK? If not, what would you do? If the data were going to be put in a database, how would the database deal with a field that sometimes holds a string and sometimes holds an integer?
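
One possible answer, offered as a sketch rather than a recommendation, is to decide the type per column instead of per value. The column layout and the name COLUMN_CONVERTERS are assumptions made for this example. The notebook itself does not define a schema.

```python
import csv
import io

data = '''Saeger Buick,123.456,Moosetang
Bobb Ford,234234,Rustang
Mario Fiat,987432.9832,127
'''

# Hypothetical schema: the first and third columns stay strings,
# the second column always becomes a float.
COLUMN_CONVERTERS = (str, float, str)

with io.StringIO(data) as csvfile:
    rows = [
        [converter(field) for converter, field in zip(COLUMN_CONVERTERS, row)]
        for row in csv.reader(csvfile)
    ]

for row in rows:
    print(row)
```

With this approach the model '127' stays a string and 234234 becomes 234234.0, so every column holds a single type. The trade-off is that a bad value now raises ValueError instead of quietly staying a string.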

The following cells were added for the 2016-07-22 dojo to explore simplifying det_type() in csv_tools/__init__.py. I have a tough time understanding csv_tools/__init__.py. How about you?



In [8]:

    
# Tolerate (ignore) commas in float numbers.

class EmptyField(str):
    pass

def my_empty_field(s):
    if s:
        raise ValueError
    return EmptyField()

def my_float(s):
    return float(s.replace(',', ''))

def get_data_converter(s):
    # Note that this returns a converter function,
    # not the converted value.

    converters = (int, my_float, my_empty_field, str)
    
    for converter in converters:
        try:
            converter(s)
        except ValueError:
            pass
        else:
            return converter
        
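    # str() accepts any string, so the loop always returns before this line.
    # Raising explicitly (rather than assert) still fires under python -O.
    raise AssertionError('Should never get here')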



In [9]:

    
for s in ('19,999.99', 'hello', '1,234', '1234', ''):
    print(repr(s), repr(get_data_converter(s)))









    



'19,999.99' <function my_float at 0xb4b48ecc>
'hello' <class 'str'>
'1,234' <function my_float at 0xb4b48ecc>
'1234' <class 'int'>
'' <function my_empty_field at 0xb4b4853c>



In [10]:

    
# Is this good?
# I don't know enough about the problem to say either way.

def get_data_type(s):
    # Note that this returns type of converted value,
    # not the converter function or converted value.

    converters = (int, my_float, my_empty_field, str)
    
    for converter in converters:
        try:
            value = converter(s)
        except ValueError:
            pass
        else:
            return type(value)
        
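    # str() accepts any string, so the loop always returns before this line.
    # Raising explicitly (rather than assert) still fires under python -O.
    raise AssertionError('Should never get here')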



In [11]:

    
for s in ('19,999.99', 'hello', '1,234', '1234', ''):
    print(repr(s), repr(get_data_type(s)))









    



'19,999.99' <class 'float'>
'hello' <class 'str'>
'1,234' <class 'float'>
'1234' <class 'int'>
'' <class '__main__.EmptyField'>
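
To go from one value to a whole column, the per-value types can be collected into a set. The sketch below repeats the definitions from the cells above so that it runs on its own. The helper name get_column_types() and the sample column are hypothetical.

```python
class EmptyField(str):
    pass


def my_empty_field(s):
    if s:
        raise ValueError
    return EmptyField()


def my_float(s):
    return float(s.replace(',', ''))


def get_data_type(s):
    for converter in (int, my_float, my_empty_field, str):
        try:
            value = converter(s)
        except ValueError:
            pass
        else:
            return type(value)
    raise AssertionError('Should never get here')


def get_column_types(values):
    # Hypothetical helper: the set of types seen in one column.
    return {get_data_type(s) for s in values}


column = ('1234', '19,999.99', '')
print(sorted(t.__name__ for t in get_column_types(column)))
```

A column with mixed values yields more than one type, so something still has to pick a single type for the whole column.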



In [12]:

    
# Could the if/elif/elif of _proc_dict_to_schema_vals()
# be simplified in part with something like the following?

# Stub value: with no headers the loop body never runs, so the names
# it uses that are not defined in this notebook never raise NameError.
headers = ()

for field_name in headers:
    types_and_names = (
        # Starts with highest priority,
        # in descending order of priority.
        (str, get_type_of_string(maximum_length)),
        (float, 'Double'),
        (int, 'Long'),
        (EmptyField, get_type_of_string(maximum_length)),
    )
    
    field_types = set(field_types)

    for type_, type_name in types_and_names:
        if type_ in field_types:
            schema.append((field_name, type_name))
            break
    else:
        # Could this be provoked by not having any data rows?
        # If so, which error should be raised?
        raise TypeError('Ran out of types. Should never get here.')
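
The cell above cannot run as written, so here is a runnable sketch of the same priority lookup under stated assumptions. The wrapper choose_type_name() is hypothetical, and get_type_of_string() is a stand-in whose 'Text(n)' output format is invented for the example. Only the 'Double' and 'Long' names come from the cell above.

```python
class EmptyField(str):
    # Marker type for empty fields, as defined earlier in the notebook.
    pass


def get_type_of_string(maximum_length):
    # Hypothetical stand-in: the real helper is not shown in this notebook,
    # so the 'Text(n)' format is invented for the example.
    return 'Text({})'.format(maximum_length)


def choose_type_name(field_types, maximum_length):
    # Hypothetical wrapper around the priority lookup from the cell above.
    types_and_names = (
        # Highest priority first.
        (str, get_type_of_string(maximum_length)),
        (float, 'Double'),
        (int, 'Long'),
        (EmptyField, get_type_of_string(maximum_length)),
    )
    for type_, type_name in types_and_names:
        if type_ in field_types:
            return type_name
    raise TypeError('Ran out of types: {!r}'.format(field_types))


print(choose_type_name({int, float}, 10))
print(choose_type_name({int, EmptyField}, 10))
print(choose_type_name({str, int}, 10))

try:
    choose_type_name(set(), 10)
except TypeError as error:
    print('empty column:', error)
```

In this sketch, at least, an empty set of types does reach the raise, which suggests that a file with no data rows could provoke the error the comment asks about.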

Smart data structures and dumb code works a lot better than the other way around. - Eric Raymond