Molecular weight

We want to calculate the molecular weight of a chemical formula. So given $\mathrm{C_6H_6}$ we want the result to be $12\times 6+1\times 6 = 78$. We use integers as approximations to the atomic weights to ease our mental calculations. Replacing them with more precise values at the end is straightforward.

The most straightforward way to keep the atomic weights of elements is in a dictionary:


In [2]:
weightDict = {
    'C': 12,
    'H': 1,
    'O': 16,
    'Cl': 35,
    # Add more if needed.
}

Parsing the molecular formula is not a trivial task, so we will do it later. For now we assume that the formula has already been parsed, and we keep it as a dictionary. In a condensed formula, each element appears only once, and its subindex indicates the total number of atoms of that element in the molecule. Ethanol, for example, is $\mathrm{C_2H_6O}$.


In [12]:
ethanol = {'C':2, 'H':6, 'O':1}
water = {'H':2, 'O':1}
HCl = {'H':1, 'Cl':1}

From that, calculate the total weight:


In [1]:
#Finish...
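
One possible way to finish the cell above, as a minimal sketch: loop over the formula dictionary and add up atomic weight times atom count. The helper name `weight_from_dict` is an assumption made for this illustration, and the weight table and formulas are repeated so the block runs on its own.

```python
# Integer atomic weights, repeated here so the sketch is self-contained.
weightDict = {'C': 12, 'H': 1, 'O': 16, 'Cl': 35}

def weight_from_dict(formula, weights=weightDict):
    """Sum atomic weight times atom count (hypothetical helper name)."""
    total = 0
    for symbol, count in formula.items():
        total += weights[symbol] * count
    return total

ethanol = {'C': 2, 'H': 6, 'O': 1}
water = {'H': 2, 'O': 1}
HCl = {'H': 1, 'Cl': 1}

print(weight_from_dict(ethanol), weight_from_dict(water), weight_from_dict(HCl))
```

With the integer weights above, this gives 46 for ethanol, 18 for water and 36 for HCl.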

Now imagine we also accept formulas written in an extended form, for example ethanol as $\mathrm{CH_3CH_2OH}$. Here the same element can appear several times, so it makes sense for our parsing procedure to return a list of tuples such as:


In [21]:
ethanol2 = [('C',1), ('H',3), ('C',1), ('H',2), ('O',1), ('H',1)]
acetic2 = [('C',1), ('H',3), ('C',1), ('O',1), ('O',1), ('H',1)]

From that, we could also create a dictionary such as the previous one, but we can also calculate the weight directly:


In [2]:
#Finish
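
A possible way to finish this cell, sketched under the same assumptions as before: the list of pairs can be summed directly, or first collapsed into a dictionary like the condensed one. The names `weight_from_pairs` and `pairs_to_dict` are hypothetical helpers introduced for this example.

```python
# Integer atomic weights, repeated here so the sketch is self-contained.
weightDict = {'C': 12, 'H': 1, 'O': 16, 'Cl': 35}

def weight_from_pairs(pairs, weights=weightDict):
    """Sum the weight directly from (symbol, count) pairs (hypothetical helper)."""
    return sum(weights[symbol] * count for symbol, count in pairs)

def pairs_to_dict(pairs):
    """Collapse repeated symbols into one total per element (hypothetical helper)."""
    totals = {}
    for symbol, count in pairs:
        totals[symbol] = totals.get(symbol, 0) + count
    return totals

ethanol2 = [('C', 1), ('H', 3), ('C', 1), ('H', 2), ('O', 1), ('H', 1)]
acetic2 = [('C', 1), ('H', 3), ('C', 1), ('O', 1), ('O', 1), ('H', 1)]

print(weight_from_pairs(ethanol2), weight_from_pairs(acetic2))
print(pairs_to_dict(ethanol2))
```

The direct sum gives 46 and 60, and collapsing `ethanol2` recovers the condensed form `{'C': 2, 'H': 6, 'O': 1}`.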

Parsing

Parsing the formula is not a trivial task. You have to remember the following:

  • Some elements have 1-letter symbols, others have 2. In the latter case the second letter is always lower-case.
  • Some numbers can be higher than 9, i.e. they use 2 or more digits.
  • When the number is 1, it is usually not written.

When coding a complex situation such as this one, it makes sense to plan a strategy for all these scenarios, but to start by coding and testing the simple cases (1 letter per element, etc.).

Python has a built-in module to work with regular expressions that could ease the parsing. But in this case the problem is simple enough that you can solve it without the use of regular expressions.
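
For comparison only, here is a minimal sketch of what a regular-expression version could look like, using `re.findall` from the standard library. The function name `parse_with_regex` is an assumption made for this example, and the pattern simply skips any character that does not look like an element symbol, so it does no validation.

```python
import re

def parse_with_regex(formula):
    """Return a list of (symbol, count) pairs (hypothetical helper)."""
    pairs = []
    # One upper-case letter, an optional lower-case letter, then optional digits.
    for symbol, digits in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        # A missing subindex means a single atom.
        pairs.append((symbol, int(digits) if digits else 1))
    return pairs

print(parse_with_regex('CH3CH2OH'))
```

The pattern covers the three rules listed above: 1- or 2-letter symbols, subindices with several digits, and an omitted subindex meaning 1.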

Here is a possible solution, but different (and possibly better!) approaches are surely possible:


In [3]:
#Try to do it before looking at the answer!

In [1]:
def weight(formula):
    """
    Calculate the molecular weight of a chemical formula
    """
    def parsing(formula):
        """
        Parse the formula and return a list of pairs such as ('C', 3)
        """   
        formList = []
        number = None
        symbol=''
        for s in formula:
            if s.isdigit():
                # Accumulate digits so that multi-digit subindices work
                number = s if number is None else number + s
            elif s.islower(): #we're reading a 2-letter symbol
                symbol = symbol + s 
            else: #We're reading a new symbol
                if not number and symbol: #If the symbol does not have subindex
                    formList.append((symbol,1))
                elif number:
                    formList.append((symbol,int(number)))
                symbol = s
                number = None
        #Add the last element
        if number: 
            formList.append((symbol,int(number)))
        else:
            formList.append((symbol,1))
        return formList
    
    formulaList = parsing(formula)
    # With the list, calculate the weight
    weightDict={'C':12, 'O':16, 'H':1, 'Cl':35, 'S':32,'Na':23}
    weight = 0
    for elem, quant in formulaList:
        weight += weightDict[elem]*quant
    return weight

(weight('C2H6O'),    
weight('CH3CH2OH'),
weight('CH3COOH'),
weight('C56H3CCl3H2OClNa3Na'))


Out[1]:
(46, 46, 60, 937)
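
The integer weights can be swapped for more precise values without touching the rest of the logic. A minimal sketch, assuming standard atomic weights rounded to a few decimals and a hypothetical helper `precise_weight` that takes already-parsed (symbol, count) pairs:

```python
# Standard atomic weights, rounded (values chosen for this illustration).
preciseWeightDict = {
    'H': 1.008,
    'C': 12.011,
    'O': 15.999,
    'Na': 22.990,
    'S': 32.06,
    'Cl': 35.45,
}

def precise_weight(pairs, weights=preciseWeightDict):
    """Sum float atomic weights over (symbol, count) pairs (hypothetical helper)."""
    return sum(weights[symbol] * count for symbol, count in pairs)

ethanol_pairs = [('C', 2), ('H', 6), ('O', 1)]
print(round(precise_weight(ethanol_pairs), 3))
```

Only the lookup table changes; the parsing step and the summation stay the same.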

In [24]:
molecule = input("Write a molecule: ")


Write a molecule: 4

In [38]:
Atomic_dict = Atomic_dict_fun()
lista = list("C2H6")
vS,vN = read_molecule(lista)
print("The result is", molecular_weight_number(vS,vN))


What do you want?
 Molecular Number (Type 2) or Molecular Weight (Type 3): 3
The result is 30.06964