Parsing XML Documents in Python

Vahid Mirjalili, Data Scientist

Example 1: Food Menu


In [1]:
import xml.etree.ElementTree as ET

from prettytable import PrettyTable

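# Parse the file and get the root element
tree = ET.parse('examples/simple.xml')
root = tree.getroot()
print(root)
print(root.tag)
root.attrib  # attribute dict of the root (not displayed: not the last expression in the cell)

# Access the child elements; printing a tuple gives the same output in Python 2 and 3
for child in root:
    print((child.tag, child.attrib))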


<Element 'breakfast_menu' at 0x28dc310>
breakfast_menu
('food', {})
('food', {})
('food', {})
('food', {})
('food', {})

In [2]:
## Create A Table to output the results
tb = PrettyTable(["Item Name", "Price", "Calories"])
tb.align["Item Name"] = "l" # left aligned
tb.padding_width = 1

## Finding items' names, prices and calories
for child in tree.getroot():
    food_name = child.find('name')
    price = child.find('price')
    calories = child.find('calories')
    tb.add_row([food_name.text, price.text, calories.text])

print(tb)


+-----------------------------+-------+----------+
| Item Name                   | Price | Calories |
+-----------------------------+-------+----------+
| Belgian Waffles             | $5.95 |   650    |
| Strawberry Belgian Waffles  | $7.95 |   900    |
| Berry-Berry Belgian Waffles | $8.95 |   900    |
| French Toast                | $4.50 |   600    |
| Homestyle Breakfast         | $6.95 |   950    |
+-----------------------------+-------+----------+
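
The file examples/simple.xml is not reproduced here, so the following is a minimal sketch that parses a small inline document with the structure the cells above appear to assume: a breakfast_menu root whose food children hold name, price, and calories. The sample string and the helper name menu_rows are assumptions made for illustration. It uses findtext with a default value, so a food entry that lacks one of the children yields a placeholder instead of raising AttributeError on None.text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure assumed for examples/simple.xml.
# The second entry deliberately has no <calories> child.
SAMPLE = """<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <calories>650</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
  </food>
</breakfast_menu>"""


def menu_rows(xml_text):
    """Return (name, price, calories) tuples; missing children become 'n/a'."""
    root = ET.fromstring(xml_text)
    rows = []
    for food in root.findall('food'):
        rows.append((
            food.findtext('name', default='n/a'),
            food.findtext('price', default='n/a'),
            food.findtext('calories', default='n/a'),
        ))
    return rows


for row in menu_rows(SAMPLE):
    print(row)
```

Each tuple can be passed to tb.add_row(list(row)) in the same way as the rows built above.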

Example 2: Plant Catalog


In [3]:
tree = ET.parse('examples/plant_catalog.xml')
root = tree.getroot()

tb = PrettyTable(["Common Name", "Light Condition", "Price"])
tb.align["Common Name"] = "l" # left aligned
tb.padding_width = 1

for child in tree.getroot():
    name = child.find('COMMON')
    price = child.find('PRICE')
    light = child.find('LIGHT')
    tb.add_row([name.text, light.text, price.text])

print(tb)


+---------------------+-----------------+-------+
| Common Name         | Light Condition | Price |
+---------------------+-----------------+-------+
| Bloodroot           |   Mostly Shady  | $2.44 |
| Columbine           |   Mostly Shady  | $9.37 |
| Marsh Marigold      |   Mostly Sunny  | $6.81 |
| Cowslip             |   Mostly Shady  | $9.90 |
| Dutchman's-Breeches |   Mostly Shady  | $6.44 |
| Ginger, Wild        |   Mostly Shady  | $9.03 |
| Hepatica            |   Mostly Shady  | $4.45 |
| Liverleaf           |   Mostly Shady  | $3.99 |
| Jack-In-The-Pulpit  |   Mostly Shady  | $3.23 |
| Mayapple            |   Mostly Shady  | $2.98 |
| Phlox, Woodland     |   Sun or Shade  | $2.80 |
| Phlox, Blue         |   Sun or Shade  | $5.59 |
| Spring-Beauty       |   Mostly Shady  | $6.59 |
| Trillium            |   Sun or Shade  | $3.90 |
| Wake Robin          |   Sun or Shade  | $3.20 |
| Violet, Dog-Tooth   |      Shade      | $9.04 |
| Trout Lily          |      Shade      | $6.94 |
| Adder's-Tongue      |      Shade      | $9.58 |
| Anemone             |   Mostly Shady  | $8.86 |
| Grecian Windflower  |   Mostly Shady  | $9.16 |
| Bee Balm            |      Shade      | $4.59 |
| Bergamot            |      Shade      | $7.16 |
| Black-Eyed Susan    |      Sunny      | $9.80 |
| Buttercup           |      Shade      | $2.57 |
| Crowfoot            |      Shade      | $9.34 |
| Butterfly Weed      |      Sunny      | $2.78 |
| Cinquefoil          |      Shade      | $7.06 |
| Primrose            |      Sunny      | $6.56 |
| Gentian             |   Sun or Shade  | $7.81 |
| Blue Gentian        |   Sun or Shade  | $8.56 |
| Jacob's Ladder      |      Shade      | $9.26 |
| Greek Valerian      |      Shade      | $4.36 |
| California Poppy    |       Sun       | $7.89 |
| Shooting Star       |   Mostly Shady  | $8.60 |
| Snakeroot           |      Shade      | $5.63 |
| Cardinal Flower     |      Shade      | $3.02 |
+---------------------+-----------------+-------+
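
The prices are read as strings such as "$2.44", so they cannot be sorted or compared numerically as they are. The sketch below, under stated assumptions, strips the dollar sign and converts each price to a float before sorting. The three rows are taken from the table above; the CATALOG and PLANT tag names and the helper name plants_by_price are assumptions, since the source file is not shown.

```python
import xml.etree.ElementTree as ET

# Inline sample using three rows from the table above.
# COMMON, LIGHT and PRICE follow the cell above; CATALOG and PLANT are assumed tag names.
SAMPLE = """<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><LIGHT>Mostly Shady</LIGHT><PRICE>$2.44</PRICE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><LIGHT>Mostly Shady</LIGHT><PRICE>$9.37</PRICE></PLANT>
  <PLANT><COMMON>Marsh Marigold</COMMON><LIGHT>Mostly Sunny</LIGHT><PRICE>$6.81</PRICE></PLANT>
</CATALOG>"""


def plants_by_price(xml_text):
    """Return (common name, price as float) pairs, cheapest first."""
    root = ET.fromstring(xml_text)
    plants = []
    for child in root:
        name = child.findtext('COMMON')
        # Drop the leading dollar sign before converting to a number
        price = float(child.findtext('PRICE').lstrip('$'))
        plants.append((name, price))
    return sorted(plants, key=lambda item: item[1])


for name, price in plants_by_price(SAMPLE):
    print(name, price)
```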

Writing Data to XML Files

Next, we read an example dataset (iris) stored as space-separated text, one record per line, and write it out in XML format.


In [4]:
root = ET.Element("root")

# The with-block closes the input file even if a line fails to parse
with open("examples/iris.dat") as fp:
    for line in fp:
        d = line.split()  # split on any run of whitespace
        if not d:
            continue  # skip blank lines

        # One <iris> element per record, one child element per column
        row = ET.SubElement(root, "iris")
        sepal_length = ET.SubElement(row, "SepalLength")
        sepal_length.text = d[0]
        sepal_width = ET.SubElement(row, "SepalWidth")
        sepal_width.text = d[1]
        petal_length = ET.SubElement(row, "PetalLength")
        petal_length.text = d[2]
        petal_width = ET.SubElement(row, "PetalWidth")
        petal_width.text = d[3]
        species = ET.SubElement(row, "Species")
        species.text = d[4]

# Build the tree and write the file once, after all rows have been added
tree = ET.ElementTree(root)
tree.write("examples/iris.xml")
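
To check the result, the XML can be parsed back and one column compared with the input. The following is a minimal round-trip sketch that writes to an in-memory buffer instead of examples/iris.xml, so it runs without the data file. The two sample records are illustrative only, and FIELDS and rows_to_tree are hypothetical names, not part of the cell above.

```python
import io
import xml.etree.ElementTree as ET

# Column order assumed to match the cell above
FIELDS = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]


def rows_to_tree(lines):
    """Build an ElementTree with one <iris> element per well-formed line."""
    root = ET.Element("root")
    for line in lines:
        values = line.split()
        if len(values) != len(FIELDS):
            continue  # skip blank or malformed lines
        row = ET.SubElement(root, "iris")
        for field, value in zip(FIELDS, values):
            ET.SubElement(row, field).text = value
    return ET.ElementTree(root)


# Illustrative records (not read from examples/iris.dat), with a blank line in between
sample = ["5.1 3.5 1.4 0.2 setosa", "", "7.0 3.2 4.7 1.4 versicolor"]

# Write the XML to a bytes buffer, then parse it back from the start
buffer = io.BytesIO()
rows_to_tree(sample).write(buffer)
buffer.seek(0)
parsed = ET.parse(buffer).getroot()

species = [row.findtext("Species") for row in parsed.findall("iris")]
print(species)
```

Looping over a list of field names keeps the tag names in one place, which helps avoid mismatched variable and tag names when columns are added.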
