XML documents have a long history behind yet they are still very popular and are one of the main types of the response provided by most of the APIs (the other type is JSON). They have very similar structure to HTML documents in a sense that both have the tag-based structure. However, in XML the user is the one who defines the tag name (also, as in the <status> line below, one may provide an identifier like organization and a value for it, like AUA). Below, a sample XML document is developed.
In [1]:
data = '''
<xml_data>
<person>
<id>01</id>
<name>
<first>Hrant</first>
<last>Davtyan</last>
</name>
<status organization="AUA">Instructor</status>
</person>
<person>
<id>02</id>
<name>
<first>Jack</first>
<last>Nicolson</last>
</name>
<status organization="Hollywood">Actor</status>
</person>
</xml_data>
'''
There are many options to work with a XML document including the built-in Python support. Yet, we will go for the lxml option and will import etree from it to get the XML tree from a string.
In [2]:
from lxml import etree
In [3]:
tree = etree.fromstring(data)
To find the text content of a tag one just needs to use the find() function and then the text method on it as follows:
In [4]:
tree.find('person').text
Out[4]:
Similar to BeautifulSoup, one can use a findall() function to return a list of all elements and get the text of one of them.
In [5]:
tree.findall("person/name/last")[1].text
Out[5]:
In [7]:
tree.find("person/status").text
Out[7]:
To get a value of an attribute defined by the user the get() function should be used.
In [9]:
tree.find("person/status").get("organization")
Out[9]:
Now, we can print some text by applying a for loop over our tree (XML document).
In [10]:
for i in tree:
print("Full name: "+ i.find("name/first").text + i.find("name/last").text)
print("Position: " + i.find("status").text + " at " + i.find("status").get("organization")+"\n")
To write an existing XML document from the Python enviroinment into a XML file, one needs to use the general approach: open a file (which probably does not exist yet) in a writing mode and then write. One just needs to remember that you write a text/string, which means the XML tree should be converted (in a pretty way) into a string using the tostring() sungion. The latter takes one required argument: the name of the variable to be converted, while the other argument is quite an optional one.
In [12]:
with open('output.xml', 'w') as f:
f.write(etree.tostring(tree, pretty_print = True))
Excellent! Now we have a XML document called output.xml saved in our computer. Let's read it.
In [13]:
with open('output.xml') as f:
tree = etree.parse(f)
In [15]:
tree.find('person/id').text
Out[15]:
Perfect, it works!