Parsing xml in Python with etree.ElementTree (2024)

The xml module in the standard library provide tools for working with XML documents.

The ElementTree class in the etree submodule of the xml module offers an intuitive way of parsing and representing XML data.

ElementTree objects represents xml data in form of a tree structure in which the hierarchy is based on the nesting of the xml elements.

Basic parsing example

Consider if we have an xml file called articles.xml with the following content.

<?xml version = '1.0' encoding = 'UTF-8'?><articlelist> <article> <author country = 'India'>John Doe</author> <datepublished>2024/04/05</datepublished> <title>Lorem ipsum dolor sit amet consectetur adipisicing elit</title> <content>Lorem ipsum dolor sit amet consectetur adipisicing elit. Maxime mollitia, molestiae quas vel sint commodi repudiandae consequuntur voluptatum laborum numquam blanditiis harum quisquam eius sed odit fugiat iusto fuga praesentium optio, eaque rerum! Provident similique accusantium nemo autem. </content> </article> <article> <author country = 'Finland'>Mary Smith</author> <datepublished>2024/04/07</datepublished> <title>Perspiciatis minima nesciunt dolorem</title> <content>Perspiciatis minima nesciunt dolorem! Officiis iure rerum voluptates a cumque velit quibusdam sed amet tempora. Sit laborum ab, eius fugit doloribus tenetur fugiat, temporibus enim commodi iusto libero magni deleniti quod quam consequuntur! Commodi minima excepturi repudiandae velit hic maxime doloremque.</content> </article> </articlelist>

We can parse the document by passing the opened file object as an argument to the ElementTree.parse() method, as shown below:

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) print(tree)

<xml.etree.ElementTree.ElementTree object at 0x000001B03DB770E0>

As shown in the above example, the ElementTree.parse() helper method creates an ElementTree instance from the given file object.

The ElementTree object represents the structure of the xml documents in form of a tree, where each node in the tree represents the corresponding element in the xml document.

Traversing an ElementTree

The tree.iter() method returns an iterator object that yields the nodes of the parsed tree from top to bottom. By default it returns all nodes in the tree.

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) for node in tree.iter(): print(node.tag)

articlelist
article
author
datepublished
title
content
article
author
datepublished
title
content

You can pass a tag as an argument to the tree.iter() method so that it will only iterate over the elements with that tag.

tree.iter(tag = None)

For example, to get onlynodes with author tag, we can parse 'author' as the tag argument, as shown below:

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) for node in tree.iter('author'): print(node.text)

John Doe
Mary Smith

Search for Nodes

Parsed trees contains some useful methods to expressively search for nodes with certain characteristics. This allows you to find for nodes with given tags or even nodes that appears at certain depth of the parse tree.

The two basic methods for searching are find() and findall().

Find single node - tree.find()

The tree.find() method returns the first node that matches the search strings. It returns None, if there is no matching node.

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) n = tree.find('.//author') print(n.text)

John Doe

Note how the search string is formatted, each stroke represents a depth starting from the root. So in the above case, we are finding for a node withauthor tag at the second depth. If we are looking for an article tag we would use a single stroke i.e "./article"to correspond with the depth of that article tag from the root.

from xml.etree import ElementTreewith open('pynerds.txt') as file: tree = ElementTree.parse(file) article = tree.find('./article') for i in article: print(i.tag, ': ', i.text)

author : John Doe
datepublished : 2024/04/05
title : Lorem ipsum dolor sit amet consectetur adipisicing elit
content : Lorem ipsum dolor sit amet consectetur adipisicing elit. Maxime mollitia,
molestiae quas vel sint commodi repudiandae consequuntur voluptatum laborum
numquam blanditiis harum quisquam eius sed odit fugiat iusto fuga praesentium
optio, eaque rerum! Provident similique accusantium nemo autem.

If you only want the text value, you can use the findtext() method instead of find().

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) print(tree.findtext('./article/title'))

Lorem ipsum dolor sit amet consectetur adipisicing elit

The './article/title' search string, searches for a title element that is nested inside of an article element. This can be especially useful if the xml document contains elements that have similar tags.

Find all matching elements - tree.findall()

The tree.findall() method returns a list of all matching nodes for the given search string.

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) nodes = tree.findall('.//author') print(nodes) for n in nodes: print(n.text)

[<Element 'author' at 0x0000011425175FD0>, <Element 'author' at 0x0000011425176160>]
John Doe
Mary Smith

Deeper look on nodes

The Elementobjects returned by methods liketree.iter(), tree.find(), etc are used to represent a single node in the xml parse tree.

Element objects contain some useful attributes and methods for accessing and manipulating information of the represented xml element. We have already used some of the attributes such as text and tag.

The attrib dictionary of an Element object stores the attributes of the represented xml element.

from xml.etree import ElementTreewith open('articles.xml') as file: tree = ElementTree.parse(file) n = tree.find('*author') print(n.attrib) print(n.text) print(n.attrib.get('country'))

{'country': 'India'}
John Doe
India

You can use the tail attribute to get the text that comes after the closing tag of a given node.

Parsing Strings

If the xml data is in form of a string, we can parse it using the XML() function. Which takes the xml string as an argument, parses it and creates an Element object representation.

from xml.etree import ElementTreexml_data = ''' <article> <author country = 'India'>John Doe</author> <datepublished>2024/04/05</datepublished> <title>Lorem ipsum dolor sit amet consectetur adipisicing elit</title> <content>Lorem ipsum dolor sit amet consectetur adipisicing elit. Maxime mollitia, molestiae quas vel sint commodi repudiandae consequuntur voluptatum laborum numquam blanditiis harum quisquam eius sed odit fugiat iusto fuga praesentium optio, eaque rerum! Provident similique accusantium nemo autem. </content> </article>'''article = ElementTree.XML(xml_data)print(article.findtext('.author'))print(article.findtext('.datepublished'))print(article.findtext('.title'))

John Doe
2024/04/05
Lorem ipsum dolor sit amet consectetur adipisicing elit

Note that unlike parse() which returns an ElementTree instance, the return value of XML() is an Element object.

‹‹ Prevpickle module→

Parsing xml in Python with etree.ElementTree (2024)

References

Top Articles
Chipotle Mexican Grill hiring Kitchen Leader in Closter, New Jersey, United States | LinkedIn
Lucille Skin the Alie Plain arrived when critical leading the difference company assisting moves who Ninth Change further
Mileage To Walmart
PRISMA Technik 7-10 Baden-Württemberg
Activities and Experiments to Explore Photosynthesis in the Classroom - Project Learning Tree
Hawkeye 2021 123Movies
Wausau Marketplace
Lycoming County Docket Sheets
Legacy First National Bank
Scentsy Dashboard Log In
Myql Loan Login
Buying risk?
Mephisto Summoners War
R/Afkarena
Hca Florida Middleburg Emergency Reviews
6813472639
Moviesda3.Com
Equipamentos Hospitalares Diversos (Lote 98)
Fdny Business
Trac Cbna
3476405416
CDL Rostermania 2023-2024 | News, Rumors & Every Confirmed Roster
Loft Stores Near Me
Apple Original Films and Skydance Animation’s highly anticipated “Luck” to premiere globally on Apple TV+ on Friday, August 5
Cbssports Rankings
Orange Pill 44 291
Reborn Rich Kissasian
Danielle Ranslow Obituary
Delectable Birthday Dyes
Danielle Moodie-Mills Net Worth
Lcsc Skyward
Dailymotion
Sam's Club Gas Price Hilliard
Panchang 2022 Usa
140000 Kilometers To Miles
Capital Hall 6 Base Layout
Best Weapons For Psyker Darktide
Waffle House Gift Card Cvs
Gold Nugget at the Golden Nugget
Craigslist Boats Eugene Oregon
Bitchinbubba Face
R Nba Fantasy
How are you feeling? Vocabulary & expressions to answer this common question!
Hebrew Bible: Torah, Prophets and Writings | My Jewish Learning
Linda Sublette Actress
Craigslist Food And Beverage Jobs Chicago
Booknet.com Contract Marriage 2
Haunted Mansion Showtimes Near Millstone 14
Vcuapi
Convert Celsius to Kelvin
Lorcin 380 10 Round Clip
Latest Posts
Article information

Author: Cheryll Lueilwitz

Last Updated:

Views: 5717

Rating: 4.3 / 5 (54 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Cheryll Lueilwitz

Birthday: 1997-12-23

Address: 4653 O'Kon Hill, Lake Juanstad, AR 65469

Phone: +494124489301

Job: Marketing Representative

Hobby: Reading, Ice skating, Foraging, BASE jumping, Hiking, Skateboarding, Kayaking

Introduction: My name is Cheryll Lueilwitz, I am a sparkling, clean, super, lucky, joyous, outstanding, lucky person who loves writing and wants to share my knowledge and understanding with you.