How to Parse and Modify XML in Python


Introduction:

  1. Python is a powerful language that offers great tools for data crunching and preparation, as well as for complex scientific data analysis and modelling.

  2. In many cases we need to parse data produced by programs written in other languages such as C, C++ and Java.

  3. Python provides numerous feature-rich libraries that can be used to parse such data.

  4. In this article I am going to discuss the Python XML parsers, which will help you learn how to parse XML using Python.

  5. Python has multiple implementations, including Jython, which is written in Java and runs on the Java Virtual Machine.

  6. Most Python modules follow a community development model and are free and open source.

What is XML?

  1. XML stands for Extensible Markup Language.

  2. Its syntax looks similar to HTML.

  3. It was designed for storing and transporting data.

  4. It has a simple structure, so it is both human- and machine-readable.

  5. XML is used to describe what the data is.

  6. HTML, on the other hand, is used to define how data is displayed.

  7. In general, XML is designed to send and receive data between clients and servers in interactive applications, whereas HTML is used for designing web pages.

The following steps show how to write an XML file in Python 3.

  1. Unlike HTML, XML does not have predefined tags.

  2. So we need to create the tags ourselves: when writing an XML file, the author defines his/her own tags and also the document structure.

  3. In Python 3 we will use the minidom module (xml.dom.minidom) for this task.

  4. This module is part of Python's standard library, so it does not need to be installed with pip.

  5. It can be imported directly with the statement below.

from xml.dom import minidom
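As a minimal sketch of the steps above, the code below builds a small document with minidom and writes it to disk. The element names, values and the output filename sample.xml are illustrative assumptions chosen to match the food example used later in this article, not a required layout.

```python
from xml.dom import minidom

# Create an empty DOM document
doc = minidom.Document()

# XML has no predefined tags, so we define our own; <metadata> is the root
root = doc.createElement('metadata')
doc.appendChild(root)

food = doc.createElement('food')
root.appendChild(food)

# <item name="breakfast">Idly</item>: an attribute plus a text child
item = doc.createElement('item')
item.setAttribute('name', 'breakfast')
item.appendChild(doc.createTextNode('Idly'))
food.appendChild(item)

price = doc.createElement('price')
price.appendChild(doc.createTextNode('$2.5'))
food.appendChild(price)

# Serialise with two-space indentation and save (hypothetical filename)
xml_text = doc.toprettyxml(indent='  ')
with open('sample.xml', 'w', encoding='utf-8') as f:
    f.write(xml_text)

print(xml_text)
```

toprettyxml() adds the XML declaration plus line breaks and indentation; toxml() produces the same document without the added indentation.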

Python XML Parsing Modules:

Python provides two main modules for parsing XML documents:

  1. the xml.etree.ElementTree module, and

  2. xml.dom.minidom (Minimal DOM implementation).

Parsing is the process of reading information from a file and splitting it into pieces by identifying the parts of that XML file.

Let’s now see how we can use these modules to parse XML data.

xml.etree.ElementTree Module:

  1. This module represents XML data in a tree structure.

  2. A tree is the most natural representation of hierarchical data.

  3. Its Element type allows hierarchical data structures to be stored in memory.

  4. Each element has the following properties:

Tag Property: A string representing the type of data being stored.

Attributes Property: A number of attributes, stored as a dictionary.

Text String Property: A text string holding the information to be displayed.

Tail String Property: An optional tail string, holding any text that follows the element's closing tag.

Child Elements Property: A number of child elements, stored as a sequence.
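To see these properties on a real element, the sketch below parses a small, hypothetical snippet with ET.fromstring() and reads each one. The text " with chutney" is placed after </item> on purpose, so that it ends up in the tail rather than the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: " with chutney" follows </item>, so it becomes item.tail
root = ET.fromstring(
    '<food><item name="breakfast">Idly</item> with chutney<price>$2.5</price></food>'
)
item = root[0]  # first child element of <food>

print(item.tag)                        # item
print(item.attrib)                     # {'name': 'breakfast'}
print(item.text)                       # Idly
print(repr(item.tail))                 # ' with chutney'
print([child.tag for child in root])   # ['item', 'price']
```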

The xml.etree.ElementTree module provides two main functions for parsing an XML file: parse() and fromstring().

parse() function:

This function is used to parse an XML document that is supplied as a file.

Consider the following example.

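import xml.etree.ElementTree as ET

# Parse the file and get the root element of the tree
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

print(myroot)  # prints the root Element object, not the file's text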
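If you want to try parse() without creating sample.xml first, note that it accepts either a filename or a file-like object. The sketch below uses io.StringIO as a stand-in for the file; the XML content is an assumed, shortened version of the food example.

```python
import io
import xml.etree.ElementTree as ET

# StringIO stands in for sample.xml; ET.parse() reads any file-like object
xml_file = io.StringIO(
    '<metadata><food><item name="breakfast">Idly</item></food></metadata>'
)
mytree = ET.parse(xml_file)   # returns an ElementTree
myroot = mytree.getroot()     # the <metadata> Element

print(myroot.tag, len(myroot))  # metadata 1
```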

fromstring() function:

Similar to parse(), this function is used when the XML document is supplied as a string, for example one written inside triple quotes.

The following example helps you understand the concept.

import xml.etree.ElementTree as ET
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
   Two idly's with chutney
   </description>
    <calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
# The root element of this document is <metadata>
print(myroot.tag)  # prints: metadata

Finding Elements of Interest:

This concept deals with the root element, which contains child tags. To retrieve the children of the first element under the root tag, proceed as follows.

Example:

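# Loop over the children of the first child of the root (<food>)
for x in myroot[0]:
    print(x.tag, x.attrib)

When this code runs, it prints the tag and attributes of each child of the first element under the root. For the sample data above, that is item {'name': 'breakfast'}, followed by price, description and calories, each with an empty attribute dictionary {}.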
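Besides indexing and looping, ElementTree offers find(), findall() and iter() for locating elements. The sketch below shows all three plus a simple path expression; the second <food> entry (lunch, Dosa) is hypothetical data added only to show how multiple matches behave.

```python
import xml.etree.ElementTree as ET

data = '''<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
</food>
<food>
    <item name="lunch">Dosa</item>
    <price>$3.0</price>
</food>
</metadata>'''
myroot = ET.fromstring(data)

# findall() returns every matching direct child of the element it is called on
foods = myroot.findall('food')
# find() returns the first matching subelement, or None if there is none
first_price = foods[0].find('price').text
# iter() walks the whole subtree, at any depth
names = [item.get('name') for item in myroot.iter('item')]
# A path expression with an attribute predicate
lunch = myroot.find(".//item[@name='lunch']").text

print(len(foods), first_price, names, lunch)  # 2 $2.5 ['breakfast', 'lunch'] Dosa
```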

Modifying XML files:

The elements present in an XML file can be modified. To change an element's content, assign a new value to its text attribute; to add or change an attribute, use the set() method.

For example, to add something to an existing XML file, proceed as follows.

Adding to XML:

The following example shows how you can add something to the description of items.

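# mytree and myroot come from ET.parse('sample.xml') and mytree.getroot() above
for description in myroot.iter('description'):
    # Append text to the existing description and mark it as updated
    new_desc = str(description.text) + ' will be served'
    description.text = new_desc
    description.set('updated', 'yes')

mytree.write('new.xml')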

Adding this code to the existing program appends the text to every <description> element, adds an updated="yes" attribute to each of them, and writes the result to new.xml.
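The example above changes existing elements. To add a brand-new element, ElementTree provides SubElement(), which creates a child and appends it to its parent in one step. In this sketch the <spice> tag and its level attribute are hypothetical names used purely for illustration.

```python
import xml.etree.ElementTree as ET

data = '<metadata><food><item name="breakfast">Idly</item><price>$2.5</price></food></metadata>'
myroot = ET.fromstring(data)
food = myroot[0]

# Create <spice level="mild"> as the last child of <food> (hypothetical tag)
spice = ET.SubElement(food, 'spice', {'level': 'mild'})
spice.text = 'Low'

# tostring() serialises the element and its subtree back to text
result = ET.tostring(myroot, encoding='unicode')
print(result)
```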

Deleting from XML:

Similarly, we can delete attributes or sub-elements using the ElementTree approach. An element's attributes are stored in a dictionary, so calling pop() on its attrib removes an attribute; a sub-element is removed with the element's remove() method. Whereas set() adds or modifies content, these remove an attribute or element the user no longer needs from the XML document.

Using the same example as before, the following code removes an attribute.

# Remove the 'name' attribute from the first <item>; None avoids a KeyError if it is missing
myroot[0][0].attrib.pop('name', None)
# Create a new XML file with the results
mytree.write('output5.xml')
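For the sub-element case, a minimal sketch with remove() follows. It assumes an inline, shortened version of the food data so it runs without sample.xml; note that remove() takes the child Element itself, not its tag name.

```python
import xml.etree.ElementTree as ET

data = '<metadata><food><item name="breakfast">Idly</item><price>$2.5</price><calories>553</calories></food></metadata>'
myroot = ET.fromstring(data)
food = myroot[0]

# Remove an attribute: attrib is a plain dict, so pop() works
food[0].attrib.pop('name', None)

# Remove a sub-element: look it up first, then pass the Element to remove()
calories = food.find('calories')
food.remove(calories)

result = ET.tostring(myroot, encoding='unicode')
print(result)  # <metadata><food><item>Idly</item><price>$2.5</price></food></metadata>
```

When removing several children in a loop, iterate over a copy such as list(food) so that removing elements does not cause any to be skipped.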

xml.dom.minidom Module:

  1. Like the xml.etree.ElementTree module, this module parses XML; it is mainly used by people who are familiar with the DOM (Document Object Model).

  2. Note that DOM applications often start by parsing XML into a DOM.

  3. In xml.dom.minidom, this can be done as follows:

parse() function:

As with ElementTree, parse() is the first method; it is used when the XML file to be parsed is supplied as a parameter.

For example:

from xml.dom import minidom
p1 = minidom.parse("sample.xml")

parseString() Method:

This method is used when you want to supply the XML to be parsed as a string.

In the following example, the XML to be parsed is supplied as a string.

minidom.parseString('<myxml>Using<empty/> parseString</myxml>')
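parseString() returns a Document object, just like parse(). The sketch below reuses the string from the example above to show how to reach the root element and its child nodes; the text around <empty/> becomes separate text nodes.

```python
from xml.dom import minidom

doc = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

# documentElement is the root element of the parsed document
root = doc.documentElement
print(root.tagName)  # myxml

# Children are: text "Using", element <empty/>, text " parseString"
kinds = [node.nodeType == node.TEXT_NODE for node in root.childNodes]
print(kinds)  # [True, False, True]

# toxml() on an element serialises it without an XML declaration
print(root.toxml())  # <myxml>Using<empty/> parseString</myxml>
```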

Finding Elements of Interest:

Once the file has been parsed, printing the variable that stores the parsed data does not display the file's content; the output only shows that the variable is a DOM object.

Consider the following example.

print(minidom.parse('sample.xml'))

Accessing Elements:

To access an element, use methods such as getElementsByTagName().

Consider the following example.

dat = minidom.parse('sample.xml')
print(dat.getElementsByTagName('item')[0])
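Printing an element only shows the DOM object. To read its attribute and text, the sketch below uses getAttribute() and the text node stored in firstChild. It parses an assumed inline string instead of sample.xml so that it runs on its own.

```python
from xml.dom import minidom

dat = minidom.parseString(
    '<metadata><food><item name="breakfast">Idly</item><price>$2.5</price></food></metadata>'
)

# getElementsByTagName() searches the whole document and returns a node list
items = dat.getElementsByTagName('item')
first = items[0]

name = first.getAttribute('name')   # attribute value as a string
text = first.firstChild.data        # the text node inside <item>
price = dat.getElementsByTagName('price')[0].firstChild.nodeValue

print(len(items), name, text, price)  # 1 breakfast Idly $2.5
```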

Scope @ NareshIT:

  1. At Naresh IT you will get experienced faculty who will guide, mentor and nurture you to achieve your dream goal.

  2. You will get good hands-on practice in a practical, industry-oriented environment, which will help you shape your future.

  3. During the application design process, we will also teach you about the other aspects of the application.

  4. Our expert trainers will explain all the ins and outs of each problem scenario.

Helping you achieve your dream goal is our motto. Our team works tirelessly to help our students reach their targets. So trust us and our advice; we are committed to your success.