Parsing XML files with Python (xml.etree.ElementTree)

Views: 76,695

Comments: 45

@bayrakmusti1 1 year ago

That's how it is supposed to be taught. I have been browsing the courses on how to do it and they all are complicated. Thankfully found this video. Thanks a lot. Great job!

@ImtiazEbnaMannan 1 year ago

Thanks a lot for the great tutorial. Your approach to XML parsing was spot-on for me and it was exactly what I was looking for to get started on XML parsing.

@jdvelasquezr 7 months ago

Thank you, Francesco, for taking the time to review this library's different functions. You have greatly helped me finish a much-needed script for our localization engineering tasks. Notably, adding text to an existing tag saved the day.

@UsmanSaadat 2 years ago

Thanks a lot for this video. I couldn't grasp the concepts properly even after reading about them in books. This video made it look like a piece of cake.

@RodrigoMontes 1 year ago

Excellent man! This is what I was looking for :)

@ginopeduto4264 4 years ago

Grazie Mille!!! That was exactly what I was looking for and all well explained!!!

@konradp6379 3 months ago

Hi everyone. I tried it on my own, the same way you did it. After that I found your video. Thanks a lot for making it.

@konradp6379 3 months ago

Is there a built-in way to avoid having to format this XML?

@xst-k6 1 year ago

Can you show us how to parse a Tableau dashboard file (*.twb)? It's an XML file, Tableau just renamed it. I am trying to create a data dictionary from the .twb file.

@A_A7337 2 years ago

Great video. Thanks

@markdillon9588 2 years ago

Can you mass-edit multiple files?

@CinemagicMindset 2 years ago

Hi Francesco, I'm getting an error while parsing an XML file because it contains special words. Kindly help me avoid this error. Error: xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 277, column 366

@fcento 2 years ago

If you are sure the file you have is a valid xml (there are online tools to help you there), then what comes to mind is incorrect encoding. Check the documentation here: docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLParser
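A minimal sketch of the encoding case, not taken from the video: the sample bytes and the `try_parse` helper are made up for illustration. It assumes the file is really ISO-8859-1 but carries no encoding declaration, so the default UTF-8 parse fails with an "invalid token" error. `ParseError.position` gives the line and column of the failure, and the `encoding` argument of `XMLParser` overrides whatever the file declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: Latin-1 bytes with no encoding declaration.
raw = "<note>café</note>".encode("iso-8859-1")


def try_parse(data, encoding=None):
    """Return (root, None) on success, or (None, (line, column)) on a parse error."""
    parser = ET.XMLParser(encoding=encoding)
    try:
        return ET.fromstring(data, parser=parser), None
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the bad token.
        return None, err.position


# Default parse: the lone 0xE9 byte is not valid UTF-8, so this fails.
root, where = try_parse(raw)
print("failed at (line, column):", where)

# Same bytes, with the encoding forced on the parser.
fixed_root, _ = try_parse(raw, encoding="iso-8859-1")
print(fixed_root.text)
```

If the reported position points at an unescaped `&` or `<` rather than an accented character, the problem is the markup itself and no encoding setting will help.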

@giacomocillari4448 2 years ago

Is there a way to change a sub-element instead of the whole element string? Let's say, for example, that I want to change W to SW but not the name, and I need to do it in a loop, so I can't put the name string inside because it changes every time. Is there a way to call the specific sub-element?

@fantasticprajwal7442 4 years ago

How do I install xmltree in Python 2.7.5? I am not able to upgrade due to restrictions.

@fcento 4 years ago

docs.python.org/2.7/library/xml.etree.elementtree.html It's a built-in library, so there is no need to install it. I recommend you start sorting out the restrictions, because 2.7 is deprecated. 3.8 is now also available with Anaconda. Some of the code I used may not work on 2.7, so bear that in mind.

@hoscoharding7319 4 years ago

Hi Francesco! I have been trying to do something with elementtree for several days but it is impossible for me ... And it gives me the feeling that it is very simple. I want to make a little script that adds a child element only if it doesn't already have it. Imagine that the document lacks year to panama. My script would go through the xml document and add only the year to Panama ... Could you give me some idea please? Many thanks.

@fcento 4 years ago

I've just made a video about it: kzbin.info/www/bejne/a3PVh4OmhM6ZqtE
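For readers without access to the linked video, here is one possible sketch of "add a child only if it is missing". It is not necessarily the approach the video takes. The sample document and the placeholder value "unknown" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Panama has no <year> child, Singapore does.
xml_text = """<data>
  <country name="Singapore"><year>2011</year></country>
  <country name="Panama"></country>
</data>"""

root = ET.fromstring(xml_text)

for country in root.findall("country"):
    # find() returns None when no matching child exists. Compare with
    # "is None": an element without children is falsy, so a plain
    # truth test would give the wrong answer for an existing <year>.
    if country.find("year") is None:
        year = ET.SubElement(country, "year")
        year.text = "unknown"  # placeholder value, replace as needed

print(ET.tostring(root, encoding="unicode"))
```

Running the loop a second time changes nothing, because every country then already has a `year` child.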

@arnolda7417 4 years ago

Hi Francesco. Thanks for the great video! I ran into an error after editing my xml file. I tried to view the entire file to make sure my changes were made with ET.dump(tree) and I always get "AttributeError: 'str' object has no attribute 'items'" I'm testing with Jupyter notebook and when I restart the kernel, ET.dump works just fine before I make changes to the file. Any idea on how to fix this? I'm new to Python.

@fcento 4 years ago

Hi Arnold, can you share the code?

@arnolda7417 4 years ago

@fcento Absolutely. Is there an email address I can send it to? I'd like to include the payload as well for reference.

@KiviliG 4 years ago

Can this be done with the BeautifulSoup library?

@stanleymbah8983 2 years ago

thank you for this

@shrinivasulunandyala9269 3 years ago

Merging XML files using Python: can you please make a video on this topic?

@KrishnaManohar8021 3 years ago

Looking forward...

@vijayalakshmi8282 2 years ago

Hi Francesco, great video, thanks. I need a small suggestion. Let's say I have an element with the value 100; I need output like "KTOPL 100". I need both the tag and the value. How can we get that? Can you please explain?

@debasishsahoo1268 7 months ago

Awesome

@sidjjj 2 years ago

Thanks for this video, I needed to parse xml from a variable instead of a file and found this : xml_data_tree = ET.fromstring(received_packet)
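A short sketch of the same idea, with a made-up `received_packet` payload. One detail worth knowing: `ET.fromstring` returns the root `Element` directly, not an `ElementTree`, so there is no `getroot()` step. If a tree object is needed (for example to call `write()`), the element can be wrapped.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload held in a variable instead of a file.
received_packet = b"<packet id='7'><status>ok</status></packet>"

# fromstring() accepts str or bytes and returns the root Element.
xml_data_tree = ET.fromstring(received_packet)
status = xml_data_tree.findtext("status")
print(xml_data_tree.tag, xml_data_tree.get("id"), status)

# Wrap the element when an ElementTree is required, e.g. for tree.write().
tree = ET.ElementTree(xml_data_tree)
```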

@arshap9351 3 years ago

Increase your font size before doing tutorials. It's quite hard to read the text. Anyway, good job.

@attilioturco 11 months ago

nice vid thanks

@myyoutubeaccount0123_ 2 years ago

thanks a lot

@fcento 2 years ago

Happy to help

@padraigmaccu9333 3 years ago

(translated from Irish) Thank you very much, Francesco. This is something I had needed for a long time. Pádraig Mac Con Uladh

@codelearnexe475 3 years ago

Was not expecting Irish in this chat XD

@Gamer-mg6my 2 years ago

Hi, I'm trying to get the text of every Text tag, but every Text tag also has child elements inside it. Any idea how to extract the content of the tags? [XML sample: the markup was stripped from this comment, leaving only numeric and color values. The text values of the three shapes are Entity, EntityTwo, and EntityThree.]

@fcento 2 years ago

Let's take it in steps. I'm assuming you want to extract 'Entity', 'EntityTwo', 'EntityThree' from the Text element (let me know if I misunderstood your question). The way it's formatted, it contains 2 child elements as well as the piece of text you want to extract. If you just use findall() and read 'text' you get None back; what you want to use in this case is 'tail' instead. I've included sample code here: gist.github.com/fcento100/74b8691af014a8126f8e9ca2ff03c6ea
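To make the `text` versus `tail` distinction concrete, here is a small sketch that is separate from the linked gist. The markup is a guess at the shape of the original sample: `cp` appears later in this thread, while `pp` and the `IX` attributes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed layout: two empty child elements, then the string we want.
shape_text = ET.fromstring('<Text><pp IX="0"/><cp IX="0"/>Entity</Text>')
cp = shape_text.find("cp")

# .text only covers what comes before the first child, so it is None here.
print(shape_text.text)

# The string sits after </cp>, which makes it the tail of cp.
print(cp.tail)

# itertext() walks text and tails in document order, so it also finds it.
joined = "".join(shape_text.itertext())
print(joined)
```

`itertext()` is handy when it does not matter which child the string follows; `tail` is the better fit when only the text after one specific element is wanted.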

@fcento 2 years ago

I've put the XML from your comment in a file here for convenience: gist.github.com/fcento100/19cb7ae6b857c539a2c2843519239efc

@Gamer-mg6my 2 years ago

@fcento Yes, you understood me well. Ohhhh, with tail. Well, I checked it, but with another XML file it didn't run :( Instead I used findall('.//cp', ns) and printed elm.tail, and with that we got the text. I like your solution more, but with the other XML file it didn't run :( This is the error that I got: elmtail = elm.tail.strip() AttributeError: 'NoneType' object has no attribute 'strip'

@fcento 2 years ago

Apologies for not catching the 'NoneType' error. 'tail' returns None, rather than an empty string, when there is nothing after the element. It's fixed now in this version, with an if statement to catch it: gist.github.com/fcento100/11847ad0d8d42eec6c1dc42de897b842 The reason I wasn't getting this error is that I copied and pasted from your message, and since it was formatted, 'tail' returned '\n' and '\t' (the string representations of new-line and tab) where it should have returned None, which is why I was able to run the strip command everywhere without error. In the new code I've shown 2 methods of getting at that piece of data. In your sample XML, "Entity" etc. is the tail of the cp element; root.findall('.//visio:Text/',ns) and root.findall('.//visio:cp',ns) do similar things. The only difference is that using './/visio:Text/' in method 1 will also extract the tail of the other child element if one is available, which may be undesirable! In that case './/visio:cp', like you suggested, is the way to go.
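The two lookups and the None guard can be sketched as follows. This is an illustration, not a copy of the gist: the namespace URI is a placeholder, the `pp` element and the sample document are assumptions, and the child selector is written with an explicit `*` (`.//visio:Text/*`) instead of the trailing slash.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, not the real Visio one.
ns = {"visio": "http://example.com/visio"}

xml_text = """<Shapes xmlns="http://example.com/visio">
  <Shape><Text><pp IX="0"/><cp IX="0"/>Entity</Text></Shape>
  <Shape><Text><pp IX="0"/>Intro <cp IX="0"/>EntityTwo</Text></Shape>
  <Shape><Text><cp IX="0"/></Text></Shape>
</Shapes>"""
root = ET.fromstring(xml_text)


def tails(elements):
    """Collect the non-empty, stripped tails of the given elements."""
    found = []
    for elm in elements:
        tail = elm.tail  # None when nothing follows the element
        if tail is not None and tail.strip():
            found.append(tail.strip())
    return found


# Only the text that follows a cp element.
by_cp = tails(root.findall(".//visio:cp", ns))

# The tail of every child of Text, so text after pp is picked up as well.
by_children = tails(root.findall(".//visio:Text/*", ns))

print(by_cp)
print(by_children)
```

The third shape has a `cp` with no tail at all, which is the case that raised the `AttributeError` before the guard was added.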

@Gamer-mg6my 2 years ago

@fcento Many thanks for your kind help, Francesco :))