Hi there,
I was trying to parse the PubMed baseline XML files from https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ with the pp.parse_medline_xml function, but for roughly every second file I get a syntax error:
  File "/home/xxx/.local/lib/python3.12/site-packages/pubmed_parser/medline_parser.py", line 751, in parse_medline_xml
    for event, element in etree.iterparse(f, events=("end",)):
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1379, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 609, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
  File "/home/xxx/Downloads/xml_files_baseline/pubmed24n0007.xml.gz", line 1538867
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: ArticleTitle line 1538866 and ArticleId, line 1538867, column 39
If I parse the same file with xmltodict instead, it works.
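In case it helps with reproducing this, here is a minimal sketch of a standalone well-formedness check that depends on neither pubmed_parser nor lxml. It uses only the Python standard library (gzip and xml.etree.ElementTree) to stream through a .xml.gz file and report the first parse or decompression error. The helper name check_well_formed is hypothetical and not part of any library mentioned above, and this is only a rough way to narrow down where the problem lies, not a diagnosis.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib


def check_well_formed(path):
    """Stream-parse a gzipped XML file and return (ok, error_message).

    Hypothetical helper for narrowing down the problem; it only checks
    that the file decompresses and that the XML is well-formed.
    """
    try:
        with gzip.open(path, "rb") as f:
            for _event, element in ET.iterparse(f, events=("end",)):
                # Only well-formedness matters here, so drop the element's
                # contents to reduce memory use on large files.
                element.clear()
    except ET.ParseError as exc:
        # Raised for XML that is not well-formed (e.g. mismatched tags).
        return False, f"XML error: {exc}"
    except (OSError, EOFError, zlib.error) as exc:
        # Raised by gzip/zlib for truncated or corrupt archives.
        return False, f"gzip error: {exc}"
    return True, None
```

Usage would look something like `ok, err = check_well_formed("pubmed24n0007.xml.gz")`, run against one of the files that fails and one that parses fine.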