
parse_itut_bulletins.py - running into AssertionError #3

@tobiasfunke1


Hi, thanks for your tools and your collection of MCC and MNC data.

When running parse_itut_bulletins.py I get the following output:

$ ./parse_itut_bulletins.py -d -j -p
...
[+] downloaded PDF bulletin 1314 from year 2025 and converted to text
[+] downloaded PDF bulletin 1315 from year 2025 and converted to text
> error occured during MNC extraction: AssertionError()

I think this is because the output of pdftotext changed between versions. For example, in T-SP-OB.1162-2018-OAS-PDF-E.txt at line 2259, a space now appears after the asterisk:

before:

*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo

after:

* This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo

As a result, this part of the regex pattern no longer matches:

'(\*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo)|'\
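A minimal sketch of one possible fix, assuming the failing alternative is the one quoted above: allowing optional whitespace after the escaped asterisk (\*\s*) lets the pattern match both the old and the new pdftotext output. The name KOSOVO_FOOTNOTE is hypothetical, and only this single alternative is shown, not the script's full pattern.

```python
import re

# Hypothetical stand-alone version of the footnote alternative.
# \s* after the literal asterisk accepts both "*This ..." (older pdftotext)
# and "* This ..." (pdftotext 24.02.0, as reported above).
KOSOVO_FOOTNOTE = re.compile(
    r'(\*\s*This designation is without prejudice to positions on status, '
    r'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo)'
)

before = ('*This designation is without prejudice to positions on status, '
          'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo')
after = ('* This designation is without prejudice to positions on status, '
         'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo')

for line in (before, after):
    print(bool(KOSOVO_FOOTNOTE.match(line)))  # prints True for both lines
```

In the script itself this would be the same small change inside the existing alternation. Writing that literal as a raw string (r'...') also avoids the SyntaxWarning that Python 3.12 can emit for \* in a plain string literal.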

System info

OS: Ubuntu 24.04.2 LTS
Python: 3.12.3
lxml: 5.4.0
pdftotext: 24.02.0

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests