parse_itut_bulletins.py - running into AssertionError

Hi, thanks for your tools and your collection of MNC and MCC data.

When running `parse_itut_bulletins.py` I get the following output:

```bash
$ ./parse_itut_bulletins.py -d -j -p
...
[+] downloaded PDF bulletin 1314 from year 2025 and converted to text
[+] downloaded PDF bulletin 1315 from year 2025 and converted to text
> error occured during MNC extraction: AssertionError()
```

I think the output of `pdftotext` changed between versions. For example, in `T-SP-OB.1162-2018-OAS-PDF-E.txt`, line 2259 now has a space after the `*`:

before:
```
*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo
```

after:
```
* This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo
```

As a result, the regex pattern no longer matches:

https://github.com/P1sec/MCC_MNC/blob/a5613a2f2dbb1c9cb439d185127a432dea4405ea/parse_itut_bulletins.py#L224
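
One possible fix might be to allow optional whitespace after the `*`. The snippet below is only a sketch of that idea: `FOOTNOTE_RE` and its pattern are hypothetical and are not copied from `parse_itut_bulletins.py`, so the actual expression at the linked line would need to be adapted accordingly.

```python
import re

# Hypothetical pattern for illustration, NOT the one used in parse_itut_bulletins.py.
# "\*\s*" accepts the footnote marker with or without whitespace after it.
FOOTNOTE_RE = re.compile(r"^\*\s*This designation is without prejudice")

# Footnote line as produced by the older and the newer pdftotext, respectively.
old_line = "*This designation is without prejudice to positions on status"
new_line = "* This designation is without prejudice to positions on status"

for line in (old_line, new_line):
    # Both variants should be recognized by the tolerant pattern.
    print(bool(FOOTNOTE_RE.match(line)))
```

Using `\s*` rather than a literal optional space should also cover tabs or multiple spaces, in case other `pdftotext` versions differ again.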


### System info

OS: `Ubuntu 24.04.2 LTS`
Python: `3.12.3`
lxml: `5.4.0`
pdftotext: `24.02.0`

