Description
This is a wonderful package, but I have run into a problem: in TeX and LaTeX, one can specify accents such as \H either in the form \H{o} or in the form \H o. However, NameParser splits names written in the second form incorrectly:
biblib.algo.parse_names("Sz\\H{o}l\\H{o}, Abel and Sz\\H ol\\H o, Baker")
[Name(first='Abel', von='', last='Sz\\H{o}l\\H{o}', jr=''),
Name(first='Baker', von='Sz\\H ol\\H', last='o', jr='')]
For reasons I don't understand, wrapping the second \H o in braces makes Baker's name parse correctly, even though the first \H o still contains an unbraced space:
biblib.algo.parse_names("Sz\\H ol{\\H o}, Baker")
[Name(first='Baker', von='', last='Sz\\H ol{\\H o}', jr='')]
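One possible explanation, assuming biblib follows BibTeX's rule for the "von Last, First" form: words are split on spaces at brace depth 0, a word counts as "von" when its first letter at depth 0 is lowercase, and the von part ends at the last such word before the final word, because Last must keep at least one word. The helpers `split_words`, `is_lowercase_word`, and `split_von_last` below are hypothetical illustrations, not biblib's API, but they reproduce the von/last fields of both Baker outputs above:

```python
# Hypothetical sketch of BibTeX-style "von Last" splitting, assuming biblib
# follows BibTeX's rules. These helpers are NOT biblib's API.

def split_words(text):
    """Split on whitespace at brace depth 0 (spaces inside braces are kept)."""
    words, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def is_lowercase_word(word):
    """A word is a "von" word if its first letter at depth 0 is lowercase."""
    depth = 0
    for ch in word:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and ch.isalpha():
            return ch.islower()
    return False


def split_von_last(text):
    """Split the part before the first comma into (von, last).

    The von part ends at the last lowercase word that is not the final
    word, because Last must keep at least one word.
    """
    words = split_words(text)
    for end in range(len(words) - 1, 0, -1):
        if is_lowercase_word(words[end - 1]):
            return " ".join(words[:end]), " ".join(words[end:])
    return "", " ".join(words)
```

Under this reading, the braces do not protect the first \H o at all: `Sz\H` is still split off as its own word, but the only lowercase word left is the final one, which must stay in Last, so von comes out empty.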
This particular problem can be worked around by converting to Unicode first:
biblib.algo.parse_names(biblib.algo.tex_to_unicode("Sz\\H{o}l\\H{o}, Abel and Sz\\H ol\\H o, Baker"))
[Name(first='Abel', von='', last='Szőlő', jr=''),
Name(first='Baker', von='', last='Szőlő', jr='')]
but that approach also strips the braces that are needed, for example, to mark an institution name so that it is parsed entirely as a last name rather than split into name components. For example:
biblib.algo.parse_names(biblib.algo.tex_to_unicode("Sz\\H ol\\H o, Baker and {NRC Committee}"))
[Name(first='Baker', von='', last='Szőlő', jr=''),
Name(first='NRC', von='', last='Committee', jr='')]
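Until NameParser handles this, one possible workaround that keeps meaningful braces is to brace only the spaced accents before parsing, turning `\H o` into `{\H o}`. The function `brace_spaced_accents` below is a hypothetical helper, not part of biblib, and its list of accent letters is an assumption:

```python
import re

# Hypothetical workaround, not part of biblib: wrap a one-letter accent
# command and its space-separated argument in braces ("\H o" -> "{\H o}")
# so the space sits at brace depth 1 and no longer separates words.
# Other braces, such as "{NRC Committee}", are left untouched.
# The accent letters (H, b, c, d, k, r, t, u, v) are an assumed list.
_SPACED_ACCENT = re.compile(r"(?<!\{)\\([Hbcdkrtuv])\s+([A-Za-z])")


def brace_spaced_accents(text):
    """Return text with every spaced accent such as "\\H o" braced."""
    return _SPACED_ACCENT.sub(r"{\\\1 \2}", text)


names = "Sz\\H ol\\H o, Baker and {NRC Committee}"
print(brace_spaced_accents(names))
# prints: Sz{\H o}l{\H o}, Baker and {NRC Committee}
```

The rewritten string could then be passed to `biblib.algo.parse_names` instead of the `tex_to_unicode` output. Since each space now sits inside braces, `Sz{\H o}l{\H o}` should be treated as a single word, and `{NRC Committee}` keeps the braces that hold it together.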
I think NameParser needs to special-case accent commands such as \H, \r, \u, and \v so that it does not split tokens on the space following them, just as it already special-cases spaces at brace depth > 0.
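A minimal sketch of that tokenization rule follows. `split_name_words` and `ACCENT_COMMANDS` are hypothetical names, not biblib's NameParser internals, and the accent list is an assumption:

```python
# Hypothetical sketch of the proposed NameParser change; this is not
# biblib's code, and ACCENT_COMMANDS is an assumed list of one-letter
# accent commands that take an argument.
ACCENT_COMMANDS = {"H", "b", "c", "d", "k", "r", "t", "u", "v"}


def split_name_words(text):
    """Split a name on whitespace at brace depth 0, except whitespace
    that only terminates an accent command, as in "\\H o"."""
    words, current = [], []
    depth, i = 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Read a control word (\H, \Huge) or a control symbol (\').
            j = i + 1
            if text[j].isalpha():
                while j < len(text) and text[j].isalpha():
                    j += 1
            else:
                j += 1
            if text[i + 1:j] in ACCENT_COMMANDS:
                # Keep the spaces after an accent command inside the word.
                while j < len(text) and text[j].isspace():
                    j += 1
            current.append(text[i:j])
            i = j
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        words.append("".join(current))
    return words
```

With this rule, `Sz\H ol\H o` becomes a single word and can only end up in the last name, while other spaces at depth 0, including the one after a longer command such as `\Huge`, still separate words.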