Description
At the moment, corpora contains category names like "film-tv" and file names like "materials/abridged-body-fluids". These names cause problems in tools like pycorpora because they prevent users from retrieving files with the standard attribute syntax, pycorpora.category_name.file_name['key']. The reason is that - is not a legal character in Python identifiers, so pycorpora.film-tv is parsed as the subtraction pycorpora.film - tv.
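A quick standard-library check shows which of the names from this issue can be used with dot syntax. The helper is_valid_identifier is hypothetical, not part of pycorpora. It uses str.isidentifier() and also rejects Python keywords, because those pass isidentifier() but still can't follow a dot.

```python
import keyword


def is_valid_identifier(name: str) -> bool:
    """Hypothetical helper: True if name can be used after a dot in Python."""
    # isidentifier() rejects characters such as "-"; keywords like "class"
    # pass isidentifier() but are still a syntax error after a dot.
    return name.isidentifier() and not keyword.iskeyword(name)


# Category and file names mentioned in this issue.
for name in ["film-tv", "abridged-body-fluids", "materials", "tv_shows"]:
    print(name, is_valid_identifier(name))
# film-tv False
# abridged-body-fluids False
# materials True
# tv_shows True
```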
In pycorpora I can work around this as follows:
getattr(pycorpora, 'film-tv').tv_shows['tv_shows']
pycorpora.materials.get_file('abridged-body-fluids')['abridged body fluids']
However, this isn't ideal. Either pycorpora and similar libraries should perform these workarounds internally (for instance, by translating - to _), or corpora should restrict category and file names to strings that are valid identifiers in common languages such as JavaScript, Python, and C.
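Here is a minimal sketch of the first option. It assumes the data has already been loaded into nested dicts (category, then file name, then that file's JSON contents). HyphenTolerantNamespace is a hypothetical class, not pycorpora's actual API. It resolves an attribute such as film_tv by comparing it with each key after replacing - with _. Item access with ['key'] is passed through unchanged.

```python
from typing import Any, Dict


class HyphenTolerantNamespace:
    """Hypothetical wrapper exposing hyphenated keys under underscore names."""

    def __init__(self, entries: Dict[str, Any]) -> None:
        self._entries = entries

    def __getattr__(self, attr: str) -> Any:
        # Called only when normal attribute lookup fails. Read _entries via
        # __dict__ to avoid infinite recursion if it has not been set yet.
        entries = self.__dict__.get("_entries", {})
        for key, value in entries.items():
            # Compare after mapping "-" to "_", so "film-tv" matches film_tv.
            if key.replace("-", "_") == attr:
                # Wrap nested dicts so the translation applies at every level.
                if isinstance(value, dict):
                    return HyphenTolerantNamespace(value)
                return value
        raise AttributeError(attr)

    def __getitem__(self, key: str) -> Any:
        # Item access uses the original key, e.g. ["abridged body fluids"].
        return self._entries[key]


# Placeholder data shaped like category -> file -> JSON contents.
# The list values are made up for illustration, not real corpus data.
corpora = HyphenTolerantNamespace({
    "film-tv": {"tv_shows": {"tv_shows": ["placeholder show"]}},
    "materials": {
        "abridged-body-fluids": {"abridged body fluids": ["placeholder fluid"]}
    },
})

print(corpora.film_tv.tv_shows["tv_shows"])
print(corpora.materials.abridged_body_fluids["abridged body fluids"])
```

One caveat with this design: if a category contained both "a-b" and "a_b", both would map to the attribute a_b, and the first key found would win. Restricting names in corpora itself, the other option, avoids that ambiguity.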
I've opened a similar issue in pycorpora: aparrish/pycorpora#11.