Release v1.0.0: Support for deep learning workflows and entity-aware rule matcher · explosion/spaCy

✨ Major features and improvements

NEW: Custom processing pipelines to support deep learning workflows
NEW: Rule matcher now supports entity IDs and attributes
NEW: Official/documented training APIs and GoldParse class
Download and use GloVe vectors by default
Make it easier to load and unload word vectors
Improved rule matching functionality
Move basic data into the code, rather than the JSON files. This makes it simpler to use the tokenizer without the models installed, and makes adding new languages much easier.
Replace file-system strings with Path objects. You can now load resources over your network, or do similar trickery, by passing any object that supports the Path protocol.
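
To illustrate the Path change, here is a minimal sketch of what "any object that supports the Path protocol" could look like. `InMemoryPath`, `load_tag_map` and the `vocab/tag_map.json` layout are hypothetical names invented for this example, and the set of methods used (`/`, `.exists()`, `.open()`) is an assumption about what a loader might need, not a description of spaCy's own loading code.

```python
import io
import json


class InMemoryPath:
    """Hypothetical stand-in for pathlib.Path, backed by a dict of file contents."""

    def __init__(self, files, name=""):
        self._files = files
        self._name = name

    def __truediv__(self, child):
        # Mirror pathlib's `/` operator for joining path segments.
        joined = f"{self._name}/{child}" if self._name else child
        return InMemoryPath(self._files, joined)

    def exists(self):
        # Only files are tracked here, so directories report False.
        return self._name in self._files

    def open(self, mode="r", encoding=None):
        # Return a file-like object, as pathlib.Path.open() would.
        return io.StringIO(self._files[self._name])


def load_tag_map(path):
    """Hypothetical loader that relies only on `/`, .exists() and .open()."""
    location = path / "vocab" / "tag_map.json"
    if not location.exists():
        return {}
    with location.open("r", encoding="utf8") as file_:
        return json.load(file_)


# The "files" could just as well come from a network store or an archive.
remote = InMemoryPath({"vocab/tag_map.json": '{"NN": {"pos": "NOUN"}}'})
tag_map = load_tag_map(remote)
```

Because `load_tag_map` never treats its argument as a string, a regular `pathlib.Path` pointing at a directory on disk would work with it in the same way.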

⚠️ Backwards incompatibilities

The data_dir keyword argument of Language.__init__ (and its subclasses English.__init__ and German.__init__) has been renamed to path.
Details of how the Language base class and its subclasses are loaded, and how defaults are accessed, have changed substantially. If you have your own subclasses, you should review the changes.
The deprecated token.repvec name has been removed.
The .train() method of Tagger and Parser has been renamed to .update().
The previously undocumented GoldParse class has a new __init__() method. The old method has been preserved in GoldParse.from_annot_tuples().
Previously undocumented details of the Parser class have changed.
The previously undocumented get_package and get_package_by_name helper functions have been moved into a new module, spacy.deprecated, in case you still need them while you update.
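
If your own code has to run against both older releases and this one while you migrate, a small compatibility helper can absorb the .train() to .update() rename. The sketch below is only an illustration: `call_update`, `OldStyleTagger` and `NewStyleTagger` are hypothetical names, the stub classes stand in for real models, and the arguments are passed through untouched because this example makes no assumption about the actual method signatures.

```python
def call_update(model, *args, **kwargs):
    """Call .update() when the model has it, otherwise fall back to .train()."""
    method = getattr(model, "update", None)
    if method is None:
        # Older-style object that still exposes the previous method name.
        method = getattr(model, "train")
    return method(*args, **kwargs)


class OldStyleTagger:
    """Hypothetical stub exposing only the old method name."""

    def train(self, *args):
        return ("train",) + args


class NewStyleTagger:
    """Hypothetical stub exposing only the new method name."""

    def update(self, *args):
        return ("update",) + args


old_result = call_update(OldStyleTagger(), "doc", "gold")
new_result = call_update(NewStyleTagger(), "doc", "gold")
```

Once all callers are on the new release, the helper can be deleted and the calls replaced with a direct .update().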