Releases: explosion/spaCy

v1.1.0: Bug fixes and adjustments

23 Oct 16:54

✨ Major features and improvements

  • Rename the new pipeline keyword argument of spacy.load() to create_pipeline.
  • Rename the new vectors keyword argument of spacy.load() to add_vectors.

🔴 Bug fixes

  • Fix issue #544: Add a vocab.resize_vectors() method to support switching to vectors of a different dimensionality.
  • Fix issue #536: Default probability was incorrect for OOV words.
  • Fix issue #539: Unspecified encoding when opening some JSON files.
  • Fix issue #541: GloVe vectors were being loaded incorrectly.
  • Fix issue #522: Similarities and vector norms were calculated incorrectly.
  • Fix issue #461: The ent_iob attribute was incorrect after setting entities via doc.ents.
  • Fix issue #459: The deserializer failed on an empty doc.
  • Fix issue #514: Serialization failed after adding a new entity label.
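Issues #522 and #544 both concern word vectors. spaCy's similarity methods compare two vectors by cosine similarity: their dot product divided by the product of their L2 norms. The following minimal pure-Python sketch shows that calculation. It is not spaCy's code; the function names are hypothetical, and returning 0.0 when either vector is all zeros is an assumption of this sketch rather than documented spaCy behavior.

```python
import math
from typing import Sequence


def vector_norm(vec: Sequence[float]) -> float:
    """Euclidean (L2) norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vec))


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of both norms."""
    if len(a) != len(b):
        # Vectors of different dimensionality cannot be compared directly.
        raise ValueError("vectors must have the same dimensionality")
    norm_a, norm_b = vector_norm(a), vector_norm(b)
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption of this sketch: a zero vector (e.g. a word without a
        # vector) is treated as having no similarity to anything.
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


print(vector_norm([3.0, 4.0]))             # 5.0
print(similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because both norms appear in the denominator, a bug in the norm calculation skews every similarity score, which is why the two were reported together.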

v1.0.0: Support for deep learning workflows and entity-aware rule matcher

19 Oct 00:29

✨ Major features and improvements

  • NEW: Custom processing pipelines, to support deep learning workflows
  • NEW: Rule matcher now supports entity IDs and attributes
  • NEW: Official/documented training APIs and GoldParse class
  • Download and use GloVe vectors by default
  • Make it easier to load and unload word vectors
  • Improved rule matching functionality
  • Move basic data into the code, rather than the JSON files. This makes it simpler to use the tokenizer without the models installed, and makes adding new languages much easier.
  • Replace file-system strings with Path objects. You can now load resources over your network, or do similar trickery, by passing any object that implements the pathlib.Path interface.
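A custom processing pipeline can be pictured as a list of callables that each receive the document and annotate it in place, one after another. The toy sketch below assumes that model. ToyLanguage and ToyDoc are hypothetical classes, not spaCy's Language and Doc. The create_pipeline argument only mirrors the keyword name spacy.load() uses from v1.1.0: here it is assumed to be a factory that receives the language object and returns the list of components.

```python
from typing import Callable, List


class ToyDoc:
    """Hypothetical stand-in for a document: tokens plus free-form annotations."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: List[str] = text.split()  # naive whitespace tokenizer
        self.user_data: dict = {}


# A component takes a doc and modifies it in place.
Component = Callable[[ToyDoc], None]


class ToyLanguage:
    """Hypothetical pipeline runner: tokenize, then call each component in order."""

    def __init__(self, create_pipeline: Callable[["ToyLanguage"], List[Component]]) -> None:
        # The factory receives the language object, so components can share its state.
        self.pipeline: List[Component] = create_pipeline(self)

    def __call__(self, text: str) -> ToyDoc:
        doc = ToyDoc(text)
        for component in self.pipeline:
            component(doc)  # each component annotates the doc in place
        return doc


def lowercase_tagger(doc: ToyDoc) -> None:
    doc.user_data["lower"] = [t.lower() for t in doc.tokens]


def length_counter(doc: ToyDoc) -> None:
    doc.user_data["n_tokens"] = len(doc.tokens)


nlp = ToyLanguage(create_pipeline=lambda lang: [lowercase_tagger, length_counter])
doc = nlp("Hello World")
print(doc.user_data)  # {'lower': ['hello', 'world'], 'n_tokens': 2}
```

Passing a factory rather than a fixed list lets each component be built with access to the shared language object, and any component, such as one wrapping a neural network model, can be swapped in as long as it accepts a doc.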

⚠️ Backwards incompatibilities

  • The data_dir keyword argument of Language.__init__ (and of its subclasses' English.__init__ and German.__init__) has been renamed to path.
  • Details of how the Language base class and its subclasses are loaded, and how defaults are accessed, have been heavily changed. If you have your own subclasses, you should review the changes.
  • The deprecated token.repvec name has been removed.
  • The .train() method of Tagger and Parser has been renamed to .update().
  • The previously undocumented GoldParse class has a new __init__() method. The old method has been preserved in GoldParse.from_annot_tuples().
  • Previously undocumented details of the Parser class have changed.
  • The previously undocumented get_package and get_package_by_name helper functions have been moved into a new module, spacy.deprecated, in case you still need them while you update.
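With data_dir renamed to path, a loader can treat plain strings and Path-like objects uniformly. The sketch below is hypothetical throughout: load_strings, InMemoryPath and the vocab/strings.json layout are illustrative, not spaCy's API or on-disk format. It wraps a str in pathlib.Path and otherwise relies only on `/` joining and `.open()`, so a custom object that serves files from memory (or, in principle, a network) works too.

```python
import io
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_strings(path: Any) -> List[str]:
    """Hypothetical loader that reads <path>/vocab/strings.json.

    A plain str is wrapped in pathlib.Path; any other object is used as-is,
    so it only needs to support `/` joining and `.open()` like pathlib.Path.
    """
    if isinstance(path, str):
        path = Path(path)
    # Explicit encoding, so the result does not depend on the platform default.
    with (path / "vocab" / "strings.json").open("r", encoding="utf8") as file_:
        return json.load(file_)


class InMemoryPath:
    """Hypothetical Path-like object that serves files from a dict, not the disk."""

    def __init__(self, files: Dict[str, str], parts: Tuple[str, ...] = ()) -> None:
        self.files = files
        self.parts = parts

    def __truediv__(self, name: str) -> "InMemoryPath":
        return InMemoryPath(self.files, self.parts + (name,))

    def open(self, mode: str = "r", encoding: str = "utf8") -> io.StringIO:
        # mode and encoding are accepted for interface compatibility but ignored.
        return io.StringIO(self.files["/".join(self.parts)])


with tempfile.TemporaryDirectory() as tmp:
    vocab_dir = Path(tmp) / "vocab"
    vocab_dir.mkdir()
    (vocab_dir / "strings.json").write_text(json.dumps(["hello", "world"]), encoding="utf8")
    from_str = load_strings(tmp)         # a str still works
    from_path = load_strings(Path(tmp))  # so does a pathlib.Path

from_memory = load_strings(InMemoryPath({"vocab/strings.json": '["hello", "world"]'}))
print(from_str, from_path, from_memory)
```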

🔴 Bug fixes

  • Fix get_lang_class bug when GloVe vectors are used.
  • Fix issue #411: doc.sents raised IndexError on an empty string.
  • Fix issue #455: Correct lemmatization logic.
  • Fix issue #371: Make Lexeme objects hashable.
  • Fix issue #469: Make noun_chunks detect root NPs.
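Making Lexeme objects hashable (issue #371) means they can be used as dict keys and set members. In Python, a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so the two must be defined together and agree. The sketch below shows that pattern on a hypothetical ToyLexeme class keyed on an integer ID; the attribute name orth is modeled on spaCy's naming but this is not spaCy's implementation.

```python
class ToyLexeme:
    """Hypothetical vocabulary entry identified by an integer ID."""

    def __init__(self, orth: int, text: str) -> None:
        self.orth = orth  # integer ID of the word's string
        self.text = text

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyLexeme):
            return NotImplemented
        return self.orth == other.orth

    def __hash__(self) -> int:
        # Hash on the same field __eq__ compares, so equal lexemes hash equally.
        return hash(self.orth)


apple = ToyLexeme(101, "apple")
same_apple = ToyLexeme(101, "apple")
pear = ToyLexeme(202, "pear")

counts = {apple: 3}
counts[same_apple] += 1                # finds the same key as `apple`
print(counts[apple])                   # 4
print(len({apple, same_apple, pear}))  # 2
```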

👥 Contributors

Thanks to @daylen, @RahulKulhari, @stared, @adamhadani, @izeye and @crawfordcomeaux for the pull requests!