Alternatively, you can also clone the latest version from the [repository](https://github.com/huggingface/sentence-transformers) and install it directly from the source code:

```
pip install -e .
```

**PyTorch with CUDA**

If you want to use a GPU / CUDA, you must install PyTorch with a matching CUDA version. See [PyTorch - Get Started](https://pytorch.org/get-started/locally/) for details on how to install PyTorch.
See [Quickstart](https://www.sbert.net/docs/quickstart.html) in our documentation.

First download a pretrained embedding model, a.k.a. a Sentence Transformer model.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
```

Then provide some texts to the model.

```python
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)
# => (3, 384)
```

And that's already it. We now have numpy arrays with the embeddings, one for each text. We can use these to compute similarities.
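For example, here is a minimal sketch using the model's built-in `similarity` method (available in sentence-transformers v3.0 and later), which defaults to cosine similarity:

```python
# Minimal sketch: score every text against every other text.
# `model.similarity` accepts the numpy embeddings and returns a torch tensor.
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# => torch.Size([3, 3]), one similarity score per pair of texts
```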
We provide a large list of pretrained models for more than 100 languages. Some models are general purpose, while others produce embeddings for specific use cases.

This framework allows you to fine-tune your own sentence embedding methods, so that you get task-specific sentence embeddings. You have various options to choose from to get the best embeddings for your specific task; see the training documentation for each model type (a minimal fine-tuning sketch follows this list):
- Embedding Models
  - [Sentence Transformer > Training Overview](https://www.sbert.net/docs/sentence_transformer/training_overview.html)
  - [Sentence Transformer > Training Examples](https://www.sbert.net/docs/sentence_transformer/training/examples.html) or [training examples on GitHub](https://github.com/huggingface/sentence-transformers/tree/master/examples/sentence_transformer/training).
- Reranker Models
  - [Cross Encoder > Training Overview](https://www.sbert.net/docs/cross_encoder/training_overview.html)
  - [Cross Encoder > Training Examples](https://www.sbert.net/docs/cross_encoder/training/examples.html) or [training examples on GitHub](https://github.com/huggingface/sentence-transformers/tree/master/examples/cross_encoder/training).
- Sparse Embedding Models
  - [Sparse Encoder > Training Overview](https://www.sbert.net/docs/sparse_encoder/training_overview.html)
  - [Sparse Encoder > Training Examples](https://www.sbert.net/docs/sparse_encoder/training/examples.html) or [training examples on GitHub](https://github.com/huggingface/sentence-transformers/tree/master/examples/sparse_encoder/training).
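As promised above, here is a minimal, illustrative fine-tuning sketch for an embedding model. The base model, dataset, and loss are example choices rather than recommendations from this README; the Training Overview linked above covers training arguments, evaluators, and the other model types.

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Example base model and dataset; swap in your own task-specific data.
model = SentenceTransformer("all-MiniLM-L6-v2")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")

# A common loss for (anchor, positive) pairs without labels.
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

# Save the fine-tuned model to disk.
model.save_pretrained("models/my-finetuned-minilm")
```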
Some highlights across the different types of training are:

- Support of various transformer networks including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...
- Multi-lingual and multi-task learning
- Evaluation during training to find the optimal model
For all examples, see [examples/sentence_transformer/applications](https://github.com/huggingface/sentence-transformers/tree/master/examples/sentence_transformer/applications).
## Development setup

Tests can be run with `pytest`.

## Citing & Authors
If you find this repository helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Please have a look at [Publications](https://www.sbert.net/docs/publications.html) for our different publications that are integrated into SentenceTransformers.
### Maintainers

[Tom Aarsen](https://github.com/tomaarsen), 🤗 Hugging Face
Don't hesitate to open an issue if something is broken (and it shouldn't be) or if you have further questions.
---

This project was originally developed by the [Ubiquitous Knowledge Processing (UKP) Lab](https://www.ukp.tu-darmstadt.de/) at TU Darmstadt. We're grateful for their foundational work and continued contributions to the field.
> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
---

In practice, not all loss functions get used equally often. The most common scenarios are:

- `(sentence_A, sentence_B) pairs` with `float similarity score` or `1 if positive, 0 if negative` labels: <a href="../package_reference/cross_encoder/losses.html#binarycrossentropyloss"><code>BinaryCrossEntropyLoss</code></a> is a traditional option that remains very challenging to outperform.
- `(anchor, positive) pairs` without any labels: combined with <a href="../package_reference/util.html#sentence_transformers.util.mine_hard_negatives"><code>mine_hard_negatives</code></a> (a brief sketch follows this list):
  - with <code>output_format="labeled-list"</code>, <a href="../package_reference/cross_encoder/losses.html#lambdaloss"><code>LambdaLoss</code></a> is frequently used for learning-to-rank tasks.
  - with <code>output_format="labeled-pair"</code>, <a href="../package_reference/cross_encoder/losses.html#binarycrossentropyloss"><code>BinaryCrossEntropyLoss</code></a> remains a strong option.
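As a rough, non-authoritative sketch of the second scenario: mine hard negatives from unlabeled `(anchor, positive)` pairs, then train a Cross Encoder with `BinaryCrossEntropyLoss`. The dataset, model names, and mining parameters below are example choices, not recommendations from this guide.

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss
from sentence_transformers.util import mine_hard_negatives

# Unlabeled (anchor, positive) pairs; any question/answer-style dataset works.
pairs = load_dataset("sentence-transformers/gooaq", split="train[:10000]")

# A fast embedding model is used to find hard negatives for each anchor.
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
labeled_pairs = mine_hard_negatives(
    pairs,
    embedding_model,
    num_negatives=5,
    output_format="labeled-pair",  # (anchor, passage) rows with a 0/1 label
)

# Train a Cross Encoder (reranker) on the mined, now-labeled pairs.
model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
loss = BinaryCrossEntropyLoss(model)

trainer = CrossEncoderTrainer(model=model, train_dataset=labeled_pairs, loss=loss)
trainer.train()
```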
## Custom Loss Functions

To get full support with the automatic model card generation, you may also wish to implement:

- a ``citation`` property so your work gets cited in all models that train with the loss.

Consider inspecting existing loss functions to get a feel for how loss functions are commonly implemented.
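For illustration, here is a small, assumed sketch of the ``citation`` property on a custom loss. The class is hypothetical and simply subclasses an existing loss rather than implementing its own computation, and the BibTeX entry is a placeholder:

```python
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss


class MyCustomLoss(BinaryCrossEntropyLoss):
    """Hypothetical custom loss: reuses the parent's loss computation and only
    shows where a ``citation`` property fits in."""

    @property
    def citation(self) -> str:
        # Placeholder BibTeX; this string is embedded in the automatically
        # generated model cards of models trained with this loss.
        return """
@misc{placeholder-2025,
    title={Placeholder reference for a custom loss},
    author={Your Name},
    year={2025},
}
"""
```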