Commit aaec753

Merge branch 'master' into v2.5-release
2 parents: c4b32c2 + 66e0ee3

77 files changed: +138, -39 lines changed

examples/applications/clustering/agglomerative.py

Lines changed: 3 additions & 3 deletions

@@ -3,9 +3,9 @@
 
 Sentences are mapped to sentence embeddings and then agglomerative clustering with a threshold is applied.
 """
+
 from sentence_transformers import SentenceTransformer
 from sklearn.cluster import AgglomerativeClustering
-import numpy as np
 
 embedder = SentenceTransformer("all-MiniLM-L6-v2")
 
@@ -25,8 +25,8 @@
 ]
 corpus_embeddings = embedder.encode(corpus)
 
-# Normalize the embeddings to unit length
-corpus_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
+# Some models don't automatically normalize the embeddings, in which case you should normalize the embeddings:
+# corpus_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
 
 # Perform kmean clustering
 clustering_model = AgglomerativeClustering(
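
For context, a minimal runnable sketch of the pattern this file demonstrates (the distance threshold below is illustrative, not taken from the diff): sentence embeddings are fed to scikit-learn's AgglomerativeClustering with a threshold instead of a fixed cluster count.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["A man is eating food.", "A man is eating bread.", "The sky is blue."]
corpus_embeddings = embedder.encode(corpus)

# If your model does not return unit-length vectors, normalize them first:
# corpus_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)

# n_clusters=None lets the distance threshold decide how many clusters emerge
clustering_model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.5)
clustering_model.fit(corpus_embeddings)
print(clustering_model.labels_)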

examples/applications/clustering/fast_clustering.py

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@
 
 In this example, we download a large set of questions from Quora and then find similar questions in this set.
 """
+
 from sentence_transformers import SentenceTransformer, util
 import os
 import csv
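
As a rough sketch of what fast clustering means here: sentence_transformers ships a util.community_detection helper that groups embeddings whose cosine similarity exceeds a threshold. The threshold and minimum community size below are illustrative assumptions, not values from this diff.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How can I lose weight quickly?",
]
embeddings = model.encode(questions, convert_to_tensor=True)

# Each community is a list of indices; the first index is the central point
clusters = util.community_detection(embeddings, min_community_size=2, threshold=0.75)
for i, cluster in enumerate(clusters):
    print(f"Cluster {i}:", [questions[idx] for idx in cluster])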

examples/applications/clustering/kmeans.py

Lines changed: 1 addition & 0 deletions

@@ -3,6 +3,7 @@
 
 Sentences are mapped to sentence embeddings and then k-mean clustering is applied.
 """
+
 from sentence_transformers import SentenceTransformer
 from sklearn.cluster import KMeans
 
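
A minimal sketch of the k-means pattern (the cluster count is illustrative): embeddings from the bi-encoder are clustered with scikit-learn's KMeans.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["A man is eating food.", "A man is eating bread.", "The sky is blue."]
corpus_embeddings = embedder.encode(corpus)

# Fit k-means with a fixed number of clusters
clustering_model = KMeans(n_clusters=2)
clustering_model.fit(corpus_embeddings)
print(clustering_model.labels_)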

examples/applications/computing-embeddings/computing_embeddings_streaming.py

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@
 when encoding large text collections.
 It also demonstrates how to stream data which is helpful in case you don't
 want to wait for an extremely large dataset to download, or if you want to
-limit the amount of memory used. More info about dataset streaming: 
+limit the amount of memory used. More info about dataset streaming:
 https://huggingface.co/docs/datasets/stream
 """
 
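
A hedged sketch of the streaming pattern the docstring describes (the dataset name and batch size are illustrative assumptions): Hugging Face datasets can be iterated lazily with streaming=True, so batches are encoded as they arrive.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# streaming=True yields examples lazily instead of downloading the full dataset
dataset = load_dataset("ag_news", split="train", streaming=True)

batch = []
for example in dataset:
    batch.append(example["text"])
    if len(batch) == 32:
        embeddings = model.encode(batch)
        # process embeddings here, then start the next batch
        batch = []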

examples/applications/cross-encoder/cross-encoder_reranking.py

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@
 
 Then, we re-rank the hits from the Bi-Encoder using a Cross-Encoder.
 """
+
 from sentence_transformers import SentenceTransformer, util
 from sentence_transformers import CrossEncoder
 import os
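
A minimal sketch of the re-ranking step (the checkpoint name is an illustrative assumption): the Cross-Encoder scores each (query, hit) pair retrieved by the Bi-Encoder, and the hits are re-sorted by that score.

from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How many people live in Berlin?"
hits = [
    "Berlin has a population of around 3.5 million people.",
    "Berlin is the capital of Germany.",
]

# Score each (query, passage) pair, then sort the hits by descending score
scores = cross_encoder.predict([(query, hit) for hit in hits])
reranked = [hit for _, hit in sorted(zip(scores, hits), reverse=True)]
print(reranked)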

examples/applications/cross-encoder/cross-encoder_usage.py

Lines changed: 1 addition & 0 deletions

@@ -3,6 +3,7 @@
 sentences in a corpus using a Cross-Encoder for semantic textual similarity (STS).
 It output then the most similar sentences for the given query.
 """
+
 from sentence_transformers.cross_encoder import CrossEncoder
 import numpy as np
 
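
A short sketch of direct Cross-Encoder usage for STS (the checkpoint name is an illustrative assumption): the model scores sentence pairs directly, and numpy sorts the corpus by score.

from sentence_transformers.cross_encoder import CrossEncoder
import numpy as np

model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

query = "A man is eating pasta."
corpus = ["A man is eating food.", "The girl is carrying a baby.", "A cheetah is running."]

# Score the query against every corpus sentence, then rank by descending score
scores = model.predict([(query, sentence) for sentence in corpus])
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}\t{corpus[idx]}")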

examples/applications/parallel-sentence-mining/bitext_mining.py

Lines changed: 1 addition & 0 deletions

@@ -12,6 +12,7 @@
 This script requires that you have FAISS installed:
 https://github.com/facebookresearch/faiss
 """
+
 from sentence_transformers import SentenceTransformer, models
 import numpy as np
 from bitext_mining_utils import score_candidates, kNN, file_open
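
For intuition, a simplified sketch of bitext mining without FAISS (the model name is an illustrative assumption; the real script uses FAISS and margin-based candidate scoring for scale): embed both languages with a multilingual model and match sentences by cosine similarity.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = ["The cat sits on the mat.", "I love reading books."]
german = ["Ich liebe es, Buecher zu lesen.", "Die Katze sitzt auf der Matte."]

emb_en = model.encode(english, convert_to_tensor=True)
emb_de = model.encode(german, convert_to_tensor=True)

# For each English sentence, pick the German sentence with the highest cosine similarity
cos = util.cos_sim(emb_en, emb_de)
for i in range(len(english)):
    j = int(cos[i].argmax())
    print(english[i], "<->", german[j])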

examples/applications/parallel-sentence-mining/bucc2018.py

Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@
 This script requires that you have FAISS installed:
 https://github.com/facebookresearch/faiss
 """
+
 from sentence_transformers import SentenceTransformer, models
 from collections import defaultdict
 import os

examples/applications/semantic-search/semantic_search.py

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@
 
 This script outputs for various queries the top 5 most similar sentences in the corpus.
 """
+
 from sentence_transformers import SentenceTransformer, util
 import torch
 
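
A minimal sketch of the semantic search pattern (corpus and top_k are illustrative): util.semantic_search returns, for each query embedding, the top-scoring corpus entries by cosine similarity.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = ["A man is eating food.", "A woman is playing violin.", "A cheetah chases prey."]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("A man is eating pasta.", convert_to_tensor=True)

# hits[0] holds dicts with 'corpus_id' and 'score' for the first (only) query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
for hit in hits[0]:
    print(f"{hit['score']:.3f}\t{corpus[hit['corpus_id']]}")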

examples/applications/semantic-search/semantic_search_publications.py

Lines changed: 2 additions & 1 deletion

@@ -1,13 +1,14 @@
 """
 This example demonstrates how we can perform semantic search for scientific publications.
 
-As model, we use SPECTER (https://github.com/allenai/specter), which encodes paper titles and abstracts 
+As model, we use SPECTER (https://github.com/allenai/specter), which encodes paper titles and abstracts
 into a vector space.
 
 When can then use util.semantic_search() to find the most similar papers.
 
 Colab example: https://colab.research.google.com/drive/12hfBveGHRsxhPIUMmJYrll2lFU4fOX06
 """
+
 import json
 import os
 from sentence_transformers import SentenceTransformer, util
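
A hedged sketch of the SPECTER search pattern (the checkpoint name and the title/abstract concatenation with "[SEP]" are assumptions based on the SPECTER repo, not on this diff):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("allenai-specter")

papers = [
    {"title": "Attention Is All You Need", "abstract": "The dominant sequence transduction models are based on..."},
    {"title": "SPECTER: Document-level Representation Learning", "abstract": "Representation learning is a critical ingredient..."},
]

# SPECTER embeds a paper from its title and abstract joined by [SEP]
corpus = [p["title"] + "[SEP]" + p["abstract"] for p in papers]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("transformer architectures for sequence modeling", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(papers[hits[0][0]["corpus_id"]]["title"])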
