Annoy
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mapped into memory so that many processes may share the same data.
You'll need to install langchain-community with pip install -qU langchain-community to use this integration.
This notebook shows how to use functionality related to the Annoy vector database.
NOTE: Annoy is read-only: once the index is built, you cannot add any more embeddings!
If you want to add new entries to your VectorStore progressively, choose an alternative.
%pip install --upgrade --quiet annoy
Create VectorStore from texts
from langchain_community.vectorstores import Annoy
from langchain_huggingface import HuggingFaceEmbeddings
model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings_func = HuggingFaceEmbeddings(model_name=model_name)
texts = ["pizza is great", "I love salad", "my car", "a dog"]
# default metric is angular
vector_store = Annoy.from_texts(texts, embeddings_func)
# allows for custom annoy parameters, defaults are n_trees=100, n_jobs=-1, metric="angular"
vector_store_v2 = Annoy.from_texts(
    texts, embeddings_func, metric="dot", n_trees=100, n_jobs=1
)
vector_store.similarity_search("food", k=3)
[Document(page_content='pizza is great', metadata={}),
Document(page_content='I love salad', metadata={}),
Document(page_content='my car', metadata={})]
# the score is a distance metric, so lower is better
vector_store.similarity_search_with_score("food", k=3)
[(Document(page_content='pizza is great', metadata={}), 1.0944390296936035),
(Document(page_content='I love salad', metadata={}), 1.1273186206817627),
(Document(page_content='my car', metadata={}), 1.1580758094787598)]
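To make the "lower is better" scores easier to interpret, here is a minimal sketch of how they relate to cosine similarity. It assumes the default angular metric, which Annoy defines as the Euclidean distance between normalized vectors, i.e. sqrt(2 * (1 - cos)). The helper names angular_distance and angular_to_cosine are hypothetical and are not part of Annoy or LangChain.

```python
import math


def angular_distance(u, v):
    """Angular distance as Annoy defines it: sqrt(2 * (1 - cos(u, v)))."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def angular_to_cosine(distance):
    """Invert the formula above: cos = 1 - d^2 / 2 (hypothetical helper)."""
    return 1.0 - distance ** 2 / 2.0


# Scores copied from the similarity_search_with_score output above.
scores = [1.0944390296936035, 1.1273186206817627, 1.1580758094787598]
for d in scores:
    print(f"distance {d:.4f} -> cosine similarity {angular_to_cosine(d):.4f}")
```

Under this assumption the three scores above correspond to cosine similarities of roughly 0.40, 0.36 and 0.33, so the ranking is the same whichever way you read it. The conversion does not apply to vector_store_v2, which was built with metric="dot".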