Use Hugging Face Embeddings with Upstash Vector
In this tutorial, we’ll demonstrate how to use Hugging Face embeddings with Upstash Vector and LangChain to perform a similarity search. We will upload a few sample documents, embed them using Hugging Face, and then perform a search query to find the most semantically similar documents.
Important Note on Python Version
Recent Python versions may cause compatibility issues with `torch`, a dependency for Hugging Face models. Therefore, we recommend using Python 3.9 to avoid any installation issues.
Installation and Setup
First, set up your environment and install the necessary libraries by running the following command:
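A minimal install command, assuming the LangChain community integrations, the Upstash Vector client, `sentence-transformers` (which backs the Hugging Face embedding models), and `python-dotenv` are the packages in use; adjust the list to match your setup:

```bash
pip install langchain langchain-community upstash-vector sentence-transformers python-dotenv
```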
Next, create a `.env` file in your project directory with the following content, replacing `your_upstash_url` and `your_upstash_token` with your actual Upstash credentials:
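Assuming the standard `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` variable names used by the Upstash clients, the file looks like this:

```
UPSTASH_VECTOR_REST_URL=your_upstash_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_token
```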
This configuration file will allow us to load the required environment variables.
Code
We will load our environment variables and initialize the Hugging Face embeddings model along with the Upstash Vector store.
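A minimal sketch of this step, using `sentence-transformers/all-MiniLM-L6-v2` as an example model choice; the store is expected to read the Upstash credentials from the environment, and you can also pass `index_url` and `index_token` explicitly:

```python
from dotenv import load_dotenv
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import UpstashVectorStore

# Load UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN from the .env file
load_dotenv()

# Initialize the Hugging Face embedding model (example model choice)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create the vector store; credentials are picked up from the environment
store = UpstashVectorStore(embedding=embeddings)
```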
Next, we will create sample documents and embed them using Hugging Face embeddings, then store them in Upstash Vector.
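A sketch of the insertion step; the sample texts and the batching values are illustrative, and the `batch_size` and `embedding_chunk_size` parameters are explained below:

```python
from langchain_core.documents import Document

documents = [
    Document(page_content="Global warming is causing sea levels to rise."),
    Document(page_content="Artificial intelligence is transforming many industries."),
    Document(page_content="Renewable energy is vital for a sustainable future."),
]

# Embed the documents with the Hugging Face model and upsert them into Upstash Vector
store.add_documents(
    documents,
    batch_size=100,            # example value: vectors sent per HTTP request
    embedding_chunk_size=200,  # example value: documents embedded per chunk
)
```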
When documents are inserted, they are first embedded using the `Embeddings` object. Many embedding models, such as the Hugging Face models, support embedding multiple documents at once. This allows for efficient processing by batching documents and embedding them in parallel.
- The `embedding_chunk_size` parameter controls the number of documents processed in parallel when creating embeddings.
Once the embeddings are created, they are stored in Upstash Vector. To reduce the number of HTTP requests, the vectors are also batched when they are sent to Upstash Vector.
- The `batch_size` parameter controls the number of vectors included in each HTTP request when sending to Upstash Vector.
In the Upstash Vector free tier, there is a limit of 1000 vectors per batch.
Now, we can perform a semantic search.
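A sketch of a text-based similarity search, assuming the example query shown here:

```python
query = "What are the effects of climate change?"

# Retrieve the documents whose embeddings are closest to the query embedding
results = store.similarity_search(query, k=3)

for doc in results:
    print(doc.page_content)
```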
The search returns the documents most semantically similar to our query text.
Alternatively, you can perform a similarity search using the vector directly.
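A sketch of the same search driven by a raw vector, assuming the store exposes LangChain's standard `similarity_search_by_vector` method:

```python
# Embed the query text ourselves, then search with the resulting vector
query_vector = embeddings.embed_query("What are the effects of climate change?")

results = store.similarity_search_by_vector(query_vector, k=3)

for doc in results:
    print(doc.page_content)
```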
This will output similar results, as it is searching based on the similarity of the embedding.
Notes
- You can specify batch sizes and chunk sizes to control the efficiency of document processing and storage in Upstash Vector.
- Upstash Vector supports namespaces for organizing different types of documents. You can set a namespace while creating the `UpstashVectorStore` instance, as shown in the sketch below.
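A minimal sketch, using a hypothetical namespace name:

```python
# Documents added and queried through this store stay within the "tutorial" namespace
store = UpstashVectorStore(embedding=embeddings, namespace="tutorial")
```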
To learn more about LangChain and its integration with Upstash Vector, visit the LangChain documentation.