How to use Redis as a Vector Database for Recommendations
This blog post is the result of some preparation work for a recent meetup, where I introduced a bunch of recommendation engine algorithms. The idea of using vector similarity search for recommendations is quite simple:
- The interests of a user are expressed as a vector. Each component of the vector is associated with one category of interest.
- If we know the interests of a specific user, we can search for the K nearest neighbours (KNN) to find other users that share similar interests.
- We can then inspect the behavior of these users (e.g., the purchase history) to recommend our user specific products.
Let's assume that a user is only interested in a specific category if the interest value is larger than the threshold of 0.4, which means that our user is interested in 'books' and 'comics' but not in 'computers'.
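To make the threshold rule concrete, here is a minimal sketch in plain Python. The category order and the interest values are assumptions chosen for illustration; only the 0.4 threshold and the category names come from the text above.

```python
# Hypothetical example: the category order and the values are assumptions.
CATEGORIES = ["books", "comics", "computers"]
THRESHOLD = 0.4


def interests(vector, categories=CATEGORIES, threshold=THRESHOLD):
    """Return the categories whose interest value is strictly above the threshold."""
    return [c for c, v in zip(categories, vector) if v > threshold]


user_vector = [0.9, 0.7, 0.2]
print(interests(user_vector))  # ['books', 'comics']
```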
How to use Redis
You can use Redis Stack's query and search capabilities to:
- Index vectors
- Find similar vectors
__init__
The Redis connection is established within the constructor of the VectorDB class:
create_index
As the name indicates, this method creates a search index. The default schema has a descr text field, a labels tag field, a numeric field called time, and a vector field named vec. The relevant code within this method is equivalent to the following:
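As a rough sketch of such an index definition, the snippet below builds the arguments of a raw FT.CREATE command for the schema described above. The index name idx, the key prefix vec:, the FLAT index type, the FLOAT32 element type, the dimension of 3, and the L2 distance metric are all assumptions made for this example, not values taken from the VectorDB class.

```python
def create_index_args(index_name="idx", prefix="vec:", dim=3):
    """Build the argument list for an FT.CREATE command (hypothetical defaults)."""
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,   # index every hash whose key starts with the prefix
        "SCHEMA",
        "descr", "TEXT",
        "labels", "TAG",
        "time", "NUMERIC",
        # "6" is the number of attribute arguments that follow the algorithm name
        "vec", "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", "L2",
    ]


args = create_index_args()
print(" ".join(args))
```

With the redis-py client, a list like this could be sent to the server via `client.execute_command(*args)`, assuming a Redis Stack instance is reachable.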
This method adds a vector with metadata to the database. I use a Redis hash in this case, but you can also store vectors within JSON with Redis Stack.
The data dictionary contains the fields labels, descr, time, and vec. The vector is stored as binary within the vec field. The library numpy is used to convert a more human-readable float vector (a Python list) to its byte string representation:
Here is an example of such a data dictionary:
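A data dictionary of this shape might look like the sketch below. All field values are made up for illustration. To keep the snippet dependency-free, it packs the floats with the standard library's struct module instead of numpy; on a little-endian machine this should produce the same bytes as `numpy.array(vector, dtype=numpy.float32).tobytes()`.

```python
import struct
import time


def to_bytes(vector):
    """Pack a list of floats as little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_bytes(blob):
    """Unpack a byte string of 32-bit floats back into a Python list."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Hypothetical item: description, labels, and vector values are assumptions.
data = {
    "descr": "A user who likes books and comics",
    "labels": "books,comics",          # tag values separated by commas
    "time": int(time.time()),
    "vec": to_bytes([0.9, 0.7, 0.2]),
}

print(len(data["vec"]))  # 12 bytes: 3 components x 4 bytes each
```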
The vector_search method performs the vector similarity search. My implementation only returns the id and vector score. The query string has a few arguments:
- Metadata query: The variable meta_data_query holds the query string that pre-filters the items based on their metadata, such as the description (descr) or the labels. The => operator separates this pre-filter (left-hand side) from the vector search (right-hand side), so the metadata query is executed before the vector similarity search is performed.
- Number of neighbours: The variable num_neighbours holds K, the number of nearest neighbours to return.
- Vector field: This is the vector field that is used for the search. Redis can store multiple vector fields within an item (hash or JSON).
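The three arguments above can be combined into a query string along the following lines. This is a sketch rather than the actual code of the vector_search method: the parameter name vec_param and the score alias score are assumptions introduced for this example.

```python
def build_knn_query(meta_data_query="*", num_neighbours=2, vector_field="vec",
                    vector_param="vec_param", score_field="score"):
    """Build a KNN query string: <pre-filter>=>[KNN <K> @<field> $<param> AS <alias>]."""
    # A bare "*" matches everything; any other filter is wrapped in parentheses.
    prefilter = "*" if meta_data_query == "*" else f"({meta_data_query})"
    return (f"{prefilter}=>"
            f"[KNN {num_neighbours} @{vector_field} ${vector_param} AS {score_field}]")


print(build_knn_query("@labels:{books}"))
# (@labels:{books})=>[KNN 2 @vec $vec_param AS score]
```

The search vector itself is not part of the string. It is passed separately as a query parameter (here under the name vec_param) in its byte string representation, and the query needs to be executed with query dialect 2 or higher.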
You can then query the database the following way:
For further details, please look at the vector similarity search reference documentation.
Putting it all together
As explained, I decided to add a thin layer of abstraction by implementing this VectorDB class. The following example shows how to use it:
- Create an index
- Add some vectors with metadata
- Perform a simple query for users that are labeled with specific interests
- Execute a vector similarity search for the two nearest neighbours
Here is the source code of the demo application:
The output of this program is:
It's important to understand that a lower score means that a vector is closer to the search vector. In my case, the result is ordered (ascending) by the vector field's score.
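The relationship between score and closeness can be illustrated without Redis. The brute-force sketch below assumes that the score is a Euclidean distance and uses made-up user keys and vectors; it ranks all vectors by their distance to the search vector and keeps the K smallest.

```python
import math


def knn(search_vector, vectors, k=2):
    """Return the k (key, score) pairs with the smallest Euclidean distance."""
    scored = [(math.dist(search_vector, v), key) for key, v in vectors.items()]
    scored.sort()  # ascending: the smallest distance (closest vector) comes first
    return [(key, score) for score, key in scored[:k]]


# Hypothetical users and interest vectors.
users = {
    "user:1": [0.9, 0.7, 0.2],
    "user:2": [0.1, 0.2, 0.9],
    "user:3": [0.8, 0.6, 0.3],
}

result = knn([0.9, 0.7, 0.2], users, k=2)
print([key for key, _ in result])  # ['user:1', 'user:3']
```

A vector that is identical to the search vector gets a score of 0, which is why it appears first in the ascending result.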
I hope that you enjoyed this blog post. If you didn't find the time to read it entirely, then I also recorded a video walk-through.