Thursday, 27 July 2017

Indexing with Redis

If you follow my news on Twitter then you might have realized that I just started to work more with Redis.  Redis (=Remote Dictionary Server) is known as a Data Structure Store. This means that we can not just deal with Key-Value Pairs (called Strings in Redis) but in addition with data structures as Hashes (Hash-Maps), Lists, Sets or Sorted Sets. Further details about data structures can be found here:


Indexing in Key-Value Stores

With a pure Key-Value Store, you would typically maintain your index structures manually by applying some KV-Store patterns. Here some examples:

  • Direct access via the primary key: The key itself is semantically meaningful and so you can access a value directly by knowing how the key is structured (by using key patterns). An example would be to access an user profile by knowing the user's id. The key looks like 'user::<uid>'.
  • Exact match by a secondary key: The KV-Store itself can be seen as a huge Hash-Map, which means that you can use lookup items in order to reference other ones. This gives you a kind of Hash Index. An example would be to find a user by his email address. The lookup item has the key 'email::<email_addr>', whereby the value is the key of the user. In order to fetch the user with a specific email address you just need to do a Get operation on the key with the email prefix and then another one on the key with the user prefix. 
  • Range by a secondary key: This is where it is getting a bit more complicated with pure KV-Stores. Most of them allow you to retrieve a list of all keys, but doing a full 'key space scan' is not efficient (complexity of O(n), n=number of keys). You can indeed build your own tree structure by storing lists as values and by referencing between them, but maintaining these search trees on the application side is really not what you usually want to do.


The Redis Way

So how is Redis addressing these examples? We are leveraging the power of data structures as Hashes and Sorted Sets.

Direct Access via the Primary Key

A Get operation already has a complexity of O(1). This is the same for Redis.

Exact Match by a Secondary Key

Hashes (as the name already indicates) can be directly used to build a hash index in order to support exact match 'queries'. The complexity of accessing an entry in a Redis Hash is indeed O(1). Here an example:

In addition Redis Hashes are supporting operations as HSCAN. This provides you a cursor based approach to scan hashes. Further information can be found here:


Here an example:


Range By a Secondary Key

Sorted Sets can be used to support range 'queries'.  The way how this works is that we use the value for which we are searching  as the score (order number). To scan such a Sorted Set has then a complexity of O(log(n)+m) whereby n is the number of elements in the set and m is the result set size.

Here an example:

If you add 2 elements with the same score then they are sorted lexicographically. This is interesting for non-numeric values. The command ZRANGEBYLEX allows you to perform range 'queries' by taking the lexicographic order into account.


Modules

Redis supports now Modules (since v4.0). Modules are allowing you to extend Redis' functionality. One module which perfectly matches the topic of this blog post is RediSearch. RediSearch is basically providing Full Text Indexing and Searching capabilities to Redis. It uses an Inverted Index behind the scenes. Further details about RediSearch can be found here:

Here a very basic example from the RediSearch documentation:


As usual, I hope that you found this article useful and informative. Feedback is very welcome!


2 comments:

  1. Great overview of indexing (and *SCAN which is so mysterious to some). One question though - why two colons in your keys? That's something I've not seen before; just idiomatic?

    ReplyDelete
  2. Hi Kyle, '::' is a name space delimiter in some programming languages as e.g. Perl. I typically use '::' or '_' as the delimiter within the key. It's most unlikely that '::' is occurring in a component of the key and so I prefer to use this one. Regards, David

    ReplyDelete