Friday, 25 October 2019

So what exactly is an Event Loop?


Most of you are knowing that I am working a lot with Redis. And some of you might also know that Redis OSS 'standalone' is mainly single-threaded. The reason why it can achieve that high throughputs on a single instance is that it uses an event loop. But what the hell is an event loop and how does it work?

First of all, let me tell you the story behind this article. It all started last weekend. For some reason, I found the time to read a Kotlin book. Not sure why I did, guess I just had the feeling that I am too long disconnected from actual development tasks and wanted to explore one of the comparable new programming languages. The book was great and I had the impression that Kotlin is actually quite nice. Then I went back to my main task (helping to enable the Technical Field at Redis Labs) and had to work on a slide about the Redis event loop. A look at the source code (and the following article, raised a question for me: Why is there a TimeEvent? The answer might finally sound simple, but just the fact that I dug a bit deeper into the event loop topic caused the idea to just implement a very simple event loop in Kotlin. So I did a bit more research and found also very good explanations of how the Node.js event loop works.

So this article has the intention to share my learnings by using this light-weighted event loop that I implemented as an academic example

What is kEventLib?

So kEventLib is a light-weighted event loop implementation in Kotlin. As said, this project has more academical character. The idea is to illustrate how an event loop works by giving me the chance to play a bit around with Kotlin.

What's an Event?

We are defining a generic event as something which can happen at a specific time, has a type and a payload:

More specific events were derived from Event:
  • SimpleEvent: An event without a specific time. It doesn't matter exactly when such an event should be executed
  • TimedEvent: An event which allows passing a delay, which means that the event should not be executed before this time is over
Timed events are having a lower priority than non-timed events. So we will execute non-timed events first, but we are considering that timed events are deferred to be executed in the future.
An excellent example of timed events would be 'disk write' events in Redis or async calls in Node.js.

Why do we need an Event Queue and Event Buffer?

I decided to implement two different structures, dependent on if it is about a non-timed event or a timed event.
  • EventQueue: We are using the event queue to process the events in the order of their appearance. Node.js is using a stack instead of a queue because calls can be nested, and so a call-stack makes more sense.
  • EventBuffer: This structure is used to buffer timed events. My naive event-loop works in a way that timed events are only processed after all non-timed events are processed. You could indeed think of more sophisticated scheduling approaches.

What does the Event Loop?

The event loop is a ... loop which runs a function call in a single thread:

Processing an event means to check first if the event queue is empty. If not, then we are processing one of the queued events. If it is empty, then we start processing the buffered events. All buffered events that are in the past will be processed, whereby the event with the minimum timestamp (the one which happened earliest) will be processed first.
It can happen that no event can be processed. Then an EmptyEvent is returned. It can also happen that an error occurs when submitting an event to the loop. This will return an ErrorEvent. Such an error is caused by the fact that either the event loop's queue or buffer is fully utilized. The 'submitter' would then need to implement a back-pressure mechanism.

Show me an Example!

Here some example code:

 The execution output (handling an event just prints some details about it) looks like:

Events that are printed with the prefix '-1' are non-timed events. Otherwise, the prefix is the timestamp (to which the event was deferred to).

You can see that the non-timed events were executed first. Then we didn't submit non-timed for a while (due to the sleep after every submission of a timed event) which is the reason why some of the timed events are executed. Then we are executing the non-timed events again and finally the deferred timed ones.

 I hope you enjoyed reading this post. The full source code can be found here: .

Friday, 10 May 2019

A simple but special Redis Web Client

It was a while ago when I wrote my last blog post. So let me take the chance to write something about a very simple piece of software which started as a fun project of mine. Let's do it a bit different this time. I will show you the results first and then I will explain what's special about this application:

This looks very basic, right? What's special about it? So here is the story behind it:

It all started a few weeks ago when I decide to buy an iPad Pro (11 inches). The motivation was indeed not to use it as a development machine. My current role requires to draw some diagrams and to explain stuff a bit more vizually. So the iPad Pro seemed to be a nice device for such a purpose.
Being a techie, I wondered a bit which kind of development can be done on it and I started to install some tools for experimenting with them:
  • Working Copy: A Git client
  • Pythonista: A Python IDE
  • StaSh: A shell for Pythonista which allows you to use packages via 'pip'
  • Blink: A CLI with an SSH client
  • VNC Viewer: Access the screen of your computer (which is running a VNC server)

Especially Pythonista is a great tool. Here a screen shot of it:

It all went a bit "crazy" when I went to my barber to get my beard cut. Now my Turkish barber is a very good one which means that he is very busy (seems this is a pattern across all industries - let it be IT-specialists or barbers). So I had to wait for about 2 hours to get a shave. As I already expected some waiting time, I took my iPad Pro with me (for i.e. reading a book). Holding it in my hands, I was then thinking why not trying to drive this 'How to develop on this device?' idea forward. But which kind of application should I develop? So one of the thoughts was to be able to demonstrate Redis ( itself on the iPad. Wouldn't it be cool to
  • Just connect a small device with a Pen to a big screen (i.e. projector via USB-C, screen sharing)
  • Explain some concepts by just drawing like on a whiteboard
  • Demonstrate some Redis basics by being connected to a Redis Cloud instance (You can get a 30MB database for free here:

Given the fact that I invested some time to write this article, my opinion is clearly: "Yes, it would be cool!". However, if you are still wondering "Why the hell should I want to develop on an iPad Pro?" then I guess the answer is: "Because you can!" ;-) So this is more a fun project, whereby it might be a good basic example for:

So have fun!

Ah before I forget it: If you are searching for a very basic Redis Web Client or if you are interested in to take a look at the source code, then the source code can be found here for now.

Friday, 8 February 2019

Building a Recommendation Engine with Redis

When I was asked which topic I would like to present at this year's OOP conference, I was out of the box thinking about 'Something with Machine Learning' involved. It was years ago at the university when I had a secondary focus on 'Artificial Intelligence and Neural Networks' and I think that's fair to say that the topic was not as 'hot' as it is today. The algorithms were the same as today but the frameworks were not that commodity and calculations happened either on paper, with MatLab or with some very specialized software for neural network training. However, the actual discipline stayed fascinating and even if I would not call myself a Data Scientist (I sticked more with my primary focus which was Database Implementation Techniques - so I am more a database guy :-) ) I am really amazed of the adoption and number of arising frameworks in the field of Machine Learning and Artifical Intelligence.

Machine Learning or Artificial Intelligence is quite a wide field and so I concluded to go with something more specific which has touch points to Machine Learning and AI. The topic with which I finally went is:
  •  Redis Modules for Recommender Systems
Most of you might know Redis already but it's maybe worth to mention what Redis actually is:
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence.
Redis is very popular. Here a ranking of the most popular database systems (just to highlight how popular Redis is):
Redis is modular, which means that it can be extended by modules (like plug-ins). A list of Redis modules can be found here.

As I am already seeing that this article is becoming a bit longer than a blog article should be, here an outlook regarding the topics. I hope this will motivate the one or the other to continue reading ... .
  1. Data Model
  2. Preparations
  3. Content Based Filtering (using Sets)
  4. Collaborative Filtering (using Sets)
  5. Ratings based Collaborative Filtering (using Sorted Sets)
  6. Social Collaborative Filtering (using RedisGraph)
  7. Content Relevance via Full Text Search (using RediSearch)
  8. Probabilistic Data Structures (using the built-in HyperLogLog structure + ReBloom)
  9. Machine Learning for Classifications and Predictions (using Redis-ML and other AI modules)
Data Model

Let's discuss which problem needs to be solved. Therefore the following simplified data model might be interesting:
  • User: A real-life person which is interacting with a system by showing some interests in items. Users can be classified.
  • Item: The thing users can be interested in. Items can be classified.
  • Interest Classification: Classifications can happen based on item properties, the user attributes, the relationship of users to existing items or the relationship of users to other users and their items. A classification can be for instance expressed as a simple 'Class membership' or as a number which is telling how likely something belongs to which class.
  • Recommendation: The actual recommendation, so which items could be interesting for a user is derived from some classifications. 
 Our example code will basicall use the following terms:
  • Users are just named users
  • We are looking at specific items, which means Comic books
  • Several algorithms and approaches are using different kinds of classifications


If you want to follow the code samples of this blog post then you can also find a Jupyter notbook here.
I basically prepared the following Redis instances (one per module):
  • Redis (r)
  • Redis + Machine Learning (r_m)
  • Redis (Bloom Filters) + HyperLogLog (r_b)
  • Redis + Graph (r_g)
  • Redis + Full Text Search (r_s)

Python Prep Script

Content Based Filtering

The idea is to look at what a specific user is interested in and then to recommend things those are similar (i.e. having the same class) as other things the user is liking.

Content based filtering can be done based on 'real set' operations. Redis is coming with a 'Set' data structure which is allowing to perform membership checks, scans, intersections, unions and so on.

Let's look at the following example: 

Python Script

The output is:
David could be also interested in: { 'Fantastic Four',  'Wonder Woman',  'Batman',  'Dragon Age', 'Avatar', 'Valerian', 'Spiderman'}

Collarborative Filtering

The underlying idea is that if person A likes the same things as person B, then person B might also like the other items those are liked by person A. So it's mandatory to have details about many other users collected for a proper classification:

We are using again Redis' Set data structure and especially the union and diff operations.

Let's add some demo data:

Python Demo Data Script

Now let's look at the following example:

Python Script

We can now see which other users B could be interested in the same items as A and then derive a recommendation for the given user A based on the other interests of users B:
Users interested in the same items as David: {'david', 'pieter'}
David is interested in: {'Spiderman', 'Batman'}
David could be also interested in: {'Wonder Woman'}

Ratings based Collaborative Filtering

We are talking about collaborative filtering again. So the approach is to derive a recommendation from similarities to other users. In addition we are now interested in 'How much does a user like an item?'. This allows us i.e. to find out if two or more users are liking similar things. Items those are liked by user B but not yet liked by user A could be also interesting for user A.
The Redis structure which is used is a 'Sorted Set'. An element in a sorted has a score. We will use this score as our rating value (1-5, i.e. stars).  A very cool feature of sorted sets is that set operations are allowing to aggregate scores. By default, the resulting score of an element is the sum of the scores across the considered sorted sets. You can combine aggregations with weights. Weights are multiplicators for scores. The weight (1,-1) means to subtract the second score value from the first score value.
We will first find users B those rated the same items as a specific user A. The idea is then to leverage this aggregation feature in order to calculate the distance vector of ratings between user A and the previously identified users B. We will then use RMS in order to calculate the average distance as a scalar. The R(oot) M(ean) S(quare) value of a set of values is the square root of the arithmetic mean of the squares of the values. Only users with an average rating distance less or equal to 1 (which means that the users rating was very similar) will be considered. Finally we will recommend items of users B to user A, whereby we are only considering items with a score of at least 4.

I added some helper functions to the following script. I am bascially not pasting them here again by hoping that their functionality is self-explaining. The full source code can be found here: .

Python Demo Data Script

Let's first find users B those are liking the same things as A:

Python Script

The output is:
The following users rated David's items: ['pieter', 'david']
Now let's calculate the similarities by then proposing the highest rated items of users B:

Python Script

The result is that Pieter has a matching rating distance and so 'Aqua Man' is a highly recommended Comic to David (whatever tells this about Pieter ;-) ):
The rating distance to pieter is [('batman', 1.0), ('superman', -1.0)]
The average distance (RMS) to pieter is 1.0
The following is highly recommended: [('aqua_man', 5.0)]

Social Collaborative Filtering

The previous examples used Sets and Sorted Sets. We are now exploring how to use Graphs. Our example is taking a social ('friend of') aspect into account. A mathematical Graph is described as a set of vertices V and a set of Edges E, whereby E is a subset of VxV. Graph database systems are extending this mathematical definition to a so called 'Property Graph Model' which means that vertices and edges can have properties (KV pairs) associated.

Our idea is to find all comics of friends B of a given user A those are interested in a specific comic category (Super Heros). Comic books that are liked more often by the friends of user A are more relevant and should be recommended.

I am again skipping the helper functions in order to avoid to blow this article even more up. If you are interested, then the full source code can be found here: The function names are hopefully self-explaining.

Python Demo Data Script

Here the actual Graph Query:

Python Script

As 'Wonder Woman' is liked by 2 of David's friends it is more relevant than the other comics:
David has the following friends: [[['name'], ['Pieter'], ['Vassilis'], ['Katrin']]]
David likes [[['name'], ['Spiderman'], ['Batman']]]
Comic 'Wonder Woman' with relevance 2.000000
Comic 'Batman' with relevance 1.000000
Comic 'Superman' with relevance 1.000000

Content Relevance via Full Text Search

RediSearch is a search engine module for Redis. It comes with multiple built-in scoring functions. We will look at T(erm)F(requency)I(inverse)D(ocument)F(requency). It takes the following aspects into account:
  1. Term Frequency: How often does a specific term appear?
  2. Inverse Document Frequency: An inverse document frequency factor is incorporated which diminishes the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely (i.e. the relevance of the word 'the'
Furhter details about scoring can be found here:
We are trying to identify how likely something belongs to a specific class/category by performing a text search for terms those are associated to this category.

Helper functions are again skipped in this article but can be found in the source code repo.

Python Script

Spiderman is more likely a super hero than Batman:
[2, 'spiderman', 0.10000000000000001,['name', 'Spiderman'], 'batman', 0.035714285714285712,['name', 'Batman']]

Probabilistic Data Structures

Probabilistic data structures are characterized in the follwoing way: They ...
  • use hash functions for randamization purposes
  • return an approximated result
  • the error is under a specific threshold
  • are much more space efficient than deterministic approaches
  • provide a constant query time
You would use them because sometimes …
  • speed is more important than correctness
  • compactness is more important than correctness
  • you only need certain data guarantees
It's possible to combine them with deterministic approaches (i.e. HLL + det. counter for discovering counter manipulations).

We will take a look at the following two structures:
  • HyperLogLog: Cardinality estimation of a set, i.e. unique visits
  • Bloom Filter: Check if an item is contained in a set whereby false-positves are possible
Our example will not use 'unique visits' but we are more interested in how many unique users 'touched' a specific comic. Just imagine a real-life comic book store. A bunch of nerds (including myself ...) are hanging around and they are browsing for comics. Interesting comics will be removed from the shelf in order to take a closer look. This is what I mean with 'touched'. A comic which is more often touched can be considered as more interesting. We can count these unique touches quite space efficently by using a HyperLogLog:

Python Script

The output is:
HLL initial size: 31
Approx. count: 4
Please wait ...
Final HLL size: 10590 bytes
Approx. count: 99475

The bloom filter can be used to check if a user is interested a specific comic category without storing the users per category in a set:

Python Script

We are also printing the sizes out in order to demonstrate how space efficient Bloom filters are. The output of this script is:
BF size: 115 bytes
BF size: 76 bytes
Is Katrin interested in Fantasy?: 1
Is Katrin interested in Super Heros?: 0
Is David interested in Super Heros?: 1

Machine Learning for Classifications and Predictions

We are closing this blog article by circling back to the introduction of it. Classifications were so far often seen as a given (i.e. a comic book belonging to the 'Fantasy' category).  Others could be derived by taking the existing user interests into account. We also mentioned in the section 'Data Model' that classifications might be derived from user attributes or item properties. Now, Machine Learning is providing us ways to describe a more complex models by taking such attributes (=features) into account. Such features can be represented by structured data (i.e. the comic name, ...) or unstructured data (i.e. the images within a comic book, the used colors within comic book). Feature vectors could be for instance derived from the bitmap of a scanned cover of a comic book. At the end, you can consider every ML approach as a way to approximate a function F(x) -> y, whereby x is a feature vector and y is the output vector. The idea is to create/train a model based on the known values y for given vectors x. These given vectors are called the training features. The idea is to derive a model which is able to approximate/predict a 'good' output vector y for an unknown input vector x.

Here 2 examples for such models:
  • Decision Tree ensembles (random forests). The idea is to conduct a forest of decision trees at training time. RedisML can be used for the Model Serving by leveraging these decision trees for i.e. classification purposes. The class which appears most often will be the winner.
  • Neural networks: Train the weighted connections between neurons by using a learning algorithm (i.e. Backpropagation).
The following example leverages 2 very small decision trees:
  • Users with an age <=20 are liking Manga comics
  • Users with more than 1000 are not liking Manga comics
Python Script

I am feeling that this article could tell much more about 'Neural Networks and Artificial Intelligence', but I am also hoping that it's understandable that this is a very wide field and so I am thinking that it is worth to write a dedicated article about Redis for Machine Learning and Artificial Intelligence at a later point in time.

Finally, here some attitional Redis modules those didn't make it into one of the earlier sections:

  • Neural Redis: Is a Redis module that implements feed forward neural networks as a native data type for Redis. The project goal is to provide Redis users with an extremely simple to use machine learning experience.
  • Countminsketch: An apporximate frequency counter
  • Topk: An almost deterministic top k elements counte
  • Redis-tdigest: T-digest data structure wich can be used for accurate online accumulation of rank-based statistics such as quantiles and cumulative distribution at a point.
Thanks for reading! Feedback regarding this article is very welcome!