To PubSub or not to PubSub, that is the question
Introduction
The PubSub pattern is quite simple:
- Publishers can publish messages to channels
- Subscribers of these channels are able to receive the messages from them
Here is a very brief example with Redis:
- Open a session via 'redis-cli' and enter `SUBSCRIBE public` to subscribe to a channel named 'public'
- In another 'redis-cli' session, enter `PUBLISH public "Hello world"` to publish the message 'Hello world' to the 'public' channel
The first session then receives a three-part reply: the message type `message`, the channel name `public` and the payload `Hello world`.
BTW: It's also possible to subscribe to several channels at once by using a pattern, e.g. `PSUBSCRIBE pub*`.
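To make the delivery rules explicit, here is a minimal in-memory model of what the two sessions above do. It is not Redis and not the author's code: `PubSubModel`, `session1` and `session2` are hypothetical names, each "session" is just a Python list acting as an inbox, and Python's `fnmatchcase` only approximates Redis' glob-style patterns. The reply tuples mimic the shape of Redis' `message` and `pmessage` replies, and `publish` returns how many deliveries were made, roughly like the integer reply of `PUBLISH`.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class PubSubModel:
    """Hypothetical in-memory model of PubSub delivery (not Redis itself)."""

    def __init__(self):
        self.channels = defaultdict(list)  # channel name -> subscriber inboxes
        self.patterns = defaultdict(list)  # glob pattern -> subscriber inboxes

    def subscribe(self, channel, inbox):
        self.channels[channel].append(inbox)

    def psubscribe(self, pattern, inbox):
        self.patterns[pattern].append(inbox)

    def publish(self, channel, message):
        """Deliver to every current subscriber and return the delivery count."""
        delivered = 0
        for inbox in self.channels.get(channel, []):
            inbox.append(("message", channel, message))
            delivered += 1
        for pattern, inboxes in self.patterns.items():
            if fnmatchcase(channel, pattern):  # approximation of Redis globbing
                for inbox in inboxes:
                    inbox.append(("pmessage", pattern, channel, message))
                    delivered += 1
        return delivered


bus = PubSubModel()
session1, session2 = [], []
bus.subscribe("public", session1)                 # like SUBSCRIBE public
bus.psubscribe("pub*", session2)                  # like PSUBSCRIBE pub*
receivers = bus.publish("public", "Hello world")  # like PUBLISH public "Hello world"
unheard = bus.publish("private", "nobody listens")  # no matching subscriber
```

Both sessions receive 'Hello world', while the message published to 'private' reaches nobody and is simply gone.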
Fire and Forget
If we started additional subscribers after our experiment, they wouldn't receive the earlier messages. A client only receives messages while it is actively subscribed, and it can't retrieve missed messages afterwards. In other words:
- Only subscribers that are currently listening receive messages
- A message is received by all active subscribers of a channel
- If a subscriber dies and comes back later, it may have missed messages
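The three rules above can be replayed with a tiny simulation of a single channel. This is a hypothetical model (the class `FireAndForgetChannel` and the subscribers 'a' and 'b' are made up for illustration), not how Redis is implemented: publishing only appends to the inboxes of subscribers that are registered at that moment, and nothing is stored for anyone else.

```python
class FireAndForgetChannel:
    """Hypothetical model of one PubSub channel: no buffering, no replay."""

    def __init__(self):
        self.subscribers = {}  # subscriber name -> that subscriber's inbox

    def subscribe(self, name, inbox):
        self.subscribers[name] = inbox

    def unsubscribe(self, name):
        # A dead or disconnected subscriber simply stops being a target.
        self.subscribers.pop(name, None)

    def publish(self, message):
        # Deliver only to subscribers that are listening right now.
        for inbox in self.subscribers.values():
            inbox.append(message)
        return len(self.subscribers)


a, b = [], []
chan = FireAndForgetChannel()
chan.subscribe("a", a)
chan.publish("m1")        # only 'a' is listening
chan.subscribe("b", b)    # late subscriber: never sees m1
chan.publish("m2")        # both receive m2
chan.unsubscribe("a")     # 'a' dies...
chan.publish("m3")        # ...and misses m3
chan.subscribe("a", a)    # 'a' comes back
chan.publish("m4")        # both receive m4

lost = FireAndForgetChannel().publish("nobody home")  # no subscribers at all
```

Neither subscriber can ask for the messages it missed afterwards; `a` ends up with m1, m2, m4 and `b` with m2, m3, m4.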
Message Queues
Message queues, on the other hand, are intended to scale the workload: a list of messages is processed by a pool of workers. As the pool of workers is usually limited in size, it's important that messages are buffered until a worker is free to process them. Redis (Enterprise) features that help to keep such a buffer safe include:
- Persistency
- High Availability
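The buffering idea can be sketched without Redis at all, assuming Python's `queue.Queue` as the buffer and a fixed number of threads as the worker pool. `run_worker_pool` and its `pool_size` parameter are hypothetical names, and uppercasing a string stands in for real work. Messages that arrive while all workers are busy simply wait in the buffer.

```python
import queue
import threading


def run_worker_pool(messages, pool_size=2):
    """Process messages with a fixed-size pool of worker threads.

    Messages wait in a FIFO buffer until one of the workers is free.
    """
    buffer = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            msg = buffer.get()  # blocks until a message is buffered
            if msg is None:     # sentinel: no more work for this worker
                return
            with results_lock:
                results.append((worker_id, msg.upper()))  # placeholder "work"

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(pool_size)]
    for w in workers:
        w.start()
    for msg in messages:
        buffer.put(msg)
    for _ in workers:  # one sentinel per worker, queued after all messages
        buffer.put(None)
    for w in workers:
        w.join()
    return results


results = run_worker_pool(["a", "b", "c", "d", "e"], pool_size=2)
```

Five messages are handled by only two workers; the buffer is what lets the producer hand over all of them immediately.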
It's worth stating that there are already plenty of libraries and solutions out there for this purpose. Here are two examples:
A very simple queue implementation uses a Redis list. Because list entries are strings, messages with a more complex structure should be encoded, e.g. as JSON.
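Here is a small sketch of the encoding step, assuming a Python `deque` as a stand-in for the Redis list: appending at the tail plays the role of `RPUSH` and popping from the head plays the role of `LPOP`. The helpers `enqueue`/`dequeue` and the message fields (`task`, `image_id`, `sizes`) are hypothetical. With a real client, the same JSON strings would be what gets pushed to and popped from the list.

```python
import json
from collections import deque


def enqueue(queue_list, message):
    """Encode a structured message as a JSON string and append it (like RPUSH)."""
    queue_list.append(json.dumps(message))


def dequeue(queue_list):
    """Remove the oldest entry (like LPOP) and decode it; None if empty."""
    if not queue_list:
        return None
    return json.loads(queue_list.popleft())


jobs = deque()  # stand-in for a Redis list: entries are plain strings
enqueue(jobs, {"task": "resize", "image_id": 42})
enqueue(jobs, {"task": "thumbnail", "image_id": 43, "sizes": [64, 128]})

stored = list(jobs)   # what the list actually holds: JSON strings
first = dequeue(jobs)
second = dequeue(jobs)
third = dequeue(jobs)  # queue is empty now
```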
- Create a queue and inform the scheduler that a new queue is alive:
- Add 2 messages to the queue:
- Schedule the workers: We could of course use a more sophisticated scheduling approach. However, the simplest (and most naive) one is to assign the next worker of the pool to the next message. To dequeue a message, we can just use `LPOP`:
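The three steps can be walked through end to end in plain Python. Everything here is a hypothetical stand-in: `lists` plays the role of the Redis keyspace, `scheduler_inbox` plays the role of the scheduler's PubSub subscription, and the key `queue:jobs` and worker names are invented for the example. The scheduler reacts to a notification by popping messages from the head of the list (the `LPOP` step) and assigning each one to the next worker in a round-robin cycle. Note that in Redis a list only exists once something has been pushed to it, so "creating" the queue really just means announcing its key.

```python
from collections import defaultdict, deque
from itertools import cycle

lists = defaultdict(deque)  # stand-in for Redis lists: key -> entries
scheduler_inbox = deque()   # stand-in for the scheduler's PubSub subscription


def publish_to_scheduler(event):
    # Fire and forget: it only lands here because the scheduler is listening.
    scheduler_inbox.append(event)


# 1. Announce a new queue to the scheduler (hypothetical key name).
queue_key = "queue:jobs"
publish_to_scheduler(("new_queue", queue_key))

# 2. Add two messages; appending at the tail corresponds to RPUSH.
lists[queue_key].append("message 1")
lists[queue_key].append("message 2")

# 3. Scheduler: for each notification, drain the announced queue and give
#    each message to the next worker of the pool (plain round-robin).
workers = cycle(["worker-1", "worker-2", "worker-3"])
assignments = []
while scheduler_inbox:
    _event, key = scheduler_inbox.popleft()
    while lists[key]:
        message = lists[key].popleft()  # corresponds to LPOP
        assignments.append((next(workers), message))
```

Round-robin ignores how busy each worker is, which is exactly why the text calls it the simplest option.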
BTW: If our queue is initially empty, the `BLPOP` command lets us wait (up to a timeout) until something arrives.
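The blocking behaviour can be modelled with a condition variable. This is a hypothetical sketch (`BlockingList`, `rpush` and `blpop` are made-up names, not a Redis client API): the pop waits until the list is non-empty or the timeout expires. Unlike the real command, it returns only the value rather than the key and value, and it uses `None` (instead of `0`) to mean "wait forever".

```python
import threading
from collections import deque


class BlockingList:
    """Hypothetical model of a list with an RPUSH and a BLPOP-like pop."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def rpush(self, value):
        with self._cond:
            self._items.append(value)
            self._cond.notify()  # wake one waiting consumer

    def blpop(self, timeout=None):
        """Pop the head, waiting up to `timeout` seconds; None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._items, timeout=timeout):
                return None
            return self._items.popleft()


q = BlockingList()
empty_result = q.blpop(timeout=0.1)  # nothing arrives: gives up after ~0.1 s

producer = threading.Timer(0.05, q.rpush, args=("late message",))
producer.start()
late_result = q.blpop(timeout=2)     # returns as soon as the push happens
producer.join()
```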
Using PubSub is actually optional for our message queue example. The scheduler could also assign workers without being notified, because it can access the queues and messages at any time. However, I found it a bit more dynamic to combine our queue example with PubSub:
- The scheduler gets notified when new work needs to be assigned to the workers
- As these notifications are fire and forget, it would also be possible for the scheduler to check from time to time whether there is something to do
- If the scheduler dies, another instance can be started that reads the database to double-check which work the workers have already done and which work still needs to be done. An interrupted job can be restarted based on this state information.
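The last two points, polling as a fallback for lost notifications and recovering after a crash, can be sketched together. This is one possible design under stated assumptions, not the author's implementation: it borrows the "reliable queue" idea described in the Redis documentation for `LMOVE` (formerly `RPOPLPUSH`), where a job is moved to a separate processing list when it is claimed, so that a restarted scheduler can see what was started but never finished. `QueueState`, `drain`, `wait_then_check` and `recover` are hypothetical names, and a `queue.Queue` stands in for the scheduler's PubSub subscription.

```python
import queue
from collections import deque


class QueueState:
    """Hypothetical stand-in for the queue state kept in the database."""

    def __init__(self, jobs):
        self.pending = deque(jobs)  # like the Redis list of queued jobs
        self.processing = deque()   # claimed by a worker, not finished yet
        self.done = []

    def claim(self):
        """Move the oldest pending job to `processing` (one LMOVE in Redis)."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.processing.append(job)
        return job

    def complete(self, job):
        self.processing.remove(job)
        self.done.append(job)


def drain(state):
    """Claim every pending job so it can be handed to a worker."""
    claimed = []
    while (job := state.claim()) is not None:
        claimed.append(job)
    return claimed


def wait_then_check(inbox, state, poll_interval):
    """Wake up on a notification or after `poll_interval`; check either way."""
    try:
        inbox.get(timeout=poll_interval)  # woken early by a notification...
    except queue.Empty:
        pass                              # ...or by the timer (polling)
    return drain(state)


def recover(state):
    """Run by a fresh scheduler: requeue unfinished jobs in original order."""
    while state.processing:
        state.pending.appendleft(state.processing.pop())


state = QueueState(["job-1", "job-2", "job-3"])
inbox = queue.Queue()

inbox.put("new work")                                      # notification
first = wait_then_check(inbox, state, poll_interval=0.05)  # claims all jobs
state.complete("job-1")    # only job-1 finishes, then the scheduler dies
recover(state)             # a new instance inspects the stored state
second = wait_then_check(inbox, state, poll_interval=0.05)  # no notification
```

Because the second pass finds work even though no notification arrived, a lost fire-and-forget message only delays processing instead of losing it.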
Summary
Redis' PubSub is 'fire and forget'. It's intended to deliver messages from many publishers to many subscribers, which makes it a useful feature for notification purposes. However, it's important to understand the difference between a messaging use case and a message processing use case.
We used PubSub only to inform a single scheduler that some work needs to be done. The scheduler then handed the work over to a pool of worker threads, which processed the actual queue. The entire state of the queue was stored in our database as a list, because PubSub alone is not intended for message queuing use cases. In fact, using PubSub in our queuing example was optional.