Posts

Building a Recommendation Engine with Redis

When I was asked which topic I would like to present at this year's OOP conference, my spontaneous answer was 'Something with Machine Learning'. Years ago at university, I had a secondary focus on 'Artificial Intelligence and Neural Networks', and I think it's fair to say that the topic was not as 'hot' back then as it is today. The algorithms were the same as today's, but the frameworks were not yet a commodity, and calculations happened either on paper, in MATLAB or with some very specialized software for neural network training. However, the discipline itself stayed fascinating, and even if I would not call myself a Data Scientist (I stuck more with my primary focus, which was Database Implementation Techniques, so I am more of a database guy :-) ), I am really amazed by the adoption and the number of emerging frameworks in the field of Machine Learning and Artificial Intelligence. Machine Learning or Artificial Intelligence is quite a ...

Asynchronous Operation Execution with Netty on Redis

Netty got my attention a while back and I just wanted to play around with it a bit. Given that I have already fallen in love with Redis, what would be more fun than implementing a low-level client for Redis based on Netty? Let's begin by answering the question "What the hell is Netty?". Netty is an asynchronous, event-driven network application framework for Java. It helps you develop high-performance protocol servers and clients. We are obviously more interested in the client part here, meaning that this article focuses on how to interact with a Redis server. Netty already comes with RESP (REdis Serialization Protocol) support. The package 'io.netty.handler.codec.redis' contains several Redis message types, for example: RedisMessage (a general Redis message), ArrayRedisMessage (an implementation of the RESP Array message), SimpleStringRedisMessage (an implementation of the RESP Simple String message), ... So all we need to do is to: Bootstrap a channel: A channel is...
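Netty's codec takes care of the wire format, but it helps to see what RESP actually looks like on the wire. The following is a minimal, stdlib-only Python sketch. It is neither Netty nor its API. A command is sent as a RESP Array of Bulk Strings, and single-line replies start with '+' (Simple String), '-' (Error) or ':' (Integer). The names `encode_command` and `decode_simple_reply` are hypothetical helpers for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP Array of Bulk Strings."""
    parts = [f"*{len(args)}\r\n".encode()]            # array header: element count
    for arg in args:
        data = arg.encode("utf-8")
        # bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def decode_simple_reply(raw: bytes):
    """Decode a single-line RESP reply: Simple String, Error or Integer."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("incomplete RESP frame")
    prefix, body = raw[:1], raw[1:-2].decode("utf-8")
    if prefix == b"+":
        return body                  # Simple String, e.g. +OK
    if prefix == b"-":
        raise RuntimeError(body)     # Error reply from the server
    if prefix == b":":
        return int(body)             # Integer reply
    raise ValueError(f"unsupported RESP type: {prefix!r}")


if __name__ == "__main__":
    print(encode_command("SET", "greeting", "hello"))
    print(decode_simple_reply(b"+OK\r\n"))
```

Calling `encode_command("SET", "greeting", "hello")` yields `*3\r\n$3\r\nSET\r\n$8\r\ngreeting\r\n$5\r\nhello\r\n`. This should correspond to what an ArrayRedisMessage of bulk strings looks like once Netty has encoded it.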

Data Encryption at Rest

Data security and protection are currently hot topics. It seems that we have reached the point where the pendulum is swinging back again. After years of voluntary openness, sharing personal information freely with social networks, people are becoming more and more concerned about how their personal data is used to profile or influence them. Social network vendors are currently getting bad press, but maybe we should ask ourselves the fair question "Didn't we know all along that their services are not free and that we are paying for them with our data?". Maybe not strictly related to the prominent (so-called) 'data scandals', but at least following the movement of the pendulum, is the new European GDPR regulation around data protection. Even if I think that it tends to 'overshoot the mark' (as we would say in German) and sometimes leaves data controllers and processors in the dark (unexpected rhyme ...), it is a good reason for me to address some security topics...

To PubSub or not to PubSub, that is the question

Introduction. The PubSub pattern is quite simple: publishers can publish messages to channels, and subscribers of these channels are able to receive the messages from them. A publisher has no knowledge of what any of the subscribers do, so they act independently. The only thing that glues them together is a message within a channel. Here is a very brief example with Redis: Open a session via 'redis-cli' and enter the command `SUBSCRIBE public` in order to subscribe to a channel with the name 'public'. In another 'redis-cli' session, enter `PUBLISH public "Hello world"` in order to publish the message 'Hello world' to the 'public' channel. The first session then receives a message consisting of the type "message", the channel "public" and the payload "Hello world". BTW: It's also possible to subscribe to a bunch of channels by using patterns, e.g. `PSUBSCRIBE pub*`. Fire and Forget. If we start additional subscribers after our experiment, they won't receive the previous message...
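To make the fire-and-forget behavior concrete, here is a small in-memory Python model of the pattern. It is not Redis itself. `publish` delivers only to the subscribers that exist at that moment, and nothing is stored for later. `TinyPubSub` is a hypothetical name. Pattern matching uses Python's `fnmatch`, which only approximates Redis' glob-style patterns.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class TinyPubSub:
    """In-memory model of Redis-style Pub/Sub: no message history is kept."""

    def __init__(self):
        self.channels = defaultdict(list)   # channel name -> subscriber inboxes
        self.patterns = []                  # (glob pattern, inbox) pairs

    def subscribe(self, channel):
        inbox = []
        self.channels[channel].append(inbox)
        return inbox

    def psubscribe(self, pattern):
        inbox = []
        self.patterns.append((pattern, inbox))
        return inbox

    def publish(self, channel, message):
        """Deliver to current subscribers only; return the number of deliveries."""
        deliveries = 0
        for inbox in self.channels.get(channel, []):
            inbox.append(("message", channel, message))
            deliveries += 1
        for pattern, inbox in self.patterns:
            if fnmatchcase(channel, pattern):
                inbox.append(("pmessage", pattern, channel, message))
                deliveries += 1
        return deliveries


bus = TinyPubSub()
early = bus.subscribe("public")
by_pattern = bus.psubscribe("pub*")
bus.publish("public", "Hello world")
late = bus.subscribe("public")   # subscribed too late: its inbox stays empty
```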

Indexing with Redis

If you follow me on Twitter, you might have noticed that I have just started to work more with Redis. Redis (= Remote Dictionary Server) is known as a Data Structure Store. This means that we can deal not just with key-value pairs (called Strings in Redis) but also with data structures such as Hashes (Hash-Maps), Lists, Sets or Sorted Sets. Further details about data structures can be found here: https://redis.io/topics/data-types-intro Indexing in Key-Value Stores: With a pure key-value store, you would typically maintain your index structures manually by applying some KV-Store patterns. Here are some examples: Direct access via the primary key: The key itself is semantically meaningful, so you can access a value directly by knowing how the key is structured (by using key patterns). An example would be to access a user profile by knowing the user's id. The key looks like 'user::<uid>'. Exact match by a secondary key: The KV-Store itself ca...
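The key-pattern approach can be sketched in a few lines of Python, with a dict standing in for a pure KV store. The 'user::<uid>' pattern comes from the text. The excerpt is cut off before it explains the secondary key, so the sketch assumes the common lookup-key pattern, where a hypothetical key 'email::<email>' stores the primary key of the matching user.

```python
class KVStore:
    """A pure key-value store: values can only be fetched by their key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def user_key(uid):
    return f"user::{uid}"            # primary-key pattern from the text


def email_key(email):
    return f"email::{email}"         # hypothetical lookup-key pattern


def save_user(store, uid, profile):
    store.set(user_key(uid), profile)
    # Maintain the secondary index manually: lookup key -> primary key.
    store.set(email_key(profile["email"]), user_key(uid))


def find_user_by_email(store, email):
    primary = store.get(email_key(email))   # first hop: secondary -> primary key
    return store.get(primary) if primary is not None else None
```

Note that the application has to keep such lookup keys in sync itself, for example when a user's email changes or a user is deleted. The store does not do that for you.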

Kafka Connect with Couchbase

About Kafka: Apache Kafka is a distributed, persistent message queuing system. It is used to realize publish-subscribe use cases, to process streams of data in real time and to store streams of data safely in a distributed, replicated cluster. That said, Apache Kafka is not a database system, but it can stream data from a database system in near real time. With Kafka, data is represented as a message stream: Producers put messages into a so-called topic, and Consumers take messages out of it for further processing. There is a variety of connectors available. A short introduction to Kafka can be found here: https://www.youtube.com/watch?v=fFPVwYKUTHs . This video explains the basic concepts and what Producers and Consumers look like. Couchbase has supported 'Kafka Connect' since version 3.1 of its connector. The Kafka documentation says "Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes i...
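The producer/consumer relationship can be illustrated with a toy, single-partition model in Python. This is not the Kafka client API. `Topic`, `produce` and `consume` are hypothetical names. One detail is worth knowing: Kafka does not delete a message when a consumer reads it. Instead, each consumer group keeps track of its own offset in the log, and that is what the sketch models.

```python
from collections import defaultdict


class Topic:
    """Toy model of a single-partition topic: an append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []                       # messages are appended, never removed on read
        self.offsets = defaultdict(int)     # consumer group -> next offset to read

    def produce(self, message):
        self.log.append(message)
        return len(self.log) - 1            # offset of the newly appended message

    def consume(self, group, max_messages=10):
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)   # "commit" the group's new position
        return batch
```

Because the groups 'analytics' and 'billing' in the usage below track separate offsets, both can read the full stream independently.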

Visualizing time series data from Couchbase with Grafana

Grafana is a popular tool for querying and visualizing time series data and metrics. If you follow my blog, then you might have seen my earlier post about how to use Couchbase Server for managing time series data: http://nosqlgeek.blogspot.de/2016/08/time-series-data-management-with.html This post is about extending that idea by providing a Grafana Couchbase plug-in for visualization purposes. After you have installed Grafana (I installed it on Ubuntu, but there are installation guides for several platforms), you are asked to configure a data source. Before we use Grafana's 'SimpleJson' data source, it's worth looking at what the backend of such a data source looks like. '/': Returns any successful response, so that Grafana can test whether the data source is available. '/search': Returns the available metrics. We will just return 'dax' in our example. '/annotations': Returns an array of annotations. Such an annotation h...
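Here is a minimal sketch of such a backend using only Python's standard library. It covers just the three endpoints described above. Port 3333 and the `handle` helper are hypothetical, and the JSON body returned for '/' is arbitrary. A real Couchbase plug-in would query Couchbase instead of returning constants. The handler accepts both GET and POST, on the assumption that the plug-in POSTs JSON bodies to '/search' and '/annotations'.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle(path):
    """Map a SimpleJson endpoint to (HTTP status, JSON-serializable body)."""
    if path == "/":
        return 200, {"status": "ok"}   # any successful response signals availability
    if path == "/search":
        return 200, ["dax"]            # the only metric offered in this example
    if path == "/annotations":
        return 200, []                 # no annotations in this sketch
    return 404, {"error": "unknown endpoint"}


class SimpleJsonHandler(BaseHTTPRequestHandler):
    def _reply(self):
        status, body = handle(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._reply()

    def do_POST(self):
        # Read and discard the JSON request body; this sketch ignores it.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._reply()


if __name__ == "__main__":
    # Hypothetical port: point the Grafana data source URL at http://<host>:3333
    HTTPServer(("0.0.0.0", 3333), SimpleJsonHandler).serve_forever()
```

Keeping the routing in a plain function separates the endpoint logic from the HTTP plumbing, so it can be checked without starting a server.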