Microservices and Polyglot Persistence

Introduction

The idea behind Microservices is already described by its name. In short, it means building a system from multiple smaller, self-contained services instead of one monolithic one. This explanation sounds simple, doesn't it? We will see that it is not, because breaking up one single big system into several services has quite a lot of implications.

Why Microservices?

A monolithic system is a system which has only one main component. One of its disadvantages is usually that any change has to be deployed in a way that affects the deployment of the whole system. Today's systems are actually rarely completely monolithic, because they normally already consist of several sub-components. Often other decomposition mechanisms are already in use. One way is to build your system in a modular way. Such a module might actually be a good candidate for a microservice, whereby it should ideally provide business-domain-specific functionality rather than a purely technical one.

Another aspect you should already be familiar with as an object-oriented developer is de-coupling (loose coupling). Each component should, in a way, live on its own. Sure, there are well-defined dependencies on other components. De-coupling ensures that you can replace one component of your system without having to rewrite the majority of the system.
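To make the de-coupling idea a bit more concrete, here is a minimal Python sketch. The interface `CustomerStore`, its two implementations and the `greeting` function are purely hypothetical names for illustration. The point is only that the caller depends on an interface, so one implementation can be replaced by another without touching the caller.

```python
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """Hypothetical interface: callers depend on this, not on a concrete class."""

    def find_email(self, customer_id: str) -> Optional[str]: ...


class InMemoryCustomerStore:
    """One replaceable implementation, backed by a plain dict."""

    def __init__(self, emails: dict[str, str]) -> None:
        self._emails = dict(emails)

    def find_email(self, customer_id: str) -> Optional[str]:
        return self._emails.get(customer_id)


class StaticCustomerStore:
    """Another implementation that always answers with the same address."""

    def __init__(self, email: str) -> None:
        self._email = email

    def find_email(self, customer_id: str) -> Optional[str]:
        return self._email


def greeting(store: CustomerStore, customer_id: str) -> str:
    # Only the interface is used here, so either store can be plugged in.
    email = store.find_email(customer_id)
    return f"Hello {email}" if email else "Hello stranger"


print(greeting(InMemoryCustomerStore({"c1": "alice@example.com"}), "c1"))
print(greeting(StaticCustomerStore("ops@example.com"), "c1"))
```

A microservice pushes the same idea one step further: the "interface" becomes a network API, and the implementation behind it can be replaced or redeployed on its own.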

If you split up a monolithic application into several parts, whereby specific functionalities are provided as services, you end up with a distributed system, because each service is deployed on its own. The whole idea is to be able to scale these services out independently.

So Microservices are, in a way, not a completely new invention. They are often just a consequence of what we already know or aim for regarding software architecture. Service-oriented designs are not completely new to us either.

Polyglot character

One system made of multiple smaller services:

Can have a variety of communication protocols: About 10 years ago, I remember having discussions about SOAP vs. REST. I actually liked SOAP because it was well defined, so your service client could be generated just from the service definition. It uses message-based communication, and there was a kind of standard message format (dependent on the binding). REST, on the other hand, had the charm of being less chatty and resource-based. The protocol by which two parties communicate (which resources are accessed in exactly which way) was not predefined out of the box. Indeed, you also define what a REST service exposes. But it seemed to happen more often that the service no longer spoke exactly the same language as the client. Actually, it was more like the service was partially speaking a weird dialect which the client could no longer understand, so the client had to learn it as well. There are libraries and frameworks that help you here (e.g. annotations in JAX-RS). However, I'm pretty sure that today's green-field solutions would rely on RESTful services. Sometimes you don't start from a green field, so you might still need to integrate a variety of different kinds of services.
Can be implemented by using several programming languages and frameworks: For the other components of your system, the only relevant thing is how to communicate with a specific service. The actual implementation is completely hidden from them. So one service might be implemented in Node.js and another one in Python. There are sometimes good reasons to develop one part of a system in e.g. C, but other parts with maybe less effort in e.g. Node.js. Not every component has the same resource and efficiency requirements (e.g. garbage collection vs. manual memory management).
Can be developed by different kinds of developers: This is of course related to the point about different programming languages and frameworks. From my personal experience, I would say that a C developer and a Java developer really speak different languages, not just regarding the programming language, but also regarding how a specific problem would be addressed or what the tool chains look like. There is no good or bad; it's just different. So given that different functionalities might be developed by multiple different and independent teams, this point makes especially sense if these teams already have skill sets around specific programming languages and frameworks.
Polyglot persistence: A modern application usually consists of 3 tiers: interface, service and persistence. Given that we split the service tier into multiple smaller services, there is the fair question of what happens with your database/storage tier. We will discuss this in more depth a bit later. What matters is that the individual services can use different database/storage back-ends. One service might need to write content items and so store the content itself in a Blob Store and the metadata in a Document Database. Another service might need to store the information about who knows whom and so use a Graph Database. A third service might handle user sessions and so use a KV-store. This is what polyglot persistence means.
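The three example services above can be sketched like this. This is a minimal sketch under a big assumption: plain in-memory Python structures stand in for a real blob store, document database, graph database and KV-store, and all class and method names are hypothetical. What it tries to show is that each service owns its own storage model, shaped for its own access pattern, and nobody else touches it directly.

```python
import json
from collections import defaultdict
from typing import Optional


class ContentService:
    """Content bytes go to a blob store, metadata to a document store."""

    def __init__(self) -> None:
        self.blob_store: dict[str, bytes] = {}    # stand-in for a blob store
        self.document_store: dict[str, str] = {}  # stand-in for a document DB (JSON text)

    def save(self, item_id: str, content: bytes, title: str) -> None:
        self.blob_store[item_id] = content
        meta = {"title": title, "size": len(content), "blob_key": item_id}
        self.document_store[item_id] = json.dumps(meta)

    def metadata(self, item_id: str) -> dict:
        return json.loads(self.document_store[item_id])


class SocialService:
    """'Who knows whom' as an adjacency list, a stand-in for a graph database."""

    def __init__(self) -> None:
        self.edges: defaultdict[str, set[str]] = defaultdict(set)

    def connect(self, a: str, b: str) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def friends_of_friends(self, person: str) -> set[str]:
        # A typical graph traversal: two hops away, excluding direct friends.
        direct = self.edges[person]
        result: set[str] = set()
        for friend in direct:
            result |= self.edges[friend]
        return result - direct - {person}


class SessionService:
    """User sessions as simple key-value pairs, a stand-in for a KV-store."""

    def __init__(self) -> None:
        self.kv: dict[str, str] = {}

    def login(self, session_id: str, user: str) -> None:
        self.kv[session_id] = user

    def user_for(self, session_id: str) -> Optional[str]:
        return self.kv.get(session_id)


social = SocialService()
social.connect("alice", "bob")
social.connect("bob", "carol")
print(social.friends_of_friends("alice"))  # {'carol'}
```

In a real system, each of these stores would be a separate database product chosen for that service's needs, which is exactly the freedom polyglot persistence gives you.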

What's happening with my Database?

This is actually quite interesting. Even if your system was previously made of several modules, you will quite often see that the modules integrate with each other on the database level. The reason is that the rules for schema consolidation (regarding good database schema design) might conflict with the de-coupling requirements. In the end, each independent service should use its own database. Instead of integrating service functionality on the database tier, the services should talk to each other. Let's use a very simple and stupid example. We talk about orders and customers. Let's assume that your shopping service is independent of the user profile service. Sure, shopping needs to know who the customer is, but not at the same level of detail as the user profile service does. In a monolithic application you would have a 1-to-many relationship between customers and orders, so getting all the orders of a customer would be a JOIN query. In a more service-oriented world, you would first ask the user profile (directory) service for the customer (e.g. by his email address) and then ask the shopping service for all the orders of this customer. If e.g. a new order should be processed, then the shopping service would also need to talk to the payment service in order to fulfill the payment. The payment service also only knows the relevant information about the customer and not the complete user profile. Again, a very simple example, but the point is clear: a Microservices approach leads to a distributed system made of several services, which leads to split-up databases as well.
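Here is a minimal sketch of the orders and customers example, assuming that each service is just an in-process Python class. In a real deployment these would be separate services talking over the network (e.g. via REST), each with its own database. All class names, fields and methods are hypothetical. The JOIN from the monolith is replaced by two service calls, and the shopping and payment services only keep a customer ID instead of the full profile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    customer_id: str
    email: str
    name: str
    address: str


@dataclass
class Order:
    order_id: str
    customer_id: str  # only a reference, not the full profile
    total_cents: int


class UserProfileService:
    """Owns the complete customer profile (the 'directory')."""

    def __init__(self, profiles: list[Profile]) -> None:
        self._by_email = {p.email: p for p in profiles}

    def find_customer_id(self, email: str) -> Optional[str]:
        profile = self._by_email.get(email)
        return profile.customer_id if profile else None


class PaymentService:
    """Knows only what it needs to charge a customer."""

    def __init__(self) -> None:
        self.charges: list[tuple[str, int]] = []

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        self.charges.append((customer_id, amount_cents))
        return True


class ShoppingService:
    """Owns the orders; talks to the payment service instead of its tables."""

    def __init__(self, payments: PaymentService) -> None:
        self._orders: list[Order] = []
        self._payments = payments

    def place_order(self, order: Order) -> bool:
        if self._payments.charge(order.customer_id, order.total_cents):
            self._orders.append(order)
            return True
        return False

    def orders_for(self, customer_id: str) -> list[Order]:
        return [o for o in self._orders if o.customer_id == customer_id]


def orders_by_email(profiles: UserProfileService, shopping: ShoppingService,
                    email: str) -> list[Order]:
    # Replaces a customers/orders JOIN with two calls to two services.
    customer_id = profiles.find_customer_id(email)
    if customer_id is None:
        return []
    return shopping.orders_for(customer_id)


profiles = UserProfileService([Profile("c1", "alice@example.com", "Alice", "Main St 1")])
payments = PaymentService()
shopping = ShoppingService(payments)
shopping.place_order(Order("o1", "c1", 1999))
shopping.place_order(Order("o2", "c2", 500))
print([o.order_id for o in orders_by_email(profiles, shopping, "alice@example.com")])  # ['o1']
```

Note that nothing here makes charging and storing the order atomic. That is exactly the transaction question discussed next.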

Now, so far the relational database has glued your data-focused operations together by means of transactions. Doesn't the service-based approach mean that I lose these transactions on the database tier? Exactly! Given that you decoupled your services by no longer integrating on the database tier, you now also have to take care of transactions on the service level. Relational database vendors have been talking about ACID for decades, and you got the impression that you absolutely need it? From my experience, it's quite often the case that you give up on ACID anyway for performance reasons (weaker isolation level, e.g. read uncommitted), and we tend to rely so much on the DB's transaction management (accidentally tolerating its overhead) that we forget that we often don't need full ACID, but only a way to handle concurrent access to specific data items. The NoSQL system Couchbase Server, for instance, doesn't come with a built-in transaction manager, but it comes with a framework which helps you handle transactional behavior on the client side. You can e.g. lock specific documents (JSON documents or KV-pairs) so that somebody else has to wait until the lock is released again. Or you can be more optimistic and use CAS (Compare And Swap). A write operation then only succeeds if the CAS value of your document is still the same, which means that nobody else changed the document since you fetched it. Otherwise you can just fetch the updated document and try again until you are the winner. Sure, there are also strictly transactional cases out there. They can be addressed by using a service-side transaction manager (e.g. implementing two-phase commit).
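The optimistic CAS approach can be sketched as a fetch, modify and conditional write loop. This is a minimal in-memory imitation of the semantics described above, not the Couchbase SDK API: `VersionedStore`, `CasMismatch` and `update_with_retry` are hypothetical names, and an internal lock only stands in for the server performing each compare-and-swap atomically.

```python
import threading
from typing import Any, Callable


class CasMismatch(Exception):
    """Raised when the document changed since it was fetched."""


class VersionedStore:
    """Hypothetical in-memory store mimicking CAS semantics (not a real SDK)."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[Any, int]] = {}
        self._lock = threading.Lock()  # makes each compare-and-swap atomic

    def insert(self, key: str, value: Any) -> None:
        with self._lock:
            self._docs[key] = (value, 1)

    def get(self, key: str) -> tuple[Any, int]:
        # Returns the value together with its current CAS value.
        with self._lock:
            return self._docs[key]

    def replace(self, key: str, value: Any, cas: int) -> int:
        # Writes only if nobody changed the document since it was fetched.
        with self._lock:
            _, current_cas = self._docs[key]
            if current_cas != cas:
                raise CasMismatch(key)
            self._docs[key] = (value, current_cas + 1)
            return current_cas + 1


def update_with_retry(store: VersionedStore, key: str,
                      mutate: Callable[[Any], Any], max_attempts: int = 100) -> Any:
    """Optimistic update: retry with the fresh document until our write wins."""
    for _ in range(max_attempts):
        value, cas = store.get(key)
        new_value = mutate(value)
        try:
            store.replace(key, new_value, cas)
            return new_value
        except CasMismatch:
            continue  # someone else won; re-read and try again
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")


store = VersionedStore()
store.insert("stock", 10)
value, cas = store.get("stock")
store.replace("stock", value - 1, cas)      # another client writes first
try:
    store.replace("stock", value - 2, cas)  # our CAS value is now stale
except CasMismatch:
    print("conflict detected")
print(update_with_retry(store, "stock", lambda v: v - 2))  # 7
```

The retry limit is a design choice of this sketch: without it, a heavily contended document could keep a client spinning forever.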

Not using one single big database is also an opportunity. We already talked about wanting to scale your services out independently (adding new service instances behind a load balancer, i.e. web scale). Scaling out the service tier is only half of the story. More and more service instances might also raise the need to scale out the storage/database tier. So instead of doing everything with a relational DBMS that is hard to scale out, you can now follow the polyglot persistence idea and use the right database for the job, which means that you might introduce a highly scalable NoSQL database system for some of your services.

Summary

As explained, Microservices are self-contained services that provide business-domain-specific functionality. A system which uses Microservices is by definition a distributed system, with all its advantages and disadvantages. Making your system more scalable becomes easier, whereas distributed transactions become harder. Polyglot persistence is one benefit: you can now use the right storage or database system for the job, depending on the requirements of the specific service.

nosqlgeek.org