Monday, 28 September 2015

Document Versioning in Couchbase

Couchbase Server does not support document revisions out of the box, but it is quite simple to implement them on the application side. This article describes ways to do this. The following topics are covered:

  • Handling concurrent access
  • Relevant attributes
  • One document per version
  • Embedded revision tree
  • Combined approaches 

Handling concurrent access

In the context of versioning, multiple users/threads are creating new versions (possibly at nearly the same time). So I think it makes sense to shed some light on concurrent access before we talk about versioning approaches. You will most probably need to combine concurrency handling with versioning. Couchbase supports 2 ways of handling concurrent access to the same document.
  • C(ompare) A(nd) S(wap): This is the optimistic approach. Each document has a built-in property, the CAS value. The CAS value changes as soon as somebody updates the document. So the idea is to implement something like the following on the application side (both approaches are sketched after this list):
  1. Get the document and especially the CAS-value!
  2. Modify some document properties!
  3. Perform an update operation by passing the old CAS-value (from step 1.)!
  4. If somebody else updated the document in the meantime then a CAS mismatch error occurs, because your client-side CAS value is no longer identical to the server-side one. If so, wait for a very short moment and try again from step 1!
  5. Otherwise your update succeeded. Even if multiple users/threads are accessing the same document, you will eventually reach this step, because you have the same chance as all the others to update the document before someone else does.
  • Locking: You can lock a document before you perform the changes and then release the lock. This is the pessimistic approach. A lock wait implementation is required in this case:
  1. Get the document and request a lock!
  2. If the document is locked by someone else then a lock error occurs!
  3. Wait until the document is released again and try again from step 1!
  4. Update the document!
  5. Release the lock!
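
Here is a minimal sketch of both approaches with the Couchbase Java SDK 2.x (the bucket wiring, document ids and retry delays are just illustrative):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.error.CASMismatchException;
import com.couchbase.client.java.error.TemporaryLockFailureException;

public class ConcurrencyExamples {

    // Optimistic approach: retry on CAS mismatch
    public static void updateWithCas(Bucket bucket, String id) throws InterruptedException {
        while (true) {
            // 1. Get the document (it carries the current CAS value)
            JsonDocument doc = bucket.get(id);
            // 2. Modify some document properties
            doc.content().put("color", "red");
            try {
                // 3. The replace fails if the server-side CAS changed in the meantime
                bucket.replace(doc);
                return;
            } catch (CASMismatchException e) {
                // 4. Somebody else was faster: wait a very short moment, then retry
                Thread.sleep(10);
            }
        }
    }

    // Pessimistic approach: lock, update, unlock
    public static void updateWithLock(Bucket bucket, String id) throws InterruptedException {
        while (true) {
            try {
                // 1. Get the document and lock it for up to 5 seconds
                JsonDocument doc = bucket.getAndLock(id, 5);
                doc.content().put("color", "red");
                // 4./5. A mutation with the CAS of the locked document releases the lock
                bucket.replace(doc);
                return;
            } catch (TemporaryLockFailureException e) {
                // 2./3. The document is locked: wait until it is released, then retry
                Thread.sleep(10);
            }
        }
    }
}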

Relevant Attributes

Couchbase has some built-in attributes, but you might (depending on your requirements) want to introduce some versioning attributes of your own.

Here are 2 relevant built-in attributes:
  • Revision number: Couchbase has the built-in attribute 'rev' which is accessible via the document's meta data (meta.rev). The revision number is increased for every update and is internally used for conflict resolution if you use Couchbase's Cross Data Center Replication feature. A higher revision number means that a document was updated more often.
  • CAS value: This attribute was already explained in the previous chapter. It is used to determine if a document was changed since you accessed it the last time.
Own attributes could be:
  • Update time stamp: A version could contain the update time stamp in order to determine when it was last updated. You have to be careful here because your clients may not be time-synchronized.
  • Custom revision number: Even if there is a built-in one, you can also introduce just an incrementing number as your revision number.
  • Updater: Person/service who/which updated the document.
  • Revision identifier: Another option would be to use an artificial id as the version number. So something like a UUID would be suitable.
  • Parent version: The previous revision.
So the idea would be that you just embed the suitable versioning details into your document:
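
For instance (a sketch; the attribute names are just one possible choice):

{
  "type": "cnti",
  "title": "My article",
  "rev": 7,
  "updated_at": "2015-09-28T10:15:00Z",
  "updated_by": "david",
  "parent_rev": 6
}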



One Document per Version

The easiest implementation of versioning would just use the version as part of the key. So let's assume that we have a kind of Content Management use case. To avoid confusion regarding the terminology, let's call the objects 'content items' (the term 'document' would be overloaded in this case because we talk about JSON documents in Couchbase, not about Word documents in ECMS-s). So the key of a content item would be:
  • cnti::1abc-2def-3ghi-4jkl
This follows the key pattern '$prefix::$id'. 

What we need now is an additional atomic counter object to generate our revision numbers. Couchbase supports such counters. They can usually be incremented by using the 'incr' function of the SDK of your choice.
  • count::rev::cnti = 0
The counter value is now used as part of the key of a content item:
  • cnti::1abc-2def-3ghi-4jkl::7
This follows the key pattern '$prefix::$id::$rev'.

It's easy to see that the counter also acts as a pointer to the latest revision. So the simplified approach would be:
  1. Increment the counter to generate a new revision id
  2. Get the old document which has the revision 'rev-1'
  3. Create a new document with the new revision id
Because we create a new document for every revision there is no concurrent write access to the document itself, but there is indeed concurrency regarding the counter; the atomic increment handles it.
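
A minimal sketch of this flow with the Java SDK 2.x, assuming one counter per content item (e.g. 'count::rev::cnti::$id') so that the counter can act as the pointer to the latest revision of that item:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class VersionedWriter {

    // Create a new revision of the given content item
    public static void createNewRevision(Bucket bucket, String id) {
        // 1. Atomically increment the counter to generate the new revision id
        //    (second argument: delta, third argument: initial value)
        long rev = bucket.counter("count::rev::cnti::" + id, 1, 1).content();

        // 2. Get the previous revision, if there is one
        JsonDocument old = (rev > 1) ? bucket.get("cnti::" + id + "::" + (rev - 1)) : null;

        // 3. Create the new document under the key '$prefix::$id::$rev'
        JsonObject content = (old == null) ? JsonObject.create() : old.content();
        content.put("rev", rev);
        bucket.insert(JsonDocument.create("cnti::" + id + "::" + rev, content));
    }
}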

Embedded Revision Tree

A more complex approach would be to embed the versions into the main document as a tree of changes. The disadvantage could be that your document size becomes quite big, so you should limit the number of revisions to embed. Couchbase's Sync Gateway (a synchronization endpoint for Couchbase Lite instances, whereby Couchbase Lite is a lightweight Couchbase which can run on your mobile device - Rev Tree Storage on Couchbase Server) uses this approach.

The tree definition is quite simple. A tree has nodes. Each node, except the root node, has exactly one parent node. Each node in such a tree represents one document revision. The tree describes which revision was derived from which other revision. The sub-tree from a specific node in the tree down to the leaves is called a branch.

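Such a tree could, for instance, be embedded like this (a sketch of one possible layout, not the exact Sync Gateway format):

{
  "type": "cnti",
  "head": "rev-6",
  "revisions": {
    "rev-1": { "parent": null,    "title": "First draft" },
    "rev-2": { "parent": "rev-1", "title": "Second draft" },
    "rev-3": { "parent": "rev-2", "title": "Copy edits" },
    "rev-4": { "parent": "rev-2", "title": "Alternative wording" },
    "rev-5": { "parent": "rev-3", "title": "More copy edits" },
    "rev-6": { "parent": "rev-4", "title": "Merged wording" }
  }
}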

The sketch above shows 6 revisions, with two branches forking from 'rev-2'. Now your application has a lot of possibilities for using such a revision tree:

  • From which revision to fork?
  • Which revisions/branches to keep?
  • How to merge based on the revisions?
  • What should be the max. size of the revision tree?
So as you can see, this versioning approach is quite a bit more complicated but gives you a lot of freedom.

The idea is to have a head reference in the document which points to the current base revision, as shown in the sketch above.


Combined Approaches

I can see the following 2 main requirements for versioning:
  • Change History: Some compliance or security rules enforce that you have to be able to answer the question of who changed what and when. For this, the 'One Document per Version' approach would be sufficient.
  • Conflict Handling: Multiple users are creating several versions and you want to be able to pick a winner or even merge several versions. For this, the 'Revision Tree' approach would work best.
Let's assume that you store only trees of a specific depth. The revision tree needs to be truncated in order to realize this, but such a truncation would mean that you lose some older revisions. So an idea would be to archive the state of the tree as another revision (see the sketch after this list):
  • If the revision tree becomes too big then
  1. Archive the current state of the revision tree by creating a new document for this version! An extra 'archive' bucket can be used for this purpose.
  2. Truncate the tree by setting a new head revision!
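
A minimal sketch of this truncation with the Java SDK 2.x (the 'revisions'/'head' layout from the sketch above and the threshold are assumptions):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class TreeTruncation {

    static final int MAX_REVISIONS = 100; // assumed threshold

    public static void truncateIfNeeded(Bucket data, Bucket archive, String key) {
        JsonDocument doc = data.get(key);
        JsonObject revisions = doc.content().getObject("revisions");
        if (revisions.size() <= MAX_REVISIONS) {
            return;
        }

        // 1. Archive the current state of the revision tree as a new document
        String head = doc.content().getString("head");
        archive.insert(JsonDocument.create(key + "::" + head, doc.content()));

        // 2. Truncate the tree: keep only the head revision and make it the new root
        JsonObject headRev = revisions.getObject(head).putNull("parent");
        doc.content().put("revisions", JsonObject.create().put(head, headRev));

        // The CAS from the get detects concurrent modifications of the tree
        data.replace(doc);
    }
}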

Conclusion

Even if each document has a 'rev' attribute, Couchbase Server does not directly support document versioning. But you can use the described approaches to implement your own document versioning on the application side. Such an implementation can be very simple or more complicated. Which approach you should use depends on your actual requirements.

Friday, 25 September 2015

How many Buckets?


In Couchbase the equivalent of a database is called a bucket. A bucket is basically a data container which is split into 1024 partitions, the so-called vBuckets. Each partition is stored as a Couchstore file on disk and the partitioning is also reflected in memory. Each configured replica leads to 1024 additional replica vBuckets. So if you have a bucket with 1 replica configured then this bucket has 1024 active vBuckets and 1024 replica vBuckets. The vBuckets are spread across the cluster nodes. So in a 2-node cluster each node would have 512 active vBuckets. In an n-node cluster each node would by default have roughly 1024/n active vBuckets.

An often asked question is 'How many Buckets should I create?'.

There are multiple aspects to take into account here.
  • Max. possible number of Buckets
  • Logical data separation
  • Load separation
  • Physical data separation
  • Multi-tenancy


Max. possible number of Buckets

Before we start talking about why it might make sense to use multiple buckets, let's talk about how many buckets are possible in practice per cluster.

As we will see, a bucket is not just a logical container but also a physical one. Furthermore, some cluster management happens per bucket. So it would not be a good idea to create hundreds of buckets because that would lead to management overhead. The actual max. number depends on size, workload and so on. Couchbase usually recommends a max. of 10 buckets.

Logical Data Separation

Let's assume that you have multiple services. Following a service-oriented approach (and especially micro services), those services would be decoupled by having disjunct responsibilities. An example would be:
  • Session Management Service: Create/validate session tokens for authorization purposes
  • Chat Service: Send/retrieve messages from/to users 
The session management service is then indeed used by the chat service but both services have disjunct responsibilities.

A logical data separation would mean that we create 2 buckets in order to make sure that there is a logical separation between sessions and messages. So you can co-locate some services by taking the scalability requirements into account. If these requirements are completely different then it would make sense to set up a separate cluster.


Load Separation

It's indeed the case that the workload pattern of the session management service is different from that of the chat service. Millions of users sending/reading messages would cause a lot of writes and reads, whereby the token-based authorization would cause mostly reads and a smaller number of writes for the log-ins.

So it would be useful to create the session management bucket with different settings than the messaging one. For a session management use case it makes sense to have all of the sessions cached in order to reply quickly to the question whether a user is authorized to act in the context of a specific session.

So the statement here is that it makes sense to have multiple buckets (or even multiple clusters) to better support the individual load patterns. Buckets with a high read throughput profit from a higher memory quota, whereby buckets with a higher disk write throughput benefit from a higher disk I/O thread priority.
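Both settings can be changed per bucket, for instance via the couchbase-cli (a sketch; host, credentials and the values are placeholders):

couchbase-cli bucket-edit -c 127.0.0.1:8091 -u Administrator -p password \
  --bucket=sessions --bucket-ramsize=2048 --bucket-priority=high
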
Another aspect is load separation for indexing purposes. In 3.x this refers to the View indexing load. Couchbase Views are defined per bucket (one bucket can have multiple Design Documents; one Design Document contains multiple Views) by using map-reduce functions. Views are sharded across the cluster, so each node is responsible for indexing its own data. Views can have multiple roles:
  • Primary Index: Only emit a key of a KV-pair as the key into the View index
  • Secondary Index: Emit a property of the value of a KV-pair as the key into the View index
  • Materialized View: Emit a key and a value into the View index
Views are updated via incremental map-reduce. This means that a creation/update/deletion of a document will be reflected in the View index. Let's assume that 20% of the documents in your bucket are user documents. Your Views are defined in a way that they only work with user documents, so your map function filters everything else out. In this case the map function would be executed for all documents anyway, skipping 80% of them after checking if they have a type attribute with the value 'User' (such a filtering map function is sketched below). This means some overhead. So for Views it would be a good idea to move these 20% of the data into another bucket called 'users'. The Views are then only created for the 'users' bucket.
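
Such a filtering map function (View map functions are written in JavaScript) could look like this:

function (doc, meta) {
  // Skip the 80% of documents which are not user documents
  if (doc.type == "User") {
    // Secondary index: emit a property of the value as the index key
    emit(doc.name, null);
  }
}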

In Couchbase Server 4.0, G(lobal) S(econdary) I(ndexes) help here. As the name says, a GSI is global, so it sits on a single node (or on multiple nodes for failover purposes). GSI-s are not sharded across multiple nodes. So the indexing node gets all the mutations streamed and still has to filter them based on the index criteria, but the indexing load is separated from the data-serving nodes and so does not affect them.

CREATE INDEX def_name_type ON `travel-sample`(`name`) WHERE (`_type` = "User")

So you gain control over the indexing load separation by deciding which node serves which index.


Physical Data Separation

The previously mentioned load separation is closely related to a physical separation. 

To store several GSI-s on several machines means to separate them physically. So Couchbase Server 4.0 and its multi-dimensional scalability feature, whereby the dimensions are
  • Data Serving
  • Indexing
  • Querying
allows you to physically scale up each individual dimension. A node which runs the query service can now have more CPU-s, whereby a node which runs the data service can have more RAM in order to support lightning-fast reads.

A physical data separation is also possible per bucket. Couchbase allows you to specify separate data and index directories. So it is already possible to keep the GSI-s or Views on faster SSD-s whereby (depending on your performance requirements) your data can be stored on bigger, slower disks. Each bucket is reflected as one folder on disk. So it's also possible to mount a specific bucket folder on a specific disk, which provides you a physical per-bucket data separation.

Multi-tenancy

The multiple-buckets question is also often asked in the context of multi-tenancy. So what if my service/application has to support multiple tenants? Should every tenant then have its own bucket?

Let's answer this question by asking another one. Let's assume you were still using a relational database system. Would you create one database for every tenant? The answer is most often 'No', because the service/application models the tenancy. So it would more likely be the case that there is a tenants table which is referenced by a users table (multiple users belong to one tenant). I have seen only very few Enterprise Applications in the wild which relied on features of the underlying database system (like database-side access control) for this purpose.

In Couchbase you have in theory the following ways to realize multi-tenancy:
  • One tenant per cluster: This would be suitable if you provide Couchbase as a Service. Then your customers have full access to Couchbase and they implement their own services/applications on top of it. In this case a cluster per tenant makes sense because the cluster needs to be scaled out depending on the performance requirements of your customer's services/applications.
  • One tenant per bucket: As explained, there are two reasons why it is often not an option. The first one is the answer to the question above. The second one is the max. number of buckets per cluster.
  • One tenant per name space: As explained, your application often already models multi-tenancy. So you can model a kind of name space per tenant by using key prefixes. (Each document is stored as a KV-pair in Couchbase.) So every key which contains a specific tenant id belongs to a specific tenant. If a user with the email address 'something@nosqlgeek.org' logs in, then he only has access to documents that have the prefix 'nosqlgeek.org'. The key pattern for a message would be '$tenant::$type::$id', so for instance 'nosqlgeek.org::msg::234234234' (see the sketch below).
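
A minimal sketch of such a key construction in Java (the helper names are made up):

public class TenantKeys {

    // Derive the tenant name space from the user's email address
    public static String tenantOf(String email) {
        return email.substring(email.indexOf('@') + 1);
    }

    // Build a key following the pattern '$tenant::$type::$id'
    public static String key(String tenant, String type, String id) {
        return tenant + "::" + type + "::" + id;
    }
}

For example, key(tenantOf("something@nosqlgeek.org"), "msg", "234234234") yields 'nosqlgeek.org::msg::234234234'.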

Conclusion

Multiple buckets make sense for logical data separation in the context of service-oriented approaches. If the max. number of buckets per cluster would be exceeded, or the scalability requirements of the buckets are too different, then an additional cluster makes sense. An important reason for multiple buckets in 3.x is load or physical data separation. 4.0 helps you to move the indexing load to other nodes, which gives you more control regarding the balancing of the indexing load. Multi-tenancy should normally happen via name spaces (key prefixes) and not via multiple buckets.