Monday, 28 September 2015

Document Versioning in Couchbase

Couchbase Server does not support document revisions out of the box, but it is quite simple to implement them on the application side. This article describes several ways to do this. The following topics are covered:

  • Handling concurrent access
  • Relevant attributes
  • One document per version
  • Embedded revision tree
  • Combined approaches 

Handling concurrent access

In a versioning context, multiple users/threads create new versions, possibly at nearly the same time. So I think it makes sense to shed some light on concurrent access before we talk about versioning approaches. You will most probably need to combine concurrency handling with versioning. Couchbase supports 2 ways of handling concurrent access to the same document:
  • C(ompare) A(nd) S(wap): This is the optimistic approach. Each document has a built-in property, the CAS value, which changes as soon as somebody updates the document. So the idea is to implement something like the following on the application side:
  1. Get the document and, in particular, its CAS value.
  2. Modify some document properties.
  3. Perform an update operation, passing the old CAS value (from step 1).
  4. If somebody else updated the document in the meantime, a CAS mismatch error occurs because your client-side CAS value is no longer identical to the server-side one. In that case, wait for a very short moment and then try again from step 1.
  5. Multiple users/threads are accessing the same document, but you have the same chance as all the others. So you will eventually reach this step: your update was applied before someone else changed the document again.
  • Locking: You can lock a document before you perform the changes and then release the lock. This is the pessimistic approach. A lock-wait implementation is required in this case:
  1. Get the document and request a lock.
  2. If the document is already locked, a lock error occurs.
  3. In that case, wait until the document is released again and try again from step 1.
  4. Update the document.
  5. Release the document.
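
The optimistic steps above can be sketched as a retry loop. This is a minimal sketch against an in-memory stand-in, not against a real Couchbase SDK: `InMemoryStore`, `CasMismatch` and `update_with_cas` are hypothetical names invented for this illustration, and a real client would call the get/replace operations of its SDK instead.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the CAS value passed by the caller is stale."""


class InMemoryStore:
    """Hypothetical stand-in for a bucket: key -> (cas, value)."""

    def __init__(self):
        self._docs = {}
        self._cas_source = itertools.count(1)

    def upsert(self, key, value):
        # Every write assigns a fresh CAS value to the document.
        cas = next(self._cas_source)
        self._docs[key] = (cas, dict(value))
        return cas

    def get(self, key):
        cas, value = self._docs[key]
        return cas, dict(value)

    def replace(self, key, value, cas):
        # The write only succeeds if nobody changed the document since `cas`.
        current_cas, _ = self._docs[key]
        if current_cas != cas:
            raise CasMismatch(key)
        return self.upsert(key, value)


def update_with_cas(store, key, modify, max_attempts=10):
    """Optimistic update: start again from step 1 whenever the CAS check fails."""
    for _ in range(max_attempts):
        cas, doc = store.get(key)                # step 1: read document and CAS
        modify(doc)                              # step 2: change some properties
        try:
            return store.replace(key, doc, cas)  # step 3: write with the old CAS
        except CasMismatch:
            continue                             # step 4: a real client would wait briefly here
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

The locking variant follows the same retry shape, with the wait placed around the initial read-and-lock instead of after the failed write.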

Relevant Attributes

Couchbase has some built-in attributes, but depending on your requirements you might want to introduce some versioning attributes of your own.

Here are 2 relevant built-in attributes:
  • Revision number: Couchbase has the built-in attribute 'rev', which is accessible via the document's metadata (meta.rev). The revision number is increased on every update and is used internally for conflict resolution if you use Couchbase's Cross Data Center Replication feature. A higher revision number means that a document was updated more often.
  • CAS value: This attribute was already explained in the previous chapter. It is used to determine whether a document was changed since you last accessed it.
Custom attributes could be:
  • Update time stamp: A version could contain the update time stamp in order to determine when it was last updated. You have to be careful here because the clocks of your clients may not be synchronized.
  • Custom revision number: Even though there is a built-in one, you can also introduce a simple incrementing number as your revision number.
  • Updater: Person/service who/which updated the document.
  • Revision identifier: Another option would be to use an artificial id as the version number. Something like a UUID would be suitable.
  • Parent version: The previous revision.
So the idea would be that you just embed the suitable versioning details into your document.
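
As an illustration, a document carrying such attributes could be built as follows. This is only a sketch: the `version` sub-object and its field names (`rev_id`, `rev_no`, `parent`, `updated_by`, `updated_at`) are assumptions made for this example, not a Couchbase convention.

```python
import json
import uuid
from datetime import datetime, timezone


def with_version_info(body, updater, parent=None, rev_no=1):
    """Return a copy of `body` with a hypothetical 'version' sub-object."""
    doc = dict(body)
    doc["version"] = {
        "rev_id": str(uuid.uuid4()),    # revision identifier
        "rev_no": rev_no,               # custom revision number
        "parent": parent,               # rev_id of the previous revision, if any
        "updated_by": updater,          # updater
        "updated_at": datetime.now(timezone.utc).isoformat(),  # update time stamp
    }
    return doc


first = with_version_info({"title": "Hello"}, updater="alice")
second = with_version_info(
    {"title": "Hello world"},
    updater="bob",
    parent=first["version"]["rev_id"],
    rev_no=first["version"]["rev_no"] + 1,
)
print(json.dumps(second, indent=2))
```

Keeping the versioning details in one sub-object separates them from the actual content, which makes it easier to compare two revisions by content only.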

One Document per Version

The easiest implementation of versioning would just use the version as part of the key. Let's assume that we have a kind of Content Management use case. To avoid confusion regarding the terminology, let's call the objects 'content items' (the term 'document' would be overloaded here because we are talking about JSON documents in Couchbase and not about Word documents in an ECMS). So the key of a content item would be:
  • cnti::1abc-2def-3ghi-4jkl
This follows the key pattern '$prefix::$id'. 

What we need now is an additional atomic counter object to generate our revision numbers. Couchbase supports such counters. They can usually be incremented by using the 'incr' function of the SDK of your choice.
  • count::rev::cnti = 0
The counter value is now used as part of the key of a content item:
  • cnti::1abc-2def-3ghi-4jkl::7
This follows the key pattern '$prefix::$id::$rev'.

It's easy to see that the counter also acts as a pointer to the latest revision. So the simplified approach would be:
  1. Increment the counter to generate a new revision id
  2. Get the old document, which has the revision 'rev-1'
  3. Create a new document with the new revision id
Because we create a new document for every revision, there is no concurrent write access to the document itself, but there is indeed concurrency regarding the counter.
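
The three steps can be sketched as follows. The store is again an in-memory stand-in with a hypothetical `incr` method; the sketch also assumes one counter per content item (key pattern 'count::rev::cnti::$id', an assumption of this example) so that 'rev-1' always refers to the same item.

```python
class InMemoryStore:
    """Hypothetical stand-in for a bucket with an atomic counter."""

    def __init__(self):
        self.docs = {}
        self.counters = {}

    def incr(self, key, delta=1, initial=0):
        # A real server performs this atomically; a dict is enough for a sketch.
        self.counters[key] = self.counters.get(key, initial) + delta
        return self.counters[key]

    def insert(self, key, value):
        # Insert fails if the key already exists, so a revision is never overwritten.
        if key in self.docs:
            raise KeyError("document already exists: " + key)
        self.docs[key] = dict(value)


def save_revision(store, item_id, changes):
    counter_key = "count::rev::cnti::" + item_id    # assumption: one counter per item
    rev = store.incr(counter_key)                   # step 1: new revision number
    old_key = "cnti::%s::%d" % (item_id, rev - 1)
    previous = store.docs.get(old_key, {})          # step 2: old revision (empty for rev 1)
    new_key = "cnti::%s::%d" % (item_id, rev)
    store.insert(new_key, {**previous, **changes})  # step 3: new document
    return new_key


def latest(store, item_id):
    """Use the counter as a pointer to the latest revision."""
    rev = store.counters["count::rev::cnti::" + item_id]
    return store.docs["cnti::%s::%d" % (item_id, rev)]
```

Note that between step 1 and step 3 the counter already points to a revision that has not been written yet, so a reader may have to fall back to an earlier revision for a short moment.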

Embedded Revision Tree

A more complex approach would be to embed the versions in the main document as a tree of changes. The disadvantage could be that your document becomes quite big, so you should limit the number of revisions to embed. Couchbase's Sync Gateway uses this approach (see 'Rev Tree Storage on Couchbase Server'). Sync Gateway is a synchronization endpoint for Couchbase Lite instances, whereby Couchbase Lite is a lightweight Couchbase which can run on your mobile device.

The tree definition is quite simple. A tree has nodes. Each node, except the root node, has exactly one parent node. Each node in such a tree represents one document revision. The tree describes which revision was derived from which other revision. The sub-tree from a specific node in the tree down to the leaves is called a branch.

The picture above shows 6 revisions. Your application now has many options for using such a revision tree and has to answer questions like these:

  • From which revision to fork?
  • Which revisions/branches to keep?
  • How to merge based on the revisions?
  • What should be the max. size of the revision tree?
So as you can see, this versioning approach is considerably more complicated but gives you a lot of freedom.

The idea is to have a head reference in the document which points to the current base revision.
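
A minimal sketch of such a document is shown below. The layout (a `head` field plus a `revisions` map in which every node stores its `parent`) and the helper names are assumptions of this example; Sync Gateway's actual storage format is different. In this sketch every new revision becomes the head, which is only one of several possible policies.

```python
import uuid


def new_document(body):
    """Create a document whose tree contains only the root revision."""
    rev_id = uuid.uuid4().hex
    return {"head": rev_id, "revisions": {rev_id: {"parent": None, "body": dict(body)}}}


def add_revision(doc, body, parent=None):
    """Add a revision derived from `parent` (default: head) and move head to it."""
    parent = doc["head"] if parent is None else parent
    if parent not in doc["revisions"]:
        raise KeyError("unknown parent revision: " + parent)
    rev_id = uuid.uuid4().hex
    doc["revisions"][rev_id] = {"parent": parent, "body": dict(body)}
    doc["head"] = rev_id
    return rev_id


def history(doc, rev_id=None):
    """Walk the parent links from a revision (default: head) back to the root."""
    rev_id = doc["head"] if rev_id is None else rev_id
    path = []
    while rev_id is not None:
        path.append(rev_id)
        rev_id = doc["revisions"][rev_id]["parent"]
    return path


def leaves(doc):
    """Revisions that no other revision was derived from (the branch tips)."""
    parents = {rev["parent"] for rev in doc["revisions"].values()}
    return [rev_id for rev_id in doc["revisions"] if rev_id not in parents]
```

Forking means passing an older revision as `parent`; more than one entry in `leaves(doc)` then tells the application that there are branches to pick from or to merge.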

Combined Approaches

I can see the following 2 main requirements for versioning:
  • Change History: Some compliance or security rules require that you are able to answer the question of who changed what and when. For this requirement the 'One Document per Version' approach would be sufficient.
  • Conflict Handling: Multiple users are creating several versions and you want to be able to pick a winner or even merge several versions. For this the 'Revision Tree' approach would work best.
Let's assume that you store only trees of a specific depth. The revision tree needs to be truncated in order to achieve this, but such a truncation would mean that you lose some older revisions. So an idea would be to archive the state of the tree as another revision:
  • If the revision tree becomes too big then
  1. Archive the current state of the revision tree by creating a new document for this version. An extra 'archive' bucket can be used for this purpose.
  2. Truncate the tree by setting a new head revision.
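
The two steps could look like the sketch below. It assumes a document with a `head` field and a `revisions` map of nodes with `parent` links (an assumed layout), uses a plain dict in place of the 'archive' bucket, and interprets the truncation as keeping only the head revision as the new root. The archive key pattern '$key::$rev' is also an assumption of this example.

```python
import copy


def depth(doc):
    """Number of revisions on the path from the head back to the root."""
    count, rev_id = 0, doc["head"]
    while rev_id is not None:
        count += 1
        rev_id = doc["revisions"][rev_id]["parent"]
    return count


def archive_and_truncate(doc, doc_key, archive, max_depth):
    """Archive the whole tree and keep only the head if the tree is too deep."""
    if depth(doc) <= max_depth:
        return False
    head = doc["head"]
    # Step 1: store the complete tree as its own document in the archive.
    archive[doc_key + "::" + head] = copy.deepcopy(doc)
    # Step 2: truncate, the head revision becomes the new root of the tree.
    doc["revisions"] = {head: {**doc["revisions"][head], "parent": None}}
    return True
```

The archived documents then form the change history in the style of 'One Document per Version', while the live document keeps a small tree for conflict handling.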


Even though each document has a 'rev' attribute, Couchbase Server does not directly support document versioning. But you can use the described approaches to implement your own document versioning on the application side. Such an implementation can be very simple or more complicated. Which approach you should use depends on your actual requirements.

1 comment:

  1. Hi,

    Thanks for sharing this.

    Where could I find the documentation of metadata rev and cas structure ? I am using NodeJS SDK and the cas seems embedded in an opaque object and rev is not available in the get result...