Friday, 6 February 2015

A JMX monitoring service for Couchbase

I had to investigate the Couchbase bucket statistics a bit this week, so I started a small side project that makes them accessible via JMX. JMX stands for Java Management Extensions and provides a standard way to manage and monitor applications in the Java world. The source code is available here: https://github.com/dmaier-couchbase/cb-jmx

The idea of the service is to expose the Couchbase bucket statistics via JMX, so that tools such as JConsole or VisualVM can connect and access the metrics.
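
To illustrate the general mechanism, here is a minimal sketch of how a single value can be exposed as a standard MBean on the platform MBean server. This is not the cb-jmx code: the class names, the `com.example.cbjmx` object name domain and the `update` method are all assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; this is not the actual cb-jmx implementation.
final class JmxSketch {

    // Standard MBean convention: a class Foo is managed through a public
    // interface named FooMBean; each getter becomes a read-only attribute.
    public interface CmdGetMBean {
        double getAvg();
    }

    public static final class CmdGet implements CmdGetMBean {
        private volatile double avg;

        // Would be called by a polling job after each fetch (assumption).
        public void update(double value) {
            this.avg = value;
        }

        @Override
        public double getAvg() {
            return avg;
        }
    }

    // Registers the bean under an assumed object name for the given bucket.
    static ObjectName register(String bucket, CmdGet bean) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(
                "com.example.cbjmx:type=BucketStats,bucket="
                    + ObjectName.quote(bucket) + ",name=cmd_get");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // allow re-registration
            }
            server.registerMBean(bean, name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("Could not register MBean", e);
        }
    }

    // Reads an attribute through the MBean server, as a JMX client would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read attribute", e);
        }
    }
}
```

Once a bean like this is registered, a tool such as JConsole attached to the same JVM should list it under its object name, with the getter `getAvg` showing up as the attribute `Avg`.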

The following simple Managed Beans are implemented:
  • info: Shows general information about the JMX service
  • cmd_get: Shows how many get commands are currently being processed for the given bucket. This is the number of get commands invoked by the application.
  • cmd_set: Shows how many set commands are currently being processed for the given bucket. This is the number of set commands invoked by the application.
  • cpu_utilization_rate: This is the current CPU load for all cluster nodes.
  • delete_hits: This is the number of delete operations that hit a document in Couchbase. In other words, it is the number of successfully performed delete operations invoked by the application.
  • ep_bg_fetched: This is the number of fetches from disk. The cache miss ratio for get operations is (ep_bg_fetched / cmd_get) * 100, so the cache hit ratio is 100 minus that value.
  • ep_diskqueue_items: This is the current size of the disk write queue. The disk write queue size should not increase linearly, because steady growth means that the disk writes cannot keep up with the incoming mutations.
  • ep_mem_high_wat: This is the configured high watermark in megabytes. When the memory consumed by the bucket reaches the high watermark, the system begins to eject items until the low watermark threshold is reached again.
  • ep_mem_low_wat: This is the configured low watermark in megabytes. Once this threshold is reached, you can expect that items will be ejected from memory soon. If you want a working set of 100%, then you should increase the memory quota of your bucket as soon as the low watermark is reached.
  • mem_used: The current memory usage for the given bucket.
  • vb_replica_queue_size: The size of the replication queue. The replication should be able to keep up, which means that you don't want to see a linear increase of this value.
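
As a rough sketch of how some of these metrics can be combined, the following helper derives the cache miss and hit ratios from ep_bg_fetched and cmd_get and classifies mem_used against the two watermarks. The class `BucketHealth`, its method names and the `MemoryState` enum are made up for this example and are not part of cb-jmx. Treating a bucket without any get operations as having a miss ratio of 0 is likewise just a convention chosen here.

```java
// Hypothetical helper, not part of cb-jmx.
final class BucketHealth {

    // Percentage of get operations that had to be fetched from disk.
    static double cacheMissRatio(double epBgFetched, double cmdGet) {
        if (cmdGet <= 0) {
            return 0.0; // convention chosen here: no gets means no misses
        }
        return (epBgFetched / cmdGet) * 100.0;
    }

    // Percentage of get operations that were served from memory.
    static double cacheHitRatio(double epBgFetched, double cmdGet) {
        return 100.0 - cacheMissRatio(epBgFetched, cmdGet);
    }

    enum MemoryState { BELOW_LOW_WATERMARK, ABOVE_LOW_WATERMARK, ABOVE_HIGH_WATERMARK }

    // All three arguments must use the same unit (for example megabytes).
    static MemoryState memoryState(double memUsed, double lowWat, double highWat) {
        if (memUsed >= highWat) {
            return MemoryState.ABOVE_HIGH_WATERMARK; // ejection is expected to be running
        }
        if (memUsed >= lowWat) {
            return MemoryState.ABOVE_LOW_WATERMARK;  // ejection can be expected soon
        }
        return MemoryState.BELOW_LOW_WATERMARK;
    }
}
```

For example, 250 background fetches per 1000 get commands give a miss ratio of 25% and a hit ratio of 75%.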

The JMX service runs a job that gathers the provided bucket-level metrics every 5 seconds. Every metric is returned as a series of samples. The default sampling window is 1 minute, which means that you get a series of about 60 values per metric, i.e. Couchbase measured one value per second. The MBeans reduce this series to a single flat value, which can then be plotted (see screen shot below). (BTW: The series is also exposed as a stringified array of values.) The following approximations are available, of which 'Next' is the closest one:
  • Min: The minimum value of the series of samples
  • Max: The maximum value of the series of samples
  • Avg: The average value of the series (sum of all values divided by the number of values)
  • Median: The central value of the series (the value in the middle)
  • Next: The next value of the series of samples (the value you get by looping over the series one sample at a time)
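
The five approximations can be sketched as follows. This is an illustration only and not the cb-jmx code: the class name `SeriesStats` is made up, the median of an even-length series is assumed to be the mean of the two middle values, and `next()` is assumed to wrap around to the first sample once the end of the series is reached.

```java
import java.util.Arrays;

// Hypothetical class, not part of cb-jmx.
final class SeriesStats {
    private final double[] samples;
    private int cursor = 0; // position of the sample that next() returns

    SeriesStats(double[] samples) {
        if (samples == null || samples.length == 0) {
            throw new IllegalArgumentException("series must not be empty");
        }
        this.samples = samples.clone();
    }

    double min() {
        return Arrays.stream(samples).min().getAsDouble();
    }

    double max() {
        return Arrays.stream(samples).max().getAsDouble();
    }

    // Sum of all values divided by the number of values.
    double avg() {
        return Arrays.stream(samples).average().getAsDouble();
    }

    // Middle value of the sorted series; for an even number of samples the
    // mean of the two middle values is used (assumption).
    double median() {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns the samples one by one in their original order and starts
    // again at the first sample after the last one (assumption).
    synchronized double next() {
        double value = samples[cursor];
        cursor = (cursor + 1) % samples.length;
        return value;
    }
}
```

With a series that is replaced by a fresh one on every poll, the wrap-around would rarely matter. It is included here only so that the sketch never runs past the end of the array.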