nosqlgeek.org

Posts

Showing posts from 2016

Visualizing time series data from Couchbase with Grafana

By David Maier September 12, 2016

Grafana is a popular tool for querying and visualizing time series data and metrics. If you follow my blog, you might have seen my earlier post about how to use Couchbase Server for managing time series data: http://nosqlgeek.blogspot.de/2016/08/time-series-data-management-with.html . This post extends that idea by providing a Grafana Couchbase plug-in for visualization. After you have installed Grafana (I installed it on Ubuntu, but installation guides are available for several platforms), you are asked to configure a data source. Before we use Grafana's 'SimpleJson' data source, it is worth looking at what the backend of such a data source looks like. '/': Returns any successful response in order to test if the data source is available. '/search': Returns the available metrics. We will just return 'dax' in our example. '/annotations': Returns an array of annotations. Such an annotation h...
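The three endpoints named in the excerpt above can be sketched as a small routing function. This is only an illustration in Python, not the plug-in's actual backend: the function name 'handle', the body returned for '/', the empty annotation list and the 404 fallback are all assumptions made for this sketch.

```python
import json


def handle(path):
    """Return (status_code, json_body) for a request path.

    Hypothetical helper: only the three paths named in the post are covered.
    """
    path = path.strip()
    if path == "/":
        # Any successful response is enough for the availability test.
        return 200, json.dumps({"status": "ok"})  # body is an assumption
    if path == "/search":
        # The available metrics; the post's example only exposes 'dax'.
        return 200, json.dumps(["dax"])
    if path == "/annotations":
        # An array of annotations; left empty because the excerpt does not
        # show what an annotation looks like.
        return 200, json.dumps([])
    return 404, json.dumps({"error": "unknown path"})
```

Keeping the routing in a plain function makes it easy to check the responses before wiring it into whichever HTTP server is used.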

Time series data management with Couchbase Server

By David Maier August 26, 2016

Couchbase Server is a Key Value store and Document database. The combination of being able to store time series entries as KV pairs, to aggregate data automatically in the background via Map-Reduce, and to query the data dynamically via the query language N1QL makes Couchbase Server a perfect fit for time series management use cases. The high transaction volume seen in time series use cases means that relational database systems are often not a good fit. A single Couchbase cluster, on the other hand, might support hundreds of thousands (up to millions) of operations per second (depending, of course, on the node and cluster size). Time series use cases seen with Couchbase are, for instance: Activity tracking: Track the activity of a user, whereby each data point is a vector of activity measurement values (e.g. location, ...). Internet of things: Frequently gathering data points from internet-connected devices (such as cars, alarm systems, home ...

Caching in JavaEE with Couchbase

By David Maier July 01, 2016

One of Couchbase Server's typical use cases is caching. As you might know, it is a KV store. The value of a KV pair can be a JSON document. It is not just the fact that Couchbase Server can store JSON documents that makes it a document database; rather, the fact that you can index and query JSON data defines its character as a JSON document database. Back to the KV store: If you configure the built-in managed cache so that all your data fits into memory, then Couchbase Server is used as a highly available distributed cache. If you are a Java developer, one of your questions might be whether it makes sense to use Couchbase as a cache for your applications. I had several projects where EhCache was replaced by Couchbase because of the garbage collection implications. The performance was often quite a bit better with a centralized, low-latency (sub-millisecond) cache than with one that was colocated with the application instances. This of course depends on several factors (siz...

How to build Couchbase Server

By David Maier June 01, 2016

Couchbase Server is Open Source under the Apache 2 license. Even though a user would normally not build it from the source code (in fact, custom-built versions are not officially supported by Couchbase), you might want to participate in the Couchbase Community by contributing some lines of code. The first thing you need is to be able to build Couchbase Server from the source code. The Couchbase Server source code is not in just one repository. Instead, it is spread over multiple Git repositories. A tool that can be used to abstract the access to these multiple Git repositories is 'repo'. So 'repo' is a repository management tool on top of Git. It is also used by Google for Android, so a short documentation can be found here: https://source.android.com/source/using-repo.html . The installation instructions are available at http://source.android.com/source/downloading.html#installing-repo . Here are some 'repo' commands: repo init: Installs the reposit...

Couchbase Server 4.5's new Sub-Document API

By David Maier May 13, 2016

Introduction: The Beta version of Couchbase Server 4.5 has just been released, so let's try it out! A complete overview of all the great new features can be found here: http://developer.couchbase.com/documentation/server/4.5/introduction/intro.html . This article will highlight the new Sub-Document API feature. What's a sub-document? The following document contains a sub-document which is accessible via the field 'tags': So far: With earlier Couchbase versions (<4.5), the update of a document had to follow this pattern: get the whole document which needs to be updated; update the document on the client side (e.g. by only updating a few properties); write the whole document back. A simple Java code example would be: Now with 4.5: The new sub-document API is a server-side feature which allows you to (surprise, surprise ...) only get or modify a sub-document of an existing document in Couchbase. The advantages are: Better usabil...

Microservices and Polyglot Persistence

By David Maier May 06, 2016

TOC: Introduction, Why Microservices?, Polyglot character, What's happening with my Database?, Summary. Introduction: The idea behind Microservices is already described by its name. In summary, it means using multiple smaller self-contained services to build up a system, instead of using one monolithic one. This explanation does sound simple, doesn't it? We will see that it is not, because breaking up one single big system into several services has quite a lot of implications. Why Microservices? A monolithic system would be a system which has only one main component. One of the disadvantages is usually that you have to deploy changes in a way that they affect the deployment of the whole system. Today's systems are actually not completely monolithic, because they normally already consist of several sub-components. Often other decomposition mechanisms are already used. One way would be to build your system in a modular way. Such a module might actually be a good candidat...

CBGraph now supports edge list compression

By David Maier March 26, 2016

About CBGraph: CBGraph ( https://github.com/dmaier-couchbase/cb-graph ) is a Graph API for the NoSQL database system Couchbase Server. Adjacency list compression: The latest version of CBGraph (v0.9.1) now supports adjacency list compression. An adjacency list is the list of neighbors of a vertex in a graph. So far, the adjacency lists were stored directly at the vertices, but vertices can become quite big if they have a huge number of incoming or outgoing edges (such a vertex is called a supernode). One of the limitations which such a supernode introduces is that it simply takes longer to transfer, e.g., a 10MB vertex over the wire than a 1KB one. In order to support such supernodes better by reducing the network latency, two optimization steps were introduced for CBGraph. Compress the adjacency list while still storing it at the vertex (as a base64 string). Because of the base64 encoding, the lists take a bit more space for small vertices, but you save a lot (saw up to ...
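The first optimization step (compress the adjacency list and keep it at the vertex as a base64 string) can be sketched as follows. This is a minimal Python illustration, not CBGraph's implementation: the excerpt does not say which compression algorithm or serialization CBGraph uses, so zlib and JSON are assumptions here, and the function names, the field name 'edges_out' and the vertex ids are made up for the example.

```python
import base64
import json
import zlib


def compress_adjacency(neighbors):
    """Serialize a list of neighbor ids, compress it, and base64-encode it."""
    raw = json.dumps(neighbors).encode("utf-8")
    # zlib is an assumption; the post does not name the algorithm.
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decompress_adjacency(encoded):
    """Reverse compress_adjacency: base64-decode, decompress, and parse."""
    raw = zlib.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Hypothetical vertex ids standing in for a supernode's neighbors.
neighbors = ["vertex_%d" % i for i in range(10000)]
encoded = compress_adjacency(neighbors)

# The compressed list is still stored at the vertex, as a plain string field.
vertex = {"id": "supernode", "edges_out": encoded}
```

For a list with a single short entry, the encoded string comes out longer than the plain JSON, which matches the remark above that small vertices pay a little extra while large ones benefit.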

Large-scale data processing with Couchbase Server and Apache Spark

By David Maier February 09, 2016

I just had the chance to work a bit with Apache Spark. Apache Spark is a distributed computation framework. So the idea is to spread computation tasks across many machines in a computation cluster. The goal here is to load data from Couchbase, process it in Spark and store the results back to Couchbase. Couchbase is the perfect companion for Spark because it is capable of handling huge amounts of data, provides high performance (hundreds of thousands of ops per second / sub-millisecond latency), and is horizontally scalable while also being fault tolerant (replica copies, failover, ...). You might already know Hadoop for this purpose. Spark's approach is similar but different ;-). In Hadoop, you typically load everything into the Hadoop distributed file system and then process it 'co-located' in parallel. In Spark, each worker node processes the data in memory by default. Your data is described by a R(esilient) D(istributed) D(ataset). Such an RDD is in the first step not the data...