Saturday, 26 March 2016

CBGraph now supports edge list compression

About CBGraph

CBGraph (https://github.com/dmaier-couchbase/cb-graph) is a Graph API for the NoSQL database system Couchbase Server.

Adjacency list compression

The latest version of CBGraph (v0.9.1) supports now adjacency list compression. An adjacency list is the list of neighbors of a vertex in a Graph.

So far the adjacency lists were stored directly at the vertices but vertices can become quite big if they have a huge amount of incoming or outgoing edges (such a vertex is called a supernode). One of the limitations which such a supernode introduces is that it just takes longer to transfer a e.g. a 10MB vertex over the wire than e.g. a 1KB one. In order support such supernodes better by reducing the network latency, two optimization steps were introduced for CBGraph.
  1. Compress the adjacency list by still storing it at the vertex (as base64 string). The base64 encoding causes that the lists are taking a bit more space for small vertices but you save a lot (saw up to 50% with UUID-s as vertex id-s) for supernodes.
  2. Externalize and compress the adjacency list as a binary
The used compression algorithm is GZIP.

There are the following switches in the graph.properties file in order to controll the compression mode.
  • graph.compression.enabled
  • graph.compression.binary
The first property controls if compression is used. The second one controls if the compressed adjacency list is stored as a separated binary (As Couchbase Server is a KV store and a document database, you can directly store binaries as KV pairs).


Compression disabled

The following setting is used in order to disable compression:
  • graph.compression.enabled=false
The document model which is used in Couchbase looks then as the following one:

Compression enabled by embedding the adjacency list

The following setting is used in order to enable compression:
  • graph.compression.enabled=true
  • graph.compression.enabled=false
The document model that is used in Couchbase looks then as the following one:


Compression enabled by storing the adjacency list as a separated binary

The following setting is used in order to enable the storage of the adjacency list as compressed binary:
  • graph.compression.enabled=true
  • graph.compression.enabled=true
The document model which is used looks then as the following one:


Conclusion

Edge list compression helps in order to handle supernodes. The advantages are obviously that you can store more edges at a vertex, but also that the average size of a node is lower and so the latency behavior is improved because the network latency for getting a vertex from Couchbase to CBGraph is a function of the size of a vertex. The overall performance for bigger graphs was improved via this feature. As you can see, the underlying model looks different dependent on the compression mode. So the compression mode is a life time decision for a Graph. You can't access an uncompressed Graph via a CBGraph instance which is configured to use compression and vice versa.

No comments:

Post a Comment