Go Microservices blog series, part 13 - data consistency, gorm and CockroachDB.

14 February 2018 // Erik Lupander

In this part of the Go microservices blog series, we’ll take a look at distributed data storage using CockroachDB and the GORM O/R-mapper.

Contents

  • Overview
  • The CAP theorem
  • CockroachDB
  • Installing CockroachDB
  • The new “Dataservice” with GORM
  • Running and testing an endpoint
  • Load test and resilience
  • Summary

Source code

The finished source can be cloned from github:

  > git clone https://github.com/callistaenterprise/goblog.git
  > git checkout P13

1. Overview

Data consistency vs availability in distributed systems is a very interesting topic. These days, traditional ACID relational databases are often replaced by NoSQL databases operating on the principles of eventual consistency from the BASE model. BASE combined with bounded contexts often forms the basis of persistence in distributed microservice architectures.

Bounded contexts and eventual consistency can, somewhat simplified, be explained as follows:

  • Bounded contexts are a central pattern in Domain-driven design and a very useful concept when designing microservice architectures. For example - if you have an “Accounts” microservice and an “Orders” microservice, they should own their own data (e.g. “accounts” and “orders”) in separate databases without old-school foreign key constraints between them. Each microservice is solely responsible for writing and reading data from its own domain. If the “orders” microservice needs to know about the owning “account” for a given “order”, the “orders” microservice must ask the “account” microservice for account data - the “orders” microservice may not under any circumstance query or write directly to the tables or document stores of the “account” microservice.
  • Eventual consistency can be several things. It’s primarily the concept of a data replication mechanism where a given data write will eventually be replicated across the distributed storage system so any given read will yield the latest version of the data. One can also consider it a requisite of the bounded context pattern, e.g. for a “business transaction” write that appears atomic to an outside viewer, many microservices may be involved in writing data across several bounded contexts without any distributed mechanisms guaranteeing a global ACID transaction. Instead, eventually all involved microservices will have performed their writes, resulting in a consistent state across the distributed system from the perspective of the business transaction. See a good comparison of ACID and BASE here.

These days, many people turn to the NoSQL database Apache Cassandra when they require horizontally scalable data storage with automatic replication and eventual consistency. However, I’m a bit curious how a cutting-edge “SQL” database such as CockroachDB works in our microservice context, so that’ll be the focus of this blog post.

First, a few words about the CAP theorem.

2. The CAP theorem

CAP is a three-letter acronym used to classify distributed database systems. The theorem claims that no distributed database may ever fulfill all three of these criteria at any one time:

  • Consistent: A read is guaranteed to return the most recent write.
  • Available: Every read gets a response, even though you can’t guarantee it returns the most recent version of the data (a write may have occurred 10 microseconds ago on another cluster member). The alternative is to deny serving data unless you’re absolutely sure there’s no inconsistent state of the requested data anywhere in the cluster.
  • Partition tolerant: If a database server goes down, the remaining nodes must continue to function, and when the failed node recovers, consistent data must still be served.

A distributed database may only choose two of the above, making it either “CAP-Available” (AP) or “CAP-Consistent” (CP). The main advantage of an AP database is better latencies, since a CP database must coordinate writes and reads across nodes, while an AP system is allowed to return possibly inconsistent or missing data, which is faster. In other words - AP databases favor speed while CP databases favor robustness.

Do note that a distributed database can appear to provide all three properties as long as there are no network or other problems. The problem is that there will always be network problems at some point - see the fallacies of distributed computing. This is especially relevant for microservices, given that we’re typically leaving the monolithic enterprise database behind, instead letting each microservice “own” its own domain of data - sometimes split over many databases, possibly even across multiple data centers.

CockroachDB is a CAP Consistent (CP) database. For a more in-depth explanation, check out this awesome article from cockroachlabs.

3. CockroachDB

CockroachDB was created by ex-Google employees who used to work on Google’s Cloud Spanner. CockroachDB does not - as previously stated - claim to be a CAP database, but claims full C and P, and a significant number of 9’s for availability.

At its core, CockroachDB is a distributed key-value store written in Go, but it differs from its peers by having an ANSI-compliant SQL interface, behaving like a relational database in most, if not all, aspects. The authors are very transparent about CockroachDB still having some issues that make it unsuitable for OLAP-like workloads. Essentially, JOIN operations are continuously being optimized, but they still have quite a way to go until JOIN performance is on par with old-school relational databases.

Figure: CockroachDB overview. Source: Cockroachlabs

A CockroachDB cluster always consists of at least three database nodes, and the database will stay 100% operational if one node goes down. The underlying replication engine always makes sure any entry exists on at least two nodes, with auto-replication if a node goes down. We’ll get back to this claimed resilience a bit later, where we’ll stress test things while taking down a DB node - should be a fun exercise!

4. Install and run

Time to get this database installed and up and running in our cluster. We’re going to pull v1.1.3 directly from Docker Hub and start three nodes, each running one instance of CockroachDB on separate ports. Since each node needs its own mounted storage, we cannot (AFAIK) run three replicas of a single CockroachDB Docker Swarm mode service - we need three separate services.

For development purposes, this is actually very easy. I’ve prepared a bash-script to set this up:

  #!/bin/bash

  # CockroachDB master, will publish admin GUI at 3030, mapped from 8080
  docker service rm cockroachdb1
  docker service create --name=cockroachdb1 --network=my_network -p 26257:26257 -p 3030:8080 --mount type=volume,source=cockroach-data1,target=/cockroach/cockroach-data cockroachdb/cockroach:v1.1.3 start --insecure

  # CockroachDB node 2
  docker service rm cockroachdb2
  docker service create --name=cockroachdb2 --network=my_network --mount type=volume,source=cockroach-data2,target=/cockroach/cockroach-data cockroachdb/cockroach:v1.1.3 start --insecure --join=cockroachdb1

  # CockroachDB node 3
  docker service rm cockroachdb3
  docker service create --name=cockroachdb3 --network=my_network --mount type=volume,source=cockroach-data3,target=/cockroach/cockroach-data cockroachdb/cockroach:v1.1.3 start --insecure --join=cockroachdb1

Let’s dissect the first docker service create a bit:

  • Ports: We’re publishing port 26257, which actually isn’t necessary unless we want to connect to the cluster from the outside. We’re also mapping the admin GUI, served on port 8080 inside the container, to local port 3030.
  • Volume mounts: CockroachDB requires some persistent storage, so we’re mounting a local volume as persistent storage using the --mount flag.
  • Start command: start --insecure. We’re supplying a start command and the --insecure argument (it’s a CockroachDB argument, it has nothing to do with Docker!) in order to run a local cluster without setting up certificates. Also note the --join=cockroachdb1 argument passed to the two “workers”, telling them to form a cluster with their leader.

Startup may take a few minutes, after which the green and pleasant admin GUI should be available in your favorite browser at http://192.168.99.100:3030:

Figure: The overview

Figure: List of server nodes

Nice! Now we’re ready to create some databases and users. For more details, please check the rich documentation.

We’re going to use the built-in SQL client to create two databases and two users - one for each of our bounded contexts. Since our CockroachDB instances are running in Docker containers, we can’t use cockroach sql directly. We must do it by connecting to a running container using a bit of docker wizardry:

  > docker ps
  CONTAINER ID    IMAGE                          COMMAND
  10f4b6c727f8    cockroachdb/cockroach:v1.1.3   "/cockroach/cockro..."

Find a container running the cockroachdb/cockroach image and note its container ID. Then we’ll use docker exec to launch the SQL CLI:

  > docker exec -it 10f4b6c727f8 ./cockroach sql --insecure
  # Welcome to the cockroach SQL interface.
  # All statements must be terminated by a semicolon.
  # To exit: CTRL + D.
  #
  # Server version: CockroachDB CCL v1.1.3 (linux amd64, built 2017/11/27 13:59:10, go1.8.3) (same version as client)
  # Cluster ID: 5c317c3e-5784-4d8f-8478-ec629d8a920d
  #
  # Enter \? for a brief introduction.
  #
  root@:26257/>

We’re in!

I’ve prepared a .sql file whose contents we can easily copy-paste directly into the console. This is a one-time job for the purpose of this particular blog post. In a real-life scenario you’d obviously script this using some build automation tool.

  CREATE DATABASE account;
  CREATE DATABASE image;
  CREATE USER account_user WITH PASSWORD 'account_password';
  CREATE USER image_user WITH PASSWORD 'image_password';
  GRANT ALL ON DATABASE account TO account_user;
  GRANT ALL ON DATABASE image TO image_user;

Done! Now on to the wonderful world of Go and O/R-mapping!

5. Using CockroachDB from Go using GORM

5.1 Landscape overview

First let’s start with a brand new overview of what the microservice landscape will look like once this part is done:

Figure: New landscape overview

Key stuff:

  • The BoltDB is gone from the accountservice.
  • The new “dataservice” will access accounts and account events stored in a CockroachDB database named “account”.
  • The existing “imageservice” will now store image URLs in another CockroachDB database named “image”. (Remember bounded contexts and the share-nothing principle of microservices.)
  • The two databases above are both hosted in the three-node CockroachDB cluster. Data may exist on any two of three server nodes.
  • The accountservice used to act both as a service aggregator AND as account storage. Its purpose is now strictly orchestrating the fetching of account objects by talking to the Go-based “imageservice” and “dataservice” as well as the Java-based “quotes-service”, and then aggregating the results into a unified response (a simplified sketch of this aggregation follows after this list).
  • The communication from our microservices to the CockroachDB cluster uses the postgresql wire protocol.
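
To make the accountservice’s new orchestrating role a bit more concrete, here is a highly simplified sketch of that aggregation. The service names resolve through Docker Swarm mode DNS, but the ports, paths and field names are illustrative assumptions rather than the repo’s actual code (which also adds circuit breakers, retries and tracing from earlier parts of the series):

  package main

  import (
      "encoding/json"
      "net/http"
  )

  // getJSON fetches a URL and decodes the JSON body into a generic map.
  func getJSON(url string) (map[string]interface{}, error) {
      resp, err := http.Get(url)
      if err != nil {
          return nil, err
      }
      defer resp.Body.Close()
      var out map[string]interface{}
      err = json.NewDecoder(resp.Body).Decode(&out)
      return out, err
  }

  // fetchFullAccount sketches the orchestration: fetch the core account from the
  // dataservice, then enrich it with image data and a quote from the other
  // services. Ports and paths are hypothetical.
  func fetchFullAccount(accountID string) (map[string]interface{}, error) {
      account, err := getJSON("http://dataservice:7070/accounts/" + accountID)
      if err != nil {
          return nil, err // no core account data - give up
      }
      // Image and quote lookups are "nice to have" - a failure here could be
      // tolerated (e.g. via circuit breaker fallbacks) instead of failing the call.
      if image, err := getJSON("http://imageservice:7777/accountimages/" + accountID); err == nil {
          account["imageData"] = image
      }
      if quote, err := getJSON("http://quotes-service:8080/api/quote"); err == nil {
          account["quote"] = quote
      }
      return account, nil
  }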

5.2 GORM

GORM is an “object-relational mapper” (ORM) for Go - think of it as a rough equivalent of Hibernate or similar, although perhaps not as mature or fully-featured. Still, more than 7000 stars on GitHub and over 120 contributors give an indication of a well-liked and commonly used library.

CockroachDB uses the postgresql wire protocol which works very nicely with GORM - GORM has support for several major SQL vendors out of the box.
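
Since GORM speaks to CockroachDB through its postgres dialect, the only wiring needed is a blank import of that dialect before calling gorm.Open. A minimal sketch follows; the import paths match the jinzhu/gorm layout used at the time of writing, and the sslmode=disable parameter is an assumption that matches the --insecure cluster setup used in this post:

  import (
      "github.com/jinzhu/gorm"
      // Register the postgres dialect so gorm.Open("postgres", ...) can use it.
      _ "github.com/jinzhu/gorm/dialects/postgres"
  )

  // openAccountDB connects to the "account" database over the postgresql wire
  // protocol, using the same URL format as shown later in this post.
  func openAccountDB() (*gorm.DB, error) {
      addr := "postgresql://account_user:account_password@cockroachdb1:26257/account?sslmode=disable"
      return gorm.Open("postgres", addr)
  }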

What about tables where we’ll store and retrieve actual data? In this particular blog, we’ll utilize the AutoMigrate feature of GORM to create our tables.

5.2.1 Structs and gorm tags

The AutoMigrate feature introspects Go structs with ‘gorm’-tags and automatically creates tables and columns given these structs. Let’s take a closer look at how we declare primary keys, foreign keys and an index directly on the structs by using gorm tags.

  type AccountData struct {
      ID     string         `json:"" gorm:"primary_key"`
      Name   string         `json:"name"`
      Events []AccountEvent `json:"events" gorm:"ForeignKey:AccountID"`
  }

  type AccountEvent struct {
      ID        string `json:"" gorm:"primary_key"`
      AccountID string `json:"-" gorm:"index"`
      EventName string `json:"eventName"`
      Created   string `json:"created"`
  }

Most of the GORM tags should be self-explanatory for people vaguely familiar with relational databases - e.g. “primary_key”, “index” etc.

The AccountData struct has a has-many relationship to AccountEvent through its Events field, mapped using the “ForeignKey:AccountID” tag. This will result in AutoMigrate creating two tables with columns appropriate for each of the struct fields, including foreign key constraints and the specified index. The two tables will be created within the same database with full referential integrity, i.e. they belong to the same “account data” bounded context that’ll be served by our new dataservice. The “image data” - consisting of a single AccountImage struct - will belong to its own bounded context and be served from the imageservice microservice (a sketch of that struct follows below).
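
For reference, the struct backing the image bounded context could look something like this - a sketch with field names inferred from the JSON response shown in section 6, not necessarily the exact definition in the repo:

  // AccountImage is the single struct of the "image" bounded context, served by
  // the imageservice. Field names inferred from the /accounts/{accountId} JSON.
  type AccountImage struct {
      ID       string `json:"id" gorm:"primary_key"`
      URL      string `json:"url"`
      ServedBy string `json:"servedBy" gorm:"-"` // runtime info, assumed not to be persisted
  }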

The generated tables look like this in the CockroachDB GUI:

Figure: The generated tables

(I’ve rearranged the codebase somewhat so “model” structs used by more than one service now reside in /goblog/common/model.)

5.2.2 Working with Gorm

Dealing with GORM requires surprisingly little boilerplate on the structs, but working with its DSL for querying and mutating data may take a little getting used to. Let’s take a look at a few basic use cases:

5.2.2.1 Basics, Connection and AutoMigrate

All interactions with the GORM API in these examples happen through “gc.crDB”, which is my wrapper around a pointer to gorm.DB, i.e:

  type GormClient struct {
      crDB *gorm.DB
  }

  var gc = &GormClient{}

Below, we’re opening the connection using the postgres SQL dialect and then calling the AutoMigrate function to create tables.

  var err error
  gc.crDB, err = gorm.Open("postgres", addr) // addr is supplied from the config server, of course
  if err != nil {
      panic("failed to connect database: " + err.Error())
  }

  // Migrate the schema
  gc.crDB.AutoMigrate(&model.AccountData{}, &model.AccountEvent{}) // Note that we pass the structs we want tables for.

5.2.2.2 Persisting data

  // Create an instance of our AccountData struct
  acc := model.AccountData{
      ID:     key,                // A pre-generated string id
      Name:   randomPersonName(), // Some person name
      Events: accountEvents,      // Slice of AccountEvents
  }
  gc.crDB.Create(&acc) // Persist!

The code above will write a row to the ACCOUNT_DATA table as well as any ACCOUNT_EVENTS rows present in the Events slice, including foreign keys. Using the SQL client, we can try a standard JOIN:

  root@:26257> use account;
  root@:26257/account> SELECT * FROM account_data AS ad INNER JOIN account_events AS ae ON ae.account_id = ad.id WHERE ad.id='10000';
  +-------+----------+--------------------+------------+------------+---------------------+
  |  id   |   name   |         id         | account_id | event_name |       created       |
  +-------+----------+--------------------+------------+------------+---------------------+
  | 10000 | Person_0 | accountEvent-10000 |      10000 | CREATED    | 2017-12-22T21:38:21 |
  +-------+----------+--------------------+------------+------------+---------------------+
  (1 row)

We’re seeding one AccountEvent per AccountData so the result is absolutely right!

5.2.2.3 Querying data

It’s of course possible to use the postgres driver and do standard SQL queries like the one above. However, to leverage GORM appropriately, we’ll use the query DSL of GORM.

Here’s an example where we load an AccountData instance by ID, eagerly loading any AccountEvents related to it.

  func (gc *GormClient) QueryAccount(ctx context.Context, accountId string) (model.AccountData, error) {
      acc := model.AccountData{}                                 // Create an empty struct to store the result in
      gc.crDB.Preload("Events").First(&acc, "ID = ?", accountId) // Use Preload to eagerly fetch events for the account. Note the use of "ID = ?"
      if acc.ID == "" {                                          // Not found handling...
          return acc, fmt.Errorf("Not Found")
      }
      return acc, nil // Return the populated struct.
  }

A more complex example - find all AccountData instances whose name starts with ‘Person_8’ and count the number of AccountEvents for each entry.

  func (gc *GormClient) QueryAccountByNameWithCount(ctx context.Context, name string) ([]Pair, error) {
      rows, err := gc.crDB.Table("account_data as ad").                    // Specify table including alias
          Select("name, count(ae.ID)").                                    // Select columns including count, see Group by
          Joins("join account_events as ae on ae.account_id = ad.id").     // Do a JOIN
          Where("name like ?", name+"%").                                  // Add a where clause
          Group("name").                                                   // Group by name
          Rows()                                                           // Call Rows() to execute the query
      if err != nil {
          return nil, err
      }
      defer rows.Close()

      result := make([]Pair, 0)                  // Create a slice for the result
      for rows.Next() {                          // Iterate over the returned rows
          pair := Pair{}                         // Pair is just a simple local struct
          rows.Scan(&pair.Name, &pair.Count)     // Pass the result into the struct fields
          result = append(result, pair)          // Add the resulting pair to the slice
      }
      return result, nil // Return the slice of pairs.
  }

Note the fluent DSL with Select..Joins..Where..Group, which is surprisingly pleasant to work with once you get used to it. It should be familiar if you’ve worked with similar APIs in the past, such as jOOQ.

Calling an endpoint exposing the query above yields:

  [{
      "Name": "Person_80",
      "Count": 3
  },
  {
      "Name": "Person_81",
      "Count": 6
  }]

Tidied up the response JSON for the sake of readability
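
If you’re curious what an endpoint exposing that query could look like, here is a minimal net/http sketch. The Pair struct layout is inferred from the JSON above; the route, query parameter and handler name are assumptions for illustration, not the repo’s actual routing code:

  import (
      "encoding/json"
      "net/http"
  )

  // Pair is the simple local struct the count query scans its rows into.
  type Pair struct {
      Name  string
      Count int
  }

  // accountsByNameHandler exposes QueryAccountByNameWithCount over HTTP,
  // e.g. GET /accounts-by-name?name=Person_8 (hypothetical route).
  func accountsByNameHandler(gc *GormClient) http.HandlerFunc {
      return func(w http.ResponseWriter, r *http.Request) {
          name := r.URL.Query().Get("name")
          pairs, err := gc.QueryAccountByNameWithCount(r.Context(), name)
          if err != nil {
              http.Error(w, err.Error(), http.StatusInternalServerError)
              return
          }
          w.Header().Set("Content-Type", "application/json")
          json.NewEncoder(w).Encode(pairs)
      }
  }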

5.3 Unit Testing with GORM

Regrettably, there doesn’t seem to be an idiomatic and super-simple way to unit-test GORM interactions with the database. Some strategies do however exist, such as:

  • Using go-sqlite3 to boot a real light-weight database in unit tests.
  • Using go-sqlmock, see some examples here.
  • Using go-testdb.

In all honesty, I haven’t really examined any of the options above closely. Instead, I’ve wrapped the GORM db struct in a struct of my own, which implicitly implements this interface:

  type IGormClient interface {
      QueryAccount(ctx context.Context, accountId string) (model.AccountData, error)
      QueryAccountByNameWithCount(ctx context.Context, name string) ([]Pair, error)
      SetupDB(addr string)
      SeedAccounts() error
      Check() bool
      Close()
  }

Having an interface makes it very straightforward to use testify/mock to mock any interaction with methods on the struct wrapping the GORM db object, as in the sketch below.
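
As an illustration, a minimal sketch of such a mock might look like the following - only one method is shown, the others follow the same pattern, and the import path for the model package is an assumption based on the repo layout mentioned earlier:

  import (
      "context"

      "github.com/callistaenterprise/goblog/common/model"
      "github.com/stretchr/testify/mock"
  )

  // MockGormClient is a testify/mock based stand-in for IGormClient.
  type MockGormClient struct {
      mock.Mock
  }

  func (m *MockGormClient) QueryAccount(ctx context.Context, accountId string) (model.AccountData, error) {
      args := m.Mock.Called(ctx, accountId)
      return args.Get(0).(model.AccountData), args.Error(1)
  }

In a test, you would then program the mock and pass it wherever an IGormClient is expected:

  mockClient := new(MockGormClient)
  mockClient.On("QueryAccount", mock.Anything, "10000").
      Return(model.AccountData{ID: "10000", Name: "Person_0"}, nil)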

6. Running and testing an endpoint

If you’ve cloned the source and have installed CockroachDB, you can execute the ./copyall.sh script to build and deploy the updated microservices:

  • accountservice
  • imageservice
  • dataservice (NEW)
  • vipservice

The configuration has been updated, including .yaml-files for the new “dataservice”.

Once we’re up and running, let’s do a curl request to the “accountservice” /accounts/{accountId} endpoint:

  > curl http://192.168.99.100:6767/accounts/10002 -k | json_pp
  {
      "imageData" : {
          "id" : "10002",
          "servedBy" : "10.0.0.26",
          "url" : "http://path.to.some.image/10002.png"
      },
      "id" : "10002",
      "servedBy" : "10.0.0.3",
      "name" : "Person_2",
      "accountEvents" : [
          {
              "ID" : "accountEvent-10002",
              "created" : "2017-12-22T22:31:06",
              "eventName" : "CREATED"
          }
      ],
      "quote" : {
          "ipAddress" : "eecd94253fcc/10.0.0.18:8080",
          "quote" : "To be or not to be",
          "language" : "en"
      }
  }

Looks good to me!

7. Load test and resilience

Let’s get down to the business of testing whether our setup with CockroachDB is Consistent and Partition Tolerant, while providing acceptable levels of Availability.

Load- and resilience-testing a microservice landscape with a distributed data store such as CockroachDB on a laptop running everything in VirtualBox isn’t all that realistic, but it should at least provide some insights.

For this purpose, I’m going to set up a landscape with the following characteristics:

  • We’ll bypass our EDGE server. We’ll call the accountservice directly to remove TLS overhead for this particular test case.
  • 1 instance of the accountservice, imageservice, dataservice respectively.
  • 2 instances of the quotes-service.
  • 3 CockroachDB instances each running as a Docker Swarm mode service.

7.1 Results - Gatling

I’ve pre-seeded the “account” database with about 15000 records, including at least one “accountevent” per “account”. The first test runs a Gatling test that bombs away at the /accounts/{accountId} endpoint to fetch our account objects, with a peak rate of 50 req/s.

7.1.1 First run

The test runs for 75 seconds with a 5 second ramp-up time.

Figure 7.1.1: Latencies (ms)

Overall latencies are just fine, our microservices and the CockroachDB have no issue whatsoever handling ~50 req/s.

(Why not more traffic? I ran into this bug, which introduced 1 or 2 seconds of extra latency per “hop” inside the cluster when running the test for a longer time or with more traffic - effectively making the results worthless for this test case.)

7.1.2 Second run

During the second run, at approximately 20:10:00 into the test, I deliberately kill the “cockroachdb3” service. At 20:10:30, I restart the “cockroachdb3” service.

Figure 7.1.2.1: Service response time (ms)

Killing one of the three cockroachdb nodes and restarting it ~30 seconds later has the following effects:

  • No requests fail. This is probably a combination of the CockroachDB master quickly ceasing to hand queries over to the unavailable node, and the retrier logic in our microservices which makes sure a failed call from the accountservice to the dataservice is retried 100 ms later.
  • Taking down the node just before 20:10:00 probably causes the small latency spike at ~20:09:57, though I’d say it is a very manageable little spike that end-users probably wouldn’t notice unless this was some kind of near-realtime trading platform or similar.
  • The larger, much more noticeable spike actually happens when the “cockroachdb3” node comes back up again. My best guess here is that the CockroachDB cluster spends some CPU time, and possibly blocks operations, when the node re-joins the cluster, making sure it’s put into a synchronized state or similar.
  • The mean service latency increased from 33 ms in run #1 to 39 ms in run #2, which indicates that while the “spike” at 20:10:30 is noticeable, it affects relatively few requests as a whole, causing just a slight adverse effect on the overall latencies of the test run.

7.1.3 Both scenarios from the CockroachDB GUI

We can look at the same scenarios from the perspective of the CockroachDB GUI where we can examine a plethora of different metrics.

In the graphs below, we see both scenarios in each graph - i.e. we first run the Gatling test without taking down a CockroachDB instance, and then run the “kill and revive” scenario a minute later.

Figure 7.1.3.1: CockroachDB queries per second over the last 10 seconds

Figure 7.1.3.2: CockroachDB 99th percentile latency over the last minute

Figure 7.1.3.3: CockroachDB live node count

The graphs from CockroachDB are pretty consistent with what we saw in the Gatling tests - taking down a CockroachDB node has hardly any noticeable effect on latencies or availability, while bringing a node back up actually has a rather severe - though short-lived - effect on the system.

7.1.4 Resource utilization

A typical snapshot of Docker Swarm mode manager node CPU and memory utilization for a number of running containers during the first test:

  CONTAINER                                    CPU %    MEM USAGE / LIMIT
  cockroachdb1.1.jerstedhcv8pc7a3ec3ck9th5     33.46%   207.9MiB / 7.789GiB
  cockroachdb2.1.pkhk6dn93fyr14dp8mpqwkpcx     1.30%    148.3MiB / 7.789GiB
  cockroachdb3.1.2ek4eunib4horzte5l1utacc0     10.94%   193.1MiB / 7.789GiB
  dataservice.1.p342v6rp7vn79qsn3dyzx0mq6      8.41%    10.52MiB / 7.789GiB
  imageservice.1.o7odce6gaxet5zxrpme8oo8pr     9.81%    11.5MiB / 7.789GiB
  accountservice.1.isajx2vrkgyn6qm50ntd2adja   17.44%   15.98MiB / 7.789GiB
  quotes-service.2.yi0n6088226dafum8djz6u3rf   7.03%    264.5MiB / 7.789GiB
  quotes-service.1.5zrjagriq6hfwom6uydlofkx1   10.16%   250.7MiB / 7.789GiB

We see that the master CockroachDB instance (#1) takes most of the load, while #2 seems to be almost unused and #3 uses ~10% CPU. I’m not entirely sure what’s going on under the hood among the CockroachDB nodes; probably the master node is handing off some work to the other node(s) (perhaps those requests whose data it doesn’t store itself?).

Another note is that our Go microservices - especially the “accountservice” - are using a substantial amount of CPU serving the load. In a more real-life scenario we would almost certainly have scaled the accountservice out over several worker nodes as well. On a positive note - our Go-based microservices are still using very little RAM.

7.2 Concurrent read/writes

This test case will write random account objects through a new POST API in the accountservice to the databases while simultaneously performing a lot of reads. We’ll observe behaviour as we put the system under moderate (total ~140 DB interactions per second) load and finally see what happens when we pull the plug from one, then another, of the CockroachDB instances just like in 7.1.2 above.

This load test, which writes and reads things concurrently and acts upon newly created data, is written as a simple Go program (a minimal sketch of the idea follows below). We’ll observe the behaviour by looking at the graphs in the CockroachDB admin GUI.
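
The actual test program isn’t important; conceptually it does something like the sketch below. The endpoint paths, payload shape and rates are assumptions for illustration, not the program that produced the graphs:

  package main

  import (
      "bytes"
      "fmt"
      "math/rand"
      "net/http"
      "time"
  )

  // A minimal sketch of a concurrent read/write load generator against the
  // accountservice. Paths, payload and rates are illustrative assumptions.
  func main() {
      base := "http://192.168.99.100:6767"

      // Writer goroutine: POST a random account roughly every 100 ms.
      go func() {
          for {
              id := fmt.Sprintf("%d", 100000+rand.Intn(900000))
              body := []byte(`{"id":"` + id + `","name":"LoadTest_` + id + `"}`)
              if resp, err := http.Post(base+"/accounts", "application/json", bytes.NewReader(body)); err == nil {
                  resp.Body.Close()
              }
              time.Sleep(100 * time.Millisecond)
          }
      }()

      // Reader goroutines: GET pre-seeded accounts (ids 10000 and up).
      for i := 0; i < 10; i++ {
          go func() {
              for {
                  id := fmt.Sprintf("%d", 10000+rand.Intn(15000))
                  if resp, err := http.Get(base + "/accounts/" + id); err == nil {
                      resp.Body.Close()
                  }
                  time.Sleep(100 * time.Millisecond)
              }
          }()
      }

      time.Sleep(5 * time.Minute) // let it run for a while, then exit
  }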

Figure 7.2.1: Queries per second and 99th percentile

Figure 7.2.2: Node count

Figure 7.2.3: Replicas

What can we make of the above?

  • CockroachDB and our microservices seem to handle nodes being taken down and then brought back up during a read/write load quite well.
  • The main noticeable latency spike we see happens at 10:18:30 in the timeline, when we bring “cockroachdb3” back up.
  • Again - taking down nodes is handled really well.
  • Bringing “cockroachdb2” back up at 10:15:30 was hardly noticeable, while bringing “cockroachdb3” back up at 10:18:30 affected latencies much more. This is - as previously stated - probably related to how CockroachDB distributes data and queries amongst cluster members. For example - perhaps the ~500 records written per minute while a node was down are automatically replicated to that node when it comes back up.

7.3 The issue of addressing

As you just saw, our cluster can handle a CockroachDB worker node going down, providing seamless balancing and failover mechanisms. The problem is that if we kill “cockroachdb1”, things come abruptly to a halt. This stems from the fact that our CockroachDB cluster is running as three separate Docker Swarm mode services - each having its own unique “cockroachdb1”, “cockroachdb2” or “cockroachdb3” service name. Our dataservice only knows about this connection URL:

  postgresql://account_user:account_password@cockroachdb1:26257/account
                                             ^   HERE!   ^

so if the service named “cockroachdb1” goes down, we’re in deep s–t. The setup with three separate Docker Swarm mode services is, by the way, the official way to run CockroachDB on Docker Swarm mode.

Ideally, our “dataservice” should only need to know about a single “cockroachdb” service, but at this point I haven’t figured out how to run three replicas of a single CockroachDB service, which would make them a single addressable entity. The main issue seems to be mounting separate persistent storage volumes for each replica, but there may be other issues.

Anyway - my interim hacky solution would probably be based around the concept of client-side load balancing (see part 7 of the blog series), where our dataservice would have to become Docker API-aware and use the Docker Remote API to get and maintain a list of IP addresses of containers having a given label.

If we add --label cockroachdb to our docker service create commands, we could then apply a filter predicate for that label to a “list services” Docker API call in order to get all running CockroachDB instances. Then, it’ll be straightforward to implement a simple round-robin client-side load balancing mechanism rotating connections to the CockroachDB nodes, including circuit-breaking and housekeeping. A rough sketch of the round-robin part follows below.

Figure 7.3: Client-side load balancer
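
For completeness, here is a rough sketch of that round-robin part. The Docker Remote API lookup (listing services or tasks with the cockroachdb label and extracting their addresses) is deliberately left abstract behind the Refresh method, since the exact calls depend on how you query the Swarm:

  import "sync"

  // crdbBalancer keeps a list of CockroachDB node addresses and hands them out
  // in round-robin order. The list is meant to be refreshed periodically from
  // the Docker Remote API (services/containers carrying the "cockroachdb" label).
  type crdbBalancer struct {
      mu    sync.Mutex
      addrs []string
      next  int
  }

  // Refresh replaces the address list, e.g. with the result of a Docker API call.
  func (b *crdbBalancer) Refresh(addrs []string) {
      b.mu.Lock()
      defer b.mu.Unlock()
      b.addrs = addrs
      b.next = 0
  }

  // Next returns the next address round-robin, or "" if no nodes are known.
  func (b *crdbBalancer) Next() string {
      b.mu.Lock()
      defer b.mu.Unlock()
      if len(b.addrs) == 0 {
          return ""
      }
      addr := b.addrs[b.next%len(b.addrs)]
      b.next++
      return addr
  }

The dataservice would then build its connection URL from Next() instead of the hard-coded “cockroachdb1” host, and drop addresses that fail health checks - that’s where the circuit-breaking and housekeeping comes in.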

I’d consider the above solution a hack; I’d much rather figure out how to run the CockroachDB instances as replicas of a single service. Also - do note that running production databases inside containers with mounted storage is kind of frowned upon anyway, so in a production scenario you’d probably want a dedicated DB cluster.

8. Summary

In this part of the blog series, we’ve added a “dataservice” backed by CockroachDB, a database well suited to distributed operation, using the GORM O/R-mapper to map our Go structs to SQL and back. While we’ve only scratched the surface of CockroachDB’s capabilities, our simple tests seem to indicate an open-source database that might be a really interesting candidate for systems that need a SQL/ACID-capable relational database with horizontal scalability, consistency and high availability.

The next part should have dealt with an issue that really ought to be one of the first things to incorporate into a sound software architecture - security - but instead adds support for querying accounts using GraphQL. We’ll get to security - promise!

Please help spread the word! Feel free to share this blog post using your favorite social media platform, there’s some icons below to get you started.

Until next time,

// Erik