Using Akka cluster-sharding and Akka HTTP on Kubernetes
This article describes the implementation of an application that serves data over HTTP, keeps that data in cluster-sharded actors, and is deployed on Kubernetes.
Use case: an application serving data over HTTP at a high request rate, with latency on the order of 10 ms and limited database IOPS available.
My initial idea was to cache the data in memory, which worked pretty well for some time. But this meant larger instances, because the cached data was duplicated in every instance behind the load balancer. As an alternative, I wanted to use Kubernetes for this problem and do a proof of concept (PoC) of a distributed cache with Akka cluster sharding and Akka HTTP on Kubernetes.
This article is by no means a complete tutorial to Akka cluster sharding or Kubernetes. It outlines knowledge I gained while doing this PoC. The code for this PoC can be found here.
Let’s dig into the details of this implementation.
To form an Akka Cluster, there needs to be a pre-defined, ordered set of contact points, often called seed nodes. Each Akka node will try to register itself with the first node from the list of seed nodes. Once all the seed nodes have joined the cluster, any new node can join the cluster programmatically.
The ordered part is important here, because if the first seed node changes frequently, the chance of split-brain increases. More info about Akka Clustering can be found here.
A StatefulSet guarantees stable and ordered pod creation, which satisfies the requirement of our seed nodes, and a Headless Service is responsible for their deterministic discovery in the network. So, the first node will be “<application>-0”, the second will be “<application>-1”, and so on.
<application> is replaced by the actual name of the application.
The DNS names for the seed nodes will be of the form <application>-0.<application>.<namespace>.svc.cluster.local, for example distributed-cache-0.distributed-cache.default.svc.cluster.local.
Start with creating the Kubernetes resources. First, the Headless Service, which is responsible for deterministic discovery of seed nodes (Pods), can be created using the following manifest:
```yaml
kind: Service
apiVersion: v1
metadata:
  name: distributed-cache
  labels:
    app: distributed-cache
spec:
  clusterIP: None
  selector:
    app: distributed-cache
  ports:
    - port: 2551
      targetPort: 2551
      protocol: TCP
```
Note that “clusterIP” is set to “None”, which indicates that it’s a Headless Service.
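To make the naming scheme concrete, here is a minimal Java sketch of how the per-pod DNS names follow from the StatefulSet name, the Headless Service name, and the namespace. The class `StatefulSetDns` and the method `podHosts` are hypothetical names used only for illustration, and the `cluster.local` suffix assumes the default cluster domain.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: derives the stable DNS names that a Headless Service
// gives to the pods of a StatefulSet. Not part of the PoC code.
final class StatefulSetDns {

    // Hypothetical helper. Pod <statefulSet>-<ordinal> is resolvable under
    // the governing service; "cluster.local" assumes the default cluster domain.
    static List<String> podHosts(String statefulSet, String service, String namespace, int replicas) {
        List<String> hosts = new ArrayList<>();
        for (int ordinal = 0; ordinal < replicas; ordinal++) {
            hosts.add(statefulSet + "-" + ordinal + "." + service + "." + namespace + ".svc.cluster.local");
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Names, namespace, replica count and port as used in this article's manifests.
        for (String host : podHosts("distributed-cache", "distributed-cache", "default", 3)) {
            System.out.println(host + ":2551");
        }
    }
}
```

For the names used in this article, this prints the hosts distributed-cache-0 through distributed-cache-2 under distributed-cache.default.svc.cluster.local, each with port 2551.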
Create a StatefulSet, which is a manifest for ordered pod creation:
```yaml
# apps/v1beta2 was removed in Kubernetes 1.16; newer clusters need apps/v1 here
apiVersion: "apps/v1beta2"
kind: StatefulSet
metadata:
  name: distributed-cache
spec:
  selector:
    matchLabels:
      app: distributed-cache
  serviceName: distributed-cache
  replicas: 3
  template:
    metadata:
      labels:
        app: distributed-cache
    spec:
      containers:
        - name: distributed-cache
          image: "localhost:5000/distributed-cache-on-k8s-poc:1.0"
          env:
            - name: AKKA_ACTOR_SYSTEM_NAME
              value: "distributed-cache-system"
            - name: AKKA_REMOTING_BIND_PORT
              value: "2551"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: AKKA_REMOTING_BIND_DOMAIN
              value: "distributed-cache.default.svc.cluster.local"
            - name: AKKA_SEED_NODES
              value: "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551,distributed-cache-1.distributed-cache.default.svc.cluster.local:2551,distributed-cache-2.distributed-cache.default.svc.cluster.local:2551"
          ports:
            - containerPort: 2551
          readinessProbe:
            httpGet:
              port: 9000
              path: /health
```
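The PoC's own wiring is not shown in this article, so the following is only a sketch of how an application could turn the AKKA_SEED_NODES and AKKA_ACTOR_SYSTEM_NAME variables above into the ordered address list that Akka expects in its `akka.cluster.seed-nodes` setting. The class `SeedNodes` and the method `addresses` are hypothetical, and the `akka.tcp` protocol prefix assumes classic TCP remoting.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds ordered Akka seed node addresses from environment values.
final class SeedNodes {

    // Hypothetical helper: turns "host1:2551,host2:2551" into Akka addresses,
    // keeping the order of the input. The "akka.tcp" prefix is an assumption
    // (classic TCP remoting); other transports use a different prefix.
    static List<String> addresses(String systemName, String commaSeparatedHosts) {
        List<String> result = new ArrayList<>();
        for (String hostAndPort : commaSeparatedHosts.split(",")) {
            String trimmed = hostAndPort.trim();
            if (!trimmed.isEmpty()) {
                result.add("akka.tcp://" + systemName + "@" + trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fall back to values from the manifest when the variables are not set.
        String systemName = System.getenv().getOrDefault("AKKA_ACTOR_SYSTEM_NAME", "distributed-cache-system");
        String seedNodes = System.getenv().getOrDefault("AKKA_SEED_NODES",
                "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551");
        addresses(systemName, seedNodes).forEach(System.out::println);
    }
}
```

Because the list is built in the order given by the variable, distributed-cache-0 stays the first seed node on every pod.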
Create a Service, which is responsible for routing traffic to the pods, together with an Ingress that exposes it to traffic from outside the cluster:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: distributed-cache
  name: distributed-cache-service
spec:
  selector:
    app: distributed-cache
  type: ClusterIP
  ports:
    - port: 80
      protocol: TCP
      # this needs to match your container port
      targetPort: 9000
```
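```yaml
# extensions/v1beta1 Ingress was removed in Kubernetes 1.22; newer clusters
# need networking.k8s.io/v1, which uses a different backend schema
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: distributed-cache-ingress
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "distributed-cache.com"
      http:
        paths:
          - backend:
              serviceName: distributed-cache-service
              servicePort: 80
```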
And the distributed cache is ready to use.
Summary
This article covers Akka cluster sharding on Kubernetes, with its prerequisites of an ordered set of seed nodes and their deterministic discovery in the network, and how these can be met with StatefulSets and Headless Services.
This approach of caching data in a distributed fashion offered the following advantages:
- Fewer database lookups, saving database IOPS
- Efficient usage of resources; fewer instances as a result of no duplication of data
- Lower latencies to serve data