Talking to Elasticsearch
Good communication between applications and clusters is an essential requirement for scaling.
You may have previously read about our use of Elasticsearch at Zalando Tech, and especially about Elasticsearch Express: an appliance with a toolkit that enables quick deployment and management of Elasticsearch clusters. Our clusters are not exposed to consumers directly; they sit behind applications. Good communication between applications and clusters is essential for scaling either one without causing problems.
Your application has to be aware of the cluster nodes. When new nodes join, the application needs to redistribute requests equally across all nodes so that Elasticsearch can handle more traffic. If your application is not informed about new nodes, there is no point in adding them, since you are still hitting only the older ones. The application also must not send requests to nodes that have already left the cluster; it is disappointing to send a request to someone who is not around to receive it.
That's why communication between applications and clusters is key to dynamically scaling how data is served.
The most commonly recommended way for applications to talk to Elasticsearch is the TCP transport client. Configured properly, it lets the application continually fetch the current member nodes of the cluster and distribute requests equally among them in round-robin fashion. It opens a pool of TCP connections directly to all available nodes, without going through load balancers.
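To make the round-robin idea concrete, here is a minimal sketch in plain Java of distributing requests over a node list that can change at runtime. It is not the transport client's implementation: the class RoundRobinNodes and its methods are hypothetical names invented for illustration, and the addresses used with it are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of any Elasticsearch API: it only illustrates
// round-robin selection over a node list that can be swapped at runtime.
class RoundRobinNodes {
    // Current cluster members; replaced as a whole whenever membership changes.
    private volatile List<String> nodes;
    // Monotonically increasing request counter shared by all callers.
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> initialNodes) {
        this.nodes = Collections.unmodifiableList(new ArrayList<>(initialNodes));
    }

    // Called when nodes join or leave, so that new nodes start receiving
    // requests and departed nodes stop receiving them.
    void update(List<String> currentNodes) {
        this.nodes = Collections.unmodifiableList(new ArrayList<>(currentNodes));
    }

    // Returns the node that should receive the next request.
    String next() {
        List<String> snapshot = nodes; // read once so size and get agree
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no cluster nodes known");
        }
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }
}
```

An application would call next() before each request and update(...) whenever it learns a fresh member list. Swapping the whole list at once means a request never sees a half-updated view of the cluster.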
Let’s dive into the process of getting the application acquainted with the cluster.
First, a TCP ELB is used only for the first acquaintance with the cluster. It basically acts like a DNS entry that forwards the first connection request to a random node. At this point your application knows only this node and can send requests only to it.

This is very limiting if you have ambitious scaling plans. To let your application discover the rest of the cluster, configure its TCP client to sniff the list of cluster nodes through the first node it connected to, by adding the setting “client.transport.sniff: true”.

The application then receives the IP addresses of all member nodes of the cluster and can distribute calls equally among them. The TCP ELB is used only for a random first acquaintance with the cluster when a new application instance starts. There is no need to worry about warming up ELBs, because the application hits cluster nodes directly by IP, without extra network hops.
Your application and your cluster can now scale independently without causing issues for your consumers.
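The bootstrap flow described above can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the transport client itself: SniffingBootstrap is a hypothetical class, and the listMembers function stands in for whatever call returns the cluster's member list, which the real client handles internally once sniffing is enabled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the bootstrap flow; this is not the Elasticsearch client.
class SniffingBootstrap {
    // Assumed stand-in for "ask a node for the current cluster members".
    private final Function<String, List<String>> listMembers;
    // Nodes the application currently knows and may send requests to.
    private List<String> knownNodes;

    SniffingBootstrap(String seedAddress, Function<String, List<String>> listMembers) {
        this.listMembers = listMembers;
        // Before sniffing, only the node reached through the load balancer is known.
        this.knownNodes = Collections.singletonList(seedAddress);
    }

    // Asks the first known node for the member list and, if it is non-empty,
    // replaces the known nodes with it.
    void sniff() {
        List<String> members = listMembers.apply(knownNodes.get(0));
        if (!members.isEmpty()) {
            knownNodes = Collections.unmodifiableList(new ArrayList<>(members));
        }
    }

    List<String> knownNodes() {
        return knownNodes;
    }
}
```

Before sniff() is called, knownNodes() holds only the seed address; afterwards it holds every member the seed reported, and requests can go to those addresses directly. Repeating sniff() periodically would keep the list current as nodes join or leave.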

We are still learning and experimenting, with more improvements and guarantees to come for the stability and availability of our infrastructure. We’ll keep you updated with changes, but in the meantime, you can send through any questions via Twitter at @alaa_elhadba.


