How Zalando software engineers develop internal and external APIs
Imagine a distributed system consisting of 8,000+ active service applications, developed and operated by 300+ delivery teams in six tech hubs. 1,200+ software engineers use various technologies to implement business needs and are responsible end-to-end for those components.
That is a pretty complex system of people and software, and it is a real challenge to manage the complexity and to balance fast delivery against technical debt.
We believe that interfaces are highly valuable technical assets. That’s why we decided early on standards for API engineering including a common API specification language for RESTful service-to-service communication. In our case, it is the OpenAPI standard for synchronous REST interface specification and JSON Schema for asynchronous events.
API-as-a-Product and API First Principle
Zalando is customer-obsessed. As software engineers at Zalando, we treat our APIs as products, always putting ourselves “in the customer’s shoes.” The best way to provide value is to create a well-designed, explicitly defined, discoverable, reusable, easy-to-understand interface which implements the demanded functionality.
We believe in the API First principle and always follow it. It gives us better alignment between a service provider and its consumers (i.e. a contract) and contributes greatly to the quality of the API and of the overall system design.
And here is how we typically develop an API:
Often it starts with a business requirement or an idea for a new product. As a software engineer, I make myself familiar with the domain and the requirements. Already at this point, I think about who the potential consumers of my new API are, how they will interact with the interface, and what the main building blocks (business processes, resources) of the domain and the API are.
The next step is to draft the API outside the code first. We adopted RESTful web service principles with JSON as the main payload format, and we use the OpenAPI Specification language (a.k.a. Swagger Spec) as the format for our API descriptions.
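To make the "draft outside the code" step more concrete, here is a minimal sketch of what such a draft could look like. The order API, its path, and its fields are purely hypothetical, and the OpenAPI 3.0 version is an assumption (the text above does not say which version is used). Specifications are usually written as YAML or JSON files; a Python dict is used here only so that the draft can be inspected programmatically.

```python
import json

# Hypothetical draft of an API description. All names, paths and fields
# are illustrative and do not describe a real Zalando API.
order_api_spec = {
    "openapi": "3.0.3",  # assumed version
    "info": {"title": "Example Order API", "version": "1.0.0"},
    "paths": {
        "/orders/{order_id}": {
            "get": {
                "operationId": "get_order",
                "parameters": [
                    {
                        "name": "order_id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The requested order.",
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/Order"}
                            }
                        },
                    },
                    "404": {"description": "Order not found."},
                },
            }
        }
    },
    "components": {
        "schemas": {
            "Order": {
                "type": "object",
                "required": ["id", "status"],
                "properties": {
                    "id": {"type": "string"},
                    "status": {"type": "string"},
                },
            }
        }
    },
}


def list_operations(spec):
    """Return a sorted list of (METHOD, path, operationId) tuples."""
    operations = []
    for path, path_item in spec["paths"].items():
        for method, operation in path_item.items():
            operations.append((method.upper(), path, operation["operationId"]))
    return sorted(operations)


if __name__ == "__main__":
    print(list_operations(order_api_spec))
    # The same draft serialized as JSON, ready to be put into a review.
    print(json.dumps(order_api_spec, indent=2))
```

A draft like this contains no implementation detail at all, which is the point: consumers can discuss resources, parameters, and error responses before a single line of service code exists.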
API design is a crucial aspect of API quality. In order to give API consumers the same look-and-feel experience and to raise the quality bar of our APIs, our engineers and architects condensed their knowledge and experience into the Zalando API Guidelines. I consult them often for design principles and best practices when drafting a new API.
Zalando’s API Portal provides a central repository where API specifications of all deployed services can be discovered. I regularly check related APIs to learn from API design practices and to align my application API with other service APIs of our ecosystem.
The API Portal is the central hub for all API-related information. I can use a comprehensive search to find APIs with their deployment and version information. Basically, I get all I need here to be able to consume an API: contact and deployment information, service location, authorization and authentication requirements, and the most important part, the OpenAPI specification of the interface. It is a great source of examples and inspiration, and it even provides the history of APIs.
All API specifications have to be compliant with our API Guidelines. This ensures the same quality and look-and-feel experience across all Zalando APIs. The API Guild is the owner of the guidelines, but everyone is encouraged to contribute.
Becoming and staying compliant takes some effort. Fortunately, some of the guideline rules can be checked automatically by Zally, our API linter. Zally is a set of open source tools to automate compliance and quality assurance of RESTful APIs. It is able to check lower-level aspects such as format and naming, as well as higher-level interface specification details such as error handling and security.
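To give an idea of what an automated naming check could look like, here is a small sketch. It is not Zally's implementation and does not use Zally's API; the function name and the example specification are hypothetical. The two rules shown (kebab-case path segments, snake_case property names) are taken as assumptions about the kind of naming conventions such a linter would enforce.

```python
import re

# Assumed naming conventions for this sketch.
SNAKE_CASE = re.compile(r"^[a-z_][a-z_0-9]*$")
KEBAB_CASE = re.compile(r"^[a-z][a-z\-0-9]*$")


def lint_spec(spec):
    """Return a list of human-readable naming violations for a spec dict."""
    violations = []

    # Rule 1: static path segments should be kebab-case.
    for path in spec.get("paths", {}):
        for segment in path.strip("/").split("/"):
            if not segment or segment.startswith("{"):
                continue  # skip empty segments and path parameters
            if not KEBAB_CASE.match(segment):
                violations.append(
                    f"path segment '{segment}' in '{path}' is not kebab-case"
                )

    # Rule 2: schema property names should be snake_case.
    schemas = spec.get("components", {}).get("schemas", {})
    for schema_name, schema in schemas.items():
        for prop in schema.get("properties", {}):
            if not SNAKE_CASE.match(prop):
                violations.append(
                    f"property '{prop}' in schema '{schema_name}' is not snake_case"
                )

    return violations


# Hypothetical specification fragment with two deliberate violations.
example_spec = {
    "paths": {
        "/salesOrders/{order_id}": {},
        "/sales-orders": {},
    },
    "components": {
        "schemas": {
            "Order": {"properties": {"orderId": {}, "status": {}}},
        }
    },
}

if __name__ == "__main__":
    for violation in lint_spec(example_spec):
        print(violation)
```

Checks of this kind are cheap to run on every pull request, which leaves the human reviewers free to focus on the design questions a tool cannot answer.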
Now it is time to get some real feedback.
Early Review and Feedback
After a team-internal discussion and prototyping work, I ask our peers, the API consumers, and other stakeholders for feedback. They should get the best experience and be able to easily integrate the API into their components. Typically, I create a GitHub (Enterprise) pull request, a great tool for collaborative reviews, on the API specification file. If the review is a bigger one (a new prominent, external, or highly used API, or a bigger change), I additionally invite a special group of API enthusiasts, the API Guild, and the involved architects. They provide feedback on API guidelines compliance and best practices, and inspire me to improve and harden my API design.
After the API design is aligned, implementation of the service is the easiest and the most fun part. We have a polyglot microservice application environment. Based on our Zalando Tech Radar principles, our teams have high autonomy to pick the best technologies to implement their services. Hence, there are lots of ways to realize the API. Depending on the implementation use case, I would pick, for instance, Spring for Java or Kotlin, Akka HTTP for a Scala application, or Resty for Go. If I decide to use Python this time, our open source Connexion framework will implement a big part for me. It handles HTTP requests as specified in the API specification and maps endpoints to Python functions. Many teams implement the API definitions manually. Sometimes, generators are also used to create, for example, Java or Scala client and server stubs out of the API specification.
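The idea of mapping endpoints to Python functions can be sketched with the standard library alone. The following is not Connexion's code and does not use its API; it only illustrates the general technique of resolving a dotted operationId from the specification to an importable function. The module name example_handlers, the handler, and the specification fragment are hypothetical, and the module is created in memory only to keep the example self-contained.

```python
import importlib
import sys
import types

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Stand-in for a real handler module on disk (hypothetical name).
example_handlers = types.ModuleType("example_handlers")


def get_order(order_id):
    """Hypothetical handler returning a fixed payload."""
    return {"id": order_id, "status": "shipped"}


example_handlers.get_order = get_order
sys.modules["example_handlers"] = example_handlers


def resolve_operation(operation_id):
    """Resolve 'package.module.function' to the function object."""
    module_name, _, function_name = operation_id.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def build_routes(spec):
    """Map (METHOD, path) to the handler named by each operationId."""
    routes = {}
    for path, path_item in spec["paths"].items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # e.g. shared 'parameters' on the path item
            routes[(method.upper(), path)] = resolve_operation(
                operation["operationId"]
            )
    return routes


# Hypothetical specification fragment.
spec = {
    "paths": {
        "/orders/{order_id}": {
            "get": {"operationId": "example_handlers.get_order"},
        }
    }
}

routes = build_routes(spec)
result = routes[("GET", "/orders/{order_id}")](order_id="42")

if __name__ == "__main__":
    print(result)
```

With this approach the specification stays the single source of truth for routing: renaming a path only touches the specification file, while the handler functions remain unchanged.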
Publishing and Operation
In order to promote my newly implemented service, I’m going to publish its API. This is done via the deployment artifact, in our case a Docker image. All I need to do is include the API specification in the image. That’s it. After a deployment to our Kubernetes production infrastructure, the API and all its context information appear in the API Portal. From now on, the API’s history is tracked and it can be discovered by everyone at Zalando.
From the first deployment on, I’m interested in the performance of my API. With a few lines of (Kubernetes deployment) configuration, I can activate monitoring and get a (ZMON) monitoring dashboard “for free.” It is endpoint-based and provides metrics like the number of requests per status code class, latency (incl. percentiles), and some basic client load monitoring. Additionally, I can easily configure authorization and authentication settings and rate limits for the endpoints via deployment configuration. Especially in times of many microservices, these infrastructure features are a great relief from the operational perspective.
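To illustrate what endpoint-based metrics of this kind mean, here is a small sketch that aggregates request records into counts per status code class and latency percentiles. It has nothing to do with ZMON's implementation or configuration; the record format, the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from statistics import quantiles


def summarize(requests):
    """Aggregate (endpoint, status_code, latency_ms) records per endpoint."""
    by_endpoint = {}
    for endpoint, status, latency_ms in requests:
        entry = by_endpoint.setdefault(
            endpoint, {"status_classes": Counter(), "latencies": []}
        )
        # 200 -> "2xx", 404 -> "4xx", 503 -> "5xx", ...
        entry["status_classes"][f"{status // 100}xx"] += 1
        entry["latencies"].append(latency_ms)

    summary = {}
    for endpoint, entry in by_endpoint.items():
        latencies = entry["latencies"]
        if len(latencies) >= 2:
            # 99 cut points; index 49 is the median, index 98 the 99th percentile.
            cuts = quantiles(latencies, n=100, method="inclusive")
            p50, p99 = cuts[49], cuts[98]
        else:
            # A single sample is its own percentile.
            p50 = p99 = latencies[0]
        summary[endpoint] = {
            "status_classes": dict(entry["status_classes"]),
            "p50_ms": p50,
            "p99_ms": p99,
        }
    return summary


# Hypothetical request log for one endpoint.
sample_requests = [
    ("GET /orders/{order_id}", 200, 10),
    ("GET /orders/{order_id}", 200, 20),
    ("GET /orders/{order_id}", 404, 30),
    ("GET /orders/{order_id}", 200, 40),
    ("GET /orders/{order_id}", 500, 50),
]

if __name__ == "__main__":
    print(summarize(sample_requests))
```

Grouping by status code class rather than by individual code keeps a dashboard readable, while percentiles show tail latency that an average would hide.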
Conclusion and Outlook
Our vision is to build new business capabilities in days, not in weeks, and to be highly efficient in engineering and operating our SaaS ecosystem at scale, based on consistently high-quality APIs that are sustainable and fun to use. We are now closer to this vision thanks to our tools and infrastructure features, like API design principles and guidelines, an open API review culture, the API Portal, and API monitoring.
We are happy that our principles and tools find adoption outside Zalando by other tech companies and API enthusiasts. Our open source API Guidelines and API Linter gain external contributors and improve every day.
We plan to enrich our API service infrastructure with features like out-of-the-box monitoring, authentication/authorization, and rate limiting. Our API Portal will be a central access hub for relevant API service operation information (e.g. hostnames of deployed API services, effective rate limits) and will support backward compatibility checks, subscriptions for notifications on API changes, and much more. We will raise adoption and improve the developer experience via application-centric integration of all infrastructure services, consistently supporting the developer productivity journey across the design, code, build, deploy, and operate phases.
If you want to learn more about API engineering at Zalando, please also check out the InfoQ interview “How Zalando Delivers APIs with autonomous teams” and the earlier tech blog post “On APIs and the Zalando API Guild.”