How we use GraphQL at Europe's largest fashion e-commerce company

Managing consistent and backwards-compatible APIs for Web and mobile App frontends is always a complex task in the long term. At Zalando, we have used GraphQL to solve some of the common problems of frontend data requirements while gaining speed of delivery in a large and quickly growing organisation. This article covers GraphQL as a Unified Backend-For-Frontend (UBFF). It is the first in a series of posts about the problems we solved with GraphQL at Zalando.

Aditya Pratap Singh

Senior Software Engineer

Posted on Mar 04, 2021


Background

Large organizations built on microservice architectures face many problems in their data aggregation and presentation layers. Managing consistent and backwards-compatible APIs for Web and mobile App frontends is one of the most complex. Frontend developers need a consistent data source, while product managers need new features delivered quickly. Balancing the two in a fast-paced, large organization is a tough nut to crack. It is also very common for frontend developers to struggle to find the right backend service for a given feature.

The Backend-for-Frontend (BFF) concept is a pattern pioneered by SoundCloud, in which a separate backend application is created for every business- and frontend-specific use case. When we adopted microservices at Zalando in 2015, we used this pattern to create a large number of BFFs: for the Web product details page, the Web wishlist page, the mobile App wishlist view, the mobile App home view and so on. The BFF pattern is very similar to Netflix's "Embracing the Differences" approach, which identified four key characteristics of APIs serving frontend applications:

  • Embrace the differences of the devices
  • Separate content gathering from content formatting/delivery
  • Redefine the border between “Client” and “Server”
  • Distribute innovation

While these two approaches addressed most of the concerns of frontend development, they also introduced other issues for a large organisation like Zalando:

  1. Lack of optimal balance between fast feature delivery and developer experience
  2. Duplication of efforts due to the large number of Backend-for-Frontend microservices
  3. Inconsistent experience for Zalando customers across platforms
  4. Fragmented handling of Security and Authentication concerns
  5. Fragmented Observability implementations

Of these problems, inconsistent experience for Zalando customers across platforms is the most subtle. It becomes evident when the same business logic and aggregation are implemented in different ways in multiple backends, leading to broken customer experiences. This is a classic example of Conway's law: the system mirrors the organization's team boundaries. In doing so, it can ignore the user's point of view, because users expect the same experience from every frontend application of the same company.

The diagram below shows an inconsistency that is not uncommon when different user interfaces of the same application are served by multiple backends. In the mobile App, the delivery date range for an article on Zalando is 5-9 Feb, whereas on desktop it is 1-3 Feb. Although this particular example is hypothetical, we have seen such inconsistent-data bugs at Zalando in the past. They were caused by different BFFs implementing the same logic separately.

Figure: Inconsistent data across desktop and mobile

All of these problems become much harder at large scale. We observed this at Zalando too, and we addressed most of these concerns with our Unified Backend-For-Frontend graph of Entities approach.

Our setup

GraphQL is a query language developed by Facebook to enable declarative data fetching. Users of the API declare the shape of the data they need in the query, and the response follows that same structure.

For example, to fetch the name of the example product mentioned above, you can query it as:

Figure: GraphQL query fetching the product name
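As a hypothetical illustration, such a query could be `{ product(id: "example-sku") { name } }`. The field names and the ID are assumptions, not Zalando's actual schema. The TypeScript sketch below is a toy model, not a GraphQL executor. It shows only the declarative idea: the client names the fields it needs, and the response contains exactly those fields in the same shape. `select`, `productsById` and the product data are made up for this example.

```typescript
// Toy model of declarative data fetching (not a real GraphQL executor).
// Mirrors the hypothetical query: { product(id: "example-sku") { name } }

// A selection maps each requested field to `true` (a leaf) or a nested selection.
type Selection = { [field: string]: true | Selection };
type Json = Record<string, unknown>;

// Hypothetical product record: the backend holds more data than one client needs.
const productsById: Record<string, Json> = {
  "example-sku": {
    id: "example-sku",
    name: "Example Sneaker",
    price: { amount: 8995, currency: "EUR" },
    brand: { name: "Example Brand", country: "DE" },
  },
};

// Copies only the selected fields, keeping the nesting the client asked for.
function select(source: Json, selection: Selection): Json {
  const out: Json = {};
  for (const [field, sub] of Object.entries(selection)) {
    const value = source[field];
    // Leaves are copied as-is; nested selections recurse; a missing parent becomes null.
    out[field] = sub === true ? (value ?? null) : value == null ? null : select(value as Json, sub);
  }
  return out;
}

// Equivalent of: { product(id: "example-sku") { name } }
const response = { product: select(productsById["example-sku"], { name: true }) };
console.log(JSON.stringify(response)); // {"product":{"name":"Example Sneaker"}}
```

Price and brand exist in the backend record, but they are absent from the response because the client did not ask for them.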

According to the design principles in the GraphQL specification, GraphQL was created with the business requirements and hierarchical views of modern applications in mind:

Hierarchical: The GraphQL specification structures queries hierarchically, shaped like the data they return. This makes GraphQL well suited to the hierarchical views of modern frontend applications.

Product-centric: The evolution of a GraphQL schema is directly driven by the product and business features that frontend engineers are developing.

These are the two main principles we have kept in mind at Zalando while building a single GraphQL API as a Unified Backend-For-Frontend (UBFF) for all Web and mobile App frontend feature teams. The code lives in a monorepo with shared ownership across more than 12 domain teams, governed by a set of contribution principles. This is similar to the "One Graph" concept described in Principled GraphQL.
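The hierarchical principle pairs naturally with component-based frontends. One common way to use it is for each UI component to declare the fields it renders, and for the page to deep-merge those declarations into a single query, similar in spirit to GraphQL fragments. This is an assumption for illustration, not a description of Zalando's setup. The component selections, field names and the `merge`/`print` helpers below are hypothetical. To keep the sketch short, the field argument is stored directly in the key.

```typescript
// Each component declares the fields it renders; the page merges them into one query.
type Selection = { [field: string]: true | Selection };

// Deep-merges two selections; fields requested by several components appear once.
function merge(a: Selection, b: Selection): Selection {
  const out: Selection = { ...a };
  for (const [field, sub] of Object.entries(b)) {
    const existing = out[field];
    if (existing === undefined || existing === true) {
      out[field] = sub; // new field (or a leaf): take b's selection
    } else if (sub !== true) {
      out[field] = merge(existing, sub); // both select sub-fields: merge recursively
    }
  }
  return out;
}

// Renders a selection as GraphQL selection-set text, two spaces per level.
function print(selection: Selection, indent: string): string {
  return Object.entries(selection)
    .map(([field, sub]) =>
      sub === true
        ? `${indent}${field}`
        : `${indent}${field} {\n${print(sub, indent + "  ")}\n${indent}}`,
    )
    .join("\n");
}

// Hypothetical components of a product page and the data each one needs.
const productHeader: Selection = { name: true, brand: { name: true } };
const brandLogo: Selection = { brand: { logoUrl: true } };
const deliveryInfo: Selection = { delivery: { earliest: true, latest: true } };

// The page nests its children's needs under one product field: one request, one response tree.
const productPage: Selection = {
  'product(id: "example-sku")': [productHeader, brandLogo, deliveryInfo].reduce(merge, {}),
};

const query = `query {\n${print(productPage, "  ")}\n}`;
console.log(query);
```

The printed query requests `brand` once, with both `name` and `logoUrl`. The response tree can then be handed down the component tree along the same hierarchy.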

We use an Entity system in which entities are first-class citizens of the graph. Queries are executed by our custom implementation of the GraphQL specification (graphql-jit), which we built for performance optimization. The entities represent content and domain models across the Zalando shop, e.g. Product and Campaign; the Entity model will get its own post in this series. The overall application data flow looks like this:

Figure: Architecture and data flow across desktop and mobile
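graphql-jit, mentioned above, speeds up execution by compiling a query once into a specialized JavaScript function, which can then be cached and reused for later requests with the same query. The sketch below illustrates only this compile-once, execute-many idea, using closures. graphql-jit itself generates code and implements full GraphQL execution semantics (resolvers, arguments, errors), none of which is modeled here. `compile`, `executorFor` and the query-text cache are hypothetical names. A real engine would also parse the query text rather than receive a ready-made selection.

```typescript
type Selection = { [field: string]: true | Selection };
type Json = Record<string, unknown>;
type Executor = (source: Json) => Json;

// Turns a selection into a function whose field walk is fixed at compile time,
// so executing it does not re-interpret the selection on every request.
function compile(selection: Selection): Executor {
  const steps = Object.entries(selection).map(([field, sub]) => {
    if (sub === true) {
      return (src: Json, out: Json) => {
        out[field] = src[field] ?? null;
      };
    }
    const child = compile(sub); // nested selections are compiled once, up front
    return (src: Json, out: Json) => {
      const value = src[field];
      out[field] = value == null ? null : child(value as Json);
    };
  });
  return (source) => {
    const out: Json = {};
    for (const step of steps) step(source, out);
    return out;
  };
}

// Caches executors by query text so repeated queries skip compilation.
const cache = new Map<string, Executor>();
let compilations = 0;

function executorFor(queryText: string, selection: Selection): Executor {
  let exec = cache.get(queryText);
  if (exec === undefined) {
    exec = compile(selection);
    compilations += 1;
    cache.set(queryText, exec);
  }
  return exec;
}

const queryText = "{ name brand { name } }";
const selection: Selection = { name: true, brand: { name: true } };
const products: Json[] = [
  { name: "Example Sneaker", brand: { name: "Example Brand" } },
  { name: "Example Jacket", brand: null },
];

// Two executions, one compilation.
const results = products.map((p) => executorFor(queryText, selection)(p));
console.log(JSON.stringify(results), compilations);
```

The cost of planning the query is paid once per distinct query. For a frontend that sends the same handful of queries at high volume, that is the case worth optimizing.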

We started building the GraphQL solution at Zalando in the first half of 2018, and the service has been in production since the end of 2018. Over the last two years, the unified GraphQL schema has grown significantly into a dense graph spanning more than 12 domains. As of February 2021, it serves more than 80% of Web use cases and 50% of App use cases.

Advantages

With our GraphQL implementation running in production at Zalando for the last two years, we have addressed most of the concerns above and observed several advantages, including:

  • Improved developer efficiency: data is found and accessed in one place instead of locating and integrating with individual APIs.
  • Improved developer experience through GraphQL tools such as an explorer backed by live assortment data.
  • Faster deployments, so features ship sooner and product managers are happier.
  • Consistent customer experience across platforms, with a single data source for all frontends.
  • Reduced duplication of effort to develop the same feature across platforms.
  • Easier enforcement of governance and organisational best practices.
  • A "No Business Logic" principle for the GraphQL layer, which lets domain-specific backend APIs steer domain- or platform-specific (Web vs. App) content on their own.

Known concerns and challenges

Code reuse leading to bloated code base

Our approach has been to keep platform- and domain-specific logic out of the GraphQL layer. Instead, domain teams drive that logic through presentation-layer backend services. This keeps the GraphQL layer a business-logic-agnostic data aggregation layer: it serves frontend developers without accumulating reused domain logic, and it is easier to maintain operationally.

Figure: Presentation layer ensuring a business-logic-agnostic graph
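To illustrate the "No Business Logic" principle, here is a minimal sketch assuming two hypothetical presentation-layer backends, stubbed in memory: a product service, and a delivery service that owns the delivery-date calculation. The resolver in the GraphQL layer only fans out, forwards the platform and reshapes the results. Web and App therefore read the same delivery window from the same source, and there is no duplicated date logic in this layer that could drift apart (as in the 5-9 Feb versus 1-3 Feb example). All names and data are made up.

```typescript
type Platform = "web" | "app";
interface DeliveryWindow {
  earliest: string;
  latest: string;
}

// Stub of a product-domain backend (hypothetical); in practice an HTTP call.
const productNames = new Map<string, string>([["example-sku", "Example Sneaker"]]);
async function fetchProductName(sku: string): Promise<string | null> {
  return productNames.get(sku) ?? null;
}

// Stub of a delivery-domain backend (hypothetical). It owns the date calculation,
// including any platform-specific rules; this stub has none, so it ignores the platform.
const deliveryWindows = new Map<string, DeliveryWindow>([
  ["example-sku", { earliest: "1 Feb", latest: "3 Feb" }],
]);
async function fetchDeliveryWindow(sku: string, _platform: Platform): Promise<DeliveryWindow | null> {
  return deliveryWindows.get(sku) ?? null;
}

// Resolver in the GraphQL layer: fans out in parallel, passes the platform through,
// and reshapes the results. It never computes or adjusts delivery dates itself.
async function resolveProduct(sku: string, platform: Platform) {
  const [name, delivery] = await Promise.all([
    fetchProductName(sku),
    fetchDeliveryWindow(sku, platform),
  ]);
  return { sku, name, delivery };
}

// Web and App query the same graph, so they see the same delivery window.
Promise.all([resolveProduct("example-sku", "web"), resolveProduct("example-sku", "app")]).then(
  ([web, app]) => console.log(JSON.stringify(web) === JSON.stringify(app), web.delivery),
);
```

If the delivery team later adds an App-only rule, it changes only its own service. The resolver stays the same.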

Adoption and learning curve

Because GraphQL was a new technology for our teams, adopting it required an investment in learning. We addressed adoption with the following common mechanisms:

  1. One-stop-shop documentation: a single, structured documentation site with an embedded GraphQL editor, schema documentation, Voyager for schema exploration, and practice exercises that help new users adopt GraphQL.
  2. Support chat: like any platform team, we provide a support channel for questions from users and contributors of the GraphQL service.
  3. Training: because GraphQL was new at Zalando, we ran GraphQL adoption training in which more than 150 developers learned how to use GraphQL at Zalando. The training reached a large share of the developers planning to switch to GraphQL.
  4. Consultation: GraphQL schema design is always tricky, even for frontend developers who already use GraphQL. To keep the graph single, dense and unified, our team also consulted on every new domain integrated into the unified graph.

In 2020, these four measures resulted in the number of contributors to our monorepo growing from 50 to more than 150. Over the same period, the number of developers using GraphQL for feature development grew from 70 to 200.

God Component

A God component is a design smell in which a component is excessively large, in terms of either lines of code or number of classes. Our unified GraphQL service lives in a single monorepo, which makes it a potential architectural and operational risk. We address the architectural risk with Zalando's shared ownership mechanism, guided by a set of contribution principles. For the operational risk, we monitor the service and handle most issues with reliability patterns such as circuit breakers, timeouts and retries. We also introduced the Bulkhead pattern for more fault tolerance and isolation, deploying the application to serve traffic per platform (separate deployments for Web and mobile Apps).
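Circuit breakers, timeouts and retries are standard patterns, and the post does not say how Zalando implements them. The following is a generic, dependency-free TypeScript sketch of how the three can be composed around a call from the GraphQL service to a domain backend. The class and function names, the thresholds and the injectable clock are assumptions. The Bulkhead isolation described above comes from separate deployments rather than code, so it is not modeled here.

```typescript
// Generic resilience sketch (hypothetical names and thresholds).

// Rejects if `work` does not settle within `ms` milliseconds; clears the timer either way.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

type BreakerState = "closed" | "open" | "half-open";

// Opens after `failureThreshold` consecutive failures, fails fast while open,
// and lets trial calls through once `resetAfterMs` has elapsed.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetAfterMs = 5000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, (re)opens the circuit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}

// Retries a backend call up to `retries` extra times; each attempt is bounded by a
// timeout and gated by the breaker. Production code would also add backoff with jitter.
async function callWithResilience<T>(
  breaker: CircuitBreaker,
  call: () => Promise<T>,
  { retries = 2, timeoutMs = 500 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    if (!breaker.canRequest()) throw new Error("circuit open: failing fast");
    try {
      const result = await withTimeout(call(), timeoutMs);
      breaker.recordSuccess();
      return result;
    } catch (err) {
      breaker.recordFailure();
      lastError = err;
    }
  }
  throw lastError;
}
```

A resolver would wrap each backend call, for example `callWithResilience(productBreaker, () => fetchProduct(sku))`, where `fetchProduct` is hypothetical. Keeping one breaker per backend means a failing domain does not trip calls to healthy ones, which matters in a single service that aggregates many domains.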

Related work on Unified GraphQL

The unified graph is a known concept that many large organisations are adopting. Below are some notable examples of unified GraphQL graphs and related approaches:

  1. GitHub has a GraphQL implementation with a single graph of all its domains, including repositories, users, Marketplace, etc.

  2. Shopify has GraphQL implementations for its Storefront (customer-facing) and Admin (merchant-facing) APIs, each a unified graph that customers and partners use to build experiences.

  3. Airbnb has been working on a unified GraphQL schema, which they presented in a talk at GraphQL Summit 2019.

  4. Expedia moved from REST services to a central data graph using GraphQL. With REST endpoints, their developers were spending more time figuring out which service to call than developing features.

  5. Apollo Federation is Apollo's solution for providing a single data graph over multiple graphs across an organization. Federation connects multiple graphs via a library and a gateway. At Zalando, we instead have a single service that connects all domains in one schema. This has tradeoffs, which we address as described above, and in return we gain from a single graph in tooling, deployment and governance.

  6. Netflix also has its own version of one graph for its Netflix Studio ecosystem, and described the setup in a blog post series.

Conclusion and next steps

The Unified Backend-For-Frontend (UBFF) GraphQL service is not a silver bullet, but a tradeoff that has worked well for our frontend data fetching problems at Zalando. The next articles in this series will cover other aspects of how we use GraphQL at Zalando: observability, performance optimization, security, tooling, error handling and more. These allowed us to scale adoption of the service to more than 200 Web and App developers and to serve the use cases of more than 25 feature teams.


If you would like to work on similar challenges and help scale our approach to developing web and mobile clients, consider joining our client engineering teams.


