Optimize GraphQL Server with Lookaheads

GraphQL offers a way to optimize the data transferred between a client and a server. We can use the declarative nature of a GraphQL query to perform lookaheads. Lookaheads give us a way to optimize the data transferred between the GraphQL server and a backend data provider - like a database or another server that can return partial responses.

Boopathi Rajaa Nedunchezhiyan

Senior Software Engineer

Posted on Mar 18, 2021

In our first post, How we use GraphQL at Zalando, we briefly covered performance optimizations using GraphQL-JIT, which allowed us to scale our implementation without performance degradation. In this post, we share another optimization we use - Lookaheads.

Lookaheads

Same Model; Different Views

In our GraphQL service, we do not have resolvers for every single field in the schema. Instead, we have certain groups of fields resolved together as a single request to a backend service that provides the data. For example, let's take a look at the product resolver,

resolvers = {
  Query: {
    product(_, { id }) {
      return ProductBackend.getProduct(id);
    },
  },
};

This resolver is responsible for getting multiple properties of the Product - name, price, stock, images, material, sizes, brand, color, other colors, and further details. The same Product type in the schema can render as a Product Card in a grid or as the entire Product Page. A Product Card requires far less data than the complete product details of a Product Page.

(Figure: different views of the same model)

Every time the product resolver is called, the GraphQL service requests the entire response from the product backend. Though GraphQL allows clients to specify exactly what data they need, that benefit applies only to the client-server communication. The data transfer between the GraphQL server and the backend server remains unoptimized.

Partial Responses

Most of the backend services at Zalando support partial responses. In the request, one can specify a list of fields; the backend treats this list as a filter and returns only those fields, trimming everything that was not requested. It is similar to what GraphQL offers us, and the request looks somewhat like this -

GET /product?id=product-id&fields=name,stock,price

Here, the fields query parameter declares the required response fields. The backend can use it to compute only those fields, and can likewise pass it further down the pipeline to another service or database. The response for the above request would look like the following -

{
  "name": "Fancy T-Shirt",
  "stock": "AVAILABLE",
  "price": "EUR 35.50"
}

Partial responses reduce the amount of data over the wire and give a good performance boost. A GraphQL query is precisely the same idea - it provides a well-defined language for the fields parameter in the above request.

Lookahead

Let's leverage these partial responses and use them in the GraphQL server. When resolving product, we must know which fields are requested within it - in other words, we need to look ahead in the query to get the sub-fields of product.

query {
  product(id: "foo") {
    name
    price
    stock
  }
}

A thing to note - name, stock, and price do not have explicitly declared resolvers. When resolving product, how can we know what its sub-selections are? Here, navigating the query AST (Abstract Syntax Tree) helps. During execution, the resolver function receives the AST of the current field. The structure of the AST depends on the language and implementation. For the GraphQL-JS and GraphQL-JIT executors, it is available as the last parameter of the resolver function - the resolve info.

resolvers = {
  Query: {
    product(_, { id }, context, info) {
      const fields = getFields(info);
      return ProductBackend.getProduct(id, fields);
    },
  },
};

We use the query AST in the resolve info to compute the list of fields under product, pass this list to the product backend (which supports partial responses), and return the backend response as the resolved result.

Field Nodes

The resolve info is useful for a lot of optimizations. Here, we are interested in fieldNodes. It is an array of objects, each representing the same field - in this case, product. Why is it an array? A single field may appear in more than one place in a query - through fragments, inline fragments, and so on. For simplicity, we will not consider fragments and aliasing in this post.
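For example (leaving fragments aside), when the same field appears twice under the same response key, the executor merges the selections, and each occurrence contributes one entry to fieldNodes -

query {
  product(id: "foo") {
    name
  }
  product(id: "foo") {
    price
  }
}

Both occurrences resolve to the single response key product, so info.fieldNodes holds two field nodes whose selection sets together cover name and price.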

The entire query is a tree of field nodes where the children at each level are available as selection sets.

Each field node has a selection set - a list of sub-field nodes. Here, the selection set of product will be the field nodes of name, stock, and price. So the getFields implementation (without considering fragments and aliasing) looks like the following -

function getFields(info) {
  // TODO: handle all field nodes in other fragments
  return info.fieldNodes[0].selectionSet.selections.map(
    (selection) =>
      // TODO: handle fragments
      selection.name.value
  );
}

When we pass the product resolver's info, the getFields function returns ["name", "stock", "price"]. We can take this list and pass it to the backend as the fields query parameter.

For simple use-cases like this, where the backend data structure and the GraphQL schema match, it's possible to use the GraphQL fields directly as backend fields. When they differ, we need to map the schema fields to backend fields for the request, and map the backend fields back to schema fields for the response.

Different schemas

If the backend fields are different from the GraphQL schema fields, then there exists a mapping from schema fields to backend fields. A simple mapping may be just a difference in field names - for example, name in the schema might be title in the backend. The mapping can get more complex when a single schema field derives from multiple backend fields - for example, price in the schema might be a concatenation of currency and amount from the backend. It gets interesting when we have nested structures - for example, price in the schema might be a concatenation of price.currency and price.amount.
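To make the nested case concrete, a backend product for the examples in this post could look like this (a hypothetical shape, not an actual backend response) -

{
  "title": "Fancy T-Shirt",
  "price": {
    "currency": "EUR",
    "amount": "35.50"
  },
  "stock_availability": "AVAILABLE"
}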

The response is partial

Another aspect of this mapping is that it's not enough to think about it in one direction - from schema fields to backend fields. That only covers the request from the GraphQL server to the backend. The response the backend sends must also be transformed back to match the schema, and that isn't free when the field mapping has such complications.

If we have a single transform function that converts the backend response to match the schema, we have to remember that it receives a partial response, not the complete one -

function backendProductToSchemaProduct(backendProduct) {
  return {
    name: backendProduct.title,
    // we have a problem here - price is absent in a partial response
    price: `${backendProduct.price.currency} ${backendProduct.price.amount}`,
    stock: backendProduct.stock_availability,
  };
}

In the above implementation, when the query is { product(id) { name } }, the transformer will run as if the complete response were available. Since the backend responded with partial data (only title is present), the access to the nested property throws an error - Cannot read property 'currency' of undefined. We could add a null check at every place, but the code becomes unmaintainable. So we need a way to model the mapping both ways -

  1. Map schema fields to backend fields during the request to the backend
  2. Map backend fields to schema fields with the response from the backend

Dependency Maps

The mapping we have been talking about is a dependency map. Every schema field depends on one or many (possibly nested) fields in the backend. A simple way to represent this is an object whose keys are schema fields and whose values are lists of object paths.

const dependencyMap = {
  name: ["title"],
  price: ["price.currency", "price.amount"],
  stock: ["stock_availability"],
};


From this dependency map, we can create our request to the backend. Let's say the backend takes a query parameter fields in the following form - a comma-separated list of object path strings. Depending on the implementation, there can be a wide variety of formats for this. Here, we will take a simple one.

function getBackendFields(schemaFields, dependencyMap) {
  // Set helps in deduping
  const backendFields = new Set(
    schemaFields
      .map((field) => dependencyMap[field])
      .reduce((acc, fields) => [...acc, ...fields], [])
  );
  // a Set has no join method, so spread it into an array first
  return [...backendFields].join(",");
}

For the schema fields name and price, the computed backend fields would be the string title,price.currency,price.amount, and we can construct the request to the backend -

GET /product?id=foo&fields=title,price.currency,price.amount

Transformation Maps

After the request, the backend returns a partial response instead of the complete one. We also saw above that a single function transforming the entire backend response to schema fields is not enough. So we use a transformation map - a map from schema fields to transformation logic. Like the dependency map, the keys are schema fields, but the values are transform functions that use only the specific backend fields that the schema field depends on.

const transformerMap = {
  name: (resp) => resp.title,
  price: (resp) => `${resp.price.currency} ${resp.price.amount}`,
  stock: (resp) => resp.stock_availability,
};

As you can see, each value is a function that reads only the backend fields listed for that key in the dependency map. To construct the result object from the backend's partial response, we take the same computed sub-fields (from the getFields function) and look them up in the transformer map. For example -

function getSchemaResponse(backendResponse, transformerMap, schemaFields) {
  const schemaResponse = {};
  for (const field of schemaFields) {
    schemaResponse[field] = transformerMap[field](backendResponse);
  }
  return schemaResponse;
}
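For instance, with the partial backend response sketched earlier, resolving the query { product(id: "foo") { name price } } would run -

const schemaFields = ["name", "price"];
const backendResponse = {
  title: "Fancy T-Shirt",
  price: { currency: "EUR", amount: "35.50" },
};

getSchemaResponse(backendResponse, transformerMap, schemaFields);
// → { name: "Fancy T-Shirt", price: "EUR 35.50" }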

So far,

Let's recap the flow we have built so far (a short code sketch follows the list) -

  1. getFields: compute sub-fields by looking ahead in AST
  2. getBackendFields: compute backend fields from sub-fields and dependency map
  3. request the backend with the computed backend fields
  4. getSchemaResponse: compute schema response from partial backend response, sub-fields, and the transformer map
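In code, for the query { product(id: "foo") { name price } }, the four steps line up like this (a sketch using the helpers above; backendResponse stands for the parsed reply of the backend) -

// 1. compute sub-fields by looking ahead in the AST
const fields = getFields(info); // ["name", "price"]

// 2. translate schema fields to backend fields
const backendFields = getBackendFields(fields, dependencyMap);
// "title,price.currency,price.amount"

// 3. request the backend with the computed backend fields
// GET /product?id=foo&fields=title,price.currency,price.amount

// 4. transform the partial backend response back to schema fields
const schemaProduct = getSchemaResponse(backendResponse, transformerMap, fields);
// { name: "Fancy T-Shirt", price: "EUR 35.50" }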

Batching

At Zalando, like partial responses, most of our backends support batching multiple requests into a single one. Instead of getting a single resource by its id, most backends can get several resources by a list of ids. For example,

GET /products?ids=a,b,c&fields=name

will return the response,

[{ "name": "a" }, { "name": "b" }, { "name": "c" }]

We should take advantage of such features. One of the popular libraries that helps with batching is DataLoader by Facebook.

We provide the dataloader with a batch function - an implementation that takes an array of inputs and returns an array of outputs/responses in the same order. The dataloader takes care of combining requests from multiple places in the code into optimal batches. You can read more about it in the DataLoader documentation.

Dataloader for Product resolver

When a Product appears in multiple parts of the same GraphQL query, each occurrence creates a separate request to the backend. For example, let's consider this simple GraphQL query -

query {
  foo: product(id: "foo") {
    ...productCardFields
  }
  bar: product(id: "bar") {
    ...productCardFields
  }
}

The products foo and bar are batched together into a single query using aliasing. If our product resolver naively calls the ProductBackend, we end up with two separate requests. Our goal is to make a single request. We can implement this with a dataloader -

async function getProductsByIds(ids) {
  const response = await fetch(`/products?ids=${ids.join(",")}`);
  // fetch resolves with a Response; parse it into the products array,
  // one product per id, in the same order as the input ids
  return response.json();
}

const productLoader = new DataLoader(getProductsByIds);

We can use this productLoader in our product resolver -

resolvers.Query.product = async (_, { id }) => {
  const product = await productLoader.load(id);
  return product;
};

The DataLoader takes care of the magic of combining multiple calls to the load method into a single call to our batch function - getProductsByIds.
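For the aliased query above, a rough picture of what the executor ends up doing -

// Both load calls happen while resolving the same query, so DataLoader
// collects them in one tick and issues a single batch call:
// getProductsByIds(["foo", "bar"])
const [foo, bar] = await Promise.all([
  productLoader.load("foo"),
  productLoader.load("bar"),
]);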

Complexities

The DataLoader deduplicates inputs, optionally caches the outputs, and provides ways to customize these behaviours. In the productLoader defined above, our input is the product id - a string. When we introduce partial responses, the backend expects more than just the id - it also needs the fields parameter used to select the response fields. So our input to the loader is no longer just a string - let's say it's an object with the keys id and fields. The dataloader implementation now becomes -

async function getProductsByIds(inputs) {
  const ids = inputs.map((input) => input.id);
  //
  // We have a problem here
  //                    v
  const fields = inputs[0].fields;
  const response = await fetch(
    `/products?ids=${ids.join(",")}&fields=${fields}`
  );
  return response.json();
}

Here, in the above code block, the problem is highlighted with a comment - each productLoader.load call can have a different set of fields. What is our strategy for merging them? And why do we need to merge at all?

Let's go back to an example and understand why we should handle this -

query {
  foo: product(id: "foo") {
    name
  }
  bar: product(id: "bar") {
    price
  }
}

The product foo requires name, and the product bar requires price. If we recall how these get translated to backend fields using the dependency map, we end up with the following calls -

productLoader.load({
  id: "foo",
  fields: ["title"],
});

productLoader.load({
  id: "bar",
  fields: ["price.currency", "price.amount"],
});

If these two calls get into a single batch, we need to merge the fields such that both transformations from backend fields to schema fields keep working. Unfortunately, in most cases it's impossible to ask the backend for different fields for different ids. If your backend supports that, you probably do not need merging. But for our use-case and probably many others, let's continue assuming merging is necessary.

Merging fields


In the above example, the correct request to the backend would be -

GET /products
  ? ids = foo , bar
  & fields = title , price.currency , price.amount

The merge strategy is quite simple; it's a union of all the fields. Structurally we need the following transformation - [ { id, fields } ] to { ids, mergedFields }. The following implementation merges the inputs -

function mergeInputs(inputs) {
  const ids = [];
  const fields = new Set();
  for (const input of inputs) {
    ids.push(input.id);
    for (const field of input.fields) {
      fields.add(field);
    }
  }

  return {
    ids,
    mergedFields: [...fields].join(","),
  };
}
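Running it on the two load inputs from the example above gives -

mergeInputs([
  { id: "foo", fields: ["title"] },
  { id: "bar", fields: ["price.currency", "price.amount"] },
]);
// → { ids: ["foo", "bar"],
//     mergedFields: "title,price.currency,price.amount" }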

Putting it all together

Combining all the little things we handled so far, the flow for the product field resolution would be -

  1. getFields: compute sub-fields by looking ahead in AST
  2. getBackendFields: compute the list of backend fields from sub-fields and dependency map
  3. productLoader.load({ id, fields: backendFields }): schedule the product fetch in the dataloader
  4. mergeInputs: merge the different inputs to the dataloader into a list of ids and a union of all backendFields from all inputs
  5. Send the batched input as a request to the backend and get the partial response
  6. getSchemaResponse: compute the schema response from the partial backend response, the sub-fields computed in the first step, and the transformer map

In code, with all the helpers together -

import DataLoader from "dataloader";

const productLoader = new DataLoader(getBackendProducts);

const resolvers = {
  Query: {
    async product(_, { id }, __, info) {
      const fields = getFields(info);
      const backendFields = getBackendFields(fields, dependencyMap);
      const backendResponse = await productLoader.load({
        id,
        fields: backendFields,
      });
      const schemaResponse = getSchemaResponse(
        backendResponse,
        transformerMap,
        fields
      );
      return schemaResponse;
    },
  },
};

const dependencyMap = {
  name: ["title"],
  price: ["price.currency", "price.amount"],
  stock: ["stock_availability"],
};

const transformerMap = {
  name: (resp) => resp.title,
  price: (resp) => `${resp.price.currency} ${resp.price.amount}`,
  stock: (resp) => resp.stock_availability,
};

function getFields(info) {
  // TODO: handle all field nodes in other fragments
  return info.fieldNodes[0].selectionSet.selections.map(
    // TODO: handle fragments
    (selection) => selection.name.value
  );
}

function getBackendFields(schemaFields, dependencyMap) {
  // Set helps in deduping
  const backendFields = new Set(
    schemaFields
      .map((field) => dependencyMap[field])
      .reduce((acc, fields) => [...acc, ...fields], [])
  );
  // Unlike the standalone version earlier, return the Set itself -
  // mergeInputs joins the merged fields into a string later.
  return backendFields;
}

async function getBackendProducts(inputs) {
  const { ids, mergedFields } = mergeInputs(inputs);
  const response = await fetch(
    `/products?ids=${ids.join(",")}&fields=${mergedFields}`
  );
  // One product per input, in the same order as the inputs
  return response.json();
}

function mergeInputs(inputs) {
  const ids = [];
  const fields = new Set();
  for (const input of inputs) {
    ids.push(input.id);
    for (const field of input.fields) {
      fields.add(field);
    }
  }

  return {
    ids,
    mergedFields: [...fields].join(","),
  };
}

function getSchemaResponse(backendResponse, transformerMap, schemaFields) {
  const schemaResponse = {};
  for (const field of schemaFields) {
    schemaResponse[field] = transformerMap[field](backendResponse);
  }
  return schemaResponse;
}
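One caveat worth noting: by default, DataLoader dedupes and caches keys by identity, and every load call here creates a fresh { id, fields } object, so two identical lookups will not be deduplicated. DataLoader's cacheKeyFn option exists for exactly this; a minimal sketch, assuming fields is the Set returned by getBackendFields -

const productLoader = new DataLoader(getBackendProducts, {
  // Derive a stable string key from the input object so that two loads
  // of the same id with the same fields collapse into one batch entry.
  cacheKeyFn: ({ id, fields }) => `${id}:${[...fields].sort().join(",")}`,
});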

Conclusion

All of the code, patterns, and nuances we have seen may differ across applications and languages. The critical aspect is to leverage the declarative nature of GraphQL and optimize at all points throughout the lifecycle of a request for a better user experience.

Field filtering using dependency maps and transformer maps lets us handle the complexities of optimizing GraphQL servers for performance. Though this looks like a lot of work, at runtime it outperforms the unoptimized handling of huge backend responses - saving JSON parsing cost, bytes over the wire, and the backend's response construction time.

You also have to consider whether such optimizations are worth it for every backend. As the GraphQL schema grows, these solutions scale well; at Zalando's scale, this approach has proved better than transferring a giant unoptimized blob of data.


If you would like to work on similar challenges, consider joining our engineering teams.

