
Sun May 05 2024

Production Ready GraphQL

A guide to building a production GraphQL server that scales

There are many articles written about GraphQL, and many touch on the basics. But there are pitfalls to watch out for when building large production applications, the kind that large teams collaborate on.

In this guide I'll take you through best practices drawn from years of experience building GraphQL services that scale.

We'll cover project structure, schema design, performance, testing and monitoring.

Let's start with project structure and learn how to design a system that scales well.

Project Structure

The key to scale is grouping related code by feature, not by responsibility or relationship. I first saw this pattern when working with Redux; it's sometimes referred to as the feature folder approach or simply the ducks pattern.

co-locating logic for a given feature in one place typically makes it easier to maintain that code

https://redux.js.org/style-guide/style-guide#structure-files-as-feature-folders-with-single-file-logic

This pattern also works incredibly well for GraphQL. Imagine we have a schema with many types, such as User, Post and Comment.

type User {
  id: ID!
  name: String!
  posts: [Post!]!
  comments: [Comment!]!
}

type Post {
  id: ID!
  title: String!
  author: User!
  comments: [Comment!]!
}

type Comment {
  id: ID!
  text: String!
  user: User!
  post: Post!
}

The best approach here is to keep these entities separate. The folder structure for this schema would look as follows:

src/
  /entities
    /User
      /schema.graphql
      /resolvers.ts
      /index.ts
    /Post
      /schema.graphql
      /resolvers.ts
      /index.ts
    /Comment
      /schema.graphql
      /resolvers.ts
      /index.ts

Note: All code within each entity folder should be considered private, with only the index responsible for exposing anything publicly.

Let's take a look at how an entity is structured with the Post type.

schema.graphql

The schema only defines types that are directly a part of the Post type. The Comment and User types are defined within their own entity folders.

# src/entities/Post/schema.graphql

type Post {
  id: ID!
  title: String!
  author: User!
  comments: [Comment!]!
}

type Query {
  post(id: ID!): Post!
  posts: [Post!]!
}

type Mutation {
  createPost(title: String!): Post!
}

resolvers.ts

The resolvers define only the fields for the Post entity and map directly to the schema. We also define nested resolvers for the direct relationships of Post, in this case Post.author and Post.comments. But that's it. Any other relationships, for example the comments of a user, are defined in the User entity.

// src/entities/Post/resolvers.ts

const resolvers = {
  Post: {
    author: () => {},
    comments: () => {},
  },
  Query: {
    post: () => {},
    posts: () => {},
  },
  Mutation: {
    createPost: () => {},
  },
}

export default resolvers
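
To complete the entity, the index file re-exports only what the rest of the application needs: the SDL and the resolvers. A minimal sketch, assuming the SDL is read from disk with readFileSync (any schema-loading utility works just as well):

// src/entities/Post/index.ts

import { readFileSync } from 'fs'
import { join } from 'path'
import resolvers from './resolvers'

// Read the SDL as a plain string so it can be merged with the other entities later
const schema = readFileSync(join(__dirname, 'schema.graphql'), 'utf8')

export { schema, resolvers }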

How does this structure help?

  1. Structuring our code this way will completely decouple features, meaning teams can collaborate on a large codebase without causing frustrating code conflicts.

  2. The schema and resolvers will almost always change together, so it's much easier to work on a particular feature when everything is in one place.

  3. By defining nested resolvers for only the direct relationships, it doesn't matter how complex our graph becomes. Each entity remains simple and it is easy to find a particular piece of code.

To merge each of these entities into a single schema we can use @graphql-tools.

// src/entities/index.ts

import { mergeTypeDefs, mergeResolvers } from '@graphql-tools/merge'
import {
  resolvers as postResolvers,
  schema as postSchema,
} from './Post'
import {
  resolvers as commentResolvers,
  schema as commentSchema,
} from './Comment'
import {
  resolvers as userResolvers,
  schema as userSchema,
} from './User'

const typeDefs = mergeTypeDefs([postSchema, commentSchema, userSchema])
const resolvers = mergeResolvers([
  postResolvers,
  commentResolvers,
  userResolvers,
])

export { typeDefs, resolvers }
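
The merged typeDefs and resolvers can then be handed straight to the server of your choice. A minimal sketch using Apollo Server (the file name and standalone listen call are assumptions):

// src/server.ts

import { ApolloServer } from 'apollo-server'
import { typeDefs, resolvers } from './entities'

const server = new ApolloServer({ typeDefs, resolvers })

server.listen().then(({ url }) => {
  console.log(`Server ready at ${url}`)
})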

Next we'll look further into schema design.

Schema Design

The most important takeaway in defining a solid schema is consistency.

Consider how your schema will change over time and choose a pattern that allows new fields to be introduced without breaking changes.

Unlike a typical REST API, it is not common to version a GraphQL API. Instead, GraphQL APIs are designed to be flexible and evolve over time.

Queries and Mutations

Looking at the schema for the User entity, let's introduce a few operations such as:

  • getUser - fetch a single user
  • getUsers - fetch a list of users
  • createUser - create a new user
  • updateUser - update an existing user

# Define the User type and include any direct relationships
# as outlined in the project structure section
type User {
  id: ID!
  name: String!
  posts: [Post!]!
  comments: [Comment!]!
}

# Define a response type for the getUsers query. Here we return
# the list of data within a records field to avoid breaking
# changes in case of pagination
type GetUsersResponse {
  records: [User!]!
}

# All queries and mutations accept a single input
# as this avoids needing to update front-end queries when
# the input grows or changes.
input WhereUserInput {
  id: ID
}

input WhereUsersInput {
  name: String
  age: Int
}

input CreateUserInput {
  name: String!
  age: Int!
}

input UpdateUserInput {
  # Identifies which user to update
  id: ID!
  name: String
  age: Int
}

type Query {
  getUser(where: WhereUserInput!): User!
  getUsers(where: WhereUsersInput): GetUsersResponse!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(input: UpdateUserInput!): User!
}

Single Input

You'll notice all mutations take a single input. This makes it far easier to write GraphQL queries and generate types because the base variables never change.

Consumers of the API do not need to update client code every time you introduce new input fields.

We can write a query with a single input and never need to change it when introducing new fields.

mutation createUser($input: CreateUserInput!) {
  createUser(input: $input) {
    id
    name
  }
}

// createUserMutation is the document defined above
const res = await graphql.request(createUserMutation, {
  input: {
    name: 'John',
    age: 30,
  },
})

Lists and Pagination

Lists of data are returned within a records field rather than at the root, like getUsers(where: WhereUsersInput): [User!]!.

Doing this ahead of time means you can introduce other functionality such as metadata without causing breaking changes.

Here we could introduce pagination to getUsers without causing breaking changes.

type PaginationMeta {
  total: Int!
  perPage: Int!
  currentPage: Int!
  totalPages: Int!
}

input PaginationInput {
  page: Int
  perPage: Int
}

type GetUsersResponse {
  records: [User!]!
  meta: PaginationMeta!
}

type Query {
  getUsers(
    where: WhereUsersInput
    pagination: PaginationInput
  ): GetUsersResponse!
}

Relationships

When defining related data, include the GraphQL types only. For example, include Post.user but not Post.userId. The id already exists at Post.user.id.

This ensures your schema does not include duplicated information or expose internals about how you choose to define relationships. The fact that the user could be related to a post through a userId on a posts table is irrelevant to the consumer of your API.

type Post {
  id: ID!
  title: String
  user: User!
  # This is an anti-pattern, you are exposing how the backend
  # defines the relationship and duplicating data which
  # is already available on the User type.
  userId: String!
}

type User {
  id: ID!
  name: String!
  age: Int!
  posts: [Post!]!
}

Tooling

There are opinionated tools available which can help you define a schema more quickly, although you may sacrifice some flexibility. I prefer to define schemas manually.

GraphQL Relay - Allows the easy creation of Relay-compliant servers. https://www.npmjs.com/package/graphql-relay

Hasura - Generate an instant GraphQL API in-front of your database. https://hasura.io/

My main recommendation would be to write a basic boilerplate for a new type and use that when adding new types to your schema. You could even automate this with a basic generator.
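
As a rough illustration, such a generator can be a small Node script that stamps out the entity folder from your boilerplate. The script below is a hypothetical sketch, not part of any existing tool:

// scripts/create-entity.ts

import { mkdirSync, writeFileSync } from 'fs'
import { join } from 'path'

const name = process.argv[2]
if (!name) throw new Error('Usage: ts-node scripts/create-entity.ts <TypeName>')

const dir = join('src', 'entities', name)
mkdirSync(dir, { recursive: true })

// Minimal boilerplate following the conventions in this guide
writeFileSync(join(dir, 'schema.graphql'), `type ${name} {\n  id: ID!\n}\n`)
writeFileSync(
  join(dir, 'resolvers.ts'),
  `const resolvers = {\n  ${name}: {},\n}\n\nexport default resolvers\n`
)
writeFileSync(join(dir, 'index.ts'), `// Re-export this entity's schema and resolvers here\n`)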

Performance

We started to implement a nested resolver pattern earlier. But why is this important?

The alternative to this is to return everything you need from the root resolver. But this would quickly cause performance issues such as data over-fetching.

The graph can also be connected in increasingly complex ways, so it quickly becomes impractical to return everything you need from a single root resolver.

Imagine a query that looks like this.

query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

Here we would need to find all posts with all their comments and for each comment return the user who created it.

const posts = (root, args, context) => {
  return context.db.posts.find({
    include: {
      comments: {
        include: {
          user: true,
        },
      },
    },
  })
}

This is not scalable. As the schema becomes more complex, the posts resolver grows more complex with it.

The posts resolver would also need to know how the Comment and User types are related. This tightly couples the code and makes it less resilient to change.

By using nested resolvers we can throw any complex query at our service and see it gracefully handled with no extra effort on our part. Each entity remains isolated, implementing only logic directly related to itself.

The best approach here is nested resolvers which only resolve relationships one level deep.

// src/entities/Post/resolvers.ts
const resolvers = {
  Post: {
    comments: (post, args, context) => {
      return context.db.comments.find({
        where: {
          postId: post.id,
        },
      })
    },
  },
}

// src/entities/Comment/resolvers.ts
const resolvers = {
  Comment: {
    user: (comment, args, context) => {
      return context.db.users.findOne(comment.userId)
    },
  },
}

With this, it doesn't matter at which point in a query a type is requested. It will always resolve.

# Request all posts with all their comments and
# for each comment return the user who created it.
query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

# Request all comments and the user who created each one.
query {
  comments {
    id
    text
    user {
      id
      name
    }
  }
}

N+1 Problem

Using nested resolvers can create other problems.

The n+1 problem occurs when you loop through the results of a query and perform one additional query per result, resulting in n queries plus the original (n+1). This is a common problem with ORMs, particularly in combination with GraphQL, because it is not always immediately obvious that your code is generating inefficient queries.

Imagine we want to make the following query again:

query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

For each resolver, this would create a separate call to our database.

Select all posts

SELECT id, title FROM posts
# Return three posts with ids 1, 2, 3

For each post, run a separate query for its comments

SELECT id, text, userId FROM comments WHERE postId = 1;
SELECT id, text, userId FROM comments WHERE postId = 2;
SELECT id, text, userId FROM comments WHERE postId = 3;

For each comment, run a separate call for the user. This could be the same user across multiple comments.

SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 2;
SELECT id, name FROM users WHERE id = 2;
SELECT id, name FROM users WHERE id = 2;

In this simple example a single GraphQL query resulted in 10 database queries: 1 for the posts, 3 for the comments and 6 for the users.

The number of round trips to the database will grow rapidly as the schema and data grow. But this can be avoided using dataloaders. ORMs such as Prisma already implement a dataloader-style pattern through query batching.

A dataloader allows you to batch multiple queries together and only perform one query per batch.

Taking a look at our user resolver we could instead use a dataloader.

const resolvers = {
  Comment: {
    user: (comment, args, context) => {
      // Direct call to database (Inefficient)
      // return context.db.users.findOne(comment.userId);

      // Using a dataloader (Efficient)
      return userDataloader.load(comment.userId)
    },
  },
}

The implementation of this dataloader would look something like:

import DataLoader from 'dataloader'

const batchUserFunction = async (ids) => {
  const users = await db.users.findAll({
    whereIn: ids,
  })
  // DataLoader expects results in the same order as the requested ids
  return ids.map((id) => users.find((user) => user.id === id))
}

export default new DataLoader(batchUserFunction)

This would translate to a single database query for all users. The dataloader deduplicates the ids, and because the batch function returns results in the same order as the ids, each call receives the correct user.

SELECT id, name FROM users WHERE id in (1, 2);

Read more about dataloaders at https://github.com/graphql/dataloader.

With a combination of nested resolvers and dataloaders we can avoid data over-fetching and ensure we don't make too many round trips to the underlying data source.

Testing

The best way to test a GraphQL server is to send it GraphQL requests. Sounds obvious, right? This is best suited to integration testing because we'll also be testing the schema and nested resolvers with each request.

Calling the resolvers directly, as you would in a typical unit test, is not desirable because you circumvent the relationship between the schema and resolvers and fail to execute any nested resolvers to build out the final response.

❌ Avoid tests like this

const res = resolvers.Query.getUser(null, { input: { id: 1 } })

expect(res).toEqual({ id: 1, name: 'John' })

✅ Write tests like this

const res = await query({
  query: `
    query getUser($input: GetUserInput!) {
      getUser(input: $input) {
        id
        name
      }
    }
  `,
  variables: { input: { id: 1 } },
})

expect(res.data.getUser).toEqual({ id: 1, name: 'John' })
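
For reference, the query helper used in these tests can come from a test client wired to the real schema. A sketch assuming apollo-server-testing and the merged entities from earlier (the file name and context stub are assumptions):

// test-utils.ts

import { ApolloServer } from 'apollo-server-micro'
import { createTestClient } from 'apollo-server-testing'
import { typeDefs, resolvers } from './src/entities'

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Provide a context with a test database here, e.g.
  // context: () => ({ db: testDb }),
})

// query and mutate execute operations against the full schema,
// so nested resolvers run exactly as they would in production
export const { query, mutate } = createTestClient(server)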

Try to limit the data you return to direct relationships only. For example, the following query should be broken into two tests.

query {
  getUser(input: { ... }) {
    id
    name
    posts {
      id
      title
      comments {
        id
        text
      }
    }
  }
}

The first test ensures the root resolver for getUser works, along with the direct nested resolver on the User type, User.posts.

// src/entities/User/resolvers.integration.test.ts

const res = await query({
  query: `
    query getUser($input: GetUserInput!) {
      getUser(input: $input) {
        id
        name
        posts {
          id
          title
        }
      }
    }
  `,
  variables: { input: { id: 1 } },
})

expect(res.data.getUser).toMatchObject({ id: 1, name: 'John' })
expect(res.data.getUser.posts).toEqual([{ id: 1, title: 'Hello World' }])

The second test validates that the nested resolver on the Post type, Post.comments, works as expected.

// src/entities/Post/resolvers.integration.test.ts

const res = await query({
  query: `
    query getPost($input: GetPostInput!) {
      getPost(input: $input) {
        id
        title
        comments {
          id
          text
        }
      }
    }
  `,
  variables: { input: { id: 1 } },
})

expect(res.data.getPost).toMatchObject({ id: 1, title: 'Hello World' })
expect(res.data.getPost.comments).toEqual([{ id: 1, text: 'Great post!' }])

This may not be possible if you don't have a getPost query, in which case a good rule to follow is: always write the shallowest possible query that can access a piece of the graph.

Monitoring

Any production server needs a solid monitoring solution, and with GraphQL that's no different. However, there are a few more things to consider.

When we send a request to our server, we're potentially calling any number of resolvers to build the final payload. So it's important to have a breakdown of where time is spent.

We want to do this without needing to explicitly add logs to each resolver, because that's just another developer task that can easily be forgotten.

Services such as Sentry support tracing through transactions, which allow us to build a record of the entire request before sending data to our monitoring service.

We can use the context function to achieve this, as it's executed on every request.

import * as Sentry from '@sentry/node'
// Importing @sentry/tracing is required for tracing to be enabled
import '@sentry/tracing'
import { Transaction } from '@sentry/types'

export interface Context {
  // ... other context fields for your context
  transaction: Transaction
}

export async function createContext(): Promise<Context> {
  // ... create other context fields
  const transaction = Sentry.startTransaction({
    op: 'gql',
    name: 'GraphQLTransaction', // this will be the default name, unless the gql query has a name
  })
  return { transaction }
}

Then, using an Apollo Server plugin (this is possible with other GraphQL servers too), we can capture each resolver.

import { ApolloServerPlugin } from "apollo-server-plugin-base"
import { Context } from "./context"

const plugin: ApolloServerPlugin<Context> = {
  requestDidStart({ request, context }) {
    if (!!request.operationName) { // set the transaction Name if we have named queries
      context.transaction.setName(request.operationName!)
    }
    return {
      willSendResponse({ context }) { // hook for transaction finished
        context.transaction.finish()
      },
      executionDidStart() {
        return {
          willResolveField({ context, info }) { // hook for each new resolver
            const span = context.transaction.startChild({
              op: "resolver",
              description: `${info.parentType.name}.${info.fieldName}`,
            })
            return () => { // this will execute once the resolver is finished
              span.finish()
            }
          },
        }
      },
    }
  },
}

export default plugin

Our final server initialization would then look as follows:

import { ApolloServer } from 'apollo-server-micro'
import { createContext } from './context'
import SentryPlugin from './sentry-plugin'

const apolloServer = new ApolloServer({
  // ... your ApolloServer options
  // Create context function
  context: () => createContext(),
  // Add our sentry plugin
  plugins: [SentryPlugin],
})

When this data is pushed to Sentry you will see a useful breakdown of the time (in ms) spent in each resolver executed as part of the entire query.

Tracing in Sentry

That covers structure, schema design, performance, testing and monitoring. These are the approaches I have personally taken to build large GraphQL applications.

Thanks for reading.



Created by Warren Day