---
title: Normalized Caching
order: 1
---

# Normalized Caching

In GraphQL, as its name suggests, we create schemas that express the relational nature of our data. When we query such a schema we walk a graph that starts at the root `Query` type and traverses relational types. Rather than querying for normalized data, in GraphQL our queries request a specific shape of denormalized data, a view into our relational data that can be re-normalized automatically.

As the GraphQL API walks our query documents it may read from a relational database, and _entities_ and scalar values are copied into a JSON document that matches our query document. The type information of our entities isn't lost, however. A query document may still ask the GraphQL API which entity it's dealing with using the `__typename` field, which dynamically introspects an entity's type. This means that GraphQL clients can automatically re-normalize data as results come back from the API by using the `__typename` field and keyable fields like an `id` or `_id` field, which are already common conventions in GraphQL schemas. In other words, normalized caches can build up a relational database of tables in-memory for our application.

For our apps, normalized caches enable more sophisticated use-cases, where the results of different API requests automatically update the cached data that other parts of the app are displaying. Normalized caches can essentially keep the UI of our applications up-to-date when relational data is shared across multiple queries, mutations, or subscriptions.

## Normalizing Relational Data

As previously mentioned, a GraphQL schema creates a tree of types where our application's data always starts from the `Query` root type and is modified by data that comes in from selections on `Mutation` or `Subscription`. All data that we query from the `Query` type will contain relations between "entities", JSON objects that are hierarchical.

A normalized cache seeks to turn this denormalized JSON blob back into a relational data structure, which stores all entities by a key that can be looked up directly. Since GraphQL documents give the API a strict specification of how it traverses a schema, the JSON data that the cache receives from the API will always match the GraphQL query document that has been used to query this data. A common misconception is that normalized caches in GraphQL somehow store data by the query document; however, the only thing a normalized cache cares about is that it can use our GraphQL query documents to walk the structure of the JSON data it received from the API.

```graphql
{
  __typename
  todo(id: 1) {
    __typename
    id
    title
    author {
      __typename
      id
      name
    }
  }
}
```

```json
{
  "__typename": "Query",
  "todo": {
    "__typename": "Todo",
    "id": 1,
    "title": "implement graphcache",
    "author": {
      "__typename": "Author",
      "id": 1,
      "name": "urql-team"
    }
  }
}
```

Above, we see an example of a GraphQL query document and a corresponding JSON result from a GraphQL API. In GraphQL, we never lose access to the underlying types of the data. Normalized caches can ask for the `__typename` field in selection sets automatically and will find out which type a JSON object corresponds to.

Generally, a normalized cache must be able to do two things with a query document like the above:

- It must be able to walk the query document and the JSON data of the result and cache the data, normalizing it in the process and storing it in relational tables.
- It must later be able to walk the query document and recreate this JSON data purely from its cache, by reading entries from its in-memory relational tables.

While the normalized cache can't know the exact type of each field, thanks to the GraphQL query language it can make a couple of assumptions. The normalized cache can walk the query document. Each field that has no selection set (like `title` in the above example) must be a "record", a field that may only be set to a scalar. Each field that does have a selection set must be another "entity" or a list of "entities". The latter fields, the ones with selection sets, are our relations between entities, like foreign keys in relational databases.

Furthermore, the normalized cache can then read the `__typename` field on related entities. This is called _Type Name Introspection_ and is how it finds out about the types of each entity. From the above document we can assume the following relations:

- `Query.todo(id: 1)` → `Todo`
- `Todo.author` → `Author`

However, this isn't quite enough yet to store the relations from GraphQL results. The normalized cache must also generate primary keys for each entity so that it can store them in table-like data structures. This is, for instance, why [Relay enforces](https://relay.dev/docs/guides/graphql-server-specification/#object-identification) that each entity must have an `id` field, which allows it to assume that there's an obvious primary key for each entity it may query. In contrast, `urql`'s Graphcache and Apollo only assume that there _may_ be an `id` or `_id` field in a given selection set. If Graphcache can't find either of these fields it'll issue a warning; a custom `keys` configuration may then be used to generate custom keys for a given type.

With this logic the normalized cache will actually create the following "links" between its relational data:

- `"Query"`, `.todo(id: 1)` → `"Todo:1"`
- `"Todo:1"`, `.author` → `"Author:1"`

As we can see, the `Query` root type itself has a constant key of `"Query"`. All relational data originates here: while the GraphQL schema describes a graph, every selection in a GraphQL query document starts from this root, like a tree. Internally, the normalized cache now stores field values on entities by their primary keys.

The above can also be expressed as:

- The `Query` entity's `todo` field with `{"id": 1}` arguments points to the `Todo:1` entity.
- The `Todo:1` entity's `author` field points to the `Author:1` entity.

In Graphcache, these "links" are stored in a nested structure per-entity. "Records" are kept separate from this relational data.

![Normalization is based on types, keys, and relations. This information can all be inferred from the query document.](../assets/query-document-info.png)

## Storing Normalized Data

At its core, normalizing data means that we take individual fields and store them in a table. In our case, we store every field's value in a dictionary indexed by the entity's primary key (generated from its type name and an ID or other key field) and by the field's name plus its arguments, if it has any.

| Primary Key            | Field                                          | Value                    |
| ---------------------- | ---------------------------------------------- | ------------------------ |
| Type name and ID (Key) | Field name (not alias) and optional arguments  | Scalar value or relation |
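For illustration, here's a small sketch of how keys of this shape could be derived. This isn't Graphcache's actual implementation; it only mirrors the key format used in the examples on this page, and it ignores custom key fields and nested argument objects:

```js
// A rough sketch of entity and field key generation (illustrative only)
const keyOfEntity = data =>
  data && data.__typename && data.id != null
    ? `${data.__typename}:${data.id}`
    : null;

const keyOfField = (fieldName, args) => {
  if (!args) return fieldName;
  // Sort argument keys so the stringified form stays stable
  const sorted = Object.fromEntries(
    Object.keys(args)
      .sort()
      .map(key => [key, args[key]])
  );
  return `${fieldName}(${JSON.stringify(sorted)})`;
};

keyOfEntity({ __typename: 'Todo', id: 1 }); // 'Todo:1'
keyOfField('todo', { id: 1 }); // 'todo({"id":1})'
```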
To reiterate, we have three pieces of information that are stored in these tables:

- The entity's key, which can be derived from its type name via the `__typename` field and a keyable field. By default _Graphcache_ will check the `id` and `_id` fields, however this is configurable.
- The field's name (like `todo`) and its optional arguments. If the field has any arguments then we can normalize it by JSON stringifying the arguments, making sure that the JSON key is stable by sorting its keys.
- Lastly, we may store relations as either `null`, a primary key that refers to another entity, or a list of such keys.

For storing "records" we can store the scalars in a separate table. In _Graphcache_ the data structure for these tables looks a little like the following, where each entity has a record mapping fields to other entity keys:

```js
{
  links: Map {
    'Query': Record {
      'todo({"id":1})': 'Todo:1'
    },
    'Todo:1': Record {
      'author': 'Author:1'
    },
    'Author:1': Record {},
  }
}
```

We can see how the normalized cache is now able to traverse a GraphQL query by starting on the `Query` entity and retrieving relations for other fields. To retrieve "records", which are all fields with scalar values and no selection sets, _Graphcache_ keeps a second table around with an identical structure. This table only contains scalar values, which keeps our non-relational data away from our "links":

```js
{
  records: Map {
    'Query': Record {
      '__typename': 'Query'
    },
    'Todo:1': Record {
      '__typename': 'Todo',
      'id': 1,
      'title': 'implement graphcache'
    },
    'Author:1': Record {
      '__typename': 'Author',
      'id': 1,
      'name': 'urql-team'
    },
  }
}
```

This is very similar to how we'd go about creating a state management store manually, except that _Graphcache_ can use the GraphQL document to perform this normalization automatically.

What we gain from this normalization is a data structure that we can both read from and write to, to reproduce the API results for GraphQL query documents. Any mutation or subscription can also be written to this data structure. Once _Graphcache_ finds a keyable entity in a result it's written to the relational table, which may update other queries in our application. Similarly, queries may share data with one another, meaning they effectively share entities and can update one another. In other words, once we have a primary key like `"Todo:1"` we may find this primary key again in other entities in other GraphQL results.
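To make the reading direction concrete, here's a condensed sketch of how a result could be recreated from two plain objects shaped like the tables above. It's not how Graphcache is implemented; it glosses over aliases, lists, and arguments by using pre-computed field keys as the selection:

```js
// Simplified tables mirroring the example above (plain objects instead of Maps)
const links = {
  Query: { 'todo({"id":1})': 'Todo:1' },
  'Todo:1': { author: 'Author:1' },
};

const records = {
  Query: { __typename: 'Query' },
  'Todo:1': { __typename: 'Todo', id: 1, title: 'implement graphcache' },
  'Author:1': { __typename: 'Author', id: 1, name: 'urql-team' },
};

// Walk a selection of field keys; nested objects stand in for selection sets
const readEntity = (entityKey, selection) => {
  const result = {};
  for (const [fieldKey, subselection] of Object.entries(selection)) {
    if (subselection) {
      // Fields with selection sets are links to other entities
      result[fieldKey] = readEntity(links[entityKey][fieldKey], subselection);
    } else {
      // Fields without selection sets are records (scalars)
      result[fieldKey] = records[entityKey][fieldKey];
    }
  }
  return result;
};

readEntity('Query', {
  __typename: null,
  'todo({"id":1})': { __typename: null, id: null, title: null, author: { name: null } },
});
// Recreates a nested result, e.g. the author resolves to { name: 'urql-team' }
```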
## Custom Keys and Non-Keyable Entities

In the above introduction we've learned that while _Graphcache_ doesn't enforce `id` fields on each entity, it checks for the `id` and `_id` fields by default. There are many situations in which entities may either not have a key field or have different key fields. As _Graphcache_ traverses JSON data and a GraphQL query document to write data to the cache, you may see a warning from it along the lines of ["Invalid key: [...] No key could be generated for the data at this field."](./errors.md/#15-invalid-key) _Graphcache_ has many warnings like these that attempt to detect undesirable behaviour and help us update our configuration or queries accordingly.

In the simplest cases, we may simply have forgotten to add the `id` field to the selection set of our GraphQL query document. However, what if the field is instead called `uuid` and our query looks accordingly different?

```graphql
{
  item {
    uuid
  }
}
```

In the above selection set we have an `item` field whose entity has a `uuid` field rather than an `id` field. This means that _Graphcache_ won't automatically be able to generate a primary key for this entity. Instead, we have to help it generate a key by passing it a custom `keys` config:

```js
cacheExchange({
  keys: {
    Item: data => data.uuid,
  },
});
```

We may add a function as an entry to the `keys` configuration. The property here, `"Item"`, must be the typename of the entity for which we're generating a key. The function may return an arbitrarily generated key. So for our `item` field, which in our example schema gives us an `Item` entity, we create a `keys` configuration entry that derives the key from the `uuid` field rather than the `id` field.
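Since the key function receives the entity's data, it can also combine several fields. As a hedged sketch, assuming a hypothetical `OrderLine` type that's only unique per order and product, we could extend the `keys` config with a compound key:

```js
cacheExchange({
  keys: {
    Item: data => data.uuid,
    // Hypothetical type: combine two fields into one stable key
    OrderLine: data => `${data.orderId}:${data.productId}`,
  },
});
```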
This also raises a question: **what does _Graphcache_ do with unkeyable data by default? And what if my data has no key?**

This special case is what we call "embedded data". Not all types in a GraphQL schema will have keyable fields, and some types may just abstract data without themselves being relational. They may be "edges", entities whose only purpose is to connect two other entities via their fields, or data types like a `GeoJson` or `Image` type. In these cases, where the normalized cache encounters unkeyable types, it will create an embedded key by combining the parent's primary key with the field key. This means that "embedded entities" are only reachable from a specific field on their parent entities. Their keys are still globally unique, but they aren't, strictly speaking, relational data.

```graphql
{
  __typename
  todo(id: 1) {
    id
    image {
      url
      width
      height
    }
  }
}
```

In the above example we're querying an `Image` type on a `Todo`. This imaginary `Image` type has no key because the image is embedded data and will only ever be associated with this `Todo`. In other words, the API's schema doesn't consider it necessary to have a primary key field for this type. Maybe it doesn't even have an ID in our backend's database. We _could_ assign this type an imaginary key (maybe based on the `url`), but if it's not shared data it wouldn't make much sense to do so.

When _Graphcache_ attempts to store this entity it will issue the previously mentioned warning. Internally, it'll then generate an embedded key for this entity based on the parent entity. If the parent entity's key is `Todo:1` then the embedded key for our `Image` becomes `Todo:1.image`. This is also how this entity will be stored internally by _Graphcache_:

```js
{
  records: Map {
    'Todo:1.image': Record {
      '__typename': 'Image',
      'url': '...',
      'width': 1024,
      'height': 768
    },
  }
}
```

This doesn't, however, mute the warning that _Graphcache_ outputs, since it believes we may have made a mistake. The warning itself gives us advice on how to mute it:

> If this is intentional, create a keys config for `Image` that always returns null.

Meaning, we can add an entry to our `keys` config for our non-keyable type that explicitly returns `null`, which tells _Graphcache_ that the entity has no key:

```js
cacheExchange({
  keys: {
    Image: () => null,
  },
});
```

### Flexible Key Generation

In some cases, you may want to create a pattern for your key generation. For instance, you may want to say "create a special key for every type ending in `'Node'`". In such a case we recommend wrapping the `keys` configuration in a small JS `Proxy` that generates the key functions dynamically.

```js
cacheExchange({
  keys: new Proxy(
    {
      Image: () => null,
    },
    {
      get(target, prop, receiver) {
        // Only string properties can be typenames
        if (typeof prop === 'string' && prop.endsWith('Node')) {
          return data => data.uid;
        }
        const fallback = data => data.uuid;
        return target[prop] || fallback;
      },
    }
  ),
});
```

In the above example, we dynamically change the key generator depending on the typename. When a typename ends in `'Node'`, we return a key generator that uses the `uid` field. We still fall back to the object of manual key generation functions, however. Lastly, when a type doesn't have a predefined key generator, we change the default behavior from using the `id` and `_id` fields to using the `uuid` field.
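If a `Proxy` feels too dynamic, a plain `keys` object can often be generated up front instead. This is only a sketch; the typenames listed here are made up for illustration:

```js
// Hypothetical typenames that should all be keyed by their `uid` field
const nodeTypenames = ['UserNode', 'PostNode', 'CommentNode'];

cacheExchange({
  keys: {
    Image: () => null,
    // One entry per typename instead of intercepting lookups dynamically
    ...Object.fromEntries(nodeTypenames.map(name => [name, data => data.uid])),
  },
});
```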
## Non-Automatic Relations and Updates

While _Graphcache_ is able to store and update our entities in an in-memory relational data structure, which keeps each entity in a single, unique location, a GraphQL API may make a lot of implicit changes to the relations of our data as it runs, or it may have trivial relations that our cache could resolve without ever having seen them from the API. Like with the `keys` config, we have two more configuration options to handle this: `resolvers` and `updates`.

### Manually resolving entities

Some fields in our schema can be resolved without asking the GraphQL API for their relations. The `resolvers` config allows us to define client-side resolvers, where we can read from the cache directly as _Graphcache_ creates a local GraphQL result from its cached data.

```graphql
{
  todo(id: 1) {
    id
  }
}
```

Previously, we've looked at the above query to illustrate how data from a GraphQL API may be written to _Graphcache_'s relational data structure, storing the links and entities of a result against this GraphQL query document. However, it's possible that another query has already written this `Todo` entity to the cache. So, **how do we resolve a relation manually?**

In such a case, _Graphcache_ may have seen and stored the `Todo` entity, but it isn't aware of the relation between `Query.todo({"id":1})` and the `Todo:1` entity. However, we can tell _Graphcache_ which entity it should look for when it accesses the `Query.todo` field by creating a resolver for it:

```js
cacheExchange({
  resolvers: {
    Query: {
      todo(parent, args, cache, info) {
        return { __typename: 'Todo', id: args.id };
      },
    },
  },
});
```

A resolver is a function that's similar to [GraphQL.js' resolvers on the server-side](https://www.graphql-tools.com/docs/resolvers/). It receives the parent data, the field's arguments, access to _Graphcache_'s cached data, and an `info` object. [The entire function signature and more explanations can be found in the API docs.](../api/graphcache.md#resolvers-option)

Since it can access the field's arguments from the GraphQL query document, we can return a partial `Todo` entity. As long as this object is keyable, it tells _Graphcache_ what the key of the returned entity is. In other words, we've told it how to get to a `Todo` from the `Query.todo` field.

This mechanism is far more powerful than this example suggests. There are other use-cases that resolvers may be used for:

- Resolvers can be applied to fields with records, which means they can be used to change or transform scalar values. For instance, we can transform a string or parse a `Date` right inside a resolver (see the sketch after this list).
- Resolvers can return deeply nested results, which will be layered on top of the in-memory relational cached data of _Graphcache_, which means they can emulate infinite pagination and other complex behaviour.
- Resolvers can change when a cache miss or hit occurs. Returning `null` means that a field's value is literally `null`, which will not cause a cache miss, while returning `undefined` means a field's value is uncached.
- Resolvers can return either partial entities or keys, so we can chain `cache.resolve` calls to read fields from the cache, even when a field is pointing at another entity, since we can return keys to the other entity directly.
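As an example of the first point, here's a small sketch of a resolver that turns a cached scalar into a `Date` object as it's read. The `Todo.createdAt` field is an assumption made for this example and isn't part of the schema shown above:

```js
cacheExchange({
  resolvers: {
    Todo: {
      // Hypothetical field: convert the cached string value into a Date on read
      createdAt: (parent, args, cache, info) => {
        const value = cache.resolve(parent, 'createdAt');
        return value ? new Date(value) : value;
      },
    },
  },
});
```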
[Read more about resolvers on the following page about "Local Resolvers".](./local-resolvers.md)

### Manual cache updates

While `resolvers`, as shown above, operate while _Graphcache_ is reading from its in-memory cache, `updates` are a configuration option that operates while _Graphcache_ is writing to its cached data. Specifically, these functions can be used to add more updates on top of what a `Mutation` or `Subscription` result updates automatically.

As stated before, a GraphQL schema's data may undergo a lot of implicit changes when we send it a `Mutation` or `Subscription`. A new item that we create may, for instance, affect a completely different item or even a list. Often mutations and subscriptions alter relations that their selection sets wouldn't necessarily see. Since mutations and subscriptions operate on a different root type than the `Query` root type, we often need to update links in the rest of our data when a mutation is executed.

```graphql
query TodosList {
  todos {
    id
    title
  }
}

mutation AddTodo($title: String!) {
  addTodo(title: $title) {
    id
    title
  }
}
```

In a simple example, like the one above, we have a list of todos in a query and create a new todo using the `Mutation.addTodo` mutation field. When the mutation is executed and we get the result back, _Graphcache_ already writes the new `Todo` item to its normalized cache. However, we also want to add the new `Todo` item to the list on `Query.todos`:

```js
import { gql } from '@urql/core';

cacheExchange({
  updates: {
    Mutation: {
      addTodo(result, args, cache, info) {
        const query = gql`
          {
            todos {
              id
            }
          }
        `;

        cache.updateQuery({ query }, data => {
          data.todos.push(result.addTodo);
          return data;
        });
      },
    },
  },
});
```

In this code example we can first see that the signature of an `updates` entry is very similar to that of `resolvers`. However, we're seeing the `cache` in use for the first time. The `cache` object (as [documented in the API docs](../api/graphcache.md#cache)) gives us access to _Graphcache_'s mechanisms directly. Not only can we resolve data using it, we can also start sub-queries or sub-writes manually. These are full normalized cache runs inside other runs. In this case we're calling `cache.updateQuery` on a list of `Todo` items while the `Mutation` that added the `Todo` is already being written to the cache.

As we can see, we may perform manual changes inside of `updates` functions, which can be used to affect other parts of the cache (like `Query.todos` here) beyond the automatic updates that a normalized cache is expected to perform. We get methods like `cache.updateQuery`, `cache.writeFragment`, and `cache.link` in our updater functions, which aren't available to us in local resolvers and can only be used in these `updates` entries to change the data that the cache holds.
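The same mechanism works for removals. As a hedged sketch, assuming a hypothetical `Mutation.removeTodo(id: ID!)` field in the schema, we could filter the deleted entry out of the cached list:

```js
import { gql } from '@urql/core';

cacheExchange({
  updates: {
    Mutation: {
      // Hypothetical mutation field; the list query mirrors the example above
      removeTodo(result, args, cache, info) {
        const query = gql`
          {
            todos {
              id
            }
          }
        `;

        cache.updateQuery({ query }, data => {
          data.todos = data.todos.filter(todo => todo.id !== args.id);
          return data;
        });
      },
    },
  },
});
```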
In terms of the ["Manual Cache Updates"](#manual-cache-updates) that we've talked about above and [Optimistic Updates](./cache-updates.md#optimistic-updates) the limitations are pretty simple at first and if we use Graphcache as usual we may not even notice them: - When we make an _optimistic_ change, we define what a mutation's result may look like once the API responds in the future and apply this temporary result immediately. We store this temporary data in a separate "layer". Once the real result comes back this layer can be deleted and the real API result can be applied as usual. - When multiple _optimistic updates_ are made at the same time, we never allow these layers to be deleted separately. Instead Graphcache waits for all mutations to complete before deleting the optimistic layers and applying the real API result. This means that a mutation update cannot accidentally commit optimistic data to the cache permanently. - While an _optimistic update_ has been applied, Graphcache stops refetching any queries that contain this optimistic data so that it doesn't "flip back" to its non-optimistic state without the optimistic update being applied. Otherwise we'd see a "flicker" in the UI. These three principles are the basic mechanisms we can expect from Graphcache. The summary is: **Graphcache groups optimistic mutations and pauses queries so that optimistic updates look as expected,** which is an implementation detail we can mostly ignore when using it. However, one implementation detail we cannot ignore is the last mechanism in Graphcache which is called **"Commutativity"**. As we can tell, "optimistic updates" need to store their normalized results on a separate layer. This means that the previous data structure we've seen in Graphcache is actually more like a list, with many tables of links and entities. Each layer may contain optimistic results and have an order of preference. However, this order also applies to queries. Since queries are run in one order but their API results can come back to us in a very different order, if we access enough pages in a random order things can sometimes look rather weird. We may see that in an application on a slow network connection the results may vary depending on when their results came back. ![Commutativity means that we store data in separate layers.](../assets/commutative-layers.png) Instead, Graphcache actually uses layers for any API result it receives. In case, an API result arrives out-of-order, it sorts them by precedence — or rather by when they've been requested. Overall, we don't have to worry about this, but Graphcache has mechanisms that keep our updates safe. ## Reading on This concludes the introduction to Graphcache with a short overview of how it works, what it supports, and some hidden mechanisms and internals. Next we may want to learn more about how to use it and more of its features: - [How do we write "Local Resolvers"?](./local-resolvers.md) - [How to set up "Cache Updates" and "Optimistic Updates"?](./cache-updates.md) - [What is Graphcache's "Schema Awareness" feature for?](./schema-awareness.md) - [How do I enable "Offline Support"?](./offline.md)