Conflict-Free Replicated Data Types (CRDTs) in Swift

Other Posts in Series

If you read any of my earlier posts here, you were probably left wondering what the name of the blog was all about. When I came up with the idea, I was planning to blog about decentralized approaches to handling app data, ie, how to sync with other devices, servers, extensions, and even multiple windows within the same app. Today I will finally begin on that journey…

Introducing Replicating Types

I want to begin a series of posts on the new coolness in the world of sync: Conflict-Free Replicated Data Types (CRDTs). CRDTs, which I will simply call ‘replicating types’ from here on, are data types that have the ability to merge themselves when changes are made on different devices, in order to reach a consistent state. They have built-in logic which allows them to be updated independently, and then merged back together in a completely deterministic way, such that all syncing devices end up with the same value.

What does this mean in practice? When using replicating types in Swift, you will typically replace the standard data types of your properties (eg String, Int) with more advanced types, like so

struct Note {
    enum Priority {
        case low, normal, high
    }

    var title: ReplicatingRegister<String> = .init("Untitled Note")
    var text: ReplicatingArray<Character> = .init()
    var priority: ReplicatingRegister<Priority> = .init(.normal)
}

These new types still hold the value you need in your model, but also have the ‘intelligence’ to be able to merge in changes made on other devices, and reach a consistent value on all devices.

Where is the Server?

Note that I ascribe the sync ‘intelligence’ to the type itself, rather than to any framework or server. There is no central server deciding which values to keep, and which to discard. There is no truth — decentral apps rely on trust rather than truth. Each device follows the same set of well structured rules, and stays consistent with the other devices without any direct means of communication.

There is no truth — decentral apps rely on trust rather than truth.

Servers and cloud services, where they are used, are purely for data transport and storage. They aren’t actively involved in the policy making of sync. This makes replicating types ideal for utilizing with CloudKit, Dropbox, Firebase, MondoDB, Amazon DynamoDB, WebDAV, S3 and any other online data storage. No need to write any Node.js or Go; everything you need to have a robust syncing app already exists right in the client OS.

A Look Ahead

I’ll start this series off with the fundamental maths of replicating types. Don’t fret, I will make it very short, and very approachable, describing things in everyday terms.

In the coming posts, we’ll start to build simple replicating types, and eventually more advanced ones. By the end of the series, we will have a kit of types that can be used to develop an app with full sync capability.

And that is exactly what we will do: we’ll make a basic note taking app with offline storage, which syncs across devices via CloudKit, and allows simultaneous edits to the same note.

Roll Your Own

You may ask, “Why doesn’t he just give us the code and be done with it, instead of writing a blog series?” If the goal was simply to produce a library of replicating types, that would have made sense, but I aspire to something more — I want you to be able to build your own replicating types.

Yes, you can take the types we make here and use them in your own apps, but the broader goal is to get you to start thinking about how replicating data types work, the aspects they have in common, and the tricks you can use to design them. Ultimately, you will be able to design and build your own replicating types tailored to the specific challenges you encounter in your own apps.

Math Made Manageable

I promised a little math, and I promise this will be the last little math. But it is useful to understand a few mathematical aspects of replicating types before we start. It will make it easier to evaluate data types, to understand which fit in the scope of replicating types, and which don’t. And most importantly, how we might convert a standard data type into a replicating type.

The replicating types we will design are state-based CRDTs. What this means is that when we come to sync, we will copy the whole value of the replicating type to other devices, rather than just any changes that have been made since the last sync. (The latter would be an operation-based CRDT.)

State-based replicating types all share the ability to merge. A single value may be copied to other devices, edited in different ways, and then transferred back to the original device. We can then ask the replicated value to merge with each of the modified values, resulting in a single value that is the same on every device.

The merging of state-based CRDTs must conform to a few basic mathematical principles. It must be associative, commutative, and idempotent. These terms may seem intimidating, but they are really just a way of saying that it doesn’t matter…

  1. How the replicated values get paired together for merging.
  2. What order the merging of values takes place.
  3. Whether a given value is merged just once, or multiple times.


It is largely common sense. You could imagine that if any of these points were not true, the data would quickly get out of sync, as each device syncs values from other devices in different orders, and perhaps multiple times. (With no central server, there is no coordination between devices to avoid such scenarios.)

Associative

Imagine you are an app with some piece of text, running on Device A. You receive two edited values of that text, made on Devices B and C. And imagine the other devices also get the recent text from their counterparts, and the goal is that they all eventually agree on what the final value of the text should be.

On Device A, we might merge together the text from B and C first, then merge the result with our own version of the text. Device B may merge the text from A with its own value first, and then the result with the text from C.

If we are going to end up with the same answer, it should not matter how the values are paired up during the merge. The values can be associated (paired) with each other in any way, and we should still get the same result.

(A merged-with B) merged-with C = A merged-with (B merged-with C)

In short, the type is associative if it doesn’t matter how you pair the merge operations from different devices. Whichever pairing you use, you will get the same answer.

(As an common example, addition is an operation that we are all familiar with which is associative. Eg 1+(2+3) = (1+2)+3.)

Commutative

Operations are commutative if they can be reordered, and the answer remains the same. Addition is commutative, as well as associative. For example,

2 + 3 + 4 = 4 + 3 + 2

In the context of replicating types, the merge function is commutative if the order of the values doesn’t matter.

A merged-with B = B merged-with A

Idempotent

We are all familiar with associative and commutive operators in basic arithmetic, even if we have never heard the terms before, but idempotency is a little more obscure. It just means that after an operator has been applied once, applying it again has no effect.

There are no really simple arithmetic operator that exhibits this behavior. For example, addition is not idempotent. An operator that adds 1 will produce ever larger results when applied multiple times.

3 + 1 ≠ 3 + 1 + 1

But you also don’t have to look very far to find fairly simple idempotent operators. An example from Swift is the max function. Imagine an operator that takes the maximum of some variable with the integer 10.

f(x) := max(x, 10)

Applying this function twice will be equivalent to applying it once. The second application will have no effect, because the result will already be the maximum of the two values.

max(max(x, 10), 10) = max(x, 10)

For our replicating types, idempotency implies that if we merge the same value twice (or more), we will get the same answer as if we only merged once. In short, each change will be incorporated once, and only once. Additional attempts to apply exactly the same edit will be ignored.

Next Time…

In the next installment, we we build our first replicating type. We’ll start with something simple, but nonetheless useful. There is even a strong possibility you have used it yourself at some point…

1 thought on “Conflict-Free Replicated Data Types (CRDTs) in Swift

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s