Monthly Archives: July 2020

A First Replicating Type

Source code for this series is in this GitHub repo. It will be extended with each post. Other Posts in Series

When I first started reading about Conflict-Free Replicated Data Types (CRDTs), I expected a fertile new field of Computer Science dedicated to a whole new approach to managing data. I was soon somewhat disappointed, because everything I learned was accompanied by a generaous slather of déjà vu (yes, I Googled the accents). It felt like every replicating type encountered was already in widespread use. Heck, I had used half of them myself!

So CRDTs are not always new, but it is still good that they are being baptised and catagorized. It’s important to recognize patterns in solutions; it makes ideas easier to convey to others, gives a common language, and allows a new generation of types to be built up on the shoulders of existing ones.

Today we are going to look at a replicating type — our first in this series — that I am pretty sure you will have seen before in some way, shape or form. Your reaction may be to ask “How is that a CRDT when I already use it regularly?” Don’t get discouraged. There are more advanced types coming down the pipe, but walk before run you must.

The Replicating Register

Ask yourself what most people expect to happen when they make changes to the same data on two different devices. In nearly all cases, they would expect the most recent change to survive after the data is merged. In many apps, this is all the merge logic you need — last change wins.

What would a type that encapsulated this simple behavior look like? Well, like this

public struct ReplicatingRegister<T> {
    
    fileprivate struct Entry: Identifiable {
        var value: T
        var timestamp: TimeInterval
        var id: UUID
        
        init(value: T, 
            timestamp: TimeInterval = 
                Date().timeIntervalSinceReferenceDate, 
            id: UUID = UUID()) {
            self.value = value
            self.timestamp = timestamp
            self.id = id
        }
        
        func isOrdered(after other: Entry) -> Bool {
            (timestamp, id.uuidString) > 
                (other.timestamp, other.id.uuidString)
        }
    }
    
    private var entry: Entry
    
    public var value: T {
        get {
            entry.value
        }
        set {
            entry = Entry(value: newValue)
        }
    }
    
    public init(_ value: T) {
        entry = Entry(value: value)
    }
}

Aside from a few odds and ends, which we will deal with below, this code is the whole type. It’s really very simple: a ReplicatingRegister is a type that stores a value together with the timestamp at which the value was last changed.

The ReplicatingRegister is generic on the type of the value it contains. You can put any value you like in a register. Here is how you would create a register for an Int.

var a = ReplicatingRegister(5)

We can update the value like this

a.value = 6

and get the value out again like this

let number = a.value // number becomes 6

The Workings of a Register

You can see from the code above that the ReplicatingRegister contains a helper struct called Entry.

fileprivate struct Entry: Identifiable {
    var value: T
    var timestamp: TimeInterval
    var id: UUID

    init(value: T, 
        timestamp: TimeInterval = 
            Date().timeIntervalSinceReferenceDate, 
        id: UUID = UUID()) {
        self.value = value
        self.timestamp = timestamp
        self.id = id
    }

    func isOrdered(after other: Entry) -> Bool {
        (timestamp, id.uuidString) > 
            (other.timestamp, other.id.uuidString)
    }
}

Entry contains the actual value, but also a timestamp. When the value of the register gets set, a whole new Entry is created, with a new timestamp.

public var value: T {
    get {
        entry.value
    }
    set {
        entry = Entry(value: newValue)
    }
}

The isOrdered function determines which of two entries is most recent, which we will need when we come to merge registers a bit later.

Murphy’s Law and Unique Identifiers

You may be wondering why the Entry type includes a UUID identifier. It already has a timestamp, which is an identity of a sort. Isn’t that timestamp unique enough?

Maybe, but you will sleep better at night if you assume it is not unique enough. A timestamp has something like millisecond accuracy. A computing device can do thousands, even millions of operations in that time. Two changes to the same value on the same device may very well fall on exactly the same tick of the timestamp clock.

What would happen if we used the timestamp in isolation? If two changes collided — had the same timestamp — the ‘winner’ would effectively be random. Your devices could easily pick different outcomes, and your type will have diverged — your app is no longer in sync. To avoid this, we need some way to pick the same winner on all devices, even if the timestamps are exactly the same. For that, we add the UUID. It ensures a deterministic result in cases where the timestamps collide.

Merging Registers

There is one ability that all replicating types must have: merging with other replicas. We can capture this in a very simple protocol.

public protocol Replicable {
    func merged(with other: Self) -> Self
}

Merging two values of the same register simply involves picking the latest one.

extension ReplicatingRegister: Replicable {
    
    public func merged(with other: ReplicatingRegister) -> 
        ReplicatingRegister {
        entry.isOrdered(after: other.entry) ? self : other
    }
    
}

It’s so simple, it probably should be forbidden. With just this type in your arsenal, you can build very robust distributed data applications.

Testing the Type

Writing unit tests for replicating types is generally pretty straightforward, because they are so self contained. Typically you add a few basic creation and update tests

final class ReplicatingArrayTests: XCTestCase {
    
    var a: ReplicatingRegister<Int>!
    var b: ReplicatingRegister<Int>!

    override func setUp() {
        super.setUp()
        a = .init(1)
        b = .init(2)
    }
    
    func testInitialCreation() {
        XCTAssertEqual(a.value, 1)
    }
    
    func testSettingValue() {
        a.value = 2
        XCTAssertEqual(a.value, 2)
        a.value = 3
        XCTAssertEqual(a.value, 3)
    }

Then you will want to test merging

func testMergeOfInitiallyUnrelated() {
    let c = a.merged(with: b)
    XCTAssertEqual(c.value, b.value)
}

func testLastChangeWins() {
    a.value = 3
    let c = a.merged(with: b)
    XCTAssertEqual(c.value, a.value)
}

And lastly, you should run through the mathematical requirements we discussed in the first post of the series: associativity, commutativity, and idempotency.

If you recall, merging is associative if it doesn’t matter how you group values. It should not matter if we merge register a with b, and the result with c; or we begin with b and c, and merge the result with a.

func testAssociativity() {
    let c: ReplicatingRegister<Int> = .init(3)
    let e = a.merged(with: b).merged(with: c)
    let f = a.merged(with: b.merged(with: c))
    XCTAssertEqual(e.value, f.value)
}

Commutativity of registers means that it doesn’t matter whether you merge a with b, or b with a.

func testCommutativity() {
    let c = a.merged(with: b)
    let d = b.merged(with: a)
    XCTAssertEqual(d.value, c.value)
}

And idempotency simply means that merging the same values multiple times has no effect on the outcome: you get the same answer as the first time you merge.

func testIdempotency() {
    let c = a.merged(with: b)
    let d = c.merged(with: b)
    let e = c.merged(with: a)
    XCTAssertEqual(c.value, d.value)
    XCTAssertEqual(c.value, e.value)
}

The Weaknesses of a Register

If registers are so simple, and you can build whole apps with them, why do we need other, more complex replicating types? Registers work fine for simple values that we want to change atomically. Numbers, simple strings, enums and the like, all work well with registers. All changes are atomic. We don’t want changes from one device ‘mixing’ with those from another, to give half the value from each. We want one of the changes to win outright.

But there are many cases where this is undesirable. Imagine you have someone’s full name. It would be a shame if you couldn’t edit the first and last names independently. The solution in this case could be as simple as using two different registers, but in other cases, even that may not suffice. Imagine you are developing a collaborative text editor. It would be very limiting if someone entering some text on one device caused all the changes made by someone on another device to be overridden and lost. In that situation, we want to combine the changes; we need a more sophisticated replicating type, capable of partial merges.

Conditional Conformance to Protocols

That covers the replication of registers, but there are a few other Swifty aspects you will probably want to include in your replicating types. For example, you may wish to make them conform to protocols like Equatable and Hashable. Of course, this will only work if the contained value also conforms to the protocol, so we use conditional conformance for this.

extension ReplicatingRegister: Equatable where T: Equatable {
}

extension ReplicatingRegister.Entry: Equatable where T: Equatable {
}

extension ReplicatingRegister: Hashable where T: Hashable {
}

extension ReplicatingRegister.Entry: Hashable where T: Hashable {
}

You need to be a bit careful here. It is very easy to confuse the equality or hashability of the contained value with that of the replicating type. They aren’t the same. It is possible that two replicas have the same value, but the replicas themselves are not equal, having been set on different devices with different timestamps. Keep clear in your head whether you want to test the whole state of the replicating type, or just the value it contains.

Codability

One of the more important conditional conformances for replicating types is the Codable protocol. Without a way to serialize the replicating types, we have no way to transfer them between devices for sync purposes. In most cases, it is trivial to add.

extension ReplicatingRegister: Codable where T: Codable {
}

extension ReplicatingRegister.Entry: Codable where T: Codable {
}

And a test confirms that it works as expected.

func testCodable() {
    let data = try! JSONEncoder().encode(a)
    let d = try! JSONDecoder().decode(ReplicatingRegister<Int>.self, 
        from: data)
    XCTAssertEqual(a, d)
}

Property Wrappers

The ReplicatingRegister struct is so simple, you may be tempted to use it as a property wrapper. You could certainly do that, but I have avoided the urge, because the goal of property wrappers is to hide the enclosing type and have things just work as if it weren’t there. With replicating types, you often need to manipulate the type directly — it is not just a decorative shell around your data. This is even more the case with the advanced types we will meet later in the series. For this reason, I resist the urge to hide away the replicating type containers. They are first class citizens here.

Next Time…

That’s it. Your first replicating type! Simple really, but very powerful. Next time we will move to containers, and start to build replicating types for storing multiple values.

Conflict-Free Replicated Data Types (CRDTs) in Swift

Other Posts in Series

If you read any of my earlier posts here, you were probably left wondering what the name of the blog was all about. When I came up with the idea, I was planning to blog about decentralized approaches to handling app data, ie, how to sync with other devices, servers, extensions, and even multiple windows within the same app. Today I will finally begin on that journey…

Introducing Replicating Types

I want to begin a series of posts on the new coolness in the world of sync: Conflict-Free Replicated Data Types (CRDTs). CRDTs, which I will simply call ‘replicating types’ from here on, are data types that have the ability to merge themselves when changes are made on different devices, in order to reach a consistent state. They have built-in logic which allows them to be updated independently, and then merged back together in a completely deterministic way, such that all syncing devices end up with the same value.

What does this mean in practice? When using replicating types in Swift, you will typically replace the standard data types of your properties (eg String, Int) with more advanced types, like so

struct Note {
    enum Priority {
        case low, normal, high
    }

    var title: ReplicatingRegister<String> = .init("Untitled Note")
    var text: ReplicatingArray<Character> = .init()
    var priority: ReplicatingRegister<Priority> = .init(.normal)
}

These new types still hold the value you need in your model, but also have the ‘intelligence’ to be able to merge in changes made on other devices, and reach a consistent value on all devices.

Where is the Server?

Note that I ascribe the sync ‘intelligence’ to the type itself, rather than to any framework or server. There is no central server deciding which values to keep, and which to discard. There is no truth — decentral apps rely on trust rather than truth. Each device follows the same set of well structured rules, and stays consistent with the other devices without any direct means of communication.

There is no truth — decentral apps rely on trust rather than truth.

Servers and cloud services, where they are used, are purely for data transport and storage. They aren’t actively involved in the policy making of sync. This makes replicating types ideal for utilizing with CloudKit, Dropbox, Firebase, MondoDB, Amazon DynamoDB, WebDAV, S3 and any other online data storage. No need to write any Node.js or Go; everything you need to have a robust syncing app already exists right in the client OS.

A Look Ahead

I’ll start this series off with the fundamental maths of replicating types. Don’t fret, I will make it very short, and very approachable, describing things in everyday terms.

In the coming posts, we’ll start to build simple replicating types, and eventually more advanced ones. By the end of the series, we will have a kit of types that can be used to develop an app with full sync capability.

And that is exactly what we will do: we’ll make a basic note taking app with offline storage, which syncs across devices via CloudKit, and allows simultaneous edits to the same note.

Roll Your Own

You may ask, “Why doesn’t he just give us the code and be done with it, instead of writing a blog series?” If the goal was simply to produce a library of replicating types, that would have made sense, but I aspire to something more — I want you to be able to build your own replicating types.

Yes, you can take the types we make here and use them in your own apps, but the broader goal is to get you to start thinking about how replicating data types work, the aspects they have in common, and the tricks you can use to design them. Ultimately, you will be able to design and build your own replicating types tailored to the specific challenges you encounter in your own apps.

Math Made Manageable

I promised a little math, and I promise this will be the last little math. But it is useful to understand a few mathematical aspects of replicating types before we start. It will make it easier to evaluate data types, to understand which fit in the scope of replicating types, and which don’t. And most importantly, how we might convert a standard data type into a replicating type.

The replicating types we will design are state-based CRDTs. What this means is that when we come to sync, we will copy the whole value of the replicating type to other devices, rather than just any changes that have been made since the last sync. (The latter would be an operation-based CRDT.)

State-based replicating types all share the ability to merge. A single value may be copied to other devices, edited in different ways, and then transferred back to the original device. We can then ask the replicated value to merge with each of the modified values, resulting in a single value that is the same on every device.

The merging of state-based CRDTs must conform to a few basic mathematical principles. It must be associative, commutative, and idempotent. These terms may seem intimidating, but they are really just a way of saying that it doesn’t matter…

  1. How the replicated values get paired together for merging.
  2. What order the merging of values takes place.
  3. Whether a given value is merged just once, or multiple times.


It is largely common sense. You could imagine that if any of these points were not true, the data would quickly get out of sync, as each device syncs values from other devices in different orders, and perhaps multiple times. (With no central server, there is no coordination between devices to avoid such scenarios.)

Associative

Imagine you are an app with some piece of text, running on Device A. You receive two edited values of that text, made on Devices B and C. And imagine the other devices also get the recent text from their counterparts, and the goal is that they all eventually agree on what the final value of the text should be.

On Device A, we might merge together the text from B and C first, then merge the result with our own version of the text. Device B may merge the text from A with its own value first, and then the result with the text from C.

If we are going to end up with the same answer, it should not matter how the values are paired up during the merge. The values can be associated (paired) with each other in any way, and we should still get the same result.

(A merged-with B) merged-with C = A merged-with (B merged-with C)

In short, the type is associative if it doesn’t matter how you pair the merge operations from different devices. Whichever pairing you use, you will get the same answer.

(As an common example, addition is an operation that we are all familiar with which is associative. Eg 1+(2+3) = (1+2)+3.)

Commutative

Operations are commutative if they can be reordered, and the answer remains the same. Addition is commutative, as well as associative. For example,

2 + 3 + 4 = 4 + 3 + 2

In the context of replicating types, the merge function is commutative if the order of the values doesn’t matter.

A merged-with B = B merged-with A

Idempotent

We are all familiar with associative and commutive operators in basic arithmetic, even if we have never heard the terms before, but idempotency is a little more obscure. It just means that after an operator has been applied once, applying it again has no effect.

There are no really simple arithmetic operator that exhibits this behavior. For example, addition is not idempotent. An operator that adds 1 will produce ever larger results when applied multiple times.

3 + 1 ≠ 3 + 1 + 1

But you also don’t have to look very far to find fairly simple idempotent operators. An example from Swift is the max function. Imagine an operator that takes the maximum of some variable with the integer 10.

f(x) := max(x, 10)

Applying this function twice will be equivalent to applying it once. The second application will have no effect, because the result will already be the maximum of the two values.

max(max(x, 10), 10) = max(x, 10)

For our replicating types, idempotency implies that if we merge the same value twice (or more), we will get the same answer as if we only merged once. In short, each change will be incorporated once, and only once. Additional attempts to apply exactly the same edit will be ignored.

Next Time…

In the next installment, we we build our first replicating type. We’ll start with something simple, but nonetheless useful. There is even a strong possibility you have used it yourself at some point…