Source code for this series is in this GitHub repo. It will be extended with each post. Other Posts in Series
When I first started reading about Conflict-Free Replicated Data Types (CRDTs), I expected a fertile new field of Computer Science dedicated to a whole new approach to managing data. I was soon somewhat disappointed, because everything I learned was accompanied by a generaous slather of déjà vu (yes, I Googled the accents). It felt like every replicating type encountered was already in widespread use. Heck, I had used half of them myself!
So CRDTs are not always new, but it is still good that they are being baptised and catagorized. It’s important to recognize patterns in solutions; it makes ideas easier to convey to others, gives a common language, and allows a new generation of types to be built up on the shoulders of existing ones.
Today we are going to look at a replicating type — our first in this series — that I am pretty sure you will have seen before in some way, shape or form. Your reaction may be to ask “How is that a CRDT when I already use it regularly?” Don’t get discouraged. There are more advanced types coming down the pipe, but walk before run you must.
The Replicating Register
Ask yourself what most people expect to happen when they make changes to the same data on two different devices. In nearly all cases, they would expect the most recent change to survive after the data is merged. In many apps, this is all the merge logic you need — last change wins.
What would a type that encapsulated this simple behavior look like? Well, like this
public struct ReplicatingRegister<T> {
fileprivate struct Entry: Identifiable {
var value: T
var timestamp: TimeInterval
var id: UUID
init(value: T,
timestamp: TimeInterval =
Date().timeIntervalSinceReferenceDate,
id: UUID = UUID()) {
self.value = value
self.timestamp = timestamp
self.id = id
}
func isOrdered(after other: Entry) -> Bool {
(timestamp, id.uuidString) >
(other.timestamp, other.id.uuidString)
}
}
private var entry: Entry
public var value: T {
get {
entry.value
}
set {
entry = Entry(value: newValue)
}
}
public init(_ value: T) {
entry = Entry(value: value)
}
}
Aside from a few odds and ends, which we will deal with below, this code is the whole type. It’s really very simple: a ReplicatingRegister
is a type that stores a value together with the timestamp at which the value was last changed.
The ReplicatingRegister
is generic on the type of the value it contains. You can put any value you like in a register. Here is how you would create a register for an Int
.
var a = ReplicatingRegister(5)
We can update the value like this
a.value = 6
and get the value out again like this
let number = a.value // number becomes 6
The Workings of a Register
You can see from the code above that the ReplicatingRegister
contains a helper struct called Entry
.
fileprivate struct Entry: Identifiable {
var value: T
var timestamp: TimeInterval
var id: UUID
init(value: T,
timestamp: TimeInterval =
Date().timeIntervalSinceReferenceDate,
id: UUID = UUID()) {
self.value = value
self.timestamp = timestamp
self.id = id
}
func isOrdered(after other: Entry) -> Bool {
(timestamp, id.uuidString) >
(other.timestamp, other.id.uuidString)
}
}
Entry
contains the actual value, but also a timestamp. When the value of the register gets set, a whole new Entry
is created, with a new timestamp.
public var value: T {
get {
entry.value
}
set {
entry = Entry(value: newValue)
}
}
The isOrdered
function determines which of two entries is most recent, which we will need when we come to merge registers a bit later.
Murphy’s Law and Unique Identifiers
You may be wondering why the Entry
type includes a UUID
identifier. It already has a timestamp, which is an identity of a sort. Isn’t that timestamp unique enough?
Maybe, but you will sleep better at night if you assume it is not unique enough. A timestamp has something like millisecond accuracy. A computing device can do thousands, even millions of operations in that time. Two changes to the same value on the same device may very well fall on exactly the same tick of the timestamp clock.
What would happen if we used the timestamp in isolation? If two changes collided — had the same timestamp — the ‘winner’ would effectively be random. Your devices could easily pick different outcomes, and your type will have diverged — your app is no longer in sync. To avoid this, we need some way to pick the same winner on all devices, even if the timestamps are exactly the same. For that, we add the UUID
. It ensures a deterministic result in cases where the timestamps collide.
Merging Registers
There is one ability that all replicating types must have: merging with other replicas. We can capture this in a very simple protocol.
public protocol Replicable {
func merged(with other: Self) -> Self
}
Merging two values of the same register simply involves picking the latest one.
extension ReplicatingRegister: Replicable {
public func merged(with other: ReplicatingRegister) ->
ReplicatingRegister {
entry.isOrdered(after: other.entry) ? self : other
}
}
It’s so simple, it probably should be forbidden. With just this type in your arsenal, you can build very robust distributed data applications.
Testing the Type
Writing unit tests for replicating types is generally pretty straightforward, because they are so self contained. Typically you add a few basic creation and update tests
final class ReplicatingArrayTests: XCTestCase {
var a: ReplicatingRegister<Int>!
var b: ReplicatingRegister<Int>!
override func setUp() {
super.setUp()
a = .init(1)
b = .init(2)
}
func testInitialCreation() {
XCTAssertEqual(a.value, 1)
}
func testSettingValue() {
a.value = 2
XCTAssertEqual(a.value, 2)
a.value = 3
XCTAssertEqual(a.value, 3)
}
Then you will want to test merging
func testMergeOfInitiallyUnrelated() {
let c = a.merged(with: b)
XCTAssertEqual(c.value, b.value)
}
func testLastChangeWins() {
a.value = 3
let c = a.merged(with: b)
XCTAssertEqual(c.value, a.value)
}
And lastly, you should run through the mathematical requirements we discussed in the first post of the series: associativity, commutativity, and idempotency.
If you recall, merging is associative if it doesn’t matter how you group values. It should not matter if we merge register a
with b
, and the result with c
; or we begin with b
and c
, and merge the result with a
.
func testAssociativity() {
let c: ReplicatingRegister<Int> = .init(3)
let e = a.merged(with: b).merged(with: c)
let f = a.merged(with: b.merged(with: c))
XCTAssertEqual(e.value, f.value)
}
Commutativity of registers means that it doesn’t matter whether you merge a
with b
, or b
with a
.
func testCommutativity() {
let c = a.merged(with: b)
let d = b.merged(with: a)
XCTAssertEqual(d.value, c.value)
}
And idempotency simply means that merging the same values multiple times has no effect on the outcome: you get the same answer as the first time you merge.
func testIdempotency() {
let c = a.merged(with: b)
let d = c.merged(with: b)
let e = c.merged(with: a)
XCTAssertEqual(c.value, d.value)
XCTAssertEqual(c.value, e.value)
}
The Weaknesses of a Register
If registers are so simple, and you can build whole apps with them, why do we need other, more complex replicating types? Registers work fine for simple values that we want to change atomically. Numbers, simple strings, enums and the like, all work well with registers. All changes are atomic. We don’t want changes from one device ‘mixing’ with those from another, to give half the value from each. We want one of the changes to win outright.
But there are many cases where this is undesirable. Imagine you have someone’s full name. It would be a shame if you couldn’t edit the first and last names independently. The solution in this case could be as simple as using two different registers, but in other cases, even that may not suffice. Imagine you are developing a collaborative text editor. It would be very limiting if someone entering some text on one device caused all the changes made by someone on another device to be overridden and lost. In that situation, we want to combine the changes; we need a more sophisticated replicating type, capable of partial merges.
Conditional Conformance to Protocols
That covers the replication of registers, but there are a few other Swifty aspects you will probably want to include in your replicating types. For example, you may wish to make them conform to protocols like Equatable
and Hashable
. Of course, this will only work if the contained value also conforms to the protocol, so we use conditional conformance for this.
extension ReplicatingRegister: Equatable where T: Equatable {
}
extension ReplicatingRegister.Entry: Equatable where T: Equatable {
}
extension ReplicatingRegister: Hashable where T: Hashable {
}
extension ReplicatingRegister.Entry: Hashable where T: Hashable {
}
You need to be a bit careful here. It is very easy to confuse the equality or hashability of the contained value with that of the replicating type. They aren’t the same. It is possible that two replicas have the same value, but the replicas themselves are not equal, having been set on different devices with different timestamps. Keep clear in your head whether you want to test the whole state of the replicating type, or just the value it contains.
Codability
One of the more important conditional conformances for replicating types is the Codable
protocol. Without a way to serialize the replicating types, we have no way to transfer them between devices for sync purposes. In most cases, it is trivial to add.
extension ReplicatingRegister: Codable where T: Codable {
}
extension ReplicatingRegister.Entry: Codable where T: Codable {
}
And a test confirms that it works as expected.
func testCodable() {
let data = try! JSONEncoder().encode(a)
let d = try! JSONDecoder().decode(ReplicatingRegister<Int>.self,
from: data)
XCTAssertEqual(a, d)
}
Property Wrappers
The ReplicatingRegister
struct is so simple, you may be tempted to use it as a property wrapper. You could certainly do that, but I have avoided the urge, because the goal of property wrappers is to hide the enclosing type and have things just work as if it weren’t there. With replicating types, you often need to manipulate the type directly — it is not just a decorative shell around your data. This is even more the case with the advanced types we will meet later in the series. For this reason, I resist the urge to hide away the replicating type containers. They are first class citizens here.
Next Time…
That’s it. Your first replicating type! Simple really, but very powerful. Next time we will move to containers, and start to build replicating types for storing multiple values.
Why would a UUID() comparison give a deterministic ordering?
Well, I guess technically there is a minute chance that you get the same UUID created twice. In practice, that is so minute the universe would sooner implode. It is very, very unlikely. I have been using UUIDs like this for years, and never had a collision.
Sorry for my quick out of context comment.
I was not referring to collisions between UUID.
I was referring to the comparison in the isOrdered(after:) function.
Having two exact same timestamps (which is extremely rare) how would two different, dare I say random, UUIDs define the order between two Entries?
It is entirely possible (and even very probable) that two UUIDs instances can be compared an any order, either ascending either descending.
Ah, OK. You are right that a UUID can’t determine the true order of events, but it can at least make sure that each device chooses the same order. That is our goal here: determinism. The same comparison made on two separate devices should lead to the same order. If we didn’t have the UUID, and the timestamps collided, we could get one device randomly choosing one winner, and the other choosing a different winner, and the two would be out of sync.
I see.
If all devices have the same UUID then effectively the result would be deterministic.
I was focusing on a different use for the UUID.
Thank you for the clarification.
Right. The UUIDs in this case are actually part of the type’s data. A UUID is created when the type value is set, just like the timestamp. If you are syncing to other devices, the UUID is part of the data that goes across to the other device (together with timestamp and value). So yes, each device ends up with that UUID, and can use it together with the timestamp to determine the “winner” if there is a conflict.
Hey this series is truly amazing! One question to this article, if the `UUID` is there only because of deterministic comparison, wouldn’t it be simpler to just always return `true` in the `isOrdered` if the timestamps are equal?
Thanks for the support!
The reason you want something to be deterministic is that each device must make the same choice. If you just return “true” for `isOrdered`, the sorting order will depend on whether you do `a.isOrdered(b)` or `b.isOrdered(a)`. That would be very bad. One device would choose one order, and another device could choose the opposite order.
If you are brave, you could just assume there would never be a timestamp collision, but I prefer to be on the safe side, and use a UUID to resolve potential collisions.
Okay I just posted a comment but can’t edit it now, tho now I understand why we need UUID instead of just returning true, as this will fail on commutativity rule.
Exactly. And the hand waving argument is that you want all devices to make the same choice (sort order).