Replicants All the Way Down

In the previous installment of this ongoing series on Conflict-Free Replicated Data Types (CRDTs) in Swift, I introduced you to tombstones and Lamport counts, in constructing a replicating set. This time we’ll take a similar approach for a more common data type: the dictionary or map.

The code for our ReplicatingDictionary will be quite similar to what we used in the ReplicatingSet, so I won’t focus too much on those repetitious aspects. Instead, we will look in detail at a new trick the dictionary type has up its sleeve: the ability to recursively merge whenever the values it contains are also of a replicating type.

This ability to compose new types out of existing replicating types is one of the strengths of the CRDT approach. We’ll see in a few posts from now that you can construct your app’s data model entirely out of replicating types — the whole model becomes one monstrous replicating type, and merging across devices becomes a single function call. Simple — and completely robust — sync.

Introducing the ReplicatingDictionary

As usual, you can find all the code from the series in this GitHub repo. The core of the dictionary type is very similar to the set type from last time.

public struct ReplicatingDictionary<Key, Value> where Key: Hashable {
    
    fileprivate struct ValueContainer {
        var isDeleted: Bool
        var lamportTimestamp: LamportTimestamp
        var value: Value
        
        init(value: Value, lamportTimestamp: LamportTimestamp) {
            self.isDeleted = false
            self.lamportTimestamp = lamportTimestamp
            self.value = value
        }
    }
    
    private var valueContainersByKey: Dictionary<Key, ValueContainer>
    private var currentTimestamp: LamportTimestamp

The data is stored in the valueContainersByKey dictionary, but instead of just storing the values directly, like in a ‘normal’ dictionary, we wrap the values in a struct with some extra metadata. The metadata in question is a Lamport timestamp, and an isDeleted flag, which gives us tombstone functionality.

Just as for the replicating set type, this metadata allows values modified on different peers to be merged in a deterministic way, so that each peer ends up with the same set of key-value pairs. The mechanics of the dictionary type are almost identical to the set type; the main difference is that the set is just storing values, and the dictionary is storing values and keys. Whereas last time the entries in the set were used as the keys of the storage dictionary, this time we have a dedicated key for that, and the values get boxed with the metadata.

The rest of the type is fairly straightforward. The main challenge is taking into account the tombstones. For example, here is the set block of the subscript func.

        set(newValue) {
            currentTimestamp.tick()
            if let newValue = newValue {
                let container = ValueContainer(value: newValue, lamportTimestamp: currentTimestamp)
                valueContainersByKey[key] = container
            } else if let oldContainer = valueContainersByKey[key] {
                var newContainer = ValueContainer(value: oldContainer.value, lamportTimestamp: currentTimestamp)
                newContainer.isDeleted = true
                valueContainersByKey[key] = newContainer
            }
        }

It begins by incrementing the Lamport timestamp, because the contents of the type are being changed. It then checks if the newValue being set is non-nil; if it is, it creates a new ValueContainer box, with the new timestamp and value, and puts that in the valueContainersByKey storage. If newValue is nil, the function looks to see if there is an old value already stored for the key, because if there is, it needs to be marked deleted. We don’t just remove the entry from valueContainersByKey, we update it with the isDeleted flag set to true.

Recursive Merging

The RecursiveDictionary type has a standard merge method very similar to the one in ReplicatingSet, but it has an extra string to its merging bow: an extra merge method specialized for the case when the value type is also Replicable.

Why would we want this? The standard merge treats values atomically. If two devices update the value for a given key at the same time, later — when they sync up — one of the two values will win. You can never get half of one value, and half of the other — it’s an atomic merge.

But what if it actually would be useful to partially merge the values? Imagine you have written a nice replicating type to store the first name and last name of a contact, and you store values of this type in a ReplicatingDictionary. With the standard merge, if the first name of a contact was edited on one device, and the last name on another device, the eventual result would not be correct. One of the two values would win, and either the first name or the last name would still be wrong. But if the merge func of the values themselves could be called, we could have our replicating name type properly integrate the two edits, and both names would end up correct.

The specialized merged method looks like this.

extension ReplicatingDictionary where Value: Replicable {
   
    public func merged(with other: ReplicatingDictionary) -> ReplicatingDictionary {
        var haveTicked = false
        var resultDictionary = self
        resultDictionary.currentTimestamp = max(self.currentTimestamp, other.currentTimestamp)
        resultDictionary.valueContainersByKey = other.valueContainersByKey.reduce(into: valueContainersByKey) { result, entry in
            let first = result[entry.key]
            let second = entry.value
            if let first = first {
                if !first.isDeleted, !second.isDeleted {
                    // Merge the values
                    if !haveTicked {
                        resultDictionary.currentTimestamp.tick()
                        haveTicked = true
                    }
                    let newValue = first.value.merged(with: second.value)
                    let newValueContainer = ValueContainer(value: newValue, lamportTimestamp: resultDictionary.currentTimestamp)
                    result[entry.key] = newValueContainer
                } else {
                    // At least one deletion, so just revert to atomic merge
                    result[entry.key] = first.lamportTimestamp > second.lamportTimestamp ? first : second
                }
            } else {
                result[entry.key] = second
            }
        }
        return resultDictionary
    }
    
}

The two valueContainersByKey get merged together with a reduce, just like in the other merged functions we have seen, but the details are a bit different. It is very important to take into account the tombstones. If either of the values is deleted, the method just reverts back to the standard merge algorithm, picking the one that is ‘most recent’. If the value exists in both copies of valueContainersByKey, and is not deleted, rather than picking one value or the other, it calls the merged method of the value itself, and stores the result.

                    let newValue = first.value.merged(with: second.value)
                    let newValueContainer = ValueContainer(value: newValue, lamportTimestamp: resultDictionary.currentTimestamp)
                    result[entry.key] = newValueContainer

The rest of the function is just taking care to increase the Lamport timestamp appropriately.

Using Recursive Merge

That’s how the ReplicatingDictionary works, but how do you use it? In particular, how do you use that recursive merging functionality? There’s no better place to find out than in the unit tests.

final class ReplicatingDictionaryTests: XCTestCase {
    
    ...
    
    var dictOfSetsA: ReplicatingDictionary<String, ReplicatingSet<Int>>!
    var dictOfSetsB: ReplicatingDictionary<String, ReplicatingSet<Int>>!

    override func setUp() {
        super.setUp()
        ...
        dictOfSetsA = .init()
        dictOfSetsB = .init()
    }

    ...

    func testNonAtomicMergingOfReplicatingValues() {
        dictOfSetsA["1"] = .init(array: [1,2,3])
        dictOfSetsA["2"] = .init(array: [3,4,5])
        dictOfSetsA["3"] = .init(array: [1])
        
        dictOfSetsB["1"] = .init(array: [1,2,3,4])
        dictOfSetsB["3"] = .init(array: [3,4,5])
        dictOfSetsB["1"] = nil
        dictOfSetsB["3"]!.insert(6)
        
        let dictOfSetC = dictOfSetsA.merged(with: dictOfSetsB)
        let dictOfSetD = dictOfSetsB.merged(with: dictOfSetsA)
        XCTAssertEqual(dictOfSetC["3"]!.values, [1,3,4,5,6])
        XCTAssertNil(dictOfSetC["1"])
        XCTAssertEqual(dictOfSetC["2"]!.values, [3,4,5])
        
        let valuesC = dictOfSetC.dictionary.values.flatMap({ $0.values }).sorted()
        let valuesD = dictOfSetD.dictionary.values.flatMap({ $0.values }).sorted()
        XCTAssertEqual(valuesC, valuesD)
    }

This test sets up two ReplicatingDictionary’s, which contain ReplicatingSet’s as values. We introduced the ReplicatingSet last time; because it also conforms to Replicable, the specialized merged function is used.

Concentrate on the code for the key “3”.

    func testNonAtomicMergingOfReplicatingValues() {
        dictOfSetsA["3"] = .init(array: [1])
        
        ...
        dictOfSetsB["3"] = .init(array: [3,4,5])
        ...
        dictOfSetsB["3"]!.insert(6)
        
        let dictOfSetC = dictOfSetsA.merged(with: dictOfSetsB)
        ...
        XCTAssertEqual(dictOfSetC["3"]!.values, [1,3,4,5,6])

        ...
    }

The set in the first dictionary just contains the value 1. The second dictionary has a set containing 3, 4, and 5, and is updated to also include 6. The two dictionaries get merged to form dictOfSetC, and the test shows that the value for key “3” contains 1, 3, 4, 5 and 6. In other words, it has merged the two sets together, rather than atomically picking the first or the second.

This recursive merging is very powerful, and it doesn’t just work with sets. You can put any replicating data type you like in the dictionary, and it will undergo partial merging. With this tool at our disposal, we can build up complex data models that sync up autonomously.

Next time…

We have one more replicating type to go before we are ready to build our distributed text editing app. Next time we will develop a replicating array — the most complex type we will encounter in this series. It will act as the textual engine of our collaborative editor.

4 thoughts on “Replicants All the Way Down

  1. In the set(newValue) blcok of the subscript function is it necessary to tick() the lamport timestamp if a non-existent key is set no nil? In this particular case the dictionary is not changed.
    Is there a reason to do it anyway?

    1. You might be right, indeed. If nothing changes, it should not be necessary. You could add an else branch to cover that case.

      I don’t think that the extra tick is dangerous in any way. Ticking when nothing changed is not that serious. But what you suggest would be better, I think.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s