A First Replicating Collection

Other Posts in Series

In the last post I introduced you to the first replicating type in this series: a register. A register is a simple type, one that you may even have used yourself at some point, which keeps the most recent value that has been set, together with the timestamp of the update, so it can merge conflicting values from other devices. Even with this very simple type, you can develop complete distributed-data apps.

But there are limitations. A value in a replicating register can only be changed atomically. When it comes time to merge conflicting values, the register will choose one or the other, but cannot merge the two together. For example, if you stored a full name in a register, and edited the first name on one device, and the surname on another, the merged result would lose one of the two edits. Either the first name would be reverted, or the surname.

To go beyond atomic merging, we need to have new types that are capable of partial merging. Often, this means a collection. Collections can be added to, and removed from, on different devices, and the results merged at the element level, to give a combined result of the changes made on each device. Think about a collaborative editor like Google Docs. You can be editing some text at the same time as a colleague, and both edits will survive in the final document.

In this post, we will start our journey into collection types with the simplest of them all: the add-only set. It is so simple in fact, all you need to do to make one is remove one of the capabilities of the standard Set type…

The Add-Only Set

Imagine a Set where you can add new elements, but never remove them, and what you have is an add-only set. And the code is just as simple as the description

public struct ReplicatingAddOnlySet<T: Hashable> {
    
    private var storage: Set<T>
    
    public mutating func insert(_ entry: T) {
        storage.insert(entry)
    }
    
    public var values: Set<T> {
        storage
    }
    
    public init() {
        storage = .init()
    }
    
    public init(_ values: Set<T>) {
        storage = values
    }
}

The storage property holds the values in a standard mutable Set, which is kept private so that it can’t be changed from outside our replicating type. We then allow public access only to insert values, or retrieve all values, but not to remove values from the set.

You’ll recall from last time that a replicating type needs to have the ability to be merged with related copies of itself from other devices. Merging in this case just involves taking the union of the two copies of the set.

extension ReplicatingAddOnlySet: Replicable {
    
    public func merged(with other: ReplicatingAddOnlySet) 
        -> ReplicatingAddOnlySet {
        ReplicatingAddOnlySet(storage.union(other.storage))
    }
    
}

The union of the two storage sets is taken, and used to create a new add-only set, which is returned. Note that there is no tracking of timestamps, because there is no need: we don’t care when the values were inserted; once they are inserted, they are permanently in the set, and can never be removed — the add-only set can only grow.

Checking the Math

You should add unit tests for new replicating types, including tests for the three musketeers: associativity, commutativity and idempotency. I won’t got through these in detail here, but think about what they mean for a minute, and I am pretty sure you will see that it is self-evident that the add-only set conforms.

For example, the order of sets is unimportant when taking the union, so commutativity is a no brainer. And taking the union of two sets, and then taking the union of the result with either of the two original sets clearly changes nothing — idempotency is also a cinch.

How is this useful?

The add-only set is simple, but is it at all useful? What problems does it solve?

Clearly, it is not as powerful as a general purpose set which allows for removal, but there are situations where an add-only set can be used to good effect. In general, any time you are tracking a transactional history, this type is a good choice. It could be literal transactions, such as records in a banking system, or transaction-like values like entries in a log. If your values accumulate over time, don’t change once created, and are never removed, the add-only set will work great.

There are other types of replicating sets that can handle removal, so why would you ever choose the add-only set over those? The answer there is that an add-only set is very cheap. Recall that we didn’t have to save any timestamps, unique identifiers, or other metadata that was needed for the replicating register type. The space occupied by an add-only set is actually the same as that of a standard set. This makes it a cheap option for storage, and data transfer.

When nothing is something

To finish off, I want to address ‘nothing’. In particular, when ‘nothing’ is ‘something’. In the coming posts, this will become a recurring theme. It is typically one of the main challenges you need to solve when designing a replicating type — how do you represent nothing.

See, there are different types of nothing. There is the nothing you get when you create a new replicating value, and it is empty. And there is the nothing you get when you remove a value from a replicating type. The two are generally not the same.

How to represent and store the absense of a value in a replicating type is key to satisfying our mathematical requirements. In the add-only set, we solved this problem by simply excluding one type of nothing, namely, the one you get when you remove an element. With that option gone, if a value is not in our storage, we know it was never added.

Why a Swift Set is not a replicating type

As an exercise, let’s take a standard Swift Set, and run through some tests in our collective heads. Note that a standard Set does allow removal of elements. Imagine that we have two devices (A and B), with the same set of values on each. Then, on device A, a value is removed, and now device B has to merge that change in.

Device B sees that it has one extra value in its own copy of the set, but what does that mean? Did A remove that value, or did B add it since the last sync? There is no way to know, and that is all because there is no entry in the set to represent a removed entry.

When we start to look at more advanced types, we will see how you introduce something to represent nothing.

Next Time…

The add-only set is a very basic type, but does have its uses, is easy to implement, and cheap in terms of storage space and data transfer. If we want something that behaves more like a standard set, with removals, we need a more advanced collection. We will take a look at how you can do that in the next post.

Feature image from here and licensed Creative Commons.

2 thoughts on “A First Replicating Collection

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s