Last time we introduced our first replicating collection, an add-only set. Although useful in some specific circumstances, it isn’t a generally applicable set type, because it can only grow — you can’t remove entries from an add-only set. In this post, we are going to build a more advanced set which is capable of handling removals.
But what you find below is designed to do more than just introduce a new collection type. It also introduces some important concepts and tools you can use for building other replicating types. In particular, we will see how you can handle deleted values, and meet a more robust way to track time.
Introducing the ReplicatingSet
The add-only set was simple, because it was really just a standard set with the removal functions…err…removed. The data storage was exactly the same as a Swift Set, and merging involved nothing more than taking the union of the two sets.
As we touched upon, the add-only set was incapable of handling removal, because it couldn’t distinguish between the case where an entry did exist and was removed, and the case where the entry was never added in the first place. That made it impossible to decide how merging should proceed.
To get around this, we will include some metadata in our type, and store it in a dictionary keyed by the values themselves. Here is what it looks like.
public struct ReplicatingSet<T: Hashable> {
    fileprivate struct Metadata {
        var isDeleted: Bool
        var lamportTimestamp: LamportTimestamp

        init(lamportTimestamp: LamportTimestamp) {
            self.isDeleted = false
            self.lamportTimestamp = lamportTimestamp
        }
    }

    private var metadataByValue: Dictionary<T, Metadata>
    private var currentTimestamp: LamportTimestamp
The Metadata struct includes a flag, isDeleted. If this is set to true, we know that the value did once exist in the set, but has since been removed. We call this a tombstone, because it acts as evidence that something did exist, but is now deceased.
Also included in the metadata is a timestamp, but it is no ordinary timestamp. I will discuss that aspect more below; for now, just think of it as the time of the last change to that particular entry in the set.
Inserting Values
So how does the ReplicatingSet work? The initializers allow you to create an empty set, or one prepopulated by an array of entries.
public init() {
    self.metadataByValue = .init()
    self.currentTimestamp = .init()
}

public init(array elements: [T]) {
    self = .init()
    elements.forEach { self.insert($0) }
}
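As a quick illustration, assuming the code above, the two initializers can be used like this:

let emptySet = ReplicatingSet<Int>()                                     // an empty set, ready for insertions
let fruits = ReplicatingSet(array: ["apple", "banana", "banana"])        // duplicates collapse, just as in a standard Set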
You can insert entries into an existing ReplicatingSet just as you would with the standard Set type.
@discardableResult
public mutating func insert(_ value: T) -> Bool {
    currentTimestamp.tick()
    let metadata = Metadata(lamportTimestamp: currentTimestamp)
    let isNewInsert: Bool
    if let oldMetadata = metadataByValue[value] {
        isNewInsert = oldMetadata.isDeleted
    } else {
        isNewInsert = true
    }
    metadataByValue[value] = metadata
    return isNewInsert
}
This function begins with a call to the tick function on the current timestamp; think of this as simply incrementing the time. We need to do this whenever something changes.
New metadata is then created with this timestamp, and that metadata is set in the metadataByValue dictionary. There is a little extra code to check whether there was already a value in the dictionary, but that is needed purely so that we can return whether or not the value was new.
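To make that return value concrete, here is a small sketch of how insert behaves, assuming the code above:

var set = ReplicatingSet<String>()
let addedFirstTime = set.insert("apple")     // true: "apple" was not yet in the set
let addedSecondTime = set.insert("apple")    // false: "apple" is already present
print(addedFirstTime, addedSecondTime)       // true false

As we will see in the next section, inserting a value that was previously removed also returns true, because the tombstone’s isDeleted flag marks it as no longer present.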
Handling Removal with Tombstones
You can also remove entries.
@discardableResult
public mutating func remove(_ value: T) -> T? {
    let returnValue: T?
    if let oldMetadata = metadataByValue[value], !oldMetadata.isDeleted {
        currentTimestamp.tick()
        var metadata = Metadata(lamportTimestamp: currentTimestamp)
        metadata.isDeleted = true
        metadataByValue[value] = metadata
        returnValue = value
    } else {
        returnValue = nil
    }
    return returnValue
}
Removing doesn’t actually remove anything from the metadataByValue dictionary; it just changes the metadata so that it becomes a tombstone. Again, the timestamp is updated first, and new metadata is created with the isDeleted flag set to true. The result is stored in the dictionary.
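A quick sketch of how removal behaves in practice, assuming the code above:

var set = ReplicatingSet(array: ["apple", "banana"])
set.remove("apple")      // returns "apple": the value was present, and is now a tombstone
set.remove("apple")      // returns nil: the value is already deleted
set.remove("cherry")     // returns nil: the value was never in the set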
Accessing Values
The rest of the struct is straightforward. Each function has to be careful to account for tombstones. For example, to get all the current values, we have to filter them out.
public var values: Set<T> {
    let values = metadataByValue
        .filter({ !$1.isDeleted })
        .map({ $0.key })
    return Set(values)
}
The same is true when checking if a particular value is in the set. It is not enough to just check if the key exists in the dictionary; the isDeleted property of the metadata also needs to be taken into account.
public func contains(_ value: T) -> Bool {
    !(metadataByValue[value]?.isDeleted ?? true)
}
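Putting the accessors together with insertion and removal, again assuming the code above:

var set = ReplicatingSet(array: ["apple", "banana"])
set.remove("banana")
print(set.values)               // ["apple"]; the tombstone for "banana" is filtered out
print(set.contains("apple"))    // true
print(set.contains("banana"))   // false, even though metadata for "banana" still exists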
Merging
As you know by now, replicating types are types where a value can be modified independently on different devices, and merged back together in a deterministic way. The ReplicatingSet is no different, and the algorithm used is much the same as that of the ReplicatingRegister: for each entry, the value with the latest time wins.
extension ReplicatingSet: Replicable {
    public func merged(with other: ReplicatingSet) -> ReplicatingSet {
        var result = self
        result.metadataByValue = other.metadataByValue.reduce(into: metadataByValue) { result, entry in
            let firstMetadata = result[entry.key]
            let secondMetadata = entry.value
            if let firstMetadata = firstMetadata {
                result[entry.key] = firstMetadata.lamportTimestamp > secondMetadata.lamportTimestamp ? firstMetadata : secondMetadata
            } else {
                result[entry.key] = secondMetadata
            }
        }
        result.currentTimestamp = Swift.max(self.currentTimestamp, other.currentTimestamp)
        return result
    }
}
Note also that the currentTimestamp of the merged result is set to the larger of the two values. It is important that the timestamp keeps increasing; if it were to decrease, we would risk losing more recent changes in future merges.
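Here is a sketch of what this looks like when two copies of the set diverge on different devices, assuming the code above:

var setA = ReplicatingSet(array: ["apple", "banana"])
var setB = setA                         // replicate the value to a second "device"
setA.remove("banana")                   // device A deletes a value
setB.insert("cherry")                   // device B adds a different value

let merged = setA.merged(with: setB)
print(merged.values)                    // contains "apple" and "cherry"; "banana" stays deleted

Because the tombstone for "banana" carries a later Lamport timestamp than the original insertion, the deletion survives the merge, and merging in the opposite order gives the same result.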
Pros and Cons
We’ve touched on the pros and cons of our new set type in passing above, but let’s address them head on. The big pro is, of course, that we can both add and remove values, like a real Set, and yet merge things back together consistently when changes are made on different devices.
On the negative side, we have had to introduce extra metadata, which takes up more memory and consumes more network bandwidth. Even worse, because we have had to introduce tombstones, the data usage of the ReplicatingSet can only increase. Even when we remove an element, it remains in the set as a tombstone, so no memory is freed at all; a deleted item still takes up space.
Lamport Counts
Let’s finish off by addressing the handling of time, which I skipped over earlier. We could use standard timestamps for this, exactly like we did for the ReplicatingRegister, but there are some risks with that. For example, imagine your iPhone has some issues with the clock, and jumps forward a year for a few hours. No big deal, right? That depends on what you do in those few hours. Imagine you make significant changes in an app that is using standard Date values to merge data. Any changes you make will effectively be locked in for the next year. Any attempt to change those values after the clock recovers will fail, because the new timestamps will be smaller than the timestamps used during the glitch.
Some argue you can work around this type of issue with some ad hoc checks. For example, you could save the date occasionally, and if you see a large jump, you could warn the user and refuse to save changes. But what if someone puts a device in a drawer for a few months? In reality, it is quite difficult to correctly identify issues with the clock. Probably the best you can do is regularly check with a reliable time server, though even this will not help if the device is offline.
The ReplicatingSet uses a different approach: a Lamport count. It’s really very simple. A Lamport count is nothing more than an integer value that increases as each change is made. This is rock solid, in the sense that it doesn’t depend on the real time at all. Your iPhone’s clock could be completely messed up, and the Lamport count would be unfazed.
The strength of the Lamport count is that it has no relation to the actual time, other than that it can only increase, just like real time. But that is also a weakness: When changes are made on different devices to the same value, you can determine a winner using the Lamport count, but there is nothing to say that the winning change was actually the most recent. As long as changes are made serially on a single device, the ordering is well defined, as the Lamport count will continually grow, but when changes are made concurrently on different devices, the winner is largely unpredictable, though importantly, completely deterministic.
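The LamportTimestamp type itself is not shown in this post, but based on how it is used above (tick(), comparison with >, and Swift.max), a minimal sketch might look something like the following. The property names, and the detail of refreshing the UUID on every tick, are my own assumptions; the tie-breaking UUID matches what is described in the question at the end of the post, and the type in the accompanying sample project may differ.

import Foundation

// A minimal sketch of a Lamport timestamp: an ever-increasing integer count,
// plus a UUID used purely to break ties deterministically when two counts are equal.
struct LamportTimestamp: Comparable, Hashable, Codable {
    var count: UInt64 = 0
    var id: UUID = UUID()

    // Advance the clock to record a new local change.
    mutating func tick() {
        count += 1
        id = UUID()   // assumption: refresh the tie-breaker with each change
    }

    static func < (lhs: LamportTimestamp, rhs: LamportTimestamp) -> Bool {
        // Order by count first; fall back to the UUID string for a total, deterministic order.
        (lhs.count, lhs.id.uuidString) < (rhs.count, rhs.id.uuidString)
    }
}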

Next time…
That’s it for sets. Next time we are going to move on to the Dictionary type, and see if we can apply a similar approach there. But we will go one step further, by making our dictionary type merge recursively. This opens up much more interesting possibilities, as we can start to build more complex replicating types out of simple building blocks.
One final question worth addressing: the merge code below does not appear to be commutative when the two timestamps are exactly equal, does it?

firstMetadata.lamportTimestamp > secondMetadata.lamportTimestamp ? firstMetadata : secondMetadata

That is true, which is why the Lamport timestamps include an extra random element to assign the order deterministically: if the two counts are exactly the same, that extra piece (a UUID) is used to break the tie, making a collision extremely unlikely.