An idea for an atomic lock-free protocol
Preface
A long-lived operation on a database can significantly affect performance by holding long-lived locks on resources which are required by other operations.
To make operations atomic, database transactions typically lock resources until all of the changes have been fully committed, and as a result, other transactions are blocked from reading or writing to these.
Long-lived transaction can increase the frequency of deadlocks, which can cause these long-lived transactions to fail, or cause other transactions to abort, increasing the latency of responses.
I propose a system of breaking up these long-lived operations (or transactions) with a system to "watch" as operations are committed by other transactions, and reactively re-executing segments of queries based on those results b into the query language or database transaction system.
Motivation
I am currently developing a new general-purpose multi-modal (graph, document) database called FractalDB built in Rust for my startup framework.tools.
We wanted something flexible, scalable and something that would allow us to group indexes across different customers, as each customer requires their own indexes. This allows customers to new writes records without slowing down writes of other customers.
They're a bit like compound indexes, except there can be more than 1 sub-index under each layer of the compound index.
The problem
Say we have a query that looks like this
// give $1,000 to everyone who has less than $1,000 in their bank account
for account in accounts {
if account.balance < 1000 {
account.balance += 1000
}
}
return
Let's say we have 100,000 accounts in our database.
In a ACID compliant database (more on this later) using 2PC (2 phase commit), executing this query would modify 100,000 accounts in a single transaction tx1
, locking each account that it runs into. During this time, no other transaction could modify these accounts in order to maintain consistency.
Now, imagine that another transaction tx2
comes along and decides to add $1500 into a bank account which had a $500 balance, putting that specific account over the threshold in the if account.balance < 1000
statement.
If tx1
was iterating through the loop, and ran into the back account in question, and incremented it's balance by $1,000 because it was at $500, and then tx2
came along and added $1500 to the bank account balance. You might expect the bank account to be at $500 + $1,500 + $1,000, however, this is not the case.
One of the purposes of database transactions under ACID compliance is to isolate the changes from other transactions. So both transactions would see the bank account balance at $500, and the bank account balance would be overwritten by tx2
.
This is why we lock resources, so that a transaction is not invalidated by changes done by different transactions at the same time.
But this creates new issues...
Now that we're locking things, We open the possibility of creating dead-lock scenarios, which will force us to abort a transaction. On top of this, there will be added latency as transactions need to wait until other transaction are completed with their resources.
ACID-ity
Before I get into the alternative solution, I want to talk a bit about what ACID compliance stands for:
- ATOMICITY
- A transaction is made up of multiple statements, or changes. The purpose of atomicity is to ensure that the transaction is treated as a single unit, rather than multiple units of changes. This means that a transaction either entirely succeeds, or entirely fails, with no in-between.
- CONSISTENCY
- Relating to atomicity, a transaction ensures that each group of operations brings the database from one valid state into another, ensuring each constraint, rule and relationships are left in a valid state. For example, for each debit, there is a credit in a corresponding account. (money does not just appear from nowhere)
- ISOLATION
- Changes which are in-progress are isolated from other transactions (meaning that they can not be viewed by other transactions). This is needed as databases generally need concurrency for performance reasons which means that multiple transactions may be executing at a single time.
- DURABILITY
- Durability means that all the changes that have been committed, will remain committed even during a power failure, crash or shutdown. This also means that the data inside of transactions must be persisted to disk, so that they can be recovered after restart. This is typically done by a WAL (write ahead log)
An alternative
Say that we have this query again
// give $1,000 to everyone who has less than $1,000 in their bank account
for account in accounts {
if account.balance < 1000 {
account.balance += 1000
}
}
return
Imagine that we have the goal of removing locks (ie. allowing transactions to modify data that other transactions are also modifying) from our database while maintaining each requirement of ACID.
How?
Through what I call reactive code re-execution (RCR).
The idea is breaking down the transaction into sub-transactions that are part of a larger longer-lived transaction.
Each sub-transaction has a few components.
- A resource, which is "watching" or "listening" for changes to the resource by other transactions.
- A statement (or code) which has a reactive dependency for specific events on the resource.
When the resource changes, the sub-transaction resets the state of the changes it applied to the resource, taking into account the new changes that have been applied by other transactions in an asynchronous manner.
// give $1,000 to everyone who has less than $1,000 in their bank account
for account in accounts {
if account.balance < 1000 {
account.balance += 1000
}
}
return
When an another transaction comes along and changes the balance of this account, the sub-transaction (which involves a scoped part of the query code) re-executes based on those changes made to the resource.
Before the parent transaction holding all of the sub-transactions can commit, it needs to be serialised into the log file, but while this serialisation is occurring, other transactions may want to start commit changes to the same resources.
This is a non issue if only one transaction is allowed to serialise and write to the log at once (this guarantees that the other transactions will receive the changes that the single transaction is about to apply)
This obviously can introduce some bottlenecks as we may want to batch the serialised data of multiple transactions into a single write or separate writes into different threads by having a mechanism to allocate segments of log files to different threads.
We can solve this problem by modifying our transaction serialisation system to allow appending of the sub-transaction re-executions as they come up, before finally locking the log file (guaranteeing that no more changes will come into the database, except for the transaction about to be committed)
Some rust pseudo-code representing this
let buf = Vec::new();
tx.serialise_data_into_buf(buf);
while tx.has_change_events() {
tx.append_one_change_event(buf);
}
log.lock(); // acquire log lock, so no more events can occur
while tx.has_change_events() { // one last check to get any events that may have been created
tx.append_one_change_event(buf);
}
log.write_buf(buf);
// drop log lock
Don't quote me on this, but from what I've seen, most databases typically only use a single thread to write to a single log file at a time. A modern SSD at the time of writing writes about 3.5GB/second while a 3.2ghz CPU core with a data bus of 64 bits can at most handle up to 25GB/second of data.
The bottleneck in this case is probably going to be your SSD.
Once the transaction has been persisted to the log, the transaction should trigger events on all of the resources that the transaction modified before releasing the lock on the log file append. This itself may present another bottleneck depending on how the
Final Thoughts
The database we're building is a database designed to be able to efficiently support microservice architectures without the problems that come with distributed services.
Our database allow multiple microservices connect to the same database and share types & information (similar to a GraphQL federation).
We have also plans to open source FractalDB in the future.