Serializable Isolation Level

Ishan Mishra
8 min read · May 30, 2021

In the previous post, we read about the various issues that are possible if we use weaker forms of isolation like read committed and snapshot isolation, namely the lost-update and write-skew problems. We concluded by admitting that Serializable isolation is the way to go in order to avoid all sorts of concurrency issues. In this post, let's define Serializable isolation and see how it is implemented in databases.

Serializable isolation is the strongest isolation level possible in databases. It ensures that the result produced by a transaction (in the presence of multiple concurrent transactions) is the same as if all the concurrent transactions had run serially, one after the other. Note that the isolation level only guarantees the result: the transactions may actually run in parallel, provided they produce the same result they would have produced had they run one after the other.

There are three main ways in which databases implement Serializable isolation:

  1. Actual Serial Execution
  2. Two Phase Locking
  3. Optimistic Concurrency Control Technique

Actual Serial Execution

What is the simplest way to make transactions behave as if they had run one after the other?
Just run them serially: execute only one transaction at a time, on a single thread.

This seems simple enough: if we want a result equivalent to a serial execution, we execute the transactions in serial order. But an interesting thing to note is that, although the idea is simple, it was only recently adopted by databases like VoltDB/H-Store, Redis and Datomic. There are primarily two reasons for that:

  1. Disk IO
    For transactions to run serially, there must be minimal disk IO. Otherwise, a single transaction waiting for data from disk would stall every transaction queued behind it.
  2. Long-running OLAP transactions
    OLAP transactions read/write huge amounts of data. One long OLAP read transaction could stall all other transactions.

Actual serial execution became possible as RAM became cheaper (allowing entire datasets to reside in memory, thus resolving the disk IO issue) and as developers realised that most database writes occur via short, fast OLTP transactions. OLAP transactions, which are mostly reads and require few writes, can instead run on a consistent snapshot under snapshot isolation, outside the serial execution loop.

To implement actual serial execution, the database tries to reduce application-transaction interaction as much as possible. Application-transaction interaction occurs when a transaction pauses to interact with the application and resumes depending on the result of that interaction. The database has to avoid this kind of behaviour, because the IO involved would stall every other transaction while the current transaction waits for the application code to return.

In order to do so, database vendors provide the feature of stored procedures: one must submit the entire transaction code to the database ahead of time as a stored procedure, and the database then executes it in one shot, with no pauses.
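As a minimal sketch, here is what such a stored procedure could look like. The syntax is PL/pgSQL, chosen for familiarity (VoltDB and H-Store, for instance, use Java stored procedures, but the idea is identical); the items table with its no_of_items_left column is a hypothetical schema along the lines of the PS5 example from the previous post:

-- The whole read-check-write cycle is submitted to the database in one
-- go, so the transaction never pauses to wait for application code.
CREATE OR REPLACE FUNCTION buy_item(p_item_id int)
RETURNS boolean AS $$
DECLARE
  remaining int;
BEGIN
  SELECT no_of_items_left INTO remaining
    FROM items WHERE id = p_item_id;

  IF remaining IS NULL OR remaining < 1 THEN
    RETURN false;  -- nothing left to sell
  END IF;

  UPDATE items SET no_of_items_left = no_of_items_left - 1
   WHERE id = p_item_id;
  RETURN true;
END;
$$ LANGUAGE plpgsql;

-- The application then makes a single call instead of an interactive
-- back-and-forth: SELECT buy_item(42);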

It should be noted that partitioning the data helps immensely with serial execution. It lets the database use multiple CPU cores to process transactions that touch separate partitions in parallel (transactions spanning multiple partitions, however, still need coordination and are much slower).

Since actual serial execution runs transactions one at a time, there are no problems related to concurrency at all (there is simply no concurrency).

Two Phase Locking (2PL)

For a long time, 2PL was the only way to ensure the Serializable isolation level: actual serial execution only became possible after RAM got cheaper, and SSI (Serializable Snapshot Isolation) is a fairly new idea. 2PL is implemented in MySQL (InnoDB), SQL Server and DB2.

Databases implement 2PL using locks. Until now, we were using locks only to prevent dirty writes, in a way that never blocked readers (as in snapshot isolation). That allowed several concurrency issues (like phantoms and write skew) to seep into our system. To avoid those issues, databases define two lock modes, shared and exclusive, and lay down precise rules for acquiring them (a sketch of these rules in action follows the list):

  • Any transaction that wants to read an object must acquire the shared-mode lock on that object; any transaction that wants to write to an object must acquire the exclusive-mode lock on it.
  • The shared-mode lock can be acquired by multiple transactions simultaneously; that is, multiple concurrent transactions can read an object in parallel.
  • The exclusive-mode lock can only be acquired if there is no other lock (neither shared nor exclusive) on that object. While an object is exclusively locked, it can neither be read nor written by anyone else until the exclusive lock is released. Thus, in 2PL, writers block readers (and readers block writers).
  • Once a transaction has acquired a lock, it must keep it for the length of the transaction. This is where the name comes from: a first phase in which locks are only acquired, and a second phase, at the end of the transaction, in which they are all released.
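Here is a minimal sketch of these rules with two concurrent sessions. The FOR SHARE clause below is PostgreSQL-flavoured and only makes the shared lock explicit; under full 2PL the database acquires such locks implicitly on every read and write, and holds every lock until commit. The items table is the hypothetical one from the stored procedure sketch above:

-- Session A (reader):
BEGIN;
SELECT * FROM items WHERE id = 42 FOR SHARE;
-- A now holds a shared-mode lock on the row.

-- Session B (writer):
BEGIN;
UPDATE items SET no_of_items_left = 0 WHERE id = 42;
-- Blocks: the UPDATE needs the exclusive-mode lock, which cannot be
-- granted while A's shared-mode lock exists.

-- Session A:
COMMIT;
-- A's shared lock is released; B's UPDATE now proceeds.

-- Session B:
COMMIT;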

There are two major downsides to 2PL:

  1. Deadlock: two (or more) transactions each hold a lock that the other needs, so both wait on each other indefinitely. The database detects the cycle and aborts one of them, which must then be retried (see the sketch below).
  2. Terrible performance: concurrency is greatly reduced, and acquiring and releasing all those locks adds overhead.
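As a sketch, here is how such a deadlock arises with the same hypothetical items table: two sessions update the same two rows in opposite orders.

-- Session A:
BEGIN;
UPDATE items SET no_of_items_left = 0 WHERE id = 1;  -- locks row 1

-- Session B:
BEGIN;
UPDATE items SET no_of_items_left = 0 WHERE id = 2;  -- locks row 2

-- Session A:
UPDATE items SET no_of_items_left = 0 WHERE id = 2;  -- waits for B

-- Session B:
UPDATE items SET no_of_items_left = 0 WHERE id = 1;  -- waits for A
-- Deadlock: A and B now wait on each other forever. The database
-- detects the cycle and aborts one of the two transactions, which the
-- application must then retry.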

So we read that 2PL is implemented with locks. But what if there is no object to lock onto? For example, when we are claiming a username (say 'ishan') from a list of usernames, we would run something like:

select count(*) from user where username = 'ishan'; -- returns 0

If the row does not currently exist, it will exist after we run the next query in the transaction:

insert into user (id, username, …) values (1234, 'ishan', …);

Now suppose another concurrent transaction wants to claim the same username. If both transactions execute the read at the same time, the first query returns 0 to both of them, and both are able to take the same username.

Note that we cannot solve this issue with ordinary object locks: when the first query is executed, there is no row yet onto which we could lock.

For solving issues like this, we utilise the concepts of predicate locks and index-range locks. Some pointers about these concepts (a sketch follows the list):

  • Predicate locks do not lock an object; they lock a search condition (e.g. username = 'ishan'), which covers even rows that do not exist yet. They are usually not preferred, since checking every running query against every predicate lock performs poorly.
  • Index-range locks are a cheaper approximation: instead of the exact search condition, the database locks a range of an index (for example, the range of the username index where 'ishan' would appear, or, if no suitable index exists, the entire table). Note that the locked range is usually larger than the exact requirement, but it is much cheaper to check.
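As a sketch of index-range locking in action, here is the username race replayed on a 2PL-style engine. The behaviour below matches MySQL InnoDB at the SERIALIZABLE level, assuming an index on the username column; there, plain SELECTs take shared next-key (gap) locks on the index range they scan:

-- Session A:
BEGIN;
SELECT count(*) FROM user WHERE username = 'ishan';
-- Returns 0, and takes a shared lock on the gap in the username index
-- where 'ishan' would go, even though no such row exists yet.

-- Session B:
BEGIN;
SELECT count(*) FROM user WHERE username = 'ishan';
-- Also returns 0; shared gap locks can coexist.

-- Session A:
INSERT INTO user (id, username) VALUES (1234, 'ishan');
-- Blocks: inserting into the gap conflicts with B's shared gap lock.

-- Session B:
INSERT INTO user (id, username) VALUES (5678, 'ishan');
-- Deadlock detected: B aborts and A's insert succeeds. When B retries,
-- its select returns 1 and it knows the username is taken.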

Serializable Snapshot Isolation

SSI is a fairly new technique (first described in 2008) and has a very small performance penalty. It has been the implementation behind the “Serializable” isolation level in PostgreSQL since version 9.1.

Why the low performance penalty? Because it does not hamper concurrency.
All transactions execute in snapshot isolation mode, at no extra concurrency cost. Only when a transaction is about to commit does the database check whether it has breached any serializability rule. If it has, the transaction aborts and is retried.

Thus, it is an optimistic approach, as opposed to the pessimistic approaches discussed under 2PL and actual serial execution.

While operating under snapshot isolation mode, how does the SSI technique check whether serializability has been violated? It checks for actions taken on an outdated premise.

What is meant by an outdated premise?

Write skews and phantoms are caused by read-modify-write cycles: a transaction reads something from the database and, based on the result of that query, the application takes an action (a write). This initial read query is called the “premise” of the write.

Write skew occurs when an action is taken based on a premise that has since become outdated (e.g., in the PS5 example, Sanjay read no_of_items_left = 1, but Rahul's concurrent write transaction rendered that premise obsolete by decrementing the counter). Thus, to ensure serializability, it is enough to ensure at commit time that the commit is not based on an outdated premise.

How does the database check which actions might have been taken on an outdated premise? There are two cases to consider:

  1. Detecting stale MVCC reads:
    The first case is a write that happened before the read but was invisible to it: a concurrent, still-uncommitted write existed when the reading transaction took its snapshot. Here the database checks whether a transaction (say t1) ignored another transaction's (say t2's) write due to the MVCC visibility rules (which describe which writes are visible to a reading transaction). When t1 is about to commit, the database checks whether t2 (or any other transaction whose writes t1 ignored) has since committed. If yes, t1's premise is stale and t1 is aborted.
  2. Detecting writes that affect prior reads:
    The second case is a read that happened (in another transaction) before a write. To detect it, databases use a modification of the index-range lock technique: for each index value, the database keeps entries recording which transactions recently read it (for example, in the username-claiming case, an entry listing all the transactions that recently read that username's index entry). Unlike real locks, these entries block nobody. When a transaction writes to an index value, it looks up these entries, and once the writer commits, all transactions that read that value are notified that their read is now obsolete. The entry is deleted once the transactions concerned have finished. For example, say transactions t1 and t2 both read the index entry for user_id = 1234, and t1 then writes to that value. When t1 commits, t2 is notified that its read is now obsolete; accordingly, t2 aborts (if it has taken a write action based on that read).
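Here is a sketch of SSI catching a write skew in PostgreSQL (version 9.1 or later). The doctors table below is hypothetical, borrowed from the standard write-skew illustration: each transaction's premise is “at least two doctors are on call”, and each write invalidates the other's premise.

-- Hypothetical setup: doctors(name text, on_call boolean), with two
-- rows ('alice', true) and ('bob', true).

-- Session A:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;  -- premise: 2 on call

-- Session B:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;  -- same premise; note that
                                             -- reads never block

-- Session A:
UPDATE doctors SET on_call = false WHERE name = 'alice';
COMMIT;  -- succeeds, but invalidates B's premise

-- Session B:
UPDATE doctors SET on_call = false WHERE name = 'bob';
COMMIT;
-- ERROR: could not serialize access due to read/write dependencies
-- among transactions (SQLSTATE 40001). The application retries B; on
-- retry, B sees only one doctor on call and leaves bob on call.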

The major performance factor in SSI is the rate of aborts. Hence, on high-contention workloads, where many transactions write to the same set of objects, it performs terribly.

Another factor to consider is the length of a transaction. Once a transaction aborts, all its work must be redone on retry, so long transactions are heavily penalised and are usually avoided.

But since our workloads consist mostly of short, quick OLTP writes and reads, SSI works quite well in practice. Compared to 2PL, it does not block readers (since it operates under snapshot isolation), and compared to serial execution, it is not limited to the throughput of a single CPU core.

Conclusion

In this transaction series, we saw the various isolation levels. We started with the Read Committed isolation level and read about how it solves the dirty read and dirty write problems. Then we observed that, despite solving dirty reads/writes, we still had concurrency issues such as read skew. To solve that, we introduced the Snapshot Isolation level. But even Snapshot Isolation had concurrency issues of its own: we briefly read about those and decided that the Serializable isolation level is the best way to solve all kinds of concurrency issues. We ended this post series with a discussion of Serializable isolation and its implementations.

For a deeper understanding of the subject matter, please refer to the book mentioned in the bibliography. Note that this post is part of the series on Transactions:

  1. Database Transaction and Isolation Levels
  2. Read Committed Isolation Level
  3. Snapshot Isolation Level
  4. Issues with Weaker Isolation Levels
  5. Serializable Isolation

Bibliography

Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media, Inc.
