Safe transactions are usually referred to as ACID.
All writes succeed or none of them do. Imagine having to withdraw $10 from Jordan to give $10 to Patrick. Both of them have to succeed.
Changes made within a transaction are consistent with database constraints. All rules, constraints and triggers. If the data gets into an illegal state, the whole transaction fails.
No race conditions. All transactions are executed independently of each other. If two transactions for a withdrawal of $100 run at the same time, at the end the balance should be -200.
Once the transaction completes and changes are written to the DB, they are persisted.
We use a WAL to achieve atomicity, consistency and durability. Isolation, on the other hand, is hard and takes long to implement. Every transaction, we write all operations in a transaction, and at the end we write commit. Once we write commit, we know all the writes we put in the WAL are “safe”. If the system were to shutdown, we’d go through the WAL and replay all transactions to make sure everything is good.
A database is said to be read committed isolated if it protects against these two race conditions:
Long reads (more than a minute long) risk to see inconsistent data throughout the transaction. Let’s say row 1 has 100 dollars, and row 100 has 100 dollars. During the reading, someone writes to the table (reads are non-blocking) and moves the 100th row’s money to the 1st row. The read transaction could’ve read 100 dollars for row 1 and then 0 dollars for row 100 since that value was changed by the write operation while the read was ongoing.
How do you fix this? Simple, reads read a snapshot of the data. Snapshots are implemented by storing old values of modified data. Every write, you keep it in the WAL with a certain timestamp. So basically for every row you will have a list of different values in time. If you start a read transaction at timestamp 15, you will only read values with a timestamp of 15 max. If a write happens at timestamp 20, you won’t read that value.
Let’s assume 2 different threads want to change their respective row’s status to “inactive” only if 1 other row is active. Right now, they’re the only 2 active rows.
If they both change themselves to inactive at the same time, there’ll be no active rows left! This is a problem. We fix this by using a lock. Threads acquire locks on ALL active rows, check if there’s an active one, then change their status, then release locks.
Similar example, let’s assume 2 threads want to add a new row if there’s no other row with the same name in use. If both do at the same time, things break. But there’s no lock to acquire, how do we fix this? Well, we do materialize conflicts! This means, we insert the possible names as rows, with their own respective lock.
Threads just have to acquire a lock, check if the row is used or not, if it’s not used, then use it.
We saw how weak isolation is hard to debug and implemented differently among databases. Also, detecting race conditions is hard and difficult to tell looking at code.
Serializable isolation is regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
…