A multithreaded program has multiple program counters (PCs); instructions are fetched and executed from each of them.
Threads are similar to separate processes.
Similarities with processes: a context switch happens when switching from one thread to another, and the running thread's register state has to be saved and restored (to and from a thread control block, a TCB), much like with processes.
Differences with processes: threads share the same address space and thus can access the same data. Also, we have one stack per thread, since each thread runs independently and may call into various routines.
The resulting address space (with one stack per thread) is less tidy, but normally fine, as stacks do not have to be very large.
You could use multiple processes instead, but threads share an address space and thus make it easy to share data (context switching between threads is also slightly faster). Processes make more sense for logically separate tasks where little sharing of in-memory data structures is needed.
As you can see on the right, there are multiple possible thread traces, depending on when the OS scheduler decides to run each thread.
If two threads each try to add 1 million to a shared counter (naively), we might expect the final count to be 2 million. In practice this doesn't work, because counter++ is not atomic.
This is what happens when counter++ runs:
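Roughly (the exact instructions depend on the compiler and architecture), the increment breaks down into three separate steps, sketched here as C with the steps in comments:

```c
/* A sketch of what counter++ really is: three separate steps, not one
 * atomic operation (the register name is illustrative). */
void increment(int *counter) {
    int tmp = *counter;   /* 1. load counter from memory into a register (e.g. eax) */
    tmp = tmp + 1;        /* 2. add 1 to the register */
    *counter = tmp;       /* 3. store the register back to counter's memory location */
}
```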
A timer interrupt could be dispatched while T1 is in the middle of these instructions: the OS saves T1's state (its PC and its registers, including eax, to T1's TCB), and then T2 runs. If T2 runs the same code, both threads can end up loading the same old value and storing the same result, so one increment is lost and the final count is lower than expected.
This code is called a critical section, because it can result in race conditions (where results depend on the timing of the code’s execution).
A critical section is a piece of code that accesses a shared resource and must not be concurrently executed by more than one thread.
We solve this with mutual exclusion → if T1 is executing the critical section, others will be prevented from doing so.
We would like to turn those three assembly instructions into a single atomic operation.
The hardware will need to provide us with some instructions we can use to build a set of synchronization primitives.
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg);
pthread_join() takes a pointer to a void * (a double pointer) for the return value, because it needs to write the joined thread's return value into the pointer you pass in.
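A minimal sketch of how these two calls fit together (the thread body and the argument string are just for illustration):

```c
#include <stdio.h>
#include <pthread.h>

void *mythread(void *arg) {
    printf("thread got: %s\n", (char *) arg);
    return arg;                     /* this pointer is handed back to pthread_join() */
}

int main(void) {
    pthread_t t;
    void *retval;

    pthread_create(&t, NULL, mythread, "hello");  /* start the thread */
    pthread_join(t, &retval);       /* wait for it; retval receives the return value */
    printf("thread returned: %s\n", (char *) retval);
    return 0;
}
```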
The lock function is blocking: the calling thread waits until it can grab the lock.
By the way, a lock has to be initialized, either statically: pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
or dynamically: int rc = pthread_mutex_init(&lock, NULL);
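For example, a sketch of fixing the earlier counter race with a mutex (function name and loop count are illustrative; error checking omitted):

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

void *increment_a_million_times(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    /* enter the critical section (blocks if held) */
        counter++;
        pthread_mutex_unlock(&lock);  /* leave the critical section */
    }
    return NULL;
}
```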
Condition variables are used when some kind of signaling has to take place among threads (one thread is waiting for another to do something before it can continue).
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
→ puts the calling thread to sleep until it’s signaled by int pthread_cond_signal(pthread_cond_t *cond);
Use a while loop instead of an if: the thread could be woken up spuriously and think ready == 1 when it's not.
Why does pthread_cond_wait() take the lock as an argument? Because while waiting, the lock is released; otherwise, the other thread wouldn't be able to acquire it and change the state. Just before the waiter wakes up, it re-acquires the lock.
Why are we using locks and condition variables at the same time? Because ready is a shared variable, and we don't want a race condition on it.
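Putting it together, a sketch of the usual pattern (flag and variable names follow the example above):

```c
#include <pthread.h>

static int ready = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

void wait_for_ready(void) {
    pthread_mutex_lock(&m);
    while (ready == 0)               /* while, not if: re-check after every wakeup */
        pthread_cond_wait(&c, &m);   /* releases m while asleep, re-acquires before returning */
    pthread_mutex_unlock(&m);
}

void set_ready(void) {
    pthread_mutex_lock(&m);          /* protect the shared flag */
    ready = 1;
    pthread_cond_signal(&c);         /* wake one waiter */
    pthread_mutex_unlock(&m);
}
```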
Why use condition variables instead of just busy-waiting on the flag, like this?
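(A sketch of the naive spin-wait this refers to, reusing the ready flag from above:)

```c
volatile int ready = 0;

void wait_for_ready_badly(void) {
    while (ready == 0)
        ;   /* spin: burn CPU cycles doing nothing until another thread sets ready = 1 */
}
```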
^^^ BAD BAD BAD ^^^ → it wastes CPU cycles spinning instead of being blocked, and it's error prone.
Tips:
Criteria for a good lock: mutual exclusion (does it actually keep threads out of the critical section?), fairness (does every waiting thread eventually get the lock?), and performance (how much overhead does it add?).
If we were on a single processor, one way would be to disable interrupts when acquiring the lock, and re-enable them when releasing it.
Of course this solution is bad: we're trusting applications too much, it doesn't work on multiprocessors, and if interrupts are disabled for too long the OS might miss the fact that some I/O operations finished.
This could nonetheless be used by the OS itself for some of its own data structures.
If we just use a simple integer flag variable, two problems arise: correctness (with an unlucky interleaving, two threads can both see the flag as 0 and both set it, so both enter the critical section) and performance (a waiting thread just spins, wasting CPU).
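Fixing the correctness problem needs hardware support, e.g. an atomic test-and-set. A minimal sketch using C11 atomics (the type and function names are my own):

```c
#include <stdatomic.h>

/* A spin lock built on an atomic test-and-set (here via atomic_exchange):
 * setting flag to 1 and reading the old value happens as one indivisible
 * step, so two threads can no longer both see 0 and both enter. */
typedef struct { atomic_int flag; } spinlock_t;

void spin_init(spinlock_t *l)   { atomic_store(&l->flag, 0); }

void spin_lock(spinlock_t *l) {
    while (atomic_exchange(&l->flag, 1) == 1)
        ;   /* someone else holds it: spin-wait (this is the performance cost) */
}

void spin_unlock(spinlock_t *l) { atomic_store(&l->flag, 0); }
```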
This type of lock (a spin lock) provides mutual exclusion, but not fairness or performance (both suffer because of spin-waiting). It also requires the scheduler to be preemptive: if you spin-lock on a non-preemptive scheduler, the spinning thread will never relinquish the CPU.
Compare-and-swap: checks the value of a variable and, if it holds the expected value, atomically sets it to a new value.
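Its semantics, written out as C pseudocode (in reality the whole thing is a single atomic hardware instruction):

```c
/* Pseudocode for the semantics of compare-and-swap; not actually atomic here. */
int compare_and_swap(int *ptr, int expected, int new_value) {
    int old = *ptr;
    if (old == expected)
        *ptr = new_value;
    return old;     /* the caller can check whether the swap happened */
}
```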
Fetch-and-add: a single hardware instruction that atomically increments a value and returns the old value. Very powerful.
With this you can implement a ticket lock (each thread atomically takes a ticket; when the turn counter equals a thread's ticket, it's that thread's turn), which ensures fairness too; see the sketch below.
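A minimal sketch of a ticket lock built on fetch-and-add, here via C11's atomic_fetch_add (type and function names are my own):

```c
#include <stdatomic.h>

typedef struct {
    atomic_int ticket;   /* next ticket to hand out */
    atomic_int turn;     /* ticket currently being served */
} ticketlock_t;

void ticket_init(ticketlock_t *l) {
    atomic_store(&l->ticket, 0);
    atomic_store(&l->turn, 0);
}

void ticket_lock(ticketlock_t *l) {
    int myturn = atomic_fetch_add(&l->ticket, 1);  /* atomically grab a ticket */
    while (atomic_load(&l->turn) != myturn)
        ;   /* spin until it's my turn; tickets are served in order (fairness) */
}

void ticket_unlock(ticketlock_t *l) {
    atomic_fetch_add(&l->turn, 1);   /* serve the next ticket */
}
```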
Yield! Whenever you cannot grab a lock, yield, so you give up the CPU (moving your state from running to ready).
This is better than spin-waiting, but it's still not optimal, here's why: with many threads, each one may still run, find the lock held, and yield, so we still pay for lots of context switches; and a thread can starve, never getting the lock.
Basically, we keep a queue of waiting threads: when unlocking, a thread wakes up the next thread in the queue; when locking, a thread puts itself on the queue (and goes to sleep) if it can't acquire the lock.
How about this: if the lock can't be grabbed, the thread spins for a while hoping to acquire it; if it still can't, in a second phase it's put to sleep until the lock becomes free later → a hybrid (two-phase) approach.
There are counters, linked lists, queues, and hash maps with locking built in, so they're concurrent data structures.
What if you want to check that a condition is true before continuing your execution? And of course you don't want to check a shared variable while spinning (wasting CPU cycles).
A condition variable is a queue that threads can put themselves on when waiting for some condition. Some other thread, when it changes said state, can then wake one or more waiting threads (signaling).
pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *m);
release the lock, put calling thread to sleep. after waking up, it will re-acquire the lock.
pthread_cond_signal(pthread_cond_t *c);
signal another thread to wake up.
Do you need a lock when waiting on a condition variable? Yes, you do.
And do you need the lock when signaling? Not always, but most likely yes.
Why? Let's say you check whether done == 0; it is, so you want to go to sleep. But right before you actually go to sleep, the other thread sets done = 1 and calls signal() to wake you up. You're not even asleep yet, though, you were just about to be! So when you get the CPU back, you go to sleep and will never be woken up again. Sad death.
To build a proper bounded buffer, you need both a lock (to safely check the shared state) and condition variables (the producer waits for the buffer to have free space, the consumer waits for it to have data).
For a proper working solution, you need two condition variables, not just one.
If you use just one, with two consumers: after a consumer takes an item from the buffer, it signals. But the wait queue can contain both the producer and the second consumer; if the second consumer is the one woken up, it sees the buffer is empty and goes back to sleep, and all three threads can be left sleeping forever.
Consumers should not wake other consumers, only producers. And vice-versa.
Therefore: use two condition variables, e.g. one (empty) that producers wait on and consumers signal, and one (fill) that consumers wait on and producers signal.
By doing this, a consumer can never accidentally wake a consumer, and a producer can never accidentally wake a producer.
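A minimal sketch of the two-condition-variable bounded buffer (buffer size and helper names are illustrative):

```c
#include <pthread.h>

#define MAX 16
int buffer[MAX];
int fill_ptr = 0, use_ptr = 0, count = 0;

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  empty = PTHREAD_COND_INITIALIZER;  /* signaled when a slot frees up */
pthread_cond_t  fill  = PTHREAD_COND_INITIALIZER;  /* signaled when an item arrives */

void put(int v) { buffer[fill_ptr] = v; fill_ptr = (fill_ptr + 1) % MAX; count++; }
int  get(void)  { int v = buffer[use_ptr]; use_ptr = (use_ptr + 1) % MAX; count--; return v; }

void producer_put(int value) {
    pthread_mutex_lock(&mutex);
    while (count == MAX)                    /* buffer full: wait for space */
        pthread_cond_wait(&empty, &mutex);
    put(value);
    pthread_cond_signal(&fill);             /* wakes only consumers */
    pthread_mutex_unlock(&mutex);
}

int consumer_get(void) {
    pthread_mutex_lock(&mutex);
    while (count == 0)                      /* buffer empty: wait for data */
        pthread_cond_wait(&fill, &mutex);
    int tmp = get();
    pthread_cond_signal(&empty);            /* wakes only producers */
    pthread_mutex_unlock(&mutex);
    return tmp;
}
```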
Semaphores allow us to replace both locks and condition variables.
A semaphore is an object with an integer value that we can manipulate with two routines: sem_wait() (decrement the value, and wait if it becomes negative) and sem_post() (increment the value, and wake one waiting thread if there is one).
When the value of the semaphore is negative, its absolute value equals the number of waiting threads (useful to know).
You can use a binary semaphore as a lock (with one holder and one waiter, the value ranges over [-1, 1]):
On Linux, semaphore values can never be negative; the semantics above, where the value can go negative, are Dijkstra's original formulation, though.
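With the POSIX API (where the value stays non-negative), a sketch of a semaphore used as a lock:

```c
#include <semaphore.h>

sem_t m;

void lock_init(void) { sem_init(&m, 0, 1); }  /* initial value 1: the lock is free */
void lock(void)      { sem_wait(&m); }        /* decrement; block if the value is already 0 */
void unlock(void)    { sem_post(&m); }        /* increment; wake a waiter if there is one */
```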
Semaphores can also be used for ordering, and for solving the producer consumer problem.
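A minimal sketch of ordering with a semaphore: the parent waits until the child thread has run (the semaphore starts at 0, so sem_wait blocks until the child posts). Names are illustrative.

```c
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t s;

void *child(void *arg) {
    printf("child\n");
    sem_post(&s);            /* signal: "I'm done" */
    return NULL;
}

int main(void) {
    pthread_t c;
    sem_init(&s, 0, 0);      /* initial value 0 */
    pthread_create(&c, NULL, child, NULL);
    sem_wait(&s);            /* blocks until the child posts */
    printf("parent: child finished\n");
    pthread_join(c, NULL);
    return 0;
}
```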
Why do you need a mutex lock too, three semaphores in total? Yes: what if you have two producers that put a value into the buffer at the same time, and the first producer is interrupted just before it manages to advance the fill counter? The second producer will overwrite the first producer's element! That's why the buffer update itself must also be protected (see the sketch below).
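A minimal sketch of this solution (a pthread mutex protects the buffer indices; empty and full count the free and used slots; buffer size and helper names are illustrative):

```c
#include <pthread.h>
#include <semaphore.h>

#define MAX 16
int buffer[MAX];
int fill_ptr = 0, use_ptr = 0;

sem_t empty, full;   /* initialize once: sem_init(&empty, 0, MAX); sem_init(&full, 0, 0); */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void put(int value) { buffer[fill_ptr] = value; fill_ptr = (fill_ptr + 1) % MAX; }
int  get(void)      { int tmp = buffer[use_ptr]; use_ptr = (use_ptr + 1) % MAX; return tmp; }

void producer_put(int value) {
    sem_wait(&empty);              /* wait for a free slot */
    pthread_mutex_lock(&mutex);    /* protect fill_ptr and the buffer */
    put(value);
    pthread_mutex_unlock(&mutex);
    sem_post(&full);               /* one more item available */
}

int consumer_get(void) {
    sem_wait(&full);               /* wait for an item */
    pthread_mutex_lock(&mutex);
    int tmp = get();
    pthread_mutex_unlock(&mutex);
    sem_post(&empty);              /* one more free slot */
    return tmp;
}
```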
If a thread wants to update a data structure, it will acquire the write lock, and then release it when done.
If a reader wants to read, it first acquires the lock, then increments the readers variable to track how many readers are currently inside the data structure. When the first reader acquires that lock, it also acquires the write lock, so that no one can write. More readers can then acquire the read lock too. Writers, though, have to wait until all readers have exited the critical section and called sem_post().
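A sketch of that reader-writer lock, built from semaphores (the structure and names follow the description above):

```c
#include <semaphore.h>

typedef struct {
    sem_t lock;       /* protects the readers counter */
    sem_t writelock;  /* held by one writer, or by the whole group of readers */
    int   readers;    /* number of readers currently inside */
} rwlock_t;

void rwlock_init(rwlock_t *rw) {
    rw->readers = 0;
    sem_init(&rw->lock, 0, 1);
    sem_init(&rw->writelock, 0, 1);
}

void rwlock_acquire_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers++;
    if (rw->readers == 1)
        sem_wait(&rw->writelock);   /* the first reader locks writers out */
    sem_post(&rw->lock);
}

void rwlock_release_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers--;
    if (rw->readers == 0)
        sem_post(&rw->writelock);   /* the last reader lets writers back in */
    sem_post(&rw->lock);
}

void rwlock_acquire_writelock(rwlock_t *rw) { sem_wait(&rw->writelock); }
void rwlock_release_writelock(rwlock_t *rw) { sem_post(&rw->writelock); }
```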
To eat, a philosopher needs both forks, but there is only one fork between each pair of neighboring philosophers.
A simple solution like this doesn't work because of deadlock. If each philosopher happens to grab the fork on their left before any philosopher can grab the fork on their right, each will be stuck holding one fork and waiting for another, forever.
The simplest way to fix this: make one philosopher grab the forks in a different order from the others. That's it! It breaks the cycle of waiting.
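A minimal sketch of that fix (5 philosophers, one semaphore per fork initialized to 1; the last philosopher grabs the forks in the opposite order):

```c
#include <semaphore.h>

#define N 5
sem_t forks[N];

void init_forks(void) {
    for (int i = 0; i < N; i++)
        sem_init(&forks[i], 0, 1);    /* each fork starts available */
}

int left(int p)  { return p; }            /* fork to philosopher p's left  */
int right(int p) { return (p + 1) % N; }  /* fork to philosopher p's right */

void get_forks(int p) {
    if (p == N - 1) {                 /* the one philosopher who differs */
        sem_wait(&forks[right(p)]);
        sem_wait(&forks[left(p)]);
    } else {
        sem_wait(&forks[left(p)]);
        sem_wait(&forks[right(p)]);
    }
}

void put_forks(int p) {
    sem_post(&forks[left(p)]);
    sem_post(&forks[right(p)]);
}
```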
A semaphore also lets us implement simple thread throttling. Say there's a region of code that, if too many threads enter it at once, uses so much memory that it crashes the system.
You can just initialize the value of the semaphore to the maximum number of threads allowed. Put sem_wait() and sem_post() around the region, and that’s it!
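A minimal sketch of this throttling (the limit and function names are illustrative):

```c
#include <semaphore.h>

#define MAX_THREADS 8
sem_t throttle;

void throttle_init(void) { sem_init(&throttle, 0, MAX_THREADS); }

void memory_hungry_region(void) {
    sem_wait(&throttle);   /* blocks if MAX_THREADS threads are already inside */
    /* ... the code that uses lots of memory ... */
    sem_post(&throttle);   /* leave: let another thread in */
}
```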