Processes that try to acquire a common resource should never, in any situation, reach :-
a) Deadlock — A situation where no process can proceed. When every process is waiting on another to proceed, none of them makes progress, and the system comes to a halt.
Ex. Say we have processes P1 and P2 and resources R1 and R2. P1 holds R1 and is waiting to acquire R2, while P2 holds R2 and is waiting for R1. Each is waiting for a resource held by the other, so neither can proceed.
b) Starvation — A situation where a process waiting for a resource is never granted access, even though the resource repeatedly becomes available.
Ex. Say P1 wants to access R1, but P2…PN are always prioritised ahead of P1 and are always in need of the resource. P1 never gets its turn at R1 and is starved.
In a single system there may be multiple processes trying to access a resource concurrently, whether to read from it or write to it.
Ex. In a database, multiple threads may want to read from or write to the same row of a table.
In a single system this is resolved with a semaphore. A semaphore can be thought of as a lock on a resource.
As a real-world analogy, the resource could be a meeting room, the semaphore a person who restricts or grants access to the meeting room, and the people who want to use the room the processes.
A very basic semaphore (a binary semaphore) would need the following methods :-
a. Wait — Grants access to the resource if it is available; otherwise the caller waits till the resource becomes available.
b. Signal — Can be invoked only by the process that holds the resource. Invoking this method releases the hold on the resource, and other processes can then try to acquire it.
It should have the following variables :-
a. boolean locked — True if the resource is being used, false otherwise.
b. Process p — The process that is currently using the resource.
c. List<Process> waiting — The queue of processes waiting to access the resource.
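The structure above can be sketched as a small Python class. This is an illustrative, hypothetical implementation built on `threading.Condition` (real semaphores live in the kernel, or in Python's own `threading.Semaphore`); the field names mirror the list above:

```python
import threading
from collections import deque

class BinarySemaphore:
    """Sketch of a binary semaphore: one holder, FIFO queue of waiters."""
    def __init__(self):
        self._cond = threading.Condition()
        self.locked = False        # True if the resource is being used
        self.p = None              # thread currently using the resource
        self.waiting = deque()     # queue of threads waiting for the resource

    def wait(self):
        """Acquire the resource if available, else block until it is."""
        me = threading.current_thread()
        with self._cond:
            self.waiting.append(me)
            # proceed only when the resource is free and we are first in line
            while self.locked or self.waiting[0] is not me:
                self._cond.wait()
            self.waiting.popleft()
            self.locked = True
            self.p = me

    def signal(self):
        """Release the resource; only the holder may call this."""
        with self._cond:
            if self.p is not threading.current_thread():
                raise RuntimeError("signal() may only be invoked by the holder")
            self.locked = False
            self.p = None
            self._cond.notify_all()   # wake waiters; the head of the queue proceeds

sem = BinarySemaphore()
sem.wait()
print(sem.locked)    # True: the resource is held
sem.signal()
print(sem.locked)    # False: the resource is free again
```

The FIFO queue also prevents starvation within one machine: a waiter cannot be overtaken indefinitely by later arrivals.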
Advanced use cases can be built over the binary semaphore.
In a distributed system, managing access to a resource gets complicated. The different ways to restrict access are :-
There can be a central manager (co-ordinator) that restricts access to the resource. Each process sends a (request) packet to acquire the resource, and waits till it receives a (grant) packet in response acknowledging access. Once it is done with the critical section, it sends a (release) packet to the co-ordinator.
Say systems S1 and S2 want access, with S0 as the central co-ordinator.
S1 (request) -> S0
S2 (request) -> S0
S0 (grant) -> S1
S1 proceeds; S2 waits as it has not received a grant.
S1 (release) -> S0
S0 (grant) -> S2
S2 can now access the resource.
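The exchange above can be modelled with a toy co-ordinator. This is a hypothetical single-process sketch where method calls stand in for the (request), (grant) and (release) packets:

```python
from collections import deque

class Coordinator:
    """Sketch of the central co-ordinator: one holder at a time, FIFO waiters."""
    def __init__(self):
        self.holder = None     # system currently granted the resource
        self.queue = deque()   # systems waiting for a grant

    def request(self, system):
        # (request) packet: grant immediately if free, else queue the sender
        if self.holder is None:
            self.holder = system
            return "grant"
        self.queue.append(system)
        return "wait"

    def release(self, system):
        # (release) packet: hand the grant to the next waiter, if any
        assert system == self.holder, "only the holder may release"
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder     # the system that receives the next (grant)

coord = Coordinator()
g1 = coord.request("S1")   # "grant": S1 proceeds
g2 = coord.request("S2")   # "wait": S2 has not received a grant
nxt = coord.release("S1")  # "S2": the grant moves to S2
print(g1, g2, nxt)
```

Every drawback listed below follows from this shape: all state lives in one object, so if it dies or slows down, everyone is affected.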
1. The co-ordinator is a single point of failure.
2. If the system holding the lock dies mid-process, the release packet is never received by the co-ordinator, leading to deadlock. This can be mitigated by putting an upper cap on lock-holding time, but given time-synchronisation issues in distributed systems, and cases where access time can be unbounded, this may not always work.
3. A process cannot distinguish between a dead co-ordinator and simply not being granted access. This can be overcome with a deny packet, sent when a process requests the resource and is not granted it. The tradeoff is that the process may end up polling the co-ordinator multiple times for access.
4. The co-ordinator is a performance bottleneck.
Let's say there are n systems that share a resource R. A process P can access the resource when any k > n/2 of the systems grant it access.
In other words, each system holds one grant. Whenever a system wants to access the resource, it broadcasts its request to all the systems. If it hears back from k systems, where k > n/2, it proceeds to access the resource. A system sends back a response only when it has not already granted access to any other system.
Say S1…Sn are the systems and R is the resource. Say S1 requests the resource. S1…Sn all receive the request, and S1…S(n/2+1) respond, granting S1 access to the resource.
Say now S2 wants access to R and sends its request. S2 can hear back from at most n - (n/2+1) systems, which is less than n/2, and hence it does not proceed to access the resource.
Say now S1 releases the resource and informs all the nodes. When S2 requests the resource again, its request is granted.
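The majority-vote scheme can be sketched as follows. This is a hypothetical single-process model (names are mine): each `VotingSystem` holds one grant, and "staying silent" is modelled as returning `False`:

```python
class VotingSystem:
    """One of the n systems: holds a single grant for the shared resource."""
    def __init__(self):
        self.granted_to = None

    def request(self, requester):
        # respond Ok only if we have not granted access to anyone else
        if self.granted_to is None:
            self.granted_to = requester
            return True
        return False

    def release(self, requester):
        if self.granted_to == requester:
            self.granted_to = None

def try_acquire(requester, systems):
    votes = sum(s.request(requester) for s in systems)
    return votes > len(systems) // 2     # need k > n/2 grants

systems = [VotingSystem() for _ in range(5)]     # n = 5, majority is 3
first = try_acquire("S1", systems)    # True: S1 collects all 5 grants
second = try_acquire("S2", systems)   # False: no grants left for S2
for s in systems:
    s.release("S1")                   # S1 releases and informs all nodes
third = try_acquire("S2", systems)    # True: S2 now gets the majority
print(first, second, third)
```

Because two strict majorities of the same set must overlap, at most one requester can ever collect k > n/2 grants at a time, which is what makes the scheme safe.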
- When multiple systems need the resource concurrently, none of them may obtain a majority, and resource utilisation drops.
When a system wants access to a resource it sends the request to all other systems.
Say S1….Sn are the systems. Sk wants to access the resource R. Hence it sends the request to S1…Sn.
We assume that delivery is reliable and all systems are able to receive the request packet.
Any system S would do one of the following :-
a. It sends Ok if it is not accessing the resource.
b. If it is accessing the resource, it queues the request.
c. If it has itself already requested the resource and then receives a request, it breaks the tie by comparing the timestamp of the incoming message with the one it sent to everyone. The lower timestamp wins: if its own request has the lower timestamp, it queues the incoming request; if it has the higher timestamp, it sends Ok.
This is similar to the centralised algorithm, except that every system now acts as a co-ordinator.
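Rules a to c, which are essentially the reply logic of the Ricart-Agrawala algorithm, can be written as one pure function. The state names are my own shorthand, and the note about exact ties is an assumption the text does not spell out:

```python
def on_request(my_state, my_timestamp, incoming_timestamp):
    """How a system replies to an incoming request.
    my_state is one of: 'idle' (not interested), 'using' (in the critical
    section), 'wanting' (has already sent out its own request)."""
    if my_state == "idle":
        return "ok"                       # rule a: not accessing, send Ok
    if my_state == "using":
        return "queue"                    # rule b: defer until we release
    # rule c: both want the resource; the lower timestamp wins
    # (real implementations break exact ties with process ids)
    if my_timestamp < incoming_timestamp:
        return "queue"                    # our request is older: defer theirs
    return "ok"                           # their request is older: let them go

print(on_request("idle", None, 5))        # ok
print(on_request("using", None, 5))       # queue
print(on_request("wanting", 3, 5))        # queue: our timestamp 3 wins
print(on_request("wanting", 7, 5))        # ok: their timestamp 5 wins
```

A requester enters the critical section only after receiving Ok from every other system, which is why a single silent node blocks everyone, as noted below.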
- Multiple points of failure, as any system crash (silence) would be treated as denial of access to the resource. This could be mitigated by adding an explicit denial reply.
- Scalability would be a challenge as each node is now involved in decision making.
A token is circulated in the network. A system can access the resource only while it holds the token. When it is done with the access, or does not want to access the resource, it forwards the token to the next system.
This can be thought of as a ring of systems, where each system knows the next system in the loop and forwards the token to it when done.
- If the token is lost, it needs to be regenerated. Detecting that the token is lost is complicated, as some system may simply be holding it rather than the token having been lost.
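The ring can be sketched like this. It is a hypothetical single-process model where returning `self.next` stands in for forwarding the token over the network:

```python
class RingNode:
    """A system in the ring; knows only its successor."""
    def __init__(self, name):
        self.name = name
        self.next = None            # the next system in the loop

    def receive_token(self, wants, served):
        if self.name in wants:      # holding the token: safe to enter
            served.append(self.name)
            wants.discard(self.name)
        return self.next            # done (or not interested): forward the token

# build a ring S1 -> S2 -> S3 -> S1
nodes = [RingNode(f"S{i}") for i in range(1, 4)]
for node, successor in zip(nodes, nodes[1:] + nodes[:1]):
    node.next = successor

wants, served = {"S2", "S3"}, []
node = nodes[0]
while wants:                        # circulate until every request is served
    node = node.receive_token(wants, served)
print(served)   # ['S2', 'S3']: requests are served in ring order
```

Mutual exclusion is trivial here, since there is exactly one token, and starvation is impossible, since the token visits every node; the hard problems are token loss and node failure.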
Another solution: combining the centralised, decentralised and DHT-based approaches
Let's say we have systems S1…Sn that act as co-ordinators. Let's say every resource R has a replication factor Rep, and is replicated on Rep of the servers S1…Sn.
As in a DHT, all the servers know the owners of the resource R.
Say Sys1….Sysk want to access the resource R.
The request can be sent to any system S1…Sn. Whenever a system receives a request for resource R, it broadcasts the message to S1…Sn. All the systems that hold a replica of R send back an Ok if they have not already given the grant to a different process (similar to the decentralised approach). If the requesting system receives Ok from more than Rep/2 servers, it is granted access.
The same drawback of low resource utilisation arises if several systems want to access the resource simultaneously and the grants are split between them.
A deadlock-breaking strategy could be applied: if a system is not granted access to the resource, it sends a release trigger. On receiving a release trigger, the systems that still need the resource broadcast their requests again.
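Putting the pieces together in a sketch: the replica list below stands in for the Rep servers the DHT maps resource R to, and releasing partial votes after a failed attempt is a simplified stand-in for the release trigger. All names are hypothetical:

```python
class ReplicaServer:
    """One of the Rep servers holding a replica of resource R."""
    def __init__(self):
        self.granted_to = None

    def request(self, proc):
        # Ok only if the grant has not gone to a different process
        if self.granted_to is None:
            self.granted_to = proc
            return True
        return False

    def release(self, proc):
        if self.granted_to == proc:
            self.granted_to = None

def acquire(proc, replicas):
    oks = [r for r in replicas if r.request(proc)]
    if len(oks) > len(replicas) // 2:    # Ok from more than Rep/2 servers
        return True
    for r in oks:                        # not granted: give back partial votes,
        r.release(proc)                  # so contenders can retry (release trigger)
    return False

replicas = [ReplicaServer() for _ in range(3)]   # Rep = 3
a = acquire("Sys1", replicas)   # True: Sys1 gets all three votes
b = acquire("Sys2", replicas)   # False: no votes available, Sys2 backs off
for r in replicas:
    r.release("Sys1")           # Sys1 is done with the resource
c = acquire("Sys2", replicas)   # True: Sys2 now gets the majority
print(a, b, c)
```

Without the give-back step, two contenders could each hold a minority of votes forever; returning votes on failure is what lets one of the retries eventually assemble a majority.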
Thoughts, suggestions? Which one do you use, and why?