PostgreSQL-Based Distributed Lock

Distributed Systems

Many teams across different companies have started using cloud infrastructure extensively, and the popularity of cloud computing in business continues to grow. Industry experts predict the market will grow past $200 billion in 2020, which makes it one of the most rapid advancements in IT in recent years. It is a clear indication that, as investments in cloud computing increase, the need for new tools increases as well, mainly on the best-known trio of cloud platforms: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud (GC).

Distributed Lock

Concurrency and non-determinism are the biggest reasons multi-threaded programming is painful: when multiple threads operate on the same region of memory, each run may produce a different result. Conceptually, distributed locks are similar to multi-threading locks; the only difference is that threading locks take place on the machine itself, whereas distributed locks take place in the environment (in my experiment, the AWS cloud). I like to think of a distributed lock as a TV remote control: at any given time only one thread holds the controller (the lock), much like whoever holds the remote in your family determines who watches what, and what comes next.

Distributed Solutions

Of course, as always, there have been different approaches to solving this determinism problem for multiple jobs, such as:

  • ZooKeeper: In a nutshell, ZooKeeper gives you the ability to build server applications by creating a master and workers (at least 3 processes are required). ZooKeeper then manages which node becomes the next master if the master goes down, and which worker does the work at any given time. The most common problem in this approach is leader/worker election in different failure scenarios, as nicely laid out in this article: https://www.embedded.com/distributed-software-design-challenges-and-solutions/

  • Using a 3rd-Party Lock Holder: In this approach you use either a database or a file to determine and hold/release a lock. The biggest challenge in this solution is ensuring the atomicity of the operation, because you need to make sure only one guy holds the remote controller and the power to change the channel (modify the resource). To keep the data accurate, there must be only one process in the server fleet modifying the shared resource at a time. There are also consensus protocols that solve this problem (e.g. Raft, Paxos), but I found them a bit too hard to implement for my case. Nevertheless, I will leave a couple of links for future exploration.

PostgreSQL-Based Lock

For this experiment I set up two different machines, each with database access. I used pg_advisory_lock to create a unique lock with a unique lockid, so each time the primary or secondary worker accesses the table for an RMW (read-modify-write) operation it has to acquire the pg_lock, and every time it is done it must release the lock.

Ingredients:

  • 2 EC2 instances
  • Each instance has 2 S3 workers (which write to the same S3 resource)
  • Writers (workers) run on timers (to test functionality)
  • S3 workers are distinguished as Primary / Secondary
  • Only the Primary worker has the ability to write, while the Secondary worker continuously checks whether the Primary is down

To modify the shared resource (an S3 file), each worker needs to know what the others are doing. For the same S3 topic, only one instance at a time should be active as Primary.

In the C++ code I also wanted to wrap the advisory-lock call in a class, in order to release the lock reliably on destruction, similar to this:

        // lock functions
        void lock(const int lockid) const;
        void unlock(const int lockid) const;

        // RAII-style lock wrapper
        class lockguard {
        public:
            explicit lockguard(const psqldb::PsqlDb* ptrdb, const int lockid) :
                    ptrDb_(ptrdb), lockid_(lockid) {
                // acquire the advisory lock on construction
                ptrDb_->lock(lockid_);
            }

            ~lockguard() {
                // release the advisory lock on destruction
                ptrDb_->unlock(lockid_);
            }

            // non-copyable: a copy would release the lock twice
            lockguard(const lockguard&) = delete;
            lockguard& operator=(const lockguard&) = delete;

        private:
            const psqldb::PsqlDb* ptrDb_;
            const int lockid_;
        };

In the code, when you want to acquire the lock, you need to pass 2 parameters:

  • a pointer to the database whose connection is in session
  • a lockid. This lockid should be the same for all workers doing the same job, i.e. the processes trying to access the same resource (in our case an AWS S3 file)

Whenever the lockguard is unlocked or destroyed (goes out of scope), the lock is released.

In Action

Basically, the workflow in the database, leveraging PostgreSQL, is as follows:

The RDS manager is responsible for the database connection and for employing a worker for each S3 recorder. Each worker is responsible for executing the Primary/Secondary policy at fixed time intervals. All workers share the same database connection through the RDS manager, which has its own queue to keep transactions in order.

  • Make sure the database connection credentials are set correctly, with information similar to this:
    [postgredb]
    host=exampleEndPoint.com (AWS endpoint)
    dbname=exampleDatabase
    port=xxxx (5432 default)
    user=exampleuser
    password=xxxxxx
  • Make sure each machine has a UNIQUE name (AWS IAM, AWS IP address, etc.)
  • lockid: Make sure each S3 recorder has a UNIQUE lockid, an integer between 1 and 998 (999 is reserved for Create / Insert Table Records)
  • If primary is true for a recorder, the config files on all other instances/machines should set it to false for this specific recorder; otherwise they will create contention (on a first-come, first-served basis)
  • Each machine inserts its name / time / num_primary / num_secondary information into the info_recorders table, then starts its own primary & secondary recorders
  • Workers check the database every 0.5 seconds (500 milliseconds)
  • The Primary updates its Time & Primary-Up fields every 0.5 seconds
  • The Secondary checks the Time || Primary-Down fields every 0.5 seconds
  • If 2 seconds elapse without an update, the Secondary takes over
  • If the Primary goes down for any reason, all of its locks are released at the database level (pg_advisory_lock)
  • Primary and Secondary workers lock only their own record, by lockid, when executing DB read or write commits
  • Primary and Secondary machines should have identical config files, except for the machinename field and their respective recorders marked as primary=true or primary=false

Why pg_advisory_lock in PostgreSQL?

In this example a session-level pg_advisory_lock is used. The good thing about this type of lock is that when the connection goes down, or the instance dies, the lock is released by itself without anyone needing to interact with PostgreSQL, which dodges deadlock situations.

You can also read about advisory locks in detail here: https://vladmihalcea.com/how-do-postgresql-advisory-locks-work/

Conclusion

The PostgreSQL lock table works well with a good design, and it is cost-efficient: rather than using 3 machines/instances, you can leverage a PostgreSQL lock table as the main communication point. But every step of the locking process, and the lock time intervals, should be well thought out. I experimented with only 2 workers and only 1 writer at a time; with more writers things get complicated real fast, and the lock-table, write-policy, and reader-election design gets much more difficult.