Broken File Lock System in Linux


File Lock

The file system is probably one of the first things developed for an operating system, and I cannot believe how far Linux has progressed without improving its file-handling abilities: locking, swapping, modifying.

Linux File System

The file system discussed here is the HDD file system that is written on disk (not RAM or OS pages). Linux supports dozens of file system types, including NFS, ext3, and XFS.

Every file system starts with a namespace, which covers the name and length of each file and its logical structure on disk. The second thing we need is the metadata of the file: creation time, last access time, last modification time, and so on. I will not get into the directory structure of Linux file systems; all we need to know is that, no matter the file system type, Linux provides APIs to access files through the kernel. When you access a file from multiple processes, Linux provides a few handy-dandy locking mechanisms.
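
The metadata mentioned above can be read through the kernel with stat(2), whatever the underlying file system is. The following is only a minimal sketch; the FileTimes struct and the readFileMetadata name are made up for this illustration. Note that st_ctime is the time of the last status (inode) change, not the creation time.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <ctime>
#include <string>

// Hypothetical holder for the metadata fields discussed in the text.
struct FileTimes {
    off_t  size;          // length of the file in bytes
    time_t accessed;      // last access time
    time_t modified;      // last modification time
    time_t statusChanged; // last inode change (NOT the creation time)
};

// Fills `out` from stat(2); returns false if the file cannot be inspected.
bool readFileMetadata(const std::string& path, FileTimes& out) {
    struct stat st{};
    if (stat(path.c_str(), &st) == -1) {
        return false;
    }
    out.size = st.st_size;
    out.accessed = st.st_atime;
    out.modified = st.st_mtime;
    out.statusChanged = st.st_ctime;
    return true;
}
```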

And some known file locks to be discussed here:

  • F_SETLK: the POSIX locking API, called through fcntl()
  • lockf(): similar to the above
  • flock(): BSD locking

POSIX type (F_SETLK)

The POSIX lock is the most portable lock type, and on paper it works over NFS. It also does byte-range locking: while working on a file you can lock only a part (region) of it. (Practice always differs from theory.)

The problem arises when you want to use it between threads. Each lock is bound to the process, not to the thread, so it only works as a process-wide lock and cannot keep one thread of a process away from another. That makes this kind of lock really dangerous in a multi-threaded program. For example, if thread A locks the file and thread B of the same process then reads it, thread B is not blocked and thread A never learns that thread B went in. On top of that, closing any file descriptor that refers to the file releases every lock the process holds on it. There is no communication between threads through the lock, so threads need their own synchronization (a mutex, for example).

lockf()

lockf() has the same problems as F_SETLK above; on Linux it is just an interface on top of fcntl() locking. On top of that, there are many file systems (NFS, NFS+), and none of them properly defines how these file locks behave, so the locks are almost impossible to use “correctly”.

flock()

So far flock() is the only lock I could get to work, but it still suffers from some problems such as network delay and latency. When I try to lock a file, the call might take a second or so while another process is trying to access the same file. flock() works on file descriptors (fd): the lock is inherited across fork() and exec(), and it is released only by an explicit unlock or once every descriptor that refers to the open file has been closed. This Linux-style locking is not portable either, and for it to work over NFS you need a Linux kernel of 2.6.12 or later.

Calling inter-process locking on Ubuntu / C++ using fcntl()

To use and test the fcntl-type lock, I created a small C++ wrapper that behaves like a mutex guard: create it whenever you need to read or write the file, and it unlocks when it goes out of scope {}. The purpose of this lock is to get an inter-process lock out of an NFS-type lock file. It is not perfect in most cases, but it is worth considering when a latency of 2+ seconds is acceptable.

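#ifndef FNCTLRAIILOCK_HPP
#define FNCTLRAIILOCK_HPP

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <glog/logging.h>

/*
 * fcntl struct wrapper: describes a write lock over the whole file
 */
struct FnctlRAIILock{
    struct flock locker_{};          // zero-initialized
    FnctlRAIILock(){
        locker_.l_type = F_WRLCK;    // exclusive (write) lock
        locker_.l_whence = SEEK_SET; // offsets are relative to the start of the file
        locker_.l_start = 0;         // from the first byte ...
        locker_.l_len = 0;           // ... to the end of the file (0 means "until EOF")
    }
};

/*
 * RAII Type Lock
 */
class fnctlRaiiLock {
public:
    // lock on construction, no default constructor
    explicit fnctlRaiiLock(int fd) : fileDescriptor_(fd) {
        // F_SETLKW waits for the lock; retry if a signal interrupts the wait
        int rc;
        do {
            rc = fcntl(fileDescriptor_, F_SETLKW, &raiLocker_.locker_);
        } while (rc == -1 && errno == EINTR);
        if (rc == -1) {
            LOG(ERROR) << "fcntl lock failed: " << std::strerror(errno);
        }
    }

    // unlock on destruction
    ~fnctlRaiiLock(){
        raiLocker_.locker_.l_type = F_UNLCK;
        if (fcntl(fileDescriptor_, F_SETLK, &raiLocker_.locker_) == -1) { // unlock
            LOG(ERROR) << "fcntl unlock failed: " << std::strerror(errno);
        }
        fileDescriptor_ = -1; // reset fd; the caller still owns and closes it
    }

    fnctlRaiiLock(const fnctlRaiiLock&) = delete;            // no copy
    fnctlRaiiLock& operator=(const fnctlRaiiLock&) = delete; // no assignment

private:
    int fileDescriptor_;
    FnctlRAIILock raiLocker_;
};


#endif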

Every time you acquire the fcntl lock, you need to keep the file descriptor, which is an integer. The lock should be non-copyable so that it can be created and destroyed within a scope. I used F_SETLKW so that the call waits until the lock is granted. I did not lock a specific byte range or part of the file: with the offset and length fields of the struct left at zero, the lock covers the whole file. The lock commands are described in the Linux man page: http://man7.org/linux/man-pages/man2/fcntl.2.html

       F_SETLKW (struct flock *)
              As for F_SETLK, but if a conflicting lock is held on the file,
              then wait for that lock to be released.  If a signal is caught
              while waiting, then the call is interrupted and (after the
              signal handler has returned) returns immediately (with return
              value -1 and errno set to EINTR; see signal(7)).

Finally, we can wrap our lock in functions to test it:


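    // Sample test functions (used together with the lock class above)
    #include <cerrno>
    #include <chrono>
    #include <cstring>
    #include <string>
    #include <thread>

    void lockFileToWrite(const std::string& filenametowrite) {
        int fd = open(filenametowrite.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd == -1) {
            LOG(ERROR) << "Cannot open " << filenametowrite << ": " << std::strerror(errno);
            return;
        }
        {
            fnctlRaiiLock guardfnctl(fd); // locked
            LOG(INFO) << "Writing";
            std::this_thread::sleep_for(std::chrono::milliseconds(3000));
        } // unlocked when the guard goes out of scope
        LOG(INFO) << "Sleeping";
        std::this_thread::sleep_for(std::chrono::milliseconds(4000));
        close(fd);
    }


    void readLockedFile(const std::string& filenametoread) {
        // O_RDWR: the guard takes a write lock, which needs a writable descriptor
        int fd = open(filenametoread.c_str(), O_RDWR);
        if (fd == -1) {
            LOG(ERROR) << "Cannot open " << filenametoread << ": " << std::strerror(errno);
            return;
        }
        {
            fnctlRaiiLock guardfnctl(fd); // waits here while another process holds the lock
            std::this_thread::sleep_for(std::chrono::milliseconds(2000));
            LOG(INFO) << "Reading";
        } // unlocked
        close(fd);
    }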

Conclusion

File locks are not reliable in most cases. But if a latency of around 2+ seconds within your network is acceptable and your reads and writes are infrequent, the fcntl() lock can be used on an EFS file system. I tested the functions above on AWS EC2 and EFS within the same region, and they work pretty much as expected.

For better locking in general, or for inter-process communication, database tables can be used. AWS has also demonstrated an example with DynamoDB:

https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/

Or you might be better off with a leader election algorithm like the one used by ZooKeeper, where you need 3 or more machines (1 leader, 2 followers):

https://zookeeper.apache.org/doc/r3.5.0-alpha/zookeeperOver.html#:~:targetText=ZooKeeper%20is%20simple.,similar%20to%20files%20and%20directories.

In the next article I will try to explain a distributed lock built on Postgres, influenced by the AWS article above.