Getting to Know POSIX #2: The Purpose and Inner Workings of the VFS (Virtual Filesystem)

NAVER CLOUD PLATFORM

Aug 24, 2021

Technical stories told first-hand by NAVER Cloud developers

Part 1 covered POSIX, the application interface standard, from its background and significance through to the types and characteristics of the various Linux filesystems.

👇 Getting to Know POSIX, Part 1 👇 Types and characteristics of Linux filesystems (NAVER Cloud Platform blog, blog.naver.com)

Part 2 looks at the VFS (Virtual Filesystem), the layer that lets applications access many different filesystems in a consistent way.

* All code in this article is based on Linux kernel v5.11.

VFS (Virtual Filesystem)

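The VFS (Virtual Filesystem) is a virtual layer between applications and filesystems. It lets applications access different filesystems in a uniform way.

First, suppose the VFS did not exist. If a hard disk were split into three partitions mounted as EXT2, EXT4, and XFS, a user task would have to call EXT2-specific functions to access a file stored on EXT2, EXT4-specific functions for a file on EXT4, and likewise XFS-specific functions for a file on XFS.

Without the VFS, then, the user would have to work out which filesystem holds each file and call the matching functions directly.

Now consider the case where the VFS exists.

Applications can now reach any filesystem through the same functions, such as open(), read(), and write().

The VFS inspects the file named by the arguments to determine which filesystem manages it, calls the filesystem-specific function that corresponds to the generic function the user invoked, and passes that function's return value back to the application.

In other words, the VFS lets an application access files through the uniform POSIX standard interface without having to consider which filesystem stores them.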
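
The uniform interface described above is visible from user space. The following is a minimal sketch, not code from the kernel: read_prefix() is a hypothetical helper name, and the point is only that nothing in it mentions EXT2, EXT4, or XFS.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `len` bytes from the start of `path` into `buf`.
 * The same open()/read()/close() sequence is used whatever filesystem
 * holds the file; the kernel picks the filesystem-specific code.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_prefix(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ssize_t n = read(fd, buf, len);
    if (n < 0)
        perror("read");

    close(fd);
    return n;
}
```

On a typical Linux system the same call should work unchanged for a file on an EXT4 partition, a file on an XFS partition, or a pseudo-file such as /proc/version.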

VFS Objects

Let us look more closely at how the VFS is implemented. To exchange data with the various filesystems, the VFS uses four objects: super block, inode, file, and dentry.

1. super block

A super block object describes the structure of a mounted filesystem. There is one per mounted filesystem.

When a filesystem is mounted, the VFS hands an empty super block object to the filesystem's own mount-time routine (ext4_fill_super() in the EXT4 example below), which fills in the object's contents and returns. The super block object itself is created and initialized by alloc_super(), defined in 'fs/super.c'.

The actual definition of super_block shows more detail.

* super_block is defined in 'include/linux/fs.h'.

struct super_block {
    struct list_head            s_list; /* Keep this first */
    dev_t                      s_dev; /* search index; _not_ kdev_t */
    unsigned char              s_blocksize_bits;
    unsigned long              s_blocksize;
    loff_t                      s_maxbytes; /* Max file size */
    struct file_system_type    *s_type;
    const struct super_operations        *s_op;
    ...
    unsigned long              s_magic;
    struct dentry              *s_root;
    ...
};

Of these fields, s_op deserves a closer look.

* super_operations is defined in 'include/linux/fs.h'.

struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);
    void (*free_inode)(struct inode *);
    void (*dirty_inode) (struct inode *, int flags);
    int (*write_inode) (struct inode *, struct writeback_control *wbc);
    int (*drop_inode) (struct inode *);
    void (*evict_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    int (*freeze_super) (struct super_block *);
    int (*freeze_fs) (struct super_block *);
    int (*thaw_super) (struct super_block *);
    int (*unfreeze_fs) (struct super_block *);
    int (*statfs) (struct dentry *, struct kstatfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*umount_begin) (struct super_block *);
    int (*show_options)(struct seq_file *, struct dentry *);
    int (*show_devname)(struct seq_file *, struct dentry *);
    int (*show_path)(struct seq_file *, struct dentry *);
    int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
    struct dquot **(*get_dquots)(struct inode *);
#endif
    int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
    long (*nr_cached_objects)(struct super_block *, struct shrink_control *);
    long (*free_cached_objects)(struct super_block *, struct shrink_control *);
};

The super_operations structure is a table of function pointers for filesystem-level operations. When a filesystem is mounted, its own functions are attached to the s_op field of the super_block.

Take mounting an EXT4 filesystem as an example.

* The code below is defined in 'fs/ext4/super.c'.

static int ext4_fill_super(struct super_block *sb, void *data, int silent) 
{
    ...
    sb->s_op = &ext4_sops;
    ...
}
static const struct super_operations ext4_sops = {
    .alloc_inode        = ext4_alloc_inode,
    .free_inode         = ext4_free_in_core_inode,
    .destroy_inode      = ext4_destroy_inode,
    .write_inode        = ext4_write_inode,
    .dirty_inode        = ext4_dirty_inode,
    .drop_inode         = ext4_drop_inode,
    .evict_inode        = ext4_evict_inode,
    .put_super          = ext4_put_super,
    .sync_fs            = ext4_sync_fs,
    .freeze_fs          = ext4_freeze,
    .unfreeze_fs        = ext4_unfreeze,
    .statfs            = ext4_statfs,
    .remount_fs        = ext4_remount,
    .show_options      = ext4_show_options,
#ifdef CONFIG_QUOTA
    .quota_read         = ext4_quota_read,
    .quota_write        = ext4_quota_write,
    .get_dquots         = ext4_get_dquots,
#endif
    .bdev_try_to_free_page = bdev_try_to_free_page,
};

As the code shows, EXT4's own functions are assigned to the corresponding function pointers. From then on, they can be called through the s_op field of the EXT4 filesystem's super_block structure.
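
The pattern above (a filesystem fills in an operations table at mount time, and generic code calls through it) can be modelled in ordinary user-space C. This is a simplified sketch under stated assumptions: every toy_ and toyfs_ name, the magic number, and the return values are invented for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct toy_super_block;

/* Per-filesystem operations table, loosely modelled on super_operations. */
struct toy_super_operations {
    long (*statfs)(const struct toy_super_block *sb);  /* here: block size */
    int  (*sync_fs)(struct toy_super_block *sb);
};

struct toy_super_block {
    unsigned long s_magic;
    const struct toy_super_operations *s_op;
    int synced;                      /* set by sync_fs in this sketch */
};

/* Callbacks of an imaginary "toyfs" filesystem. */
static long toyfs_statfs(const struct toy_super_block *sb)
{
    (void)sb;
    return 4096;                     /* made-up block size */
}

static int toyfs_sync_fs(struct toy_super_block *sb)
{
    sb->synced = 1;
    return 0;
}

static const struct toy_super_operations toyfs_sops = {
    .statfs  = toyfs_statfs,
    .sync_fs = toyfs_sync_fs,
};

/* Plays the role of ext4_fill_super(): the filesystem fills in the object. */
int toyfs_fill_super(struct toy_super_block *sb)
{
    sb->s_magic = 0x1234;            /* made-up magic number */
    sb->s_op = &toyfs_sops;
    sb->synced = 0;
    return 0;
}

/* Generic layer: it knows the table, never the concrete filesystem. */
long generic_statfs(const struct toy_super_block *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    return sb->s_op->statfs ? sb->s_op->statfs(sb) : -1;
}

int generic_sync(struct toy_super_block *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    return sb->s_op->sync_fs ? sb->s_op->sync_fs(sb) : 0;
}
```

Supporting a second filesystem in this model would mean writing another fill function and another table; generic_statfs() and generic_sync() would stay unchanged.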

2. inode

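An inode object holds the metadata of a particular file, and there is one per file. (The VFS inode object is not the same thing as the on-disk inode of the EXT family of filesystems.) An inode object is allocated by calling alloc_inode(), typically when a file is created or when an existing file is first brought into memory.

* alloc_inode() is defined in 'fs/inode.c'.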

static struct inode *alloc_inode(struct super_block *sb)
{
    const struct super_operations *ops = sb->s_op;
    struct inode *inode;
    if (ops->alloc_inode)
        inode = ops->alloc_inode(sb);
    ...
    return inode;
}

Because the s_op field of the super_block was initialized at mount time, calling ops->alloc_inode(sb) on an EXT4 filesystem invokes ext4_alloc_inode(). In this way the filesystem's own function is reached through super_operations to allocate and initialize the inode object.
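
The following user-space sketch models that dispatch, together with one common way a filesystem can keep private data next to the generic inode: it embeds the generic structure in a larger one and recovers the outer structure from the inner pointer. All toy_ and toyfs_ names are hypothetical, and the fallback to a plain allocation when no callback is registered is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_super_block;

struct toy_inode {
    unsigned long i_ino;
    int fs_private;              /* 1 if a filesystem callback allocated it */
};

struct toy_super_operations {
    struct toy_inode *(*alloc_inode)(struct toy_super_block *sb);
    void (*free_inode)(struct toy_inode *inode);
};

struct toy_super_block {
    const struct toy_super_operations *s_op;
};

/* Filesystem-private inode that embeds the generic one. */
struct toyfs_inode_info {
    unsigned int i_flags;        /* data only "toyfs" understands */
    struct toy_inode vfs_inode;  /* generic part handed to the upper layer */
};

/* Recover the containing structure from a pointer to its embedded member. */
static struct toyfs_inode_info *TOYFS_I(struct toy_inode *inode)
{
    return (struct toyfs_inode_info *)
        ((char *)inode - offsetof(struct toyfs_inode_info, vfs_inode));
}

static struct toy_inode *toyfs_alloc_inode(struct toy_super_block *sb)
{
    (void)sb;
    struct toyfs_inode_info *ei = calloc(1, sizeof(*ei));
    if (ei == NULL)
        return NULL;
    ei->i_flags = 0x80;          /* made-up private value */
    ei->vfs_inode.fs_private = 1;
    return &ei->vfs_inode;
}

static void toyfs_free_inode(struct toy_inode *inode)
{
    free(TOYFS_I(inode));
}

const struct toy_super_operations toyfs_sops = {
    .alloc_inode = toyfs_alloc_inode,
    .free_inode  = toyfs_free_inode,
};

unsigned int toyfs_flags(struct toy_inode *inode)
{
    return TOYFS_I(inode)->i_flags;
}

/* Generic allocator: use the callback if present, else a plain allocation. */
struct toy_inode *toy_alloc_inode(struct toy_super_block *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->alloc_inode)
        return sb->s_op->alloc_inode(sb);
    return calloc(1, sizeof(struct toy_inode));
}

void toy_free_inode(struct toy_super_block *sb, struct toy_inode *inode)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->free_inode)
        sb->s_op->free_inode(inode);
    else
        free(inode);
}
```

The caller of toy_alloc_inode() only ever sees a struct toy_inode pointer, which is one way to picture why the generic inode object and a filesystem's own inode are different things.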

The definition of the inode structure shows what the object contains.

* inode is defined in 'include/linux/fs.h'.

struct inode {
    umode_t              i_mode;
    unsigned short        i_opflags;
    kuid_t                i_uid;
    kgid_t                i_gid;
    ...
    const struct inode_operations    *i_op;
    struct super_block    *i_sb;
    ...
    loff_t                  i_size;
    struct timespec64      i_atime;
    struct timespec64      i_mtime;
    struct timespec64      i_ctime;
    ...
};

Like the s_op field of super_block, the i_op field holds the filesystem-specific operations.

* inode_operations is defined in 'include/linux/fs.h'.

struct inode_operations {
    struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
    const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
    int (*permission) (struct inode *, int);
    struct posix_acl * (*get_acl)(struct inode *, int);
    int (*readlink) (struct dentry *, char __user *,int);
    int (*create) (struct inode *,struct dentry *, umode_t, bool);
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,umode_t);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
    int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
    int (*update_time)(struct inode *, struct timespec64 *, int);
    int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode);
    int (*tmpfile) (struct inode *, struct dentry *, umode_t);
    int (*set_acl)(struct inode *, struct posix_acl *, int);
} ____cacheline_aligned;

As shown, the inode_operations structure is a table of function pointers for inode-related operations.
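
The metadata kept in the inode (fields such as i_mode, i_uid, and i_size above) is what user space sees through stat(). The sketch below only reads a few of those values; struct file_meta and get_file_meta() are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* A few metadata values copied out of the result of stat(). */
struct file_meta {
    mode_t mode;        /* type and permission bits (compare i_mode) */
    uid_t  uid;         /* owner (compare i_uid) */
    off_t  size;        /* size in bytes (compare i_size) */
    int    is_regular;  /* 1 for a regular file, 0 otherwise */
};

/* Fill `out` for `path`. Returns 0 on success, -1 on error. */
int get_file_meta(const char *path, struct file_meta *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    out->mode = st.st_mode;
    out->uid = st.st_uid;
    out->size = st.st_size;
    out->is_regular = S_ISREG(st.st_mode) ? 1 : 0;
    return 0;
}
```

As with open() and read(), the call is the same whichever filesystem stores the file.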

3. file

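A file object holds the information associated with an open file. One is created each time a task opens the file.

When two or more tasks access the same file, there is still only one inode object. Information such as the position (offset) each task is accessing, however, must be kept separately for each of them. The file object exists to hold this per-open state.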
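
The per-open offset can be observed from user space. In this sketch (the function names are hypothetical) the same path is opened twice, so the two descriptors have separate offsets; a descriptor obtained with dup(), by contrast, shares the offset of the original. For brevity both opens happen in one process rather than in two tasks.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open `path` twice, read `n` bytes through the first descriptor only,
 * then report the current offset of each descriptor.
 * Returns 0 on success, -1 on error.
 */
int offsets_after_two_opens(const char *path, size_t n,
                            off_t *first, off_t *second)
{
    char buf[64];
    int rc = -1;

    assert(n <= sizeof(buf));
    int fd1 = open(path, O_RDONLY);  /* first open */
    int fd2 = open(path, O_RDONLY);  /* second, independent open */

    if (fd1 >= 0 && fd2 >= 0 && read(fd1, buf, n) >= 0) {
        *first = lseek(fd1, 0, SEEK_CUR);
        *second = lseek(fd2, 0, SEEK_CUR);
        rc = 0;
    }
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return rc;
}

/*
 * Same experiment, but the second descriptor comes from dup(), so both
 * descriptors refer to one open file and share its offset.
 */
int offsets_after_dup(const char *path, size_t n, off_t *orig, off_t *copy)
{
    char buf[64];
    int rc = -1;

    assert(n <= sizeof(buf));
    int fd1 = open(path, O_RDONLY);
    int fd2 = (fd1 >= 0) ? dup(fd1) : -1;

    if (fd1 >= 0 && fd2 >= 0 && read(fd1, buf, n) >= 0) {
        *orig = lseek(fd1, 0, SEEK_CUR);
        *copy = lseek(fd2, 0, SEEK_CUR);
        rc = 0;
    }
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return rc;
}
```

After reading 5 bytes through the first descriptor, the first function should report offsets 5 and 0, and the second should report 5 and 5.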

* file is defined in 'include/linux/fs.h'.

struct file {
    ...
    struct path                f_path;
    struct inode                *f_inode; /* cached value */
    const struct file_operations    *f_op;
    ...
    atomic_long_t              f_count;
    unsigned int                f_flags;
    fmode_t                    f_mode;
    struct mutex                f_pos_lock;
    loff_t                      f_pos;
    ...
};

The file_operations table in the f_op field likewise holds the filesystem-specific operations for files.

* file_operations is defined in 'include/linux/fs.h'.

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
    int (*iopoll)(struct kiocb *kiocb, bool spin);
    int (*iterate) (struct file *, struct dir_context *);
    int (*iterate_shared) (struct file *, struct dir_context *);
    __poll_t (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    unsigned long mmap_supported_flags;
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **, void **);
    long (*fallocate)(struct file *file, int mode, loff_t offset, loff_t len);
    void (*show_fdinfo)(struct seq_file *m, struct file *f);
#ifndef CONFIG_MMU
    unsigned (*mmap_capabilities)(struct file *);
#endif
    ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
    loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags);
    int (*fadvise)(struct file *, loff_t, loff_t, int);
} __randomize_layout;

When the user calls a generic function such as open(), read(), or write(), the filesystem-specific function attached to the file_operations structure runs (for EXT4: ext4_file_open, ext4_file_read_iter, and ext4_file_write_iter).
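
A toy model may help show how one generic read entry point can serve a filesystem that registers only read_iter, as EXT4 does in the example above. The sketch is an assumption-laden simplification: the toy_ names are invented, the read_iter signature is reduced to a plain buffer, and the order of the checks (read first, then read_iter, else an error) is how the kernel's generic read path is understood to behave, not a copy of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Reduced operations table: two alternative ways to implement reading. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
    long (*read_iter)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    const char *data;   /* backing bytes of the imaginary file */
    size_t size;        /* number of bytes in `data` */
    size_t f_pos;       /* current offset, kept per open file */
};

/* "toyfs" read: copy from the backing bytes at f_pos, then advance f_pos. */
static long toyfs_read_iter(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->f_pos;

    if (len > left)
        len = left;
    memcpy(buf, f->data + f->f_pos, len);
    f->f_pos += len;
    return (long)len;
}

/* Only read_iter is registered; read is left NULL. */
const struct toy_file_operations toyfs_file_operations = {
    .read_iter = toyfs_read_iter,
};

/* Generic entry point: pick whichever hook the filesystem provided. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL && buf != NULL);
    if (f->f_op->read)
        return f->f_op->read(f, buf, len);
    if (f->f_op->read_iter)
        return f->f_op->read_iter(f, buf, len);
    return -EINVAL;     /* the filesystem cannot read at all */
}
```

The caller of toy_vfs_read() never needs to know which hook was used, which is the property the VFS gives to read().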

4. dentry

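A dentry (directory entry) object holds directory-related information: it ties one component of a path to its inode.

dentry objects are used for directory-specific work such as path lookup. For example, opening the file /home/user/name involves four dentry objects (the root directory, the home directory, the user directory, and the file name). In other words, a dentry object is created for every component of the path, including regular files.

The root directory is the only dentry with no parent above it (in the kernel, its d_parent points to itself); every other dentry has a parent. A dentry may also have children.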
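
The parent links can be pictured with a small user-space model. This is only a sketch: struct toy_dentry and its helpers are hypothetical, and the convention that the root is its own parent is adopted here so the upward walk has a clear stopping point.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a dentry: a name plus a link to the parent. */
struct toy_dentry {
    const char *d_name;
    struct toy_dentry *d_parent;   /* the root points to itself */
};

/* The root is the only entry that is its own parent. */
int toy_is_root(const struct toy_dentry *d)
{
    assert(d != NULL);
    return d->d_parent == d;
}

/* Count the entries on the way from `d` up to the root, inclusive. */
int toy_path_length(const struct toy_dentry *d)
{
    int n = 1;

    while (!toy_is_root(d)) {
        d = d->d_parent;
        n++;
    }
    return n;
}
```

Building the chain root, home, user, name and calling toy_path_length() on the last entry gives 4, matching the four dentry objects counted for /home/user/name.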

* dentry is defined in 'include/linux/dcache.h'.

struct dentry {
    ...
    struct dentry *d_parent;      /* parent directory */
    struct qstr d_name;
    struct inode *d_inode;        /* Where the name belongs to - NULL is
                                  * negative */
    unsigned char d_iname[DNAME_INLINE_LEN];    /* small names */
    ...
    const struct dentry_operations *d_op;
    struct super_block *d_sb;            /* The root of the dentry tree */
    ...
    struct list_head d_child;            /* child of parent list */
    struct list_head d_subdirs;           /* our children */ 
    ...
} __randomize_layout;

The dentry_operations table in the d_op field likewise holds the filesystem-specific operations for directory entries.

VFS Object Structure

The relationship between the four VFS objects described above (super block, inode, file, dentry) can be drawn as follows.

The figure above shows the case in which two tasks have opened the same file.

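When a file is opened, the objects are linked as in the figure and a file descriptor is returned. Through the file descriptor, the f_op field of the file object can be reached and the filesystem-specific read and write functions called.

(The f_op, d_op, i_op, and s_op fields of file, dentry, inode, and super_block all exist to call filesystem-specific functions.)

File descriptors 0, 1, and 2 are normally already set up as stdin (standard input), stdout (standard output), and stderr (standard error), so the first file a process opens is usually given descriptor 3.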
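
Descriptor numbering follows from a simple rule: a new descriptor gets the lowest number not currently in use. The sketch below checks that rule from user space. lowest_fd_is_reused() is a hypothetical helper, and it assumes a single-threaded process with /dev/null available.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open /dev/null twice, close the first descriptor, then open again.
 * Returns 1 if the third open reuses the number that was just freed,
 * 0 if it does not, and -1 if /dev/null could not be opened.
 */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);

    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return -1;
    }

    close(a);                            /* frees the lower number */
    int c = open("/dev/null", O_RDONLY); /* should get that number back */
    int reused = (c == a);

    close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```

In a process where 0, 1, and 2 are already open, the same rule is why the first open() returns 3.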

Up to kernel v3.8 the file structure had no f_inode field, and the inode structure had to be reached through the dentry structure. From around Linux kernel v3.9, an f_inode field was added to the file structure to cache the inode pointer.

The f_path field leads to the dentry structure. More precisely, the file structure contains f_path, which is a path structure, and the path structure contains dentry, a pointer to the dentry structure.

* The path structure is defined in 'include/linux/path.h'.

struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
} __randomize_layout;

The File open Operation

Next, how the VFS behaves when a file is opened.

When a file is opened, the VFS performs four main steps.

1. Allocate a file descriptor: alloc_fd()
2. Allocate a file object: __alloc_file()
3. Initialize the file object and call the filesystem-specific open function: do_dentry_open()
4. Register the file descriptor in the file descriptor table: fd_install()

When user space calls open(), sys_open() runs in kernel space.

sys_open() calls do_sys_open(), which in turn calls do_sys_openat2().

* do_sys_openat2() is defined in 'fs/open.c'.

static long do_sys_openat2(int dfd, const char __user *filename,
                          struct open_how *how)
{
    struct open_flags op;
    int fd = build_open_flags(how, &op);
    ...
    fd = get_unused_fd_flags(how->flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
        if (IS_ERR(f)) {
            put_unused_fd(fd);
            fd = PTR_ERR(f);
        } else {
            fsnotify_open(f);
            fd_install(fd, f);
        }
    }
    putname(tmp);
    return fd;
}

Note the calls to get_unused_fd_flags(), do_filp_open(), and fd_install(). In order, do_sys_openat2() does the following.

1. get_unused_fd_flags() ends up calling alloc_fd(), which allocates a file descriptor and returns it.
- The kernel picks the smallest descriptor number that the process is not currently using.
2. do_filp_open() allocates and initializes the file object and calls the filesystem-specific open function.
3. fd_install() registers the descriptor allocated by alloc_fd() in the file descriptor table.
4. When the function finishes, it returns the file descriptor. This is how a user task that calls open() receives the fd number of the file it opened.
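
The reserve, open, then install-or-release sequence in the steps above can be sketched with a toy descriptor table in user space. Everything here is hypothetical (the toy_ names, the fixed table size, and the rule that passing NULL stands in for a failed open); it mirrors only the control flow of do_sys_openat2(), not its implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

struct toy_file {
    const char *name;
};

struct toy_fdtable {
    int reserved[TOY_MAX_FDS];            /* 1 if the number is taken */
    struct toy_file *files[TOY_MAX_FDS];  /* installed file objects */
};

/* Step 1: reserve the lowest free descriptor number. */
int toy_get_unused_fd(struct toy_fdtable *t)
{
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (!t->reserved[fd]) {
            t->reserved[fd] = 1;
            return fd;
        }
    }
    return -EMFILE;
}

/* Undo a reservation when the open itself failed. */
void toy_put_unused_fd(struct toy_fdtable *t, int fd)
{
    t->reserved[fd] = 0;
}

/* Step 3: make the descriptor point at the file object. */
void toy_fd_install(struct toy_fdtable *t, int fd, struct toy_file *f)
{
    t->files[fd] = f;
}

/* Whole sequence; a NULL `f` stands in for a failed open (step 2). */
int toy_open(struct toy_fdtable *t, struct toy_file *f)
{
    assert(t != NULL);

    int fd = toy_get_unused_fd(t);
    if (fd >= 0) {
        if (f == NULL) {
            toy_put_unused_fd(t, fd);  /* release the reserved number */
            fd = -ENOENT;
        } else {
            toy_fd_install(t, fd, f);
        }
    }
    return fd;                         /* step 4: number or negative errno */
}
```

Because a failed open releases its reservation, the next successful toy_open() gets the same number, which is the role put_unused_fd() plays in the kernel code above.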

Of these, do_filp_open() is worth a closer look. do_filp_open() calls path_openat().

* path_openat() is defined in 'fs/namei.c'.

static struct file *path_openat(struct nameidata *nd,
                                const struct open_flags *op, unsigned flags)
{
    struct file *file;
    int error;
    file = alloc_empty_file(op->open_flag, current_cred());
    if (IS_ERR(file))
        return file;
    if (unlikely(file->f_flags & __O_TMPFILE)) {
        ...
    } else if (unlikely(file->f_flags & O_PATH)) {
        ...
    } else {
        ...
        if (!error)
            error = do_open(nd, file, op);
        ...
    }
    ...
    return ERR_PTR(error);
}

Note the calls to alloc_empty_file() and do_open().

1. alloc_empty_file() calls __alloc_file() to allocate the file object.
2. do_open() calls do_dentry_open() by way of vfs_open().

* do_dentry_open() is defined in 'fs/open.c'.

static int do_dentry_open(struct file *f,
                          struct inode *inode,
                          int (*open)(struct inode *, struct file *))
{
    static const struct file_operations empty_fops = {};
    int error;
    path_get(&f->f_path);
    f->f_inode = inode;
    f->f_mapping = inode->i_mapping;
    f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
    f->f_sb_err = file_sample_sb_err(f);
    if (unlikely(f->f_flags & O_PATH)) {
        f->f_mode = FMODE_PATH | FMODE_OPENED;
        f->f_op = &empty_fops;
        return 0;
    }
    ...
    f->f_op = fops_get(inode->i_fop);
    ...
    if (!open)
        open = f->f_op->open;
    if (open) {
        error = open(inode, f);
        if (error)
            ...
    }
    ...
    return 0;
}

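do_dentry_open() initializes the newly allocated file object and calls the filesystem-specific open function through the object's f_op field. With the EXT4 example from earlier, calling f->f_op->open() runs ext4_file_open().

The path from the user task's open() call to ext4_file_open() can be summarized as follows.

open() -> sys_open() -> do_sys_open() -> do_sys_openat2() -> do_filp_open() -> path_openat() -> do_open() -> vfs_open() -> do_dentry_open() -> f->f_op->open() -> ext4_file_open()

Now that the file descriptor, file object, dentry object, inode object, and super_block object are all linked together, the user task can use the file descriptor with generic functions such as read() and write() to work with the file.

This article has covered the purpose of the VFS (Virtual Filesystem) and how it works.

To summarize, the VFS is the layer that lets applications use a common interface regardless of the actual filesystem underneath. The VFS applies not only to filesystems but also to device drivers and procfs, so devices too can be accessed through the same open(), read(), and write() interface.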