katzekit 2 - FUSE File Systems

A file system is software that stores and organizes computer files. The hard drive we talk about in everyday life is only a storage medium. Think of it as a blank sheet of paper: the paper does not know how to divide data into blocks, and it cannot guarantee that every bit will still be correct after sitting there for a while. The file system is what manages that sheet of paper. Besides storing metadata, some file systems, such as ZFS and Btrfs, also checksum the data they store, so corruption caused by a failing storage medium is detected instead of being silently passed on to the user.

Because a file system is software, it can also provide extra features such as permissions, encryption, compression, and snapshots. Those features live at the file-system layer, so users do not need to care about their internal implementation; they only interact with the exposed interfaces. File systems can even be built on top of other file systems, as in projects like cvsfs-fuse.

Before user-space file systems became practical, file-system development was mostly the job of kernel engineers. Building a new file system meant understanding kernel programming and kernel-side concepts such as VFS. As removable storage became more common and the need for custom data-management layers grew, that development model started to feel unnecessarily painful. What people really needed was a simpler way to build file systems, one that could still integrate cleanly with existing operating-system APIs. That is where FUSE (Filesystem in Userspace) comes in.

The goal of this post is to take a quick look at the FUSE API and its design style so later katze development can borrow some of those ideas.

A quick introduction to FUSE

FUSE defines a straightforward API for communication between a file system and the kernel. The kernel side is a module that forwards file-system requests to a user-space process. That process implements the FUSE API and answers the requests, so the file system itself runs entirely in user space.

File systems built with FUSE can link directly against the FUSE library. In other words, you do not need deep kernel knowledge or kernel programming just to build one, which makes file-system development dramatically more approachable.

History and predecessors

User-space file systems were not a brand-new idea when FUSE appeared. A few earlier designs already existed:

LUFS (Linux Userland File System) was a hybrid framework that used a kernel module and a user-space daemon to make an arbitrary number of file systems transparently available to applications;
the Ufo project on Solaris was a global file system that let users treat remote files as though they were local.

One of FUSE’s main goals was to bring this style of file-system implementation to Linux.

FUSE operations

To build a file system with FUSE, you install the FUSE kernel module and use the FUSE library and API set from user space.

In practice, most modern Linux distributions already ship FUSE in their package repositories, and the kernel module is typically available out of the box.

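The fuse_operations structure lists the callbacks a file system can implement. The layout shown here comes from an early version of the FUSE 2.x API, and later releases changed some of these signatures: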

struct fuse_operations {
    int (*getattr)(const char *, struct stat *);
    int (*readlink)(const char *, char *, size_t);
    int (*getdir)(const char *, fuse_dirh_t, fuse_dirfil_t);
    int (*mknod)(const char *, mode_t, dev_t);
    int (*mkdir)(const char *, mode_t);
    int (*unlink)(const char *);
    int (*rmdir)(const char *);
    int (*symlink)(const char *, const char *);
    int (*rename)(const char *, const char *);
    int (*link)(const char *, const char *);
    int (*chmod)(const char *, mode_t);
    int (*chown)(const char *, uid_t, gid_t);
    int (*truncate)(const char *, off_t);
    int (*utime)(const char *, struct utimbuf *);
    int (*open)(const char *, struct fuse_file_info *);
    int (*read)(const char *, char *, size_t, off_t, struct fuse_file_info *);
    int (*write)(const char *, const char *, size_t, off_t, struct fuse_file_info *);
    int (*statfs)(const char *, struct statfs *);
    int (*flush)(const char *, struct fuse_file_info *);
    int (*release)(const char *, struct fuse_file_info *);
    int (*fsync)(const char *, int, struct fuse_file_info *);
    int (*setxattr)(const char *, const char *, const char *, size_t, int);
    int (*getxattr)(const char *, const char *, char *, size_t);
    int (*listxattr)(const char *, char *, size_t);
    int (*removexattr)(const char *, const char *);
};

getattr: get file attributes. This is similar to stat(). st_dev and st_blksize are ignored. Unless use_ino is enabled, st_ino is ignored too;
readlink: read a symbolic link;
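getdir: read directory contents. This callback effectively combines the opendir(), readdir(), and closedir() sequence into one call. The filler callback passed in (the fuse_dirfil_t argument) should be called once for each directory entry. Later FUSE versions deprecate getdir in favor of readdir;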
mknod: create a file node;
mkdir: create a directory;
unlink: delete a file;
rmdir: delete a directory;
symlink: create a symbolic link;
rename: rename a file;
link: create a hard link;
chmod: change file permissions;
chown: change file owner and group;
truncate: change file size;
utime: change access and modification times;
open: open a file;
read: read data from an open file. read() should return exactly the number of bytes requested except at EOF or on error; if it returns fewer, the rest of the buffer is filled with zeroes. The exception is when direct_io is enabled, in which case the return value of the read() system call reflects the return value of this callback;
write: write to a file;
statfs: get file-system status;
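flush: flush cached data, called on each close() of a file descriptor;
release: release an open file once nothing references it any more;
fsync: synchronize file contents to storage;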
setxattr: set extended attributes;
getxattr: get extended attributes;
listxattr: list extended attributes;
removexattr: remove extended attributes.
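
The read contract in the list above can be sketched without linking against FUSE at all. The helper below is hypothetical (mem_read and its parameters are my own names, not part of the FUSE API): it serves a byte range from an in-memory buffer and returns a short count only when the request runs into end of file.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of FUSE: answer a read request from an
 * in-memory buffer `data` holding `len` bytes, the way a read callback
 * would. Returns the number of bytes copied into `buf`, which is exactly
 * `size` unless the request reaches end of file, or a negative errno.
 */
static int mem_read(const char *data, size_t len,
                    char *buf, size_t size, off_t offset)
{
    if (offset < 0)
        return -EINVAL;             /* negative offsets are invalid */
    if ((size_t)offset >= len)
        return 0;                   /* at or past EOF: nothing to read */

    size_t avail = len - (size_t)offset;
    if (size > avail)
        size = avail;               /* short read only because of EOF */

    memcpy(buf, data + offset, size);
    return (int)size;               /* sketch: assumes size fits in an int */
}
```

Calling mem_read("hello world", 11, buf, 5, 6) copies the five bytes "world" into buf. Asking for 16 bytes at the same offset also returns 5, because only five bytes remain before EOF.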

Not all of these operations are required. A working file system can implement only the subset it needs, although a few callbacks such as getattr are essential for it to be useful.
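
To make the "subset" idea concrete, here is a minimal sketch of a callback table with a missing entry. The struct mini_ops, the demo callback, and the call_* helpers are all hypothetical stand-ins so the example compiles without the FUSE headers. The only behavior borrowed from FUSE is the convention that a callback returns 0 or a negative errno, and that a missing operation is reported as ENOSYS, which is my understanding of how the library handles it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down stand-in for struct fuse_operations. */
struct mini_ops {
    int (*readlink)(const char *path, char *buf, size_t size);
    int (*mkdir)(const char *path, mode_t mode);
};

/* Demo callback: a single symlink "/latest" that points at "v2". */
static int demo_readlink(const char *path, char *buf, size_t size)
{
    static const char target[] = "v2";

    if (strcmp(path, "/latest") != 0)
        return -ENOENT;
    if (size == 0)
        return -EINVAL;
    strncpy(buf, target, size - 1);  /* truncate if the buffer is small */
    buf[size - 1] = '\0';            /* always null-terminate */
    return 0;
}

/* Only readlink is filled in; mkdir stays NULL. */
static const struct mini_ops demo_ops = { .readlink = demo_readlink };

/* Dispatch helpers: a NULL callback means "not implemented". */
static int call_readlink(const struct mini_ops *ops, const char *path,
                         char *buf, size_t size)
{
    return ops->readlink ? ops->readlink(path, buf, size) : -ENOSYS;
}

static int call_mkdir(const struct mini_ops *ops, const char *path,
                      mode_t mode)
{
    return ops->mkdir ? ops->mkdir(path, mode) : -ENOSYS;
}
```

With this table, call_readlink(&demo_ops, "/latest", ...) succeeds, while call_mkdir(&demo_ops, "/new", 0755) returns -ENOSYS. A read-only file system can leave every mutating callback unset in the same way.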

For katze, we only need read-side behavior, so the API surface can be simplified quite a bit. Note that the last entry, getprops, is not one of the fuse_operations callbacks listed above:

getattr
getdir
read
statfs
getprops

The real implementation will still add a few internal methods. The main simplification I made was turning file reads into a single read operation that returns only the requested byte range rather than loading the whole file at once. In a production implementation, that path may still need optimization. For files that live directly on the host file system, the file descriptor should probably stay open; otherwise every read call has to open and close the file again, which wastes system calls. For files that live inside an image, that matters much less, because their data is already located step by step through the relay object down to the image file itself.
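
The point about keeping the descriptor open can be sketched roughly as follows. Everything here is an assumption for illustration: struct host_file and the host_file_* functions are hypothetical names, not katze or FUSE APIs. The sketch opens the file lazily on the first read, then serves every later byte-range request with pread() on the same descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a host file whose descriptor stays open. */
struct host_file {
    const char *path;
    int fd;                 /* -1 until the first read opens the file */
};

static void host_file_init(struct host_file *f, const char *path)
{
    f->path = path;
    f->fd = -1;
}

/*
 * Read up to `size` bytes starting at `offset`. Returns the number of
 * bytes read (short only at EOF) or a negative errno value.
 */
static ssize_t host_file_read(struct host_file *f, char *buf,
                              size_t size, off_t offset)
{
    if (f->fd < 0) {
        f->fd = open(f->path, O_RDONLY);    /* opened once, then reused */
        if (f->fd < 0)
            return -errno;
    }

    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(f->fd, buf + done, size - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return -errno;
        }
        if (n == 0)
            break;                          /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

static void host_file_close(struct host_file *f)
{
    if (f->fd >= 0) {
        close(f->fd);
        f->fd = -1;
    }
}
```

Because pread() takes an explicit offset, it never moves the descriptor's file position, so one descriptor can serve out-of-order range requests. The cost is that open descriptors become a resource that has to be released at some point, for example in whatever katze uses as its equivalent of release.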