Fusefilters

Fusefilters

Filesystem algebra. One day. Maybe.

Here is the source (now with a README file :-). It's a hack. You might expect it to compile, but not much more. It does work for me, though.

Background

The idea came from the fusexmp filesystem bundled with fuse. This example implements each method in the fuse interface by calling the corresponding libc function, thus providing a "transparent" filter over the root filesystem.
I realized this was very usefull for implementing small changes in a filesystem semantics. For example, you could convert all filenames to uppercase "on the fly" by slightly modifying the fusexmp source. This is analogous to "filesystem semantic filters", hence the name.

Unfortunatelly, there are two caveats to this approach.
First, on an abstract level, one would wish that a small filter would be expressed with a small source code. Unfortunatelly, the fusexmp filter (which represents the particularly simple identity function) has a quite long source code expressions.
Second, when combining several filters (an operation analogous to function composition), this approach becomes messy since it requires a low level "embedding" of the filters in one big source code.

There is another way to implement filter composition with fuse: just implement each filter separatelly and mount these filesystems in a "piggyback" fashion one on top of the other. It works fine, but adds a lot of overhead to the resulting filesystem.

Fusefilters aims at solving both these problems. Filters expressing simple behaviours should ultimately be described by simple expressions, and these filters would then be combined together in a flexible way to express more complex filesystem behaviours.

Current implementation

For now, I decided to stick to a C implementation of the filters that closely resembles the general shape of a "normal" fuse filesystem. A filter is then made of a set of functions that implement the semantics of each fuse operation. For example, the unlink() function of the afformentioned "uppercase" filter could look like this:

int my_unlink (char *path)
{
	return next_unlink (to_uppercase (path));
}

The set of all "my_<operation>" functions is the input interface of the filter whereas the set of all "next_<operation>" is its output interface. During operation, when an application calls unlink(), the my_unlink() function of the first filter is called, which in turn calls the next_unlink() function which is bound to the my_unlink() function of the second filter, and so on.
One function can also call other functions in the fuse interface: for example, if you want to move a file to a trash folder instead of deleting it, the unlink() function of the trash filter would call the next_rename() function instead.

Binding the output interface

A filter has an input interface and an output interface.
The input interface is the same as with any fuse filesystem : a set of methods implementing filesystem semantics, grouped in the fuse way in a fuse_operations structure. A regular fuse filesystem declares this interface to the fuse library by calling the fuse_main() function. Likewise, a filter declares its input interface through an init() callback function. The process of turning a fuse filesystem into a fusefilter plugin is here very simple : just rewrite the main() function of the filesystem to an init() function that does the same thing and passes it's fuse_operations structure back as a result instead of passing it to the fuse_main() function.

The output interface is a little more troublesome. In the filter's point of view, it is composed of all the references to the next fuse_operations methods that the filter needs to access. In the case of fusexmp, it is for example the references to the lstat() function in the xmp_getattr() method, along with all other libc functions that this filesystems calls.
The current implementation of fusefilters binds the output interface of a plugin by providing the init() callback with a fuse_operation structure of its own. The filter's methods then call these functions explicitly.

Here is a complete example, showing the output interface variable, an unlink() method, the input interface variable and the init() callback:

static struct fuse_operations next;

int my_unlink (char *path)
{
	return next.unlink (to_uppercase (path));
}

static struct fuse_operations my_oper = {
	.unlink = my_unlink
};

struct fuse_operations init (const struct fuse_operations next_oper)
{
	next = next_oper;
	return my_oper;
}

Preprocessor hacks

One of the goals of fusefilters is to express simple operations with short implementations. Now, let's considere the (quite simple) operation of path translation : I want all requests (getattr, unlink, etc.) with a specific path <path1> to be turned into the same request to a new path <prefix>/<path1>. In effect, this would allow to filter the calls to fusexmp so that it can mirror any directory (instead of just the root directory) without the need to modify fusexmp itself.

Well, maybe I'm not familiar with C well enough, or maybe C just isn't the right language to do such things, but all I came up with is the ugly C preprocessor hack used in translate_path.c and null.c. Any suggestion is welcome at this point.

Future developments

With a little added documentation and a configuration system, fusefilters would be quite usable (at least for me). Nonetheless, I would like it to become much more than just that. Here are some of my ideas.

Binary compatible output interface

The idea here is to just take fusexmp in its compiled fuse-application state and plug it into fusefilter. I was unable to do this mainly because I don't know of any way to tell the dynamic linking loader that I want to bind to the output interface of the library I load.

To illustrate, let's say I want to load a library that defines a foo() function. I know (with dlopen() and dlsym()) how to call this function in my own code. But suppose that this foo() function calls an (internally undefined) bar() function. How do I tell the dynamic linking loader that I want all calls of this bar() function to be directed to one of my own functions? What if bar() is a commonplace libc function like lstat()?

In retrospect, I'm not sure this would be very usefull. The fuse_operation interface is very compact, but there are so many ways available to a C library to access the filesystem that just binding the standard libc functions like read() would hardly do the trick. What if the library calls an external program through system()? What about the stdio calls? All in all, C and unix are much too messy for this kind of "semantic transformations".

Higher-level plugins

In the prospect of describing filesystem behaviour in the simplest possible way, it would be nice to be able to write filters in a higher level (functionnal) language tailored to the task and that could be changed on the fly without the need to even unmount the filesystem or recompile anything.

Filter graphs

Right now, the "geometry" of the set of filters is a simple chain. It could be possible to define filters with multiple output interfaces (for example, a unionfs-like filter that would need access to two separate filesystems) and thus express much more complex filter functions in the form of graphs of filters.
Such graphs could even contain loops and thus define recursive semantics at the filesystem level.

Filesystem Algebra

Finally, such a complex and powerfull tool for defining filesystem behaviour would need to be formalized, leading to the definition of a filesystem algebra that would allow all kind of behaviour to be simply deducted from a set of primitives.