Repository: https://github.com/trapexit/mergerfs

MergerFS

mergerfs is a union filesystem geared towards simplifying storage and management of files across numerous commodity storage devices. It is similar to mhddfs, unionfs, and aufs.

Usage: mergerfs -o<options> <branches> <mountpoint>

mergerfs logically merges multiple paths together; think of it as a union of sets. The files or directories acted on or presented through mergerfs are determined by the policy chosen for that particular action.
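As a rough illustration of the "union of sets" idea, the sketch below merges directory listings from several branches in plain Python. It is not how mergerfs is implemented; `merged_listing` and `demo` are hypothetical helpers, and the sketch ignores policies entirely (it only shows which names would be visible, not which branch's copy is used).

```python
import os
import tempfile


def merged_listing(branches, relative_path=""):
    """Return the sorted union of entry names found under relative_path in each branch."""
    names = set()
    for branch in branches:
        directory = os.path.join(branch, relative_path)
        if os.path.isdir(directory):
            # A name present on several branches appears only once in the union.
            names.update(os.listdir(directory))
    return sorted(names)


def demo():
    """Build two throwaway branches and list their union."""
    with tempfile.TemporaryDirectory() as root:
        hdd0 = os.path.join(root, "hdd0")
        hdd1 = os.path.join(root, "hdd1")
        os.makedirs(os.path.join(hdd0, "movies"))
        os.makedirs(os.path.join(hdd1, "music"))
        for branch in (hdd0, hdd1):
            open(os.path.join(branch, "readme.txt"), "w").close()
        return merged_listing([hdd0, hdd1])
```

Here `readme.txt` exists on both branches but shows up once in the merged view, alongside `movies` and `music`.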

See MergerFS Tools for managing mergerfs.

Terminology

  • branch: A base path used in the pool.
  • pool: The mergerfs mount. The union of the branches.
  • relative path: The path in the pool relative to the branch and mount.
  • function: A filesystem call (open, unlink, create, getattr, rmdir, etc.)
  • category: A collection of functions based on basic behavior (action, create, search).
  • policy: The algorithm used to select a file when performing a function.
  • path preservation: Aspect of some policies which includes checking the path for which a file would be created.

Basic Setup

Command Line:

mergerfs -o cache.files=partial,dropcacheonclose=true,category.create=mfs /mnt/hdd0:/mnt/hdd1 /media

/etc/fstab:

/mnt/hdd0:/mnt/hdd1 /media fuse.mergerfs cache.files=partial,dropcacheonclose=true,category.create=mfs 0 0

Systemd:

[Unit]
Description=mergerfs service

[Service]
Type=simple
KillMode=none
ExecStart=/usr/bin/mergerfs \
  -f \
  -o cache.files=partial \
  -o dropcacheonclose=true \
  -o category.create=mfs \
  /mnt/hdd0:/mnt/hdd1 \
  /media
ExecStop=/bin/fusermount -uz /media
Restart=on-failure

[Install]
WantedBy=default.target

Options

These options are the same regardless of whether you use them with the mergerfs commandline program, in fstab, or in a config file.

mount option description
config Path to a config file. Same arguments as below in key=val / ini style format.
branches Colon delimited list of branches.
minfreespace=SIZE The minimum space value used for creation policies. Can be overridden by branch specific option. Understands 'K', 'M', and 'G' to represent kilobyte, megabyte, and gigabyte respectively. (default: 4G)
moveonenospc=BOOL|POLICY When enabled if a write fails with ENOSPC (no space left on device) or EDQUOT (disk quota exceeded) the policy selected will run to find a new location for the file. An attempt to move the file to that branch will occur (keeping all metadata possible) and if successful the original is unlinked and the write retried. (default: false, true = mfs)
inodecalc=passthrough|path-hash|devino-hash|hybrid-hash Selects the inode calculation algorithm. (default: hybrid-hash)
dropcacheonclose=BOOL When a file is requested to be closed call posix_fadvise on it first to instruct the kernel that we no longer need the data and it can drop its cache. Recommended when cache.files=partial|full|auto-full|per-process to limit double caching. (default: false)
symlinkify=BOOL When enabled and a file is not writable and its mtime or ctime is older than symlinkify_timeout files will be reported as symlinks to the original files. Please read more below before using. (default: false)
symlinkify_timeout=UINT Time to wait, in seconds, to activate the symlinkify behavior. (default: 3600)
nullrw=BOOL Turns reads and writes into no-ops. The request will succeed but do nothing. Useful for benchmarking mergerfs. (default: false)
lazy-umount-mountpoint=BOOL mergerfs will attempt to "lazy umount" the mountpoint before mounting itself. Useful when performing live upgrades of mergerfs. (default: false)
ignorepponrename=BOOL Ignore path preserving on rename. Typically rename and link act differently depending on the policy of create (read below). Enabling this will cause rename and link to always use the non-path preserving behavior. This means files, when renamed or linked, will stay on the same filesystem. (default: false)
security_capability=BOOL If false return ENOATTR when xattr security.capability is queried. (default: true)
xattr=passthrough|noattr|nosys Runtime control of xattrs. Default is to passthrough xattr requests. 'noattr' will short circuit as if nothing exists. 'nosys' will respond with ENOSYS as if xattrs are not supported or disabled. (default: passthrough)
link_cow=BOOL When enabled if a regular file is opened which has a link count > 1 it will copy the file to a temporary file and rename over the original. Breaking the link and providing a basic copy-on-write function similar to cow-shell. (default: false)
statfs=base|full Controls how statfs works. 'base' means it will always use all branches in statfs calculations. 'full' is in effect path preserving and only includes branches where the path exists. (default: base)
statfs_ignore=none|ro|nc 'ro' will cause statfs calculations to ignore available space for branches mounted or tagged as 'read-only' or 'no create'. 'nc' will ignore available space for branches tagged as 'no create'. (default: none)
nfsopenhack=off|git|all A workaround for exporting mergerfs over NFS where there are issues with creating files for write while setting the mode to read-only. (default: off)
branches-mount-timeout=UINT Number of seconds to wait at startup for branches to be a mount other than the mountpoint's filesystem. (default: 0)
follow-symlinks=never|directory|regular|all Turns symlinks into what they point to. (default: never)
link-exdev=passthrough|rel-symlink|abs-base-symlink|abs-pool-symlink When a link fails with EXDEV optionally create a symlink to the file instead.
rename-exdev=passthrough|rel-symlink|abs-symlink When a rename fails with EXDEV optionally move the file to a special directory and symlink to it.
readahead=UINT Set readahead (in kilobytes) for mergerfs and branches if greater than 0. (default: 0)
posix_acl=BOOL Enable POSIX ACL support (if supported by kernel and underlying filesystem). (default: false)
async_read=BOOL Perform reads asynchronously. If disabled or unavailable the kernel will ensure there is at most one pending read request per file handle and will attempt to order requests by offset. (default: true)
fuse_msg_size=UINT Set the max number of pages per FUSE message. Only available on Linux >= 4.20 and ignored otherwise. (min: 1; max: 256; default: 256)
threads=INT Number of threads to use. When used alone (process-thread-count=-1) it sets the number of threads reading and processing FUSE messages. When used together it sets the number of threads reading from FUSE. When set to zero it will attempt to discover and use the number of logical cores. If the thread count is set negative it will look up the number of cores then divide by the absolute value. ie. threads=-2 on an 8 core machine will result in 8 / 2 = 4 threads. There will always be at least 1 thread. If set to -1 in combination with process-thread-count then it will try to pick reasonable values based on CPU thread count. NOTE: higher number of threads increases parallelism but usually decreases throughput. (default: 0)
read-thread-count=INT Alias for threads.
process-thread-count=INT Enables separate thread pool to asynchronously process FUSE requests. In this mode read-thread-count refers to the number of threads reading FUSE messages which are dispatched to process threads. -1 means disabled otherwise acts like read-thread-count. (default: -1)
process-thread-queue-depth=UINT Sets the number of requests any single process thread can have queued up at one time. Meaning the total memory usage of the queues is queue depth multiplied by the number of process threads plus read thread count. 0 sets the depth to the same as the process thread count. (default: 0)
pin-threads=STR Selects a strategy to pin threads to CPUs (default: unset)
scheduling-priority=INT Set mergerfs' scheduling priority. Valid values range from -20 to 19. See setpriority man page for more details. (default: -10)
fsname=STR Sets the name of the filesystem as seen in mount, df, etc. Defaults to a list of the source paths concatenated together with the longest common prefix removed.
func.FUNC=POLICY Sets the specific FUSE function's policy. See below for the list of value types. Example: func.getattr=newest
func.readdir=seq|cosr|cor|cosr:INT|cor:INT Sets readdir policy. INT value sets the number of threads to use for concurrency. (default: seq)
category.action=POLICY Sets policy of all FUSE functions in the action category. (default: epall)
category.create=POLICY Sets policy of all FUSE functions in the create category. (default: epmfs)
category.search=POLICY Sets policy of all FUSE functions in the search category. (default: ff)
cache.open=UINT 'open' policy cache timeout in seconds. (default: 0)
cache.statfs=UINT 'statfs' cache timeout in seconds. (default: 0)
cache.attr=UINT File attribute cache timeout in seconds. (default: 1)
cache.entry=UINT File name lookup cache timeout in seconds. (default: 1)
cache.negative_entry=UINT Negative file name lookup cache timeout in seconds. (default: 0)
cache.files=libfuse|off|partial|full|auto-full|per-process File page caching mode (default: libfuse)
cache.files.process-names=LIST A pipe | delimited list of process comm names to enable page caching for when cache.files=per-process. (default: "rtorrent|qbittorrent-nox")
cache.writeback=BOOL Enable kernel writeback caching (default: false)
cache.symlinks=BOOL Cache symlinks (if supported by kernel) (default: false)
cache.readdir=BOOL Cache readdir (if supported by kernel) (default: false)
parallel-direct-writes=BOOL Allow the kernel to dispatch multiple, parallel (non-extending) write requests for files opened with cache.files=per-process (if the process is not in process-names) or cache.files=off. (Requires kernel support, which was added in Linux v6.2.)
direct_io deprecated - Bypass page cache. Use cache.files=off instead. (default: false)
kernel_cache deprecated - Do not invalidate data cache on file open. Use cache.files=full instead. (default: false)
auto_cache deprecated - Invalidate data cache if file mtime or size change. Use cache.files=auto-full instead. (default: false)
async_read deprecated - Perform reads asynchronously. Use async_read=true instead.
sync_read deprecated - Perform reads synchronously. Use async_read=false instead.
splice_read deprecated - Does nothing.
splice_write deprecated - Does nothing.
splice_move deprecated - Does nothing.
allow_other deprecated - mergerfs always sets this FUSE option as normal permissions can be used to limit access.
use_ino deprecated - mergerfs should always control inode calculation so this is enabled all the time.
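The arithmetic described for the threads option above can be sketched as follows. This is a simplified model, not mergerfs code: `resolve_thread_count` is a hypothetical name, the use of integer division is an assumption, and the special handling of -1 in combination with process-thread-count is deliberately left out.

```python
import os


def resolve_thread_count(threads, logical_cores=None):
    """Model the documented rules for threads=INT.

    0        -> use the number of logical cores
    negative -> logical cores divided by the absolute value
    positive -> used as given
    The result is never less than 1.
    """
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1
    if threads == 0:
        count = logical_cores
    elif threads < 0:
        # Assumption: the division truncates to a whole number of threads.
        count = logical_cores // abs(threads)
    else:
        count = threads
    return max(1, count)
```

With these rules, threads=-2 on an 8 core machine gives 4 threads, matching the example in the option description, and threads=-16 on the same machine is clamped to 1.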

Value Types

Type      Value
BOOL      'true' | 'false'
INT       [MIN_INT,MAX_INT]
UINT      [0,MAX_INT]
SIZE      'NNM'; NN = INT, M = 'K' | 'M' | 'G' | 'T'
STR       string (may refer to an enumerated value, see details of argument)
FUNC      filesystem function
CATEGORY  function category
POLICY    mergerfs function policy
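To make the SIZE format concrete, here is a minimal parsing sketch. `parse_size` is a hypothetical helper, and two points are assumptions rather than facts from this page: that the suffixes are binary multiples (1024-based) and that a bare number with no suffix means bytes.

```python
import re

# NN followed by an optional single-letter multiplier, e.g. "4G" or "512M".
_SIZE_RE = re.compile(r"^(\d+)([KMGT]?)$")

# Assumption: suffixes are powers of 1024.
_MULTIPLIERS = {
    "": 1,
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
}


def parse_size(text):
    """Convert a SIZE string such as '4G' into a number of bytes."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid SIZE value: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]
```

Under these assumptions the default minfreespace of 4G corresponds to `parse_size("4G")`, that is 4 * 1024**3 bytes.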

branches

The 'branches' argument is a colon (':') delimited list of paths to be pooled together. It does not matter if the paths are on the same or different filesystems nor does it matter the filesystem type (within reason). Used and available space will not be duplicated for paths on the same filesystem and any features which aren't supported by the underlying filesystem (such as file attributes or extended attributes) will return the appropriate errors.

Branches currently have two options which can be set: a mode, which determines whether or not the branch is included in a policy calculation, and an individual minfreespace value. The values are set by appending an = to the end of a branch designation, followed by the values with commas as delimiters. Example: /mnt/drive=RW,1234

branch mode

  • RW: (read/write) - Default behavior. Will be eligible in all policy categories.
  • RO: (read-only) - Will be excluded from create and action policies. Same as a read-only mounted filesystem would be (though faster to process).
  • NC: (no-create) - Will be excluded from create policies. You can't create on that branch but you can change or delete.
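The branch syntax and the three modes above can be modelled with a short sketch. The names `Branch`, `parse_branches`, and `eligible` are hypothetical, and the sketch assumes the mode always comes first after the = with the minfreespace value second, as in the /mnt/drive=RW,1234 example. Paths containing ':' or '=' are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_MODES = {"RW", "RO", "NC"}


@dataclass
class Branch:
    path: str
    mode: str = "RW"                    # RW is the default behavior
    minfreespace: Optional[str] = None  # per-branch override, kept as text


def parse_branches(spec: str) -> List[Branch]:
    """Split a colon-delimited branches string into Branch records."""
    branches = []
    for item in spec.split(":"):
        if not item:
            continue
        path, sep, options = item.partition("=")
        branch = Branch(path=path)
        if sep:
            fields = options.split(",")
            if fields[0] not in VALID_MODES:
                raise ValueError(f"unknown branch mode: {fields[0]!r}")
            branch.mode = fields[0]
            if len(fields) > 1 and fields[1]:
                branch.minfreespace = fields[1]
        branches.append(branch)
    return branches


def eligible(branch: Branch, category: str) -> bool:
    """Apply the mode rules: RO skips create and action, NC skips create."""
    if category == "create":
        return branch.mode == "RW"
    if category == "action":
        return branch.mode in ("RW", "NC")
    return True  # search policies consider every branch
```

For example, `parse_branches("/mnt/hdd0:/mnt/drive=RW,1234:/mnt/old=RO")` yields three branches, of which only the first two would be considered by a create policy.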

globbing

To make it easier to include multiple branches mergerfs supports globbing. The globbing tokens MUST be escaped when used via the shell, otherwise the shell will expand the glob itself before mergerfs sees it.

# mergerfs /mnt/hdd\*:/mnt/ssd /media

The above line will use every path in /mnt whose name starts with hdd, plus /mnt/ssd.
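The effect of a glob in the branch list can be approximated with Python's glob module. This is only a stand-in for the expansion mergerfs performs internally: `expand_branch_globs` and `demo` are hypothetical helpers, and sorting the matches is an assumption about ordering.

```python
import glob
import os
import tempfile


def expand_branch_globs(spec):
    """Expand each colon-delimited pattern into the paths it matches."""
    paths = []
    for pattern in spec.split(":"):
        matches = sorted(glob.glob(pattern))
        # Keep a pattern that matches nothing as a literal path.
        paths.extend(matches if matches else [pattern])
    return paths


def demo():
    """Create hdd0, hdd1 and ssd in a temp dir and expand 'hdd*:ssd'."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("hdd0", "hdd1", "ssd"):
            os.mkdir(os.path.join(root, name))
        spec = f"{root}/hdd*:{root}/ssd"
        return [os.path.basename(p) for p in expand_branch_globs(spec)]
```

In the demo the single pattern hdd* expands to two branches, while ssd is taken literally, giving three branches in total.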

To have the pool mounted at boot or otherwise accessible from related tools use /etc/fstab.

# <file system>        <mount point>  <type>         <options>             <dump>  <pass>
/mnt/hdd*:/mnt/ssd    /media          fuse.mergerfs  minfreespace=16G      0       0

Functions, Categories and Policies

The POSIX filesystem API is made up of a number of functions: creat, stat, chown, etc. For ease of configuration in mergerfs most of the core functions are grouped into 3 categories: action, create, and search. These functions and categories can be assigned a policy which dictates which branch is chosen when performing that function.

Functions and their Category classifications

Category  FUSE Functions
action    chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate, unlink, utimens
create    create, mkdir, mknod, symlink
search    access, getattr, getxattr, ioctl (directories), listxattr, open, readlink
N/A       fchmod, fchown, futimens, ftruncate, fallocate, fgetattr, fsync, ioctl (files), read, readdir, release, statfs, write, copy_file_range
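The relationship between functions, categories, and the category.* and func.* options can be sketched as a lookup table. The defaults are the ones listed in the options table (epall, epmfs, ff). `resolve_policies` is a hypothetical helper, and applying options strictly in the order given is an assumption of this sketch. ioctl is omitted because it falls in different rows for directories and files.

```python
CATEGORY_FUNCTIONS = {
    "action": ["chmod", "chown", "link", "removexattr", "rename", "rmdir",
               "setxattr", "truncate", "unlink", "utimens"],
    "create": ["create", "mkdir", "mknod", "symlink"],
    "search": ["access", "getattr", "getxattr", "listxattr", "open", "readlink"],
}

CATEGORY_DEFAULTS = {"action": "epall", "create": "epmfs", "search": "ff"}


def resolve_policies(options):
    """Return {function: policy} after applying (key, policy) pairs in order.

    'category.NAME' sets every function in that category;
    'func.NAME' sets a single function.
    """
    policies = {
        function: CATEGORY_DEFAULTS[category]
        for category, functions in CATEGORY_FUNCTIONS.items()
        for function in functions
    }
    for key, policy in options:
        kind, _, name = key.partition(".")
        if kind == "category":
            for function in CATEGORY_FUNCTIONS[name]:
                policies[function] = policy
        elif kind == "func":
            if name not in policies:
                raise KeyError(f"function has no category policy: {name}")
            policies[name] = policy
        else:
            raise KeyError(f"not a policy option: {key}")
    return policies
```

For instance, `resolve_policies([("category.create", "mfs"), ("func.getattr", "newest")])` changes mkdir and the other create functions to mfs and getattr to newest, while open keeps the search default of ff.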

Policies

A policy is the algorithm used to choose a branch or branches for a function to work on or generally how the function behaves.

A policy's behavior differs, as mentioned above, based on the function it is used with. Sometimes it really might not make sense to even offer certain policies because they are literally the same as others but it makes things a bit more uniform.

Policy Description
all Create: For mkdir, mknod, and symlink it will apply to all branches. create works like ff.
epall (existing path, all) For mkdir, mknod, and symlink it will apply to all found. create works like epff (but more expensive because it doesn't stop after finding a valid branch).
epff (existing path, first found) Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found where the relative path exists.
eplfs (existing path, least free space) Of all the branches on which the relative path exists choose the branch with the least free space.
eplus (existing path, least used space) Of all the branches on which the relative path exists choose the branch with the least used space.
epmfs (existing path, most free space) Of all the branches on which the relative path exists choose the branch with the most free space.
eppfrd (existing path, percentage free random distribution) Like pfrd but limited to existing paths.
eprand (existing path, random) Calls epall and then randomizes. Returns 1 branch.
ff (first found) Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found.
lfs (least free space) Pick the branch with the least available free space.
lus (least used space) Pick the branch with the least used space.
mfs (most free space) Pick the branch with the most available free space.
msplfs (most shared path, least free space) Like eplfs but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
msplus (most shared path, least used space) Like eplus but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
mspmfs (most shared path, most free space) Like epmfs but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
msppfrd (most shared path, percentage free random distribution) Like eppfrd but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
newest Pick the file / directory with the largest mtime.
pfrd (percentage free random distribution) Chooses a branch at random with the likelihood of selection based on a branch's available space relative to the total.
rand (random) Calls all and then randomizes. Returns 1 branch.
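A few of the policies in the table can be expressed compactly to show how they relate to one another. The sketch below is a simplified model under stated assumptions, not mergerfs' implementation: the `Branch` record and all function names are hypothetical, branch modes and minfreespace are ignored, and "existing path" is modelled as membership in a set of directory paths.

```python
import random
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Branch:
    name: str
    free: int   # available space
    used: int   # used space
    paths: Set[str] = field(default_factory=set)  # directories present on the branch


def ff(branches):
    """First found: the first branch in configured order."""
    return branches[:1]


def mfs(branches):
    """Most free space."""
    return [max(branches, key=lambda b: b.free)] if branches else []


def lfs(branches):
    """Least free space."""
    return [min(branches, key=lambda b: b.free)] if branches else []


def lus(branches):
    """Least used space."""
    return [min(branches, key=lambda b: b.used)] if branches else []


def pfrd(branches, rng=random):
    """Random choice weighted by each branch's share of the total free space."""
    weights = [b.free for b in branches]
    if sum(weights) == 0:
        return []
    return rng.choices(branches, weights=weights, k=1)


def existing_path(policy, branches, relpath):
    """ep* variants: apply the policy only to branches where relpath exists."""
    return policy([b for b in branches if relpath in b.paths])


def most_shared_path(policy, branches, relpath):
    """msp* variants: retry with the parent directory until a branch is found."""
    while True:
        chosen = existing_path(policy, branches, relpath)
        if chosen or relpath in ("", "/"):
            return chosen
        relpath = relpath.rpartition("/")[0] or "/"
```

In this model epmfs is `existing_path(mfs, branches, path)` and mspmfs is `most_shared_path(mfs, branches, path)`. The difference shows up when the path exists on no branch: the ep* form returns nothing, while the msp* form walks up to the nearest parent that exists somewhere and picks among the branches that have it.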