FAT file systems do not use inodes, instead all file meta-information
is stored in directory entries.
FAT12 and FAT16 use a fixed size area for root directories, with
typically 512 entries of 32 bytes each (for a total of 16 KB) on hard
disk formats. The file system data is stored in clusters of typically
512 to 4096 bytes, depending on the size of the file system.
The current code uses the offset of a DOS 8.3 style directory entry as
a pseudo-inode, which leads to inode values of 0 to 16368 for typical
root directories with 512 entries.
Sub-directories use 2 cluster length plus the byte offset of the
directory entry in the data area for the pseudo-inode, which may be
as low as 1024 in case of 512 byte clusters. A sub-directory in
cluster 2 and with 512 byte clusters will therefore lead to a
re-use of inode 1024 when there are at least 32 DOS 8.3 style
filenames in the root directory (or 11 14-character Windows
long file names, each of which takes up 3 directory entries).
FAT32 file systems are not affected by this issue and FAT12/FAT16
file systems with larger cluster sizes are unlikely to have as
many directory entries in the root directory as are required to
cause the collision.
This commit leads to inode numbers that are guaranteed to not collide
for all valid FAT12 and FAT16 file system parameters. It does also
provide a small speed-up due to more efficient use of the vnode cache.
Approved by: mckusick
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D43978
The is a bug in MSDOSFS that can be triggered when the target of a
rename operation exists. It is caused by the lack of inodes in the
FAT file system, which are substituted by the location of the DOS 8.3
directory entry in the file system. This causes the "inode" of a file
to change when its directory entry is moved to a different location.
The rename operation wants to re-use the existing directory entry
position of an existing target file name (POS1). But the code does
instead locate the first position in the directory that provides
sufficient free directory slots (POS2) to hold the target file name
and fills it with the directory data.
The rename operation continues and at the end writes directory data to
the initially retrieved location (POS1) of the old target directory.
This leads to 2 directory entries for the target file, but with
inconsistent data in the directory and in the cached file system
state.
The location that should have been re-used (POS1) is marked as deleted
in the directory, and new directory data has been written to a
different location (POS2). But the VFS cache has the newly written
data stored under the inode number that corresponds to the initially
planned position (POS1).
If then a new file is written, it can allocate the deleted directory
entries (POS1) and when it queries the cache, it retrieves data that
is valid for the target of the prior rename operation, leading to a
corrupt directory entry (at POS1) being written (DOS file name of the
earlier rename target combined with the Windows long file name of the
newly written file).
PR: 268005
Reported by: wbe@psr.com
Approved by: kib, mckusick
Fixes: 2c9cbc2d45
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D43951
If taskqueue_enqueue() returned error, unbusy().
Handle parallel calls to msdosfs_integrity_error() by unbusying in
msdosfs_remount_ro() up to pending times.
Noted and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43482
Properly working storage and correct filesystem structure indeed only
allow the EJUSTRETURN return code, but since the called function needs
to read directory blocks and (re)parse the content, the assert is not
neccessary hold.
PR: 276408
Reported by: John F. Carr
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43482
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
It contains arbitrary garbage, which is not cleared by vfs_bio_clrbuf()
which only zeroes invalid portions of the pages.
Reported by: Maxim Suhanov <dfirblog@gmail.com>
Discussed with: so
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Some combinations of FAT12 file system parameters could cause a kernel
panic due to an unmapped access if the size of the FAT was larger than
the CPU page size. The reason is that FAT12 uses 3 bytes to store
2 FAT pointers, leading to partial FAT pointers at the end of buffers
of a size that is not a multiple of 3.
With a typical page size of 4 KB, this caused the FAT entry at byte
offsets 4095 and 4096 to cross the page boundary, with only the first
page mapped. This was fixed by adjusting the mapping to always cover
both bytes of each FAT entry.
Testing revealed 2 other inconsistencies that are fixed by this commit:
1) The calculation of the size of the data area did not take into
account the fact that the first two data block numbers are reserved
and that the data area starts with block 2. This could cause a
FAT12 file system created with the maximum supported number of
blocks to be incorrectly identified as FAT16.
2) The root directory does not take up space in the data area of a
FAT12 or FAT16 file system, since it is placed into a reserved
area outside of that data area. This commits makes stat() report
the logical size of the root directory, but with 0 blocks allocated
from the data area.
PR: 270587
Reviewed by: mckusick
Differential Revision: https://reviews.freebsd.org/D39386
This update implements tallying of free directory entries during
create, delete, or rename operations on FAT12 and FAT16 file systems.
Prior to this change, the total number of root directory entries
was reported as number of inodes, but 0 as the number of free
inodes, causing system health monitoring software to warn about
a suspected disk full issue.
The FAT12 and FAT16 file systems provide a limited number of
root directory entries, e.g. 512 on typical hard disk formats.
The valid range of values is 1 to 65535, but the msdosfs code
will effectively round up "odd" values to the next multiple of 16
(e.g. 513 would allow for 528 root directory entries).
This update implements tracking of directory entries during create,
delete, or rename operations, with initial values determined by
scanning the directory when the file system is mounted.
Total and free directory entries are reported in the f_files and
f_ffree elements of struct statfs, despite differences in semantics
of these values:
- There is no limit on the number of files and directories that can
be created on a FAT file system. Only the root directory of FAT12
and FAT16 file systems is limited, any number of files can still be
created in sub-directories, even when 0 free "inodes" are reported.
- A single file can require 1 to 21 directory entries, depending on
the character set, structure, and length of the name. The DOS 8.3
style file name takes up 1 entry, and if the name does not comply
with the syntax of a DOS 8.3 file name, 1 additional entry is used
for each 13 characters of the file name. Since all these entries
have to be contiguous, it is possible that a file or directory with
a long name can not be created, despite a sufficient total number of
free directory entries.
- Renaming a file can require more directory entries than currently
allocated to store its long name, which may prevent an in-place
update of the name if more entries are needed. This may cause a
rename operation to fail if no contiguous range of free entries for
the new name can be found.
- The volume label is stored in a directory entry. An empty FAT file
system with a volume label will therefore show 1 used "inode" in
df.
- The perceentage of free inodes shown in df or monitoring tools does
only represent the state of the root directory of a FAT12 or FAT16
file system. Neither does a reported value of 0% free inodes does
prevent files from being created in sub-directories, nor does a
value of 50% free inodes guarantee that even a single file with
a "long" name can be created in the root directory (if every other
directory entry is occupied and there are no 2 contiguous entries).
The statfs(2) and df(1) man pages have been updated with a notice
regarding the possibly different semantics of values reported as
total and free inodes for non-Unix file systems.
PR: 270053
Reported by: Ben Woods <woodsb02@freebsd.org>
Approved by: mckusick
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D38987
Building with -DMSDOSFS_DEBUG failed due to a format mismatch and
a variable that has been renamed but not updated in the printf()
parameter list.
MFC after: 1 month
Suppose that the cluster size is larger than page size. If the buffer
at the old EOF (before extending) was partial and dirty, it cannot be
automatically neither written out nor validated by the buffer cache,
since extending buffer adds invalid pages at the end.
Correct the buffer state by calling vfs_bio_clrbuf() on it, to mark
newly added and zeroed pages as valid.
Note that UFS is immune to the problem because ffs_truncate() always
allocate the block and buffer for the last byte of the file.
PR: 269341
Reported by: asomers
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38549
If extension fails, vnode pager recorded size might be left increased.
Only update vnode pager when extension is past the point of no rollback.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38549
There is no point in clearing just this flag. Flags are reset on the
struct mount re-allocation for reuse anyway.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37966
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>
As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).
Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).
The new field lands in a 1-byte hole, thus it does not grow the struct.
Bump __FreeBSD_version to 1400077
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542
HugeSectors * BytesPerSec should be computed before converting
HugeSectors to a DEV_BSIZE-based count.
Fixes: ba2c98389b ("msdosfs: sanity check sector count from BPB")
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34264
When allocating new vnode, we need to lock it exclusively before
making it externally visible. Since other threads cannot observe the
vnode yet, current lock order cannot create LoR conditions.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34126
to prevent races with devfs VCHR vnode reclamation, same as it was
done for UFS.
Reported by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
A function to remount the filesystem from rw to ro on integrity error.
The work is performed in taskqueue to allow the call to be done from
almost arbitrary context where erronous state was detected.
Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
We use sector count to size the FAT inuse bitset. If sector count is
corrupted, kernel might be tricked into doing unbound allocation.
Ensure that the sector count does not exceed the actual volume size.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
Change its return type to void, because its result is ignored in both
call sites. Remove oldcnp argument as well, it is NULL always.
Suggested and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
This means that filesystem is corrupted, there is a loop.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
In other words, stop silently accepting freeing free cluster in
non-debug kernels, but return the error to the caller. Modify callers
to handle errors from usemap_free().
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
It is possible, on the corrupted msdosfs volume, to have file which
denode inode number does not match the one calculated using directory
cluster. Instead of asserting the condition as impossible, handle it
and return error, after reclaiming the aliased vnode.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.
PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
Similar to the UFS revision 8df4bc48c8
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31464