Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
Check for valid file size before processing journal entries for it.
Done by extracting the file size check from pass1.c into chkfilesize()
then using it in the journal code in suj.c
Reported-by: Robert Morris
PR: 271378
MFC-after: 1 week
Sponsored-by: The FreeBSD Foundation
Always create a directory inode structure when a directory inode is
found in Pass 1 as it is not known whether it will be saved or removed
in later passes. If it is to be saved the directory inode structure
is needed to track its status and fsck_ffs(8) will segment fault if
it does not exist.
Reported-by: Robert Morris
PR: 271310
PR: 271354
MFC-after: 1 week
Sponsored-by: The FreeBSD Foundation
The previous change involved calling check_cgmagic() twice in a row
for the same CG in order to differentiate when the CG was already ok vs.
when the CG was rebuilt, but that doesn't work because the second call
(which was supposed to rebuild the CG) returns 0 (indicating that
the CG was not rebuilt) due to the prevfailcg check causing an early
failure return. Fix this by moving the rebuild part of check_cgmagic()
out into a separate function which is called by pass1() when it wants to
rebuild a CG.
Fixes: da86e7a20d
Reported by: pho
Discussed with: mckusick
Sponsored by: Netflix
Pass 1 of fsck_ffs checks the integrity of all the cylinder groups.
If any are found to have been corrupted it offers to rebuild them.
Pass 5 then makes a second pass over the cylinder groups to validate
their block and inode maps. Pass 5 assumes that the cylinder groups
are not corrupted and can segment fault if they are corrupted. Rather
than rerunning the corruption checks a second time in pass 5, this
fix keeps track whether any corrupt cylinder groups were found but not
fixed in pass 1 either due to running with the -n flag or by explicitly
answering `no' when asked whether to fix a corrupted cylinder group.
If any corrupted cylinder groups remain after pass 1, fsck_ffs will
decline to run pass 5. Instead it marks the filesystem as unclean
so that fsck_ffs will need to be run again before the filesystem can
be mounted.
This patch cleans up and documents the return value from check_cgmagic().
It also renames the variable / parameter "rebuildcg" to "rebuiltcg".
This parameter describes whether the cylinder group has been rebuilt
rather than whether it should be rebuilt.
Reported by: Chuck Silvers
Reviewed by: Chuck Silvers
MFC after: 1 week
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.
The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.
This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.
We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.
It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.
Reported by: Timo Voelker
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
If a directory entry has an illegal inode number (less than zero
or greater than the last inode in the filesystem) the entry is removed.
If a directory '.' or '..' entry had an illegal inode number they
were being removed. Since fsck_ffs knows what the correct value is
for these two entries fix them rather deleting them.
Add much more extensive cylinder group checks and use them to be
more careful about rebuilding a cylinder group.
Check for out-of-range block numbers before trying to free them.
When a directory is deleted also remove its cache entry created
in pass1 so that later passes do not try to operate on a deleted
directory.
Check for ctime(3) returning NULL before trying to use its return.
When freeing a directory inode, do not try to interpret it as a
directory.
Reserve space in the inostatlist to have room to allocate a
lost+found directory.
If an invalid block number is found past the end of an inode simply
remove it rather than clearing and removing the inode.
Modernize the inoinfo structure to use queue(3) LIST rather than a
handrolled linked list implementation.
Reported by: Bob Prohaska, John-Mark Gurney, and Mark Millard
Tested by: Peter Holm
Reviewed by: Peter Holm
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38668
When something was found wrong with an inode the error message
was always "UNKNOWN FILE TYPE". This error is now used only when
the file type field is wrong. Other errors have their own messages:
"BAD FILE SIZE", "NEGATIVE FILE SIZE", "BAD SPECIAL-FILE RDEV",
"INVALID DIRECT BLOCK", and "INVALID INDIRECT BLOCK".
More complete information about the inode is also provided.
Sponsored by: The FreeBSD Foundation
UFS does not allow files to end with a hole; it requires that the
last block of a file be allocated. As fsck_ffs(8) initially scans
each allocated inode, it tracks the last allocated block in the
inode. It then checks that the inode's size falls in the last
allocated block. If the last allocated block falls before the size,
a `file size beyond end of allocated file' warning is issued and
the file is shortened to reference the last allocated block (to avoid
having it reference a hole at its end). If the last allocated block
falls after the size, a `partially truncated file' warning is issued
and all blocks following the block referenced by the size are freed.
Because of an incorrect unsigned comparison, this test was failing
to handle files with no allocated blocks but non-zero size (which
should have had their size reset to zero). Once that was fixed the
test started incorrectly complaining about short symbolic links
that place the link path in the inode rather than in a disk block.
Because these symbolic links have a non-zero size, but no allocated
blocks, fsck_ffs wanted to zero out their size. This patch has to
detect and avoid changing the size of such symbolic links.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
making fsck_ffs(8) run faster, there should be no functional change.
The original fsck_ffs(8) had its own disk I/O management system.
When gjournal(8) was added to FreeBSD 7, code was added to fsck_ffs(8)
to do the necessary gjournal rollback. Rather than use the existing
fsck_ffs(8) disk I/O system, it wrote its own from scratch. Similarly
when journalled soft updates were added in FreeBSD 9, code was added
to fsck_ffs(8) to do the necessary journal rollback. And once again,
rather than using either of the existing fsck_ffs(8) disk I/O
systems, it wrote its own from scratch. Lastly the fsdb(8) utility
uses the fsck_ffs(8) disk I/O management system. In preparation for
making the changes necessary to enable snapshots to be taken when
using journalled soft updates, it was necessary to have a single
disk I/O system used by all the various subsystems in fsck_ffs(8).
This commit merges the functionality required by all the different
subsystems into a single disk I/O system that supports all of their
needs. In so doing it picks up optimizations from each of them
with the results that each of the subsystems does fewer reads and
writes than it did with its own customized I/O system. It also
greatly simplifies making changes to fsck_ffs(8) since everything
goes through a single place. For example the ginode() function
fetches an inode from the disk. When inode check hashes were added,
they previously had to be checked in the code implementing inode
fetch in each of the three different disk I/O systems. Now they
need only be checked in ginode().
Tested by: Peter Holm
Sponsored by: Netflix
last allocated block of the file and if that is found, shortens the
file to reference the last allocated block thus avoiding having it
reference a hole at its end.
This update corrects an error where fsck_ffs miscalculated the last
logical block of the file when the file contained a large hole.
Reported by: Jamie Landeg-Jones
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: Netflix
inodes that reference directories. While here tighten the check for
comparing the last logical block with the end of the file.
Reported by: Peter Holm
Tested by: Peter Holm
Sponsored by: Netflix
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).
Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.
Submitted by: Chuck Silvers <chs@netflix.com>
Tested by: Chuck Silvers <chs@netflix.com>
MFC after: 1 week
Sponsored by: Netflix
Followup to r313780. Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.
Reported by: kib
Reviewed by: kib, mckusick
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D9623
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
The loop that scans the used inode map when soft updates is in use
assumes that the inosused variable is signed. However, ino_t is
unsigned, so the loop invariant is incorrect and the check for
inosused wrapping to < 0 can never be true.
Instead of checking for wrap after the fact just prevent it from
happening in the first place.
PR: 218592
Submitted by: Todd Miller <todd.miller@courtesan.com>
Reviewed by: mckusick
MFC after: 1 week
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
errors have been detected in a particular run.
Clean up the global state variables so that a restart can happen correctly.
Separate the global variables in fsck_ffs and fsdb to their own file. This
fixes header sharing with fscd.
Correctly initialize, static-ize, and remove global variables as needed in
dir.c. This fixes a problem with lost+found directories that was causing
a segfault.
Correctly initialize, static-ize, and remove global variables as needed in
suj.c.
Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode
Submitted by: scottl, max
Reviewed by: max, mckusick (earlier version)
Obtained from: Netflix
MFC after: 3 days
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb were to significant for this type of change.
Submitter of this changeset has been notified and hopefully this can be
restored soon.
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.
This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker
Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 4 weeks
robust. With these changes fsck is now able to detect and reliably
rebuild corrupted cylinder group maps. The -D option is no longer
necessary as it has been replaced by a prompt asking whether the
corrupted cylinder group should be rebuilt and doing so when requested.
These actions are only offered and taken when running fsck in manual
mode. Corrupted cylinder groups found during preen mode cause the fsck
to fail.
Add the -r option to free up excess unused inodes. Decreasing the
number of preallocated inodes reduces the running time of future
runs of fsck and frees up space that can allocated to files. The -r
option is ignored when running in preen mode.
Reviewed by: Xin LI <delphij@>
Sponsored by: Rsync.net
number read from cylinder group. Chances that we read a smarshed
cylinder group, and we can not 100% trust information it has
supplied. fsck_ffs(8) will crash otherwise for some cases.
count of zero and instead encode this information in the inode state.
Pass 4 performed a linear search of this list for each inode in
the file system, which performs poorly if the list is long.
Reviewed by: sam & keramida (an earlier version of the patch), mckusick
MFC after: 1 month
imposed by the filesystem structure itself remains. With 16k blocks,
the maximum file size is now just over 128TB.
For now, the UFS1 file size limit is left unchanged so as to remain
consistent with RELENG_4, but it too could be removed in the future.
Reviewed by: mckusick