Commit graph

1233 commits

Author SHA1 Message Date
Liav A 69f41eb062 Kernel: Reject create links on paths that were not unveiled as writable
This solves one of the security issues being mentioned in issue #15996.
We simply don't allow creating hardlinks on paths that were not unveiled
as writable to prevent possible bypass on a certain path that was
unveiled as non-writable.
2022-12-03 11:00:34 -07:00
Liav A 0bb7c8f4c4 Kernel+SystemServer: Don't hardcode coredump directory path
Instead, allow userspace to decide on the coredump directory path. By
default, SystemServer sets it to the /tmp/coredump directory, but users
can now change this by writing a new path to the sysfs node at
/sys/kernel/variables/coredump_directory, and also to read this node to
check where coredumps are currently generated at.
2022-12-03 05:56:59 -07:00
Liav A 7dcf8f971b Kernel: Rename SysFSSystemBoolean => SysFSSystemBooleanVariable 2022-12-03 05:56:59 -07:00
Liav A 95d8aa2982 Kernel: Allow read access sparingly to some /sys/kernel directory nodes
Those nodes are not exposing any sensitive information so there's no
harm in exposing them.
2022-12-03 05:47:58 -07:00
Liav A 1ca0ac5207 Kernel: Disallow jailed processes to read files in /sys/kernel directory
By default, disallow reading of values in that directory. Later on, we
will enable sparingly read access to specific files.

The idea that led to this mechanism was suggested by Jean-Baptiste
Boric (also known as boricj in GitHub), to prevent access to sensitive
information in the SysFS if someone adds a new file in the /sys/kernel
directory.
2022-12-03 05:47:58 -07:00
Liav A 2e55956784 Kernel: Forbid access to /sys/kernel/power_state for Jailed processes
There's simply no benefit in allowing sandboxed programs to change the
power state of the machine, so disallow writes to the mentioned node to
prevent malicious programs to request that.
2022-12-03 05:47:58 -07:00
Andreas Kling 4dd148f07c Kernel: Add File::is_regular_file()
This makes it easy and expressive to check if a File is a regular file.
2022-11-29 11:09:19 +01:00
Liav A 718ae68621 Kernel+LibCore+LibC: Implement support for forcing unveil on exec
To accomplish this, we add another VeilState which is called
LockedInherited. The idea is to apply exec unveil data, similar to
execpromises of the pledge syscall, on the current exec'ed program
during the execve sequence. When applying the forced unveil data, the
veil state is set to be locked but the special state of LockedInherited
ensures that if the new program tries to unveil paths, the request will
silently be ignored, so the program will continue running without
receiving an error, but is still can only use the paths that were
unveiled before the exec syscall. This in turn, allows us to use the
unveil syscall with a special utility to sandbox other userland programs
in terms of what is visible to them on the filesystem, and is usable on
both programs that use or don't use the unveil syscall in their code.
2022-11-26 12:42:15 -07:00
sin-ack 3b03077abb Kernel: Update the ".." inode for directories after a rename
Because the ".." entry in a directory is a separate inode, if a
directory is renamed to a new location, then we should update this entry
the point to the new parent directory as well.

Co-authored-by: Liav A <liavalb@gmail.com>
2022-11-25 17:33:05 +01:00
Andreas Kling a9d55ddf57 Kernel/TmpFS: Update mtime instead of ctime when asked to update mtime 2022-11-24 16:56:27 +01:00
Andreas Kling 10fa72d451 Kernel: Use AK::Time for InodeMetadata timestamps instead of time_t
Before this change, we were truncating the nanosecond part of file
timestamps in many different places.
2022-11-24 16:56:27 +01:00
Andreas Kling fb00d3ed25 Kernel+lsirq: Track per-CPU IRQ handler call counts
Each GenericInterruptHandler now tracks the number of calls that each
CPU has serviced.

This takes care of a FIXME in the /sys/kernel/interrupts generator.

Also, the lsirq command line tool now displays per-CPU call counts.
2022-11-19 15:39:30 +01:00
Andreas Kling 9b3db63e14 Kernel: Rename GenericInterruptHandler "invoking count" to "call count" 2022-11-19 15:39:30 +01:00
Liav A 31d4c07dee Kernel: Add missing includes for Mount.h file 2022-11-11 10:25:54 +01:00
Liav A 3cc0d60141 Kernel: Split the Ext2FileSystem.{cpp,h} files into smaller components 2022-11-08 02:54:48 -07:00
Liav A 1c91881a1d Kernel: Split the ISO9660FileSystem.{cpp,h} files to smaller components 2022-11-08 02:54:48 -07:00
Liav A fca3b7f1f9 Kernel: Split the DevPtsFS files into smaller components 2022-11-08 02:54:48 -07:00
Liav A 3fc52a6d1c Kernel: Split the Plan9FileSystem.{cpp,h} file into smaller components 2022-11-08 02:54:48 -07:00
Liav A 3906dd3aa3 Kernel: Split the ProcFS core file into smaller components 2022-11-08 02:54:48 -07:00
Liav A e882b2ed05 Kernel: Split the FATFileSystem.{cpp,h} files into smaller components 2022-11-08 02:54:48 -07:00
Liav A 5e6101dd3e Kernel: Split the TmpFS core files into smaller components 2022-11-08 02:54:48 -07:00
Liav A f53149d5f6 Kernel: Split the SysFS core files into smaller components 2022-11-08 02:54:48 -07:00
Liav A 5e062414c1 Kernel: Add support for jails
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).

A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
2022-11-05 18:00:58 -06:00
Timon Kruiper 0475407f9f Kernel: Remove bunch of unused includes in SysFS/Processes.cpp 2022-10-26 20:01:45 +02:00
Timon Kruiper 97f1fa7d8f Kernel: Include missing headers for various files
With these missing header files, we can now build these files for
aarch64.
2022-10-26 20:01:45 +02:00
Timon Kruiper fcbb6b79ac Kernel: Don't expose processor information for aarch64 in sysfs
We do not (yet) acquire this information for the aarch64 processors.
2022-10-26 20:01:45 +02:00
Liav A 75f01692b4 Kernel+Userland: Move /sys/firmware/power_state to /sys/kernel directory
Let's put the power_state global node into the /sys/kernel directory,
because that directory represents all global nodes and variables being
related to the Kernel. It's also a mutable node, that is more acceptable
being in the mentioned directory due to the fact that all other files in
the /sys/firmware directory are just firmware blobs and are not mutable
at all.
2022-10-25 15:33:34 -06:00
Liav A a91589c09b Kernel: Introduce global variables and stats in /sys/kernel directory
The ProcFS is an utter mess currently, so let's start move things that
are not related to processes-info. To ensure it's done in a sane manner,
we start by duplicating all /proc/ global nodes to the /sys/kernel/
directory, then we will move Userland to use the new directory so the
old directory nodes can be removed from the /proc directory.
2022-10-25 15:33:34 -06:00
Liav A 03ae9f94cf Kernel/FileSystem: Remove hardcoded unveil path of /usr/lib/Loader.so
If a program needs to execute a dynamic executable program, then it
should unveil /usr/lib/Loader.so by itself and not rely on the Kernel to
allow using this binary without any sense of respect to unveil promises
being made by the running parent program.
2022-10-24 19:41:32 -06:00
Gunnar Beutner ce4b66e908 Kernel: Add support for MSG_NOSIGNAL and properly send SIGPIPE
Previously we didn't send the SIGPIPE signal to processes when
sendto()/sendmsg()/etc. returned EPIPE. And now we do.

This also adds support for MSG_NOSIGNAL to suppress the signal.
2022-10-24 15:49:39 +02:00
Liav A fea3cb5ff9 Kernel/FileSystem: Discard safely filesystems when unmounted last time
This commit reached that goal of "safely discarding" a filesystem by
doing the following:
1. Stop using the s_file_system_map HashMap as it was an unsafe measure
to access pointers of FileSystems. Instead, make sure to register all
FileSystems at the VFS layer, with an IntrusiveList, to avoid problems
related to OOM conditions.
2. Make sure to cleanly remove the DiskCache object from a BlockBased
filesystem, so the destructor of such object will not need to do that in
the destruction point.
3. For ext2 filesystems, don't cache the root inode at m_inode_cache
HashMap. The reason for this is that when unmounting an ext2 filesystem,
we lookup at the cache to see if there's a reference to a cached inode
and if that's the case, we fail with EBUSY. If we keep the m_root_inode
also being referenced at the m_inode_cache map, we have 2 references to
that object, which will lead to fail with EBUSY. Also, it's much simpler
to always ask for a root inode and get it immediately from m_root_inode,
instead of looking up the cache for that inode.
2022-10-22 16:57:52 -04:00
Liav A 24977996a6 Kernel: Append root filesystem to the VFS FileBackedFileSystem list 2022-10-22 16:57:52 -04:00
Liav A 0fd7b688af Kernel: Introduce support for using FileSystem object in multiple mounts
The idea is to enable mounting FileSystem objects across multiple mounts
in contrast to what happened until now - each mount has its own unique
FileSystem object being attached to it.

Considering a situation of mounting a block device at 2 different mount
points at in system, there were a couple of critical flaws due to how
the previous "design" worked:
1. BlockBasedFileSystem(s) that pointed to the same actual device had a
separate DiskCache object being attached to them. Because both instances
were not synchronized by any means, corruption of the filesystem is most
likely achieveable by a simple cache flush of either of the instances.
2. For superblock-oriented filesystems (such as the ext2 filesystem),
lack of synchronization between both instances can lead to severe
corruption in the superblock, which could render the entire filesystem
unusable.
3. Flags of a specific filesystem implementation (for example, with xfs
on Linux, one can instruct to mount it with the discard option) must be
honored across multiple mounts, to ensure expected behavior against a
particular filesystem.

This patch put the foundations to start fix the issues mentioned above.
However, there are still major issues to solve, so this is only a start.
2022-10-22 16:57:52 -04:00
Liav A 965afba320 Kernel/FileSystem: Add a few missing includes
In preparation to future commits, we need to ensure that
OpenFileDescription.h doesn't include the VirtualFileSystem.h file to
avoid include loops.
2022-10-22 16:57:52 -04:00
Liav A 07387ec19a Kernel+Base: Introduce MS_NOREGULAR mount flag
This flag doesn't conform to any POSIX standard nor is found in any OS
out there. The idea behind this mount flag is to ensure that only
non-regular files will be placed in a filesystem, which includes device
nodes, symbolic links, directories, FIFOs and sockets. Currently, the
only valid case for using this mount flag is for TmpFS instances, where
we want to mount a TmpFS but disallow any kind of regular file and only
allow other types of files on the filesystem.
2022-10-22 19:18:15 +02:00
Liav A 97f8927da6 Kernel: Remove the DevTmpFS class
Although this code worked quite well, it is considered to be a code
duplication with the TmpFS code which is more tested and works quite
well for a variety of cases. The only valid reason to keep this
filesystem was that it enforces that no regular files will be created at
all in the filesystem. Later on, we will re-introduce this feature in a
sane manner. Therefore, this can be safely removed after SystemServer no
longer uses this filesystem type anymore.
2022-10-22 19:18:15 +02:00
Liav A c2b5c5bac5 Kernel: Add support for device nodes in TmpFS
Later on we will remove the DevTmpFS code, so in order to support
mounting TmpFS instead, we need to be able to create device nodes on
the filesystem.
2022-10-22 19:18:15 +02:00
Timon Kruiper 9827c11d8b Kernel: Move InterruptDisabler out of Arch directory
The code in this file is not architecture specific, so it can be moved
to the base Kernel directory.
2022-10-17 20:11:31 +02:00
Liav A b9dca3300e Kernel: Use more fine-grained content data block granularity in TmpFS
Instead of just having a giant KBuffer that is not resizeable easily, we
use multiple AnonymousVMObjects in one Vector to store them.
The idea is to not have to do giant memcpy or memset each time we need
to allocate or de-allocate memory for TmpFS inodes, but instead, we can
allocate only the desired block range when trying to write to it.
Therefore, it is also possible to have data holes in the inode content
in case of skipping an entire set of one data block or more when writing
to the inode content, thus, making memory usage much more efficient.

To ensure we don't run out of virtual memory range, don't allocate a
Region in advance to each TmpFSInode, but instead try to allocate a
Region on IO operation, and then use that Region to map the VMObjects
in IO loop.
2022-10-16 17:46:40 +02:00
Undefine 135ca3fa1b Kernel: Add support for the FAT32 filesystem
This commit adds read-only support for the FAT32 filesystem. It also
includes support for long file names.
2022-10-14 18:36:40 -06:00
Liav A 3cf6ac1b3f Kernel: Fix typo in comment in Ext2FileSystem::read_bytes_locked method 2022-09-26 20:13:13 +01:00
Liav A 0a793a7fa3 Kernel/FileSystem: Remove the locking of a Inode mutex in InodeVMObjects
We no longer require to lock the m_inode_lock in the SharedInodeVMObject
code as the methods write_bytes and read_bytes of the Inode class do
this for us now.
2022-09-26 22:06:10 +03:00
Liav A 9252a892bb Kernel: Abstracts x86 reboot and shutdown specific methods
We move QEMU and VirtualBox shutdown sequences to a separate file, as
well as moving the i8042 reboot code sequence too to another file.

This allows us to abstract specific methods from the power state node
code of the SysFS filesystem, to allow other architectures to put their
methods there too in the future.
2022-09-20 18:43:05 +01:00
Liav A 3ad0e1a1d5 Kernel: Handle mmap requests on zero-length data file inodes safely 2022-09-16 14:55:45 +03:00
Liav A c88cc8557f Kernel/FileSystem: Make Inode::{write,read}_bytes methods non-virtual
We make these methods non-virtual because we want to ensure we properly
enforce locking of the m_inode_lock mutex. Also, for write operations,
we want to call prepare_to_write_data before the actual write. The
previous design required us to ensure the callers do that at various
places which lead to hard-to-find bugs. By moving everything to a place
where we call prepare_to_write_data only once, we eliminate a possibilty
of forgeting to call it on some code path in the kernel.
2022-09-16 14:55:45 +03:00
Liav A 4f4717e351 Kernel/FileSystem: Mark ext2 inode block list non-const
The block list required a bit of work, and now the only method being
declared const to bypass its const-iness is the read_bytes method that
calls a new method called compute_block_list_with_exclusive_locking that
takes care of proper locking before trying to update the block list data
of the ext2 inode.
2022-09-16 14:55:45 +03:00
Liav A 843bd43c5b Kernel/FileSystem: Mark ext2 inode lookup cache non-const
For the lookup cache, no method being declared const tried to modify it,
so it was easy to drop the mutable declaration on the HashMap member.
2022-09-16 14:55:45 +03:00
Andreas Kling 2cc947ede4 Kernel: Use correct timestamp in sys$utimens()
We were mixing up the nanosecond and second parts of the timestamps.

Regressed in 280694bb46.
2022-09-13 17:03:31 +02:00
Andreas Kling 30861daa93 Kernel: Simplify the File memory-mapping API
Before this change, we had File::mmap() which did all the work of
setting up a VMObject, and then creating a Region in the current
process's address space.

This patch simplifies the interface by removing the region part.
Files now only have to return a suitable VMObject from
vmobject_for_mmap(), and then sys$mmap() itself will take care of
actually mapping it into the address space.

This fixes an issue where we'd try to block on I/O (for inode metadata
lookup) while holding the address space spinlock. It also reduces time
spent holding the address space lock.
2022-08-24 14:57:51 +02:00
Andreas Kling cf16b2c8e6 Kernel: Wrap process address spaces in SpinlockProtected
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.

Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
2022-08-24 14:57:51 +02:00