Commit graph

710 commits

Author SHA1 Message Date
Sergey Bugaev b3a24d732d Kernel+LibC: Add sys$waitid(), and make sys$waitpid() wrap it
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.

It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
2020-02-05 18:14:37 +01:00
Andreas Kling 3879e5b9d4 Kernel: Start working on a syscall for logging performance events
This patch introduces sys$perf_event() with two event types:

- PERF_EVENT_MALLOC
- PERF_EVENT_FREE

After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.

This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)
2020-02-02 20:26:27 +01:00
Andreas Kling 934b1d8a9b Kernel: Finalizer should not go back to sleep if there's more to do
Before putting itself back on the wait queue, the finalizer task will
now check if there's more work to do, and if so, do it first. :^)

This patch also puts a bunch of process/thread debug logging behind
PROCESS_DEBUG and THREAD_DEBUG since it was unbearable to debug this
stuff with all the spam.
2020-02-01 10:56:17 +01:00
Andreas Kling 6634da31d9 Kernel: Disallow empty ranges in munmap/mprotect/madvise 2020-01-30 21:55:49 +01:00
Andreas Kling 31d1c82621 Kernel: Reject non-user address ranges in mmap/munmap/mprotect/madvise
There's no valid reason to allow non-userspace address ranges in these
system calls.
2020-01-30 21:51:27 +01:00
Andreas Kling afd2b5a53e Kernel: Copy "stack" and "mmap" bits when splitting a Region 2020-01-30 21:51:27 +01:00
Andreas Kling c9e877a294 Kernel: Address validation helpers should take size_t, not ssize_t 2020-01-30 21:51:27 +01:00
Andreas Kling c64904a483 Kernel: sys$readlink() should return the number of bytes written out 2020-01-27 21:50:51 +01:00
Andreas Kling 8b49804895 Kernel: sys$waitpid() only needs the waitee thread in the stopped case
If the waitee process is dead, we don't need to inspect the thread.

This fixes an issue with sys$waitpid() failing before reap() since
dead processes will have no remaining threads alive.
2020-01-27 21:21:48 +01:00
Andreas Kling f4302b58fb Kernel: Remove SmapDisablers in sys$getsockname() and sys$getpeername()
Instead use the user/kernel copy helpers to only copy the minimum stuff
needed from to/from userspace.

Based on work started by Brian Gianforcaro.
2020-01-27 21:11:36 +01:00
Andreas Kling 5163c5cc63 Kernel: Expose the signal that stopped a thread via sys$waitpid() 2020-01-27 20:47:10 +01:00
Andreas Kling 638fe6f84a Kernel: Disable interrupts while looking into the thread table
There was a race window in a bunch of syscalls between calling
Thread::from_tid() and checking if the found thread was in the same
process as the calling thread.

If the found thread object was destroyed at that point, there was a
use-after-free that could be exploited by filling the kernel heap with
something that looked like a thread object.
2020-01-27 14:04:57 +01:00
Andreas Kling c1f74bf327 Kernel: Never validate access to the kmalloc memory range
Memory validation is used to verify that user syscalls are allowed to
access a given memory range. Ring 0 threads never make syscalls, and
so will never end up in validation anyway.

The reason we were allowing kmalloc memory accesses is because kernel
thread stacks used to be allocated in kmalloc memory. Since that's no
longer the case, we can stop making exceptions for kmalloc in the
validation code.
2020-01-27 12:43:21 +01:00
Andreas Kling 137a45dff2 Kernel: read()/write() should respect timeouts when used on a sockets
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
2020-01-26 17:54:23 +01:00
Andreas Kling b011857e4f Kernel: Make writev() work again
Vector::ensure_capacity() makes sure the underlying vector buffer can
contain all the data, but it doesn't update the Vector::size().

As a result, writev() would simply collect all the buffers to write,
and then do nothing.
2020-01-26 10:10:15 +01:00
Andreas Kling b93f6b07c2 Kernel: Make sched_setparam() and sched_getparam() operate on threads
Instead of operating on "some random thread in PID", these now operate
on the thread with a specific TID. This matches other systems better.
2020-01-26 09:58:58 +01:00
Andreas Kling f4e7aecec2 Kernel: Preserve CoW bits when splitting VM regions 2020-01-25 17:57:10 +01:00
Andreas Kling 7cc0b18f65 Kernel: Only open a single description for stdio in non-fork processes 2020-01-25 17:05:02 +01:00
Andreas Kling 81ddd2dae0 Kernel: Make sys$setsid() clear the calling process's controlling TTY 2020-01-25 14:53:48 +01:00
Andreas Kling 2bf11b8348 Kernel: Allow empty strings in validate_and_copy_string_from_user()
Sergey pointed out that we should just allow empty strings everywhere.
2020-01-25 14:14:11 +01:00
Andreas Kling 69de90a625 Kernel: Simplify Process constructor
Move all the fork-specific inheritance logic to sys$fork(), and all the
stuff for setting up stdio for non-fork ring 3 processes moves to
Process::create_user_process().

Also: we were setting up the PGID, SID and umask twice. Also the code
for copying the open file descriptors was overly complicated. Now it's
just a simple Vector copy assignment. :^)
2020-01-25 14:13:47 +01:00
Andreas Kling 0f5221568b Kernel: sys$execve() should not EFAULT for empty argument strings
It's okay to exec { "/bin/echo", "" } and it should not EFAULT.
2020-01-25 12:21:30 +01:00
Andreas Kling 30ad7953ca Kernel: Rename UnveilState to VeilState 2020-01-21 19:28:59 +01:00
Andreas Kling f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Andreas Kling 6081c76515 Kernel: Make O_RDONLY non-zero
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.

We can now easily check read/write permissions separately instead of
dancing around with the bits.

This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
2020-01-21 13:27:08 +01:00
Andreas Kling 1b3cac2f42 Kernel: Don't forget about unveiled paths with zero permissions
We need to keep these around, otherwise the calling process can remove
and re-add a path to increase its permissions.
2020-01-21 11:42:28 +01:00
Andreas Kling 22cfb1f3bd Kernel: Clear unveiled state on exec() 2020-01-21 10:46:31 +01:00
Andreas Kling cf48c20170 Kernel: Forked children should inherit unveil()'ed paths 2020-01-21 09:44:32 +01:00
Andreas Kling 0569123ad7 Kernel: Add a basic implementation of unveil()
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.

The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.

Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:

- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)

Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.

Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.

Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.

This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
2020-01-20 22:12:04 +01:00
Andreas Kling e901a3695a Kernel: Use the templated copy_to/from_user() in more places
These ensure that the "to" and "from" pointers have the same type,
and also that we copy the correct number of bytes.
2020-01-20 13:41:21 +01:00
Sergey Bugaev d5426fcc88 Kernel: Misc tweaks 2020-01-20 13:26:06 +01:00
Sergey Bugaev 9bc6157998 Kernel: Return new fd from sys$fcntl(F_DUPFD)
This fixes GNU Bash getting confused after performing a redirection.
2020-01-20 13:26:06 +01:00
Andreas Kling 4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling 8d9dd1b04b Kernel: Add a 1-deep cache to Process::region_from_range()
This simple cache gets hit over 70% of the time on "g++ Process.cpp"
and shaves ~3% off the runtime.
2020-01-19 16:44:37 +01:00
Andreas Kling ae0c435e68 Kernel: Add a Process::add_region() helper
This is a private helper for adding a Region to Process::m_regions.
It's just for convenience since it's a bit cumbersome to do this.
2020-01-19 16:26:42 +01:00
Andreas Kling 1dc9fa9506 Kernel: Simplify PageDirectory swapping in sys$execve()
Swap out both the PageDirectory and the Region list at the same time,
instead of doing the Region list slightly later.
2020-01-19 16:05:42 +01:00
Andreas Kling 6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling 5ce9382e98 Kernel: Only require "stdio" pledge for sending signals to self
This should match what OpenBSD does. Sending a signal to yourself seems
basically harmless.
2020-01-19 08:50:55 +01:00
Sergey Bugaev 3e1ed38d4b Kernel: Do not return ENOENT for unresolved symbols
ENOENT means "no such file or directory", not "no such symbol". Return EINVAL
instead, as we already do in other cases.
2020-01-18 23:51:22 +01:00
Sergey Bugaev d0d13e2bf5 Kernel: Move setting file flags and r/w mode to VFS::open()
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.

This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
2020-01-18 23:51:22 +01:00
Sergey Bugaev 544b8286da Kernel: Do not open stdio fds for kernel processes
Kernel processes just do not need them.

This also avoids touching the file (sub)system early in the boot process when
initializing the colonel process.
2020-01-18 23:51:22 +01:00
Sergey Bugaev 6466c3d750 Kernel: Pass correct permission flags when opening files
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.

* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
2020-01-18 23:51:22 +01:00
Andreas Kling 862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling 4e6fe3c14b Kernel: Symbolicate kernel EIP on process crash
Process::crash() was assuming that EIP was always inside the ELF binary
of the program, but it could also be in the kernel.
2020-01-18 14:38:39 +01:00
Andreas Kling 9c9fe62a4b Kernel: Validate the requested range in allocate_region_with_vmobject() 2020-01-18 14:37:22 +01:00
Andreas Kling aa63de53bd Kernel: Use get_syscall_path_argument() in sys$execve()
Paths passed to sys$execve() should certainly be subject to all the
usual path validation checks.
2020-01-18 11:43:28 +01:00
Andreas Kling b65572b3fe Kernel: Disallow mmap names longer than PATH_MAX 2020-01-18 11:34:53 +01:00
Andreas Kling 94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00