Commit graph

3870 commits

Author SHA1 Message Date
Andreas Kling 32e93c8808 Kernel: Mark write_cr0() and write_cr4() as UNMAP_AFTER_INIT
This removes a very useful tool for attackers trying to disable
SMAP/SMEP/etc. :^)
2021-02-19 20:23:05 +01:00
Andreas Kling 6136faa4eb Kernel: Add .unmap_after_init section for code we don't need after init
You can now declare functions with UNMAP_AFTER_INIT and they'll get
segregated into a separate kernel section that gets completely
unmapped at the end of initialization.

This can be used for anything we don't need to call once we've booted
into userspace.

There are two nice things about this mechanism:

- It allows us to free up entire pages of memory for other use.
  (Note that this patch does not actually make use of the freed
  pages yet, but in the future we totally could!)

- It allows us to get rid of obviously dangerous gadgets like
  write-to-CR0 and write-to-CR4 which are very useful for an attacker
  trying to disable SMAP/SMEP/etc.

I've also made sure to include a helpful panic message in case you
hit a kernel crash because of this protection. :^)
2021-02-19 20:23:05 +01:00
Andreas Kling da100f12a6 Kernel: Add helpers for manipulating x86 control registers
Use read_cr{0,2,3,4} and write_cr{0,3,4} helpers instead of inline asm.
2021-02-19 20:23:05 +01:00
Andreas Kling 6e83be67b8 Kernel: Release ptrace lock in exec before stopping due to PT_TRACE_ME
If we have a tracer process waiting for us to exec, we need to release
the ptrace lock before stopping ourselves, since otherwise the tracer
will block forever on the lock.

Fixes #5409.
2021-02-19 12:13:54 +01:00
Andreas Kling 37d8faf1b4 ProcFS: Fix /proc/PID/* hardening bypass
This enabled trivial ASLR bypass for non-dumpable programs by simply
opening /proc/PID/vm before exec'ing.

We now hold the target process's ptrace lock across the refresh/write
operations, and deny access if the process is non-dumpable. The lock
is necessary to prevent a TOCTOU race on Process::is_dumpable() while
the target is exec'ing.

Fixes #5270.
2021-02-19 09:46:36 +01:00
Andreas Kling eb92ec3149 Kernel: Factor out mmap & friends range expansion to a helper function
sys$mmap() and related syscalls must pad to the nearest page boundary
below the base address *and* above the end address of the specified
range. Since we have to do this in many places, let's make a helper.
2021-02-18 18:04:58 +01:00
Andreas Kling 55a9a4f57a Kernel: Use KResult a bit more in sys$execve() 2021-02-18 09:37:33 +01:00
Andreas Kling 6c2f0316d9 Kernel: Convert snprintf() => String::formatted()/number() 2021-02-17 16:37:11 +01:00
Andreas Kling 5f610417d0 Kernel: Remove kprintf()
There are no remaining users of this API.
2021-02-17 16:33:43 +01:00
Andreas Kling 40e5210036 Kernel: Convert dbgprintf()/klog() => dbgln()/dmesgln() in UHCI code 2021-02-17 16:30:55 +01:00
Andreas Kling e4d84b5e79 Kernel: Remove dbgprintf() from kernel
There are no remaining users of this API in the kernel.
2021-02-17 16:22:34 +01:00
Andreas Kling 5a595ef134 Kernel: Use dbgln_if() in sys$fork() 2021-02-17 15:34:32 +01:00
AnotherTest 4043e770e5 Kernel: Don't go through ARP for IP broadcast messages 2021-02-17 14:41:36 +01:00
Andreas Kling 575c7ed414 Kernel: Make sys$msyscall() EFAULT on non-user address
Fixes #5361.
2021-02-16 11:32:00 +01:00
Ben Wiederhake fbb85f9b2f Kernel: Refuse excessively long iovec list, also in readv
This bug is a good example why copy-paste code should eventually be eliminated
from the code base: Apparently the code was copied from read.cpp before
c6027ed7cc, so the same bug got introduced here.

To recap: A malicious program can ask the Kernel to prepare sys-ing to
a huge amount of iovecs. The Kernel must first copy all the vector locations
into 'vecs', and before that allocates an arbitrary amount of memory:
    vecs.resize(iov_count);
This can cause Kernel memory exhaustion, triggered by any malicious userland
program.
2021-02-15 22:09:01 +01:00
Jean-Baptiste Boric 0d22ec9d32 Kernel: Handle 'Menu' key on PS/2 keyboard 2021-02-15 19:37:14 +01:00
AnotherTest 5729e76c7d Meta: Make it possible to (somewhat) build the system inside Serenity
This removes some hard references to the toolchain, some unnecessary
uses of an external install command, and disables a -Werror flag (for
the time being) - only if run inside serenity.

With this, we can build and link the kernel :^)
2021-02-15 17:32:56 +01:00
AnotherTest 4519950266 Kernel+LibC: Add the _SC_GETPW_R_SIZE_MAX sysconf enum
It just returns 4096 :P
2021-02-15 17:32:56 +01:00
AnotherTest a3a7ab83c4 Kernel+LibC: Implement readv
We already had writev, so let's just add readv too.
2021-02-15 17:32:56 +01:00
AnotherTest 1e79c04616 Kernel+LibC: Stub out SO_{SND_RCV}BUF 2021-02-15 17:32:56 +01:00
Brian Gianforcaro 7482cb6531 Kernel: Avoid some un-necessary copies coming from range based for loops
- The irq_controller was getting add_ref/released needlessly during enumeration.

- Used ranges were also getting needlessly copied.
2021-02-15 15:25:23 +01:00
Brian Gianforcaro 96943ab07c Kernel: Initial integration of Kernel Address Sanitizer (KASAN)
KASAN is a dynamic analysis tool that finds memory errors. It focuses
mostly on finding use-after-free and out-of-bound read/writes bugs.

KASAN works by allocating a "shadow memory" region which is used to store
whether each byte of memory is safe to access. The compiler then instruments
the kernel code and a check is inserted which validates the state of the
shadow memory region on every memory access (load or store).

To fully integrate KASAN into the SerenityOS kernel we need to:

 a) Implement the KASAN interface to intercept the injected loads/stores.

      void __asan_load*(address);
      void __asan_store(address);

 b) Setup KASAN region and determine the shadow memory offset + translation.
    This might be challenging since Serenity is only 32bit at this time.

    Ex: Linux implements kernel address -> shadow address translation like:

      static inline void *kasan_mem_to_shadow(const void *addr)
      {
          return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                  + KASAN_SHADOW_OFFSET;
      }

 c) Integrating KASAN with Kernel allocators.
    The kernel allocators need to be taught how to record allocation state
    in the shadow memory region.

This commit only implements the initial steps of this long process:
- A new (default OFF) CMake build flag `ENABLE_KERNEL_ADDRESS_SANITIZER`
- Stubs out enough of the KASAN interface to allow the Kernel to link clean.

Currently the KASAN kernel crashes on boot (triple fault because of the crash
in strlen other sanitizer are seeing) but the goal here is to just get started,
and this should help others jump in and continue making progress on KASAN.

References:
* ASAN Paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf
* KASAN Docs: https://github.com/google/kasan
* NetBSD KASAN Blog: https://blog.netbsd.org/tnf/entry/kernel_address_sanitizer_part_3
* LWN KASAN Article: https://lwn.net/Articles/612153/
* Tracking Issue #5351
2021-02-15 11:41:53 +01:00
Brian Gianforcaro 69df3cfae7 Kernel: Mark KBuffer and its getters as [[nodiscard]]
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro 0cbede91b8 Kernel: Mark Lock getters as [[nodiscard]]
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro a75d7958cc Kernel: Mark UserOrKernelBuffer and it's getters as [[nodicard]]
`UserOrKernelBuffer` objects should always be observed when created, in
turn there is no reason to call a getter without observing the result.
Doing either of these indicates an error in the code. Mark these methods
as [[nodiscard]] to find these cases.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro 01a66efe9d Kernel: Mark KResult getters as [[nodiscard]]
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro 8752a27519 Kernel: Mark PhysicalAddress/VirtualAddress getters as [[nodiscard]]
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro d71e235894 Kernel: Mark more StdLib functions as [[nodiscard]]
In the never ending journey to catch bugs, mark more functions
as [[nodiscard]] to find incorrect call sites.
2021-02-15 09:34:52 +01:00
Brian Gianforcaro f12754ee10 Kernel: Mark BlockResult as [[nodiscard]]
In the majority of cases we want to force callers to observe the
result of a blocking operation as it's not grantee to succeed as
they expect. Mark BlockResult as [[nodiscard]] to force any callers
to observe the result of the blocking operation.
2021-02-15 08:28:57 +01:00
Brian Gianforcaro a8a834782c Kernel: Ignore unobserved BlockResult from Thread::Sleep
Suppress these in preparation for making BlockResult [[nodiscard]].
2021-02-15 08:28:57 +01:00
Brian Gianforcaro ddd79fe2cf Kernel: Add WaitQueue::wait_forever and it use it for all infinite waits.
In preparation for marking BlockingResult [[nodiscard]], there are a few
places that perform infinite waits, which we never observe the result of
the wait. Instead of suppressing them, add an alternate function which
returns void when performing and infinite wait.
2021-02-15 08:28:57 +01:00
Andreas Kling 68e3616971 Kernel: Forked children should inherit the signal trampoline address
Fixes #5347.
2021-02-14 18:38:46 +01:00
Andreas Kling 8ee42e47df Kernel: Mark a handful of things in CPU.cpp as READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling 99f596fd51 Kernel: Mark a handful of things in kmalloc.cpp as READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling 7a78a4915a Kernel: Mark a handful of things in init.cpp as READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling 49f463f557 Kernel: Mark a handful of things in Thread.cpp READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling c5c68bbd84 Kernel: Mark a handful of things in Scheduler.cpp READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling 00107a0dc1 Kernel: Mark a handful of things in Process.cpp READONLY_AFTER_INIT 2021-02-14 18:12:00 +01:00
Andreas Kling f0a1d9bfa5 Kernel: Mark the x86 IDT as READONLY_AFTER_INIT
We never need to modify the interrupt descriptor table after finishing
initialization, so let's make it an error to do so.
2021-02-14 18:12:00 +01:00
Andreas Kling a10accd48c Kernel: Print a helpful panic message for READONLY_AFTER_INIT crashes 2021-02-14 18:12:00 +01:00
Andreas Kling d8013c60bb Kernel: Add mechanism to make some memory read-only after init finishes
You can now use the READONLY_AFTER_INIT macro when declaring a variable
and we will put it in a special ".ro_after_init" section in the kernel.

Data in that section remains writable during the boot and init process,
and is then marked read-only just before launching the SystemServer.

This is based on an idea from the Linux kernel. :^)
2021-02-14 18:11:32 +01:00
Andreas Kling 6ee499aeb0 Kernel: Round old address/size in sys$mremap() to page size multiples
Found by fuzz-syscalls. :^)
2021-02-14 13:15:05 +01:00
Andreas Kling 0e92a80434 Kernel: Add some bits of randomness to kernel stack pointers
Since kernel stacks are much smaller (64 KiB) than userspace stacks,
we only add a small bit of randomness here (0-256 bytes, 16b aligned.)

This makes the location of the task context switch buffer not be
100% predictable. Note that we still also add extra randomness upon
syscall entry, so this patch primarily affects context switching.
2021-02-14 12:30:07 +01:00
Andreas Kling e47bffdc8c Kernel: Add some bits of randomness to the userspace stack pointer
This patch adds a random offset between 0 and 4096 to the initial
stack pointer in new processes. Since the stack has to be 16-byte
aligned, the bottom bits can't be randomized.

Yet another thing to make things less predictable. :^)
2021-02-14 11:53:49 +01:00
Andreas Kling 4188373020 Kernel: Fix TOCTOU in syscall entry region validation
We were doing stack and syscall-origin region validations before
taking the big process lock. There was a window of time where those
regions could then be unmapped/remapped by another thread before we
proceed with our syscall.

This patch closes that window, and makes sys$get_stack_bounds() rely
on the fact that we now know the userspace stack pointer to be valid.

Thanks to @BenWiederhake for spotting this! :^)
2021-02-14 11:47:14 +01:00
Andreas Kling 10b7f6b77e Kernel: Mark handle_crash() as [[noreturn]] 2021-02-14 11:47:14 +01:00
Ben Wiederhake c0692f1f95 Kernel: Avoid magic number in sys$poll 2021-02-14 10:57:33 +01:00
Andreas Kling cc341c95aa Kernel: Panic on sys$get_stack_bounds() in stack-less process 2021-02-14 10:51:18 +01:00
Andreas Kling 3131281747 Kernel: Panic on syscall from process with IOPL != 0
If this happens then the kernel is in an undefined state, so we should
rather panic than attempt to limp along.
2021-02-14 10:51:17 +01:00
Andreas Kling 781d29a337 Kernel+Userland: Give sys$recvfd() an options argument for O_CLOEXEC
@bugaevc pointed out that we shouldn't be setting this flag in
userspace, and he's right of course.
2021-02-14 10:39:48 +01:00