Commit graph

593 commits

Author SHA1 Message Date
Tom 7593418b77 Kernel: Fix futex race that could lead to thread waiting forever
There is a race condition where we would remove a FutexQueue from
our futex map and in the meanwhile another thread started to queue
itself into that very same futex, leading to that thread to wait
forever as no other wake operation could discover that removed
FutexQueue.

This fixes the problem by:
* Tracking imminent waits, which prevents a new FutexQueue from being
  deleted that a thread will wait on momentarily
* Atomically marking a FutexQueue as removed, which prevents a thread
  from waiting on it before it is actually removed from the futex map.
2021-07-07 10:05:55 +02:00
Andreas Kling 565796ae4e Kernel+LibC: Remove sys$donate()
This was an old SerenityOS-specific syscall for donating the remainder
of the calling thread's time-slice to another thread within the same
process.

Now that Threading::Lock uses a pthread_mutex_t internally, we no
longer need this syscall, which allows us to get rid of a surprising
amount of unnecessary scheduler logic. :^)
2021-07-05 23:30:15 +02:00
ForLoveOfCats ce6658acc1 KeyboardSettings+Kernel: Setting to enable Num Lock on login 2021-07-05 06:19:59 +02:00
Idan Horowitz 301c1a3a58 Everywhere: Fix incorrect usages of AK::Checked
Specifically, explicitly specify the checked type, use the resulting
value instead of doing the same calculation twice, and break down
calculations to discrete operations to ensure no intermediary overflows
are missed.
2021-07-04 20:08:28 +01:00
Gunnar Beutner c51b49a8cb Kernel: Implement TLS support for x86_64 2021-07-04 01:07:28 +02:00
Gunnar Beutner a09e6171a6 Kernel: Don't allow allocate_tls() if the process has multiple threads
We can't safely update the other threads' FS selector. This shouldn't
be a problem in practice because allocate_tls() is only used by the
loader.
2021-07-04 01:07:28 +02:00
Daniel Bertalan b9f30c6f2a Everywhere: Fix some alignment issues
When creating uninitialized storage for variables, we need to make sure
that the alignment is correct. Fixes a KUBSAN failure when running
kernels compiled with Clang.

In `Syscalls/socket.cpp`, we can simply use local variables, as
`sockaddr_un` is a POD type.

Along with moving the `alignas` specifier to the correct member,
`AK::Optional`'s internal buffer has been made non-zeroed by default.
GCC emitted bogus uninitialized memory access warnings, so we now use
`__builtin_launder` to tell the compiler that we know what we are doing.
This might disable some optimizations, but judging by how GCC failed to
notice that the memory's initialization is dependent on `m_has_value`,
I'm not sure that's a bad thing.
2021-07-03 01:56:31 +04:30
Gunnar Beutner 16b9a2d2e1 Kernel+LibPthread: Add support for usermode threads on x86_64 2021-07-01 17:22:22 +02:00
Gunnar Beutner 93c741018e Kernel+LibPthread: Remove m_ prefix for public members 2021-07-01 17:22:22 +02:00
Gunnar Beutner fe2716df21 Kernel: Disable __thread and TLS on x86_64 for now
They're not yet properly supported.
2021-06-30 15:13:30 +02:00
Gunnar Beutner cafccb866c Kernel: Don't start usermode threads on x86_64 for now
Starting usermode threads doesn't currently work on x86_64. With this
stubbed out we can get text mode to boot though.
2021-06-30 15:13:30 +02:00
Max Wipfli 7405536a1a AK+Everywhere: Use mostly StringView in LexicalPath
This changes the m_parts, m_dirname, m_basename, m_title and m_extension
member variables to StringViews onto the m_string String. It also
removes the m_is_absolute member in favour of computing if a path is
absolute in the is_absolute() getter. Due to this, the canonicalize()
method has been completely rewritten.

The parts() getter still returns a Vector<String>, although it is no
longer a const reference as m_parts is no longer a Vector<String>.
Rather, it is constructed from the StringViews in m_parts upon request.
The parts_view() getter has been added, which returns Vector<StringView>
const&. Most previous users of parts() have been changed to use
parts_view(), except where Strings are required.

Due to this change, it's is now no longer allow to create temporary
LexicalPath objects to call the dirname, basename, title, or extension
getters on them because the returned StringViews will point to possible
freed memory.
2021-06-30 11:13:54 +02:00
Max Wipfli fc6d051dfd AK+Everywhere: Add and use static APIs for LexicalPath
The LexicalPath instance methods dirname(), basename(), title() and
extension() will be changed to return StringView const& in a further
commit. Due to this, users creating temporary LexicalPath objects just
to call one of those getters will recieve a StringView const& pointing
to a possible freed buffer.

To avoid this, static methods for those APIs have been added, which will
return a String by value to avoid those problems. All cases where
temporary LexicalPath objects have been used as described above haven
been changed to use the static APIs.
2021-06-30 11:13:54 +02:00
Liav A 7c87891c06 Kernel: Don't copy a Vector<FileDescriptionAndFlags>
Instead of copying a Vector everytime we need to enumerate a Process'
file descriptions, we can just temporarily lock so it won't change.
2021-06-29 20:53:59 +02:00
Liav A 12b6e69150 Kernel: Introduce the new ProcFS design
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.

The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.

The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
2021-06-29 20:53:59 +02:00
Liav A 92c0dab5ab Kernel: Introduce the new SysFS
The intention is to add dynamic mechanism for notifying the userspace
about hotplug events. Currently, the DMI (SMBIOS) blobs and ACPI tables
are exposed in the new filesystem.
2021-06-29 20:53:59 +02:00
Gunnar Beutner a5b4b95a76 Kernel: Report correct architecture for uname() 2021-06-29 20:03:36 +02:00
Gunnar Beutner 85561feb40 Kernel: Rename some variables to arch-independent names 2021-06-29 20:03:36 +02:00
Gunnar Beutner 90e3aa35ef Kernel: Fix correct argument order for userspace entry point 2021-06-29 20:03:36 +02:00
Gunnar Beutner 6dde7dac8f Kernel: Implement signal handling for x86_64 2021-06-29 20:03:36 +02:00
Gunnar Beutner df9e73de25 Kernel: Add x86_64 support for fork() 2021-06-29 20:03:36 +02:00
Gunnar Beutner 2a78bf8596 Kernel: Fix the return type for syscalls
The Process::Handler type has KResultOr<FlatPtr> as its return type.
Using a different return type with an equally-sized template parameter
sort of works but breaks once that condition is no longer true, e.g.
for KResultOr<int> on x86_64.

Ideally the syscall handlers would also take FlatPtrs as their args
so we can get rid of the reinterpret_cast for the function pointer
but I didn't quite feel like cleaning that up as well.
2021-06-28 22:29:28 +02:00
Gunnar Beutner d4c0d28035 Kernel: Properly set up the userland context for new processes on x86_64 2021-06-28 22:29:28 +02:00
Gunnar Beutner 158355e0d7 Kernel+LibELF: Add support for validating and loading ELF64 executables 2021-06-28 22:29:28 +02:00
Gunnar Beutner f285241cb8 Kernel: Rename Thread::tss to Thread::regs and add x86_64 support
We're using software context switches so calling this struct tss is
somewhat misleading.
2021-06-27 15:46:42 +02:00
Gunnar Beutner 233ef26e4d Kernel+Userland: Add x86_64 registers to RegisterState/PtraceRegisters 2021-06-27 15:46:42 +02:00
Andreas Kling f4090d46de Kernel: Don't kmalloc() for small (<=1024) dbgputstr() syscalls 2021-06-27 10:50:24 +02:00
Gunnar Beutner f630299d49 Kernel: Add support for setting up a x86_64 GDT once in C++ land 2021-06-26 11:08:52 +02:00
Daniel Bertalan f820917a76 Everywhere: Use nothrow new with adopt_{ref,own}_if_nonnull
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
2021-06-24 17:35:49 +04:30
Max Wipfli 84c0f98fb2 Kernel: Reimplement the dbgputch and dbgputstr syscalls
This rewrites the dbgputch and dbgputstr system calls as wrappers of
kstdio.h.

This fixes a bug where only the Kernel's debug output was also sent to
a serial debugger, while the userspace's debug output was only sent to
the Bochs debugger.

This also fixes a bug where debug output from one process would
sometimes "interrupt" the debug output from another process in the
middle of a line.
2021-06-24 10:29:09 +02:00
Gunnar Beutner 38fca26f54 Kernel: Add stubs for missing x86_64 functionality
This adds just enough stubs to make the kernel compile on x86_64. Obviously
it won't do anything useful - in fact it won't even attempt to boot because
Multiboot doesn't support ELF64 binaries - but it gets those compiler errors
out of the way so more progress can be made getting all the missing
functionality in place.
2021-06-24 09:27:13 +02:00
Hendiadyoin1 925be2758e Kernel: Remove unused CPU.h includes
In most cases we did not need it at all, in other, we only needed one
header from it
2021-06-24 00:38:23 +02:00
Hendiadyoin1 7ca3d413f7 Kernel: Pull apart CPU.h
This does not add any functional changes
2021-06-24 00:38:23 +02:00
Gunnar Beutner bf779e182e Kernel: Remove obsolete size_t casts 2021-06-17 19:52:54 +02:00
Gunnar Beutner bc3076f894 Kernel: Remove various other uses of ssize_t 2021-06-16 21:29:36 +02:00
Jelle Raaijmakers 30abfc2b21 Kernel: Pass absolute path to shebang interpreter
When you invoke a binary with a shebang line, the `execve` syscall
makes sure to pass along command line arguments to the shebang
interpreter including the path to the binary to execute.

This does not work well when the binary lives in $PATH. For example,
given this script living in `/usr/local/bin/my-script`:

  #!/bin/my-interpreter
  echo "well hello friends"

When executing it as `my-script` from outside `/usr/local/bin/`, it is
executed as `/bin/my-interpreter my-script`. To make sure that the
interpreter can find the binary to execute, we need to replace the
first argument with an absolute path to the binary, so that the
resulting command is:

  /bin/my-interpreter /usr/local/bin/my-script
2021-06-13 21:19:51 +02:00
Jelle Raaijmakers 26250779d1 Kernel: Also move() the shebang path in execve 2021-06-13 21:19:51 +02:00
Max Wipfli 573664758a Kernel: Properly reset m_unveiled_paths on execve()
When a process executes another program, its unveil state is reset. For
this, we not only need to clear all nodes from m_unveiled_paths, but
also reset the metadata of m_unveiled_paths (the root node) itself.

This fixes the following bug:
1) A process unveils "/", then executes another program.
2) That other program also unveils some path.
3) "/" is now unveiled for the new program.
2021-06-08 12:15:04 +02:00
Max Wipfli 8930db0900 Kernel: Change unveil state to dropped even when node already exists
This also changes the UnveilState to Dropped when the path unveil() is
called for already has a node.

This fixes a bug where unveiling "/" would previously keep the
UnveilState as None, which meant that everything was still accessible
until unveil() was called with any non-root path (or nullptr).
2021-06-08 12:15:04 +02:00
Max Wipfli 2fcebfd6a8 Kernel: Update intermediate nodes when changing unveil permissions
When changing the unveil permissions of a preexisting node, we need to
make sure that any intermediate nodes that were created before and
should inherit permissions from the updated node are updated properly.

This fixes the following bug:
unveil("/home/anon/Documents", "r");
unveil("/home", "r");
Now there was a intermediate node for "/home/anon" which still had no
permission, even though it should have inherited the permissions from
"/home".
2021-06-08 12:15:04 +02:00
Max Wipfli e8a317023d Kernel: Allow unveiling subfolders regardless of parent's permissions
This fixes a bug where unveiling a subdirectory of an already unveiled
path would sometimes be allowed and sometimes not (depending on what
other unveil calls have been made).

Now, it is always allowed to unveil a subdirectory of an already
unveiled directory, even if it has higher permissions.

This removes the need for the permissions_inherited_from_root flag in
UnveilMetadata, so it has been removed.
2021-06-08 12:15:04 +02:00
Max Wipfli 9d41dd2ed0 Kernel: Use LexicalPath to avoid two consecutive slashes in unveil path
This patch fixes a bug in the unveil syscall where an UnveilNode's path
would start with two slashes if it's parent node was "/".
2021-06-08 12:15:04 +02:00
Jelle Raaijmakers d6a3f1fcd7 Kernel: Simplify execve shebang argument handling 2021-06-08 11:30:58 +02:00
Brian Gianforcaro 9fccbde371 Kernel: Switch Process to InstrusiveList from InlineLinkedList 2021-06-07 09:42:55 +02:00
Jelle Raaijmakers f6d372b2ab Kernel: Process::exec(): Check if path is a regular file
https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html

  [EACCES] The new process image file is not a regular file and the
           implementation does not support execution of files of its
           type.

Let's check whether the passed `path` is indeed a regular file.
2021-06-04 23:45:17 +02:00
Jelle Raaijmakers 496988de47 LibC: Add POSIX timer constants 2021-06-04 10:39:41 +02:00
Gunnar Beutner fe0ae3161a Kernel: Fix use-after-free in sys$mremap
Now that Region::name() has been changed to return a StringView we
can't rely on keeping a copy of the region's name past the region's
destruction just by holding a copy of the StringView.
2021-06-02 18:00:13 +02:00
Brian Gianforcaro 7c0b2eb0f5 Kernel: Handle OOM of file system in sys$mount 2021-06-01 23:14:40 +01:00
Brian Gianforcaro d2d6ab40f9 Kernel: Make AnonymousFile::create API OOM safe 2021-06-01 23:14:40 +01:00
Nick Miller 10ba6f254c Kernel: Rename instances of IO port 0xe9 to BOCHS_DEBUG_PORT 2021-05-31 19:06:13 +01:00
Ali Mohammad Pur 2b5732ab77 AK+Kernel: Disallow implicitly lifting pointers to OwnPtr's
This doesn't really _fix_ anything, it just gets rid of the API and
instead makes the users explicitly use `adopt_own_if_non_null()`.
2021-05-31 17:09:12 +04:30
Gunnar Beutner 01c75e3a34 Kernel: Don't log profile data before/after the process/thread lifetime
There were a few cases where we could end up logging profiling events
before or after the associated process or thread exists in the profile:

After enabling profiling we might end up with CPU samples before we
had a chance to synthesize process/thread creation events.

After a thread exits we would still log associated kmalloc/kfree
events. Instead we now just ignore those events.
2021-05-30 19:03:03 +02:00
Andreas Kling 1123af361d Kernel: Convert Process::get_syscall_path_argument() to KString
This API now returns a KResultOr<NonnullOwnPtr<KString>> and allocation
failures should be propagated everywhere nicely. :^)
2021-05-29 20:18:57 +02:00
Gunnar Beutner 42d667645d Kernel: Make sure we free the thread stack on thread exit
This adds two new arguments to the thread_exit system call which let
a thread unmap an arbitrary VM range on thread exit. LibPthread
uses this functionality to unmap the thread stack.

Fixes #7267.
2021-05-29 15:53:08 +02:00
Gunnar Beutner 95c2166ca9 Kernel: Move sys$munmap functionality into a helper method 2021-05-29 15:53:08 +02:00
Brian Gianforcaro 65d5f81afc Kernel: Make PrivateInodeVMObject factory APIs OOM safe 2021-05-29 09:04:05 +02:00
Andreas Kling 9d801d2345 Kernel: Rename Custody::create() => try_create()
The try_ prefix indicates that this may fail. :^)
2021-05-28 11:23:00 +02:00
Andreas Kling fc9ce22981 Kernel: Use KString for Region names
Replace the AK::String used for Region::m_name with a KString.

This seems beneficial across the board, but as a specific data point,
it reduces time spent in sys$set_mmap_name() by ~50% on test-js. :^)
2021-05-28 09:37:09 +02:00
Tim Schumacher 58bc10b947
Kernel: Make dup2() return the fd even if old & new are the same (#7506) 2021-05-27 21:14:57 +02:00
Gunnar Beutner ad6587424f Kernel: Disable profiling if setting up the buffer or timer failed 2021-05-24 09:10:50 +02:00
Gunnar Beutner 0688e02339 Kernel: Make sure we only log profiling events when m_profiling is true
Previously the process' m_profiling flag was ignored for all event
types other than CPU samples.

The kfree tracing code relies on temporarily disabling tracing during
exec. This didn't work for per-process profiles and would instead
panic.

This updates the profiling code so that the m_profiling flag isn't
ignored.
2021-05-23 23:54:30 +01:00
Gunnar Beutner 7557f2db90 Kernel: Remove an allocation when blocking a thread
When blocking a thread with a timeout we would previously allocate
a Timer object. This removes the allocation for that Timer object.
2021-05-20 09:09:10 +02:00
Brian Gianforcaro bb91bed576 Kernel: Make ProcessGroup::find_or_create API OOM safe
Make ProcessGroup::find_or_create & ProcessGroup::create OOM safe, by
moving to adopt_ref_if_nonnull.
2021-05-20 08:10:07 +02:00
Gunnar Beutner 7dc77bd833 Kernel: Avoid an allocation in sys$poll 2021-05-19 22:51:42 +02:00
Gunnar Beutner 277f333b2b Kernel: Add support for profiling kmalloc()/kfree() 2021-05-19 22:51:42 +02:00
Gunnar Beutner 572bbf28cc Kernel+LibC: Add support for filtering profiling events
This adds the -t command-line argument for the profile tool. Using this
argument you can filter which event types you want in your profile.
2021-05-19 22:51:42 +02:00
Justin 1c3badede3 Kernel: Add statvfs & fstatvfs Syscalls
These syscalls fill a statvfs struct with various data
about the mount on the VFS.
2021-05-19 21:33:29 +02:00
Hendiadyoin1 ef425a02f7 Kernel: Implement mprotect for multiple Regions 2021-05-18 16:50:52 +02:00
Sahan Fernando d0f314b23c Kernel: Fix subtle race condition in sys$write implementation
There is a slight race condition in our implementation of write().
We call File::can_write() before attempting to write to it (blocking if
it returns false). If it returns true, we assume that we can write to
the file, and our code assumes that File::write() cannot possibly fail
by being blocked. There is, however, the rare case where another process
writes to the file and prevents further writes in between the call to
Files::can_write() and File::write() in the first process. This would
result in the first process calling File::write() when it cannot be
written to.

We fix this by adding a mechanism for File::can_write() to signal that
it was blocked, making it the responsibilty of File::write() to check
whether it can write and then finally making sys$write() check if the
write failed due to it being blocked.
2021-05-18 16:33:15 +02:00
Gunnar Beutner 89956cb0d6 Kernel+Userspace: Implement the accept4() system call
Unlike accept() the new accept4() system call lets the caller specify
flags for the newly accepted socket file descriptor, such as
SOCK_CLOEXEC and SOCK_NONBLOCK.
2021-05-17 13:32:19 +02:00
Liav A e6f333ae00 Kernel: Print failed attempt to shutdown the machine
Because we don't parse ACPI AML yet, If we are not able to shut down
the machine with "hacky" emulation methods - halt and print this state
to the users so they know they can shutdown the machine by themselves.
2021-05-17 00:30:40 +01:00
Nicholas Baron aa4d41fe2c
AK+Kernel+LibELF: Remove the need for IteratorDecision::Continue
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.

This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).

Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.

There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
2021-05-16 10:36:52 +01:00
Andreas Kling 4d429ba9ea Kernel: Unbreak profiling all processes
Regressed in 8a4cc735b9.
We stopped generating "process created" when enabling profiling,
which led to Profiler getting confused about the missing events.
2021-05-15 21:25:54 +02:00
Liav A 8a4cc735b9 Kernel: Don't use the profile timer if we don't have a timer to assign 2021-05-15 18:08:41 +02:00
Gunnar Beutner 4ab9d8736b Kernel: Make perf_event() work for global profiles
Previously calls to perf_event() would end up in a process-specific
perfcore file even though global profiling was enabled. This changes
the behavior for perf_event() so that these events are stored into
the global profile instead.
2021-05-15 16:28:18 +02:00
Brian Gianforcaro ede1483e48 Kernel: Make Process creation APIs OOM safe
This change looks more involved than it actually is. This simply
reshuffles the previous Process constructor and splits out the
parts which can fail (resource allocation) into separate methods
which can be called from a factory method. The factory is then
used everywhere instead of the constructor.
2021-05-15 09:01:32 +02:00
Andreas Kling 16221305ad LibELF: Remove sketchy use of "undefined" ELF::Image::Section
We were using ELF::Image::section(0) to indicate the "undefined"
section, when what we really wanted was just Optional<Section>.

So let's use Optional instead. :^)
2021-05-15 00:17:55 +02:00
Mart G e7310ba45a Kernel+LibC: Add fstatat
The function fstatat can do the same thing as the stat and lstat
functions. However, it can be passed the file descriptor of a directory
which will be used when as the starting point for relative paths. This
is contrary to stat and lstat which use the current working directory as
the starting for relative paths.
2021-05-14 23:32:10 +02:00
Gunnar Beutner 8614d18956 Kernel: Use a separate timer for profiling the system
This updates the profiling subsystem to use a separate timer to
trigger CPU sampling. This timer has a higher resolution (1000Hz)
and is independent from the scheduler. At a later time the
resolution could even be made configurable with an argument for
sys$profiling_enable() - but not today.
2021-05-14 00:35:57 +02:00
Andreas Kling e46343bf9a Kernel: Make UserOrKernelBuffer R/W helpers return KResultOr<size_t>
This makes error propagation less cumbersome (and also exposed some
places where we were not doing it.)
2021-05-13 23:28:40 +02:00
Brian Gianforcaro 0d50d3ed1e Kernel: Make InodeWatcher::crate API OOM safe 2021-05-13 16:21:53 +02:00
Brian Gianforcaro 51ceb172b9 Kernel: Replace make<T>() with adopt_own_if_nonnull() in sys$module_load 2021-05-13 16:21:53 +02:00
Brian Gianforcaro 956314f0a1 Kernel: Make Process::start_tracing_from API OOM safe
Modify the API so it's possible to propagate error on OOM failure.
NonnullOwnPtr<T> is not appropriate for the ThreadTracer::create() API,
so switch to OwnPtr<T>, use adopt_own_if_nonnull() to handle creation.
2021-05-13 16:21:53 +02:00
sin-ack fe5ca6ca27 Kernel: Implement multi-watch InodeWatcher :^)
This patch modifies InodeWatcher to switch to a one watcher, multiple
watches architecture.  The following changes have been made:

- The watch_file syscall is removed, and in its place the
  create_iwatcher, iwatcher_add_watch and iwatcher_remove_watch calls
  have been added.
- InodeWatcher now holds multiple WatchDescriptions for each file that
  is being watched.
- The InodeWatcher file descriptor can be read from to receive events on
  all watched files.

Co-authored-by: Gunnar Beutner <gunnar@beutner.name>
2021-05-12 22:38:20 +02:00
Gunnar Beutner 22ebd754d3 Kernel: Fix loading ELF images without PT_INTERP
Previously we'd try to load ELF images which did not have
an interpreter set with an incorrect load offset of 0, i.e. way
outside of the part of the address space where we'd expect either
the dynamic loader or the user's executable to reside.

This fixes the problem by using get_load_offset for both executables
which have an interpreter set and those which don't. Notably this
allows us to actually successfully execute the Loader.so binary:

courage:~ $ /usr/lib/Loader.so
You have invoked `Loader.so'. This is the helper program for programs
that use shared libraries. Special directives embedded in executables
tell the kernel to load this program.

This helper program loads the shared libraries needed by the program,
prepares the program to run, and runs it. You do not need to invoke
this helper program directly.
courage:~ $
2021-05-10 20:39:08 +02:00
Brian Gianforcaro 0b7395848a Kernel: Plumb OOM propagation through Custody factory
Modify the Custody::create(..) API so it has the ability to propagate
OOM back to the caller.
2021-05-10 11:55:52 +02:00
Brian Gianforcaro 8bf4201f50 Kernel: Move process creation perf events to PerformanceManager 2021-05-07 15:35:23 +02:00
Brian Gianforcaro ccdcb6a635 Kernel: Add PerformanceManager static class, move perf event APIs there
The current method of emitting performance events requires a bit of
boiler plate at every invocation, as well as having to ignore the
return code which isn't used outside of the perf event syscall. This
change attempts to clean that up by exposing high level API's that
can be used around the code base.
2021-05-07 15:35:23 +02:00
Brian Gianforcaro 11306d7121
Kernel: Modify TimeManagement::current_time(..) API so it can't fail. (#6869)
The fact that current_time can "fail" makes its use a bit awkward.
All callers in the Kernel are trusted besides syscalls, so assert
that they never get there, and make sure all current callers perform
validation of the clock_id with TimeManagement::is_valid_clock_id().

I have fuzzed this change locally for a bit to make sure I didn't
miss any obvious regression.
2021-05-05 18:51:06 +02:00
Brian Gianforcaro 234c6ae32d Kernel: Change Inode::{read/write}_bytes interface to KResultOr<ssize_t>
The error handling in all these cases was still using the old style
negative values to indicate errors. We have a nicer solution for this
now with KResultOr<T>. This change switches the interface and then all
implementers to use the new style.
2021-05-02 13:27:37 +02:00
Sahan Fernando bd563f0b3c Kernel: Make processes start with a 16-byte-aligned stack 2021-05-01 20:08:35 +02:00
Brian Gianforcaro a678851b41 Kernel: Harden sys$setgroups Vector usage against OOM 2021-05-01 09:10:30 +02:00
Itamar 6bbd2ebf83 Kernel+LibELF: Support initializing values of TLS data
Previously, TLS data was always zero-initialized.

To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.

The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.

We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
2021-04-30 18:47:39 +02:00
Itamar 373e8bcbc7 Kernel: Give a name to the Master TLS region allocation 2021-04-30 18:47:39 +02:00
Gunnar Beutner 3c0355a398 Kernel: Accepted socket file descriptors should not inherit flags
For example Linux accepts an additional argument for flags in accept4()
that let the user specify what flags they want. However, by default
accept() should not inherit those flags from the listener socket.
2021-04-30 11:43:19 +02:00
Jesse Buhagiar 60cdbc9397 Kernel/LibC: Implement setreuid 2021-04-30 11:35:17 +02:00
Brian Gianforcaro a8765fa673 Kernel: Harden sys$select Vector usage against OOM.
Theoretically the append should never fail as we have in-line storage
of FD_SETSIZE, which should always be enough. However I'm planning on
removing the non-try variants of AK::Vector when compiling in kernel
mode in the future, so this will need to go eventually. I suppose it
also protects against some unforeseen bug where we we can append more
than FD_SETSIZE items.
2021-04-29 20:31:15 +02:00
Brian Gianforcaro 0ca668f59c Kernel: Harden sys$munmap Vector usage against OOM.
Theoretically the append should never fail as we have in-line storage
of 2, which should be enough. However I'm planning on removing the
non-try variants of AK::Vector when compiling in kernel mode in the
future, so this will need to go eventually. I suppose it also protects
against some unforeseen bug where we we can append more than 2 items.
2021-04-29 20:31:15 +02:00
Brian Gianforcaro 569c5a8922 Kernel: Harden sys$purge Vector usage against OOM.
sys$purge() is a bit unique, in that it is probably in the systems
advantage to attempt to limp along if we hit OOM while processing
the vmobjects to purge. This change modifies the algorithm to observe
OOM and continue trying to purge any previously visited VMObjects.
2021-04-29 20:31:15 +02:00
Brian Gianforcaro b3096276bb Kernel: Harden sys$poll Vector usage against OOM. 2021-04-29 20:31:15 +02:00
Brian Gianforcaro 119b7be249 Kernel: Harden sys$execve Vector usage against OOM. 2021-04-29 20:31:15 +02:00
Brian Gianforcaro 454d2fd42a Kernel: Harden sys$readv / sys$writev Vector usage against OOM. 2021-04-29 20:31:15 +02:00
Brian Gianforcaro cd29eb7867 Kernel: Harden sys$sendmsg / sys$recvmsg Vector usage against OOM. 2021-04-29 20:31:15 +02:00
Gunnar Beutner d9ee2c6a89 Kernel: Avoid overrunning the user-specified buffers in select() 2021-04-28 23:05:10 +02:00
Gunnar Beutner aa792062cb Kernel+LibC: Implement the socketpair() syscall 2021-04-28 14:19:45 +02:00
Jelle Raaijmakers b630e39fbb Kernel: Check futex command if realtime clock is used 2021-04-27 09:19:55 +02:00
Gunnar Beutner 659507696c Kernel: Fix incorrect argument for thread_exit events 2021-04-26 23:26:58 +02:00
Gunnar Beutner 1c02848e54 Kernel: Log thread exits for global profiles 2021-04-26 23:26:58 +02:00
Gunnar Beutner afeee35cbf Kernel: Avoid calling characters() where not necessary 2021-04-26 23:26:58 +02:00
Gunnar Beutner eb798d5538 Kernel+Profiler: Improve profiling subsystem
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.

Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.

Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().

This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.

The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
2021-04-26 17:13:55 +02:00
Linus Groh dbe72fd962 Everywhere: Remove empty line after function body opening curly brace 2021-04-25 20:20:00 +02:00
Brian Gianforcaro 8d6e9fad40 Kernel: Remove the now defunct LOCKER(..) macro. 2021-04-25 09:38:27 +02:00
Jelle Raaijmakers 54f5b1346c Kernel: Support null act argument for sigaction syscall
Userspace can provide a null argument for the `act` argument to the
`sigaction` syscall to not set any new behavior. This is described
here:

https://pubs.opengroup.org/onlinepubs/007904875/functions/sigaction.html

Without this fix, the `copy_from_user(...)` invocation on `user_act`
fails and makes the syscall return early.
2021-04-24 23:00:28 +02:00
Andreas Kling b91c49364d AK: Rename adopt() to adopt_ref()
This makes it more symmetrical with adopt_own() (which is used to
create a NonnullOwnPtr from the result of a naked new.)
2021-04-23 16:46:57 +02:00
Maciej Zygmanowski 6efcc2fc99 Kernel: Don't allow to kill kernel processes
The protection was only for SIGKILL before.
2021-04-23 13:26:02 +02:00
Brian Gianforcaro 1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Gunnar Beutner db3fd11646 Kernel: Remove requirement for the thread entitlement for the futex syscall
GCC inserts calls to pthread_mutex_lock when compiling C++ code with
threads enabled.
2021-04-20 21:08:17 +02:00
Brian Gianforcaro 4ed682aebc Kernel: Add a syscall to clear the profiling buffer
While profiling all processes the profile buffer lives forever.
Once you have copied the profile to disk, there's no need to keep it
in memory. This syscall surfaces the ability to clear that buffer.
2021-04-19 18:30:37 +02:00
FalseHonesty 3123ffb19d Kernel: Add ptrace commands for reading/writing the debug registers
This adds PT_PEEKDEBUG and PT_POKEDEBUG to allow for reading/writing
the debug registers, and updates the Kernel's debug handler to read the
new information from the debug status register.
2021-04-18 17:02:40 +02:00
Gunnar Beutner f033416893 Kernel+LibC: Clean up how assertions work in the kernel and LibC
This also brings LibC's abort() function closer to the spec.
2021-04-18 11:11:15 +02:00
Gunnar Beutner cf13fa57cd Kernel: Allow system calls from the dynamic loader
Previously the dynamic loader would become unused after it had invoked
the program's entry function. However, in order to support exceptions
and - at a later point - dlfcn functionality we need to call back
into the dynamic loader at runtime.

Because the dynamic loader has a static copy of LibC it'll attempt to
invoke syscalls directly from its text segment. For this to work the
executable region for the dynamic loader needs to have syscalls enabled.
2021-04-18 10:55:25 +02:00
Linus Groh 2b0c361d04 Everywhere: Fix a bunch of typos 2021-04-18 10:30:03 +02:00
Gunnar Beutner c3ee70591a Kernel: Read the ELF header from the inode rather than the mapped pages
Reading from the mapping doesn't work when the text segment has a non-zero
offset because in that case the first mapped page doesn't contain the ELF
header.
2021-04-14 13:12:52 +02:00
Gunnar Beutner 2d91761cf6 Kernel: Make sure the offset stays the same when using mremap()
When using mmap() on a file with a non-zero offset subsequent
calls to mremap() would incorrectly reset the offset to zero.
2021-04-14 13:12:52 +02:00
Idan Horowitz 2c93123daf Kernel: Replace process' regions vector with a Red Black tree
This should provide some speed up, as currently searches for regions
containing a given address were performed in O(n) complexity, while
this container allows us to do those in O(logn).
2021-04-12 18:03:44 +02:00
Idan Horowitz 497c759ab7 Kernel: Remove old region from process' regions vector before splitting
This does not affect functionality right now, but it means that the
regions vector will now never have any overlapping regions, which will
allow the use of balance binary search trees instead of a vector in the
future. (since they require keys to be exclusive)
2021-04-12 18:03:44 +02:00
Liav A 8e3e3a71cb Kernel: Introduce a new HID subsystem
The end goal of this commit is to allow to boot on bare metal with no
PS/2 device connected to the system. It turned out that the original
code relied on the existence of the PS/2 keyboard, so VirtualConsole
called it even though ACPI indicated the there's no i8042 controller on
my real machine because I didn't plug any PS/2 device.
The code is much more flexible, so adding HID support for other type of
hardware (e.g. USB HID) could be much simpler.

Briefly describing the change, we have a new singleton called
HIDManagement, which is responsible to initialize the i8042 controller
if exists, and to enumerate its devices. I also abstracted a bit
things, so now every Human interface device is represented with the
HIDDevice class. Then, there are 2 types of it - the MouseDevice and
KeyboardDevice classes; both are responsible to handle the interface in
the DevFS.

PS2KeyboardDevice, PS2MouseDevice and VMWareMouseDevice classes are
responsible for handling the hardware-specific interface they are
assigned to. Therefore, they are inheriting from the IRQHandler class.
2021-04-03 11:57:23 +02:00
Itamar ba0df27653 Kernel: Support write() after setting O_APPEND on a non-seekable file
Previously, Process::do_write would error if the O_APPEND flag was set
on a non-seekable file.

Other systems (such as Linux) seem to be OK with doing this, so we now
do not attempt to seek to the end the file if it's not seekable.
2021-03-29 19:56:54 +02:00
Andreas Kling 497c4c3858 Kernel: Let's also not reverse the blocking flag for FIONBIO.. 2021-03-29 08:59:22 +02:00
Andreas Kling 0f270afda8 Kernel: Let's allow unsetting non-blocking mode with FIONBIO as well
Thanks to almightyhydra for pointing this out! :^)
2021-03-29 08:58:13 +02:00
Andreas Kling 5f71bf0cc7 Kernel+LibC: Implement sys$ioctl() FIONBIO
This is another (older) way of making a file descriptor non-blocking.
2021-03-28 17:50:08 +02:00
Hendiadyoin1 0d934fc991 Kernel::CPU: Move headers into common directory
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
2021-03-21 09:35:23 +01:00
Itamar 2365e06b12 Kernel: Set TLS-related members of Process after loading static program
We previously ignored these values in the return value of
load_elf_object, which causes us to not allocate a TLS region for
statically-linked programs.
2021-03-19 22:55:53 +01:00
Andreas Kling d48666489c Kernel: Make FileDescription::seek() return KResultOr<off_t>
This exposed a bunch of places where errors were not propagated,
so this patch is forced to deal with them as well.
2021-03-19 10:44:25 +01:00
Jean-Baptiste Boric 6698fd84ff Kernel: Refactor storage stack with u64 as mmap offset 2021-03-19 09:15:19 +01:00
Jean-Baptiste Boric 7a079f7780 LibC+Kernel: Switch off_t to 64 bits 2021-03-17 23:22:42 +01:00
thatdutchguy 569d6d47fe Kernel: sysconf(_SC_CLK_TCK): Use TimeManagement::ticks_per_second() 2021-03-16 21:56:47 +01:00
thatdutchguy 10e3e8f6d4 Kernel: Add _SC_CLK_TCK to sysconf.
Unbreaks the hatari port.
2021-03-16 21:56:47 +01:00
Andreas Kling a166a65eff Kernel: Don't return -EFOO when return type is KResultOr<...> 2021-03-15 09:09:04 +01:00
Hendiadyoin1 eba3fa5e72 Kernel: Make munmap more posix compliant
In case someone tries to unmap a not mapped region (fallback) we should
not return an error, but silently do nothing
2021-03-13 10:00:46 +01:00
Hendiadyoin1 b7f1171a1c Kernel: munmap multiple regions at a time
This implements a fallback to munmap that unmaps multiple regions at a
time, with splitting some when needed.

The way it is implemented is possibly not optimal, due to it searching
without looking into the cache
2021-03-13 10:00:46 +01:00
Andreas Kling 423ed53396 Kernel: Fix rounding of PT_LOAD mappings in sys$execve()
We were not rounding the mappings down/up correctly, which could lead
to executables missing the last 4 KB of text and/or data.
2021-03-12 17:26:24 +01:00
Andreas Kling 73e06a1983 Kernel: Convert klog() => AK::Format in a handful of places 2021-03-12 15:22:35 +01:00
Andreas Kling 49a0f40ff0 Kernel: Inherit the dumpable flag on sys$fork()
This regressed at some point recently. All children were non-dumpable
until manually opting into it.
2021-03-11 14:35:37 +01:00
Andreas Kling 1608ef37d8 Kernel: Move process termination status/signal into protected data 2021-03-11 14:24:08 +01:00
Andreas Kling b7b7a48c66 Kernel: Move process signal trampoline address into protected data 2021-03-11 14:21:49 +01:00
Andreas Kling 08e0e2eb41 Kernel: Move process umask into protected data :^) 2021-03-11 14:21:49 +01:00
Andreas Kling 90c0f9664e Kernel: Don't keep protected Process data in a separate allocation
The previous architecture had a huge flaw: the pointer to the protected
data was itself unprotected, allowing you to overwrite it at any time.

This patch reorganizes the protected data so it's part of the Process
class itself. (Actually, it's a new ProcessBase helper class.)

We use the first 4 KB of Process objects themselves as the new storage
location for protected data. Then we make Process objects page-aligned
using MAKE_ALIGNED_ALLOCATED.

This allows us to easily turn on/off write-protection for everything in
the ProcessBase portion of Process. :^)

Thanks to @bugaevc for pointing out the flaw! This is still not perfect
but it's an improvement.
2021-03-11 14:21:49 +01:00
Andreas Kling de6c5128fd Kernel: Move process pledge promises into protected data 2021-03-10 22:50:00 +01:00
Andreas Kling 3d27269f13 Kernel: Move process parent PID into protected data :^) 2021-03-10 22:30:02 +01:00