Add a MappedROM::find_chunk_starting_with() helper since that's a very
common usage pattern in clients of this code.
Also convert MultiProcessorParser from a persistent singleton object
to a temporary object constructed via a failable factory function.
This patch adds a MappedROM abstraction to the Kernel VM subsystem.
It's basically the read-only byte buffer equivalent of a TypedMapping.
We use this in the ACPI and MP table parsers to scan for interesting
stuff in low memory instead of doing a bunch of address arithmetic.
For singly-indirect blocks, "callback" is just "add_block".
For doubly-indirect blocks, "callback" is the lambda function
iterating on singly-indirect blocks: so instead of adding itself to the
list, the doubly-indirect block will add all its childs, but they add
themselves again when they run the callback of singly-indirect blocks.
And nothing adds the doubly-indirect block itself :(
This leads to a double free of all child blocks of the doubly-indirect
block, which is the failed assert described in #1549.
Closes: #1549.
Let's not be paying the function call overhead for these tiny ops.
Maybe there's an argument for having fewer gadgets in the kernel but
for now we're actually seeing stac() in profiles so let's put
that above theoretical security issues.
If these methods get inlined, the compiler is able to statically eliminate most
of the assertions. Alas, it doesn't realize this, and believes inlining them to
be too expensive. So give it a strong hint that it's not the case.
This *decreases* the kernel binary size.
Make sure that userspace is always referencing "system" headers in a way
that would build on target :). This means removing the explicit
include_directories of Libraries/LibC in favor of having it export its
headers as SYSTEM. Also remove a redundant include_directories of
Libraries in the 'serenity build' part of the build script. It's already
set at the top.
This causes issues for the Kernel, and for crt0.o. These special cases
are handled individually.
read_block() and write_block() now accept the count (how many bytes to read
or write) and offset (where in the block to start; defaults to 0). Using these
new APIs, we can avoid doing copies between intermediary buffers in a lot more
cases. Hopefully this improves performance or something.
This was a holdover from the old times when each Process had a special
main thread with TID 0. Using it was a total crapshoot since it would
just return whichever thread was first on the process's thread list.
Now that I've removed all uses of it, we don't need it anymore. :^)
Instead of falling back to the suspicious "any_thread()" mechanism,
just fail with ESRCH if you try to kill() a PID that doesn't have a
corresponding TID.
This was supposed to be the foundation for some kind of pre-kernel
environment, but nobody is working on it right now, so let's move
everything back into the kernel and remove all the confusion.
We stopped using gettimeofday() in Core::EventLoop a while back,
in favor of clock_gettime() for monotonic time.
Maintaining an optimization for a syscall we're not using doesn't make
a lot of sense, so let's go back to the old-style sys$gettimeofday().
You can still open files that have sockets attached to them from inside
the kernel via VFS::open() (and in fact, that is what LocalSocket itslef uses),
but trying to do that from userspace using open() will now fail with ENXIO.
This reverts commit 3d342f72a7.
This is causing trouble for macOS users. Also it's painfully slow
compared to using the sudo method. This should definitely not be
the default since it punishes people who have genext2fs installed.
This is very basic and doesn't support many features. Instead
of describing what it *doesn't* support, I'll describe what I
have tested:
1. Public key authentication (password is not supported)
2. Single command execution
3. PTY-less interactive bash shell (/bin/sh doesn't work)
4. Multi-user (i.e you can ssh as 'anon' as well as root)
Step one of moving DesktopServices::open handling out of process. This
makes it easier to do things like read in associations for which program
opens which files or protocols. This gives users the ability to modify
the associations without having to rebuild :^)
It didn't feel right to have a "DHCPClient" in a "Servers" directory.
Rename this to Services to better reflect the type of programs we'll
be putting in there.
Ultimately we should not panic just because we can't fully commit a VM
region (by populating it with physical pages.)
This patch handles some of the situations where commit() can fail.
This caused us to report one purged page per occurrence of the shared
zero page in a purgeable memory region, despite it being a no-op.
Thanks to Sergey for spotting the bad assertion removal that led to
this being found!
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.
In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.
Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
This function has a lot of callers that don't bother checking if it
returns successfully or not. We'll need to handle failure in a bunch
of places and then we can remove this assertion.
The detection works very similarly to how we detect a mouse wheel, just
another magical sequence of "set sample rate" requests to the mouse
followed by an ID check.
If we OOM during a CoW fault and fail to allocate a new page for the
writing process, just leave the original VMObject alone so everyone
else can keep using it.
Since a Region is basically a view into a potentially larger VMObject,
it was always necessary to include the Region starting offset when
accessing its underlying physical pages.
Until now, you had to do that manually, but this patch adds a simple
Region::physical_page() for read-only access and a physical_page_slot()
when you want a mutable reference to the RefPtr<PhysicalPage> itself.
A lot of code is simplified by making use of this.
Previously we blindly just called update_next_timer_due() when
ever we modified the timer list. Since we know the list is sorted
this is a bit wasteful, and we can do better.
This change refactors the code so we only update the next due time
when necessary. In places where it was possible the code was modified
to directly modify the next due time, instead of having to go to the
front of the list to fetch it.
The public consumers of the timer API shouldn't need to know
the how timer id's are tracked internally. Expose a typedef
instead to allow the internal implementation to be protected
from potential churn in the future.
It's also just good API design.
Utilize the new Thread::wait_on timeout parameter to implement
timeout support for FUTEX_WAIT.
As we compute the relative time from the user specified absolute
time, we try to delay that computation as long as possible before
we call into Thread::wait_on(..). To enable this a small bit of
refactoring was done pull futex_queue fetching out and timeout fetch
and calculation separation.
This change plumbs a new optional timeout option to wait_on.
The timeout is enabled by enqueing a timer on the timer queue
while we are waiting. We can then see if we were woken up or
timed out by checking if we are still on the wait queue or not.
The current API of add_timer makes it hard to use as
you are forced to do a bunch of time arithmetic at the
caller. Ideally we would have overloads for common time
types like timespec or timeval to keep the API as straight
forward as possible. This change moves us in that direction.
While I'm here, we should really also use the machines actual
ticks per second, instead of the OPTIMAL_TICKS_PER_SECOND_RATE.
This is a special case that was previously not implemented.
The idea is that you can dispatch a signal to all other processes
the calling process has access to.
There was some minor refactoring to make the self signal logic
into a function so it could easily be easily re-used from do_killall.
Previously, when returning from a pthread's start_routine, we would
segfault. Now we instead implicitly call pthread_exit as specified in
the standard.
pthread_create now creates a thread running the new
pthread_create_helper, which properly manages the calling and exiting
of the start_routine supplied to pthread_create. To accomplish this,
the thread's stack initialization has been moved out of
sys$create_thread and into the userspace function create_thread.
A Lock can now be held either in shared or exclusive mode. Multiple threads can
hold the same lock in shared mode at one time, but if any thread holds the lock
in exclusive mode, no other thread can hold it at the same time in either mode.
The next commit is going to make it bigger again by increasing the size of Lock,
so make use of bitfields to make sure FileDescription still fits into 64 bytes,
and so can still be allocated with the SlabAllocator.
We don't need a wrapper Function object that just forwards the timer
callback to the scheduler tick function. It already has the same
signature, so we can just plug it in directly. :^)
Same with the clock updating function.
The IOAPIC manual states that "Interrupt Mask-R/W. When this bit is 1,
the interrupt signal is masked. Edge-sensitive interrupts signaled on
a masked interrupt pin are ignored." - Therefore we have to ensure that
we disable interrupts globally with cli(), but also to ensure that we
invoke enable_irq() before sending the hardware command that generates
an IRQ almost immediately.
First, before this change, specifying 'force_pio' in the kernel
commandline was meaningless because we nevertheless set the DMA flag to
be enabled.
Also, we had a problem in which we used IO::repeated_out16() in PIO
write method. This might work on buggy emulators, but I suspect that on
real hardware this code will fail.
The most difficult problem was to restore the PIO read operation.
Apparently, it seems that we can't use IO::repeated_in16() here because
it will read zeroed data. Currently we rely on a simple loop that
invokes IO::in16() to a buffer. Also, the interrupt handling stage in
the PIO read method is moved to be handled inside the loop of reading
the requested sectors.
POSIX says, "Conforming applications should not assume that the returned
contents of the symbolic link are null-terminated."
If we do include the null terminator into the returning string, Python
believes it to actually be a part of the returned name, and gets unhappy
about that later. This suggests other systems Python runs in don't include
it, so let's do that too.
Also, make our userspace support non-null-terminated realpath().
Output address validation should be done for the tracer's address space
and not the tracee's.
Also use copy_to_user() instead of copy_from_user(). The two are really
identical at the moment, but maybe we can add some assertions to make
sure we're doing what we think we're doing.
Thanks to Sergey for spotting these!
We currently only care about debug exceptions that are triggered
by the single-step execution mode.
The debug exception is translated to a SIGTRAP, which can be caught
and handled by the tracing thread.
This memory range was set up using 2MB pages by the code in boot.S.
Because of that, the kernel image protection code didn't work, since it
assumed 4KB pages.
We now switch to 4KB pages during MemoryManager initialization. This
makes the kernel image protection code work correctly again. :^)
The syscall wrapper for ptrace needs to return the peeked value when
using PT_PEEK.
Because of this, the user has to check errno to detect an error in
PT_PEEK.
This commit changes the actual syscall's interface (only for PT_PEEK) to
allow the syscall wrapper to detect an error and change errno.
PT_SETTREGS sets the regsiters of the traced thread. It can only be
used when the tracee is stopped.
Also, refactor ptrace.
The implementation was getting long and cluttered the alraedy large
Process.cpp file.
This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke to separate methods of the Process class.
This was a missing feature in the PT_TRACEME command.
This feature allows the tracer to interact with the tracee before the
tracee has started executing its program.
It will be useful for automatically inserting a breakpoint at a
debugged program's entry point.
Before this commit, m_blocker was only set to null in Thread::block,
after the thread has been unblocked.
Starting with this commit, m_blocker is also set to null in
Thread::unblock.
This change will allow us to implement a missing feature of the PT_TRACE
command of the ptrace syscall - stopping the traced thread when it
exits the execve syscall.
That feature will be implemented by sending a blocking SIGSTOP to the
traced thread after it has executed the execve logic and before it
starts executing the new program in userspace.
However, since Process::exec arranges the tss to return to userspace
(the so-called "yield-teleport"), the code in Thread::block that should
be run after the thread unblocks, and sets m_blocker to null, never
actually runs.
Setting m_blocker to null in Thread::unblock allows us to avoid an
incorrect state where the thread is in a Running state but conatins a
pointer to a Blocker.
PT_POKE writes a single word to the tracee's address space.
Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.
- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
We now store the previous thread state in m_stop_state for all
transitions to the Stopped state via Thread::set_state.
Fixes#1752 whereupon resuming a thread that was stopped with SIGTSTP,
the previous state of the thread is not remembered correctly, resulting
in m_stop_state == State::Invalid and the associated assertion fails.
These validate_elf_* methods really had no business being static
methods of ELF::Image. Now that the ELF namespace exists, it makes
sense to just move them to be free functions in the namespace.
The plan is to extend what currently is known as "CPUGraph" and let the
SystemServer spawn multiple instances of it - which then can show memory
or network usages as well :^)
Simply renaming the applet is the first step.
This commit is one step forward for pluggable driver modules.
Instead of creating instances of network adapter classes, we let
their detect() methods to figure out if there are existing devices
to initialize.