The methods try_release_clean_pages() and release_all_clean_pages() in
InodeVMObject are almost identical. This commit makes them both use the
same code path.
In the VMObject code there are multiple examples of loops over
the VMObject's regions (using for_each_region()) that call remap()
on each region.
To clean up usage of this pattern, this patch adds a method in
VMObject that does this remapping loop. VMObject code that needs
to remap its regions call the new method.
As MMIO is placed at fixed physical addressed, and does not need to be
backed by real RAM physical pages, there's no need to use PhysicalPage
instances to track their pages.
This results in slightly reduced allocations, but more importantly
makes MMIO addresses which end up after the normal RAM ranges work,
like 64-bit PCI BARs usually are.
The new baked image is a Prekernel and a Kernel baked together now, so
essentially we no longer need to pass the Prekernel as -kernel and the
actual kernel image as -initrd to QEMU, leaving the option to pass an
actual initrd or initramfs module later on with multiboot.
Before of this change, actually setting the m_access to contain the
HasBeen{Readeable,Writable,Executable} bits was done by the method of
Region set_access_bit which added ORing with (access << 4) when enabling
a certain access bit to achieve this.
Now this is changed and when calling set_{readeable,writable,executable}
methods, they will set an appropriate SetOnce flag that could be checked
later.
This flag is set only once, and should never reset once it has been set,
making it an ideal SetOnce use-case.
It also simplifies the expected conditions for the enabling prctl call,
as we don't expect a boolean flag, but rather the specific prctl option
will always set (enable) Process' AddressSpace syscall region enforcing.
We have many places in the kernel code that we have boolean flags that
are only set once, and never reset again but are checked multiple times
before and after the time they're being set, which matches the purpose
of the SetOnce class.
Instead, rewrite the region page fault handling code to not use
PageFault::type() on RISC-V.
I split Region::handle_fault into having a RISC-V-specific
implementation, as I am not sure if I cover all page fault handling edge
cases by solely relying on MM's own region metadata.
We should probably also take the processor-provided page fault reason
into account, if we decide to merge these two implementations in the
future.
This commit adds minimal support for compiler-instrumentation based
memory access sanitization.
Currently we only support detection of kmalloc redzone accesses, and
kmalloc use-after-free accesses.
Support for inline checks (for improved performance), and for stack
use-after-return and use-after-return detection is left for future PRs.
Our existing AnonymousVMObject cloning flow contains an optimization
wherein purgeable VMObjects which are marked volatile during the clone
are created as a new zero-filled VMObject (as if it was purged), which
lets us skip the expensive COW process.
Unfortunately, one crucial part was missing: Marking the cloned region
as purged, (which is the value returned from madvise when unmarking the
region as volatile) so the userland logic was left unaware of the
effective zero-ing of their memory region, resulting in odd behaviour
and crashes in places like our malloc's large allocation support.
Instead, use the FixedCharBuffer class to ensure we always use a static
buffer storage for these names. This ensures that if a Process or a
Thread were created, there's a guarantee that setting a new name will
never fail, as only copying of strings should be done to that static
storage.
The limits which are set are 32 characters for processes' names and 64
characters for thread names - this is because threads' names could be
more verbose than processes' names.
Once we move to a more proper shutdown procedure, processes other than
the finalizer task must be able to perform cleanup and finalization
duties, not only because the finalizer task itself needs to be cleaned
up by someone. This global variable, mirroring the early boot flags,
allows a future shutdown process to perform cleanup on its own.
Note that while this *could* be considered a weakening in security, the
attack surface is minimal and the results are not dramatic. To exploit
this, an attacker would have to gain a Kernel write primitive to this
global variable (bypassing KASLR among other things) and then gain some
way of calling the relevant functions, all of this only to destroy some
other running process. The same effect can be achieved with LPE which
can often be gained with significantly simpler userspace exploits (e.g.
of setuid binaries).
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.
Also, move the panic and assertions handling code to that directory.
Remove the hardcoded "AHCI Scattered DMA" for region name as it is a
part of a common API. Add region_name parameter to the try_create API
so that this API can be used by other drivers with the correct Memory
region name.
The constructor code of ScatterGatherList had code that can return
error. Move it to try_create for better error propagation.
This removes one TODO() and one
release_value_but_fixme_should_propagate_errors().
This removes the TODO from the try_create API to return ErrorOr. This
is also a preparation patch to move the init code in the constructor
that can fail to this try_create function.
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md (L30)
Here's a short statistic of the occurrences of the word "behavio(u)r":
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
24 Behaviour
32 behaviour
407 Behavior
992 behavior
Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded a typo, and "behavior" (1401 occurrences) should be preferred.
Note that The occurrences in LibJS are intentionally NOT changed,
because there are taken verbatim from the specification. Hence:
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
10 behaviour
24 Behaviour
407 Behavior
1014 behavior
This was discovered by me during a work on USB keyboard patches, so it
triggered this bug.
The printing format for the VirtualAddress part is incorrect, leading to
another crash when handling page fault after accessing UNMAP_AFTER_INIT
code section.
Some hardware/software configurations crash KVM as soon as we try to
start Serenity. The exact cause is currently unknown, so just fully
revert it for now.
This reverts commit 897c4e5145.
The new baked image is a Prekernel and a Kernel baked together now, so
essentially we no longer need to pass the Prekernel as -kernel and the
actual kernel image as -initrd to QEMU, leaving the option to pass an
actual initrd or initramfs module later on with multiboot.
These were easy to pick-up as these pointers are assigned during the
construction point and are never changed afterwards.
This small change to these pointers will ensure that our code will not
accidentally assign these pointers with a new object which is always a
kind of bug we will want to prevent.
Previously we had a race condition in the page fault handling: We were
relying on the affected Region staying alive while handling the page
fault, but this was not actually guaranteed, as an munmap from another
thread could result in the region being removed concurrently.
This commit closes that hole by extending the lifetime of the region
affected by the page fault until the handling of the page fault is
complete. This is achieved by maintaing a psuedo-reference count on the
region which counts the number of in-progress page faults being handled
on this region, and extending the lifetime of the region while this
counter is non zero.
Since both the increment of the counter by the page fault handler and
the spin loop waiting for it to reach 0 during Region destruction are
serialized using the appropriate AddressSpace spinlock, eventual
progress is guaranteed: As soon as the region is removed from the tree
no more page faults on the region can start.
And similarly correctness is ensured: The counter is incremented under
the same lock, so any page faults that are being handled will have
already incremented the counter before the region is deallocated.
This replaces the previous owning address space pointer. This commit
should not change any of the existing functionality, but it lays down
the groundwork needed to let us properly access the region table under
the address space spinlock during page fault handling.