366 commits

Author SHA1 Message Date
Hendiadyoin1 f74f80e13b Kernel/aarch64: Use the FDT to get the device/boot info
This removes the old hacky multiboot memory map and instead uses the
FDT to get the actual memory map.
2024-08-20 21:52:12 -04:00
Hendiadyoin1 e0a177061e Kernel/Firmware+riscv64: Move devicetree handling to Firmware directory
This also adds a `verify_fdt` method which will be used in later commits
2024-08-20 21:52:12 -04:00
Sönke Holz 194d9df34f Kernel/Memory: Handle devicetree memory nodes with more than one region 2024-08-20 21:52:12 -04:00
Sönke Holz ab44530304 Kernel/Memory: Remove x86 LAPIC address hack
MM is now able to handle MMIO after our "highest_physical_address".
2024-08-20 21:52:12 -04:00
brody-qq a0b021cbcf Kernel/Memory: Fix crash on writes to shared file mmaps
Writes to SharedInodeVMObjects could cause a Protection Violation if a
page was marked as dirty by a different process.

This happened due to a combination of 2 things:
* handle_dirty_on_write_fault() was skipped if a page was already marked
  as dirty
* when a page was marked as dirty, only the Region that caused the page
  fault was remapped

This commit:
* fixes the crash by making handle_fault() stop checking if a page was
  marked dirty before running handle_dirty_on_write_fault()
* modifies handle_dirty_on_write_fault() so that it always marks the
  page as dirty and remaps the page (this avoids a 2nd bug that was
  never hit due to the 1st bug)
2024-08-10 16:19:12 +02:00
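The failure mode described in a0b021cbcf can be illustrated with a toy model. This is only a sketch: `ToySharedObject`, `ToyRegion`, and `handle_write_fault` are hypothetical names, and the `skip_if_already_dirty` switch exists purely to contrast the old and new behaviour. None of it is the kernel's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ToyFaultResult { Continue, ProtectionViolation };

// Hypothetical stand-in for a shared, inode-backed VM object.
struct ToySharedObject {
    std::vector<bool> dirty;
    explicit ToySharedObject(std::size_t page_count) : dirty(page_count, false) {}
};

// Hypothetical stand-in for one process's mapping of that object.
struct ToyRegion {
    std::vector<bool> writable; // per-page write permission in this mapping
    explicit ToyRegion(std::size_t page_count) : writable(page_count, false) {}
};

ToyFaultResult handle_write_fault(ToySharedObject& object, ToyRegion& region,
    std::size_t page_index, bool skip_if_already_dirty)
{
    // Old behaviour: the dirty-on-write handler was skipped for pages that were
    // already dirty, so a second region never got its mapping made writable.
    if (skip_if_already_dirty && object.dirty[page_index])
        return ToyFaultResult::ProtectionViolation;

    // New behaviour: always mark the page dirty and remap it for this region.
    object.dirty[page_index] = true;
    region.writable[page_index] = true;
    return ToyFaultResult::Continue;
}
```

With two regions over the same object, the second writer hits the violation only when `skip_if_already_dirty` is true.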
Liav A. dd59fe35c7 Kernel+Userland: Reduce jails to be a simple boolean flag
The whole concept of Jails was far more complicated than I actually want
it to be, so let's reduce the complexity of how it works from now on.
Please note that we always leaked the attach count of a Jail object in
the fork syscall if it failed midway.
Instead, we should have attached to the jail just before registering the
new Process, so we don't need to worry about unsuccessful Process
creation.

The reduction of complexity in regard to jails means that instead of
relying on jails to provide PID isolation, we could simplify the whole
idea of them to be a simple SetOnce, and let the ProcessList (now called
ScopedProcessList) be responsible for this type of isolation.

Therefore, we apply the following changes to do so:
- We make the Jail concept no longer a class of its own. Instead, we
  simplify the idea of being jailed to a simple ProtectedValues boolean
  flag. This means that we no longer check for matching jail pointers
  anywhere in the Kernel code.
  To set a process as jailed, a new prctl option was added to set a
  Kernel SetOnce boolean flag (so it cannot change ever again).
- We provide Process & Thread methods to iterate over process lists.
  A process can either iterate on the global process list, or if it's
  attached to a scoped process list, then only over that list.
  This essentially replaces the need of checking the Jail pointer of a
  process when iterating over process lists.
2024-07-21 11:44:23 +02:00
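The iteration rule from dd59fe35c7 (use the scoped list when attached to one, otherwise the global list) might look roughly like this. All names here (`ToyProcess`, `ToyProcessList`, `for_each_visible_process`) are hypothetical and the real kernel types differ. The sketch only shows the selection logic.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct ToyProcess;

// Hypothetical stand-in for a process list (global or scoped).
struct ToyProcessList {
    std::vector<ToyProcess*> processes;
};

struct ToyProcess {
    int pid { 0 };
    ToyProcessList* scoped_list { nullptr }; // nullptr: not attached to a scoped list
};

// Iterate over the caller's scoped list if it has one, else over the global list.
// No jail pointer comparison is needed: the list itself provides the isolation.
void for_each_visible_process(ToyProcess const& caller, ToyProcessList const& global_list,
    std::function<void(ToyProcess const&)> const& callback)
{
    ToyProcessList const& list = caller.scoped_list ? *caller.scoped_list : global_list;
    for (ToyProcess const* process : list.processes)
        callback(*process);
}
```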
brody-qq 2a164dc923 Kernel/Memory: Fix overcommit when cloning anonymous mmap objects
AnonymousVMObject::try_clone() computed how many shared cow pages to
commit by counting all VMObject pages that were not shared_zero_pages.

This means that lazy_committed_pages were also being included in the
count. This is a problem because the page fault handling code for
lazy_committed_pages does not allocate from
m_shared_committed_cow_pages. So more pages than necessary were being
committed.

This fixes this overcommitting problem by skipping lazy_committed_pages
when counting how many pages to commit.
2024-07-12 08:52:06 -04:00
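The counting change in 2a164dc923 can be sketched as below. `ToyPageKind` and `count_cow_pages_to_commit` are hypothetical names, and the `skip_lazy_committed` flag is only there to compare the old and new counts side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ToyPageKind { SharedZero, LazyCommitted, Allocated };

// Count how many shared cow pages a clone would need to commit.
std::size_t count_cow_pages_to_commit(std::vector<ToyPageKind> const& pages, bool skip_lazy_committed)
{
    std::size_t count = 0;
    for (ToyPageKind kind : pages) {
        if (kind == ToyPageKind::SharedZero)
            continue;
        // New behaviour: lazily committed pages are not served from the
        // shared cow pool, so they should not be counted here.
        if (skip_lazy_committed && kind == ToyPageKind::LazyCommitted)
            continue;
        ++count;
    }
    return count;
}
```

For an object with one zero page, two lazily committed pages, and one allocated page, the old rule commits three pages and the new rule commits one.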
brody-qq faa6395a11 Kernel/Memory: Add more efficient method for remapping single page
This commit introduces VMObject::remap_regions_single_page(). This
method remaps a single page in all regions associated with a VMObject.
This is intended to be a more efficient replacement for remap_regions()
in cases where only a single page needs to be remapped.

This commit also updates the cow page fault handling code to use this
new method.
2024-07-12 08:52:06 -04:00
brody-qq e14f954988 Kernel/Memory: Fix shared anonymous mmap changes not being shared
Writes to a MAP_SHARED | MAP_ANONYMOUS mmap region were not visible to
other processes sharing the mmap region. This was happening because the
page fault handler was not remapping the VMObject's m_regions after
allocating a new page.

This commit fixes the problem by calling remap_regions() after assigning
a new page to the VMObject in the page fault handler. This remapping
only occurs for shared Regions.
2024-07-12 08:52:06 -04:00
brody-qq 781ded408b Kernel/Memory: Small refactor of handle_zero_fault()
This commit makes the following minor changes to handle_zero_fault():
* cleans up a call to static_cast(), replacing it with a reference (a
  future commit will also use this reference).
* replaces a call to vmobject() with the new reference mentioned above.
* moves the definition of already_handled to inside the block where
  already_handled is used.
2024-07-12 08:52:06 -04:00
brody-qq 8812410617 Kernel/Memory: Fix redundant page faults on anonymous mmaps after fork
After a fork(), page faults on anonymous mmaps can cause a redundant
page fault to occur.

This happens because VMObjects for anonymous mmaps are initially filled
with references to the lazy_committed_page or shared_zero_page. If there
is a fork, VMObject::try_clone() is called and all pages of the VMObject
are marked as cow (via the m_cow_map).

Page faults on a zero/lazy page are handled by handle_zero_fault().
handle_zero_fault() does not update m_cow_map, so if the page was marked
cow before the fault, it will still be marked cow after the fault. This
causes a second (redundant) page fault when the CPU retries the write.

This commit removes the redundant page fault by not marking zero/lazy
pages as cow in m_cow_map.
2024-07-12 08:52:06 -04:00
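A toy model of the double fault described in 8812410617, under the assumption that a zero/lazy fault installs a real page without touching the cow bit. The names `build_cow_map_on_clone` and `faults_for_first_write` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ToyPageKind { SharedZero, LazyCommitted, Allocated };

// Build the cow map produced by a clone. The old behaviour marked every page;
// the new behaviour leaves zero/lazy pages unmarked.
std::vector<bool> build_cow_map_on_clone(std::vector<ToyPageKind> const& pages, bool mark_zero_and_lazy_pages)
{
    std::vector<bool> cow_map(pages.size(), false);
    for (std::size_t i = 0; i < pages.size(); ++i)
        cow_map[i] = mark_zero_and_lazy_pages || pages[i] == ToyPageKind::Allocated;
    return cow_map;
}

// Number of page faults taken by the first write to a page after the clone.
int faults_for_first_write(ToyPageKind kind, bool marked_cow)
{
    int faults = 0;
    if (kind != ToyPageKind::Allocated)
        ++faults; // zero fault: a real page is installed, the cow bit is left as-is
    if (marked_cow)
        ++faults; // cow fault when the write is attempted (or retried)
    return faults;
}
```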
brody-qq 2278b17c42 Kernel/Memory: Remove cow map updates from try_allocate_split_region()
AddressSpace::try_allocate_split_region() was updating the cow map of
new_region based on the cow map of source_region.

The problem is that both new_region and source_region reference the
same vmobject and the same cow map, so these cow map updates didn't
actually change anything.

This commit:
* removes the cow map updates from try_allocate_split_region()
* removes Region::set_should_cow() since it is no longer used
2024-07-12 08:52:06 -04:00
brody-qq 3e9b269bcd Kernel/Memory: Make mmap objects track dirty pages
InodeVMObjects now track dirty and clean pages. This tracking of
dirty and clean pages is used by the msync and purge syscalls.

Dirty page tracking works using the following rules:
* when a new InodeVMObject is made, all pages are marked clean.
* writes to clean InodeVMObject pages will cause a page fault,
  the fault handler will mark the page as dirty.
* writes to dirty InodeVMObject pages do not cause page faults.
* if msync is called, only dirty pages are flushed to storage (and
  marked clean).
* if purge syscall is called, only clean pages are discarded.
2024-07-07 18:25:32 +02:00
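The rules listed in 3e9b269bcd can be expressed as a small state model. `ToyInodeObject` is a hypothetical class written for illustration. It does not model reloading a discarded page, and it is not the InodeVMObject implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ToyInodeObject {
public:
    // A new object starts with every page clean (and present in memory).
    explicit ToyInodeObject(std::size_t page_count)
        : m_dirty(page_count, false)
        , m_present(page_count, true)
    {
    }

    // Returns true if the write took a page fault (i.e. the page was clean).
    bool write(std::size_t page_index)
    {
        if (m_dirty[page_index])
            return false;
        m_dirty[page_index] = true;
        return true;
    }

    // Flushes only dirty pages and marks them clean; returns how many were flushed.
    std::size_t msync()
    {
        std::size_t flushed = 0;
        for (std::size_t i = 0; i < m_dirty.size(); ++i) {
            if (m_dirty[i]) {
                m_dirty[i] = false;
                ++flushed;
            }
        }
        return flushed;
    }

    // Discards only clean pages; returns how many were discarded.
    std::size_t purge()
    {
        std::size_t discarded = 0;
        for (std::size_t i = 0; i < m_present.size(); ++i) {
            if (m_present[i] && !m_dirty[i]) {
                m_present[i] = false;
                ++discarded;
            }
        }
        return discarded;
    }

    bool is_dirty(std::size_t page_index) const { return m_dirty[page_index]; }
    bool is_present(std::size_t page_index) const { return m_present[page_index]; }

private:
    std::vector<bool> m_dirty;
    std::vector<bool> m_present;
};
```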
brody-qq e254810d0a Kernel/Memory: Remove duplicate code in try_create_purgeable_with_size()
The methods try_create_with_size() and try_create_purgeable_with_size()
on AnonymousVMObject are almost identical, other than one member
that gets set (m_purgeable). This patch makes
try_create_purgeable_with_size() call try_create_with_size() so that
both methods re-use the same code.
2024-07-01 12:47:32 +02:00
Idan Horowitz 3aa1bd520b Kernel: Support re-mapping MMIOVMObject-backed regions
This is required for example when write combine is enabled on a region
after the initial mapping.
2024-06-25 17:46:37 +02:00
brody-qq 5058873d45 Kernel/Memory: Make release_all_clean_pages use try_release_clean_pages
The methods try_release_clean_pages() and release_all_clean_pages() in
InodeVMObject are almost identical. This commit makes them both use the
same code path.
2024-06-09 14:00:41 -04:00
brody-qq a4ca757db9 Kernel: Add method to clean up remapping region loops
In the VMObject code there are multiple examples of loops over
the VMObject's regions (using for_each_region()) that call remap()
on each region.

To clean up usage of this pattern, this patch adds a method in
VMObject that does this remapping loop. VMObject code that needs
to remap its regions calls the new method.
2024-06-08 22:36:03 +01:00
brody-qq 6f6966fb55 Kernel: Remove redundant VERIFY()
Removes a VERIFY() that is already checked earlier in the function
2024-06-05 20:18:44 +01:00
Idan Horowitz 26cff62a0a Kernel: Rename Memory::PhysicalPage to Memory::PhysicalRAMPage
Since these are now only used to represent RAM pages (and not MMIO
pages), rename them to make their purpose more obvious.
2024-05-17 15:38:28 -06:00
Idan Horowitz 827322c139 Kernel: Stop allocating physical pages for mapped MMIO regions
As MMIO is placed at fixed physical addresses, and does not need to be
backed by real RAM physical pages, there's no need to use PhysicalPage
instances to track their pages.
This results in slightly reduced allocations, but more importantly
makes MMIO addresses which end up after the normal RAM ranges work,
like 64-bit PCI BARs usually are.
2024-05-17 15:38:28 -06:00
Liav A d068af89d5 Kernel/x86: Bake the Prekernel and the Kernel into one image
The new baked image is a Prekernel and a Kernel baked together now, so
essentially we no longer need to pass the Prekernel as -kernel and the
actual kernel image as -initrd to QEMU, leaving the option to pass an
actual initrd or initramfs module later on with multiboot.
2024-05-14 23:37:38 +02:00
Liav A. 5194ab59b5 Kernel/Memory: Make has_been_{r,w,x} flags clearly being set
Before this change, setting m_access to contain the
HasBeen{Readable,Writable,Executable} bits was done by Region's
set_access_bit method, which ORed in (access << 4) when enabling a
certain access bit.

Now, the set_{readable,writable,executable} methods set an appropriate
SetOnce flag that can be checked later.
2024-05-14 12:41:51 -06:00
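The older bit-shifting scheme that 5194ab59b5 replaces can be sketched as follows. The enum values and the helper names `toy_set_access_bit` and `toy_has_been` are assumptions made for this example. Only the `(access << 4)` trick comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical access bits; the upper nibble holds the "has been" history.
enum ToyAccess : std::uint8_t { Read = 1, Write = 2, Execute = 4 };

// Old-style setter: enabling a bit also ORs in the same bit shifted left by 4,
// so the history bit stays set even after the access bit is cleared again.
std::uint8_t toy_set_access_bit(std::uint8_t access_bits, ToyAccess access, bool enable)
{
    if (enable)
        return static_cast<std::uint8_t>(access_bits | access | (access << 4));
    return static_cast<std::uint8_t>(access_bits & ~access);
}

bool toy_has_been(std::uint8_t access_bits, ToyAccess access)
{
    return (access_bits & (access << 4)) != 0;
}
```

The implicit coupling between the two nibbles is the part that the commit makes explicit by using separate set-once flags.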
Liav A. e756567341 Kernel+Userland: Convert process syscall region enforce flag to SetOnce
This flag is set only once, and should never be reset once it has been set,
making it an ideal SetOnce use-case.
It also simplifies the expected conditions for the enabling prctl call,
as we don't expect a boolean flag, but rather the specific prctl option
will always set (enable) Process' AddressSpace syscall region enforcing.
2024-05-14 12:41:51 -06:00
Dan Klishch cc5bacf886 Kernel: Allow annotating initially loaded executable segments
This allows marking regions as VirtualMemoryRangeFlags::SyscallCode in
static executables.
2024-05-07 16:36:38 -06:00
Hendiadyoin1 8ea8b7a6e5 Kernel/MM: Parse /memreserve/ blocks in FDT based memory mapping mode
These seem to be actually used in the RPi FDTs
2024-05-02 07:44:13 -06:00
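For context on 8ea8b7a6e5: in the flattened devicetree format, /memreserve/ entries end up in the memory reservation block, a list of big-endian 64-bit (address, size) pairs terminated by an all-zero entry. A minimal parser sketch under that assumption follows. `ReservedRange` and `parse_memory_reservation_block` are hypothetical names, not the kernel's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ReservedRange {
    std::uint64_t address;
    std::uint64_t size;
};

// Read a big-endian 64-bit value from an unaligned byte pointer.
static std::uint64_t read_be64(std::uint8_t const* bytes)
{
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i)
        value = (value << 8) | bytes[i];
    return value;
}

// Walk 16-byte entries until the (0, 0) terminator or the end of the buffer.
std::vector<ReservedRange> parse_memory_reservation_block(std::uint8_t const* block, std::size_t block_size)
{
    std::vector<ReservedRange> ranges;
    for (std::size_t offset = 0; offset + 16 <= block_size; offset += 16) {
        std::uint64_t address = read_be64(block + offset);
        std::uint64_t size = read_be64(block + offset + 8);
        if (address == 0 && size == 0)
            break;
        ranges.push_back({ address, size });
    }
    return ranges;
}
```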
Hendiadyoin1 2b13769dd5 Kernel/MM: Skip non static reserved memory regions instead of crashing
Crashing seems a bit harsh, so let's just skip them instead, as they
actually show up in the device tree of RPis.
2024-05-02 07:44:13 -06:00
Liav A. 2bba9411ca Kernel: Use the AK SetOnce container class in various cases
The kernel code has many boolean flags that are set only once and never
reset, but are checked multiple times before and after they are set,
which matches the purpose of the SetOnce class.
2024-04-26 23:46:23 -06:00
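The idea behind a set-once flag, as used in 2bba9411ca, can be shown with a standalone stand-in. `ToySetOnce` is written for this example and its method names are chosen for illustration. It is not the AK class.

```cpp
#include <cassert>

// A boolean that can go from false to true, but never back.
class ToySetOnce {
public:
    ToySetOnce() = default;

    // Copying is disabled so a "fresh" unset copy cannot be made by accident.
    ToySetOnce(ToySetOnce const&) = delete;
    ToySetOnce& operator=(ToySetOnce const&) = delete;

    void set() { m_value = true; } // idempotent; there is deliberately no reset()
    bool was_set() const { return m_value; }

private:
    bool m_value { false };
};
```

Because the type has no way to clear the flag, a reader of the code can rely on the one-way transition without auditing every call site.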
Sönke Holz 6654021655 Kernel/riscv64: Don't hard-code the page fault reason on RISC-V
Instead, rewrite the region page fault handling code to not use
PageFault::type() on RISC-V.

I split Region::handle_fault into having a RISC-V-specific
implementation, as I am not sure if I cover all page fault handling edge
cases by solely relying on MM's own region metadata.
We should probably also take the processor-provided page fault reason
into account, if we decide to merge these two implementations in the
future.
2024-03-25 14:18:38 -06:00
Hendiadyoin1 d3f6b03733 Kernel/riscv64: Take the memory map from the FDT and dump it
For this the BootInfo struct was made architecture specific
2024-02-24 16:43:44 -07:00
Hendiadyoin1 23d6c88027 Kernel/MM: Don't allocate a temporary Vector when parsing the memory map
Instead we can achieve the same by just using an optional.
2024-01-12 15:59:47 -07:00
Idan Horowitz f7a1f28d7f Kernel: Add initial basic support for KASAN
This commit adds minimal support for compiler-instrumentation based
memory access sanitization.
Currently we only support detection of kmalloc redzone accesses, and
kmalloc use-after-free accesses.

Support for inline checks (for improved performance), and for stack
use-after-return detection is left for future PRs.
2023-12-30 13:57:10 +01:00
Sönke Holz 28a3089dc3 Kernel/riscv64: Return correct range in kernel_virtual_range on RISC-V
riscv64 doesn't use a prekernel, so use the same code as aarch64 for
determining the kernel virtual address range.
2023-12-29 16:45:08 +01:00
Idan Horowitz 4c6fd454d0 Kernel: Add MM helper for shrinking a virtual range to page boundaries 2023-12-24 16:11:35 +01:00
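Shrinking a range to page boundaries (4c6fd454d0) means rounding the start up and the end down. The sketch below assumes 4 KiB pages and ignores address overflow. `ToyVirtualRange` and `shrink_to_page_boundaries` are hypothetical names, not the helper added by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::uintptr_t toy_page_size = 4096; // assumed page size

struct ToyVirtualRange {
    std::uintptr_t base;
    std::size_t size;
};

// Returns the largest page-aligned range fully contained in `range`,
// or nothing if the range does not cover a whole page.
std::optional<ToyVirtualRange> shrink_to_page_boundaries(ToyVirtualRange range)
{
    std::uintptr_t const mask = ~(toy_page_size - 1);
    std::uintptr_t const start = (range.base + toy_page_size - 1) & mask; // round up
    std::uintptr_t const end = (range.base + range.size) & mask;          // round down
    if (end <= start)
        return std::nullopt;
    return ToyVirtualRange { start, end - start };
}
```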
Idan Horowitz f972eda7ed Kernel: Mark cloned volatile purgeable AnonymousVMObjects as purged
Our existing AnonymousVMObject cloning flow contains an optimization
wherein purgeable VMObjects which are marked volatile during the clone
are created as a new zero-filled VMObject (as if it was purged), which
lets us skip the expensive COW process.

Unfortunately, one crucial part was missing: marking the cloned region
as purged (which is the value returned from madvise when unmarking the
region as volatile), so the userland logic was left unaware of the
effective zeroing of its memory region, resulting in odd behaviour
and crashes in places like our malloc's large allocation support.
2023-12-22 10:57:59 +01:00
Vladimir Serbinenko 160609d80a Kernel/Memory: Map framebuffer and address space <4GiB
Address space under 4GiB is used for I/O but is absent
from memory maps on some systems.
2023-10-03 16:19:03 -06:00
Liav A 3fd4997fc2 Kernel: Don't allocate memory for names of processes and threads
Instead, use the FixedCharBuffer class to ensure we always use a static
buffer storage for these names. This ensures that if a Process or a
Thread were created, there's a guarantee that setting a new name will
never fail, as only copying of strings should be done to that static
storage.

The limits which are set are 32 characters for processes' names and 64
characters for thread names - this is because threads' names could be
more verbose than processes' names.
2023-08-09 21:06:54 -06:00
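The "static storage, so renaming cannot fail" idea from 3fd4997fc2 can be sketched with a fixed-size buffer. `ToyFixedNameBuffer` is a hypothetical class. In particular, truncating over-long names is an assumption of this sketch and may not match what FixedCharBuffer does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

template<std::size_t Size>
class ToyFixedNameBuffer {
public:
    // Copies into inline storage; no allocation, so this cannot fail.
    // Names longer than Size are truncated (an assumption of this sketch).
    void store(std::string_view name)
    {
        m_length = std::min(name.size(), Size);
        if (m_length > 0)
            std::memcpy(m_storage, name.data(), m_length);
    }

    std::string_view view() const { return std::string_view(m_storage, m_length); }

private:
    char m_storage[Size] {};
    std::size_t m_length { 0 };
};
```

A process name would use `ToyFixedNameBuffer<32>` and a thread name `ToyFixedNameBuffer<64>`, matching the limits stated in the commit message.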
Liav A 3b09560251 Kernel/Memory: Split the MemoryManager.h file from user address checks 2023-08-09 21:06:54 -06:00
kleines Filmröllchen 2fd23745a9 Kernel: Allow relaxing cleanup task rules during system shutdown
Once we move to a more proper shutdown procedure, processes other than
the finalizer task must be able to perform cleanup and finalization
duties, not only because the finalizer task itself needs to be cleaned
up by someone. This global variable, mirroring the early boot flags,
allows a future shutdown process to perform cleanup on its own.

Note that while this *could* be considered a weakening in security, the
attack surface is minimal and the results are not dramatic. To exploit
this, an attacker would have to gain a Kernel write primitive to this
global variable (bypassing KASLR among other things) and then gain some
way of calling the relevant functions, all of this only to destroy some
other running process. The same effect can be achieved with LPE which
can often be gained with significantly simpler userspace exploits (e.g.
of setuid binaries).
2023-07-15 00:12:01 +02:00
Kirill Nikolaev 6cdb1f0415 Kernel: Add an initial implementation of virtio-net driver
It can be exercised by setting
    SERENITY_ETHERNET_DEVICE_TYPE=virtio-net-pci.
2023-07-11 00:49:11 -06:00
Timothy Flynn c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Liav A 336fb4f313 Kernel: Move InterruptDisabler to the Interrupts subdirectory 2023-06-04 21:32:34 +02:00
Liav A 927926b924 Kernel: Move Performance-measurement code to the Tasks subdirectory 2023-06-04 21:32:34 +02:00
Liav A 8f21420a1d Kernel: Move all boot-related code to the new Boot subdirectory 2023-06-04 21:32:34 +02:00
Liav A 7c0540a229 Everywhere: Move global Kernel pattern code to Kernel/Library directory
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.

Also, move the panic and assertions handling code to that directory.
2023-06-04 21:32:34 +02:00
Liav A aaa1de7878 Kernel: Move {Virtual,Physical}Address classes to the Memory directory 2023-06-04 21:32:34 +02:00
Liav A 490856453d Kernel: Move Random.{h,cpp} code to Security subdirectory 2023-06-04 21:32:34 +02:00
Liav A 1b04726c85 Kernel: Move all tasks-related code to the Tasks subdirectory 2023-06-04 21:32:34 +02:00
Pankaj Raghav dabc6dd962 Kernel/ScatterGatherList: Add region_name as a part of try_create API
Remove the hardcoded "AHCI Scattered DMA" for region name as it is a
part of a common API. Add region_name parameter to the try_create API
so that this API can be used by other drivers with the correct Memory
region name.
2023-05-19 22:04:37 +02:00
Pankaj Raghav e067046474 Kernel/ScatterGatherList: Move constructor init code to try_create
The constructor of ScatterGatherList contained code that can return an
error. Move it to try_create for better error propagation.

This removes one TODO() and one
release_value_but_fixme_should_propagate_errors().
2023-05-19 22:04:37 +02:00
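The pattern in e067046474 and its sibling commits (infallible private constructor, fallible static factory, caller-supplied region name) can be sketched in standard C++. Here `std::variant` stands in for an ErrorOr-style result, and `ToyScatterList`, `ToyError`, and the page limit are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

enum class ToyError { InvalidSize };

class ToyScatterList {
public:
    // All fallible work happens here, so errors can be returned to the caller.
    static std::variant<ToyScatterList, ToyError> try_create(std::size_t page_count, std::string region_name);

    std::string const& region_name() const { return m_region_name; }
    std::size_t page_count() const { return m_pages.size(); }

private:
    // The constructor only stores already-validated state and cannot fail.
    ToyScatterList(std::vector<int> pages, std::string region_name)
        : m_pages(std::move(pages))
        , m_region_name(std::move(region_name))
    {
    }

    std::vector<int> m_pages;
    std::string m_region_name;
};

std::variant<ToyScatterList, ToyError> ToyScatterList::try_create(std::size_t page_count, std::string region_name)
{
    constexpr std::size_t max_pages = 1024; // arbitrary limit for the sketch
    if (page_count == 0 || page_count > max_pages)
        return ToyError::InvalidSize;
    return ToyScatterList(std::vector<int>(page_count), std::move(region_name));
}
```

Passing the region name as a parameter is what lets different drivers reuse the same factory without inheriting another driver's label.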
Pankaj Raghav 489e268b96 Kernel/ScatterGatherList: Return ErrorOr from try_create
This removes the TODO from the try_create API to return ErrorOr. This
is also a preparation patch to move the init code in the constructor
that can fail to this try_create function.
2023-05-19 22:04:37 +02:00