In VFS::rename, if new_path is equal to '/', then, parent custody is
set to null.
VFS::rename would then use parent custody without checking it first.
Fixed VFS::rename to check both old and new path parent custody
before actually using them.
write_bytes is called with a count of 0 bytes if a directory is being
deleted, because in that case even the . and .. pseudo directories are
getting removed. In this case write_bytes is now a no-op.
Before write_bytes would fail because it would check to see if there
were any blocks available to write in (even though it wasn't going to
write in them anyway).
This behaviour was uncovered because of a recent change where
directories are correctly reduced in size. Which in this case results in
all the blocks being removed from the inode, whereas previously there
would be some stale blocks around to pass the check.
e2fsck considers all blocks reachable through any of the pointers in
m_raw_inode.i_block as part of this inode regardless of the value in
m_raw_inode.i_size. When it finds more blocks than the amount that
is indicated by i_size or i_blocks it offers to repair the filesystem
by changing those values. That will actually cause further corruption.
So we must zero all pointers to blocks that are now unused.
The current method of emitting performance events requires a bit of
boiler plate at every invocation, as well as having to ignore the
return code which isn't used outside of the perf event syscall. This
change attempts to clean that up by exposing high level API's that
can be used around the code base.
Ext2 directory contents are stored in a linked list of ext2_dir_entry
structs. There is no sentinel value to determine where the list ends.
Instead the list fills the entirety of the allocated space for the
inode.
Previously the inode was not correctly resized when it became smaller.
This resulted in stale data being interpreted as part of the linked list
of directory entries.
When reading UDP packets from userspace with recvmsg()/recv() we
would hit a VERIFY() if the supplied buffer is smaller than the
received UDP packet. Instead we should just return truncated data
to the caller.
This can be reproduced with:
$ dd if=/dev/zero bs=1k count=1 | nc -u 192.168.3.190 68
We use a global setting to determine if Caps Lock should be remapped to
Control because we don't care how keyboard events come in, just that they
should be massaged into different scan codes.
The `proc` filesystem is able to manipulate this global variable using
the `sysctl` utility like so:
```
# sysctl caps_lock_to_ctrl=1
```
An IP socket can now join a multicast group by using the
IP_ADD_MEMBERSHIP sockopt, which will cause it to start receiving
packets sent to the multicast address, even though this address does
not belong to this host.
When using `sysctl` you can enable/disable values by writing to the
ProcFS. Some drift must have occured where writing was failing due to
a missing `set_mtime` call. Whenever one `write`'s a file the modified
time (mtime) will be updated so we need to implement this interface in
ProcFS.
The fact that current_time can "fail" makes its use a bit awkward.
All callers in the Kernel are trusted besides syscalls, so assert
that they never get there, and make sure all current callers perform
validation of the clock_id with TimeManagement::is_valid_clock_id().
I have fuzzed this change locally for a bit to make sure I didn't
miss any obvious regression.
The variety of checks for Processor::id() == 0 could use some assistance
in the readability department. This change adds a new function to
represent this check, and replaces the comparison everywhere it's used.
FileDescriptionBlocker::m_should_block was shadowing the parent's
FileBlocker::m_should_block variable, which would cause should_block()
to return the wrong value.
Found by @gunnarbeutner
This solves a problem where checking whether a thread is an idle
thread may require iterating all processors if it is not the idle
thread of the current processor.
Previously we would return a 0xdeadc0de frame for every kernel frame
in the real kernel stack when an non super-user issued the request.
This isn't useful, and just produces visual clutter in tools which
attempt to symbolize stacks.
While hacking on `sysctl` an issue in ProcFS was making me unable to
read/write from `/proc/sys/XXX`. Some directories in the ProcFS are not
actually backed by a process and need to return `nullptr` so callbacks
get properly set. We now do an explicit check for the parent to ensure
it's one that is PID-based.
The error handling in all these cases was still using the old style
negative values to indicate errors. We have a nicer solution for this
now with KResultOr<T>. This change switches the interface and then all
implementers to use the new style.
Our current implementation does not work in the special case in which
both shift keys are pressed, and then only one of the keys is released,
as this would result in writing lower case letters, instead of the
expected upper case letters.
This commit fixes that by keeping track of the amount of shift keys
that are pressed (instead of if any are at all), and only switching to
the unshifted keymap once all of them are released.
The previous patch already helped with this, however my idea of only
reading a few packets didn't work and we'd still sometimes end up not
receiving any more packets from the E1000 interface.
With this patch applied my NIC seems to receive packets just fine, at
least for now.
The dance here is not complicated, but it is something that should
be taken note of. Since we append to both lists, we don't want to
orphan the new Inode in the m_links/m_subfolders Vector in the event
that the append to m_parent_fs.m_nodes fails.
Raw sockets don't need a local port, so we shouldn't fail operations
if allocation yields an ENOPROTOOPT.
I'm not in love with the factoring here, just patching up the bug.
Without this patch we end up with sockets in the listener's accept
queue with state 'closed' when doing stealth SYN scans:
Client -> Server: SYN for port 22
Server -> Client: SYN/ACK
Client -> Server: RST (i.e. don't complete the TCP handshake)
Previously, TLS data was always zero-initialized.
To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.
The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.
We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
For example Linux accepts an additional argument for flags in accept4()
that let the user specify what flags they want. However, by default
accept() should not inherit those flags from the listener socket.
When there is more than one file descriptor for a file closing
one of them should not close the underlying file.
Previously this relied on the file's ref_count() but at least
for sockets this didn't work reliably.
Theoretically the append should never fail as we have in-line storage
of FD_SETSIZE, which should always be enough. However I'm planning on
removing the non-try variants of AK::Vector when compiling in kernel
mode in the future, so this will need to go eventually. I suppose it
also protects against some unforeseen bug where we we can append more
than FD_SETSIZE items.
Theoretically the append should never fail as we have in-line storage
of 2, which should be enough. However I'm planning on removing the
non-try variants of AK::Vector when compiling in kernel mode in the
future, so this will need to go eventually. I suppose it also protects
against some unforeseen bug where we we can append more than 2 items.
sys$purge() is a bit unique, in that it is probably in the systems
advantage to attempt to limp along if we hit OOM while processing
the vmobjects to purge. This change modifies the algorithm to observe
OOM and continue trying to purge any previously visited VMObjects.
GCC with -flto is more aggressive when it comes to inlining and
discarding functions which is why we must mark some of the functions
as NEVER_INLINE (because they contain asm labels which would be
duplicated in the object files if the compiler decides to inline
the function elsewhere) and __attribute__((used)) for others so
that GCC doesn't discard them.
This commit will add MSG_PEEK support, which allows a package to be
seen without taking it from the buffer, so that a subsequent recv()
without the MSG_PEEK flag can pick it up.
We had some inconsistencies before:
- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."
I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.
By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
The current implementation would only check the first name.length()
characters match, which means any kernel symbol that the provided name
is a prefix of would match, instead of the actual matching symbol.
This commit fixes that by using StringView::operator==() for the
comparison, which already checks the equality correctly.
Lots of people are confused by the error message you get when the
Toolchain is behind/messed up:
'initializer-list: No such file or directory'
Before this error can happen, catch the problem at CMake configure time,
and provide them with an actionable error message.
Make this stuff a bit easier to maintain by using the
root level variables to build up the Toolchain paths.
Also leave a note for future editors of BuildIt.sh to
give them warning about the other changes they'll need
to make.
Previously the E1000 network adapter would stop receiving further
packets when an RX buffer overrun occurred. This was the case
when connecting the adapter to a real network where enough broadcast
traffic caused the buffer to be full before the kernel had a chance
to clear the RX buffer.
The overall design is the same, but we change a few things,
like decreasing the amount of blocking forever loops. The goal
is to ensure the kernel won't hang forever when dealing with
buggy hardware.
Also, we reset the channel when initializing it, just in case the
hardware was in bad state before we start use it.
The last IP address in an IPv4 subnet is considered the directed
broadcast address, e.g. for 192.168.3.0/24 the directed broadcast
address is 192.168.3.255. We need to consider this address as
belonging to the interface.
Here's an example with this fix applied, SerenityOS has 192.168.3.190:
[gunnar@nyx ~]$ ping -b 192.168.3.255
WARNING: pinging broadcast address
PING 192.168.3.255 (192.168.3.255) 56(84) bytes of data.
64 bytes from 192.168.3.175: icmp_seq=1 ttl=64 time=0.950 ms
64 bytes from 192.168.3.188: icmp_seq=1 ttl=64 time=2.33 ms
64 bytes from 192.168.3.46: icmp_seq=1 ttl=64 time=2.77 ms
64 bytes from 192.168.3.41: icmp_seq=1 ttl=64 time=4.15 ms
64 bytes from 192.168.3.190: icmp_seq=1 ttl=64 time=29.4 ms
64 bytes from 192.168.3.42: icmp_seq=1 ttl=64 time=30.8 ms
64 bytes from 192.168.3.55: icmp_seq=1 ttl=64 time=31.0 ms
64 bytes from 192.168.3.30: icmp_seq=1 ttl=64 time=33.2 ms
64 bytes from 192.168.3.31: icmp_seq=1 ttl=64 time=33.2 ms
64 bytes from 192.168.3.173: icmp_seq=1 ttl=64 time=41.7 ms
64 bytes from 192.168.3.43: icmp_seq=1 ttl=64 time=47.7 ms
^C
--- 192.168.3.255 ping statistics ---
1 packets transmitted, 1 received, +10 duplicates, 0% packet loss,
time 0ms, rtt min/avg/max/mdev = 0.950/23.376/47.676/16.539 ms
[gunnar@nyx ~]$
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.
Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.
Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().
This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.
The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
The previous `LOCKER(..)` instrumentation only covered some of the
cases where a lock is actually acquired. By utilizing the new
`AK::SourceLocation` functionality we can now reliably instrument
all calls to lock automatically.
Other changes:
- Tweak the message in `Thread::finalize()` which dumps leaked lock
so it's more readable and includes the function information that is
now available.
- Make the `LOCKER(..)` define a no-op, it will be cleaned up in a
follow up change.
- UBSAN detected cases where we were calling thread->holding_lock(..)
but current_thread was nullptr.
- Fix Lock::force_unlock_if_locked to not pass the correct ref delta to
holding_lock(..).
Userspace can provide a null argument for the `act` argument to the
`sigaction` syscall to not set any new behavior. This is described
here:
https://pubs.opengroup.org/onlinepubs/007904875/functions/sigaction.html
Without this fix, the `copy_from_user(...)` invocation on `user_act`
fails and makes the syscall return early.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
The previous implementation could allocate on insertion into the completed / pending
sub request vectors. There's no reason these can't be intrusive lists instead.
This is a very minor step towards improving the ability to handle OOM, as tracked by #6369
It might also help improve performance on the IO path in certain situations.
I'll benchmark that later.
Until we get the goodness that C++ modules are supposed to be, let's try
to shave off some parse time using precompiled headers.
This commit only adds some very common AK headers, only to binaries,
libraries and the kernel (tests are not covered due to incompatibility
with AK/TestSuite.h).
This option is on by default, but can be disabled by passing
`-DPRECOMPILE_COMMON_HEADERS=OFF` to cmake, which will disable all
header precompilations.
This makes the build about 30 seconds faster on my machine (about 7%).
Binding to port 0 is used to signal to listen() to bind to any port
that is available. (in serenity's case, to the port range of 32768 to
60999, which are not privileged ports)
While profiling all processes the profile buffer lives forever.
Once you have copied the profile to disk, there's no need to keep it
in memory. This syscall surfaces the ability to clear that buffer.
This command line flag can be used to disable VirtIO support on
certain configurations (native windows) where interfacing with
virtio devices can cause qemu to freeze.
Pressing this combo will dump a list of all threads and their state
to the debug console.
This might be useful to figure out why the system is not responding.
This adds PT_PEEKDEBUG and PT_POKEDEBUG to allow for reading/writing
the debug registers, and updates the Kernel's debug handler to read the
new information from the debug status register.
Helps with bare metal debugging, as we can't be sure our implementation
will work with a given machine.
As reported by someone on Discord, their machine hangs when we attempt
the dummy transfer.
Previously the dynamic loader would become unused after it had invoked
the program's entry function. However, in order to support exceptions
and - at a later point - dlfcn functionality we need to call back
into the dynamic loader at runtime.
Because the dynamic loader has a static copy of LibC it'll attempt to
invoke syscalls directly from its text segment. For this to work the
executable region for the dynamic loader needs to have syscalls enabled.
This enables building usermode programs with exception handling. It also
builds a libstdc++ without exception support for the kernel.
This is necessary because the libstdc++ that gets built is different
when exceptions are enabled. Using the same library binary would
require extensive stubs for exception-related functionality in the
kernel.
This is a very basic implementation that only requests 4096 bytes
of entropy from the host once, but its still high quality entropy
so it should be a good fix for #4490 (boot-time entropy starvation)
for virtualized environments.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
This commit includes a lot of small changes and additions needed to
finalize the base implementation of VirtIOQueues and VirtDevices:
* The device specific driver implementation now has to handle setting
up the queues it needs before letting the base device class know it
finised initialization
* Supplying buffers to VirtQueues is now done via ScatterGatherLists
instead of arbitary buffer pointers - this ensures the pointers are
physical and allows us to follow the specification in regards to the
requirement that individual descriptors must point to physically
contiguous buffers. This can be further improved in the future by
implementating support for the Indirect-Descriptors feature (as
defined by the specification) to reduce descriptor usage for very
fragmented buffers.
* When supplying buffers to a VirtQueue the driver must supply a
(temporarily-)unique token (usually the supplied buffer's virtual
address) to ensure the driver can discern which buffer has finished
processing by the device in the case in which the device does not
offer the F_IN_ORDER feature.
* Device drivers now handle queue updates (supplied buffers being
returned from the device) by implementing a single pure virtual
method instead of setting a seperate callback for each queue
* Two new VirtQueue methods were added to allow the device driver
to either discard or get used/returned buffers from the device by
cleanly removing them off the descriptor chain (This also allows
the VirtQueue implementation to reuse those freed descriptors)
This also includes the necessary changes to the VirtIOConsole
implementation to match these interface changes.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
This allows converting a single virtual buffer into its non-physically
contiguous parts, this is especially useful for DMA-based devices that
support scatter/gather-like functionality, as it eliminates the need
to clone outgoing buffers into one physically contiguous buffer.
This patch allocates a physical page for each of the virtqueues and
memcpys to it when receiving a buffer to get a physical, aligned
contiguous buffer as required by the virtio specification.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
This patch actually enables virtio queues after configuring them
so the device can use them, it also enables interrupt handling in
VirtIODevice so they are not ignored.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
Raw pointers were mostly replaced with smart pointers and references
where appropriate based on kling and smartcomputer7's suggestions :)
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
Based on pull #3236 by tomuta, this is still very much WIP but will
eventually allow us to switch from the considerably slower method of
knocking on port 0xe9 for each character
Co-authored-by: Tom <tomut@yahoo.com>
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
Based on pull #3236 by tomuta, this adds helper methods for generic
device initialization, and partily-broken virtqueue helper methods
Co-authored-by: Tom <tomut@yahoo.com>
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
`-Wnull-dereference` has found a lot of "possible null dereferences" in userland.
However, in the kernel, no warnings occurred. To keep it that way,
preemptivly add the flag here and remove it once it is enabled system
wide.
This flag makes the compiler check statically for a null deref. It does
not take into account any human-imposed invariants and as such, may need a
`VERIFY(ptr);`, `if(ptr)`, or `if(!ptr)` before using a pointer.
However, as long as a pointer is not reassigned,
the verify will be valid, meaning that adding `VERIFY` can be done sparingly.
This should allow creating intrusive lists that have smart pointers,
while remaining free (compared to the impl before this commit) when
holding raw pointers :^)
As a sidenote, this also adds a `RawPtr<T>` type, which is just
equivalent to `T*`.
Note that this does not actually use such functionality, but is only
expected to pave the way for #6369, to replace NonnullRefPtrVector<T>
with intrusive lists.
As it is with zero-cost things, this makes the interface a bit less nice
by requiring the type name of what an `IntrusiveListNode` holds (and
optionally its container, if not RawPtr), and also requiring the type of
the container (normally `RawPtr`) on the `IntrusiveList` instance.
This patch adds a few missing ioctls which were required by Wine.
SIOCGIFNETMASK, SIOCGIFBRDADDR and SIOCGIFMTU are fully implemented,
while SIOCGIFFLAGS and SIOCGIFCONF are stubs.
This flag warns on classes which have `virtual` functions but do not
have a `virtual` destructor.
This patch adds both the flag and missing destructors. The access level
of the destructors was determined by a two rules of thumb:
1. A destructor should have a similar or lower access level to that of a
constructor.
2. Having a `private` destructor implicitly deletes the default
constructor, which is probably undesirable for "interface" types
(classes with only virtual functions and no data).
In short, most of the added destructors are `protected`, unless the
compiler complained about access.
Reading from the mapping doesn't work when the text segment has a non-zero
offset because in that case the first mapped page doesn't contain the ELF
header.
This should provide some speed up, as currently searches for regions
containing a given address were performed in O(n) complexity, while
this container allows us to do those in O(logn).
This does not affect functionality right now, but it means that the
regions vector will now never have any overlapping regions, which will
allow the use of balance binary search trees instead of a vector in the
future. (since they require keys to be exclusive)
This pattern felt really cluttery:
auto result = something();
if (result.is_error())
return result;
Since it leaves "result" lying around in the no-error case.
Let's use some C++17 if initializer expressions to improve this:
if (auto result = something(); result.is_error())
return result;
Now the "result" goes out of scope if we don't need it anymore.
This is doubly nice since we're also free to reuse the "result"
name later in the same function.
This commit makes the user-facing StdLibExtras templates and utilities
arguably more nice-looking by removing the need to reach into the
wrapper structs generated by them to get the value/type needed.
The C++ standard library had to invent `_v` and `_t` variants (likely
because of backwards compat), but we don't need to cater to any codebase
except our own, so might as well have good things for free. :^)
It's perfectly valid for ext2 inodes to have blocks with index 0.
It means that no physical block was allocated for that area of an inode
and we should treat it as if it's filled with zeroes.
Fixes#6139.
Double kfree() is exceedingly rare in our kernel since we use automatic
memory management and smart pointers for almost all code. However, it
doesn't hurt to do some basic checking that might one day catch bugs.
This patch makes us VERIFY that we don't already consider the first
chunk of a kmalloc() allocation free when kfree()'ing it.
I dont know why we do a fast path in the Kernel, but not in Userspace
Also simplified the byte explosion in memset to "explode_byte"
it even seemed so, that we missed the highest byte when memseting something
The first one is for disabling the PS2 controller, the other one is for
disabling physical storage enumeration.
We can't be sure any machine will work with our implementation,
therefore this will help us to test more machines.
We need to do it to let real hardware to put the correct voltages
on the wire.
Apparently my ICH7 machine refused to boot, and was reading lots of
garbage from an unconnected IDE channel. It was fixed after I added a
delay of 20 microseconds. It probably can be reduced, I just took a safe
value and it seems to work correctly without any problems :)
Now the kernel supports 2 ECAM access methods.
MMIOAccess was renamed to WindowedMMIOAccess and is what we had until
now - each device that is detected on boot is assigned to a
memory-mapped window, so IO operations on multiple devices can occur
simultaneously due to creating multiple virtual mappings, hence the name
is a memory-mapped window.
This commit adds a new class called MMIOAccess (not to be confused with
the old MMIOAccess class). This class creates one memory-mapped window.
On each IO operation on a configuration space of a device, it maps the
requested PCI bus region to that window. Therefore it holds a SpinLock
during the operation to ensure that no other PCI bus region was mapped
during the call.
A user can choose to either use PCI ECAM with memory-mapped window
for each device, or for an entire bus. By default, the kernel prefers to
map the entire PCI bus region.
Apparently we don't enable PCI ECAM (MMIO access to the PCI
configuration space) even if we can. This is a regression, as it was
enabled in the past and in unknown time it was regressed.
The CommandLine::is_mmio_enabled method was renamed to
CommandLine::is_pci_ecam_enabled to better represent the meaning
of this method and what it determines.
Also, an UNMAP_AFTER_INIT macro was removed from a method
in the MMIOAccess class as it halted the system when the kernel
tried to access devices after the boot process.
The end goal of this commit is to allow to boot on bare metal with no
PS/2 device connected to the system. It turned out that the original
code relied on the existence of the PS/2 keyboard, so VirtualConsole
called it even though ACPI indicated the there's no i8042 controller on
my real machine because I didn't plug any PS/2 device.
The code is much more flexible, so adding HID support for other type of
hardware (e.g. USB HID) could be much simpler.
Briefly describing the change, we have a new singleton called
HIDManagement, which is responsible to initialize the i8042 controller
if exists, and to enumerate its devices. I also abstracted a bit
things, so now every Human interface device is represented with the
HIDDevice class. Then, there are 2 types of it - the MouseDevice and
KeyboardDevice classes; both are responsible to handle the interface in
the DevFS.
PS2KeyboardDevice, PS2MouseDevice and VMWareMouseDevice classes are
responsible for handling the hardware-specific interface they are
assigned to. Therefore, they are inheriting from the IRQHandler class.
This reverts commit 36a82188a8.
This register is write-only for the firmware (BIOS), and read-only for
us so we shouldn't set the PCI IRQ line never.
The firmware figured out the IRQ routing to the PIC for us, so changing
it won't affect anything. I was mistaken when I thought that changing
the value of this register will allow us to change its interrupt line,
like when changing a PCI BAR to relocate device resources as desired
with the requirements of the OS.
Also handle native and compatibility channel modes together, so if only
one IDE channel was set to work on PCI native mode, we need to handle it
separately, so the other channel continue to operate with the legacy IO
ports and interrupt line.
This is done also by linux (signal.c:936 in v5.11) at least.
It's a pretty handy notification that allows the parent process to skip
going through a `waitpid` and guesswork to figure out the current state
of a child process.
Added a dummy TIOCSTI ioctl placeholder. This is a dangerous ioctl that
can be used to inject input into a tty. Added for compatibility. Always
fails with EIO.
Previously, Process::do_write would error if the O_APPEND flag was set
on a non-seekable file.
Other systems (such as Linux) seem to be OK with doing this, so we now
do not attempt to seek to the end the file if it's not seekable.
Most coredumps contain large amounts of consecutive null bytes and as
such are a prime candidate for compression.
This commit makes CrashDaemon compress files once the kernel finishes
emitting them, as well as adds the functionality needed in LibCoreDump
to then parse them.
This is a "quirk" I've observed on a Intel ICH7 test machine. Apparently
we need to select the device (master or slave) before starting to work
with the bus master register.
It's very possible that other machines are requiring this step to happen
before the DMA transfer can occur correctly.
Also, when reading with DMA, we should set the transfer direction before
clearing the interrupt status.
For the sake of completeness, I added a few lines in places that I
deemed it to be reasonable to clear the interrupt status there.
Although unlikely to happen, a user can have an IDE controller that
doesn't support bus master capability. If that's the case, we need to
check for this, and create an IDEChannel (not BMIDEChannel) to allow
IO operations with the controller.
If the user requests to force PIO mode, we just create IDEChannel
objects which are capable of sending PIO commands only.
However, if the user doesn't force PIO mode, we create BMIDEChannel
objects, which are sending DMA commands.
This change is somewhat simplifying the code, so each class is
supporting its type of operation - PIO or DMA. The PATADiskDevice
should not care if DMA is enabled or not.
Later on, we could write an IDEChannel class for UDMA modes,
that are available and documented on Intel specifications for their IDE
controllers.
Technically not supported by the original ATA specification, IDE
hot swapping is still in practice possible, so the only sane way
to start support it is with ref-counting the IDEChannel object so if we
remove a PATADiskDevice, it's not gone with it.
An article about IDE limits states that:
"Hard drives over 8.4 GB are supposed to report their geometry as
16383/16/63. This in effect means that the `geometry' is obsolete, and
the total disk size can no longer be computed from the geometry, but is
found in the LBA capacity field returned by the IDENTIFY command.
Hard drives over 137.4 GB are supposed to report an LBA capacity of
0xfffffff = 268435455 sectors (137438952960 bytes). Now the actual disk
size is found in the new 48-capacity field."
(https://tldp.org/HOWTO/Large-Disk-HOWTO-4.html) which is the main
reason to not support CHS as harddrives with less than 8.4 GB capacity
are completely obsolete.
Another good reason is that virtually any harddrive in the last 20 years
or so, supports LBA mode. Therefore, it's probably OK to just ignore CHS
as it's unlikely to encounter a harddrive that doesn't support LBA.
This is somewhat simplifying the IDE initialization and access code.
Also, we should use the ATAIdentifyBlock structure if possible,
so now we do it instead of using macros to calculate offsets.
With the usage of the ATAIdentifyBlock structure, we now use the
48-bit LBA max count if the drive indicates it supports 48-bit LBA mode.
This reverts commit cfc2f33dcb.
We can't actually change the IRQ line value and expect the device
to work with it (this was my mistake).
That register is R/W so the firmware can figure out IRQ routing and put
the correct value and write it to the Interrupt line register.
As a compromise, if the fimrware decided to set the IRQ line to be 7,
or something else we can't deal with, the user can simply force the code
to work with IRQ 11, with the boot argument "force_ahci_irq_11" being
set to "on".
Instead of polling if the device ended the operation, we can just use
interrupts for signalling about end of IO operation.
In similar way, we use interrupts during device detection.
Also, we use the new Work Queue mechanism introduced by @tomuta to allow
better performance and stability :)
We can't use deferred functions for anything that may require preemption,
such as copying from/to user or accessing the disk. For those purposes
we should use a work queue, which is essentially a kernel thread that
may be preempted or blocked.
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
We previously ignored these values in the return value of
load_elf_object, which causes us to not allocate a TLS region for
statically-linked programs.
These errors are classed as fatal, so we need to recover from them.
Found while trying to debug AHCI boot on VMware Player,
where I got TFES.
From the spec: "Fatal errors (signified by the setting of PxIS.HBFS,
PxIS.HBDS, PxIS.IFS, or PxIS.TFES) will cause the HBA to enter
the ERR:Fatal state"
We were already recovering from IFS.
In case multiple file descriptors in the `fd_set` were already readable
and/or writable when calling Thread::block<SelectBlocker>(), we would
only mark the first fd in the output sets instead of all relevant fd's.
The short-circuit code path when blocking isn't necessary must ensure
that unblock flags are collected for all file descriptors, not just the
first one encountered.
Fixes#5795.
If we happen to read with offset that is after the end of file of a
device, return 0 to indicate EOF. If we return a negative value,
userspace will think that something bad happened when it's really not
the case.
This implements a fallback to munmap that unmaps multiple regions at a
time, with splitting some when needed.
The way it is implemented is possibly not optimal, due to it searching
without looking into the cache
Instead of blindly resetting every AHCI port, let's just reset only the
controller by default. The user can still request to reset everything
with a new kernel boot argument called ahci_reset_mode which is set
by default to "controller", so the code will only invoke an HBA reset.
This kernel boot argument can be set to 3 different values:
1. "controller" - reset the HBA and skip resetting AHCI ports
2. "none" - don't reset anything, so we rely on the firmware to
initialize the AHCI HBA and ports for us.
3. "complete" - reset the AHCI HBA and ports.
Instead of waiting for the AHCI HBA to reset the signature after SATA
reset sequence, let's just check if the Port x Serial ATA Status
register was set to value 3, indicating that device was detected
and phy communication was established.
The intention is to make the boot to be faster, therefore we should
decrease the time deltas in timeout loops to allow earlier break
from these.
Also, there's no need to wait 10 milliseconds before setting
the interface state to "no action request" during the reset sequence.