Files opened with O_DIRECT will now bypass the disk cache in read/write
operations (though metadata operations will still hit the disk cache.)
This will allow us to test actual disk performance instead of testing
disk *cache* performance, if that's what we want. :^)
There's room for improvment here, we're very aggressively flushing any
dirty cache entries for the specific block before reading/writing that
block. This is done by walking the entire cache, which may be slow.
This helps aid debugging of issues such as #695, where the bridge chip
that controls IDE is NOT a PIIX3/4 compatible controller. Instead of
just hanging when the DMA registers can't be accessed, the system will
inform the user that no valid IDE controller has been found. In this
case, the system will not attempt to initialise the DMA registers and
instead use PIO mode.
Don't keep Inodes around in memory forever after we've interacted with
them once. This is a slight performance pessimization when accessing
the same file repeatedly, but closing it for a while in between.
Longer term we should find a way to keep a limited number of unused
Inodes cached, whichever ones we think are likely to be used again.
Move the kernel image to the 1 MB physical mark. This prevents it from
colliding with stuff like the VGA memory. This was causing us to end
up with the BIOS screen contents sneaking into kernel memory sometimes.
This patch also bumps the kmalloc heap size from 1 MB to 3 MB. It's not
the perfect permanent solution (obviously) but it should get the OOM
monkey off our backs for a while.
dispatch_signal() expected a RegisterDump on the kernel stack. However
in certain cases, like just after a clone, this was not the case and
dispatch_signal() would instead write to an incorrect user stack pointer.
We now use the threads TSS in situations where the RegisterDump may not
be valid, fixing the issue.
After the page fault handler has found the region in which the fault
occurred, do the rest of the work in the region itself.
This patch also makes all fault types consistently crash the process
if a new page is needed but we're all out of pages.
Since the kernel page tables are shared between all processes, there's
no need to (implicitly) flush the TLB for them on every context switch.
Setting the G bit on kernel page tables allows the CPU to keep the
translation caches around.
This patch changes the parameter to Region::map() to be a PageDirectory
since that matches how we think about the memory model:
Regions are views onto VMObjects, and are mapped into PageDirectories.
Each Process has a PageDirectory. The kernel also has a PageDirectory.
Since a Region is merely a "window" onto a VMObject, it can both begin
and end at a distance from the VMObject's boundaries.
Therefore, we should always be computing indices into a VMObject's
physical page array by adding the Region's "first_page_index()".
There was a whole bunch of code that forgot to do that. This fixes
many wrong behaviors for Regions that start part-way into a VMObject.
When creating a new directory, we set the initial size to 1 block.
This meant that we were allocating a block up front, but the Inode's
internal block list cache was not populated with this block.
This broke write_bytes() on a new directory, since it assumed that
the block list cache would be up to date if the call to write_bytes()
would not change the directory's size.
This patch fixes the issue in two ways: First, we cache the initial
block list created for new directories.
Second, we now repopulate the block list cache in write_bytes() if it
is empty when we get there. This is basically just a safety fallback
to avoid having this kind of bug in the future.
Ports/.port_include.sh, Toolchain/BuildIt.sh, Toolchain/UseIt.sh
have been left largely untouched due to use of Bash-exclusive
functions and variables such as $BASH_SOURCE, pushd and popd.
We can't be calling the virtual FS::flush_writes() in order to flush
the disk cache from within the disk cache, since an FS subclass may
try to do cache stuff in its flush_writes() implementation.
Instead, separate out the implementation of DiskBackedFS's flushing
logic into a flush_writes_impl() and call that from the cache code.
Also cache the block group descriptor table in a KBuffer on file system
initialization, instead of on first access.
This reduces pressure on the kmalloc heap somewhat.
We were writing out the full block list whenever Ext2FSInode::resize()
was called, even if the old and new sizes were identical.
This patch makes it a no-op, which drastically improves "cp" speed
since we now take full advantage of the up-front call to ftruncate().
If there are not enough free blocks in the filesystem to accomodate
growing an Inode, we should fail with ENOSPC before even starting to
allocate blocks.
Add a simple cache to Ext2FS where we keep block bitmaps along with a
dirty bit. This allows us to coalesce bitmap flushes, giving us a nice
~3x improvement in disk_benchmark write speeds.
Store the cached super block as an ext2_super_block member instead of
caching it in a ByteBuffer and using a casting helper everywhere.
This patch also combines reading/writing of the super block into a
single disk device operation (instead of two.)
Oops, we had a little mistake here. We were flushing whenever !NOFLSH,
not just when generating a signal.
This broke arrow keys in the terminal (you would only get A/B/C/D when
pressing arrow keys, instead of the full escape sequence.)
We were doing a temporary STI/CLI in MemoryManager::zero_page() to be
able to acquire the VMObject's lock before zeroing out a page.
This logic was inherited from the inode fault handler, where we need
to enable interrupts anyway, since we might need to interact with the
underlying storage device.
Zero-fill faults don't actually need to lock the VMObject, since they
are already guaranteed exclusivity by interrupts being disabled when
entering the fault handler.
This is different from inode faults, where a second thread can often
get an inode fault for the same exact page in the same VMObject before
the first fault handler has received a response from the disk.
This is why the lock exists in the first place, to prevent this race.
This fixes an intermittent crash in sys$execve() that was made much
more visible after I made userspace stacks lazily allocated.
Add text.startup to the .text block, add .ctors as well.
Use them in init.cpp to call global constructors after
gtd and idt init. That way any funky constructors should be ok.
Also defines some Itanium C++ ABI methods that probably shouldn't be,
but without them the linker gets very angry.
If the code ever actually tries to use __dso_handle or call
__cxa_atexit, there's bigger problems with the kernel.
Bit of a hack would be an understatement but hey. It works :)
Make userspace stacks lazily allocated and allow them to grow up to
4 megabytes. This avoids a lot of silly crashes we were running into
with software expecting much larger stacks. :^)
Add dedicated internal types for Int64 and UnsignedInt64. This makes it
a bit more straightforward to work with 64-bit numbers (instead of just
implicitly storing them as doubles.)
This is just a wrapper around strstr() for now. There are many better
ways to search for a string within a string, but I'm just adding a nice
API at the moment. :^)
Add the ability to both pass arguments to scripts with shebangs
(./script argument1 argument2) and to specify them in the shebang line
(#!/usr/local/bin/bash -x -e)
Fixes#585
I have no idea why this was here. It makes no sense. If you're trying
to find out if something is a directory, why wouldn't you be allowed to
ask that about a FIFO? :^)
Thanks to Brandon for spotting this!
Also, while we're here, cache the directory state in a bool member so
we don't have to keep fetching inode metadata when checking this
repeatedly. This is important since sys$read() now calls it.
This makes tcgetpgrp() on a master PTY return the PGID of the slave PTY
which is probably what you are looking for. I'm not sure how correct or
standardized this is, but it makes sense to me right now.
We now return EISDIR whenever a program attempts to call sys$read
on a directory. Previously, attempting to read a directory could
either return junk data or, in the case of /proc/, cause a kernel
panic.
Fixed an issue with operator precedence in calls to `send_byte()`, in
which a value of `1` was being sent to the function. This had the
nasty side-effect of selecting the slave drive if the value of
`head` was equal to one. A read/write would fail in the case, as
it would attempt to read from the slave drive (not good).
I've also added a seek to the top of the read/write code, which seems
to have fixed an issue with Linux not detecting the disk images after
they have been unmounted from Serenity. This isn't specified in the
datasheet, but a few other drivers have it so we should too :^)
Thread::make_userspace_stack_for_main_thread is only ever called from
Process::do_exec, after all the fun ELF loading and TSS setup has
occured.
The calculations in there that check if the combined argv + envp
size will exceed the default stack size are not used in the rest of
the stack setup. So, it should be safe to move this to the beginning
of do_exec and bail early with -E2BIG, just like the man pages say.
Additionally, advertise this limit in limits.h to be a good POSIX.1
citizen. :)
Previously, procfs$pid_fds would return nothing when called
for a process that had either no open files or a non-existent
handle. This could cause problems when a userspace program
expected a valid Json response.
Procfs$pid_fs now returns an empty array in the aforementioned
cases.
ELFLoader::layout() had a "failed" variable that was never set. This
patch checks the return value of each hook (alloc/map section and tls)
and fails the load if they return null.
I also needed to patch Process so that the alloc_section_hook and
map_section_hook actually return nullptr when allocating a region fails.
Fixes#664 :)
The TTY driver now respects the ICANON flag, enabling basic line
editing like VKILL, VERASE, VEOF and VWERASE. Additionally,
ICANON is now set by default.
Basic echoing has can now be enabled via the ECHO flag, though
more complicated echoing like ECHOCTL or ECHONL has not been
implemented.
This reverts commit 1cca5142af.
This appears to be causing intermittent triple-faults and I don't know
why yet, so I'll just revert it to keep the tree in decent shape.
Background: DoubleBuffer is a handy buffer class in the kernel that
allows you to keep writing to it from the "outside" while the "inside"
reads from it. It's used for things like LocalSocket and PTY's.
Internally, it has a read buffer and a write buffer, but the two will
swap places when the read buffer is exhausted (by reading from it.)
Before this patch, it was internally implemented as two Vector<u8>
that we would swap between when the reader side had exhausted the data
in the read buffer. Now instead we preallocate a large KBuffer (64KB*2)
on DoubleBuffer construction and use that throughout its lifetime.
This removes all the kmalloc heap traffic caused by DoubleBuffers :^)
After we clear the FPU state in a thread when it uses the FPU for the
first time, we also save the clean slate in the thread's FPU state
buffer. When we're doing that, let's write through current->fpu_state()
just to make it clear what's going on.
It was actually safe, since we'd just overwritten the g_last_fpu_thread
pointer anyway, but this patch improves the communication of intent.
Spotted by Bryan Steele, thanks!
The way it gets the entropy and blasts it to the buffer is pretty
ugly IMHO, but it does work for now. (It should be replaced, by
not truncating a u32.)
It implements an (unused for now) flags argument, like Linux but
instead of OpenBSD's. This is in case we want to distinguish
between entropy sources or any other reason and have to implement
a new syscall later. Of course, learn from Linux's struggles with
entropy sourcing too.
Cloned threads (basically, forked processes) inherit the complete FPU
state of their origin thread. There was a bug in the lazy FPU state
save/restore mechanism where a cloned thread would believe it had a
buffer full of valid FPU state (because the inherited flag said so)
but the origin thread had never actually copied any FPU state into it.
This patch fixes that by forcing out an FPU state save after doing
the initial FPU initialization (FNINIT) in a thread. :^)
We were leaking 512 bytes of kmalloc memory for every new thread.
This patch fixes that, and also makes sure to zero out the FPU state
buffer after allocating it, and finally also makes the LogStream
operator<< for Thread look a little bit nicer. :^)
This is obviously not ideal, and it would be better to teach it how to
allocate more pages, etc. But since the physical page allocator itself
currently uses SlabAllocator, it's a little bit tricky :^)
Make sure we don't move accepted sockets to the Completed setup state
until we've actually constructed a FileDescription for them.
This is important, since this state transition will trigger connect()
to unblock on the client side, and the client may try writing to the
socket right away.
This makes DNS lookups way more reliable since we don't just fail to
write() right after connect()ing to LookupServer sometimes. :^)
This was causing connect() to unblock immediately for local sockets,
since that's exactly what ConnectBlocker checks for.
Instead, just move to SetupState::Completed when it's accept()ed.
The Cache entries found in `DiskBackedFileSystem` are now stored in a
`KBuffer` object, instead of relying on `kmalloc_eternal`. The number
of entries was exceeding that of the number of bytes allocated to
`kmalloc_eternal`, which in turn caused `mount()` to fail epically
when called.
Now programs can catch the SIGSEGV signal when they segfault.
This commit also introduced the send_urgent_signal_to_self method,
which is needed to send signals to a thread when handling exceptions
caused by the same thread.
Added the exception_code field to RegisterDump, removing the need
for RegisterDumpWithExceptionCode. To accomplish this, I had to
push a dummy exception code during some interrupt entries to properly
pad out the RegisterDump. Note that we also needed to change some code
in sys$sigreturn to deal with the new RegisterDump layout.
Also added a script to handle creation of GPT partitioned disk (with
GRUB config file). Block limit will be used to disallow potential access
to other partitions.