This reverts commit 11896d0e26.
This caused a race where other processes using the same InodeVMObject
could end up accessing the newly-mapped physical page before we've
actually filled it with bytes from disk.
It would be nice to avoid these copies without breaking anything.
We were doing this for the initial kernel-spawned userspace process(es)
to work around instability in the page fault handler. Now that the page
fault handler is more robust, we can stop worrying about this.
Specifically, the page fault handler was previously not able to handle
getting a page fault in anything but the currently executing task's
page directory.
This library is meant to provide C++-style wrappers over lower
level APIs such as syscalls and pthread_* functions, as well as
utilities for easily running pieces of logic on different
threads.
- TmpFSInode::write_bytes() needs to allow non-zero offsets
- TmpFSInode::read_bytes() wasn't respecting the offset
GCC puts the temporary files generated during compilation in /tmp,
so this exposed some bugs in TmpFS.
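As a rough illustration of the two fixes above, here is a minimal sketch of offset-aware reads and writes on an in-memory file. MemoryFile and its members are hypothetical stand-ins, not the real TmpFSInode interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an in-memory inode; not the real TmpFSInode.
struct MemoryFile {
    std::vector<unsigned char> data;

    // Copies up to `count` bytes starting at `offset`. Returns bytes copied.
    size_t read_bytes(size_t offset, size_t count, unsigned char* out) const
    {
        assert(out);
        if (offset >= data.size())
            return 0; // Reading at or past the end yields nothing.
        size_t n = std::min(count, data.size() - offset);
        std::memcpy(out, data.data() + offset, n);
        return n;
    }

    // Writes at any offset, growing the file (zero-filled) when needed.
    size_t write_bytes(size_t offset, size_t count, const unsigned char* in)
    {
        if (count == 0)
            return 0;
        assert(in);
        if (offset + count > data.size())
            data.resize(offset + count);
        std::memcpy(data.data() + offset, in, count);
        return count;
    }
};
```

A compiler writing a temporary file in several chunks depends on exactly this: each chunk has to land at its own offset, and reads have to start where the caller asked.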
KBuffer is just meant to be a dumb wrapper around KBufferImpl.
With this change, we actually start to see KBuffers with different size
and capacity, which allows some reallocation-avoiding optimizations.
This papers over an immediate issue where pseudoterminals would choke
on more than 16 characters of pasted input in the GUI terminal.
Longer-term we should find a more elegant solution than using a static
size CircularQueue for this.
The scheduler is not allowed to take locks, so if that's happening,
we want to make that clear instead of crashing with the more general
"Interrupts disabled while trying to take Lock" error.
Here comes the foundation for a neat remote debugging tool.
Right now, it connects to a remote process's CEventLoop RPC socket and
retrieves the remote object graph JSON dump. The remote object graph
is then reconstructed and exposed through a GModel subclass, which is
then displayed in a GTreeView.
It's pretty cool, I think. :^)
This implements a very basic VGA device using the information provided
to us by the bootloader in the multiboot header. This allows Serenity to
boot to the desktop on basically any halfway modern system.
The complication is around /proc/sys/ variables, which were attached
to inodes. Now they're their own thing, and the corresponding inodes
are lazily created (as all other ProcFS inodes are) and simply refer
to them by index.
This is kind of a mess, but because IPC client code depends on the
IPC protocol definition artifacts in the server code, we have to build
the IPC servers first. And their dependencies before that, etc.
One more drop in the "maybe we should switch to CMake" bucket..
makeall.sh used to build the AK tests and leave some binary objects lying
around that would get in the way of further incremental builds. There also
wasn't a lot of structure to the order things were built in. This patch
improves both of those things.
It is now possible to unmount file systems from the VFS via `umount`.
It works via looking up the `fsid` of the filesystem from the `Inode`'s
metadata so I'm not sure how fragile it is. It seems to work for now
though as something to get us going.
We were forced to do this because the page fault code would fall apart
when trying to generate a backtrace for a non-current thread.
This issue has been fixed for a while now, so let's go back to lazily
loading executable pages which should make everything a little better.
This patch adds the mprotect() syscall to allow changing the protection
flags for memory regions. We don't do any region splitting/merging yet,
so this only works on whole mmap() regions.
Added a "crash -r" flag to verify that we crash when you attempt to
write to read-only memory. :^)
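For reference, a minimal userspace sketch of the whole-region case, using the standard POSIX mmap()/mprotect() calls. The helper name is made up for illustration, and this is not the "crash -r" test itself.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Maps one anonymous read/write page, writes to it, then drops write
// permission on the whole mapping. Returns true if every step succeeded.
bool make_region_read_only()
{
    const size_t size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* region = mmap(nullptr, size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return false;
    static_cast<char*>(region)[0] = 'x';
    // Change protection on the entire region, not a sub-range of it.
    bool ok = mprotect(region, size, PROT_READ) == 0;
    // Reading is still allowed; a write here would fault.
    ok = ok && static_cast<char*>(region)[0] == 'x';
    munmap(region, size);
    return ok;
}
```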
Now that we're bringing back the in-kernel virtual console, we should
move towards having a single implementation of terminal emulation.
This patch rips out the emulation code from the Terminal application
and turns it into the beginnings of LibVT.
The basic design idea is that users of VT::Terminal will implement and
provide a VT::TerminalClient subclass to handle presentation-specific
things. We'll need to iterate on this, but it's a start. :^)
TTY::emit is called from an IRQ handler, and is used to push input data
into a buffer for later retrieval. Previously this was using DoubleBuffer,
but that class wants to take a lock. Our lock code wants to make sure
interrupts are enabled, but they're disabled while an IRQ handler is
running. This made the kernel sad, but this CircularQueue cheers it up by
avoiding the lock requirement completely.
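A minimal sketch of the kind of fixed-capacity ring this refers to, assuming a single buffer that is only touched with interrupts disabled. FixedRing is a hypothetical name, and overwriting the oldest element when full is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity ring buffer. It never allocates and never
// takes a lock, so pushing into it from an IRQ handler cannot block.
template<typename T, size_t Capacity>
class FixedRing {
public:
    bool is_empty() const { return m_size == 0; }
    size_t size() const { return m_size; }

    // When the ring is full, the oldest element is overwritten.
    void enqueue(const T& value)
    {
        m_elements[(m_head + m_size) % Capacity] = value;
        if (m_size == Capacity)
            m_head = (m_head + 1) % Capacity;
        else
            ++m_size;
    }

    T dequeue()
    {
        assert(!is_empty());
        T value = m_elements[m_head];
        m_head = (m_head + 1) % Capacity;
        --m_size;
        return value;
    }

private:
    T m_elements[Capacity] {};
    size_t m_head { 0 };
    size_t m_size { 0 };
};
```

Note that dropping the lock does not make the structure safe for concurrent access on its own; the sketch relies on the caller guaranteeing that producer and consumer never run at the same time.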
This should probably call out to a login program at some point. Right now
it just puts a root terminal on tty{1,2,3}.
Remember not to leave your Serenity workstation unattended!
Our logic for using the ATA_CMD_CACHE_FLUSH functionality was a bit wrong,
and now it's better.
The ATA spec says these two things:
> The device shall enter the interrupt pending state when:
> 1) any command except a PIO data-in command reaches command completion
> successfully;
> ...
> The device shall exit the interrupt pending state when:
> 1) the device is selected, BSY is cleared to zero, and the Status
> register is read;
This means that our sequence of actions was probably never going to work.
We were waiting in a loop checking the status register until it left the
busy state, _then_ waiting for an interrupt. Unfortunately by checking the
status register, we were _clearing_ the interrupt we were about to wait
for.
Now we just wait for the interrupt - we don't poll the status register at
all. This also means that once we get our `wait_for_irq` method sorted out
we'll spend a bunch less CPU time waiting for things to complete.
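To make the ordering problem concrete, here is a toy model of the behavior quoted from the spec above. FakeAtaDevice and both helper functions are invented for illustration and do not talk to real hardware.

```cpp
#include <cassert>

// Toy model: completing a command raises the interrupt pending state, and
// reading the status register while BSY is clear exits that state.
struct FakeAtaDevice {
    bool busy { true };
    bool interrupt_pending { false };

    void complete_command()
    {
        busy = false;
        interrupt_pending = true;
    }

    bool read_status_busy()
    {
        if (!busy)
            interrupt_pending = false;
        return busy;
    }
};

// Old sequence: poll status until not busy, then look for the interrupt.
bool old_sequence_sees_interrupt()
{
    FakeAtaDevice device;
    device.complete_command();
    while (device.read_status_busy()) { }
    return device.interrupt_pending;
}

// New sequence: do not touch the status register, just check the interrupt.
bool new_sequence_sees_interrupt()
{
    FakeAtaDevice device;
    device.complete_command();
    return device.interrupt_pending;
}
```

In the model, the polling read is what consumes the interrupt, which is why the old sequence ends up waiting for something that has already been cleared.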
* The origin PID is the PID of the process that created this socket,
either explicitly by calling socket(), or implicitly by accepting
a TCP connection. Note that accepting a local socket connection
does not create a new socket, it reuses the one connect() was
called on, so for accepted local sockets the origin PID points
to the connecting process.
* The acceptor PID is the PID of the process that accept()ed this
socket. For accepted TCP sockets, this is the same as origin PID.
This is more logical and allows us to solve the problem of
non-blocking TCP sockets getting stuck in SocketRole::None.
The only complication is that a single LocalSocket may be shared
between two file descriptions (on the connect and accept sides),
and should have two different roles depending on which side
you look at it from. To deal with this, Socket::role() is made a
virtual method that accepts a file description, and LocalSocket
internally tracks which FileDescription is which and returns
the correct role.
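The per-description role lookup could look roughly like this. It is a simplified sketch: the enum values, the attach_* helpers and the empty FileDescription are assumptions made for illustration, not the kernel's actual declarations.

```cpp
#include <cassert>

// Simplified, hypothetical versions of the kernel types.
enum class SocketRole { None, Accepted, Connected };
struct FileDescription { };

class Socket {
public:
    virtual ~Socket() = default;
    // The role depends on which description is asking.
    virtual SocketRole role(const FileDescription&) const = 0;
};

class LocalSocket final : public Socket {
public:
    void attach_connect_side(const FileDescription& d) { m_connect_side = &d; }
    void attach_accept_side(const FileDescription& d) { m_accept_side = &d; }

    SocketRole role(const FileDescription& d) const override
    {
        if (&d == m_connect_side)
            return SocketRole::Connected;
        if (&d == m_accept_side)
            return SocketRole::Accepted;
        return SocketRole::None;
    }

private:
    const FileDescription* m_connect_side { nullptr };
    const FileDescription* m_accept_side { nullptr };
};
```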
Now that there can't be multiple clones of the same fd,
we only need to track whether or not an fd exists on each
side. Also there's no point in tracking connecting fds.
After a fork, the parent and the child are supposed to share
the same file description. For example, modifying the current
offset of a file description is visible in both of them.
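The shared offset is easy to observe from userspace on a POSIX system. The sketch below uses only standard calls (tmpfile, fork, lseek, waitpid); the function name is made up, and it assumes a temporary file can be created.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// The child seeks the inherited fd; the parent then reads back the offset.
// Returns the offset seen by the parent, or -1 on error.
off_t offset_seen_by_parent_after_child_seek()
{
    FILE* file = tmpfile();
    if (!file)
        return -1;
    int fd = fileno(file);
    pid_t pid = fork();
    if (pid < 0) {
        fclose(file);
        return -1;
    }
    if (pid == 0) {
        // Child: move the offset of the shared file description.
        lseek(fd, 5, SEEK_SET);
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    // Parent: query the current offset without moving it.
    off_t offset = lseek(fd, 0, SEEK_CUR);
    fclose(file);
    return offset;
}
```

If the two processes had separate file descriptions, the parent would still see offset 0 here.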
Apparently we need to poll the drive for its status after each sector we
read if we're not doing DMA. Previously we only did it at the start,
which resulted in every sector after the first in a batch having 12 bytes
of garbage on the end. This manifested as silent read corruption.
serial_debug will output all the kprintf and dbgprintf data to COM1 at
8-N-1 57600 baud. This is particularly useful for debugging the boot
process on live hardware.
Note: it must be the first parameter in the boot cmdline.
Since this key number doesn't appear to collide with anything on the
US keymap, I was thinking we could get away with supporting a hybrid
US/UK keymap. :^)
Once we've converted from an Ethernet frame to an IPv4 packet, we can
pass the IPv4Packet around instead of the EthernetFrameHeader.
Also add some more code to ignore invalid-looking packets.
Remove the global hash tables and replace them with InlineLinkedLists.
This significantly reduces the kernel heap pressure from doing many
small mmap()'s.
Using a HashTable to track "all instances of Foo" is only useful if we
actually need to look up entries by some kind of index. And since they
are HashTable (not HashMap), the pointer *is* the index.
Since we have the pointer, we can just use it directly. Duh.
This increases sizeof(VMObject) by two pointers, but removes a global
table that had an entry for every VMObject, where the cost was higher.
It also avoids all the general hash tabling business when creating or
destroying VMObjects. Generally we should do more of this. :^)
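A minimal sketch of the intrusive-list idea, where the two link pointers live inside the tracked object itself. ListNode, IntrusiveList and TrackedObject are hypothetical names for illustration, not AK's InlineLinkedList API.

```cpp
#include <cassert>
#include <cstddef>

// The two pointers that each tracked object carries around.
template<typename T>
struct ListNode {
    T* prev { nullptr };
    T* next { nullptr };
};

template<typename T>
class IntrusiveList {
public:
    // No allocation: the node storage is part of the object.
    void append(T& node)
    {
        node.prev = m_tail;
        node.next = nullptr;
        if (m_tail)
            m_tail->next = &node;
        else
            m_head = &node;
        m_tail = &node;
    }

    // No lookup: having the pointer is enough to unlink in O(1).
    void remove(T& node)
    {
        if (node.prev)
            node.prev->next = node.next;
        else
            m_head = node.next;
        if (node.next)
            node.next->prev = node.prev;
        else
            m_tail = node.prev;
        node.prev = nullptr;
        node.next = nullptr;
    }

    size_t count() const
    {
        size_t n = 0;
        for (T* it = m_head; it; it = it->next)
            ++n;
        return n;
    }

    T* head() const { return m_head; }

private:
    T* m_head { nullptr };
    T* m_tail { nullptr };
};

struct TrackedObject : ListNode<TrackedObject> {
    int id { 0 };
};
```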
This is comprised of five small changes:
* Keep a counter for tx/rx packets/bytes per TCP socket
* Keep a counter for tx/rx packets/bytes per network adapter
* Expose that data in /proc/net_tcp and /proc/netadapters
* Convert /proc/netadapters to JSON
* Fix up ifconfig to read the JSON from netadapters
We were only doing this in Process::deallocate_region(), which meant
that kernel-only Regions never gave back their VM.
With this patch, we can start reusing freed-up address space! :^)
This is not perfect as it uses a lot of VM, but since the buffers are
supposed to be temporary it's not super terrible.
This could be improved by giving back the unused VM to the kernel's
RangeAllocator after finishing the buffer building.
Each Function is a heap allocation, so let's make an effort to avoid
doing that during scheduling. Because of header dependencies, I had to
put the runnables iteration helpers in Thread.h, which is a bit meh but
at least this cuts out all the kmalloc() traffic in pick_next().
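The general shape of such a helper, as a sketch: the callback is a template parameter, so the lambda is invoked directly and no type-erased function object has to be allocated. The Thread struct and both function names here are simplified stand-ins, not the kernel's.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the kernel's Thread.
struct Thread {
    int priority;
    bool runnable;
};

// Callback is deduced at compile time; nothing is heap-allocated to call it.
template<typename Callback>
void for_each_runnable(const std::vector<Thread>& threads, Callback callback)
{
    for (const Thread& thread : threads) {
        if (thread.runnable)
            callback(thread);
    }
}

// Example user: returns the highest priority among runnable threads, or -1.
int highest_runnable_priority(const std::vector<Thread>& threads)
{
    int best = -1;
    for_each_runnable(threads, [&](const Thread& thread) {
        if (thread.priority > best)
            best = thread.priority;
    });
    return best;
}
```

The price is that the helper has to be defined in a header, which matches the "bit meh" placement mentioned above.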
If kmalloc backtraces are enabled during backtracing, things don't go
super well when the backtrace code calls kmalloc()..
With this fixed, it's basically possible to get all kmalloc backtraces
on the debugger by running (as root):
sysctl kmalloc_stacks=1
This makes VMObject 8 bytes smaller since we can use the array size as
the page count.
The size() is now also computed from the page count instead of being
a separate value. This makes sizes always be a multiple of PAGE_SIZE,
which is sane.
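In sketch form, deriving the size from the page array looks like this. PagedObject, PhysicalPage and kPageSize are placeholder names, with kPageSize assumed to be 4096 as a stand-in for PAGE_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kPageSize = 4096; // Assumed stand-in for PAGE_SIZE.

struct PhysicalPage { };

class PagedObject {
public:
    // Rounds the requested size up to whole pages.
    explicit PagedObject(size_t size_in_bytes)
        : m_pages((size_in_bytes + kPageSize - 1) / kPageSize)
    {
    }

    // The array length is the page count; no separate member is stored.
    size_t page_count() const { return m_pages.size(); }

    // Always a multiple of kPageSize by construction.
    size_t size() const { return page_count() * kPageSize; }

private:
    std::vector<PhysicalPage*> m_pages;
};
```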
InodeVMObject is a VMObject with an underlying Inode in the filesystem.
AnonymousVMObject has no Inode.
I'm happy that InodeVMObject::inode() can now return Inode& instead of
VMObject::inode() returning Inode*. :^)
This wasn't really thought-through, I was just trying anything to see
if it would make WindowServer faster. This doesn't seem to make much of
a difference either way, so let's just not do it for now.
It's easy to bring back if we think we need it in the future.
The VMObject name was always either the owning region's name, or the
absolute path of the underlying inode.
We can reconstitute this information if wanted, no need to keep copies
of these strings around.
This class works by eagerly allocating 1MB of virtual memory but only
adding physical pages on demand. In other words, when you append to it,
its memory usage will increase by 1 page whenever you append across a
page boundary (4KB.)
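The accounting described above can be modeled in a few lines. This sketch only tracks numbers and does not map any memory; LazyBufferModel is a hypothetical name, and the 1MB / 4KB constants are taken from the text.

```cpp
#include <cassert>
#include <cstddef>

// Models only the bookkeeping: 1MB of address space is reserved up front,
// and pages are counted as committed when an append first reaches them.
class LazyBufferModel {
public:
    static constexpr size_t reserved_bytes = 1024 * 1024;
    static constexpr size_t page_size = 4096;

    // Returns false if the append would not fit in the reservation.
    bool append(size_t byte_count)
    {
        if (byte_count > reserved_bytes - m_size)
            return false;
        m_size += byte_count;
        size_t pages_needed = (m_size + page_size - 1) / page_size;
        if (pages_needed > m_committed_pages)
            m_committed_pages = pages_needed;
        return true;
    }

    size_t size() const { return m_size; }
    size_t committed_pages() const { return m_committed_pages; }

private:
    size_t m_size { 0 };
    size_t m_committed_pages { 0 };
};
```

Appending 100 bytes commits one page, filling that page exactly commits nothing new, and the next single byte commits a second page.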
Instead of dumping the dying thread's backtrace in the signal handling
code, wait until we're finalizing the thread. Since signalling happens
during scheduling, the less work we do there the better.
Basically the less that happens during a scheduler pass the better. :^)
This has several significant changes to the networking stack.
* Significant refactoring of the TCP state machine. Right now it's
probably more fragile than it used to be, but handles quite a lot
more of the handshake process.
* `TCPSocket` holds a `NetworkAdapter*`, assigned during `connect()` or
`bind()`, whichever comes first.
* `listen()` is now virtual in `Socket` and intended to be implemented
in its child classes
* `listen()` no longer works without `bind()` - this is a bit of a
regression, but listening sockets didn't work at all before, so it's
not possible to observe the regression.
* A file is exposed at `/proc/net_tcp`, which is a JSON document listing
the current TCP sockets with a bit of metadata.
* There's an `ETHERNET_VERY_DEBUG` flag for dumping packet contents out
to `kprintf`. It is, indeed, _very debug_.
KBuffers are now zero-filled on demand instead of up front. This means
that you can create a huge KBuffer and it will only take up VM, not
physical pages (until you access them.)
We were short-circuiting the page fault handler a little too eagerly
for page-not-present faults in kernel memory.
If the current page directory already has up-to-date mappings for kernel
memory, allow it to progress to checking for zero-fill conditions.
This will enable us to have lazily populated kernel regions.
This allows the page fault code to find the owning PageDirectory and
corresponding process for faulting regions.
The mapping is implemented as a global hash map right now, which is
definitely not optimal. We can come up with something better when it
becomes necessary.
Sometimes you're only interested in either user OR kernel regions but
not both. Let's break this into two functions so the caller can choose
what they're interested in.
If we were using a ProcessPagingScope to temporarily go into another
process's page tables, things would fall apart when hitting a kernel
NP fault, since we'd clone the kernel page directory entry into the
*currently active process's* page directory rather than cloning it
into the *currently active* page directory.
In the userspace, this mimics the Linux pipe2() syscall;
in the kernel, the Process::sys$pipe() now always accepts
a flags argument, the no-argument pipe() syscall is now a
userspace wrapper over pipe2().
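The userspace side amounts to something like the sketch below. It is written against Linux/glibc, where pipe2() is declared in <unistd.h>; the wrapper names are made up so they don't clash with the real pipe().

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// The no-flags variant is just pipe2() with flags == 0.
int pipe_via_pipe2(int fds[2])
{
    return pipe2(fds, 0);
}

// With a flags argument, both ends can be created close-on-exec atomically.
int cloexec_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}
```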
Instead of generating ByteBuffers and keeping those lying around, have
these filesystems generate KBuffers instead. These are way less spooky
to leave around for a while.
Since FileDescription will keep a generated file buffer around until
userspace has read the whole thing, this prevents trivially exhausting
the kmalloc heap by opening many files in /proc for example.
The code responsible for generating each /proc file is not perfectly
efficient and many of them still use ByteBuffers internally but they
at least go away when we return now. :^)
A KBuffer always contains a valid KBufferImpl. If you need a "null"
state buffer, use Optional<KBuffer>.
This makes KBuffer very easy to work with and pass around, just like
ByteBuffer before it.
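The same pattern in standard C++17, as a sketch: a wrapper that can only be built with a valid impl, plus std::optional for the "maybe no buffer" case. Buffer, Impl and maybe_generate are illustrative names, not the real KBuffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

class Buffer {
public:
    // The only way to get a Buffer is through a factory that makes an impl.
    static Buffer create_with_size(size_t size)
    {
        return Buffer(std::make_shared<Impl>(size));
    }

    // No null check needed: m_impl is valid for every Buffer that exists.
    size_t size() const { return m_impl->bytes.size(); }

private:
    struct Impl {
        explicit Impl(size_t size)
            : bytes(size)
        {
        }
        std::vector<unsigned char> bytes;
    };

    explicit Buffer(std::shared_ptr<Impl> impl)
        : m_impl(std::move(impl))
    {
        assert(m_impl);
    }

    std::shared_ptr<Impl> m_impl;
};

// Absence is expressed in the type instead of as a null state in Buffer.
std::optional<Buffer> maybe_generate(bool available)
{
    if (!available)
        return std::nullopt;
    return Buffer::create_with_size(16);
}
```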
There's no need for send_ipv4() to take a ByteBuffer&&, the data is
immediately cooked into a packet and transmitted. Instead, just pass
it the address+length of whatever buffer we've been using locally.
The more we can reduce the pressure on kmalloc the better. :^)
The situations in IPv4Socket and LocalSocket were mirrors of each other
where one had implemented read/write as wrappers and the other had
sendto/recvfrom as wrappers.
Instead of this silliness, move read and write up to the Socket base.
Then mark them final, so subclasses have no choice but to implement
sendto and recvfrom.
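Roughly, the resulting class shape is as follows. This is a sketch with simplified, hypothetical signatures (no addresses or flags), and EchoSocket is an invented subclass used only to exercise it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

class File {
public:
    virtual ~File() = default;
    virtual long read(void* buffer, size_t size) = 0;
    virtual long write(const void* data, size_t size) = 0;
};

class Socket : public File {
public:
    // Final: subclasses cannot provide their own read/write.
    long read(void* buffer, size_t size) final { return recvfrom(buffer, size); }
    long write(const void* data, size_t size) final { return sendto(data, size); }

    // The only I/O hooks a subclass gets to implement.
    virtual long sendto(const void* data, size_t size) = 0;
    virtual long recvfrom(void* buffer, size_t size) = 0;
};

// Invented subclass: whatever is sent can be received back.
class EchoSocket final : public Socket {
public:
    long sendto(const void* data, size_t size) override
    {
        m_pending.append(static_cast<const char*>(data), size);
        return static_cast<long>(size);
    }

    long recvfrom(void* buffer, size_t size) override
    {
        size_t n = std::min(size, m_pending.size());
        std::memcpy(buffer, m_pending.data(), n);
        m_pending.erase(0, n);
        return static_cast<long>(n);
    }

private:
    std::string m_pending;
};
```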
This memory is not accessible to userspace and comes from the kernel
page allocator, not from the kmalloc heap. This makes it ideal for
larger allocations.
Fork the IPC Connection classes into Server:: and Client::ConnectionNG.
The new IPC messages are serialized very snugly instead of using the
same generic data structure for all messages.
Remove ASAPI.h since we now generate all of it from AudioServer.ipc :^)
Instead of doing everything manually in C++, let's do some codegen.
This patch adds a crude but effective IPC definition parser, along
with two initial definition files for the AudioServer's client and
server endpoints.
- "seekable": whether the fd is seekable or sequential.
- "class": which kernel C++ class implements this File.
- "offset": the current implicit POSIX API file offset.
In the future, we should allow mounting any block device. At the moment
there is too much filesystem code that depends on the underlying device
being a DiskDevice.
- You must now have superuser privileges to use mount().
- We now verify that the mount point is a valid path first, before
trying to find a filesystem on the specified device.
- Convert some dbgprintf() to dbg().
It is now possible to mount ext2 `DiskDevice` devices under Serenity on
any folder in the root filesystem. Currently any user can do this with
any permissions. There's a fair amount of assumptions made here too,
that might not be too good, but can be worked on in the future. This is
a good start to allow more dynamic operation under the OS itself.
It is also currently impossible to unmount and such, and devices will
fail to mount in Linux as the FS 'needs to be cleaned'. I'll work on
getting `umount` done ASAP to rectify this, as well as on making fewer
assumptions in the mount syscall (we don't want to only be able to
mount DiskDevices!). This could probably be fixed with some `-t`
flag or something similar.
We were forgetting where we put the userspace thread stacks, so added a
member called Thread::m_userspace_thread_stack to keep track of it.
Then, in ~Thread(), we now deallocate the userspace, kernel and signal
stacks (if present.)
Out of curiosity, the "init_stage2" process doesn't have a kernel stack
which I found surprising. :^)
We had some kernel-specific gizmos in AK that should really just be in the
Kernel subdirectory instead. The only thing remaining after moving those
was mmx_memcpy() which I moved to the ARCH(i386)-specific section of
LibC/string.cpp.
Processes can now have an icon assigned, which is essentially a 16x16 RGBA32
bitmap exposed as a shared buffer ID.
You set the icon ID by calling set_process_icon(int) and the icon ID will be
exposed through /proc/all.
To make this work, I added a mechanism for making shared buffers globally
accessible. For safety reasons, each app seals the icon buffer before making
it global.
Right now the first call to GWindow::set_icon() is what determines the
process icon. We'll probably change this in the future. :^)
This adds a bounds check to the loop that writes to the buffer
'recognized_symbols'. This prevents buffer overflows in the
case when a program's backtrace is particularly large.
Fixes #371.
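The shape of the fix, as a sketch: stop filling the fixed-size array once it is full. The capacity, the RecognizedSymbol layout and the function name are assumptions for illustration, with the capacity kept deliberately tiny.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t max_symbols = 4; // Deliberately tiny for the example.

struct RecognizedSymbol {
    uintptr_t address;
};

// Copies at most max_symbols frames into `out`. Returns how many were stored.
size_t collect_symbols(const std::vector<uintptr_t>& frames,
    RecognizedSymbol (&out)[max_symbols])
{
    size_t count = 0;
    for (uintptr_t address : frames) {
        if (count == max_symbols)
            break; // The bounds check: never write past the end of `out`.
        out[count++] = { address };
    }
    return count;
}
```

A backtrace deeper than the array is simply truncated instead of overflowing it.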
The previous implementation of the PIIX3/4 PATA/IDE channel driver only
supported a single drive, as the object model was wrong (the channel
inherits the IRQ, not the disk drive itself). This fixes it by 'attaching'
two `PATADiskDevices` to a `PATAChannel`, which makes more sense.
The reading/writing code is presented as is, which violates the spec
outlined by Seagate in the linked datasheet. That spec is rather old,
so it might not be 100% up to date, but the violation may still cause
issues on real hardware. Until we can actually test it, this will suffice.
This is expensive because we have to page in the entire executable for every
process up front for this to work. This is due to the page fault code not
being strong enough to run while another process is active.
Note that we already had userspace symbols in *crash* stacks. This patch
adds them generally, so they show up in /proc, Process Manager, etc.
There's room for improvement here, but the debugging benefits way overshadow
the performance penalty right now. :^)
This makes assertion failures generate backtraces again. Sorry to everyone
who suffered from the lack of backtraces lately. :^)
We share code with the /proc/PID/stack implementation. You can now get the
current backtrace for a Thread via Thread::backtrace(), and all the traces
for a Process via Process::backtrace().
The syscall is quite simple:
int watch_file(const char* path, int path_length);
It returns a file descriptor referring to an "InodeWatcher" object in the
kernel. It becomes readable whenever something changes about the inode.
Currently this is implemented by hooking the "metadata dirty bit" in
Inode which isn't perfect, but it's a start. :^)