The idea behind WeakPtr<NetworkAdapter> was to support hot-pluggable
network adapters, but on closer thought, that's super impractical so
let's not go down that road.
If there's not enough space in the output buffer for the whole sockaddr
we now simply truncate the address instead of returning EINVAL.
This patch also makes getpeername() actually return the peer address
rather than the local address.. :^)
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
It was possible to read uninitialized kernel memory via getsockname().
Of course, kmalloc() is a good boy and scrubs new allocations with 0xBB
so all you got was a bunch of 0xBB.
System components that need an IRQ handling are now inheriting the
InterruptHandler class.
In addition to that, the initialization process of PATAChannel was
changed to fit the changes.
PATAChannel, E1000NetworkAdapter and RTL8139NetworkAdapter are now
inheriting from PCI::Device instead of InterruptHandler directly.
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.
We can now easily check read/write permissions separately instead of
dancing around with the bits.
This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
Background: DoubleBuffer is a handy buffer class in the kernel that
allows you to keep writing to it from the "outside" while the "inside"
reads from it. It's used for things like LocalSocket and TTY's.
Internally, it has a read buffer and a write buffer, but the two will
swap places when the read buffer is exhausted (by reading from it.)
Before this patch, it was internally implemented as two Vector<u8>
that we would swap between when the reader side had exhausted the data
in the read buffer. Now instead we preallocate a large KBuffer (64KB*2)
on DoubleBuffer construction and use that throughout its lifetime.
This removes all the kmalloc heap traffic caused by DoubleBuffers :^)
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.
* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.
For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.
Going forward, all new source files should include a license header.
The kernel and its static data structures are no longer identity-mapped
in the bottom 8MB of the address space, but instead move above 3GB.
The first 8MB above 3GB are pseudo-identity-mapped to the bottom 8MB of
the physical address space. But things don't have to stay this way!
Thanks to Jesse who made an earlier attempt at this, it was really easy
to get device drivers working once the page tables were in place! :^)
Fixes#734.
The join_thread() syscall is not supposed to be interruptible by
signals, but it was. And since the process death mechanism piggybacked
on signal interrupts, it was possible to interrupt a pthread_join() by
killing the process that was doing it, leading to confusing due to some
assumptions being made by Thread::finalize() for threads that have a
pending joiner.
This patch fixes the issue by making "interrupted by death" a distinct
block result separate from "interrupted by signal". Then we handle that
state in join_thread() and tidy things up so that thread finalization
doesn't get confused by the pending joiner being gone.
Test: Tests/Kernel/null-deref-crash-during-pthread_join.cpp
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().
This is because until we call bind(), there is no filesystem inode
for the socket yet.
We now have these API's in <Kernel/Random.h>:
- get_fast_random_bytes(u8* buffer, size_t buffer_size)
- get_good_random_bytes(u8* buffer, size_t buffer_size)
- get_fast_random<T>()
- get_good_random<T>()
Internally they both use x86 RDRAND if available, otherwise they fall
back to the same LCG we had in RandomDevice all along.
The main purpose of this patch is to give kernel code a way to better
express its needs for random data.
Randomness is something that will require a lot more work, but this is
hopefully a step in the right direction.
The new PCI subsystem is initialized during runtime.
PCI::Initializer is supposed to be called during early boot, to
perform a few tests, and initialize the proper configuration space
access mechanism. Kernel boot parameters can be specified by a user to
determine what tests will occur, to aid debugging on problematic
machines.
After that, PCI::Initializer should be dismissed.
PCI::IOAccess is a class that is derived from PCI::Access
class and implements PCI configuration space access mechanism via x86
IO ports.
PCI::MMIOAccess is a class that is derived from PCI::Access
and implements PCI configurtaion space access mechanism via memory
access.
The new PCI subsystem also supports determination of IO/MMIO space
needed by a device by checking a given BAR.
In addition, Every device or component that use the PCI subsystem has
changed to match the last changes.
The majority of the time in NetworkTask was being spent in allocating
and deallocating KBuffers for each incoming packet.
We'll now keep up to 100 buffers around and reuse them for new packets
if the next incoming packet fits in an old buffer. This is pretty
naively implemented but definitely cuts down on time spent here.
Since stream sockets don't actually need to deliver packets-at-a-time
data in recvfrom(), they can just buffer the payload bytes instead.
This avoids keeping one KBuffer per incoming packet in the receive
queue, which was a big performance issue in ProtocolServer.
This code is definitely not perfect and is something we should keep
improving over time.
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
This patch adds these I/O counters to each thread:
- (Inode) file read bytes
- (Inode) file write bytes
- Unix socket read bytes
- Unix socket write bytes
- IPv4 socket read bytes
- IPv4 socket write bytes
These are then exposed in /proc/all and seen in SystemMonitor.
This defaults to 1500 for all adapters, but LoopbackAdapter increases
it to 65536 on construction.
If an IPv4 packet is larger than the MTU, we'll need to break it into
smaller fragments before transmitting it. This part is a FIXME. :^)
After a socket has disconnected, we shouldn't return -EAGAIN. Instead
we should allow userspace to read/recvfrom the socket until its packet
queue has been exhausted.
At that point, we now return 0, signalling EOF.
It might be even better to start returning -ENOTCONN after signalling
EOF once. I'm not sure how that should work, needs looking into.
This reverts commit 1cca5142af.
This appears to be causing intermittent triple-faults and I don't know
why yet, so I'll just revert it to keep the tree in decent shape.
Background: DoubleBuffer is a handy buffer class in the kernel that
allows you to keep writing to it from the "outside" while the "inside"
reads from it. It's used for things like LocalSocket and PTY's.
Internally, it has a read buffer and a write buffer, but the two will
swap places when the read buffer is exhausted (by reading from it.)
Before this patch, it was internally implemented as two Vector<u8>
that we would swap between when the reader side had exhausted the data
in the read buffer. Now instead we preallocate a large KBuffer (64KB*2)
on DoubleBuffer construction and use that throughout its lifetime.
This removes all the kmalloc heap traffic caused by DoubleBuffers :^)
Make sure we don't move accepted sockets to the Completed setup state
until we've actually constructed a FileDescription for them.
This is important, since this state transition will trigger connect()
to unblock on the client side, and the client may try writing to the
socket right away.
This makes DNS lookups way more reliable since we don't just fail to
write() right after connect()ing to LookupServer sometimes. :^)
This was causing connect() to unblock immediately for local sockets,
since that's exactly what ConnectBlocker checks for.
Instead, just move to SetupState::Completed when it's accept()ed.
If we can't already read when we enter recvfrom() on a LocalSocket,
we'll now block the current thread until we can.
Also added a buffer_for(FileDescription&) helper so that the client
and server can share some of the code. :^)
Made getsockopt() and setsockopt() virtual so we can handle them in the
various Socket subclasses. The subclasses map kinda nicely to "levels".
This will allow us to implement things like "traceroute", although..
I spent some time trying to do that, but then hit a wall when it turned
out that the user-mode networking in QEMU doesn't preserve TTL in the
ICMP packets passing through.
This approach is a bit naiive - whenever we send a packet out, we
check to see if there are any other packets we should try to send.
This works well enough for a busy connection but not very well for a
quiet one. Ideally we would check for not-acked packets on some kind
of timer, and use the length of this not-acked list as feedback to
throttle the writes coming from userspace.
This allows us to take advantage of unsolicited ARP replies, such as
those that are emitted by many systems after their network interfaces
are enabled, or after their DHCP client sets their IP.
This also makes us a bit more vulnerable to ARP flooding, but we need
some kind of eviction strategy anyway, so we can deal with that later.
An incoming socket should only be considered connected after a
program has received it from accept(). Before that point, it's only
"half" open, and it might not ever actually be served to a program.
Socket::accept is where m_connected is correctly set.
This was a workaround to be able to build on case-insensitive file
systems where it might get confused about <string.h> vs <String.h>.
Let's just not support building that way, so String.h can have an
objectively nicer name. :^)
This replaces the previous placeholder routing layer with a real one!
It's still very primitive, doesn't deal with things like timeouts very
well, and will probably need several more iterations to support more
normal networking things.
I haven't confirmed that this works with anything other than the QEMU
user networking layer, but I suspect that's what nearly everybody is
using at this point, so that's the important target to keep working.
By setting up the devices in init() and looping over the registered
network adapters in NetworkTask_main, we can remove the remaining
hard-coded adapter references from the network code.
This also assigns IPs according to the default range supplied by QEMU
in its slirp networking mode.
* The origin PID is the PID of the process that created this socket,
either explicitly by calling socket(), or implicitly by accepting
a TCP connection. Note that accepting a local socket connection
does not create a new socket, it reuses the one connect() was
called on, so for accepted local sockets the origin PID points
to the connecting process.
* The acceptor PID is the PID of the process that accept()ed this
socket. For accepted TCP sockets, this is the same as origin PID.
This is more logical and allows us to solve the problem of
non-blocking TCP sockets getting stuck in SocketRole::None.
The only complication is that a single LocalSocket may be shared
between two file descriptions (on the connect and accept sides),
and should have two different roles depending from which side
you look at it. To deal with it, Socket::role() is made a
virtual method that accepts a file description, and LocalSocket
internally tracks which FileDescription is the which one and
returns a correct role.
Now that there can't be multiple clones of the same fd,
we only need to track whether or not an fd exists on each
side. Also there's no point in tracking connecting fds.
Once we've converted from an Ethernet frame to an IPv4 packet, we can
pass the IPv4Packet around instead of the EthernetFrameHeader.
Also add some more code to ignore invalid-looking packets.
This is comprised of five small changes:
* Keep a counter for tx/rx packets/bytes per TCP socket
* Keep a counter for tx/rx packets/bytes per network adapter
* Expose that data in /proc/net_tcp and /proc/netadapters
* Convert /proc/netadapters to JSON
* Fix up ifconfig to read the JSON from netadapters
This has several significant changes to the networking stack.
* Significant refactoring of the TCP state machine. Right now it's
probably more fragile than it used to be, but handles quite a lot
more of the handshake process.
* `TCPSocket` holds a `NetworkAdapter*`, assigned during `connect()` or
`bind()`, whichever comes first.
* `listen()` is now virtual in `Socket` and intended to be implemented
in its child classes
* `listen()` no longer works without `bind()` - this is a bit of a
regression, but listening sockets didn't work at all before, so it's
not possible to observe the regression.
* A file is exposed at `/proc/net_tcp`, which is a JSON document listing
the current TCP sockets with a bit of metadata.
* There's an `ETHERNET_VERY_DEBUG` flag for dumping packet's content out
to `kprintf`. It is, indeed, _very debug_.
A KBuffer always contains a valid KBufferImpl. If you need a "null"
state buffer, use Optional<KBuffer>.
This makes KBuffer very easy to work with and pass around, just like
ByteBuffer before it.
There's no need for send_ipv4() to take a ByteBuffer&&, the data is
immediately cooked into a packet and transmitted. Instead, just pass
it the address+length of whatever buffer we've been using locally.
The more we can reduce the pressure on kmalloc the better. :^)
The situations in IPv4Socket and LocalSocket were mirrors of each other
where one had implemented read/write as wrappers and the other had
sendto/recvfrom as wrappers.
Instead of this silliness, move read and write up to the Socket base.
Then mark them final, so subclasses have no choice but to implement
sendto and recvfrom.
And use this to return EINTR in various places; some of which we were
not handling properly before.
This might expose a few bugs in userspace, but should be more compatible
with other POSIX systems, and is certainly a little cleaner.
"Blocking" is not terribly informative, but now that everything is
ported over, we can force the blocker to provide us with a reason.
This does mean that to_string(State) needed to become a member, but
that's OK.
Replace the class-based snooze alarm mechanism with a per-thread callback.
This makes it easy to block the current thread on an arbitrary condition:
void SomeDevice::wait_for_irq() {
m_interrupted = false;
current->block_until([this] { return m_interrupted; });
}
void SomeDevice::handle_irq() {
m_interrupted = true;
}
Use this in the SB16 driver, and in NetworkTask :^)
Also tweak the kernel's Makefile to use -nostdinc and -nostdinc++.
This prevents us from picking up random headers from ../Root, which may
include older versions of kernel headers.
Since we still need <initializer_list> for Vector, we specifically include
the necessary GCC path. This is a bit hackish but it works for now.
This is prep work for supporting HashMap with NonnullRefPtr<T> as values.
It's currently not possible because many HashTable functions require being
able to default-construct the value type.
After reading a bunch of POSIX specs, I've learned that a file descriptor
is the number that refers to a file description, not the description itself.
So this patch renames FileDescriptor to FileDescription, and Process now has
FileDescription* file_description(int fd).
The current working directory is now stored as a custody. Likewise for a
process executable file. This unbreaks /proc/PID/fd which has not been
working since we made the filesystem bigger.
This still needs a bunch of work, for instance when renaming or removing
a file somewhere, we have to update the relevant custody links.
links requests SO_ERROR, so in not supporting it, things were unhappy.
Supporting this properly looks a little messy. I guess Socket will need
an m_error member it sets everywhere it returns an error. Or Syscall
could set it, perhaps, but I don't know if that's the right thing to
do, so let's just stub this for now and file a bug.
Also run it across the whole tree to get everything using the One True Style.
We don't yet run this in an automated fashion as it's a little slow, but
there is a snippet to do so in makeall.sh.
This gives us some leeway for WindowServer to queue up a bunch of messages
for one of its clients. Longer-term we should improve DoubleBuffer to be
able to grow dynamically in a way that gets billed to some reasonable place.
Passing this flag to recv() temporarily puts the file descriptor into
non-blocking mode.
Also implement LocalSocket::recv() as a simple forwarding to read().
can_write() was saying yes in situations where write() would overflow the
internal buffer. This patch adds a has_attached_peer() helper to make it
easier to understand what's going on in these functions.
This is not EOF, and never should have been so -- can trip up other code
when porting.
Also updates LibGUI and WindowServer which both relied on the old
behaviour (and didn't work without changes). There may be others, but I
didn't run into them with a quick inspection.
It was way too ambiguous who's the source and who's the destination, and it
didn't really follow a logical pattern. "Local port" vs "Peer port" is super
obvious, so let's call it that.
Make the Socket functions take a FileDescriptor& rather than a socket role
throughout the code. Also change threads to block on a FileDescriptor,
rather than either an fd index or a Socket.
If connect() is called on a non-blocking socket, it will "fail" immediately
with -EINPROGRESS. After that, you select() on the socket and wait for it to
become writable.
This is useful for static locals that never need to be destroyed:
Thing& Thing::the()
{
static Eternal<Thing> the;
return the;
}
The object will be allocated in data segment memory and will never have
its destructor invoked.
Choosing adapter for transmit is done by adapter_for_route_to(IPv4Address).
This is just hard-coded logic right now but can be expanded to support a
proper routing table.
Also start moving kernel networking code into Kernel/Net/.