Commit graph

190 commits

Author SHA1 Message Date
Liav A. 2bba9411ca Kernel: Use the AK SetOnce container class in various cases
We have many places in the kernel code that we have boolean flags that
are only set once, and never reset again but are checked multiple times
before and after the time they're being set, which matches the purpose
of the SetOnce class.
2024-04-26 23:46:23 -06:00
Dan Klishch 5ed7cd6e32 Everywhere: Use east const in more places
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
2024-04-19 06:31:19 -04:00
Tom Finet b9cfb50f71 Kernel/Net: Add TCPSocket timer for TimeWait moving to Closed
RFC9293 states that from the TimeWait state the TCPSocket
should wait the MSL (2mins) for delayed segments to expire
so that their sequence numbers do not clash with a new
connection's sequence numbers using the same ip address
and port number. The wait also ensures the remote TCP peer
has received the ACK to their FIN segment.
2024-03-14 18:33:19 -06:00
Idan Horowitz 785c9d5c2b Kernel: Add support for TCP window size scaling
This should allow us to eventually properly saturate high-bandwidth
network links when using TCP, once other nonoptimal parts of our
network stack are improved.
2023-12-26 21:36:49 +01:00
Idan Horowitz 2c51ff763b Kernel: Properly report receive window size in sent TCP packets
Instead of lying and claiming we always have space left in our receive
buffer, actually report the available space.

While this doesn't really affect network-bound workloads, it makes a
world of difference in cpu/disk-bound ones, like git clones. Resulting
in a considerable speed-up, and in some cases making them work at all.
(instead of the sender side hanging up the connection due to timeouts)
2023-12-26 21:36:49 +01:00
Idan Horowitz 863e8c30ad Kernel: Ensure sockets_by_tuple table entry is up to date on connect
Previously we would incorrectly handle the (somewhat uncommon) case of
binding and then separately connecting a tcp socket to a server, as we
would register the socket during the manual bind(2) in the sockets by
tuple table, but our effective tuple would then change as the result of
the connect updating our target peer address. This would result in the
the entry not being removed from the table on destruction, which could
lead to a UAF.

We now make sure to update the table entry if needed during connects.
2023-12-26 18:36:43 +01:00
Romain Chardiny 61ac554a34 Kernel/Net: Implement TCP_NODELAY 2023-11-08 09:31:54 +01:00
Uku Loskit 2bec281ddc Kernel: Fix panic for Nagel's algorithm
It seems like the current implementation returns 0 in case we do not
have enough data for a whole packet yet. The 0 value gets propagated
to the return value of the syscall which according to the spec
should return non-zero values for non-errors cases. This causes panic,
as there is a VERIFY guard checking that more than > 0 bytes are
written if no error has occurred.
2023-11-05 09:07:39 +01:00
kleines Filmröllchen 12e534c8c6 Kernel: Implement Nagle’s Algorithm
This is an initial implementation, about as basic as intended by the
RFC, and not configurable from userspace at the moment. It should reduce
the amount of low-sized packets sent, reducing overhead and thereby
network traffic.
2023-08-28 00:28:15 +02:00
kleines Filmröllchen ed966a80e2 Kernel/Net: Use monotonic time for TCP times
These were using real time as a mistake before; changing the system time
during ongoing TCP connections shouldn’t break them.
2023-08-28 00:28:15 +02:00
Sergey Bugaev 95bcffd713 Kernel/Net: Rework ephemeral port allocation
Currently, ephemeral port allocation is handled by the
allocate_local_port_if_needed() and protocol_allocate_local_port()
methods. Actually binding the socket to an address (which means
inserting the socket/address pair into a global map) is performed either
in protocol_allocate_local_port() (for ephemeral ports) or in
protocol_listen() (for non-ephemeral ports); the latter will fail with
EADDRINUSE if the address is already used by an existing pair present in
the map.

There used to be a bug where for listen() without an explicit bind(),
the port allocation would conflict with itself: first an ephemeral port
would get allocated and inserted into the map, and then
protocol_listen() would check again for the port being free, find the
just-created map entry, and error out. This was fixed in commit
01e5af487f by passing an additional flag
did_allocate_port into protocol_listen() which specifies whether the
port was just allocated, and skipping the check in protocol_listen() if
the flag is set.

However, this only helps if the socket is bound to an ephemeral port
inside of this very listen() call. But calling bind(sin_port = 0) from
userspace should succeed and bind to an allocated ephemeral port, in the
same was as using an unbound socket for connect() does. The port number
can then be retrieved from userspace by calling getsockname (), and it
should be possible to either connect() or listen() on this socket,
keeping the allocated port number. Also, calling bind() when already
bound (either explicitly or implicitly) should always result in EINVAL.

To untangle this, introduce an explicit m_bound state in IPv4Socket,
just like LocalSocket has already. Once a socket is bound, further
attempt to bind it fail. Some operations cause the socket to implicitly
get bound to an (ephemeral) address; this is implemented by the new
ensure_bound() method. The protocol_allocate_local_port() method is
gone; it is now up to a protocol to assign a port to the socket inside
protocol_bind() if it finds that the socket has local_port() == 0.

protocol_bind() is now called in more cases, such as inside listen() if
the socket wasn't bound before that.
2023-07-29 16:51:58 -06:00
Pierre Delagrave 55faff80df Kernet/Net: Close a TCP connection using FIN|ACK instead of just FIN
When initiating a connection termination, the FIN should be sent with
a ACK from the last received segment even if that ACK already been sent.
2023-06-29 05:58:03 +02:00
Optimoos e72894f23d Kernel/TCPSocket: Read window size from peer
During receive_tcp_packet(), we now set m_send_window_size for the
socket if it is different from the default.

This removes one FIXME from TCPSocket.h.
2023-06-19 13:20:36 +02:00
Liav A 490856453d Kernel: Move Random.{h,cpp} code to Security subdirectory 2023-06-04 21:32:34 +02:00
Liav A 1b04726c85 Kernel: Move all tasks-related code to the Tasks subdirectory 2023-06-04 21:32:34 +02:00
kleines Filmröllchen 213025f210 AK: Rename Time to Duration
That's what this class really is; in fact that's what the first line of
the comment says it is.

This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
2023-05-24 23:18:07 +02:00
Liav A 4617c05a08 Kernel: Move a bunch of generic devices code into new subdirectory 2023-05-19 21:49:21 +02:00
Liav A 7c1f645e27 Kernel/Net: Iron out the locking mechanism across the subsystem
There is a big mix of LockRefPtrs all over the Networking subsystem, as
well as lots of room for improvements with our locking patterns, which
this commit will not pursue, but will give a good start for such work.

To deal with this situation, we change the following things:
- Creating instances of NetworkAdapter should always yield a non-locking
  NonnullRefPtr. Acquiring an instance from the NetworkingManagement
  should give a simple RefPtr,as giving LockRefPtr does not really
  protect from concurrency problems in such case.
- Since NetworkingManagement works with normal RefPtrs we should
  protect all instances of RefPtr<NetworkAdapter> with SpinlockProtected
  to ensure references are gone unexpectedly.
- Protect the so_error class member with a proper spinlock. This happens
  to be important because the clear_so_error() method lacked any proper
  locking measures. It also helps preventing a possible TOCTOU when we
  might do a more fine-grained locking in the Socket code, so this could
  be definitely a start for this.
- Change unnecessary LockRefPtr<PacketWithTimestamp> in the structure
  of OutgoingPacket to a simple RefPtr<PacketWithTimestamp> as the whole
  list should be MutexProtected.
2023-04-14 19:27:56 +02:00
Andreas Kling 03cc45e5a2 Kernel: Use RefPtr instead of LockRefPtr for File and subclasses
This was mostly straightforward, as all the storage locations are
guarded by some related mutex.

The use of old-school associated mutexes instead of MutexProtected
is unfortunate, but the process to modernize such code is ongoing.
2023-03-10 13:15:44 +01:00
Tim Schumacher 30a553ef80 Kernel: Check against TCP packet size overflows in checksum calculation 2022-12-14 15:17:05 +00:00
Tim Schumacher 24f956c739 Kernel: Convert TCP pseudo-headers through a union
This keeps us from tripping strict aliasing, which previously made TCP
connections inoperable when building without `-fsanitize=undefined` or
`-fno-strict-aliasing`.
2022-12-14 15:17:05 +00:00
Gunnar Beutner a9888d4ea0 AK+Kernel: Handle some allocation failures in IPv4Socket and TCPSocket
This adds try_* methods to AK::SinglyLinkedList and
AK::SinglyLinkedListWithCount and updates the network stack to use
those to gracefully handle allocation failures.

Refs #6369.
2022-11-01 14:31:48 +00:00
Andreas Kling 11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
Idan Horowitz 364f6a9bf0 Kernel: Remove the Socket::{protocol,}connect ShouldBlock argument
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
to piping it through separately.
2022-07-21 16:39:22 +02:00
Tim Schumacher 3b3af58cf6 Kernel: Annotate all KBuffer and DoubleBuffer with a custom name 2022-07-12 00:55:31 +01:00
Idan Horowitz 4a270c93ed Kernel: Accept NNRP<Socket> instead of RP<Socket> in release_for_accept
This value is always non-null, so let's make it explicit.
2022-04-09 23:46:02 +02:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Andreas Kling f0dde1cee1 Kernel: Rename TCPFlags::PUSH => PSH
Let's use the proper name of this TCP flag.
2022-03-18 15:18:48 +01:00
Idan Horowitz aabc8d348b Kernel: Add TCPSocket::try_for_each() for fallible iteration
This API will allow users to short circuit iteration and properly
propagate errors.
2022-02-27 20:37:57 +01:00
Andreas Kling c797eaa9f8 Kernel/Net: Don't update TCP socket "last sent ACK" field too early
Defer updating this field until after the last fallible operation has
succeeded.
2022-02-11 12:45:38 +01:00
Andreas Kling 7247f0204d Kernel: Send only FIN when shutting down TCP socket from ESTABLISHED
We were previously sending FIN|ACK for some reason.
2022-02-06 22:13:13 +01:00
Idan Horowitz 664ca58746 Kernel: Use u64 instead of size_t for File::can_write offset
This ensures offsets will not be truncated on large files on i686.
2022-01-25 22:41:17 +02:00
Andreas Kling b86443f0e1 Kernel: Lock weak pointer revocation during listed-ref-counted unref
When doing the last unref() on a listed-ref-counted object, we keep
the list locked while mutating the ref count. The destructor itself
is invoked after unlocking the list.

This was racy with weakable classes, since their weak pointer factory
still pointed to the object after we'd decided to destroy it. That
opened a small time window where someone could try to strong-ref a weak
pointer to an object after it was removed from the list, but just before
the destructor got invoked.

This patch closes the race window by explicitly revoking all weak
pointers while the list is locked.
2022-01-08 16:31:14 +01:00
Andreas Kling 626507943c Kernel: Remove redundant socket_by_tuple removal in ~TCPSocket() 2022-01-06 22:51:28 +01:00
Andreas Kling 01b3666894 Kernel: Lock TCPSocket lookup table across destruction
Use the same trick as SlavePTY and override unref() to provide safe
removal from the sockets_by_tuple table when destroying a TCPSocket.

This should fix the TCPSocket::from_tuple() flake seen on CI.
2022-01-06 22:30:40 +01:00
sin-ack 3da0c072f4 Kernel: Return the correct result for FIONREAD on datagram sockets
Before this commit, we only checked the receive buffer on the socket,
which is unused on datagram streams. Now we return the actual size of
the datagram without the protocol headers, which required the protocol
to tell us what the size of the payload is.
2021-12-16 22:21:35 +03:30
Andreas Kling 79fa9765ca Kernel: Replace KResult and KResultOr<T> with Error and ErrorOr<T>
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.

Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
2021-11-08 01:10:53 +01:00
Idan Horowitz adc9939a7b Kernel+LibC: Add support for the IPv4 TOS field via the IP_TOS sockopt 2021-10-28 11:24:36 +02:00
sin-ack 0ccef94a49 Kernel: Drop the receive buffer when socket enters the TimeWait state
The TimeWait state is intended to prevent another socket from taking the
address tuple in case any packets are still in transit after the final
close. Since this state never delivers packets to userspace, it doesn't
make sense to keep the receive buffer around.
2021-09-16 16:50:23 +02:00
Andreas Kling 899cee8185 Kernel: Make KBuffer::try_create_with_size() return KResultOr
This allows us to use TRY() in a lot of new places.
2021-09-07 15:15:08 +02:00
Andreas Kling c69035c630 Kernel: TCPSocket always has a scratch buffer
Let's encode this in the constructor signature.
2021-09-07 15:11:49 +02:00
Andreas Kling 308773ffda Kernel/Net: Add a special SOCKET_TRY() and use it in socket code
Sockets remember their last error code in the SO_ERROR field, so we need
to take special care to remember this when returning an error.

This patch adds a SOCKET_TRY() that works like TRY() but also calls
set_so_error() on the failure path.

There's probably a lot more code that should be using this, but that's
outside the scope of this patch.
2021-09-07 15:05:51 +02:00
Andreas Kling ededd6aac6 Kernel: Make TCPSocket client construction use KResultOr and TRY()
We don't really have anywhere to propagate the error in NetworkTask at
the moment, since it runs in its own kernel thread and has no direct
userspace caller.
2021-09-07 14:44:29 +02:00
Andreas Kling 01993d0af3 Kernel: Make DoubleBuffer::try() return KResultOr
This tidies up error propagation in a number of places.
2021-09-07 13:53:14 +02:00
Andreas Kling 4a9c18afb9 Kernel: Rename FileDescription => OpenFileDescription
Dr. POSIX really calls these "open file description", not just
"file description", so let's call them exactly that. :^)
2021-09-07 13:53:14 +02:00
Andreas Kling b481132418 Kernel: Make UserOrKernelBuffer return KResult from read/write/memset
This allows us to simplify a whole bunch of call sites with TRY(). :^)
2021-09-07 13:53:14 +02:00
Andreas Kling 8714c550b4 Kernel: Use TRY() in TCPSocket 2021-09-05 16:25:40 +02:00
Andreas Kling 648c768d81 Kernel: Tidy up TCPSocket creation a bit
- Rename create() => try_create()
- Use adopt_nonnull_ref_or_enomem()
2021-09-04 23:11:04 +02:00
Brian Gianforcaro afa0fb55b0 Kernel: Don't cast to NetworkOrdered<u16>* from random data
NetworkOrdered is a non trivial type, and it's undefined behavior to
cast a random pointer to it and then pretend it's that type.

Instead just call AK::convert_between_host_and_network_endian on the
individual u16*. This suppresses static analysis warnings.

I don't think there was a "bug" in the previous code, it worked, but
it was very brittle.
2021-09-01 18:06:14 +02:00
Andreas Kling ed0e64943f Kernel: Rename Socket::lock() => Socket::mutex()
"lock" is ambiguous (verb vs noun) while "mutex" is not.
2021-08-29 22:19:42 +02:00