Commit graph

520 commits

Author SHA1 Message Date
John Fastabend dc9dfc8dc6 net: tls, fix WARNIING in __sk_msg_free
A splice with MSG_SPLICE_PAGES will cause tls code to use the
tls_sw_sendmsg_splice path in the TLS sendmsg code to move the user
provided pages from the msg into the msg_pl. This will loop over the
msg until msg_pl is full, checked by sk_msg_full(msg_pl). The user
can also set the MORE flag to hint stack to delay sending until receiving
more pages and ideally a full buffer.

If the user adds more pages to the msg than can fit in the msg_pl
scatterlist (MAX_MSG_FRAGS) we should ignore the MORE flag and send
the buffer anyways.

What actually happens though is we abort the msg to msg_pl scatterlist
setup and then because we forget to set 'full record' indicating we
can no longer consume data without a send we fallthrough to the 'continue'
path which will check if msg_data_left(msg) has more bytes to send and
then attempts to fit them in the already full msg_pl. Then next
iteration of sender doing send will encounter a full msg_pl and throw
the warning in the syzbot report.

To fix simply check if we have a full_record in splice code path and
if not send the msg regardless of MORE flag.

Reported-and-tested-by: syzbot+f2977222e0e95cec15c8@syzkaller.appspotmail.com
Reported-by: Edward Adam Davis <eadavis@qq.com>
Fixes: fe1e81d4f7 ("tls/sw: Support MSG_SPLICE_PAGES")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-14 12:17:14 +00:00
John Fastabend c5a595000e net: tls, update curr on splice as well
The curr pointer must also be updated on the splice similar to how
we do this for other copy types.

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reported-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07 09:52:28 -08:00
Jann Horn 53f2cb491b tls: fix NULL deref on tls_sw_splice_eof() with empty record
syzkaller discovered that if tls_sw_splice_eof() is executed as part of
sendfile() when the plaintext/ciphertext sk_msg are empty, the send path
gets confused because the empty ciphertext buffer does not have enough
space for the encryption overhead. This causes tls_push_record() to go on
the `split = true` path (which is only supposed to be used when interacting
with an attached BPF program), and then get further confused and hit the
tls_merge_open_record() path, which then assumes that there must be at
least one populated buffer element, leading to a NULL deref.

It is possible to have empty plaintext/ciphertext buffers if we previously
bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path.
tls_sw_push_pending_record() already handles this case correctly; let's do
the same check in tls_sw_splice_eof().

Fixes: df720d288d ("tls/sw: Use splice_eof() to flush")
Cc: stable@vger.kernel.org
Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-23 08:51:45 -08:00
Sabrina Dubroca b7c4f5730a tls: don't reset prot->aad_size and prot->tail_size for TLS_HW
Prior to commit 1a074f7618 ("tls: also use init_prot_info in
tls_set_device_offload"), setting TLS_HW on TX didn't touch
prot->aad_size and prot->tail_size. They are set to 0 during context
allocation (tls_prot_info is embedded in tls_context, kzalloc'd by
tls_ctx_create).

When the RX key is configured, tls_set_sw_offload is called (for both
TLS_SW and TLS_HW). If the TX key is configured in TLS_HW mode after
the RX key has been installed, init_prot_info will now overwrite the
correct values of aad_size and tail_size, breaking SW decryption and
causing -EBADMSG errors to be returned to userspace.

Since TLS_HW doesn't use aad_size and tail_size at all (for TLS1.2,
tail_size is always 0, and aad_size is equal to TLS_HEADER_SIZE +
rec_seq_size), we can simply drop this hunk.

Fixes: 1a074f7618 ("tls: also use init_prot_info in tls_set_device_offload")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/979d2f89a6a994d5bb49cae49a80be54150d094d.1697653889.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 10:15:09 -07:00
Jakub Kicinski 041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Paolo Abeni 419ce133ab tcp: allow again tcp_disconnect() when threads are waiting
As reported by Tom, .NET and applications build on top of it rely
on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP
socket.

The blamed commit below caused a regression, as such cancellation
can now fail.

As suggested by Eric, this change addresses the problem explicitly
causing blocking I/O operation to terminate immediately (with an error)
when a concurrent disconnect() is executed.

Instead of tracking the number of threads blocked on a given socket,
track the number of disconnect() issued on such socket. If such counter
changes after a blocking operation releasing and re-acquiring the socket
lock, error out the current operation.

Fixes: 4faeee0cf8 ("tcp: deny tcp_disconnect() when threads are waiting")
Reported-by: Tom Deseyn <tdeseyn@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 16:49:32 -07:00
Sabrina Dubroca 9f0c824551 tls: use fixed size for tls_offload_context_{tx,rx}.driver_state
driver_state is a flex array, but is always allocated by the tls core
to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
making that size explicit so that sizeof(struct
tls_offload_context_{tx,rx}) works.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 1cf7fbcee6 tls: validate crypto_info in a separate helper
Simplify do_tls_setsockopt_conf a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 4f48669918 tls: remove tls_context argument from tls_set_device_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.

While at it, fix up the reverse xmas tree ordering.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca b6a30ec923 tls: remove tls_context argument from tls_set_sw_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 0137407999 tls: add a helper to allocate/initialize offload_ctx_tx
Simplify tls_set_device_offload a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 1a074f7618 tls: also use init_prot_info in tls_set_device_offload
Most values are shared. Nonce size turns out to be equal to IV size
for all offloadable ciphers.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca a9937816ed tls: move tls_prot_info initialization out of tls_set_sw_offload
Simplify tls_set_sw_offload, and allow reuse for the tls_device code.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 615580cbc9 tls: extract context alloc/initialization out of tls_set_sw_offload
Simplify tls_set_sw_offload a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca 1c1cb3110d tls: store iv directly within cipher_context
TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit
in cipher_context's size and can simplify the init code a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca bee6b7b307 tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE
It's defined in include/net/tls.h, avoid using an overly generic name.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca 6d5029e547 tls: store rec_seq directly within cipher_context
TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca 8f1d532b4a tls: drop unnecessary cipher_type checks in tls offload
We should never reach tls_device_reencrypt, tls_enc_record, or
tls_enc_skb with a cipher_type that can't be offloaded. Replace those
checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
hard-coding offloadable cipher types.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca 3bab3ee0f9 tls: get salt using crypto_info_salt in tls_enc_skb
I skipped this conversion in my previous series.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Gustavo A. R. Silva a2713257ee tls: Use size_add() in call to struct_size()
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` adds against potential integer
overflows is defeated. Fix this by hardening call to `struct_size()`
with `size_add()`.

Fixes: b89fec54fd ("tls: rx: wrap decrypt params in a struct")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 09:37:23 +01:00
Liu Jian cfaa80c91f net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
I got the below warning when do fuzzing test:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9

CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G           OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
 dump_backtrace+0x0/0x420
 show_stack+0x34/0x44
 dump_stack+0x1d0/0x248
 __kasan_report+0x138/0x140
 kasan_report+0x44/0x6c
 __asan_load4+0x94/0xd0
 scatterwalk_copychunks+0x320/0x470
 skcipher_next_slow+0x14c/0x290
 skcipher_walk_next+0x2fc/0x480
 skcipher_walk_first+0x9c/0x110
 skcipher_walk_aead_common+0x380/0x440
 skcipher_walk_aead_encrypt+0x54/0x70
 ccm_encrypt+0x13c/0x4d0
 crypto_aead_encrypt+0x7c/0xfc
 pcrypt_aead_enc+0x28/0x84
 padata_parallel_worker+0xd0/0x2dc
 process_one_work+0x49c/0xbdc
 worker_thread+0x124/0x880
 kthread+0x210/0x260
 ret_from_fork+0x10/0x18

This is because the value of rec_seq of tls_crypto_info configured by the
user program is too large, for example, 0xffffffffffffff. In addition, TLS
is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, the UAF problem occurs during the asynchronous processing of the
encryption module.

If the operation is asynchronous and the encryption module returns
EINPROGRESS, do not free the record information.

Fixes: 635d939817 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 09:51:49 +02:00
Sabrina Dubroca f3e444e31f tls: get cipher_name from cipher_desc in tls_set_sw_offload
tls_cipher_desc also contains the algorithm name needed by
crypto_alloc_aead, use it.

Finally, use get_cipher_desc to check if the cipher_type coming from
userspace is valid, and remove the cipher_type switch.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/53d021d80138aa125a9cef4468aa5ce531975a7b.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 48dfad27fd tls: use tls_cipher_desc to access per-cipher crypto_info in tls_set_sw_offload
The crypto_info_* helpers allow us to fetch pointers into the
per-cipher crypto_info's data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c23af110caf0af6b68de2f86c58064913e2e902a.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca d9a6ca1a97 tls: use tls_cipher_desc to get per-cipher sizes in tls_set_sw_offload
We can get rid of some local variables, but we have to keep nonce_size
because tls1.3 uses nonce_size = 0 for all ciphers.

We can also drop the runtime sanity checks on iv/rec_seq/tag size,
since we have compile time checks on those values.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/deed9c4430a62c31751a72b8c03ad66ffe710717.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 077e05d135 tls: use tls_cipher_desc to simplify do_tls_getsockopt_conf
Every cipher uses the same code to update its crypto_info struct based
on the values contained in the cctx, with only the struct type and
size/offset changing. We can get those  from tls_cipher_desc, and use
a single pair of memcpy and final copy_to_user.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c21a904b91e972bdbbf9d1c6d2731ccfa1eedf72.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 5f309ade49 tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_conf
We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use
get_cipher_desc's result to check if the cipher_type coming from
userspace is valid.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/e97658eb4c6a5832f8ba20a06c4f36a77763c59e.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca e907277aeb tls: expand use of tls_cipher_desc in tls_sw_fallback_init
tls_sw_fallback_init already gets the key and tag size from
tls_cipher_desc. We can now also check that the cipher type is valid,
and stop hard-coding the algorithm name passed to crypto_alloc_aead.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c8c94b8fcafbfb558e09589c1f1ad48dbdf92f76.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca d2322cf5ed tls: allocate the fallback aead after checking that the cipher is valid
No need to allocate the aead if we're going to fail afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:42 -07:00
Sabrina Dubroca 3524dd4d5f tls: expand use of tls_cipher_desc in tls_set_device_offload
tls_set_device_offload is already getting iv and rec_seq sizes from
tls_cipher_desc. We can now also check if the cipher_type coming from
userspace is valid and can be offloaded.

We can also remove the runtime check on rec_seq, since we validate it
at compile time.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 0d98cc0202 tls: validate cipher descriptions at compile time
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 176a3f50bc tls: extend tls_cipher_desc to fully describe the ciphers
- add nonce, usually equal to iv_size but not for chacha
 - add offsets into the crypto_info for each field
 - add algorithm name
 - add offloadable flag

Also add helpers to access each field of a crypto_info struct
described by a tls_cipher_desc.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 8db44ab26b tls: rename tls_cipher_size_desc to tls_cipher_desc
We're going to add other fields to it to fully describe a cipher, so
the "_size" name won't match the contents.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 037303d676 tls: reduce size of tls_cipher_size_desc
tls_cipher_size_desc indexes ciphers by their type, but we're not
using indices 0..50 of the array. Each struct tls_cipher_size_desc is
20B, so that's a lot of unused memory. We can reindex the array
starting at the lowest used cipher_type.

Introduce the get_cipher_size_desc helper to find the right item and
avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
add a new cipher.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca 200e231651 tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_desc
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Sabrina Dubroca fd0fc6fdd8 tls: move tls_cipher_size_desc to net/tls/tls.h
It's only used in net/tls/*, no need to bloat include/net/tls.h.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:17:41 -07:00
Jakub Kicinski 4d016ae42e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/intel/igc/igc_main.c
  06b412589e ("igc: Add lock to safeguard global Qbv variables")
  d3750076d4 ("igc: Add TransmissionOverrun counter")

drivers/net/ethernet/microsoft/mana/mana_en.c
  a7dfeda6fd ("net: mana: Fix MANA VF unload when hardware is unresponsive")
  a9ca9f9cef ("page_pool: split types and declarations from page_pool.h")
  92272ec410 ("eth: add missing xdp.h includes in drivers")

net/mptcp/protocol.h
  511b90e392 ("mptcp: fix disconnect vs accept race")
  b8dc6d6ce9 ("mptcp: fix rcv buffer auto-tuning")

tools/testing/selftests/net/mptcp/mptcp_join.sh
  c8c101ae39 ("selftests: mptcp: join: fix 'implicit EP' test")
  03668c65d1 ("selftests: mptcp: join: rework detailed report")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10 14:10:53 -07:00
Jakub Kicinski 6b486676b4 net: tls: set MSG_SPLICE_PAGES consistently
We used to change the flags for the last segment, because
non-last segments had the MSG_SENDPAGE_NOTLAST flag set.
That flag is no longer a thing so remove the setting.

Since flags most likely don't have MSG_SPLICE_PAGES set
this avoids passing parts of the sg as splice and parts
as non-splice. Before commit under Fixes we'd have called
tcp_sendpage() which would add the MSG_SPLICE_PAGES.

Why this leads to trouble remains unclear but Tariq
reports hitting the WARN_ON(!sendpage_ok()) due to
page refcount of 0.

Fixes: e117dcfd64 ("tls: Inline do_tcp_sendpages()")
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/all/4c49176f-147a-4283-f1b1-32aac7b4b996@gmail.com/
Tested-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230808180917.1243540-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10 11:36:57 -07:00
Hannes Reinecke ba4a734e1a net/tls: avoid TCP window full during ->read_sock()
When flushing the backlog after decoding a record we don't really
know how much data the caller want us to evaluate, so use INT_MAX
and 0 as arguments to tls_read_flush_backlog() to ensure we flush
at 128k of data. Otherwise we might be reading too much data and
trigger a TCP window full.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20230807071022.10091-1-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 15:53:49 -07:00
Yue Haibing f6ecb68b38 net/tls: Remove unused function declarations
Commit 3c4d755915 ("tls: kernel TLS support") declared but never implemented
these functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-07 08:53:54 +01:00
Jakub Kicinski 6b47808f22 net: tls: avoid discarding data on record close
TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.

Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.

The commit under fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.

We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".

Allocate a single dummy page and use it as fallback.

Found by code inspection, and proven by forcing allocation
failures.

Fixes: e7b159a48b ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06 08:32:18 +01:00
Jakub Kicinski 05191d8896 Merge branch 'in-kernel-support-for-the-tls-alert-protocol'
Chuck Lever says:

====================
In-kernel support for the TLS Alert protocol

IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================

Merged on top of a tag in case it's needed in the NFS tree.

Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:08:02 -07:00
Chuck Lever 6a7eccef47 net/tls: Move TLS protocol elements to a separate header
Kernel TLS consumers will need definitions of various parts of the
TLS protocol, but often do not need the function declarations and
other infrastructure provided in <net/tls.h>.

Break out existing standardized protocol elements into a separate
header, and make room for a few more elements in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Hannes Reinecke 662fbcec32 net/tls: implement ->read_sock()
Implement ->read_sock() function for use with nvme-tcp.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
Link: https://lore.kernel.org/r/20230726191556.41714-7-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 19:49:35 -07:00
Hannes Reinecke f9ae3204fb net/tls: split tls_rx_reader_lock
Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and
the actual locking part.
With that we can use the tls_rx_reader_lock in situations where
the socket is already locked.

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230726191556.41714-6-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 19:49:35 -07:00
Hannes Reinecke 11863c6d44 net/tls: Use tcp_read_sock() instead of ops->read_sock()
TLS resets the protocol operations, so the read_sock() callback might
be changed, too.
In this case using sock->ops->readsock() in tls_strp_read_copyin() will
enter an infinite recursion if the read_sock() callback is calling
tls_rx_rec_wait() which will call into sock->ops->readsock() via
tls_strp_read_copyin().
But as tls_strp_read_copyin() is supposed to produce data from the
consumed socket and that socket is always a TCP socket we can call
tcp_read_sock() directly without having to deal with callbacks.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230726191556.41714-5-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 19:49:35 -07:00
Hannes Reinecke c004b0e00c net/tls: handle MSG_EOR for tls_device TX flow
tls_push_data() MSG_MORE, but bails out on MSG_EOR.
Seeing that MSG_EOR is basically the opposite of MSG_MORE
this patch adds handling MSG_EOR by treating it as the
absence of MSG_MORE.
Consequently we should return an error when both are set.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230726191556.41714-3-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 19:49:34 -07:00
Hannes Reinecke e22e358bbe net/tls: handle MSG_EOR for tls_sw TX flow
tls_sw_sendmsg() already handles MSG_MORE, but bails
out on MSG_EOR.
Seeing that MSG_EOR is basically the opposite of
MSG_MORE this patch adds handling MSG_EOR by treating
it as the negation of MSG_MORE.
And erroring out if MSG_EOR is specified with MSG_MORE.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230726191556.41714-2-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 19:49:34 -07:00
David Howells b848b26c66 net: Kill MSG_SENDPAGE_NOTLAST
Now that ->sendpage() has been removed, MSG_SENDPAGE_NOTLAST can be cleaned
up.  Things were converted to use MSG_MORE instead, but the protocol
sendpage stubs still convert MSG_SENDPAGE_NOTLAST to MSG_MORE, which is now
unnecessary.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-17-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-24 15:50:13 -07:00
David Howells dc97391e66 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked().  sendmsg() with
MSG_SPLICE_PAGES should be used instead.  This allows multiple pages and
multipage folios to be passed through.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-24 15:50:13 -07:00
David Howells f8dd95b29d tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't
use it further in than the sendpage methods, but rather translate it to
MSG_MORE and use that instead.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: Bernard Metzler <bmt@zurich.ibm.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Karsten Graul <kgraul@linux.ibm.com>
cc: Wenjia Zhang <wenjia@linux.ibm.com>
cc: Jan Karcher <jaka@linux.ibm.com>
cc: "D. Wythe" <alibuda@linux.alibaba.com>
cc: Tony Lu <tonylu@linux.alibaba.com>
cc: Wen Gu <guwen@linux.alibaba.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-24 15:50:12 -07:00