Networking changes for 6.5.

 Core
 ----
 
  - Rework the sendpage & splice implementations. Instead of feeding
    data into sockets page by page extend sendmsg handlers to support
    taking a reference on the data, controlled by a new flag called
    MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
    to invoke an additional callback instead of trying to predict what
    the right combination of MORE/NOTLAST flags is.
    Remove the MSG_SENDPAGE_NOTLAST flag completely.
 
  - Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS,
    but carrying a pidfd instead of a plain pid.
 
  - Enable socket busy polling with CONFIG_PREEMPT_RT.
 
  - Improve reliability and efficiency of reporting for ref_tracker.
 
  - Auto-generate a user space C library for various Netlink families.
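Below is a minimal userspace sketch of receiving SCM_PIDFD. It assumes a 6.5+ kernel and the asm-generic values SO_PASSPIDFD = 76 and SCM_PIDFD = 0x04, which x86 and arm64 use; a few architectures number them differently, so the system headers win when they define them. The receiver enables SO_PASSPIDFD, much as SO_PASSCRED enables SCM_CREDENTIALS, and then picks the pidfd out of the control messages. The function name recv_peer_pidfd() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallback values are assumptions (asm-generic UAPI numbering). */
#ifndef SO_PASSPIDFD
#define SO_PASSPIDFD 76
#endif
#ifndef SCM_PIDFD
#define SCM_PIDFD 0x04
#endif

/* Send one byte across an AF_UNIX datagram socketpair and return the pidfd
 * the kernel attached for the sender (>= 0), -1 on error, or -2 when the
 * kernel does not know SO_PASSPIDFD. */
static int recv_peer_pidfd(void)
{
    int sv[2], one = 1, pidfd = -1;
    char byte = 'x';
    union {                                   /* correctly aligned cmsg buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Ask for a pidfd of the sender on messages received by sv[1]. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSPIDFD, &one, sizeof(one)) < 0) {
        int unsupported = (errno == ENOPROTOOPT);
        close(sv[0]);
        close(sv[1]);
        return unsupported ? -2 : -1;
    }

    if (send(sv[0], &byte, 1, 0) == 1 && recvmsg(sv[1], &msg, 0) == 1) {
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
            if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_PIDFD &&
                c->cmsg_len >= CMSG_LEN(sizeof(int)))
                memcpy(&pidfd, CMSG_DATA(c), sizeof(int));
    }
    close(sv[0]);
    close(sv[1]);
    return pidfd;
}
```

On pre-6.5 kernels, setsockopt() fails with ENOPROTOOPT. The sketch reports that case as -2 rather than treating it as a hard error.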
 
 Protocols
 ---------
 
  - Allow TCP to shrink the advertised window when necessary, prevent
    sk_rcvbuf auto-tuning from growing the window all the way up to
    tcp_rmem[2].
 
  - Use per-VMA locking for "page-flipping" TCP receive zerocopy.
 
  - Prepare TCP for device-to-device data transfers, by making sure
    that payloads are always attached to skbs as page frags.
 
  - Make the backoff time for the first N TCP SYN retransmissions
    linear. Exponential backoff is unnecessarily conservative.
 
  - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
 
  - Avoid waking up applications using TLS sockets until we have
    a full record.
 
  - Allow using kernel memory for protocol ioctl callbacks, paving
    the way to issuing ioctls over io_uring.
 
  - Add nolocalbypass option to VxLAN, forcing packets to be fully
    encapsulated even if they are destined for a local IP address.
 
  - Make TCPv4 use a consistent hash in TIME_WAIT and SYN_RECV. Ensure
    in-kernel ECMP implementations (e.g. Open vSwitch) select the same
    link for all packets. Support L4 symmetric hashing in Open vSwitch.
 
  - PPPoE: make number of hash bits configurable.
 
  - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
    (ipconfig).
 
  - Add layer 2 miss indication and filtering, allowing higher layers
    (e.g. ACL filters) to make forwarding decisions based on whether
    the packet matched forwarding state in lower devices (bridge).
 
  - Support matching on Connectivity Fault Management (CFM) packets.
 
  - Hide the "link becomes ready" IPv6 messages by demoting their
    printk level to debug.
 
  - HSR: don't enable promiscuous mode if device offloads the proto.
 
  - Support active scanning in IEEE 802.15.4.
 
  - Continue work on Multi-Link Operation for WiFi 7.
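The SYN backoff change can be illustrated with a small model of the resulting timeout schedule. The model assumes a 1 s initial RTO and a 120 s cap. It also takes a configurable count of linear retries, assumed here to correspond to the net.ipv4.tcp_syn_linear_timeouts sysctl. It only computes the waits and does not reproduce the kernel's timer code. syn_backoff_schedule() is a hypothetical name.

```c
#include <stddef.h>

/* Illustrative constants (assumptions, not read from a running kernel). */
#define INIT_RTO_MS 1000u   /* initial SYN retransmission timeout */
#define MAX_RTO_MS 120000u  /* cap on any single timeout */

/* timeouts[0] is the wait before the first SYN retransmission. The first
 * `linear` retransmissions keep the initial RTO; after that each wait
 * doubles, capped at MAX_RTO_MS. linear == 0 gives classic exponential
 * backoff. */
static void syn_backoff_schedule(unsigned linear, unsigned *timeouts, size_t n)
{
    unsigned rto = INIT_RTO_MS;

    for (size_t i = 0; i < n; i++) {
        if (i > linear) {
            rto *= 2;
            if (rto > MAX_RTO_MS)
                rto = MAX_RTO_MS;
        }
        timeouts[i] = rto;
    }
}
```

With linear = 4 the model yields waits of 1, 1, 1, 1, 1, 2, 4 and 8 s. Plain exponential backoff (linear = 0) would give 1, 2, 4, 8, 16, 32, 64 and 120 s.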
 
 BPF
 ---
 
  - Add precision propagation for subprogs and callbacks. This keeps
    verification efficient when subprograms are used and, for some complex
    programs (especially those using open-coded iterators), is what allows
    them to pass the verifier at all.
 
  - Improve BPF's {g,s}etsockopt() length handling. Previously BPF
    assumed the length always equals the amount of data written, but
    some protocols allow passing a NULL buffer to discover what the
    output buffer size *should* be, without writing anything.
 
  - Accept dynptr memory as memory arguments passed to helpers.
 
  - Add routing table ID to bpf_fib_lookup BPF helper.
 
  - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
 
  - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
    maps as read-only).
 
  - Show target_{obj,btf}_id in tracing link fdinfo.
 
  - Add several new kfuncs (most of the names are self-explanatory):
    - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
      bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
      and bpf_dynptr_clone().
    - bpf_task_under_cgroup()
    - bpf_sock_destroy() - force closing sockets
    - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
 
 Netfilter
 ---------
 
  - Relax set/map validation checks in nf_tables. Allow checking
    presence of an entry in a map without using the value.
 
  - Increase ip_vs_conn_tab_bits range for 64BIT builds.
 
  - Allow updating size of a set.
 
  - Improve NAT tuple selection when connection is closing.
 
 Driver API
 ----------
 
  - Integrate netdev with LED subsystem, to allow configuring HW
    "offloaded" blinking of LEDs based on link state and activity
    (i.e. packets coming in and out).
 
  - Support configuring rate selection pins of SFP modules.
 
  - Factor Clause 73 auto-negotiation code out of the drivers, provide
    common helper routines.
 
  - Add more fool-proof helpers for managing lifetime of MDIO devices
    associated with the PCS layer.
 
  - Allow drivers to report advanced statistics related to Time Aware
    scheduler offload (taprio).
 
  - Allow opting out of VF statistics in link dump, to allow more VFs
    to fit into the message.
 
  - Split devlink instance and devlink port operations.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Synopsys EMAC4 IP support (stmmac)
    - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
    - Marvell 88E6250 7 port switches
    - Microchip LAN8650/1 Rev.B0 PHYs
    - MediaTek MT7981/MT7988 built-in 1GE PHY driver
 
  - WiFi:
    - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
    - Realtek RTL8723DS (SDIO variant)
    - Realtek RTL8851BE
 
  - CAN:
    - Fintek F81604
 
 Drivers
 -------
 
  - Ethernet NICs:
    - Intel (100G, ice):
      - support dynamic interrupt allocation
      - use metadata match instead of VF MAC address on the slow path
    - nVidia/Mellanox:
      - extend link aggregation to handle 4, rather than just 2 ports
      - spawn sub-functions without any features by default
    - OcteonTX2:
      - support HTB (Tx scheduling/QoS) offload
      - make RSS hash generation configurable
      - support selecting Rx queue using TC filters
    - Wangxun (ngbe/txgbe):
      - add basic Tx/Rx packet offloads
      - add phylink support (SFP/PCS control)
    - Freescale/NXP (enetc):
      - report TAPRIO packet statistics
    - Solarflare/AMD:
      - support matching on IP ToS and UDP source port of outer header
      - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
      - add devlink dev info support for EF10
 
  - Virtual NICs:
    - Microsoft vNIC:
      - size the Rx indirection table based on requested configuration
      - support VLAN tagging
    - Amazon vNIC:
      - try to reuse Rx buffers if not fully consumed, useful for ARM
        servers running with 16kB pages
    - Google vNIC:
      - support TCP segmentation of >64kB frames
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - enable USXGMII (88E6191X)
    - Microchip:
      - lan966x: add support for Egress Stage 0 ACL engine
      - lan966x: support mapping packet priority to internal switch
        priority (based on PCP or DSCP)
 
  - Ethernet PHYs:
    - Broadcom PHYs:
      - support for Wake-on-LAN for BCM54210E/B50212E
      - report LPI counter
    - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
    - Micrel PHYs: receive timestamp in the frame (LAN8841)
    - Realtek PHYs: support optional external PHY clock
    - Altera TSE PCS: merge the driver into Lynx PCS, of which it is
      a variant
 
  - CAN: Kvaser PCIEcan:
    - support packet timestamping
 
  - WiFi:
    - Intel (iwlwifi):
      - major update for new firmware and Multi-Link Operation (MLO)
      - configuration rework to drop test devices and split
        the different families
      - support for segmented PNVM images and power tables
      - new vendor entries for PPAG (platform antenna gain) feature
    - Qualcomm 802.11ax (ath11k):
      - Multiple Basic Service Set Identifier (MBSSID) and
        Enhanced MBSSID Advertisement (EMA) support in AP mode
      - support factory test mode
    - RealTek (rtw89):
      - add RSSI based antenna diversity
      - support U-NII-4 channels on 5 GHz band
    - RealTek (rtl8xxxu):
      - AP mode support for 8188f
      - support USB RX aggregation for the newer chips
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Jakub Kicinski:
 "WiFi 7 and sendpage changes are the biggest pieces of work for this
  release. The latter will definitely require fixes but I think that we
  got it to a reasonable point.

  Core:

   - Rework the sendpage & splice implementations

     Instead of feeding data into sockets page by page extend sendmsg
     handlers to support taking a reference on the data, controlled by a
     new flag called MSG_SPLICE_PAGES

     Rework the handling of unexpected-end-of-file to invoke an
     additional callback instead of trying to predict what the right
     combination of MORE/NOTLAST flags is

     Remove the MSG_SENDPAGE_NOTLAST flag completely

   - Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS,
     but carrying a pidfd instead of a plain pid

   - Enable socket busy polling with CONFIG_PREEMPT_RT

   - Improve reliability and efficiency of reporting for ref_tracker

   - Auto-generate a user space C library for various Netlink families

  Protocols:

   - Allow TCP to shrink the advertised window when necessary, prevent
     sk_rcvbuf auto-tuning from growing the window all the way up to
     tcp_rmem[2]

   - Use per-VMA locking for "page-flipping" TCP receive zerocopy

   - Prepare TCP for device-to-device data transfers, by making sure
     that payloads are always attached to skbs as page frags

   - Make the backoff time for the first N TCP SYN retransmissions
     linear. Exponential backoff is unnecessarily conservative

   - Create a new MPTCP getsockopt to retrieve all info
     (MPTCP_FULL_INFO)

   - Avoid waking up applications using TLS sockets until we have a full
     record

   - Allow using kernel memory for protocol ioctl callbacks, paving the
     way to issuing ioctls over io_uring

   - Add nolocalbypass option to VxLAN, forcing packets to be fully
     encapsulated even if they are destined for a local IP address

   - Make TCPv4 use a consistent hash in TIME_WAIT and SYN_RECV. Ensure
     in-kernel ECMP implementations (e.g. Open vSwitch) select the same
     link for all packets. Support L4 symmetric hashing in Open vSwitch

   - PPPoE: make number of hash bits configurable

   - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
     (ipconfig)

   - Add layer 2 miss indication and filtering, allowing higher layers
     (e.g. ACL filters) to make forwarding decisions based on whether
     the packet matched forwarding state in lower devices (bridge)

   - Support matching on Connectivity Fault Management (CFM) packets

   - Hide the "link becomes ready" IPv6 messages by demoting their
     printk level to debug

   - HSR: don't enable promiscuous mode if device offloads the proto

   - Support active scanning in IEEE 802.15.4

   - Continue work on Multi-Link Operation for WiFi 7

  BPF:

   - Add precision propagation for subprogs and callbacks. This keeps
     verification efficient when subprograms are used and, for some
     complex programs (especially those using open-coded iterators), is
     what allows them to pass the verifier at all

   - Improve BPF's {g,s}etsockopt() length handling. Previously BPF
     assumed the length always equals the amount of data written, but
     some protocols allow passing a NULL buffer to discover what the
     output buffer size *should* be, without writing anything

   - Accept dynptr memory as memory arguments passed to helpers

   - Add routing table ID to bpf_fib_lookup BPF helper

   - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

   - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
     maps as read-only)

   - Show target_{obj,btf}_id in tracing link fdinfo

   - Add several new kfuncs (most of the names are self-explanatory):
      - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
        bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
        and bpf_dynptr_clone().
      - bpf_task_under_cgroup()
      - bpf_sock_destroy() - force closing sockets
      - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs

  Netfilter:

   - Relax set/map validation checks in nf_tables. Allow checking
     presence of an entry in a map without using the value

   - Increase ip_vs_conn_tab_bits range for 64BIT builds

   - Allow updating size of a set

   - Improve NAT tuple selection when connection is closing

  Driver API:

   - Integrate netdev with LED subsystem, to allow configuring HW
     "offloaded" blinking of LEDs based on link state and activity
     (i.e. packets coming in and out)

   - Support configuring rate selection pins of SFP modules

   - Factor Clause 73 auto-negotiation code out of the drivers, provide
     common helper routines

   - Add more fool-proof helpers for managing lifetime of MDIO devices
     associated with the PCS layer

   - Allow drivers to report advanced statistics related to Time Aware
     scheduler offload (taprio)

   - Allow opting out of VF statistics in link dump, to allow more VFs
     to fit into the message

   - Split devlink instance and devlink port operations

  New hardware / drivers:

   - Ethernet:
      - Synopsys EMAC4 IP support (stmmac)
      - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
      - Marvell 88E6250 7 port switches
      - Microchip LAN8650/1 Rev.B0 PHYs
      - MediaTek MT7981/MT7988 built-in 1GE PHY driver

   - WiFi:
      - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
      - Realtek RTL8723DS (SDIO variant)
      - Realtek RTL8851BE

   - CAN:
      - Fintek F81604

  Drivers:

   - Ethernet NICs:
      - Intel (100G, ice):
         - support dynamic interrupt allocation
         - use metadata match instead of VF MAC address on the slow path
      - nVidia/Mellanox:
         - extend link aggregation to handle 4, rather than just 2 ports
         - spawn sub-functions without any features by default
      - OcteonTX2:
         - support HTB (Tx scheduling/QoS) offload
         - make RSS hash generation configurable
         - support selecting Rx queue using TC filters
      - Wangxun (ngbe/txgbe):
         - add basic Tx/Rx packet offloads
         - add phylink support (SFP/PCS control)
      - Freescale/NXP (enetc):
         - report TAPRIO packet statistics
      - Solarflare/AMD:
         - support matching on IP ToS and UDP source port of outer
           header
         - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
         - add devlink dev info support for EF10

   - Virtual NICs:
      - Microsoft vNIC:
         - size the Rx indirection table based on requested
           configuration
         - support VLAN tagging
      - Amazon vNIC:
         - try to reuse Rx buffers if not fully consumed, useful for ARM
           servers running with 16kB pages
      - Google vNIC:
         - support TCP segmentation of >64kB frames

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - enable USXGMII (88E6191X)
      - Microchip:
         - lan966x: add support for Egress Stage 0 ACL engine
         - lan966x: support mapping packet priority to internal switch
           priority (based on PCP or DSCP)

   - Ethernet PHYs:
      - Broadcom PHYs:
         - support for Wake-on-LAN for BCM54210E/B50212E
         - report LPI counter
      - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
      - Micrel PHYs: receive timestamp in the frame (LAN8841)
      - Realtek PHYs: support optional external PHY clock
      - Altera TSE PCS: merge the driver into Lynx PCS, of which it is
        a variant

   - CAN: Kvaser PCIEcan:
      - support packet timestamping

   - WiFi:
      - Intel (iwlwifi):
         - major update for new firmware and Multi-Link Operation (MLO)
         - configuration rework to drop test devices and split the
           different families
         - support for segmented PNVM images and power tables
         - new vendor entries for PPAG (platform antenna gain) feature
      - Qualcomm 802.11ax (ath11k):
         - Multiple Basic Service Set Identifier (MBSSID) and Enhanced
           MBSSID Advertisement (EMA) support in AP mode
         - support factory test mode
      - RealTek (rtw89):
         - add RSSI based antenna diversity
         - support U-NII-4 channels on 5 GHz band
      - RealTek (rtl8xxxu):
         - AP mode support for 8188f
         - support USB RX aggregation for the newer chips"

* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
  net: scm: introduce and use scm_recv_unix helper
  af_unix: Skip SCM_PIDFD if scm->pid is NULL.
  net: lan743x: Simplify comparison
  netlink: Add __sock_i_ino() for __netlink_diag_dump().
  net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
  Revert "af_unix: Call scm_recv() only after scm_set_cred()."
  phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
  libceph: Partially revert changes to support MSG_SPLICE_PAGES
  net: phy: mscc: fix packet loss due to RGMII delays
  net: mana: use vmalloc_array and vcalloc
  net: enetc: use vmalloc_array and vcalloc
  ionic: use vmalloc_array and vcalloc
  pds_core: use vmalloc_array and vcalloc
  gve: use vmalloc_array and vcalloc
  octeon_ep: use vmalloc_array and vcalloc
  net: usb: qmi_wwan: add u-blox 0x1312 composition
  perf trace: fix MSG_SPLICE_PAGES build error
  ipvlan: Fix return value of ipvlan_queue_xmit()
  netfilter: nf_tables: fix underflow in chain reference counter
  netfilter: nf_tables: unbind non-anonymous set if rule construction fails
  ...
Linus Torvalds 2023-06-28 16:43:10 -07:00
commit 3a8a670eee
1491 changed files with 98684 additions and 25408 deletions

@@ -13,6 +13,11 @@ Description:
Specifies the duration of the LED blink in milliseconds.
Defaults to 50 ms.
With hw_control ON, the interval value MUST be set to the
default value and cannot be changed.
Trying to set any value in this specific mode will return
an EINVAL error.

What: /sys/class/leds/<led>/link
Date: Dec 2017
KernelVersion: 4.16
@@ -39,6 +44,9 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal transmission.
With hw_control ON, the blink interval is controlled by hardware
and won't reflect the value set in interval.

What: /sys/class/leds/<led>/rx
Date: Dec 2017
KernelVersion: 4.16
@@ -50,3 +58,84 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal reception.
With hw_control ON, the blink interval is controlled by hardware
and won't reflect the value set in interval.

What: /sys/class/leds/<led>/hw_control
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Communicate whether the LED trigger modes are driven by hardware
or software fallback is used.
If 0, the LED is using software fallback to blink.
If 1, the LED is using hardware control to blink and signal the
requested modes.

What: /sys/class/leds/<led>/link_10
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 10Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 10Mbps of the named network device.
Setting this value also immediately changes the LED state.

What: /sys/class/leds/<led>/link_100
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 100Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 100Mbps of the named network device.
Setting this value also immediately changes the LED state.

What: /sys/class/leds/<led>/link_1000
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 1000Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 1000Mbps of the named network device.
Setting this value also immediately changes the LED state.

What: /sys/class/leds/<led>/half_duplex
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link half duplex state of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link half
duplex state of the named network device.
Setting this value also immediately changes the LED state.

What: /sys/class/leds/<led>/full_duplex
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link full duplex state of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link full
duplex state of the named network device.
Setting this value also immediately changes the LED state.
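Userspace drives the attributes above by writing to sysfs. The C sketch below uses led_attr_write(), a hypothetical helper, and the LED name depends on the board. It assumes the netdev trigger is already selected through the LED's trigger file and bound to an interface through device_name. sysfs reports a rejected value as a failed write(), for example EINVAL when writing interval while hw_control is 1, so the helper returns -errno from write().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `value` to <base>/<led>/<attr>, e.g. base
 * "/sys/class/leds" and attr "link_1000". Returns 0 or -errno. */
static int led_attr_write(const char *base, const char *led,
                          const char *attr, const char *value)
{
    char path[512];
    size_t len = strlen(value);
    ssize_t written;
    int fd, n, ret = 0;

    n = snprintf(path, sizeof(path), "%s/%s/%s", base, led, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* A value rejected by the attribute's store() callback shows up as
     * a failed write() with the corresponding errno. */
    errno = 0;
    written = write(fd, value, len);
    if (written != (ssize_t)len)
        ret = errno ? -errno : -EIO;
    close(fd);
    return ret;
}
```

For example, led_attr_write("/sys/class/leds", "<led>", "link_1000", "1") would make the LED's normal state follow a 1000Mbps link, with `<led>` left as a placeholder for the actual LED name.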

@@ -386,8 +386,8 @@ Default : 0 (for compatibility reasons)
txrehash
--------
-Controls default hash rethink behaviour on listening socket when SO_TXREHASH
-option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
+Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
+to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
If set to 1 (default), hash rethink is performed on listening socket.
If set to 0, hash rethink is not performed.

@@ -238,11 +238,8 @@ The following is the breakdown for each field in struct ``bpf_iter_reg``.
that the kernel function cond_resched() is called to avoid other kernel
subsystems (e.g., rcu) misbehaving.
* - seq_info
-- Specifies certain action requests in the kernel BPF iterator
-infrastructure. Currently, only BPF_ITER_RESCHED is supported. This means
-that the kernel function cond_resched() is called to avoid other kernel
-subsystem (e.g., rcu) misbehaving.
+- Specifies the set of seq operations for the BPF iterator and helpers to
+initialize/free the private data for the corresponding ``seq_file``.
`Click here
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_

@@ -351,14 +351,15 @@ In addition to the above kfuncs, there is also a set of read-only kfuncs that
can be used to query the contents of cpumasks.
.. kernel-doc:: kernel/bpf/cpumask.c
-:identifiers: bpf_cpumask_first bpf_cpumask_first_zero bpf_cpumask_test_cpu
+:identifiers: bpf_cpumask_first bpf_cpumask_first_zero bpf_cpumask_first_and
+bpf_cpumask_test_cpu
.. kernel-doc:: kernel/bpf/cpumask.c
:identifiers: bpf_cpumask_equal bpf_cpumask_intersects bpf_cpumask_subset
bpf_cpumask_empty bpf_cpumask_full
.. kernel-doc:: kernel/bpf/cpumask.c
-:identifiers: bpf_cpumask_any bpf_cpumask_any_and
+:identifiers: bpf_cpumask_any_distribute bpf_cpumask_any_and_distribute
----
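bpf_cpumask_first_and() returns the index of the first CPU that is set in both masks. The code below is a userspace analogue of the same query over a single 64-bit word, not the kfunc itself. mask_first_and() and MASK_BITS are hypothetical. Returning MASK_BITS when nothing matches mirrors the kernel convention of returning a value at or beyond the number of CPUs.

```c
#include <stdint.h>

#define MASK_BITS 64u  /* width of the toy mask */

/* Index of the lowest bit set in both a and b, or MASK_BITS if the
 * intersection is empty. */
static unsigned mask_first_and(uint64_t a, uint64_t b)
{
    uint64_t both = a & b;

    for (unsigned i = 0; i < MASK_BITS; i++)
        if (both & ((uint64_t)1 << i))
            return i;
    return MASK_BITS;
}
```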

@@ -163,13 +163,13 @@ BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst = (src != 0) ? (dst / src) : 0
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
-BPF_LSH 0x60 dst <<= src
-BPF_RSH 0x70 dst >>= src
+BPF_LSH 0x60 dst <<= (src & mask)
+BPF_RSH 0x70 dst >>= (src & mask)
BPF_NEG 0x80 dst = ~src
BPF_MOD 0x90 dst = (src != 0) ? (dst % src) : dst
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
-BPF_ARSH 0xc0 sign extending shift right
+BPF_ARSH 0xc0 sign extending dst >>= (src & mask)
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
======== ===== ==========================================================
@@ -204,6 +204,9 @@ for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
interpreted as an unsigned 64-bit value. There are no instructions for
signed division or modulo.
Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.
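The division, modulo and shift rules above can be checked with a few plain C helpers that evaluate the same expressions on 64-bit registers. alu32_lsh() covers the 32-bit variant, which also zero-extends its result into the 64-bit register. These are illustrative functions, not part of any BPF library.

```c
#include <stdint.h>

static uint64_t alu64_div(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;      /* BPF_DIV: x / 0 == 0 */
}

static uint64_t alu64_mod(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;    /* BPF_MOD: x % 0 == x */
}

static uint64_t alu64_lsh(uint64_t dst, uint64_t src)
{
    return dst << (src & 0x3F);           /* 64-bit shifts use mask 63 */
}

static uint64_t alu64_rsh(uint64_t dst, uint64_t src)
{
    return dst >> (src & 0x3F);
}

static uint64_t alu64_arsh(uint64_t dst, uint64_t src)
{
    unsigned s = (unsigned)(src & 0x3F);
    uint64_t r = dst >> s;

    /* Replicate the sign bit by hand: right-shifting a negative signed
     * value is implementation-defined in C. */
    if (s != 0 && (dst >> 63))
        r |= ~(UINT64_MAX >> s);
    return r;
}

static uint64_t alu32_lsh(uint64_t dst, uint64_t src)
{
    /* 32-bit shifts use mask 31; the upper half of dst ends up zero. */
    return (uint32_t)((uint32_t)dst << (src & 0x1F));
}
```

For instance, a 64-bit shift by 65 behaves like a shift by 1, and a 32-bit shift by 33 also behaves like a shift by 1.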
Byte swap instructions
~~~~~~~~~~~~~~~~~~~~~~

@@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
-------------------------
This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.
2.2.4 __opt Annotation
-------------------------
This annotation is used to indicate that the buffer associated with an __sz or __szk
argument may be null. If the function is passed NULL in place of the buffer,
the verifier will not check that length is appropriate for the buffer. The kfunc is
responsible for checking if this buffer is null before using it.
An example is given below::
__bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
{
...
}
Here, the buffer may be null. If the buffer is not null, it is at least buffer__szk
bytes long. Either way, the returned buffer is either NULL or of size buffer__szk.
Without this annotation, the verifier will reject the program if a null pointer is
passed in with a nonzero size.
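The contract above can be modelled in userspace. This loosely follows bpf_dynptr_slice(), which returns a direct pointer when the requested bytes are contiguous and otherwise copies them into the optional buffer, or returns NULL when no buffer was given. struct seg_buf and seg_slice() are hypothetical stand-ins for a dynptr over non-contiguous data, not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-segment buffer, loosely modelling linear + paged data. */
struct seg_buf {
    const unsigned char *seg[2];
    size_t seg_len[2];
};

/* Return `len` bytes starting at `off`. `buffer` plays the role of an
 * __opt argument: it may be NULL, and `len` is its size when present. */
static const void *seg_slice(const struct seg_buf *b, size_t off,
                             void *buffer, size_t len)
{
    size_t total = b->seg_len[0] + b->seg_len[1];

    if (off > total || len > total - off)
        return NULL;                             /* out of bounds */
    if (off + len <= b->seg_len[0])              /* inside segment 0 */
        return b->seg[0] + off;
    if (off >= b->seg_len[0])                    /* inside segment 1 */
        return b->seg[1] + (off - b->seg_len[0]);
    if (!buffer)                                 /* spans both, no buffer */
        return NULL;

    size_t first = b->seg_len[0] - off;          /* bytes from segment 0 */
    memcpy(buffer, b->seg[0] + off, first);
    memcpy((unsigned char *)buffer + first, b->seg[1], len - first);
    return buffer;
}
```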
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
@@ -206,23 +227,49 @@ absolutely no ABI stability guarantees.
As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
-guaranteed to be valid as long as its parent pointer is trusted, the
-``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
-follows:
+guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
+as its parent pointer is valid, the following macros can be used to express
+that to the verifier:
* ``BTF_TYPE_SAFE_TRUSTED``
* ``BTF_TYPE_SAFE_RCU``
* ``BTF_TYPE_SAFE_RCU_OR_NULL``
For example,
.. code-block:: c
-BTF_TYPE_SAFE_NESTED(struct task_struct) {
+BTF_TYPE_SAFE_TRUSTED(struct socket) {
struct sock *sk;
};
or
.. code-block:: c
BTF_TYPE_SAFE_RCU(struct task_struct) {
const cpumask_t *cpus_ptr;
struct css_set __rcu *cgroups;
struct task_struct __rcu *real_parent;
struct task_struct *group_leader;
};
In other words, you must:
-1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.
+1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.
-2. Specify the type and name of the trusted nested field. This field must match
+2. Specify the type and name of the valid nested field. This field must match
the field in the original type definition exactly.
A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
is emitted in the ``type_is_trusted()`` function as follows:
.. code-block:: c
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));
2.4.5 KF_SLEEPABLE flag
-----------------------

@@ -48,7 +48,7 @@ the code with ``llvm-objdump -dr test.o``::
14: 0f 10 00 00 00 00 00 00 r0 += r1
15: 95 00 00 00 00 00 00 00 exit
-There are four relations in the above for four ``LD_imm64`` instructions.
+There are four relocations in the above for four ``LD_imm64`` instructions.
The following ``llvm-readelf -r test.o`` shows the binary values of the four
relocations::
@@ -79,14 +79,16 @@ The following is the symbol table with ``llvm-readelf -s test.o``::
The 6th entry is global variable ``g1`` with value 0.
Similarly, the second relocation is at ``.text`` offset ``0x18``, instruction 3,
-for global variable ``g2`` which has a symbol value 4, the offset
-from the start of ``.data`` section.
+has a type of ``R_BPF_64_64`` and refers to entry 7 in the symbol table.
+The second relocation resolves to global variable ``g2`` which has a symbol
+value 4. The symbol value represents the offset from the start of ``.data``
+section where the initial value of the global variable ``g2`` is stored.
-The third and fourth relocations refers to static variables ``l1``
-and ``l2``. From ``.rel.text`` section above, it is not clear
-which symbols they really refers to as they both refers to
+The third and fourth relocations refer to static variables ``l1``
+and ``l2``. From the ``.rel.text`` section above, it is not clear
+to which symbols they really refer as they both refer to
symbol table entry 4, symbol ``sec``, which has ``STT_SECTION`` type
-and represents a section. So for static variable or function,
+and represents a section. So for a static variable or function,
the section offset is written to the original insn
buffer, which is called ``A`` (addend). Looking at
above insn ``7`` and ``11``, they have section offset ``8`` and ``12``.
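The arithmetic in this section fits in two helpers. BPF instructions occupy 8-byte slots, so a relocation at ``.text`` offset ``0x18`` targets instruction 3. An ``R_BPF_64_64`` relocation resolves to S + A, the symbol value plus the addend kept in the instruction. The helper names are hypothetical. The test assumes an addend of 0 for the direct ``g2`` reference and a value of 0 for the ``STT_SECTION`` symbol ``sec``, as section symbols usually have.

```c
#include <stdint.h>

#define BPF_INSN_SLOT 8u  /* each BPF instruction slot is 8 bytes */

/* Hypothetical helper: instruction index for a .text byte offset. */
static uint64_t reloc_insn_index(uint64_t r_offset)
{
    return r_offset / BPF_INSN_SLOT;
}

/* Hypothetical helper: R_BPF_64_64 computes S + A, the symbol value
 * plus the implicit addend stored in the ld_imm64 instruction. */
static uint64_t reloc_64_64_value(uint64_t sym_value, uint64_t addend)
{
    return sym_value + addend;
}
```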

@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
.. Copyright (C) 2022-2023 Isovalent, Inc.
===============================================
BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
@@ -29,7 +30,16 @@ will automatically evict the least recently used entries when the hash
table reaches capacity. An LRU hash maintains an internal LRU list that
is used to select elements for eviction. This internal LRU list is
shared across CPUs but it is possible to request a per CPU LRU list with
-the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
+the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The
+following table outlines the properties of LRU maps depending on the
+map type and the flags used to create the map.
======================== ========================= ================================
Flag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
======================== ========================= ================================
**BPF_F_NO_COMMON_LRU** Per-CPU LRU, global map Per-CPU LRU, per-cpu map
**!BPF_F_NO_COMMON_LRU** Global LRU, global map Global LRU, per-cpu map
======================== ========================= ================================
Usage
=====
@@ -206,3 +216,44 @@ Userspace walking the map elements from the map declared above:
cur_key = &next_key;
}
}
Internals
=========
This section of the document is targeted at Linux developers and describes
aspects of the map implementations that are not considered stable ABI. The
following details are subject to change in future versions of the kernel.
``BPF_MAP_TYPE_LRU_HASH`` and variants
--------------------------------------
Updating elements in LRU maps may trigger eviction behaviour when the capacity
of the map is reached. To enforce the LRU property, the update algorithm
attempts the following steps in order, each with an increasing impact on
other CPUs:
- Attempt to use CPU-local state to batch operations
- Attempt to fetch free nodes from global lists
- Attempt to pull any node from a global list and remove it from the hashmap
- Attempt to pull any node from any CPU's list and remove it from the hashmap
This algorithm is described visually in the following diagram. See the
description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
the corresponding operations:
.. kernel-figure:: map_lru_hash_update.dot
:alt: Diagram outlining the LRU eviction steps taken during map update.
LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and
variants. See the dot file source for kernel function name code references.
Map updates start from the oval in the top right "begin ``bpf_map_update()``"
and progress through the graph towards the bottom where the result may be
either a successful update or a failure with various error codes. The key in
the top right provides indicators for which locks may be involved in specific
operations. This is intended as a visual hint for reasoning about how map
contention may impact update operations, though the map type and flags may
impact the actual contention on those locks, based on the logic described in
the table above. For instance, if the map is created with type
``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map
properties would be per-cpu.
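
The ``-EEXIST`` and ``-ENOENT`` returns at the bottom of the diagram depend on the update flags rather than on eviction. A sketch of that final check, assuming the semantics of ``check_flags()`` in ``kernel/bpf/hashtab.c`` with the ``BPF_F_LOCK`` masking left out; the flag values mirror ``BPF_ANY``, ``BPF_NOEXIST`` and ``BPF_EXIST`` but are redefined locally under hypothetical names:

```c
#include <errno.h>
#include <stdbool.h>

/* Same values as BPF_ANY, BPF_NOEXIST and BPF_EXIST in <linux/bpf.h>. */
enum { SKETCH_ANY = 0, SKETCH_NOEXIST = 1, SKETCH_EXIST = 2 };

/* Return 0 if an update with @flags may proceed, otherwise the
 * negative errno the update fails with. */
int update_flags_check(bool key_exists, unsigned long long flags)
{
	if (key_exists && flags == SKETCH_NOEXIST)
		return -EEXIST; /* asked to create, but the key is present */
	if (!key_exists && flags == SKETCH_EXIST)
		return -ENOENT; /* asked to replace, but there is no entry */
	return 0;
}
```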

@ -0,0 +1,172 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (C) 2022-2023 Isovalent, Inc.
digraph {
node [colorscheme=accent4,style=filled] # Apply colorscheme to all nodes
graph [splines=ortho, nodesep=1]
subgraph cluster_key {
label = "Key\n(locks held during operation)";
rankdir = TB;
remote_lock [shape=rectangle,fillcolor=4,label="remote CPU LRU lock"]
hash_lock [shape=rectangle,fillcolor=3,label="hashtab lock"]
lru_lock [shape=rectangle,fillcolor=2,label="LRU lock"]
local_lock [shape=rectangle,fillcolor=1,label="local CPU LRU lock"]
no_lock [shape=rectangle,label="no locks held"]
}
begin [shape=oval,label="begin\nbpf_map_update()"]
// Nodes below with an 'fn_' prefix are roughly labeled by the C function
// names that initiate the corresponding logic in kernel/bpf/bpf_lru_list.c.
// Number suffixes and errno suffixes handle subsections of the corresponding
// logic in the function as of the writing of this dot.
// cf. __local_list_pop_free() / bpf_percpu_lru_pop_free()
local_freelist_check [shape=diamond,fillcolor=1,
label="Local freelist\nnode available?"];
use_local_node [shape=rectangle,
label="Use node owned\nby this CPU"]
// cf. bpf_lru_pop_free()
common_lru_check [shape=diamond,
label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"];
fn_bpf_lru_list_pop_free_to_local [shape=rectangle,fillcolor=2,
label="Flush local pending,
Rotate Global list, move
LOCAL_FREE_TARGET
from global -> local"]
// Also corresponds to:
// fn__local_list_flush()
// fn_bpf_lru_list_rotate()
fn___bpf_lru_node_move_to_free[shape=diamond,fillcolor=2,
label="Able to free\nLOCAL_FREE_TARGET\nnodes?"]
fn___bpf_lru_list_shrink_inactive [shape=rectangle,fillcolor=3,
label="Shrink inactive list
up to remaining
LOCAL_FREE_TARGET
(global LRU -> local)"]
fn___bpf_lru_list_shrink [shape=diamond,fillcolor=2,
label="> 0 entries in\nlocal free list?"]
fn___bpf_lru_list_shrink2 [shape=rectangle,fillcolor=2,
label="Steal one node from
inactive, or if empty,
from active global list"]
fn___bpf_lru_list_shrink3 [shape=rectangle,fillcolor=3,
label="Try to remove\nnode from hashtab"]
local_freelist_check2 [shape=diamond,label="Htab removal\nsuccessful?"]
common_lru_check2 [shape=diamond,
label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"];
subgraph cluster_remote_lock {
label = "Iterate through CPUs\n(start from current)";
style = dashed;
rankdir=LR;
local_freelist_check5 [shape=diamond,fillcolor=4,
label="Steal a node from\nper-cpu freelist?"]
local_freelist_check6 [shape=rectangle,fillcolor=4,
label="Steal a node from
(1) Unreferenced pending, or
(2) Any pending node"]
local_freelist_check7 [shape=rectangle,fillcolor=3,
label="Try to remove\nnode from hashtab"]
fn_htab_lru_map_update_elem [shape=diamond,
label="Stole node\nfrom remote\nCPU?"]
fn_htab_lru_map_update_elem2 [shape=diamond,label="Iterated\nall CPUs?"]
// Also corresponds to:
// use_local_node()
// fn__local_list_pop_pending()
}
fn_bpf_lru_list_pop_free_to_local2 [shape=rectangle,
label="Use node that was\nnot recently referenced"]
local_freelist_check4 [shape=rectangle,
label="Use node that was\nactively referenced\nin global list"]
fn_htab_lru_map_update_elem_ENOMEM [shape=oval,label="return -ENOMEM"]
fn_htab_lru_map_update_elem3 [shape=rectangle,
label="Use node that was\nactively referenced\nin (another?) CPU's cache"]
fn_htab_lru_map_update_elem4 [shape=rectangle,fillcolor=3,
label="Update hashmap\nwith new element"]
fn_htab_lru_map_update_elem5 [shape=oval,label="return 0"]
fn_htab_lru_map_update_elem_EBUSY [shape=oval,label="return -EBUSY"]
fn_htab_lru_map_update_elem_EEXIST [shape=oval,label="return -EEXIST"]
fn_htab_lru_map_update_elem_ENOENT [shape=oval,label="return -ENOENT"]
begin -> local_freelist_check
local_freelist_check -> use_local_node [xlabel="Y"]
local_freelist_check -> common_lru_check [xlabel="N"]
common_lru_check -> fn_bpf_lru_list_pop_free_to_local [xlabel="Y"]
common_lru_check -> fn___bpf_lru_list_shrink_inactive [xlabel="N"]
fn_bpf_lru_list_pop_free_to_local -> fn___bpf_lru_node_move_to_free
fn___bpf_lru_node_move_to_free ->
fn_bpf_lru_list_pop_free_to_local2 [xlabel="Y"]
fn___bpf_lru_node_move_to_free ->
fn___bpf_lru_list_shrink_inactive [xlabel="N"]
fn___bpf_lru_list_shrink_inactive -> fn___bpf_lru_list_shrink
fn___bpf_lru_list_shrink -> fn_bpf_lru_list_pop_free_to_local2 [xlabel = "Y"]
fn___bpf_lru_list_shrink -> fn___bpf_lru_list_shrink2 [xlabel="N"]
fn___bpf_lru_list_shrink2 -> fn___bpf_lru_list_shrink3
fn___bpf_lru_list_shrink3 -> local_freelist_check2
local_freelist_check2 -> local_freelist_check4 [xlabel = "Y"]
local_freelist_check2 -> common_lru_check2 [xlabel = "N"]
common_lru_check2 -> local_freelist_check5 [xlabel = "Y"]
common_lru_check2 -> fn_htab_lru_map_update_elem_ENOMEM [xlabel = "N"]
local_freelist_check5 -> fn_htab_lru_map_update_elem [xlabel = "Y"]
local_freelist_check5 -> local_freelist_check6 [xlabel = "N"]
local_freelist_check6 -> local_freelist_check7
local_freelist_check7 -> fn_htab_lru_map_update_elem
fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem3 [xlabel = "Y"]
fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem2 [xlabel = "N"]
fn_htab_lru_map_update_elem2 ->
fn_htab_lru_map_update_elem_ENOMEM [xlabel = "Y"]
fn_htab_lru_map_update_elem2 -> local_freelist_check5 [xlabel = "N"]
fn_htab_lru_map_update_elem3 -> fn_htab_lru_map_update_elem4
use_local_node -> fn_htab_lru_map_update_elem4
fn_bpf_lru_list_pop_free_to_local2 -> fn_htab_lru_map_update_elem4
local_freelist_check4 -> fn_htab_lru_map_update_elem4
fn_htab_lru_map_update_elem4 -> fn_htab_lru_map_update_elem5 [headlabel="Success"]
fn_htab_lru_map_update_elem4 ->
fn_htab_lru_map_update_elem_EBUSY [xlabel="Hashtab lock failed"]
fn_htab_lru_map_update_elem4 ->
fn_htab_lru_map_update_elem_EEXIST [xlabel="BPF_NOEXIST set and\nkey already exists"]
fn_htab_lru_map_update_elem4 ->
fn_htab_lru_map_update_elem_ENOENT [headlabel="BPF_EXIST set\nand no such entry"]
// Create invisible pad nodes to line up various nodes
pad0 [style=invis]
pad1 [style=invis]
pad2 [style=invis]
pad3 [style=invis]
pad4 [style=invis]
// Line up the key with the top of the graph
no_lock -> local_lock [style=invis]
local_lock -> lru_lock [style=invis]
lru_lock -> hash_lock [style=invis]
hash_lock -> remote_lock [style=invis]
remote_lock -> local_freelist_check5 [style=invis]
remote_lock -> fn___bpf_lru_list_shrink [style=invis]
// Line up return code nodes at the bottom of the graph
fn_htab_lru_map_update_elem -> pad0 [style=invis]
pad0 -> pad1 [style=invis]
pad1 -> pad2 [style=invis]
//pad2-> fn_htab_lru_map_update_elem_ENOMEM [style=invis]
fn_htab_lru_map_update_elem4 -> pad3 [style=invis]
pad3 -> fn_htab_lru_map_update_elem5 [style=invis]
pad3 -> fn_htab_lru_map_update_elem_EBUSY [style=invis]
pad3 -> fn_htab_lru_map_update_elem_EEXIST [style=invis]
pad3 -> fn_htab_lru_map_update_elem_ENOENT [style=invis]
// Reduce diagram width by forcing some nodes to appear above others
local_freelist_check4 -> fn_htab_lru_map_update_elem3 [style=invis]
common_lru_check2 -> pad4 [style=invis]
pad4 -> local_freelist_check5 [style=invis]
}

@ -240,11 +240,11 @@ offsets into ``msg``, respectively.
If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
parse data that the (``data``, ``data_end``) pointers have already consumed.
For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
calls relying on the ``sendpage`` handler (e.g., ``sendfile()``) this will be
the range (**0**, **0**) because the data is shared with user space and by
default the objective is to avoid allowing user space to modify data while (or
after) BPF verdict is being decided. This helper can be used to pull in data
and to set the start and end pointers to given values. Data will be copied if
calls relying on MSG_SPLICE_PAGES (e.g., ``sendfile()``) this will be the
range (**0**, **0**) because the data is shared with user space and by default
the objective is to avoid allowing user space to modify data while (or after)
BPF verdict is being decided. This helper can be used to pull in data and to
set the start and end pointers to given values. Data will be copied if
necessary (i.e., if data was not linear and if start and end pointers do not
point to the same chunk).
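
The copy rule in the last sentence amounts to a range check over the message's scatterlist chunks. A hypothetical userspace model of it (the function and the chunk-length array are illustrative and not part of the ``bpf_msg_pull_data()`` API), assuming ``[start, end)`` byte offsets into the message:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when bytes [start, end) cross a chunk boundary, i.e. the
 * case in which data has to be copied to become linear. */
bool pull_needs_copy(const size_t *chunk_len, size_t nr_chunks,
		     size_t start, size_t end)
{
	size_t off = 0;

	for (size_t i = 0; i < nr_chunks; i++) {
		size_t chunk_end = off + chunk_len[i];

		if (start >= off && start < chunk_end)
			return end > chunk_end; /* range spills into a later chunk */
		off = chunk_end;
	}
	return false; /* start lies beyond the data; not modelled further */
}
```

With chunks of 100 and 50 bytes, pulling ``[0, 100)`` or ``[100, 150)`` stays within one chunk, while ``[90, 110)`` would need a copy.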

@ -98,10 +98,65 @@ can access only the first ``PAGE_SIZE`` of that data. So it has to options:
indicates that the kernel should use BPF's trimmed ``optval``.
When the BPF program returns with the ``optlen`` greater than
``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
``PAGE_SIZE``, the userspace will receive original kernel
buffers without any modifications that the BPF program might have
applied.
Example
=======
The recommended way to handle BPF programs is as follows:

.. code-block:: c

    SEC("cgroup/getsockopt")
    int getsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        /* Custom socket option. */
        if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
            if (optval + 1 > optval_end)
                return 0; /* bounds check required by the verifier */
            ctx->retval = 0;
            optval[0] = ...;
            ctx->optlen = 1;
            return 1;
        }

        /* Modify kernel's socket option. */
        if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
            if (optval + 1 > optval_end)
                return 0; /* bounds check required by the verifier */
            ctx->retval = 0;
            optval[0] = ...;
            ctx->optlen = 1;
            return 1;
        }

        /* For optval larger than PAGE_SIZE, use the kernel's buffer. */
        if (ctx->optlen > PAGE_SIZE)
            ctx->optlen = 0;
        return 1;
    }

    SEC("cgroup/setsockopt")
    int setsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        /* Custom socket option. */
        if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
            /* do something */
            ctx->optlen = -1;
            return 1;
        }

        /* Modify kernel's socket option. */
        if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
            if (optval + 1 > optval_end)
                return 0; /* bounds check required by the verifier */
            optval[0] = ...;
            return 1;
        }

        /* For optval larger than PAGE_SIZE, use the kernel's buffer. */
        if (ctx->optlen > PAGE_SIZE)
            ctx->optlen = 0;
        return 1;
    }

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.

@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Allwinner A20 GMAC
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
maintainers:
- Chen-Yu Tsai <wens@csie.org>

@ -63,7 +63,7 @@ required:
- syscon
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
- if:
properties:
compatible:

@ -72,8 +72,8 @@ allOf:
compatible:
contains:
enum:
- const: altr,tse-1.0
- const: ALTR,tse-1.0
- altr,tse-1.0
- ALTR,tse-1.0
then:
properties:
reg:

@ -27,7 +27,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
- if:
properties:
compatible:

@ -50,6 +50,9 @@ properties:
vddch0-supply:
description: VDD_CH0 supply regulator handle
vddch1-supply:
description: VDD_CH1 supply regulator handle
vddaon-supply:
description: VDD_AON supply regulator handle

@ -55,7 +55,7 @@ properties:
patternProperties:
"^mdio@[0-9a-f]+$":
type: object
$ref: "brcm,unimac-mdio.yaml"
$ref: brcm,unimac-mdio.yaml
description:
GENET internal UniMAC MDIO bus

@ -109,6 +109,16 @@ properties:
power-domains:
maxItems: 1
cdns,rx-watermark:
$ref: /schemas/types.yaml#/definitions/uint32
description:
When the receive partial store and forward mode is activated,
the receiver will only begin to forward the packet to the external
AHB or AXI slave when enough packet data is stored in the SRAM packet buffer.
rx-watermark corresponds to the number of SRAM buffer locations that
need to be filled before the forwarding process is activated.
Width of the SRAM is platform dependent, and can be 4, 8 or 16 bytes.
'#address-cells':
const: 1
@ -166,6 +176,7 @@ examples:
compatible = "cdns,macb";
reg = <0xfffc4000 0x4000>;
interrupts = <21>;
cdns,rx-watermark = <0x44>;
phy-mode = "rmii";
local-mac-address = [3a 0e 03 04 05 06];
clock-names = "pclk", "hclk", "tx_clk";
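
Because ``cdns,rx-watermark`` counts SRAM locations rather than bytes, the byte threshold depends on the platform's SRAM width. For the example value ``0x44`` (68 locations) it works out to 272, 544 or 1088 bytes for a 4-, 8- or 16-byte-wide SRAM. A hypothetical helper (not part of the macb driver) doing that arithmetic:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte threshold for a given watermark; returns false for an SRAM width
 * other than the 4, 8 or 16 bytes the binding mentions. */
bool rx_watermark_bytes(uint32_t watermark, uint32_t sram_width,
			uint64_t *bytes)
{
	if (sram_width != 4 && sram_width != 8 && sram_width != 16)
		return false;
	*bytes = (uint64_t)watermark * sram_width;
	return true;
}
```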

@ -20,7 +20,7 @@ which is at a different MDIO base address in different switch families.
6171, 6172, 6175, 6176, 6185, 6240, 6320, 6321,
6341, 6350, 6351, 6352
- "marvell,mv88e6190" : Switch has base address 0x00. Use with models:
6190, 6190X, 6191, 6290, 6390, 6390X
6163, 6190, 6190X, 6191, 6290, 6390, 6390X
- "marvell,mv88e6250" : Switch has base address 0x08 or 0x18. Use with model:
6220, 6250

@ -12,10 +12,6 @@ description:
cs_sck_delay of 500ns. Ensuring that this SPI timing requirement is observed
depends on the SPI bus master driver.
allOf:
- $ref: dsa.yaml#/$defs/ethernet-ports
- $ref: /schemas/spi/spi-peripheral-props.yaml#
maintainers:
- Vladimir Oltean <vladimir.oltean@nxp.com>
@ -36,6 +32,9 @@ properties:
reg:
maxItems: 1
spi-cpha: true
spi-cpol: true
# Optional container node for the 2 internal MDIO buses of the SJA1110
# (one for the internal 100base-T1 PHYs and the other for the single
# 100base-TX PHY). The "reg" property does not have physical significance.
@ -109,6 +108,30 @@ $defs:
1860, 1880, 1900, 1920, 1940, 1960, 1980, 2000, 2020, 2040, 2060, 2080,
2100, 2120, 2140, 2160, 2180, 2200, 2220, 2240, 2260]
allOf:
- $ref: dsa.yaml#/$defs/ethernet-ports
- $ref: /schemas/spi/spi-peripheral-props.yaml#
- if:
properties:
compatible:
enum:
- nxp,sja1105e
- nxp,sja1105p
- nxp,sja1105q
- nxp,sja1105r
- nxp,sja1105s
- nxp,sja1105t
then:
properties:
spi-cpol: false
required:
- spi-cpha
else:
properties:
spi-cpha: false
required:
- spi-cpol
unevaluatedProperties: false
examples:
@ -120,6 +143,7 @@ examples:
ethernet-switch@1 {
reg = <0x1>;
compatible = "nxp,sja1105t";
spi-cpha;
ethernet-ports {
#address-cells = <1>;
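
The ``if``/``then``/``else`` block above ties the SPI clock mode to the compatible: the listed SJA1105 variants require ``spi-cpha`` and forbid ``spi-cpol``, and any other compatible accepted by the binding requires ``spi-cpol`` and forbids ``spi-cpha``. A sketch of that rule as a plain C check; the struct and functions are hypothetical and are not how dt-schema evaluates the binding:

```c
#include <stdbool.h>
#include <string.h>

struct spi_mode_rule {
	bool need_cpha;
	bool need_cpol;
};

/* Mode demanded by the schema for a given compatible string. */
struct spi_mode_rule sja1105_spi_rule(const char *compatible)
{
	static const char *const sja1105[] = {
		"nxp,sja1105e", "nxp,sja1105p", "nxp,sja1105q",
		"nxp,sja1105r", "nxp,sja1105s", "nxp,sja1105t",
	};

	for (size_t i = 0; i < sizeof(sja1105) / sizeof(sja1105[0]); i++)
		if (strcmp(compatible, sja1105[i]) == 0)
			return (struct spi_mode_rule){ .need_cpha = true, .need_cpol = false };
	return (struct spi_mode_rule){ .need_cpha = false, .need_cpol = true };
}

/* The required flag must be present and the other one absent. */
bool sja1105_spi_flags_ok(const char *compatible, bool has_cpha, bool has_cpol)
{
	struct spi_mode_rule r = sja1105_spi_rule(compatible);

	return has_cpha == r.need_cpha && has_cpol == r.need_cpol;
}
```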

@ -93,6 +93,12 @@ properties:
the turn around line low at end of the control phase of the
MDIO transaction.
clocks:
maxItems: 1
description:
External clock connected to the PHY. If not specified it is assumed
that the PHY uses a fixed crystal or an internal oscillator.
enet-phy-lane-swap:
$ref: /schemas/types.yaml#/definitions/flag
description:

@ -19,7 +19,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
properties:
compatible:

@ -17,11 +17,12 @@ properties:
maxlinear,use-broken-interrupts:
description: |
Interrupts are broken on some GPY2xx PHYs in that they keep the
interrupt line asserted even after the interrupt status register is
cleared. Thus it is blocking the interrupt line which is usually bad
for shared lines. By default interrupts are disabled for this PHY and
polling mode is used. If one can live with the consequences, this
property can be used to enable interrupt handling.
interrupt line asserted for a random amount of time even after the
interrupt status register is cleared. Thus it is blocking the
interrupt line which is usually bad for shared lines. By default,
interrupts are disabled for this PHY and polling mode is used. If one
can live with the consequences, this property can be used to enable
interrupt handling.
Affected PHYs (as far as known) are GPY215B and GPY215C.
type: boolean

@ -25,7 +25,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
properties:
compatible:

@ -44,13 +44,13 @@ required:
allOf:
- $ref: ethernet-controller.yaml#
- $ref: /schemas/memory-controllers/mc-peripheral-props.yaml#
- if:
properties:
compatible:
contains:
const: micrel,ks8851
then:
$ref: /schemas/spi/spi-peripheral-props.yaml#
properties:
reg:
maxItems: 1
@ -60,6 +60,7 @@ allOf:
contains:
const: micrel,ks8851-mll
then:
$ref: /schemas/memory-controllers/mc-peripheral-props.yaml#
properties:
reg:
minItems: 2

@ -24,7 +24,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
properties:
compatible:

@ -16,7 +16,7 @@ maintainers:
properties:
$nodename:
pattern: "^ethernet-pse(@.*)?$"
pattern: "^ethernet-pse(@.*|-([0-9]|[1-9][0-9]+))?$"
"#pse-cells":
description:
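
The updated ``$nodename`` pattern keeps the unit-address form and additionally accepts a numeric suffix without leading zeros, such as ``ethernet-pse-1`` or ``ethernet-pse-10``. A small sketch checking names against the same expression with POSIX extended regular expressions (the function name is hypothetical):

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if @name matches the pattern; false on mismatch or if
 * the pattern fails to compile. */
bool pse_nodename_ok(const char *name)
{
	regex_t re;
	bool ok;

	if (regcomp(&re, "^ethernet-pse(@.*|-([0-9]|[1-9][0-9]+))?$",
		    REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	ok = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

Names such as ``ethernet-pse-01`` or ``ethernet-pse1`` are rejected.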

@ -20,6 +20,7 @@ properties:
compatible:
enum:
- qcom,qcs404-ethqos
- qcom,sa8775p-ethqos
- qcom,sc8280xp-ethqos
- qcom,sm8150-ethqos
@ -32,11 +33,13 @@ properties:
- const: rgmii
interrupts:
minItems: 1
items:
- description: Combined signal for various interrupt events
- description: The interrupt that occurs when Rx exits the LPI state
interrupt-names:
minItems: 1
items:
- const: macirq
- const: eth_lpi
@ -49,11 +52,18 @@ properties:
- const: stmmaceth
- const: pclk
- const: ptp_ref
- const: rgmii
- enum:
- rgmii
- phyaux
iommus:
maxItems: 1
phys: true
phy-names:
const: serdes
required:
- compatible
- clocks

@ -32,7 +32,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
properties:
compatible:

@ -67,6 +67,7 @@ properties:
- loongson,ls2k-dwmac
- loongson,ls7a-dwmac
- qcom,qcs404-ethqos
- qcom,sa8775p-ethqos
- qcom,sc8280xp-ethqos
- qcom,sm8150-ethqos
- renesas,r9a06g032-gmac
@ -582,6 +583,7 @@ allOf:
- ingenic,x1600-mac
- ingenic,x1830-mac
- ingenic,x2000-mac
- qcom,sa8775p-ethqos
- qcom,sc8280xp-ethqos
- snps,dwmac-3.50a
- snps,dwmac-4.10a
@ -638,6 +640,7 @@ allOf:
- ingenic,x1830-mac
- ingenic,x2000-mac
- qcom,qcs404-ethqos
- qcom,sa8775p-ethqos
- qcom,sc8280xp-ethqos
- qcom,sm8150-ethqos
- snps,dwmac-4.00

@ -168,14 +168,14 @@ properties:
patternProperties:
"^mdio@[0-9a-f]+$":
type: object
$ref: "ti,davinci-mdio.yaml#"
$ref: ti,davinci-mdio.yaml#
description:
CPSW MDIO bus.
"^cpts@[0-9a-f]+":
type: object
$ref: "ti,k3-am654-cpts.yaml#"
$ref: ti,k3-am654-cpts.yaml#
description:
CPSW Common Platform Time Sync (CPTS) module.

@ -19,7 +19,7 @@ select:
- compatible
allOf:
- $ref: "snps,dwmac.yaml#"
- $ref: snps,dwmac.yaml#
properties:
compatible:

@ -84,6 +84,8 @@ properties:
required:
- iommus
ieee80211-freq-limit: true
qcom,ath10k-calibration-data:
$ref: /schemas/types.yaml#/definitions/uint8-array
description:
@ -164,6 +166,7 @@ required:
additionalProperties: false
allOf:
- $ref: ieee80211.yaml#
- if:
properties:
compatible:
@ -355,4 +358,5 @@ examples:
"msi14",
"msi15",
"legacy";
ieee80211-freq-limit = <5470000 5875000>;
};

@ -1,101 +0,0 @@
XILINX AXI ETHERNET Device Tree Bindings
--------------------------------------------------------
Also called AXI 1G/2.5G Ethernet Subsystem, the xilinx axi ethernet IP core
provides connectivity to an external ethernet PHY supporting different
interfaces: MII, GMII, RGMII, SGMII, 1000BaseX. It also includes two
segments of memory for buffering TX and RX, as well as the capability of
offloading TX/RX checksum calculation off the processor.
Management configuration is done through the AXI interface, while payload is
sent and received through means of an AXI DMA controller. This driver
includes the DMA driver code, so this driver is incompatible with AXI DMA
driver.
For more details about mdio please refer phy.txt file in the same directory.
Required properties:
- compatible : Must be one of "xlnx,axi-ethernet-1.00.a",
"xlnx,axi-ethernet-1.01.a", "xlnx,axi-ethernet-2.01.a"
- reg : Address and length of the IO space, as well as the address
and length of the AXI DMA controller IO space, unless
axistream-connected is specified, in which case the reg
attribute of the node referenced by it is used.
- interrupts : Should be a list of 2 or 3 interrupts: TX DMA, RX DMA,
and optionally Ethernet core. If axistream-connected is
specified, the TX/RX DMA interrupts should be on that node
instead, and only the Ethernet core interrupt is optionally
specified here.
- phy-handle : Should point to the external phy device if exists. Pointing
this to the PCS/PMA PHY is deprecated and should be avoided.
See ethernet.txt file in the same directory.
- xlnx,rxmem : Set to allocated memory buffer for Rx/Tx in the hardware
Optional properties:
- phy-mode : See ethernet.txt
- xlnx,phy-type : Deprecated, do not use, but still accepted in preference
to phy-mode.
- xlnx,txcsum : 0 or empty for disabling TX checksum offload,
1 to enable partial TX checksum offload,
2 to enable full TX checksum offload
- xlnx,rxcsum : Same values as xlnx,txcsum but for RX checksum offload
- xlnx,switch-x-sgmii : Boolean to indicate the Ethernet core is configured to
support both 1000BaseX and SGMII modes. If set, the phy-mode
should be set to match the mode selected on core reset (i.e.
by the basex_or_sgmii core input line).
- clock-names: Tuple listing input clock names. Possible clocks:
s_axi_lite_clk: Clock for AXI register slave interface
axis_clk: AXI4-Stream clock for TXD RXD TXC and RXS interfaces
ref_clk: Ethernet reference clock, used by signal delay
primitives and transceivers
mgt_clk: MGT reference clock (used by optional internal
PCS/PMA PHY)
Note that if s_axi_lite_clk is not specified by name, the
first clock of any name is used for this. If that is also not
specified, the clock rate is auto-detected from the CPU clock
(but only on platforms where this is possible). New device
trees should specify all applicable clocks by name - the
fallbacks to an unnamed clock or to CPU clock are only for
backward compatibility.
- clocks: Phandles to input clocks matching clock-names. Refer to common
clock bindings.
- axistream-connected: Reference to another node which contains the resources
for the AXI DMA controller used by this device.
If this is specified, the DMA-related resources from that
device (DMA registers and DMA TX/RX interrupts) rather
than this one will be used.
- mdio : Child node for MDIO bus. Must be defined if PHY access is
required through the core's MDIO interface (i.e. always,
unless the PHY is accessed through a different bus).
Non-standard MDIO bus frequency is supported via
"clock-frequency", see mdio.yaml.
- pcs-handle: Phandle to the internal PCS/PMA PHY in SGMII or 1000Base-X
modes, where "pcs-handle" should be used to point
to the PCS/PMA PHY, and "phy-handle" should point to an
external PHY if exists.
Example:
axi_ethernet_eth: ethernet@40c00000 {
compatible = "xlnx,axi-ethernet-1.00.a";
device_type = "network";
interrupt-parent = <&microblaze_0_axi_intc>;
interrupts = <2 0 1>;
clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk";
clocks = <&axi_clk>, <&axi_clk>, <&pl_enet_ref_clk>, <&mgt_clk>;
phy-mode = "mii";
reg = <0x40c00000 0x40000 0x50c00000 0x40000>;
xlnx,rxcsum = <0x2>;
xlnx,rxmem = <0x800>;
xlnx,txcsum = <0x2>;
phy-handle = <&phy0>;
axi_ethernetlite_0_mdio: mdio {
#address-cells = <1>;
#size-cells = <0>;
phy0: phy@0 {
device_type = "ethernet-phy";
reg = <1>;
};
};
};

@ -0,0 +1,183 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/xlnx,axi-ethernet.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: AXI 1G/2.5G Ethernet Subsystem
description: |
Also called AXI 1G/2.5G Ethernet Subsystem, the xilinx axi ethernet IP core
provides connectivity to an external ethernet PHY supporting different
interfaces: MII, GMII, RGMII, SGMII, 1000BaseX. It also includes two
segments of memory for buffering TX and RX, as well as the capability of
offloading TX/RX checksum calculation off the processor.
Management configuration is done through the AXI interface, while payload is
sent and received by means of an AXI DMA controller. This driver
includes the DMA driver code, so this driver is incompatible with the AXI DMA
driver.
maintainers:
- Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
properties:
compatible:
enum:
- xlnx,axi-ethernet-1.00.a
- xlnx,axi-ethernet-1.01.a
- xlnx,axi-ethernet-2.01.a
reg:
description:
Address and length of the IO space, as well as the address
and length of the AXI DMA controller IO space, unless
axistream-connected is specified, in which case the reg
attribute of the node referenced by it is used.
maxItems: 2
interrupts:
items:
- description: Ethernet core interrupt
- description: Tx DMA interrupt
- description: Rx DMA interrupt
description:
Ethernet core interrupt is optional. If the axistream-connected property is
present, the DMA node should contain the TX/RX DMA interrupts; otherwise the
DMA interrupt resources are listed on the ethernet node.
minItems: 1
phy-handle: true
xlnx,rxmem:
description:
Set to allocated memory buffer for Rx/Tx in the hardware.
$ref: /schemas/types.yaml#/definitions/uint32
phy-mode:
enum:
- mii
- gmii
- rgmii
- sgmii
- 1000BaseX
xlnx,phy-type:
description:
Do not use, but still accepted in preference to phy-mode.
deprecated: true
$ref: /schemas/types.yaml#/definitions/uint32
xlnx,txcsum:
description:
TX checksum offload. 0 or empty for disabling TX checksum offload,
1 to enable partial TX checksum offload and 2 to enable full TX
checksum offload.
$ref: /schemas/types.yaml#/definitions/uint32
enum: [0, 1, 2]
xlnx,rxcsum:
description:
RX checksum offload. 0 or empty for disabling RX checksum offload,
1 to enable partial RX checksum offload and 2 to enable full RX
checksum offload.
$ref: /schemas/types.yaml#/definitions/uint32
enum: [0, 1, 2]
xlnx,switch-x-sgmii:
type: boolean
description:
Indicates that the Ethernet core is configured to support both 1000BaseX and
SGMII modes. If set, the phy-mode should be set to match the mode
selected on core reset (i.e. by the basex_or_sgmii core input line).
clocks:
items:
- description: Clock for AXI register slave interface.
- description: AXI4-Stream clock for TXD RXD TXC and RXS interfaces.
- description: Ethernet reference clock, used by signal delay primitives
and transceivers.
- description: MGT reference clock (used by optional internal PCS/PMA PHY)
clock-names:
items:
- const: s_axi_lite_clk
- const: axis_clk
- const: ref_clk
- const: mgt_clk
axistream-connected:
$ref: /schemas/types.yaml#/definitions/phandle
description: Phandle of AXI DMA controller which contains the resources
used by this device. If this is specified, the DMA-related resources
from that device (DMA registers and DMA TX/RX interrupts) rather than
this one will be used.
mdio:
type: object
pcs-handle:
description: Phandle to the internal PCS/PMA PHY in SGMII or 1000Base-X
modes, where "pcs-handle" should be used to point to the PCS/PMA PHY,
and "phy-handle" should point to an external PHY if one exists.
maxItems: 1
required:
- compatible
- interrupts
- reg
- xlnx,rxmem
- phy-handle
allOf:
- $ref: /schemas/net/ethernet-controller.yaml#
additionalProperties: false
examples:
- |
axi_ethernet_eth: ethernet@40c00000 {
compatible = "xlnx,axi-ethernet-1.00.a";
interrupts = <2 0 1>;
clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk";
clocks = <&axi_clk>, <&axi_clk>, <&pl_enet_ref_clk>, <&mgt_clk>;
phy-mode = "mii";
reg = <0x40c00000 0x40000>,<0x50c00000 0x40000>;
xlnx,rxcsum = <0x2>;
xlnx,rxmem = <0x800>;
xlnx,txcsum = <0x2>;
phy-handle = <&phy0>;
mdio {
#address-cells = <1>;
#size-cells = <0>;
phy0: ethernet-phy@1 {
device_type = "ethernet-phy";
reg = <1>;
};
};
};
- |
axi_ethernet_eth1: ethernet@40000000 {
compatible = "xlnx,axi-ethernet-1.00.a";
interrupts = <0>;
clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk";
clocks = <&axi_clk>, <&axi_clk>, <&pl_enet_ref_clk>, <&mgt_clk>;
phy-mode = "mii";
reg = <0x00 0x40000000 0x00 0x40000>;
xlnx,rxcsum = <0x2>;
xlnx,rxmem = <0x800>;
xlnx,txcsum = <0x2>;
phy-handle = <&phy1>;
axistream-connected = <&dma>;
mdio {
#address-cells = <1>;
#size-cells = <0>;
phy1: ethernet-phy@1 {
device_type = "ethernet-phy";
reg = <1>;
};
};
};

@ -73,6 +73,22 @@ Writing clock drivers
class driver, since the lock may also be needed by the clock
driver's interrupt service routine.
PTP hardware clock requirements for '.adjphase'
-----------------------------------------------
The 'struct ptp_clock_info' interface has a '.adjphase' function.
Implementing this function places the following requirements on the PHC:

* The PHC implements a servo algorithm internally that is used to
  correct the offset passed in the '.adjphase' call.
* When other PTP adjustment functions are called, the PHC servo
  algorithm is disabled.

**NOTE:** '.adjphase' is not a simple time adjustment functionality
that 'jumps' the PHC clock time based on the provided offset. It
should correct the offset provided using an internal algorithm.
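
The difference between stepping the clock and handing the offset to a servo can be sketched with a toy PHC. Everything here is hypothetical: the struct and functions are not the ``struct ptp_clock_info`` API, and the servo is a simple slew-limited corrector chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SLEW_NS 100 /* hypothetical per-tick correction limit */

struct phc_model {
	int64_t offset_ns;  /* offset the internal servo still has to remove */
	bool servo_enabled;
};

/* '.adjphase': pass the offset to the servo instead of jumping the clock. */
void phc_adjphase(struct phc_model *phc, int32_t phase_ns)
{
	phc->offset_ns = phase_ns;
	phc->servo_enabled = true;
}

/* Any other adjustment (here a time step) disables the servo. */
void phc_adjtime(struct phc_model *phc, int64_t delta_ns)
{
	(void)delta_ns; /* the step itself is not modelled */
	phc->servo_enabled = false;
}

/* One servo iteration: slew by at most SKETCH_MAX_SLEW_NS. */
void phc_servo_tick(struct phc_model *phc)
{
	int64_t step = phc->offset_ns;

	if (!phc->servo_enabled)
		return;
	if (step > SKETCH_MAX_SLEW_NS)
		step = SKETCH_MAX_SLEW_NS;
	else if (step < -SKETCH_MAX_SLEW_NS)
		step = -SKETCH_MAX_SLEW_NS;
	phc->offset_ns -= step;
}
```

An offset of 250 ns is removed over three ticks (150, 50, 0), whereas a call to ``phc_adjtime()`` leaves any remaining offset uncorrected.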
Supported hardware
==================
@ -106,3 +122,16 @@ Supported hardware
- LPF settings (bandwidth, phase limiting, automatic holdover, physical layer assist (per ITU-T G.8273.2))
- Programmable output PTP clocks, any frequency up to 1GHz (to other PHY/MAC time stampers, refclk to ASSPs/SoCs/FPGAs)
- Lock to GNSS input, automatic switching between GNSS and user-space PHC control (optional)
* NVIDIA Mellanox
- GPIO
- Certain variants of ConnectX-6 Dx and later products support one
GPIO which can time stamp external triggers and one GPIO to produce
periodic signals.
- Certain variants of ConnectX-5 and older products support one GPIO,
configured to either time stamp external triggers or produce
periodic signals.
- PHC instances
- All ConnectX devices have a free-running counter
- ConnectX-6 Dx and later devices have a UTC format counter

@ -521,8 +521,6 @@ prototypes::
int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
loff_t *, int);
unsigned long (*get_unmapped_area)(struct file *, unsigned long,
unsigned long, unsigned long, unsigned long);
int (*check_flags)(int);

@ -1086,7 +1086,6 @@ This describes how the VFS can manipulate an open file. As of kernel
int (*fsync) (struct file *, loff_t, loff_t, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
int (*check_flags)(int);
int (*flock) (struct file *, int, struct file_lock *);

@ -169,6 +169,87 @@ Setting the brightness to zero with brightness_set() callback function
should completely turn off the LED and cancel the previously programmed
hardware blinking function, if any.
Hardware driven LEDs
====================
Some LEDs can be programmed to be driven by hardware. This is not
limited to blinking: such LEDs can also be turned off or on autonomously.

To support this feature, an LED needs to implement various additional
ops and needs to declare specific support for the supported triggers.

"hw control" refers to the LED being driven by hardware.

The LED driver must define the following value to support hw control:

    - hw_control_trigger:
                unique trigger name supported by the LED in hw control
                mode.
The LED driver must implement the following API to support hw control:

    - hw_control_is_supported:
                check if the flags passed by the supported trigger can
                be parsed and activate hw control on the LED.

                Return 0 if the passed flags mask is supported and
                can be set with hw_control_set().

                If the passed flags mask is not supported, -EOPNOTSUPP
                must be returned; the LED trigger will then use a
                software fallback.

                Return a negative error in case of any other error,
                such as device not ready or timeouts.
    - hw_control_set:
                activate hw control. The LED driver parses the flags
                passed from the supported trigger into a set of modes
                and sets up the LED to be driven by hardware following
                the requested modes.

                Set LED_OFF via brightness_set to deactivate hw control.

                Return 0 on success, a negative error number on failing to
                apply flags.
- hw_control_get:
get active modes from an LED already in hw control, parse
them and set in flags the current active flags for the
supported trigger.
Return 0 on success, a negative error number on failure
to parse the initial mode.
An error from this function is NOT FATAL, as the device may
be in an initial state that the attached LED trigger does
not support.
- hw_control_get_device:
return the device associated with the LED driver in
hw control. A trigger might use this to match the
returned device from this function with a configured
device for the trigger as the source for blinking
events and correctly enable hw control.
(For example, a netdev trigger configured to blink for a
particular dev checks that it matches the dev returned by
get_device before setting hw control.)
Returns a pointer to a struct device or NULL if nothing
is currently attached.
The LED driver can activate additional modes by default to work around the
impossibility of supporting each different mode on the supported trigger.
Examples are hardcoding the blink speed to a set interval, or enabling special
features like bypassing blink if some requirements are not met.
A trigger should first check that the LED driver supports the hw control API
and that the trigger itself is supported, to verify that hw control is possible;
then use hw_control_is_supported to check that the flags are supported, and only
at the end use hw_control_set to activate hw control.
A trigger can use hw_control_get to check if an LED is already in hw control
and initialize its flags.
When the LED is in hw control, no software blink is possible; attempting
one will effectively disable hw control.
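The check order above (API present, trigger supported, flags accepted, then activate) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct hyp_led`, the `HYP_MODE_*` bits and the `drv_*` and `trigger_try_hw_control()` helpers are hypothetical stand-ins for the LED class device ops described above, and a return value of 1 is this sketch's own convention for "fall back to software blinking".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode bits; real triggers define their own flag layout. */
#define HYP_MODE_LINK (1UL << 0)
#define HYP_MODE_TX   (1UL << 1)
#define HYP_MODE_RX   (1UL << 2)

/* Hypothetical stand-in for an LED class device with the hw control ops. */
struct hyp_led {
	const char *hw_control_trigger;
	int (*hw_control_is_supported)(struct hyp_led *led, unsigned long flags);
	int (*hw_control_set)(struct hyp_led *led, unsigned long flags);
	int (*hw_control_get)(struct hyp_led *led, unsigned long *flags);
	unsigned long hw_flags; /* modes currently "programmed" in hardware */
	int hw_controlled;      /* 1 once hw control is active */
};

/* Driver side: this imaginary hardware can follow LINK and TX, not RX. */
static int drv_is_supported(struct hyp_led *led, unsigned long flags)
{
	(void)led;
	if (flags & ~(HYP_MODE_LINK | HYP_MODE_TX))
		return -EOPNOTSUPP;
	return 0;
}

static int drv_set(struct hyp_led *led, unsigned long flags)
{
	led->hw_flags = flags;
	led->hw_controlled = 1;
	return 0;
}

static int drv_get(struct hyp_led *led, unsigned long *flags)
{
	*flags = led->hw_flags;
	return 0;
}

/* Trigger side, in the order described above. Returns 0 if hw control was
 * activated, 1 if the trigger should use its software fallback, or a
 * negative errno on any other failure. */
static int trigger_try_hw_control(struct hyp_led *led, const char *trigger,
				  unsigned long flags)
{
	int ret;

	assert(led != NULL && trigger != NULL);
	/* 1. Does the driver implement the hw control API at all? */
	if (!led->hw_control_is_supported || !led->hw_control_set ||
	    !led->hw_control_get)
		return 1;
	/* 2. Is this trigger the one the LED supports in hw control? */
	if (!led->hw_control_trigger ||
	    strcmp(led->hw_control_trigger, trigger) != 0)
		return 1;
	/* 3. Can the requested flags be offloaded? */
	ret = led->hw_control_is_supported(led, flags);
	if (ret == -EOPNOTSUPP)
		return 1;
	if (ret)
		return ret;
	/* 4. Only now activate hw control. */
	return led->hw_control_set(led, flags);
}
```

Keeping `hw_control_set()` as the last step means a rejected flag mask never leaves the LED half-configured.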
Known Issues
============

@ -195,6 +195,16 @@ properties:
description: Max length for a string or a binary attribute.
$ref: '#/$defs/len-or-define'
sub-type: *attr-type
display-hint: &display-hint
description: |
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
# Start genetlink-c
name-prefix:
type: string
# End genetlink-c
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
dependencies:

@ -119,9 +119,24 @@ properties:
name:
type: string
type:
enum: [ u8, u16, u32, u64, s8, s16, s32, s64, string ]
description: The netlink attribute type
enum: [ u8, u16, u32, u64, s8, s16, s32, s64, string, binary ]
len:
$ref: '#/$defs/len-or-define'
byte-order:
enum: [ little-endian, big-endian ]
doc:
description: Documentation for the struct member attribute.
type: string
enum:
description: Name of the enum type used for the attribute.
type: string
display-hint: &display-hint
description: |
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
# End genetlink-legacy
attribute-sets:
@ -171,6 +186,7 @@ properties:
name:
type: string
type: &attr-type
description: The netlink attribute type
enum: [ unused, pad, flag, binary, u8, u16, u32, u64, s32, s64,
string, nest, array-nest, nest-type-value ]
doc:
@ -218,6 +234,11 @@ properties:
description: Max length for a string or a binary attribute.
$ref: '#/$defs/len-or-define'
sub-type: *attr-type
display-hint: *display-hint
# Start genetlink-c
name-prefix:
type: string
# End genetlink-c
# Start genetlink-legacy
struct:
description: Name of the struct type used for the attribute.

@ -168,6 +168,12 @@ properties:
description: Max length for a string or a binary attribute.
$ref: '#/$defs/len-or-define'
sub-type: *attr-type
display-hint: &display-hint
description: |
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
dependencies:
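`display-hint` only changes how a value is shown, not how it is encoded. As a rough illustration, a pretty-printer could render the `ipv4`, `mac` and `hex` hints as below. The `fmt_*` and `format_by_hint()` helpers are hypothetical and not part of any generated library; the `ipv4` case assumes the u32 payload is in network byte order, as the OVS spec's `byte-order: big-endian` members are. The `fddi`, `ipv6` and `uuid` hints are left unhandled in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* display-hint ipv4: 4-byte payload, assumed network byte order. */
static void fmt_ipv4(const uint8_t be[4], char *out, size_t outlen)
{
	snprintf(out, outlen, "%hhu.%hhu.%hhu.%hhu", be[0], be[1], be[2], be[3]);
}

/* display-hint mac: 6-byte binary payload. */
static void fmt_mac(const uint8_t b[6], char *out, size_t outlen)
{
	snprintf(out, outlen, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
		 b[0], b[1], b[2], b[3], b[4], b[5]);
}

/* display-hint hex: arbitrary-length binary payload. */
static void fmt_hex(const uint8_t *b, size_t len, char *out, size_t outlen)
{
	size_t i;

	assert(outlen >= 2 * len + 1);
	out[0] = '\0';
	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, outlen - 2 * i, "%02hhx", b[i]);
}

/* Dispatch on the hint name from the spec; returns -1 for unhandled hints
 * or payload lengths that do not fit the hint. */
static int format_by_hint(const char *hint, const uint8_t *b, size_t len,
			  char *out, size_t outlen)
{
	if (strcmp(hint, "ipv4") == 0 && len == 4)
		fmt_ipv4(b, out, outlen);
	else if (strcmp(hint, "mac") == 0 && len == 6)
		fmt_mac(b, out, outlen);
	else if (strcmp(hint, "hex") == 0)
		fmt_hex(b, len, out, outlen);
	else
		return -1;
	return 0;
}
```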

@ -9,6 +9,7 @@ doc: Partial family for Devlink.
attribute-sets:
-
name: devlink
name-prefix: devlink-attr-
attributes:
-
name: bus-name
@ -95,10 +96,12 @@ attribute-sets:
-
name: reload-action-info
type: nest
multi-attr: true
nested-attributes: dl-reload-act-info
-
name: reload-action-stats
type: nest
multi-attr: true
nested-attributes: dl-reload-act-stats
-
name: dl-dev-stats
@ -196,3 +199,8 @@ operations:
attributes:
- bus-name
- dev-name
- info-driver-name
- info-serial-number
- info-version-fixed
- info-version-running
- info-version-stored

@ -9,8 +9,13 @@ doc: Partial family for Ethtool Netlink.
definitions:
-
name: udp-tunnel-type
enum-name:
type: enum
entries: [ vxlan, geneve, vxlan-gpe ]
-
name: stringset
type: enum
entries: []
attribute-sets:
-
@ -497,7 +502,7 @@ attribute-sets:
attributes:
-
name: pad
type: u32
type: pad
-
name: tx-frames
type: u64
@ -577,7 +582,7 @@ attribute-sets:
name: phc-index
type: u32
-
name: cable-test-ntf-nest-result
name: cable-result
attributes:
-
name: pair
@ -586,7 +591,7 @@ attribute-sets:
name: code
type: u8
-
name: cable-test-ntf-nest-fault-length
name: cable-fault-length
attributes:
-
name: pair
@ -595,18 +600,25 @@ attribute-sets:
name: cm
type: u32
-
name: cable-test-ntf-nest
name: cable-nest
attributes:
-
name: result
type: nest
nested-attributes: cable-test-ntf-nest-result
nested-attributes: cable-result
-
name: fault-length
type: nest
nested-attributes: cable-test-ntf-nest-fault-length
nested-attributes: cable-fault-length
-
name: cable-test
attributes:
-
name: header
type: nest
nested-attributes: header
-
name: cable-test-ntf
attributes:
-
name: header
@ -618,7 +630,7 @@ attribute-sets:
-
name: nest
type: nest
nested-attributes: cable-test-ntf-nest
nested-attributes: cable-nest
-
name: cable-test-tdr-cfg
attributes:
@ -632,8 +644,22 @@ attribute-sets:
name: step
type: u32
-
name: pari
name: pair
type: u8
-
name: cable-test-tdr-ntf
attributes:
-
name: header
type: nest
nested-attributes: header
-
name: status
type: u8
-
name: nest
type: nest
nested-attributes: cable-nest
-
name: cable-test-tdr
attributes:
@ -646,7 +672,7 @@ attribute-sets:
type: nest
nested-attributes: cable-test-tdr-cfg
-
name: tunnel-info-udp-entry
name: tunnel-udp-entry
attributes:
-
name: port
@ -657,7 +683,7 @@ attribute-sets:
type: u32
enum: udp-tunnel-type
-
name: tunnel-info-udp-table
name: tunnel-udp-table
attributes:
-
name: size
@ -667,9 +693,17 @@ attribute-sets:
type: nest
nested-attributes: bitset
-
name: udp-ports
name: entry
type: nest
nested-attributes: tunnel-info-udp-entry
multi-attr: true
nested-attributes: tunnel-udp-entry
-
name: tunnel-udp
attributes:
-
name: table
type: nest
nested-attributes: tunnel-udp-table
-
name: tunnel-info
attributes:
@ -680,13 +714,13 @@ attribute-sets:
-
name: udp-ports
type: nest
nested-attributes: tunnel-info-udp-table
nested-attributes: tunnel-udp
-
name: fec-stat
attributes:
-
name: pad
type: u8
type: pad
-
name: corrected
type: binary
@ -750,7 +784,7 @@ attribute-sets:
attributes:
-
name: pad
type: u32
type: pad
-
name: id
type: u32
@ -759,16 +793,29 @@ attribute-sets:
type: u32
-
name: stat
type: nest
nested-attributes: u64
type: u64
type-value: [ id ]
-
name: hist-rx
type: nest
nested-attributes: u64
nested-attributes: stats-grp-hist
-
name: hist-tx
type: nest
nested-attributes: u64
nested-attributes: stats-grp-hist
-
name: hist-bkt-low
type: u32
-
name: hist-bkt-hi
type: u32
-
name: hist-val
type: u64
-
name: stats-grp-hist
subset-of: stats-grp
attributes:
-
name: hist-bkt-low
type: u32
@ -783,7 +830,7 @@ attribute-sets:
attributes:
-
name: pad
type: u32
type: pad
-
name: header
type: nest
@ -836,12 +883,15 @@ attribute-sets:
-
name: admin-state
type: u32
name-prefix: ethtool-a-podl-pse-
-
name: admin-control
type: u32
name-prefix: ethtool-a-podl-pse-
-
name: pw-d-status
type: u32
name-prefix: ethtool-a-podl-pse-
-
name: rss
attributes:
@ -895,6 +945,7 @@ attribute-sets:
operations:
enum-model: directional
name-prefix: ethtool-msg-
list:
-
name: strset-get
@ -1348,10 +1399,16 @@ operations:
request:
attributes:
- header
reply:
attributes:
- header
- cable-test-ntf-nest
-
name: cable-test-ntf
doc: Cable test notification.
attribute-set: cable-test-ntf
event:
attributes:
- header
- status
-
name: cable-test-tdr-act
doc: Cable test TDR.
@ -1362,10 +1419,17 @@ operations:
request:
attributes:
- header
reply:
attributes:
- header
- cable-test-tdr-cfg
-
name: cable-test-tdr-ntf
doc: Cable test TDR notification.
attribute-set: cable-test-tdr-ntf
event:
attributes:
- header
- status
- nest
-
name: tunnel-info-get
doc: Get tsinfo params.

@ -3,6 +3,7 @@
name: ovs_datapath
version: 2
protocol: genetlink-legacy
uapi-header: linux/openvswitch.h
doc:
OVS datapath configuration over generic netlink.
@ -18,6 +19,7 @@ definitions:
-
name: user-features
type: flags
name-prefix: ovs-dp-f-
entries:
-
name: unaligned
@ -33,35 +35,37 @@ definitions:
doc: Allow per-cpu dispatch of upcalls
-
name: datapath-stats
enum-name: ovs-dp-stats
type: struct
members:
-
name: hit
name: n-hit
type: u64
-
name: missed
name: n-missed
type: u64
-
name: lost
name: n-lost
type: u64
-
name: flows
name: n-flows
type: u64
-
name: megaflow-stats
enum-name: ovs-dp-megaflow-stats
type: struct
members:
-
name: mask-hit
name: n-mask-hit
type: u64
-
name: masks
name: n-masks
type: u32
-
name: padding
type: u32
-
name: cache-hits
name: n-cache-hit
type: u64
-
name: pad1
@ -70,6 +74,8 @@ definitions:
attribute-sets:
-
name: datapath
name-prefix: ovs-dp-attr-
enum-name: ovs-datapath-attrs
attributes:
-
name: name
@ -101,12 +107,16 @@ attribute-sets:
name: per-cpu-pids
type: binary
sub-type: u32
-
name: ifindex
type: u32
operations:
fixed-header: ovs-header
name-prefix: ovs-dp-cmd-
list:
-
name: dp-get
name: get
doc: Get / dump OVS data path configuration and state
value: 3
attribute-set: datapath
@ -125,7 +135,7 @@ operations:
- per-cpu-pids
dump: *dp-get-op
-
name: dp-new
name: new
doc: Create new OVS data path
value: 1
attribute-set: datapath
@ -137,7 +147,7 @@ operations:
- upcall-pid
- user-features
-
name: dp-del
name: del
doc: Delete existing OVS data path
value: 2
attribute-set: datapath
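The renamed `datapath-stats` members (`n-hit`, `n-missed`, `n-lost`, `n-flows`) appear intended to match the C field names of `struct ovs_dp_stats` in the `linux/openvswitch.h` header the spec now names. The sketch below uses a local mirror instead of including that header; `struct dp_stats_mirror` and `dp_hit_ratio()` are hypothetical, and the assumption that the mirror matches the kernel layout is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the "datapath-stats" struct members listed above
 * (assumed to match struct ovs_dp_stats; not checked against the header). */
struct dp_stats_mirror {
	uint64_t n_hit;    /* flow table hits */
	uint64_t n_missed; /* flow table misses */
	uint64_t n_lost;   /* misses that were lost */
	uint64_t n_flows;  /* flows currently installed */
};

/* Fraction of flow table lookups that hit; 0.0 when nothing was looked up. */
static double dp_hit_ratio(const struct dp_stats_mirror *s)
{
	uint64_t lookups;

	assert(s != NULL);
	lookups = s->n_hit + s->n_missed;
	return lookups ? (double)s->n_hit / (double)lookups : 0.0;
}
```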

@ -0,0 +1,980 @@
# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
name: ovs_flow
version: 1
protocol: genetlink-legacy
uapi-header: linux/openvswitch.h
doc:
OVS flow configuration over generic netlink.
definitions:
-
name: ovs-header
type: struct
doc: |
Header for OVS Generic Netlink messages.
members:
-
name: dp-ifindex
type: u32
doc: |
ifindex of local port for datapath (0 to make a request not specific
to a datapath).
-
name: ovs-flow-stats
type: struct
members:
-
name: n-packets
type: u64
doc: Number of matched packets.
-
name: n-bytes
type: u64
doc: Number of matched bytes.
-
name: ovs-key-ethernet
type: struct
members:
-
name: eth-src
type: binary
len: 6
display-hint: mac
-
name: eth-dst
type: binary
len: 6
display-hint: mac
-
name: ovs-key-mpls
type: struct
members:
-
name: mpls-lse
type: u32
byte-order: big-endian
-
name: ovs-key-ipv4
type: struct
members:
-
name: ipv4-src
type: u32
byte-order: big-endian
display-hint: ipv4
-
name: ipv4-dst
type: u32
byte-order: big-endian
display-hint: ipv4
-
name: ipv4-proto
type: u8
-
name: ipv4-tos
type: u8
-
name: ipv4-ttl
type: u8
-
name: ipv4-frag
type: u8
enum: ovs-frag-type
-
name: ovs-key-ipv6
type: struct
members:
-
name: ipv6-src
type: binary
len: 16
byte-order: big-endian
display-hint: ipv6
-
name: ipv6-dst
type: binary
len: 16
byte-order: big-endian
display-hint: ipv6
-
name: ipv6-label
type: u32
byte-order: big-endian
-
name: ipv6-proto
type: u8
-
name: ipv6-tclass
type: u8
-
name: ipv6-hlimit
type: u8
-
name: ipv6-frag
type: u8
-
name: ovs-key-ipv6-exthdrs
type: struct
members:
-
name: hdrs
type: u16
-
name: ovs-frag-type
name-prefix: ovs-frag-type-
type: enum
entries:
-
name: none
doc: Packet is not a fragment.
-
name: first
doc: Packet is a fragment with offset 0.
-
name: later
doc: Packet is a fragment with nonzero offset.
-
name: any
value: 255
-
name: ovs-key-tcp
type: struct
members:
-
name: tcp-src
type: u16
byte-order: big-endian
-
name: tcp-dst
type: u16
byte-order: big-endian
-
name: ovs-key-udp
type: struct
members:
-
name: udp-src
type: u16
byte-order: big-endian
-
name: udp-dst
type: u16
byte-order: big-endian
-
name: ovs-key-sctp
type: struct
members:
-
name: sctp-src
type: u16
byte-order: big-endian
-
name: sctp-dst
type: u16
byte-order: big-endian
-
name: ovs-key-icmp
type: struct
members:
-
name: icmp-type
type: u8
-
name: icmp-code
type: u8
-
name: ovs-key-arp
type: struct
members:
-
name: arp-sip
type: u32
byte-order: big-endian
-
name: arp-tip
type: u32
byte-order: big-endian
-
name: arp-op
type: u16
byte-order: big-endian
-
name: arp-sha
type: binary
len: 6
display-hint: mac
-
name: arp-tha
type: binary
len: 6
display-hint: mac
-
name: ovs-key-nd
type: struct
members:
-
name: nd_target
type: binary
len: 16
byte-order: big-endian
-
name: nd-sll
type: binary
len: 6
display-hint: mac
-
name: nd-tll
type: binary
len: 6
display-hint: mac
-
name: ovs-key-ct-tuple-ipv4
type: struct
members:
-
name: ipv4-src
type: u32
byte-order: big-endian
-
name: ipv4-dst
type: u32
byte-order: big-endian
-
name: src-port
type: u16
byte-order: big-endian
-
name: dst-port
type: u16
byte-order: big-endian
-
name: ipv4-proto
type: u8
-
name: ovs-action-push-vlan
type: struct
members:
-
name: vlan_tpid
type: u16
byte-order: big-endian
doc: Tag protocol identifier (TPID) to push.
-
name: vlan_tci
type: u16
byte-order: big-endian
doc: Tag control identifier (TCI) to push.
-
name: ovs-ufid-flags
name-prefix: ovs-ufid-f-
type: flags
entries:
- omit-key
- omit-mask
- omit-actions
-
name: ovs-action-hash
type: struct
members:
-
name: hash-alg
type: u32
doc: Algorithm used to compute hash prior to recirculation.
-
name: hash-basis
type: u32
doc: Basis used for computing hash.
-
name: ovs-hash-alg
type: enum
doc: |
Data path hash algorithm for computing Datapath hash. The algorithm type only specifies
the fields in a flow that will be used as part of the hash. Each datapath is free to use its
own hash algorithm. The hash value will be opaque to the user space daemon.
entries:
- ovs-hash-alg-l4
-
name: ovs-action-push-mpls
type: struct
members:
-
name: mpls-lse
type: u32
byte-order: big-endian
doc: |
MPLS label stack entry to push
-
name: mpls-ethertype
type: u32
byte-order: big-endian
doc: |
Ethertype to set in the encapsulating ethernet frame. The only values
ethertype should ever be given are ETH_P_MPLS_UC and ETH_P_MPLS_MC,
indicating MPLS unicast or multicast. Others are rejected.
-
name: ovs-action-add-mpls
type: struct
members:
-
name: mpls-lse
type: u32
byte-order: big-endian
doc: |
MPLS label stack entry to push
-
name: mpls-ethertype
type: u32
byte-order: big-endian
doc: |
Ethertype to set in the encapsulating ethernet frame. The only values
ethertype should ever be given are ETH_P_MPLS_UC and ETH_P_MPLS_MC,
indicating MPLS unicast or multicast. Others are rejected.
-
name: tun-flags
type: u16
doc: |
MPLS tunnel attributes.
-
name: ct-state-flags
type: flags
name-prefix: ovs-cs-f-
entries:
-
name: new
doc: Beginning of a new connection.
-
name: established
doc: Part of an existing connection.
-
name: related
doc: Related to an existing connection.
-
name: reply-dir
doc: Flow is in the reply direction.
-
name: invalid
doc: Could not track the connection.
-
name: tracked
doc: Conntrack has occurred.
-
name: src-nat
doc: Packet's source address/port was mangled by NAT.
-
name: dst-nat
doc: Packet's destination address/port was mangled by NAT.
attribute-sets:
-
name: flow-attrs
enum-name: ovs-flow-attr
name-prefix: ovs-flow-attr-
attributes:
-
name: key
type: nest
nested-attributes: key-attrs
doc: |
Nested attributes specifying the flow key. Always present in
notifications. Required for all requests (except dumps).
-
name: actions
type: nest
nested-attributes: action-attrs
doc: |
Nested attributes specifying the actions to take for packets that
match the key. Always present in notifications. Required for
OVS_FLOW_CMD_NEW requests, optional for OVS_FLOW_CMD_SET requests. An
OVS_FLOW_CMD_SET without OVS_FLOW_ATTR_ACTIONS will not modify the
actions. To clear the actions, an OVS_FLOW_ATTR_ACTIONS without any
nested attributes must be given.
-
name: stats
type: binary
struct: ovs-flow-stats
doc: |
Statistics for this flow. Present in notifications if the stats would
be nonzero. Ignored in requests.
-
name: tcp-flags
type: u8
doc: |
An 8-bit value giving the ORed value of all of the TCP flags seen on
packets in this flow. Only present in notifications for TCP flows, and
only if it would be nonzero. Ignored in requests.
-
name: used
type: u64
doc: |
A 64-bit integer giving the time, in milliseconds on the system
monotonic clock, at which a packet was last processed for this
flow. Only present in notifications if a packet has been processed for
this flow. Ignored in requests.
-
name: clear
type: flag
doc: |
If present in an OVS_FLOW_CMD_SET request, clears the last-used time,
accumulated TCP flags, and statistics for this flow. Otherwise
ignored in requests. Never present in notifications.
-
name: mask
type: nest
nested-attributes: key-attrs
doc: |
Nested attributes specifying the mask bits for wildcarded flow
match. Mask bit value '1' specifies exact match with corresponding
flow key bit, while mask bit value '0' specifies a wildcarded
match. Omitting attribute is treated as wildcarding all corresponding
fields. Optional for all requests. If not present, all flow key bits
are exact match bits.
-
name: probe
type: binary
doc: |
Flow operation is a feature probe, error logging should be suppressed.
-
name: ufid
type: binary
doc: |
A value between 1-16 octets specifying a unique identifier for the
flow. Causes the flow to be indexed by this value rather than the
value of the OVS_FLOW_ATTR_KEY attribute. Optional for all
requests. Present in notifications if the flow was created with this
attribute.
display-hint: uuid
-
name: ufid-flags
type: u32
enum: ovs-ufid-flags
doc: |
A 32-bit value of ORed flags that provide alternative semantics for
flow installation and retrieval. Optional for all requests.
-
name: pad
type: binary
-
name: key-attrs
enum-name: ovs-key-attr
name-prefix: ovs-key-attr-
attributes:
-
name: encap
type: nest
nested-attributes: key-attrs
-
name: priority
type: u32
-
name: in-port
type: u32
-
name: ethernet
type: binary
struct: ovs-key-ethernet
doc: struct ovs_key_ethernet
-
name: vlan
type: u16
byte-order: big-endian
-
name: ethertype
type: u16
byte-order: big-endian
-
name: ipv4
type: binary
struct: ovs-key-ipv4
-
name: ipv6
type: binary
struct: ovs-key-ipv6
doc: struct ovs_key_ipv6
-
name: tcp
type: binary
struct: ovs-key-tcp
-
name: udp
type: binary
struct: ovs-key-udp
-
name: icmp
type: binary
struct: ovs-key-icmp
-
name: icmpv6
type: binary
struct: ovs-key-icmp
-
name: arp
type: binary
struct: ovs-key-arp
doc: struct ovs_key_arp
-
name: nd
type: binary
struct: ovs-key-nd
doc: struct ovs_key_nd
-
name: skb-mark
type: u32
-
name: tunnel
type: nest
nested-attributes: tunnel-key-attrs
-
name: sctp
type: binary
struct: ovs-key-sctp
-
name: tcp-flags
type: u16
byte-order: big-endian
-
name: dp-hash
type: u32
doc: Value 0 indicates the hash is not computed by the datapath.
-
name: recirc-id
type: u32
-
name: mpls
type: binary
struct: ovs-key-mpls
-
name: ct-state
type: u32
enum: ct-state-flags
enum-as-flags: true
-
name: ct-zone
type: u16
doc: connection tracking zone
-
name: ct-mark
type: u32
doc: connection tracking mark
-
name: ct-labels
type: binary
display-hint: hex
doc: 16-octet connection tracking label
-
name: ct-orig-tuple-ipv4
type: binary
struct: ovs-key-ct-tuple-ipv4
-
name: ct-orig-tuple-ipv6
type: binary
doc: struct ovs_key_ct_tuple_ipv6
-
name: nsh
type: nest
nested-attributes: ovs-nsh-key-attrs
-
name: packet-type
type: u32
byte-order: big-endian
doc: Should not be sent to the kernel
-
name: nd-extensions
type: binary
doc: Should not be sent to the kernel
-
name: tunnel-info
type: binary
doc: struct ip_tunnel_info
-
name: ipv6-exthdrs
type: binary
struct: ovs-key-ipv6-exthdrs
doc: struct ovs_key_ipv6_exthdr
-
name: action-attrs
enum-name: ovs-action-attr
name-prefix: ovs-action-attr-
attributes:
-
name: output
type: u32
doc: ovs port number in datapath
-
name: userspace
type: nest
nested-attributes: userspace-attrs
-
name: set
type: nest
nested-attributes: key-attrs
doc: Replaces the contents of an existing header. The single nested attribute specifies a header to modify and its value.
-
name: push-vlan
type: binary
struct: ovs-action-push-vlan
doc: Push a new outermost 802.1Q or 802.1ad header onto the packet.
-
name: pop-vlan
type: flag
doc: Pop the outermost 802.1Q or 802.1ad header from the packet.
-
name: sample
type: nest
nested-attributes: sample-attrs
doc: |
Probabilistically executes actions, as specified in the nested attributes.
-
name: recirc
type: u32
doc: recirc id
-
name: hash
type: binary
struct: ovs-action-hash
-
name: push-mpls
type: binary
struct: ovs-action-push-mpls
doc: |
Push a new MPLS label stack entry onto the top of the packet's MPLS
label stack. Set the ethertype of the encapsulating frame to either
ETH_P_MPLS_UC or ETH_P_MPLS_MC to indicate the new packet contents.
-
name: pop-mpls
type: u16
byte-order: big-endian
doc: ethertype
-
name: set-masked
type: nest
nested-attributes: key-attrs
doc: |
Replaces the contents of an existing header. A nested attribute
specifies a header to modify, its value, and a mask. For every bit set
in the mask, the corresponding bit value is copied from the value to
the packet header field; the rest of the bits are left unchanged. The
non-masked value bits must be passed in as zeroes. Masking is not
supported for the OVS_KEY_ATTR_TUNNEL attribute.
-
name: ct
type: nest
nested-attributes: ct-attrs
doc: |
Track the connection. Populate the conntrack-related entries
in the flow key.
-
name: trunc
type: u32
doc: struct ovs_action_trunc is a u32 max length
-
name: push-eth
type: binary
doc: struct ovs_action_push_eth
-
name: pop-eth
type: flag
-
name: ct-clear
type: flag
-
name: push-nsh
type: nest
nested-attributes: ovs-nsh-key-attrs
doc: |
Push NSH header to the packet.
-
name: pop-nsh
type: flag
doc: |
Pop the outermost NSH header off the packet.
-
name: meter
type: u32
doc: |
Run packet through a meter, which may drop the packet, or modify the
packet (e.g., change the DSCP field).
-
name: clone
type: nest
nested-attributes: action-attrs
doc: |
Make a copy of the packet and execute a list of actions without
affecting the original packet and key.
-
name: check-pkt-len
type: nest
nested-attributes: check-pkt-len-attrs
doc: |
Check the packet length and execute a set of actions if greater than
the specified packet length, else execute another set of actions.
-
name: add-mpls
type: binary
struct: ovs-action-add-mpls
doc: |
Push a new MPLS label stack entry at the start of the packet or at the
start of the l3 header depending on the value of the l3 tunnel flag in the
tun_flags field of this OVS_ACTION_ATTR_ADD_MPLS argument.
-
name: dec-ttl
type: nest
nested-attributes: dec-ttl-attrs
-
name: tunnel-key-attrs
enum-name: ovs-tunnel-key-attr
name-prefix: ovs-tunnel-key-attr-
attributes:
-
name: id
type: u64
byte-order: big-endian
value: 0
-
name: ipv4-src
type: u32
byte-order: big-endian
-
name: ipv4-dst
type: u32
byte-order: big-endian
-
name: tos
type: u8
-
name: ttl
type: u8
-
name: dont-fragment
type: flag
-
name: csum
type: flag
-
name: oam
type: flag
-
name: geneve-opts
type: binary
sub-type: u32
-
name: tp-src
type: u16
byte-order: big-endian
-
name: tp-dst
type: u16
byte-order: big-endian
-
name: vxlan-opts
type: nest
nested-attributes: vxlan-ext-attrs
-
name: ipv6-src
type: binary
doc: |
struct in6_addr source IPv6 address
-
name: ipv6-dst
type: binary
doc: |
struct in6_addr destination IPv6 address
-
name: pad
type: binary
-
name: erspan-opts
type: binary
doc: |
struct erspan_metadata
-
name: ipv4-info-bridge
type: flag
-
name: check-pkt-len-attrs
enum-name: ovs-check-pkt-len-attr
name-prefix: ovs-check-pkt-len-attr-
attributes:
-
name: pkt-len
type: u16
-
name: actions-if-greater
type: nest
nested-attributes: action-attrs
-
name: actions-if-less-equal
type: nest
nested-attributes: action-attrs
-
name: sample-attrs
enum-name: ovs-sample-attr
name-prefix: ovs-sample-attr-
attributes:
-
name: probability
type: u32
-
name: actions
type: nest
nested-attributes: action-attrs
-
name: userspace-attrs
enum-name: ovs-userspace-attr
name-prefix: ovs-userspace-attr-
attributes:
-
name: pid
type: u32
-
name: userdata
type: binary
-
name: egress-tun-port
type: u32
-
name: actions
type: flag
-
name: ovs-nsh-key-attrs
enum-name: ovs-nsh-key-attr
name-prefix: ovs-nsh-key-attr-
attributes:
-
name: base
type: binary
-
name: md1
type: binary
-
name: md2
type: binary
-
name: ct-attrs
enum-name: ovs-ct-attr
name-prefix: ovs-ct-attr-
attributes:
-
name: commit
type: flag
-
name: zone
type: u16
-
name: mark
type: binary
-
name: labels
type: binary
-
name: helper
type: string
-
name: nat
type: nest
nested-attributes: nat-attrs
-
name: force-commit
type: flag
-
name: eventmask
type: u32
-
name: timeout
type: string
-
name: nat-attrs
enum-name: ovs-nat-attr
name-prefix: ovs-nat-attr-
attributes:
-
name: src
type: flag
-
name: dst
type: flag
-
name: ip-min
type: binary
-
name: ip-max
type: binary
-
name: proto-min
type: u16
-
name: proto-max
type: u16
-
name: persistent
type: flag
-
name: proto-hash
type: flag
-
name: proto-random
type: flag
-
name: dec-ttl-attrs
enum-name: ovs-dec-ttl-attr
name-prefix: ovs-dec-ttl-attr-
attributes:
-
name: action
type: nest
nested-attributes: action-attrs
-
name: vxlan-ext-attrs
enum-name: ovs-vxlan-ext-
name-prefix: ovs-vxlan-ext-
attributes:
-
name: gbp
type: u32
operations:
name-prefix: ovs-flow-cmd-
fixed-header: ovs-header
list:
-
name: get
doc: Get / dump OVS flow configuration and state
value: 3
attribute-set: flow-attrs
do: &flow-get-op
request:
attributes:
- dp-ifindex
- key
- ufid
- ufid-flags
reply:
attributes:
- dp-ifindex
- key
- ufid
- mask
- stats
- actions
dump: *flow-get-op
-
name: new
doc: Create OVS flow configuration in a data path
value: 1
attribute-set: flow-attrs
do:
request:
attributes:
- dp-ifindex
- key
- ufid
- mask
- actions
mcast-groups:
list:
-
name: ovs_flow
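The `mask` attribute and the `set-masked` action described in this spec both reduce to bitwise operations on each key field. A minimal sketch on a single 32-bit field (for example an IPv4 address) follows; `masked_match()` and `set_masked()` are hypothetical helpers, not OVS functions. Byte order does not affect these operations as long as the packet field, key, value and mask all use the same order, so the sketch works on plain `uint32_t`. Per the `mask` doc, an absent mask behaves like a mask of all ones (exact match).

```c
#include <assert.h>
#include <stdint.h>

/* Flow match with a mask: a mask bit of 1 requires the packet bit to equal
 * the key bit; a mask bit of 0 wildcards it. Returns 1 on match. */
static int masked_match(uint32_t pkt_field, uint32_t key, uint32_t mask)
{
	return ((pkt_field ^ key) & mask) == 0;
}

/* set-masked action: copy value bits where the mask is 1 and keep packet
 * bits where it is 0. The spec requires non-masked value bits to be zero. */
static uint32_t set_masked(uint32_t pkt_field, uint32_t value, uint32_t mask)
{
	assert((value & ~mask) == 0);
	return (pkt_field & ~mask) | (value & mask);
}
```

For example, a key of 192.0.2.0 with mask 255.255.255.0 matches any address in 192.0.2.0/24, and `set_masked()` with mask 0x000000ff rewrites only the last octet.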

@ -3,6 +3,7 @@
name: ovs_vport
version: 2
protocol: genetlink-legacy
uapi-header: linux/openvswitch.h
doc:
OVS vport configuration over generic netlink.
@ -18,10 +19,13 @@ definitions:
-
name: vport-type
type: enum
enum-name: ovs-vport-type
name-prefix: ovs-vport-type-
entries: [ unspec, netdev, internal, gre, vxlan, geneve ]
-
name: vport-stats
type: struct
enum-name: ovs-vport-stats
members:
-
name: rx-packets
@ -51,6 +55,8 @@ definitions:
attribute-sets:
-
name: vport-options
enum-name: ovs-vport-options
name-prefix: ovs-tunnel-attr-
attributes:
-
name: dst-port
@ -60,6 +66,8 @@ attribute-sets:
type: u32
-
name: upcall-stats
enum-name: ovs-vport-upcall-attr
name-prefix: ovs-vport-upcall-attr-
attributes:
-
name: success
@ -70,6 +78,8 @@ attribute-sets:
type: u64
-
name: vport
name-prefix: ovs-vport-attr-
enum-name: ovs-vport-attr
attributes:
-
name: port-no
@ -108,9 +118,10 @@ attribute-sets:
nested-attributes: upcall-stats
operations:
name-prefix: ovs-vport-cmd-
list:
-
name: vport-get
name: get
doc: Get / dump OVS vport configuration and state
value: 3
attribute-set: vport

@ -38,6 +38,7 @@ debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds.
ENA Source Code Directory Structure
===================================
@ -205,6 +206,8 @@ Adaptive coalescing can be switched on/off through `ethtool(8)`'s
More information about Adaptive Interrupt Moderation (DIM) can be found in
Documentation/networking/net_dim.rst
.. _`RX copybreak`:
RX copybreak
============
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
@ -315,3 +318,34 @@ Rx
- The new SKB is updated with the necessary information (protocol,
checksum hw verify result, etc), and then passed to the network
stack, using the NAPI interface function :code:`napi_gro_receive()`.
Dynamic RX Buffers (DRB)
------------------------
Each RX descriptor in the RX ring is a single memory page (which is either 4KB
or 16KB long depending on the system's configuration).
To reduce the memory allocations required when dealing with a high rate of small
packets, the driver tries to reuse the remaining RX descriptor's space if more
than 2KB of this page remain unused.
A simple example of this mechanism is the following sequence of events:
::
1. Driver allocates page-sized RX buffer and passes it to hardware
+----------------------+
|4KB RX Buffer |
+----------------------+
2. A 300-byte packet is received on this buffer
3. The driver increases the ref count on this page and returns it to
HW as an RX buffer of size 4KB - 300 Bytes = 3796 Bytes
+----+--------------------+
|****|3796 Bytes RX Buffer|
+----+--------------------+
This mechanism isn't used when an XDP program is loaded, or when the
RX packet is less than rx_copybreak bytes (in which case the packet is
copied out of the RX buffer into the linear part of a new skb allocated
for it and the RX buffer remains the same size, see `RX copybreak`_).
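The reuse decision described above can be sketched as follows. `struct rx_buf`, `DRB_MIN_REUSE_BYTES`, `drb_can_reuse()` and `drb_reuse()` are hypothetical; the 2KB threshold and the XDP and rx_copybreak exclusions come from the text, but the real ENA driver's bookkeeping is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* From the text: reuse only if more than 2KB of the page remain unused. */
#define DRB_MIN_REUSE_BYTES 2048u

/* Hypothetical, simplified RX buffer state. */
struct rx_buf {
	size_t page_size;    /* 4096 or 16384 */
	size_t offset;       /* bytes of the page already handed to the stack */
	unsigned int refcnt; /* page references held */
};

/* Can the rest of the page be returned to HW after a packet of pkt_len
 * bytes was received at buf->offset? */
static bool drb_can_reuse(const struct rx_buf *buf, size_t pkt_len,
			  bool xdp_loaded, size_t rx_copybreak)
{
	size_t remaining;

	assert(buf->offset + pkt_len <= buf->page_size);
	if (xdp_loaded)
		return false; /* mechanism not used with an XDP program */
	if (pkt_len < rx_copybreak)
		return false; /* packet is copied; buffer keeps its size */
	remaining = buf->page_size - buf->offset - pkt_len;
	return remaining > DRB_MIN_REUSE_BYTES;
}

/* Take an extra page reference and shrink the buffer HW will see.
 * Returns the new RX buffer size. */
static size_t drb_reuse(struct rx_buf *buf, size_t pkt_len)
{
	buf->refcnt++;
	buf->offset += pkt_len;
	return buf->page_size - buf->offset;
}
```

With a 4096-byte page and a 300-byte packet, `drb_reuse()` returns 3796, matching the example above.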

@ -84,24 +84,6 @@ Once the VM shuts down, or otherwise releases the VF, the command will
complete.
Important notes for SR-IOV and Link Aggregation
-----------------------------------------------
Link Aggregation is mutually exclusive with SR-IOV.
- If Link Aggregation is active, SR-IOV VFs cannot be created on the PF.
- If SR-IOV is active, you cannot set up Link Aggregation on the interface.
Bridging and MACVLAN are also affected by this. If you wish to use bridging or
MACVLAN with SR-IOV, you must set up bridging or MACVLAN before enabling
SR-IOV. If you are using bridging or MACVLAN in conjunction with SR-IOV, and
you want to remove the interface from the bridge or MACVLAN, you must follow
these steps:
1. Destroy SR-IOV VFs if they exist
2. Remove the interface from the bridge or MACVLAN
3. Recreate SR-IOV VFs as needed
Additional Features and Configurations
======================================

@ -13,6 +13,7 @@ Contents
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_
- `Quality of service`_
Overview
========
@ -287,3 +288,47 @@ For example::
NIX_AF_ERR:
NIX Error Interrupt Reg : 64
Rx on unmapped PF_FUNC
Quality of service
==================
Hardware algorithms used in scheduling
--------------------------------------
The octeontx2 silicon and CN10K transmit interface consists of five transmit levels:
SMQ/MDQ and TL4 through TL1. Each packet traverses the MDQ and then the TL4 to TL1
levels. Each level contains an array of queues to support scheduling and shaping.
The hardware uses the algorithms below depending on the priority of the scheduler queues.
Once the user creates tc classes with different priorities, the driver configures the
schedulers allocated to each class with the specified priority, along with the
rate-limiting configuration.
1. Strict Priority
- Once packets are submitted to the MDQs, the hardware selects among active MDQs
with different priorities using strict priority.
2. Round Robin
- Active MDQs having the same priority level are chosen using round robin.
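The two selection rules above can be modeled in a few lines: strict priority between different priority levels, and round robin among active MDQs that share the best level. This is a software sketch of the described behavior, not the hardware implementation. `struct mdq`, `pick_mdq()` and `NUM_MDQ` are hypothetical, a single round-robin cursor is a simplification, and the sketch assumes a lower `prio` value means higher priority, as with tc htb class priorities.

```c
#include <assert.h>

#define NUM_MDQ 8 /* hypothetical number of MDQs in this model */

struct mdq {
	int active; /* nonzero if the queue has packets pending */
	int prio;   /* lower value = higher priority (assumed) */
};

/* Returns the index of the next MDQ to serve, or -1 if none is active.
 * *rr_cursor holds the last index served; start it at NUM_MDQ - 1 so the
 * first round-robin scan begins at queue 0. */
static int pick_mdq(const struct mdq q[NUM_MDQ], int *rr_cursor)
{
	int found = 0, best = 0, i, idx;

	assert(*rr_cursor >= 0 && *rr_cursor < NUM_MDQ);
	/* Strict priority: find the best priority among active queues. */
	for (i = 0; i < NUM_MDQ; i++) {
		if (q[i].active && (!found || q[i].prio < best)) {
			best = q[i].prio;
			found = 1;
		}
	}
	if (!found)
		return -1;
	/* Round robin among active queues at that priority. */
	for (i = 1; i <= NUM_MDQ; i++) {
		idx = (*rr_cursor + i) % NUM_MDQ;
		if (q[idx].active && q[idx].prio == best) {
			*rr_cursor = idx;
			return idx;
		}
	}
	return -1; /* not reached: at least one queue matched above */
}
```

With two active prio 1 queues and one prio 7 queue, the prio 7 queue is only served once both prio 1 queues are idle.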
Setup HTB offload
-----------------
1. Enable HW TC offload on the interface::
# ethtool -K <interface> hw-tc-offload on
2. Create htb root::
# tc qdisc add dev <interface> clsact
# tc qdisc replace dev <interface> root handle 1: htb offload
3. Create tc classes with different priorities::
# tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1
# tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7

@ -797,6 +797,16 @@ Counters on the NIC port that is connected to a eSwitch.
RoCE/UD/RC traffic) [#accel]_.
- Acceleration
* - `vport_loopback_packets`
- Unicast, multicast and broadcast packets that were looped back (received
and transmitted), IB/Eth [#accel]_.
- Acceleration
* - `vport_loopback_bytes`
- Unicast, multicast and broadcast bytes that were looped back (received
and transmitted), IB/Eth [#accel]_.
- Acceleration
* - `rx_steer_missed_packets`
- Number of packets that were received by the NIC but discarded
because they did not match any flow in the NIC flow table.

@ -290,6 +290,13 @@ Description of the vnic counters:
- nic_receive_steering_discard
number of packets that completed RX flow
steering but were discarded due to a mismatch in the flow table.
- generated_pkt_steering_fail
number of packets generated by the VNIC experiencing unexpected steering
failure (at any point in steering flow).
- handled_pkt_steering_fail
number of packets handled by the VNIC experiencing unexpected steering
failure (at any point in steering flow owned by the VNIC, including the FDB
for the eswitch owner).
User commands examples:

@ -45,6 +45,28 @@ Following bridge VLAN functions are supported by mlx5:
Subfunction
===========
Subfunctions spawned over the E-switch are created with only a devlink
device, and by default all SF auxiliary devices are disabled.
This allows the user to configure the SF before it is fully probed,
which saves time.
Usage example:
- Create SF::
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 hw_addr 00:00:00:00:00:11 state active
- Enable ETH auxiliary device::
$ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_eth value true cmode driverinit
- Now, in order to fully probe the SF, use devlink reload::
$ devlink dev reload auxiliary/mlx5_core.sf.1
mlx5 supports devlink params for the ETH, RDMA and vDPA (vnet) auxiliary devices (see :ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
A subfunction has its own function capabilities and its own resources. This

@ -881,9 +881,10 @@ tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
is 6, which corresponds to 63 seconds till the last retransmission
with the current initial RTO of 1 second. With this the final timeout
for an active TCP connection attempt will happen after 127 seconds.
is 6, which corresponds to 67 seconds (with tcp_syn_linear_timeouts = 4)
until the last retransmission with the current initial RTO of 1 second.
With this, the final timeout for an active TCP connection attempt
will happen after 131 seconds.
tcp_timestamps - INTEGER
Enable timestamps as defined in RFC1323.
@ -946,6 +947,16 @@ tcp_pacing_ca_ratio - INTEGER
Default: 120
tcp_syn_linear_timeouts - INTEGER
The number of times for an active TCP connection to retransmit SYNs with
a linear backoff timeout before defaulting to an exponential backoff
timeout. This has no effect on SYNACK at the passive TCP side.
With an initial RTO of 1 second and tcp_syn_linear_timeouts = 4 we would
expect SYN RTOs to be: 1, 1, 1, 1, 1, 2, 4, ... (4 linear timeouts,
and the first exponential backoff using 2^0 * initial_RTO).
Default: 4
tcp_tso_win_divisor - INTEGER
This allows control over what percentage of the congestion window
can be consumed by a single TSO frame.
@ -970,6 +981,21 @@ tcp_tw_reuse - INTEGER
tcp_window_scaling - BOOLEAN
Enable window scaling as defined in RFC1323.
tcp_shrink_window - BOOLEAN
This changes how the TCP receive window is calculated.
RFC 7323, section 2.4, says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122.
- 0 - Disabled. The window is never shrunk.
- 1 - Enabled. The window is shrunk when necessary to remain within
the memory limit set by autotuning (sk_rcvbuf).
This only occurs if a non-zero receive window
scaling factor is also in effect.
Default: 0
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket is entitled to use it from the moment it is created.

@ -269,8 +269,8 @@ a single application thread handles flows with many different flow hashes.
rps_sock_flow_table is a global flow table that contains the *desired* CPU
for flows: the CPU that is currently processing the flow in userspace.
Each table value is a CPU index that is updated during calls to recvmsg
and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
and tcp_splice_read()).
and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and
tcp_splice_read()).
When the scheduler moves a thread to a new CPU while it has outstanding
receive packets on the old CPU, packets may arrive out of order. To

@ -78,3 +78,82 @@ to see other examples.
The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py``
but it takes a few arguments so calling it directly for each file
quickly becomes tedious.
YNL lib
=======
``tools/net/ynl/lib/`` contains an implementation of a C library
(based on libmnl) which integrates with code generated by
``tools/net/ynl/ynl-gen-c.py`` to create easy-to-use netlink wrappers.
YNL basics
----------
The YNL library consists of two parts - the generic code (functions
prefixed with ``ynl_``) and per-family auto-generated code (prefixed
with the name of the family).
To create a YNL socket, call ynl_sock_create(), passing the family
struct (family structs are exported by the auto-generated code).
ynl_sock_destroy() closes the socket.
YNL requests
------------
Steps for issuing YNL requests are best explained with an example.
All the functions and types in this example come from the auto-generated
code (for the netdev family in this case):
.. code-block:: c
// 0. Request and response pointers
struct netdev_dev_get_req *req;
struct netdev_dev_get_rsp *d;
// 1. Allocate a request
req = netdev_dev_get_req_alloc();
// 2. Set request parameters (as needed)
netdev_dev_get_req_set_ifindex(req, ifindex);
// 3. Issue the request
d = netdev_dev_get(ys, req);
// 4. Free the request arguments
netdev_dev_get_req_free(req);
// 5. Error check (the return value from step 3)
if (!d) {
// 6. Print the YNL-generated error
fprintf(stderr, "YNL: %s\n", ys->err.msg);
return -1;
}
// ... do stuff with the response @d
// 7. Free response
netdev_dev_get_rsp_free(d);
YNL dumps
---------
Performing dumps follows a similar pattern to requests.
Dumps return a list of objects terminated by a special marker,
or NULL on error. Use ``ynl_dump_foreach()`` to iterate over
the result.
YNL notifications
-----------------
YNL lib supports using the same socket for notifications and
requests. If notifications arrive while a request is being processed,
they are queued internally and can be retrieved later.
To subscribe to notifications, use ``ynl_subscribe()``.
Notifications have to be read out from the socket;
``ynl_socket_get_fd()`` returns the underlying socket fd, which can
be plugged into an appropriate asynchronous I/O API such as ``poll``
or ``select``.
Notifications can be retrieved using ``ynl_ntf_dequeue()`` and have
to be freed using ``ynl_ntf_free()``. Since the notification type is not
known upfront, notifications are returned as ``struct ynl_ntf_base_type *``
and the user is expected to cast them to the appropriate full type based
on the ``cmd`` member.

@ -909,13 +909,6 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/altera/
ALTERA TSE PCS
M: Maxime Chevallier <maxime.chevallier@bootlin.com>
L: netdev@vger.kernel.org
S: Supported
F: drivers/net/pcs/pcs-altera-tse.c
F: include/linux/pcs-altera-tse.h
ALTERA UART/JTAG UART SERIAL DRIVERS
M: Tobias Klauser <tklauser@distanz.ch>
L: linux-serial@vger.kernel.org
@ -3613,6 +3606,7 @@ S: Supported
W: http://www.bluez.org/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
F: Documentation/devicetree/bindings/net/bluetooth/
F: drivers/bluetooth/
BLUETOOTH SUBSYSTEM
@ -8011,6 +8005,12 @@ S: Maintained
F: drivers/hwmon/f75375s.c
F: include/linux/f75375s.h
FINTEK F81604 USB to 2xCANBUS DEVICE DRIVER
M: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>
L: linux-can@vger.kernel.org
S: Maintained
F: drivers/net/can/usb/f81604.c
FIREWIRE AUDIO DRIVERS and IEC 61883-1/6 PACKET STREAMING ENGINE
M: Clemens Ladisch <clemens@ladisch.de>
M: Takashi Sakamoto <o-takashi@sakamocchi.jp>
@ -10380,9 +10380,8 @@ M: Jesse Brandeburg <jesse.brandeburg@intel.com>
M: Tony Nguyen <anthony.l.nguyen@intel.com>
L: intel-wired-lan@lists.osuosl.org (moderated for non-subscribers)
S: Supported
W: http://www.intel.com/support/feedback.htm
W: http://e1000.sourceforge.net/
Q: http://patchwork.ozlabs.org/project/intel-wired-lan/list/
W: https://www.intel.com/content/www/us/en/support.html
Q: https://patchwork.ozlabs.org/project/intel-wired-lan/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git
F: Documentation/networking/device_drivers/ethernet/intel/
@ -12590,7 +12589,7 @@ F: drivers/mtd/nand/raw/marvell_nand.c
MARVELL OCTEON ENDPOINT DRIVER
M: Veerasenareddy Burru <vburru@marvell.com>
M: Abhijit Ayarekar <aayarekar@marvell.com>
M: Sathesh Edara <sedara@marvell.com>
L: netdev@vger.kernel.org
S: Supported
F: drivers/net/ethernet/marvell/octeon_ep
@ -12889,6 +12888,13 @@ F: Documentation/devicetree/bindings/net/ieee802154/mcr20a.txt
F: drivers/net/ieee802154/mcr20a.c
F: drivers/net/ieee802154/mcr20a.h
MDIO REGMAP DRIVER
M: Maxime Chevallier <maxime.chevallier@bootlin.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/mdio/mdio-regmap.c
F: include/linux/mdio/mdio-regmap.h
MEASUREMENT COMPUTING CIO-DAC IIO DRIVER
M: William Breathitt Gray <william.gray@linaro.org>
L: linux-iio@vger.kernel.org
@ -13188,6 +13194,15 @@ S: Maintained
F: drivers/net/pcs/pcs-mtk-lynxi.c
F: include/linux/pcs/pcs-mtk-lynxi.h
MEDIATEK ETHERNET PHY DRIVERS
M: Daniel Golle <daniel@makrotopia.org>
M: Qingfang Deng <dqfext@gmail.com>
M: SkyLake Huang <SkyLake.Huang@mediatek.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/phy/mediatek-ge-soc.c
F: drivers/net/phy/mediatek-ge.c
MEDIATEK I2C CONTROLLER DRIVER
M: Qii Wang <qii.wang@mediatek.com>
L: linux-i2c@vger.kernel.org
@ -13249,6 +13264,7 @@ R: Shayne Chen <shayne.chen@mediatek.com>
R: Sean Wang <sean.wang@mediatek.com>
L: linux-wireless@vger.kernel.org
S: Maintained
T: git https://github.com/nbd168/wireless
F: Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml
F: drivers/net/wireless/mediatek/mt76/
@ -14770,6 +14786,7 @@ NETWORKING [TCP]
M: Eric Dumazet <edumazet@google.com>
L: netdev@vger.kernel.org
S: Maintained
F: include/linux/net_mm.h
F: include/linux/tcp.h
F: include/net/tcp.h
F: include/trace/events/tcp.h
@ -17399,6 +17416,8 @@ QUALCOMM ATHEROS ATH11K WIRELESS DRIVER
M: Kalle Valo <kvalo@kernel.org>
L: ath11k@lists.infradead.org
S: Supported
W: https://wireless.wiki.kernel.org/en/users/Drivers/ath11k
B: https://wireless.wiki.kernel.org/en/users/Drivers/ath11k/bugreport
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
F: Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
F: drivers/net/wireless/ath/ath11k/
@ -17408,6 +17427,7 @@ M: Toke Høiland-Jørgensen <toke@toke.dk>
L: linux-wireless@vger.kernel.org
S: Maintained
W: https://wireless.wiki.kernel.org/en/users/Drivers/ath9k
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
F: Documentation/devicetree/bindings/net/wireless/qca,ath9k.yaml
F: drivers/net/wireless/ath/ath9k/
@ -23193,6 +23213,7 @@ F: drivers/iio/adc/xilinx-ams.c
XILINX AXI ETHERNET DRIVER
M: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
S: Maintained
F: Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml
F: drivers/net/ethernet/xilinx/xilinx_axienet*
XILINX CAN DRIVER

@ -137,6 +137,9 @@
#define SO_RCVMARK 75
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64

@ -1731,21 +1731,21 @@ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
}
}
static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
static void save_args(struct jit_ctx *ctx, int args_off, int nregs)
{
int i;
for (i = 0; i < nargs; i++) {
for (i = 0; i < nregs; i++) {
emit(A64_STR64I(i, A64_SP, args_off), ctx);
args_off += 8;
}
}
static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
{
int i;
for (i = 0; i < nargs; i++) {
for (i = 0; i < nregs; i++) {
emit(A64_LDR64I(i, A64_SP, args_off), ctx);
args_off += 8;
}
@ -1764,7 +1764,7 @@ static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
*/
static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
struct bpf_tramp_links *tlinks, void *orig_call,
int nargs, u32 flags)
int nregs, u32 flags)
{
int i;
int stack_size;
@ -1772,7 +1772,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
int regs_off;
int retval_off;
int args_off;
int nargs_off;
int nregs_off;
int ip_off;
int run_ctx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
@ -1795,11 +1795,11 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
* SP + retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
* BPF_TRAMP_F_RET_FENTRY_RET
*
* [ argN ]
* [ arg reg N ]
* [ ... ]
* SP + args_off [ arg1 ]
* SP + args_off [ arg reg 1 ]
*
* SP + nargs_off [ args count ]
* SP + nregs_off [ arg regs count ]
*
* SP + ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
*
@ -1816,13 +1816,13 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
if (flags & BPF_TRAMP_F_IP_ARG)
stack_size += 8;
nargs_off = stack_size;
nregs_off = stack_size;
/* room for args count */
stack_size += 8;
args_off = stack_size;
/* room for args */
stack_size += nargs * 8;
stack_size += nregs * 8;
/* room for return value */
retval_off = stack_size;
@ -1865,12 +1865,12 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
}
/* save args count*/
emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
/* save arg regs count*/
emit(A64_MOVZ(1, A64_R(10), nregs, 0), ctx);
emit(A64_STR64I(A64_R(10), A64_SP, nregs_off), ctx);
/* save args */
save_args(ctx, args_off, nargs);
/* save arg regs */
save_args(ctx, args_off, nregs);
/* save callee saved registers */
emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
@ -1897,7 +1897,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
}
if (flags & BPF_TRAMP_F_CALL_ORIG) {
restore_args(ctx, args_off, nargs);
restore_args(ctx, args_off, nregs);
/* call original func */
emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
emit(A64_ADR(A64_LR, AARCH64_INSN_SIZE * 2), ctx);
@ -1926,7 +1926,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
}
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_args(ctx, args_off, nargs);
restore_args(ctx, args_off, nregs);
/* restore callee saved register x19 and x20 */
emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
@ -1967,24 +1967,25 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
void *orig_call)
{
int i, ret;
int nargs = m->nr_args;
int nregs = m->nr_args;
int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
struct jit_ctx ctx = {
.image = NULL,
.idx = 0,
};
/* the first 8 arguments are passed by registers */
if (nargs > 8)
return -ENOTSUPP;
/* don't support struct argument */
/* extra registers needed for struct argument */
for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
/* The arg_size is at most 16 bytes, enforced by the verifier. */
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
return -ENOTSUPP;
nregs += (m->arg_size[i] + 7) / 8 - 1;
}
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
/* the first 8 registers are used for arguments */
if (nregs > 8)
return -ENOTSUPP;
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nregs, flags);
if (ret < 0)
return ret;
@ -1995,7 +1996,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
ctx.idx = 0;
jit_fill_hole(image, (unsigned int)(image_end - image));
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nregs, flags);
if (ret > 0 && validate_code(&ctx) < 0)
ret = -EINVAL;

@ -148,6 +148,9 @@
#define SO_RCVMARK 75
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64

@ -129,6 +129,9 @@
#define SO_RCVMARK 0x4049
#define SO_PASSPIDFD 0x404A
#define SO_PEERPIDFD 0x404B
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64

@ -130,6 +130,9 @@
#define SO_RCVMARK 0x0054
#define SO_PASSPIDFD 0x0055
#define SO_PEERPIDFD 0x0056
#if !defined(__KERNEL__)

@ -482,7 +482,6 @@ static const struct proto_ops alg_proto_ops = {
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.mmap = sock_no_mmap,
.sendpage = sock_no_sendpage,
.sendmsg = sock_no_sendmsg,
.recvmsg = sock_no_recvmsg,
@ -531,50 +530,25 @@ static const struct net_proto_family alg_family = {
.owner = THIS_MODULE,
};
int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len)
{
size_t off;
ssize_t n;
int npages, i;
n = iov_iter_get_pages2(iter, sgl->pages, len, ALG_MAX_PAGES, &off);
if (n < 0)
return n;
npages = DIV_ROUND_UP(off + n, PAGE_SIZE);
if (WARN_ON(npages == 0))
return -EINVAL;
/* Add one extra for linking */
sg_init_table(sgl->sg, npages + 1);
for (i = 0, len = n; i < npages; i++) {
int plen = min_t(int, len, PAGE_SIZE - off);
sg_set_page(sgl->sg + i, sgl->pages[i], plen, off);
off = 0;
len -= plen;
}
sg_mark_end(sgl->sg + npages - 1);
sgl->npages = npages;
return n;
}
EXPORT_SYMBOL_GPL(af_alg_make_sg);
static void af_alg_link_sg(struct af_alg_sgl *sgl_prev,
struct af_alg_sgl *sgl_new)
{
sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1);
sg_chain(sgl_prev->sg, sgl_prev->npages + 1, sgl_new->sg);
sg_unmark_end(sgl_prev->sgt.sgl + sgl_prev->sgt.nents - 1);
sg_chain(sgl_prev->sgt.sgl, sgl_prev->sgt.nents + 1, sgl_new->sgt.sgl);
}
void af_alg_free_sg(struct af_alg_sgl *sgl)
{
int i;
for (i = 0; i < sgl->npages; i++)
put_page(sgl->pages[i]);
if (sgl->sgt.sgl) {
if (sgl->need_unpin)
for (i = 0; i < sgl->sgt.nents; i++)
unpin_user_page(sg_page(&sgl->sgt.sgl[i]));
if (sgl->sgt.sgl != sgl->sgl)
kvfree(sgl->sgt.sgl);
sgl->sgt.sgl = NULL;
}
}
EXPORT_SYMBOL_GPL(af_alg_free_sg);
@ -1015,7 +989,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
while (size) {
struct scatterlist *sg;
size_t len = size;
size_t plen;
ssize_t plen;
/* use the existing memory in an allocated page */
if (ctx->merge) {
@ -1060,35 +1034,58 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
if (sgl->cur)
sg_unmark_end(sg + sgl->cur - 1);
do {
struct page *pg;
unsigned int i = sgl->cur;
if (msg->msg_flags & MSG_SPLICE_PAGES) {
struct sg_table sgtable = {
.sgl = sg,
.nents = sgl->cur,
.orig_nents = sgl->cur,
};
plen = min_t(size_t, len, PAGE_SIZE);
pg = alloc_page(GFP_KERNEL);
if (!pg) {
err = -ENOMEM;
plen = extract_iter_to_sg(&msg->msg_iter, len, &sgtable,
MAX_SGL_ENTS - sgl->cur, 0);
if (plen < 0) {
err = plen;
goto unlock;
}
sg_assign_page(sg + i, pg);
err = memcpy_from_msg(page_address(sg_page(sg + i)),
msg, plen);
if (err) {
__free_page(sg_page(sg + i));
sg_assign_page(sg + i, NULL);
goto unlock;
}
sg[i].length = plen;
for (; sgl->cur < sgtable.nents; sgl->cur++)
get_page(sg_page(&sg[sgl->cur]));
len -= plen;
ctx->used += plen;
copied += plen;
size -= plen;
sgl->cur++;
} while (len && sgl->cur < MAX_SGL_ENTS);
} else {
do {
struct page *pg;
unsigned int i = sgl->cur;
plen = min_t(size_t, len, PAGE_SIZE);
pg = alloc_page(GFP_KERNEL);
if (!pg) {
err = -ENOMEM;
goto unlock;
}
sg_assign_page(sg + i, pg);
err = memcpy_from_msg(
page_address(sg_page(sg + i)),
msg, plen);
if (err) {
__free_page(sg_page(sg + i));
sg_assign_page(sg + i, NULL);
goto unlock;
}
sg[i].length = plen;
len -= plen;
ctx->used += plen;
copied += plen;
size -= plen;
sgl->cur++;
} while (len && sgl->cur < MAX_SGL_ENTS);
}
if (!size)
sg_mark_end(sg + sgl->cur - 1);
@ -1108,69 +1105,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
}
EXPORT_SYMBOL_GPL(af_alg_sendmsg);
/**
* af_alg_sendpage - sendpage system call handler
* @sock: socket of connection to user space to write to
* @page: data to send
* @offset: offset into page to begin sending
* @size: length of data
* @flags: message send/receive flags
*
* This is a generic implementation of sendpage to fill ctx->tsgl_list.
*/
ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
{
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct af_alg_ctx *ctx = ask->private;
struct af_alg_tsgl *sgl;
int err = -EINVAL;
if (flags & MSG_SENDPAGE_NOTLAST)
flags |= MSG_MORE;
lock_sock(sk);
if (!ctx->more && ctx->used)
goto unlock;
if (!size)
goto done;
if (!af_alg_writable(sk)) {
err = af_alg_wait_for_wmem(sk, flags);
if (err)
goto unlock;
}
err = af_alg_alloc_tsgl(sk);
if (err)
goto unlock;
ctx->merge = 0;
sgl = list_entry(ctx->tsgl_list.prev, struct af_alg_tsgl, list);
if (sgl->cur)
sg_unmark_end(sgl->sg + sgl->cur - 1);
sg_mark_end(sgl->sg + sgl->cur);
get_page(page);
sg_set_page(sgl->sg + sgl->cur, page, size, offset);
sgl->cur++;
ctx->used += size;
done:
ctx->more = flags & MSG_MORE;
unlock:
af_alg_data_wakeup(sk);
release_sock(sk);
return err ?: size;
}
EXPORT_SYMBOL_GPL(af_alg_sendpage);
/**
* af_alg_free_resources - release resources required for crypto request
* @areq: Request holding the TX and RX SGL
@ -1288,8 +1222,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
while (maxsize > len && msg_data_left(msg)) {
struct af_alg_rsgl *rsgl;
ssize_t err;
size_t seglen;
int err;
/* limit the amount of readable buffers */
if (!af_alg_readable(sk))
@ -1306,16 +1240,23 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
return -ENOMEM;
}
rsgl->sgl.npages = 0;
rsgl->sgl.sgt.sgl = rsgl->sgl.sgl;
rsgl->sgl.sgt.nents = 0;
rsgl->sgl.sgt.orig_nents = 0;
list_add_tail(&rsgl->list, &areq->rsgl_list);
/* make one iovec available as scatterlist */
err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen);
sg_init_table(rsgl->sgl.sgt.sgl, ALG_MAX_PAGES);
err = extract_iter_to_sg(&msg->msg_iter, seglen, &rsgl->sgl.sgt,
ALG_MAX_PAGES, 0);
if (err < 0) {
rsgl->sg_num_bytes = 0;
return err;
}
sg_mark_end(rsgl->sgl.sgt.sgl + rsgl->sgl.sgt.nents - 1);
rsgl->sgl.need_unpin =
iov_iter_extract_will_pin(&msg->msg_iter);
/* chain the new scatterlist with previous one */
if (areq->last_rsgl)
af_alg_link_sg(&areq->last_rsgl->sgl, &rsgl->sgl);

@ -9,10 +9,10 @@
* The following concept of the memory management is used:
*
* The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
* filled by user space with the data submitted via sendpage/sendmsg. Filling
* up the TX SGL does not cause a crypto operation -- the data will only be
* tracked by the kernel. Upon receipt of one recvmsg call, the caller must
* provide a buffer which is tracked with the RX SGL.
* filled by user space with the data submitted via sendmsg (maybe with
* MSG_SPLICE_PAGES). Filling up the TX SGL does not cause a crypto operation
* -- the data will only be tracked by the kernel. Upon receipt of one recvmsg
* call, the caller must provide a buffer which is tracked with the RX SGL.
*
* During the processing of the recvmsg operation, the cipher request is
* allocated and prepared. As part of the recvmsg operation, the processed
@ -113,19 +113,19 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
}
/*
* Data length provided by caller via sendmsg/sendpage that has not
* yet been processed.
* Data length provided by caller via sendmsg that has not yet been
* processed.
*/
used = ctx->used;
/*
* Make sure sufficient data is present -- note, the same check is
* also present in sendmsg/sendpage. The checks in sendpage/sendmsg
* shall provide an information to the data sender that something is
* wrong, but they are irrelevant to maintain the kernel integrity.
* We need this check here too in case user space decides to not honor
* the error message in sendmsg/sendpage and still call recvmsg. This
* check here protects the kernel integrity.
* Make sure sufficient data is present -- note, the same check is also
* present in sendmsg. The checks in sendmsg shall provide an
* information to the data sender that something is wrong, but they are
* irrelevant to maintain the kernel integrity. We need this check
* here too in case user space decides to not honor the error message
* in sendmsg and still call recvmsg. This check here protects the
* kernel integrity.
*/
if (!aead_sufficient_data(sk))
return -EINVAL;
@ -210,7 +210,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
*/
/* Use the RX SGL as source (and destination) for crypto op. */
rsgl_src = areq->first_rsgl.sgl.sg;
rsgl_src = areq->first_rsgl.sgl.sgt.sgl;
if (ctx->enc) {
/*
@ -224,7 +224,8 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
* RX SGL: AAD || PT || Tag
*/
err = crypto_aead_copy_sgl(null_tfm, tsgl_src,
areq->first_rsgl.sgl.sg, processed);
areq->first_rsgl.sgl.sgt.sgl,
processed);
if (err)
goto free;
af_alg_pull_tsgl(sk, processed, NULL, 0);
@ -242,7 +243,8 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
/* Copy AAD || CT to RX SGL buffer for in-place operation. */
err = crypto_aead_copy_sgl(null_tfm, tsgl_src,
areq->first_rsgl.sgl.sg, outlen);
areq->first_rsgl.sgl.sgt.sgl,
outlen);
if (err)
goto free;
@ -267,10 +269,10 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
if (usedpages) {
/* RX SGL present */
struct af_alg_sgl *sgl_prev = &areq->last_rsgl->sgl;
struct scatterlist *sg = sgl_prev->sgt.sgl;
sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1);
sg_chain(sgl_prev->sg, sgl_prev->npages + 1,
areq->tsgl);
sg_unmark_end(sg + sgl_prev->sgt.nents - 1);
sg_chain(sg, sgl_prev->sgt.nents + 1, areq->tsgl);
} else
/* no RX SGL present (e.g. authentication only) */
rsgl_src = areq->tsgl;
@ -278,7 +280,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
/* Initialize the crypto operation */
aead_request_set_crypt(&areq->cra_u.aead_req, rsgl_src,
areq->first_rsgl.sgl.sg, used, ctx->iv);
areq->first_rsgl.sgl.sgt.sgl, used, ctx->iv);
aead_request_set_ad(&areq->cra_u.aead_req, ctx->aead_assoclen);
aead_request_set_tfm(&areq->cra_u.aead_req, tfm);
@ -368,7 +370,6 @@ static struct proto_ops algif_aead_ops = {
.release = af_alg_release,
.sendmsg = aead_sendmsg,
.sendpage = af_alg_sendpage,
.recvmsg = aead_recvmsg,
.poll = af_alg_poll,
};
@ -420,18 +421,6 @@ static int aead_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
return aead_sendmsg(sock, msg, size);
}
static ssize_t aead_sendpage_nokey(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
{
int err;
err = aead_check_key(sock);
if (err)
return err;
return af_alg_sendpage(sock, page, offset, size, flags);
}
static int aead_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
size_t ignored, int flags)
{
@ -459,7 +448,6 @@ static struct proto_ops algif_aead_ops_nokey = {
.release = af_alg_release,
.sendmsg = aead_sendmsg_nokey,
.sendpage = aead_sendpage_nokey,
.recvmsg = aead_recvmsg_nokey,
.poll = af_alg_poll,
};

@ -63,122 +63,114 @@ static void hash_free_result(struct sock *sk, struct hash_ctx *ctx)
static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
size_t ignored)
{
int limit = ALG_MAX_PAGES * PAGE_SIZE;
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct hash_ctx *ctx = ask->private;
long copied = 0;
ssize_t copied = 0;
size_t len, max_pages, npages;
bool continuing = ctx->more, need_init = false;
int err;
if (limit > sk->sk_sndbuf)
limit = sk->sk_sndbuf;
max_pages = min_t(size_t, ALG_MAX_PAGES,
DIV_ROUND_UP(sk->sk_sndbuf, PAGE_SIZE));
lock_sock(sk);
if (!ctx->more) {
if ((msg->msg_flags & MSG_MORE))
hash_free_result(sk, ctx);
err = crypto_wait_req(crypto_ahash_init(&ctx->req), &ctx->wait);
if (err)
goto unlock;
if (!continuing) {
/* Discard a previous request that wasn't marked MSG_MORE. */
hash_free_result(sk, ctx);
if (!msg_data_left(msg))
goto done; /* Zero-length; don't start new req */
need_init = true;
} else if (!msg_data_left(msg)) {
/*
* No data - finalise the prev req if MSG_MORE so any error
* comes out here.
*/
if (!(msg->msg_flags & MSG_MORE)) {
err = hash_alloc_result(sk, ctx);
if (err)
goto unlock_free;
ahash_request_set_crypt(&ctx->req, NULL,
ctx->result, 0);
err = crypto_wait_req(crypto_ahash_final(&ctx->req),
&ctx->wait);
if (err)
goto unlock_free;
}
goto done_more;
}
ctx->more = false;
while (msg_data_left(msg)) {
int len = msg_data_left(msg);
ctx->sgl.sgt.sgl = ctx->sgl.sgl;
ctx->sgl.sgt.nents = 0;
ctx->sgl.sgt.orig_nents = 0;
if (len > limit)
len = limit;
err = -EIO;
npages = iov_iter_npages(&msg->msg_iter, max_pages);
if (npages == 0)
goto unlock_free;
len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len);
if (len < 0) {
err = copied ? 0 : len;
goto unlock;
sg_init_table(ctx->sgl.sgl, npages);
ctx->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter);
err = extract_iter_to_sg(&msg->msg_iter, LONG_MAX,
&ctx->sgl.sgt, npages, 0);
if (err < 0)
goto unlock_free;
len = err;
sg_mark_end(ctx->sgl.sgt.sgl + ctx->sgl.sgt.nents - 1);
if (!msg_data_left(msg)) {
err = hash_alloc_result(sk, ctx);
if (err)
goto unlock_free;
}
ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len);
ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl,
ctx->result, len);
err = crypto_wait_req(crypto_ahash_update(&ctx->req),
&ctx->wait);
af_alg_free_sg(&ctx->sgl);
if (err) {
iov_iter_revert(&msg->msg_iter, len);
goto unlock;
if (!msg_data_left(msg) && !continuing &&
!(msg->msg_flags & MSG_MORE)) {
err = crypto_ahash_digest(&ctx->req);
} else {
if (need_init) {
err = crypto_wait_req(
crypto_ahash_init(&ctx->req),
&ctx->wait);
if (err)
goto unlock_free;
need_init = false;
}
if (msg_data_left(msg) || (msg->msg_flags & MSG_MORE))
err = crypto_ahash_update(&ctx->req);
else
err = crypto_ahash_finup(&ctx->req);
continuing = true;
}
err = crypto_wait_req(err, &ctx->wait);
if (err)
goto unlock_free;
copied += len;
af_alg_free_sg(&ctx->sgl);
}
err = 0;
done_more:
ctx->more = msg->msg_flags & MSG_MORE;
if (!ctx->more) {
err = hash_alloc_result(sk, ctx);
if (err)
goto unlock;
ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
err = crypto_wait_req(crypto_ahash_final(&ctx->req),
&ctx->wait);
}
done:
err = 0;
unlock:
release_sock(sk);
return copied ?: err;
return err ?: copied;
}
static ssize_t hash_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
{
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct hash_ctx *ctx = ask->private;
int err;
if (flags & MSG_SENDPAGE_NOTLAST)
flags |= MSG_MORE;
lock_sock(sk);
sg_init_table(ctx->sgl.sg, 1);
sg_set_page(ctx->sgl.sg, page, size, offset);
if (!(flags & MSG_MORE)) {
err = hash_alloc_result(sk, ctx);
if (err)
goto unlock;
} else if (!ctx->more)
hash_free_result(sk, ctx);
ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, ctx->result, size);
if (!(flags & MSG_MORE)) {
if (ctx->more)
err = crypto_ahash_finup(&ctx->req);
else
err = crypto_ahash_digest(&ctx->req);
} else {
if (!ctx->more) {
err = crypto_ahash_init(&ctx->req);
err = crypto_wait_req(err, &ctx->wait);
if (err)
goto unlock;
}
err = crypto_ahash_update(&ctx->req);
}
err = crypto_wait_req(err, &ctx->wait);
if (err)
goto unlock;
ctx->more = flags & MSG_MORE;
unlock:
release_sock(sk);
return err ?: size;
unlock_free:
af_alg_free_sg(&ctx->sgl);
hash_free_result(sk, ctx);
ctx->more = false;
goto unlock;
}
static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
@ -296,7 +288,6 @@ static struct proto_ops algif_hash_ops = {
.release = af_alg_release,
.sendmsg = hash_sendmsg,
.sendpage = hash_sendpage,
.recvmsg = hash_recvmsg,
.accept = hash_accept,
};
@ -348,18 +339,6 @@ static int hash_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
return hash_sendmsg(sock, msg, size);
}
static ssize_t hash_sendpage_nokey(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
{
int err;
err = hash_check_key(sock);
if (err)
return err;
return hash_sendpage(sock, page, offset, size, flags);
}
static int hash_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
size_t ignored, int flags)
{
@ -398,7 +377,6 @@ static struct proto_ops algif_hash_ops_nokey = {
.release = af_alg_release,
.sendmsg = hash_sendmsg_nokey,
.sendpage = hash_sendpage_nokey,
.recvmsg = hash_recvmsg_nokey,
.accept = hash_accept_nokey,
};
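The reworked hash_sendmsg() above takes over the old hash_sendpage() path. For each chunk extracted from the message iterator, it issues either a one-shot digest or an init/update/finup sequence. Which one depends on three things: whether more data follows, whether an earlier call left the request open (ctx->more), and whether MSG_MORE is set. The userspace sketch below models only that decision. The enum labels and plan_hash_ops() are hypothetical, and no hashing is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the ahash calls made by hash_sendmsg(). */
enum hash_op { OP_INIT, OP_UPDATE, OP_FINUP, OP_DIGEST };

/*
 * Hypothetical model: record which ahash operations a hash_sendmsg()-style
 * loop would issue.
 *   nchunks:    pieces the message iterator is extracted into
 *   continuing: an earlier sendmsg() left the request open (ctx->more)
 *   msg_more:   this call carries MSG_MORE
 * ops needs room for nchunks + 1 entries; returns the number written.
 */
static size_t plan_hash_ops(size_t nchunks, bool continuing, bool msg_more,
			    enum hash_op *ops)
{
	bool need_init = !continuing;
	size_t n = 0;

	assert(ops != NULL);
	for (size_t i = 0; i < nchunks; i++) {
		bool data_left = i + 1 < nchunks;	/* msg_data_left() */

		if (!data_left && !continuing && !msg_more) {
			/* Whole message in a single piece: one-shot digest. */
			ops[n++] = OP_DIGEST;
		} else {
			if (need_init) {
				ops[n++] = OP_INIT;
				need_init = false;
			}
			/* Keep the request open unless this is the final piece. */
			ops[n++] = (data_left || msg_more) ? OP_UPDATE : OP_FINUP;
			continuing = true;
		}
	}
	return n;
}
```

With this model, a single chunk sent without MSG_MORE maps to one digest. The same data split across two chunks becomes init, update, finup.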

@ -174,7 +174,6 @@ static struct proto_ops algif_rng_ops = {
.bind = sock_no_bind,
.accept = sock_no_accept,
.sendmsg = sock_no_sendmsg,
.sendpage = sock_no_sendpage,
.release = af_alg_release,
.recvmsg = rng_recvmsg,
@ -192,7 +191,6 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = {
.mmap = sock_no_mmap,
.bind = sock_no_bind,
.accept = sock_no_accept,
.sendpage = sock_no_sendpage,
.release = af_alg_release,
.recvmsg = rng_test_recvmsg,

@ -9,10 +9,10 @@
* The following concept of the memory management is used:
*
* The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
* filled by user space with the data submitted via sendpage/sendmsg. Filling
* up the TX SGL does not cause a crypto operation -- the data will only be
* tracked by the kernel. Upon receipt of one recvmsg call, the caller must
* provide a buffer which is tracked with the RX SGL.
* filled by user space with the data submitted via sendmsg. Filling up the TX
* SGL does not cause a crypto operation -- the data will only be tracked by
* the kernel. Upon receipt of one recvmsg call, the caller must provide a
* buffer which is tracked with the RX SGL.
*
* During the processing of the recvmsg operation, the cipher request is
* allocated and prepared. As part of the recvmsg operation, the processed
@ -105,7 +105,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
/* Initialize the crypto operation */
skcipher_request_set_tfm(&areq->cra_u.skcipher_req, tfm);
skcipher_request_set_crypt(&areq->cra_u.skcipher_req, areq->tsgl,
areq->first_rsgl.sgl.sg, len, ctx->iv);
areq->first_rsgl.sgl.sgt.sgl, len, ctx->iv);
if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
/* AIO operation */
@ -194,7 +194,6 @@ static struct proto_ops algif_skcipher_ops = {
.release = af_alg_release,
.sendmsg = skcipher_sendmsg,
.sendpage = af_alg_sendpage,
.recvmsg = skcipher_recvmsg,
.poll = af_alg_poll,
};
@ -246,18 +245,6 @@ static int skcipher_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
return skcipher_sendmsg(sock, msg, size);
}
static ssize_t skcipher_sendpage_nokey(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
{
int err;
err = skcipher_check_key(sock);
if (err)
return err;
return af_alg_sendpage(sock, page, offset, size, flags);
}
static int skcipher_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
size_t ignored, int flags)
{
@ -285,7 +272,6 @@ static struct proto_ops algif_skcipher_ops_nokey = {
.release = af_alg_release,
.sendmsg = skcipher_sendmsg_nokey,
.sendpage = skcipher_sendpage_nokey,
.recvmsg = skcipher_recvmsg_nokey,
.poll = af_alg_poll,
};
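The updated algif_skcipher comment describes the af_alg memory model. sendmsg() only appends data to the TX SGL, and the crypto operation runs when recvmsg() supplies an output buffer, which is tracked as the RX SGL. The sketch below models that split in plain userspace C. struct toy_alg_ctx, toy_sendmsg() and toy_recvmsg() are hypothetical, and the XOR step is a placeholder for a real cipher, not something AF_ALG provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TX_CAP 64	/* hypothetical TX buffer capacity */

/* Hypothetical stand-in for the per-socket TX SGL. */
struct toy_alg_ctx {
	unsigned char tx[TOY_TX_CAP];
	size_t used;
	unsigned char key;	/* XOR "key": placeholder for a real cipher */
};

/* sendmsg(): queue the input only; no transformation happens here. */
static size_t toy_sendmsg(struct toy_alg_ctx *c, const void *buf, size_t len)
{
	size_t room = TOY_TX_CAP - c->used;
	size_t n = len < room ? len : room;

	memcpy(c->tx + c->used, buf, n);
	c->used += n;
	return n;
}

/* recvmsg(): the caller's buffer plays the RX SGL; process queued data into it. */
static size_t toy_recvmsg(struct toy_alg_ctx *c, void *out, size_t len)
{
	unsigned char *dst = out;
	size_t n = len < c->used ? len : c->used;

	assert(c != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		dst[i] = c->tx[i] ^ c->key;
	/* Drop the consumed bytes from the TX side. */
	memmove(c->tx, c->tx + n, c->used - n);
	c->used -= n;
	return n;
}
```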

@ -1539,6 +1539,8 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
int offset, size_t size, unsigned msg_flags)
{
struct socket *socket = peer_device->connection->data.socket;
struct msghdr msg = { .msg_flags = msg_flags, };
struct bio_vec bvec;
int len = size;
int err = -EIO;
@ -1548,15 +1550,17 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
* put_page(); and would cause either a VM_BUG directly, or
* __page_cache_release a page that would actually still be referenced
* by someone, leading to some obscure delayed Oops somewhere else. */
if (drbd_disable_sendpage || !sendpage_ok(page))
return _drbd_no_send_page(peer_device, page, offset, size, msg_flags);
if (!drbd_disable_sendpage && sendpage_ok(page))
msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
msg_flags |= MSG_NOSIGNAL;
drbd_update_congested(peer_device->connection);
do {
int sent;
sent = socket->ops->sendpage(socket, page, offset, len, msg_flags);
bvec_set_page(&bvec, page, offset, len);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
sent = sock_sendmsg(socket, &msg);
if (sent <= 0) {
if (sent == -EAGAIN) {
if (we_should_drop_the_connection(peer_device->connection, socket))
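_drbd_send_page() now wraps the page in a bio_vec and an ITER_SOURCE iov_iter, and calls sock_sendmsg() with MSG_NOSIGNAL | MSG_SPLICE_PAGES instead of ->sendpage(). It still loops until the whole length is sent. MSG_SPLICE_PAGES is meant for in-kernel callers, so the userspace analog below copies the data instead. It keeps the same shape: one iovec per sendmsg() call, resubmission after short writes, and MSG_NOSIGNAL so that a dead peer reports EPIPE rather than raising SIGPIPE. send_all() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: send all of buf[0..len) on fd, resubmitting after
 * short writes. Returns 0 on success or a negative errno value.
 * The kernel loop's -EAGAIN policy (we_should_drop_the_connection())
 * is not modeled; only EINTR is retried here.
 */
static int send_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
		struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
		ssize_t sent = sendmsg(fd, &msg, MSG_NOSIGNAL);

		if (sent < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		buf += sent;
		len -= (size_t)sent;
	}
	return 0;
}
```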

@ -30,45 +30,65 @@ mlx5_ib_set_vport_rep(struct mlx5_core_dev *dev,
static void mlx5_ib_register_peer_vport_reps(struct mlx5_core_dev *mdev);
static void mlx5_ib_num_ports_update(struct mlx5_core_dev *dev, u32 *num_ports)
{
struct mlx5_core_dev *peer_dev;
int i;
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) {
u32 peer_num_ports = mlx5_eswitch_get_total_vports(peer_dev);
if (mlx5_lag_is_mpesw(peer_dev))
*num_ports += peer_num_ports;
else
/* Only 1 ib port is the representor for all uplinks */
*num_ports += peer_num_ports - 1;
}
}
static int
mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
{
u32 num_ports = mlx5_eswitch_get_total_vports(dev);
struct mlx5_core_dev *lag_master = dev;
const struct mlx5_ib_profile *profile;
struct mlx5_core_dev *peer_dev;
struct mlx5_ib_dev *ibdev;
int second_uplink = false;
u32 peer_num_ports;
int new_uplink = false;
int vport_index;
int ret;
int i;
vport_index = rep->vport_index;
if (mlx5_lag_is_shared_fdb(dev)) {
peer_dev = mlx5_lag_get_peer_mdev(dev);
peer_num_ports = mlx5_eswitch_get_total_vports(peer_dev);
if (mlx5_lag_is_master(dev)) {
if (mlx5_lag_is_mpesw(dev))
num_ports += peer_num_ports;
else
num_ports += peer_num_ports - 1;
mlx5_ib_num_ports_update(dev, &num_ports);
} else {
if (rep->vport == MLX5_VPORT_UPLINK) {
if (!mlx5_lag_is_mpesw(dev))
return 0;
second_uplink = true;
new_uplink = true;
}
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) {
u32 peer_n_ports = mlx5_eswitch_get_total_vports(peer_dev);
vport_index += peer_num_ports;
dev = peer_dev;
if (mlx5_lag_is_master(peer_dev))
lag_master = peer_dev;
else if (!mlx5_lag_is_mpesw(dev))
/* Only 1 ib port is the representor for all uplinks */
peer_n_ports--;
if (mlx5_get_dev_index(peer_dev) < mlx5_get_dev_index(dev))
vport_index += peer_n_ports;
}
}
}
if (rep->vport == MLX5_VPORT_UPLINK && !second_uplink)
if (rep->vport == MLX5_VPORT_UPLINK && !new_uplink)
profile = &raw_eth_profile;
else
return mlx5_ib_set_vport_rep(dev, rep, vport_index);
return mlx5_ib_set_vport_rep(lag_master, rep, vport_index);
ibdev = ib_alloc_device(mlx5_ib_dev, ib_dev);
if (!ibdev)
@ -85,8 +105,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
vport_index = rep->vport_index;
ibdev->port[vport_index].rep = rep;
ibdev->port[vport_index].roce.netdev =
mlx5_ib_get_rep_netdev(dev->priv.eswitch, rep->vport);
ibdev->mdev = dev;
mlx5_ib_get_rep_netdev(lag_master->priv.eswitch, rep->vport);
ibdev->mdev = lag_master;
ibdev->num_ports = num_ports;
ret = __mlx5_ib_add(ibdev, profile);
@ -94,8 +114,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
goto fail_add;
rep->rep_data[REP_IB].priv = ibdev;
if (mlx5_lag_is_shared_fdb(dev))
mlx5_ib_register_peer_vport_reps(dev);
if (mlx5_lag_is_shared_fdb(lag_master))
mlx5_ib_register_peer_vport_reps(lag_master);
return 0;
@ -118,23 +138,27 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
struct mlx5_ib_dev *dev = mlx5_ib_rep_to_dev(rep);
int vport_index = rep->vport_index;
struct mlx5_ib_port *port;
int i;
if (WARN_ON(!mdev))
return;
if (mlx5_lag_is_shared_fdb(mdev) &&
!mlx5_lag_is_master(mdev)) {
struct mlx5_core_dev *peer_mdev;
if (rep->vport == MLX5_VPORT_UPLINK)
return;
peer_mdev = mlx5_lag_get_peer_mdev(mdev);
vport_index += mlx5_eswitch_get_total_vports(peer_mdev);
}
if (!dev)
return;
if (mlx5_lag_is_shared_fdb(mdev) &&
!mlx5_lag_is_master(mdev)) {
if (rep->vport == MLX5_VPORT_UPLINK && !mlx5_lag_is_mpesw(mdev))
return;
for (i = 0; i < dev->num_ports; i++) {
if (dev->port[i].rep == rep)
break;
}
if (WARN_ON(i == dev->num_ports))
return;
vport_index = i;
}
port = &dev->port[vport_index];
write_lock(&port->roce.netdev_lock);
port->roce.netdev = NULL;
@ -143,13 +167,18 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
port->rep = NULL;
if (rep->vport == MLX5_VPORT_UPLINK) {
struct mlx5_core_dev *peer_mdev;
struct mlx5_eswitch *esw;
if (mlx5_lag_is_shared_fdb(mdev) && !mlx5_lag_is_master(mdev))
return;
if (mlx5_lag_is_shared_fdb(mdev)) {
peer_mdev = mlx5_lag_get_peer_mdev(mdev);
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_unregister_vport_reps(esw, REP_IB);
struct mlx5_core_dev *peer_mdev;
struct mlx5_eswitch *esw;
mlx5_lag_for_each_peer_mdev(mdev, peer_mdev, i) {
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_unregister_vport_reps(esw, REP_IB);
}
}
__mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX);
}
@ -163,14 +192,14 @@ static const struct mlx5_eswitch_rep_ops rep_ops = {
static void mlx5_ib_register_peer_vport_reps(struct mlx5_core_dev *mdev)
{
struct mlx5_core_dev *peer_mdev = mlx5_lag_get_peer_mdev(mdev);
struct mlx5_core_dev *peer_mdev;
struct mlx5_eswitch *esw;
int i;
if (!peer_mdev)
return;
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_register_vport_reps(esw, &rep_ops, REP_IB);
mlx5_lag_for_each_peer_mdev(mdev, peer_mdev, i) {
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_register_vport_reps(esw, &rep_ops, REP_IB);
}
}
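mlx5_ib_num_ports_update() replaces the single-peer arithmetic with a loop over every LAG peer (mlx5_lag_for_each_peer_mdev()). The port count of the shared IB device therefore covers all peers, not just one. Each peer contributes all of its vports in MPESW mode; otherwise it contributes one fewer, because a single IB port represents all uplinks. A userspace model of that count, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one LAG peer device. */
struct toy_peer {
	unsigned int total_vports;	/* like mlx5_eswitch_get_total_vports() */
	bool mpesw;			/* like mlx5_lag_is_mpesw() */
};

/*
 * Hypothetical model of mlx5_ib_num_ports_update(): start from the master's
 * own vport count and add each peer's vports, minus the uplink that is
 * already represented unless the peer is in MPESW mode.
 */
static unsigned int toy_num_ports(unsigned int own_vports,
				  const struct toy_peer *peers, size_t npeers)
{
	unsigned int num_ports = own_vports;

	for (size_t i = 0; i < npeers; i++) {
		assert(peers[i].total_vports > 0);
		if (peers[i].mpesw)
			num_ports += peers[i].total_vports;
		else
			num_ports += peers[i].total_vports - 1;
	}
	return num_ports;
}
```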
struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,

@ -312,7 +312,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
}
/*
* 0copy TCP transmit interface: Use do_tcp_sendpages.
* 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
*
* Using sendpage to push page by page appears to be less efficient
* than using sendmsg, even if data are copied.
@ -323,20 +323,26 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
{
struct bio_vec bvec;
struct msghdr msg = {
.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SPLICE_PAGES),
};
struct sock *sk = s->sk;
int i = 0, rv = 0, sent = 0,
flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST;
int i = 0, rv = 0, sent = 0;
while (size) {
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
if (size + offset <= PAGE_SIZE)
flags = MSG_MORE | MSG_DONTWAIT;
msg.msg_flags &= ~MSG_MORE;
tcp_rate_check_app_limited(sk);
bvec_set_page(&bvec, page[i], bytes, offset);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
rv = tcp_sendmsg_locked(sk, &msg, size);
release_sock(sk);
if (rv > 0) {
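siw_tcp_sendpages() now describes each page with a bio_vec and calls tcp_sendmsg_locked() with MSG_SPLICE_PAGES. Instead of switching between flag sets that include MSG_SENDPAGE_NOTLAST, it clears MSG_MORE once the remaining data fits in the current page. The sketch below models only the page splitting and the MSG_MORE decision. It assumes 4 KiB pages and that every chunk is sent in full; struct toy_chunk and toy_split_pages() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical record of one page-sized piece and the flag it is sent with. */
struct toy_chunk {
	size_t offset;	/* offset into the page */
	size_t bytes;	/* bytes taken from this page */
	bool more;	/* MSG_MORE still set for this piece */
};

/*
 * Split size bytes, starting offset bytes into the first page, into per-page
 * chunks the way siw_tcp_sendpages() walks its page array. Assumes each
 * chunk is sent in full. out needs one slot per page touched.
 */
static size_t toy_split_pages(size_t offset, size_t size, struct toy_chunk *out)
{
	bool more = true;
	size_t n = 0;

	assert(offset < TOY_PAGE_SIZE);
	while (size) {
		size_t room = TOY_PAGE_SIZE - offset;
		size_t bytes = size < room ? size : room;

		/* The rest fits in this page: this is the last piece. */
		if (size + offset <= TOY_PAGE_SIZE)
			more = false;
		out[n++] = (struct toy_chunk){ offset, bytes, more };
		size -= bytes;
		offset = 0;	/* later pages are used from their start */
	}
	return n;
}
```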

@ -13,6 +13,7 @@
#include <linux/atomic.h>
#include <linux/ctype.h>
#include <linux/device.h>
#include <linux/ethtool.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
@ -20,10 +21,13 @@
#include <linux/list.h>
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/rtnetlink.h>
#include <linux/timer.h>
#include "../leds.h"
#define NETDEV_LED_DEFAULT_INTERVAL 50
/*
* Configurable sysfs attributes:
*
@ -37,7 +41,7 @@
*/
struct led_netdev_data {
spinlock_t lock;
struct mutex lock;
struct delayed_work work;
struct notifier_block notifier;
@ -50,16 +54,11 @@ struct led_netdev_data {
unsigned int last_activity;
unsigned long mode;
#define NETDEV_LED_LINK 0
#define NETDEV_LED_TX 1
#define NETDEV_LED_RX 2
#define NETDEV_LED_MODE_LINKUP 3
};
int link_speed;
u8 duplex;
enum netdev_led_attr {
NETDEV_ATTR_LINK,
NETDEV_ATTR_TX,
NETDEV_ATTR_RX
bool carrier_link_up;
bool hw_control;
};
static void set_baseline_state(struct led_netdev_data *trigger_data)
@ -67,16 +66,48 @@ static void set_baseline_state(struct led_netdev_data *trigger_data)
int current_brightness;
struct led_classdev *led_cdev = trigger_data->led_cdev;
/* Already validated, hw control is possible with the requested mode */
if (trigger_data->hw_control) {
led_cdev->hw_control_set(led_cdev, trigger_data->mode);
return;
}
current_brightness = led_cdev->brightness;
if (current_brightness)
led_cdev->blink_brightness = current_brightness;
if (!led_cdev->blink_brightness)
led_cdev->blink_brightness = led_cdev->max_brightness;
if (!test_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode))
if (!trigger_data->carrier_link_up) {
led_set_brightness(led_cdev, LED_OFF);
else {
if (test_bit(NETDEV_LED_LINK, &trigger_data->mode))
} else {
bool blink_on = false;
if (test_bit(TRIGGER_NETDEV_LINK, &trigger_data->mode))
blink_on = true;
if (test_bit(TRIGGER_NETDEV_LINK_10, &trigger_data->mode) &&
trigger_data->link_speed == SPEED_10)
blink_on = true;
if (test_bit(TRIGGER_NETDEV_LINK_100, &trigger_data->mode) &&
trigger_data->link_speed == SPEED_100)
blink_on = true;
if (test_bit(TRIGGER_NETDEV_LINK_1000, &trigger_data->mode) &&
trigger_data->link_speed == SPEED_1000)
blink_on = true;
if (test_bit(TRIGGER_NETDEV_HALF_DUPLEX, &trigger_data->mode) &&
trigger_data->duplex == DUPLEX_HALF)
blink_on = true;
if (test_bit(TRIGGER_NETDEV_FULL_DUPLEX, &trigger_data->mode) &&
trigger_data->duplex == DUPLEX_FULL)
blink_on = true;
if (blink_on)
led_set_brightness(led_cdev,
led_cdev->blink_brightness);
else
@ -85,44 +116,121 @@ static void set_baseline_state(struct led_netdev_data *trigger_data)
/* If we are looking for RX/TX start periodically
* checking stats
*/
if (test_bit(NETDEV_LED_TX, &trigger_data->mode) ||
test_bit(NETDEV_LED_RX, &trigger_data->mode))
if (test_bit(TRIGGER_NETDEV_TX, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_RX, &trigger_data->mode))
schedule_delayed_work(&trigger_data->work, 0);
}
}
static bool supports_hw_control(struct led_classdev *led_cdev)
{
if (!led_cdev->hw_control_get || !led_cdev->hw_control_set ||
!led_cdev->hw_control_is_supported)
return false;
return !strcmp(led_cdev->hw_control_trigger, led_cdev->trigger->name);
}
/*
* Validate the configured netdev is the same as the one associated with
* the LED driver in hw control.
*/
static bool validate_net_dev(struct led_classdev *led_cdev,
struct net_device *net_dev)
{
struct device *dev = led_cdev->hw_control_get_device(led_cdev);
struct net_device *ndev;
if (!dev)
return false;
ndev = to_net_dev(dev);
return ndev == net_dev;
}
static bool can_hw_control(struct led_netdev_data *trigger_data)
{
unsigned long default_interval = msecs_to_jiffies(NETDEV_LED_DEFAULT_INTERVAL);
unsigned int interval = atomic_read(&trigger_data->interval);
struct led_classdev *led_cdev = trigger_data->led_cdev;
int ret;
if (!supports_hw_control(led_cdev))
return false;
/*
* Interval must be set to the default
* value. Any different value is rejected if in hw
* control.
*/
if (interval != default_interval)
return false;
/*
* net_dev must be set with hw control, otherwise no
* blinking can be happening and there is nothing to
* offloaded. Additionally, for hw control to be
* valid, the configured netdev must be the same as
* netdev associated to the LED.
*/
if (!validate_net_dev(led_cdev, trigger_data->net_dev))
return false;
/* Check if the requested mode is supported */
ret = led_cdev->hw_control_is_supported(led_cdev, trigger_data->mode);
/* Fall back to software blinking if not supported */
if (ret == -EOPNOTSUPP)
return false;
if (ret) {
dev_warn(led_cdev->dev,
"Current mode check failed with error %d\n", ret);
return false;
}
return true;
}
static void get_device_state(struct led_netdev_data *trigger_data)
{
struct ethtool_link_ksettings cmd;
trigger_data->carrier_link_up = netif_carrier_ok(trigger_data->net_dev);
if (!trigger_data->carrier_link_up)
return;
if (!__ethtool_get_link_ksettings(trigger_data->net_dev, &cmd)) {
trigger_data->link_speed = cmd.base.speed;
trigger_data->duplex = cmd.base.duplex;
}
}
static ssize_t device_name_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
ssize_t len;
spin_lock_bh(&trigger_data->lock);
mutex_lock(&trigger_data->lock);
len = sprintf(buf, "%s\n", trigger_data->device_name);
spin_unlock_bh(&trigger_data->lock);
mutex_unlock(&trigger_data->lock);
return len;
}
static ssize_t device_name_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t size)
static int set_device_name(struct led_netdev_data *trigger_data,
const char *name, size_t size)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
if (size >= IFNAMSIZ)
return -EINVAL;
cancel_delayed_work_sync(&trigger_data->work);
spin_lock_bh(&trigger_data->lock);
mutex_lock(&trigger_data->lock);
if (trigger_data->net_dev) {
dev_put(trigger_data->net_dev);
trigger_data->net_dev = NULL;
}
memcpy(trigger_data->device_name, buf, size);
memcpy(trigger_data->device_name, name, size);
trigger_data->device_name[size] = 0;
if (size > 0 && trigger_data->device_name[size - 1] == '\n')
trigger_data->device_name[size - 1] = 0;
@ -131,36 +239,58 @@ static ssize_t device_name_store(struct device *dev,
trigger_data->net_dev =
dev_get_by_name(&init_net, trigger_data->device_name);
clear_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
if (trigger_data->net_dev != NULL)
if (netif_carrier_ok(trigger_data->net_dev))
set_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
trigger_data->carrier_link_up = false;
trigger_data->link_speed = SPEED_UNKNOWN;
trigger_data->duplex = DUPLEX_UNKNOWN;
if (trigger_data->net_dev != NULL) {
rtnl_lock();
get_device_state(trigger_data);
rtnl_unlock();
}
trigger_data->last_activity = 0;
set_baseline_state(trigger_data);
spin_unlock_bh(&trigger_data->lock);
mutex_unlock(&trigger_data->lock);
return 0;
}
static ssize_t device_name_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t size)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
int ret;
if (size >= IFNAMSIZ)
return -EINVAL;
ret = set_device_name(trigger_data, buf, size);
if (ret < 0)
return ret;
return size;
}
static DEVICE_ATTR_RW(device_name);
static ssize_t netdev_led_attr_show(struct device *dev, char *buf,
enum netdev_led_attr attr)
enum led_trigger_netdev_modes attr)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
int bit;
switch (attr) {
case NETDEV_ATTR_LINK:
bit = NETDEV_LED_LINK;
break;
case NETDEV_ATTR_TX:
bit = NETDEV_LED_TX;
break;
case NETDEV_ATTR_RX:
bit = NETDEV_LED_RX;
case TRIGGER_NETDEV_LINK:
case TRIGGER_NETDEV_LINK_10:
case TRIGGER_NETDEV_LINK_100:
case TRIGGER_NETDEV_LINK_1000:
case TRIGGER_NETDEV_HALF_DUPLEX:
case TRIGGER_NETDEV_FULL_DUPLEX:
case TRIGGER_NETDEV_TX:
case TRIGGER_NETDEV_RX:
bit = attr;
break;
default:
return -EINVAL;
@ -170,10 +300,10 @@ static ssize_t netdev_led_attr_show(struct device *dev, char *buf,
}
static ssize_t netdev_led_attr_store(struct device *dev, const char *buf,
size_t size, enum netdev_led_attr attr)
size_t size, enum led_trigger_netdev_modes attr)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
unsigned long state;
unsigned long state, mode = trigger_data->mode;
int ret;
int bit;
@ -182,72 +312,62 @@ static ssize_t netdev_led_attr_store(struct device *dev, const char *buf,
return ret;
switch (attr) {
case NETDEV_ATTR_LINK:
bit = NETDEV_LED_LINK;
break;
case NETDEV_ATTR_TX:
bit = NETDEV_LED_TX;
break;
case NETDEV_ATTR_RX:
bit = NETDEV_LED_RX;
case TRIGGER_NETDEV_LINK:
case TRIGGER_NETDEV_LINK_10:
case TRIGGER_NETDEV_LINK_100:
case TRIGGER_NETDEV_LINK_1000:
case TRIGGER_NETDEV_HALF_DUPLEX:
case TRIGGER_NETDEV_FULL_DUPLEX:
case TRIGGER_NETDEV_TX:
case TRIGGER_NETDEV_RX:
bit = attr;
break;
default:
return -EINVAL;
}
if (state)
set_bit(bit, &mode);
else
clear_bit(bit, &mode);
if (test_bit(TRIGGER_NETDEV_LINK, &mode) &&
(test_bit(TRIGGER_NETDEV_LINK_10, &mode) ||
test_bit(TRIGGER_NETDEV_LINK_100, &mode) ||
test_bit(TRIGGER_NETDEV_LINK_1000, &mode)))
return -EINVAL;
cancel_delayed_work_sync(&trigger_data->work);
if (state)
set_bit(bit, &trigger_data->mode);
else
clear_bit(bit, &trigger_data->mode);
trigger_data->mode = mode;
trigger_data->hw_control = can_hw_control(trigger_data);
set_baseline_state(trigger_data);
return size;
}
static ssize_t link_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return netdev_led_attr_show(dev, buf, NETDEV_ATTR_LINK);
}
#define DEFINE_NETDEV_TRIGGER(trigger_name, trigger) \
static ssize_t trigger_name##_show(struct device *dev, \
struct device_attribute *attr, char *buf) \
{ \
return netdev_led_attr_show(dev, buf, trigger); \
} \
static ssize_t trigger_name##_store(struct device *dev, \
struct device_attribute *attr, const char *buf, size_t size) \
{ \
return netdev_led_attr_store(dev, buf, size, trigger); \
} \
static DEVICE_ATTR_RW(trigger_name)
static ssize_t link_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t size)
{
return netdev_led_attr_store(dev, buf, size, NETDEV_ATTR_LINK);
}
static DEVICE_ATTR_RW(link);
static ssize_t tx_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return netdev_led_attr_show(dev, buf, NETDEV_ATTR_TX);
}
static ssize_t tx_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t size)
{
return netdev_led_attr_store(dev, buf, size, NETDEV_ATTR_TX);
}
static DEVICE_ATTR_RW(tx);
static ssize_t rx_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return netdev_led_attr_show(dev, buf, NETDEV_ATTR_RX);
}
static ssize_t rx_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t size)
{
return netdev_led_attr_store(dev, buf, size, NETDEV_ATTR_RX);
}
static DEVICE_ATTR_RW(rx);
DEFINE_NETDEV_TRIGGER(link, TRIGGER_NETDEV_LINK);
DEFINE_NETDEV_TRIGGER(link_10, TRIGGER_NETDEV_LINK_10);
DEFINE_NETDEV_TRIGGER(link_100, TRIGGER_NETDEV_LINK_100);
DEFINE_NETDEV_TRIGGER(link_1000, TRIGGER_NETDEV_LINK_1000);
DEFINE_NETDEV_TRIGGER(half_duplex, TRIGGER_NETDEV_HALF_DUPLEX);
DEFINE_NETDEV_TRIGGER(full_duplex, TRIGGER_NETDEV_FULL_DUPLEX);
DEFINE_NETDEV_TRIGGER(tx, TRIGGER_NETDEV_TX);
DEFINE_NETDEV_TRIGGER(rx, TRIGGER_NETDEV_RX);
static ssize_t interval_show(struct device *dev,
struct device_attribute *attr, char *buf)
@ -266,6 +386,9 @@ static ssize_t interval_store(struct device *dev,
unsigned long value;
int ret;
if (trigger_data->hw_control)
return -EINVAL;
ret = kstrtoul(buf, 0, &value);
if (ret)
return ret;
@ -283,12 +406,28 @@ static ssize_t interval_store(struct device *dev,
static DEVICE_ATTR_RW(interval);
static ssize_t hw_control_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
return sprintf(buf, "%d\n", trigger_data->hw_control);
}
static DEVICE_ATTR_RO(hw_control);
static struct attribute *netdev_trig_attrs[] = {
&dev_attr_device_name.attr,
&dev_attr_link.attr,
&dev_attr_link_10.attr,
&dev_attr_link_100.attr,
&dev_attr_link_1000.attr,
&dev_attr_full_duplex.attr,
&dev_attr_half_duplex.attr,
&dev_attr_rx.attr,
&dev_attr_tx.attr,
&dev_attr_interval.attr,
&dev_attr_hw_control.attr,
NULL
};
ATTRIBUTE_GROUPS(netdev_trig);
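DEFINE_NETDEV_TRIGGER() replaces the hand-written link/tx/rx show and store pairs. Token pasting (##) builds each function name, and every instance forwards to the shared netdev_led_attr_show() and netdev_led_attr_store() helpers with its own mode bit. A userspace sketch of the same macro pattern, with hypothetical names and no sysfs involved:

```c
#include <assert.h>

/* Hypothetical mode bits standing in for enum led_trigger_netdev_modes. */
enum toy_mode { TOY_LINK, TOY_TX, TOY_RX };

static unsigned long toy_mode_bits;

/* Shared helpers, playing the role of netdev_led_attr_show()/_store(). */
static int toy_attr_show(enum toy_mode bit)
{
	return (int)((toy_mode_bits >> bit) & 1UL);
}

static void toy_attr_store(enum toy_mode bit, int on)
{
	if (on)
		toy_mode_bits |= 1UL << bit;
	else
		toy_mode_bits &= ~(1UL << bit);
}

/* Each expansion yields a name##_show()/name##_store() pair for one bit. */
#define DEFINE_TOY_TRIGGER(name, bit)			\
	static int name##_show(void)			\
	{						\
		return toy_attr_show(bit);		\
	}						\
	static void name##_store(int on)		\
	{						\
		toy_attr_store(bit, on);		\
	}

DEFINE_TOY_TRIGGER(link, TOY_LINK)
DEFINE_TOY_TRIGGER(tx, TOY_TX)
DEFINE_TOY_TRIGGER(rx, TOY_RX)
```

Adding a new mode then takes one macro line plus one attribute-array entry, which matches how link_10, link_100, link_1000, half_duplex and full_duplex are added in this hunk.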
@ -313,11 +452,15 @@ static int netdev_trig_notify(struct notifier_block *nb,
cancel_delayed_work_sync(&trigger_data->work);
spin_lock_bh(&trigger_data->lock);
mutex_lock(&trigger_data->lock);
clear_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
trigger_data->carrier_link_up = false;
trigger_data->link_speed = SPEED_UNKNOWN;
trigger_data->duplex = DUPLEX_UNKNOWN;
switch (evt) {
case NETDEV_CHANGENAME:
get_device_state(trigger_data);
fallthrough;
case NETDEV_REGISTER:
if (trigger_data->net_dev)
dev_put(trigger_data->net_dev);
@ -330,14 +473,13 @@ static int netdev_trig_notify(struct notifier_block *nb,
break;
case NETDEV_UP:
case NETDEV_CHANGE:
if (netif_carrier_ok(dev))
set_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
get_device_state(trigger_data);
break;
}
set_baseline_state(trigger_data);
spin_unlock_bh(&trigger_data->lock);
mutex_unlock(&trigger_data->lock);
return NOTIFY_DONE;
}
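With the new link_10/100/1000 and duplex modes, set_baseline_state() keeps the LED off while the carrier is down. Otherwise it lights the LED if any selected link condition matches the speed and duplex cached by get_device_state(). A userspace model of that decision; the bit positions, enum toy_duplex and toy_baseline_on() are hypothetical, and speeds are plain Mb/s integers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions mirroring the TRIGGER_NETDEV_* link modes. */
enum { M_LINK, M_LINK_10, M_LINK_100, M_LINK_1000, M_HALF, M_FULL };
enum toy_duplex { TOY_DUPLEX_HALF, TOY_DUPLEX_FULL, TOY_DUPLEX_UNKNOWN };

static bool has(unsigned long mode, int bit)
{
	return (mode >> bit) & 1UL;
}

/* True if the baseline (non-blinking) LED state should be "on". */
static bool toy_baseline_on(unsigned long mode, bool carrier_up,
			    int speed_mbps, enum toy_duplex duplex)
{
	if (!carrier_up)
		return false;
	return has(mode, M_LINK) ||
	       (has(mode, M_LINK_10) && speed_mbps == 10) ||
	       (has(mode, M_LINK_100) && speed_mbps == 100) ||
	       (has(mode, M_LINK_1000) && speed_mbps == 1000) ||
	       (has(mode, M_HALF) && duplex == TOY_DUPLEX_HALF) ||
	       (has(mode, M_FULL) && duplex == TOY_DUPLEX_FULL);
}
```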
@ -360,21 +502,26 @@ static void netdev_trig_work(struct work_struct *work)
}
/* If we are not looking for RX/TX then return */
if (!test_bit(NETDEV_LED_TX, &trigger_data->mode) &&
!test_bit(NETDEV_LED_RX, &trigger_data->mode))
if (!test_bit(TRIGGER_NETDEV_TX, &trigger_data->mode) &&
!test_bit(TRIGGER_NETDEV_RX, &trigger_data->mode))
return;
dev_stats = dev_get_stats(trigger_data->net_dev, &temp);
new_activity =
(test_bit(NETDEV_LED_TX, &trigger_data->mode) ?
(test_bit(TRIGGER_NETDEV_TX, &trigger_data->mode) ?
dev_stats->tx_packets : 0) +
(test_bit(NETDEV_LED_RX, &trigger_data->mode) ?
(test_bit(TRIGGER_NETDEV_RX, &trigger_data->mode) ?
dev_stats->rx_packets : 0);
if (trigger_data->last_activity != new_activity) {
led_stop_software_blink(trigger_data->led_cdev);
invert = test_bit(NETDEV_LED_LINK, &trigger_data->mode);
invert = test_bit(TRIGGER_NETDEV_LINK, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_LINK_10, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_LINK_100, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_LINK_1000, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_HALF_DUPLEX, &trigger_data->mode) ||
test_bit(TRIGGER_NETDEV_FULL_DUPLEX, &trigger_data->mode);
interval = jiffies_to_msecs(
atomic_read(&trigger_data->interval));
/* base state is ON (link present) */
@ -392,13 +539,15 @@ static void netdev_trig_work(struct work_struct *work)
static int netdev_trig_activate(struct led_classdev *led_cdev)
{
struct led_netdev_data *trigger_data;
unsigned long mode = 0;
struct device *dev;
int rc;
trigger_data = kzalloc(sizeof(struct led_netdev_data), GFP_KERNEL);
if (!trigger_data)
return -ENOMEM;
spin_lock_init(&trigger_data->lock);
mutex_init(&trigger_data->lock);
trigger_data->notifier.notifier_call = netdev_trig_notify;
trigger_data->notifier.priority = 10;
@ -410,9 +559,24 @@ static int netdev_trig_activate(struct led_classdev *led_cdev)
trigger_data->device_name[0] = 0;
trigger_data->mode = 0;
atomic_set(&trigger_data->interval, msecs_to_jiffies(50));
atomic_set(&trigger_data->interval, msecs_to_jiffies(NETDEV_LED_DEFAULT_INTERVAL));
trigger_data->last_activity = 0;
/* Check if hw control is active by default on the LED.
* Init already enabled mode in hw control.
*/
if (supports_hw_control(led_cdev) &&
!led_cdev->hw_control_get(led_cdev, &mode)) {
dev = led_cdev->hw_control_get_device(led_cdev);
if (dev) {
const char *name = dev_name(dev);
set_device_name(trigger_data, name, strlen(name));
trigger_data->hw_control = true;
trigger_data->mode = mode;
}
}
led_set_trigger_data(led_cdev, trigger_data);
rc = register_netdevice_notifier(&trigger_data->notifier);
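netdev_led_attr_store() now edits a local copy of the mode. It returns -EINVAL if the result sets the generic link mode together with any of link_10, link_100 or link_1000. Only then does it commit the copy and re-evaluate hardware control. The copy-validate-commit step is modeled below in userspace C; the bit names, TOY_SPEED_MODES and toy_attr_store() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit positions mirroring the TRIGGER_NETDEV_* modes. */
enum { M_LINK, M_LINK_10, M_LINK_100, M_LINK_1000, M_TX, M_RX };

#define TOY_SPEED_MODES \
	((1UL << M_LINK_10) | (1UL << M_LINK_100) | (1UL << M_LINK_1000))

/*
 * Apply one attribute write to *mode. The change is made on a copy and
 * committed only if it passes validation, so a rejected write leaves the
 * current mode untouched.
 */
static int toy_attr_store(unsigned long *mode, int bit, bool state)
{
	unsigned long m = *mode;

	if (state)
		m |= 1UL << bit;
	else
		m &= ~(1UL << bit);

	/* Same rule as the patch: generic link excludes per-speed link modes. */
	if ((m & (1UL << M_LINK)) && (m & TOY_SPEED_MODES))
		return -EINVAL;

	*mode = m;
	return 0;
}
```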

@ -403,7 +403,6 @@ config TUN_VNET_CROSS_LE
config VETH
tristate "Virtual ethernet pair device"
select PAGE_POOL
select PAGE_POOL_STATS
help
This device is a local ethernet tunnel. Devices are created in pairs.
When one end receives the packet it appears on its pair and vice

@ -1,8 +1,9 @@
// SPDX-License-Identifier: GPL-1.0+
/*
* originally based on the dummy device.
*
* Copyright 1999, Thomas Davis, tadavis@lbl.gov.
* Licensed under the GPL. Based on dummy.c, and eql.c devices.
* Based on dummy.c, and eql.c devices.
*
* bonding.c: an Ethernet Bonding driver
*
@ -2871,6 +2872,8 @@ static bool bond_has_this_ip(struct bonding *bond, __be32 ip)
return ret;
}
#define BOND_VLAN_PROTO_NONE cpu_to_be16(0xffff)
static bool bond_handle_vlan(struct slave *slave, struct bond_vlan_tag *tags,
struct sk_buff *skb)
{
@ -2878,13 +2881,13 @@ static bool bond_handle_vlan(struct slave *slave, struct bond_vlan_tag *tags,
struct net_device *slave_dev = slave->dev;
struct bond_vlan_tag *outer_tag = tags;
if (!tags || tags->vlan_proto == VLAN_N_VID)
if (!tags || tags->vlan_proto == BOND_VLAN_PROTO_NONE)
return true;
tags++;
/* Go through all the tags backwards and add them to the packet */
while (tags->vlan_proto != VLAN_N_VID) {
while (tags->vlan_proto != BOND_VLAN_PROTO_NONE) {
if (!tags->vlan_id) {
tags++;
continue;
@ -2960,7 +2963,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
tags = kcalloc(level + 1, sizeof(*tags), GFP_ATOMIC);
if (!tags)
return ERR_PTR(-ENOMEM);
tags[level].vlan_proto = VLAN_N_VID;
tags[level].vlan_proto = BOND_VLAN_PROTO_NONE;
return tags;
}
@ -4197,7 +4200,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
return skb->hash;
return __bond_xmit_hash(bond, skb, skb->data, skb->protocol,
skb_mac_offset(skb), skb_network_offset(skb),
0, skb_network_offset(skb),
skb_headlen(skb));
}
@ -5439,7 +5442,7 @@ static netdev_tx_t bond_tls_device_xmit(struct bonding *bond, struct sk_buff *sk
{
struct net_device *tls_netdev = rcu_dereference(tls_get_ctx(skb->sk)->netdev);
/* tls_netdev might become NULL, even if tls_is_sk_tx_device_offloaded
/* tls_netdev might become NULL, even if tls_is_skb_tx_device_offloaded
* was true, if tls_device_down is running in parallel, but it's OK,
* because bond_get_slave_by_dev has a NULL check.
*/
@ -5458,7 +5461,7 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
return NETDEV_TX_OK;
#if IS_ENABLED(CONFIG_TLS_DEVICE)
if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))
if (tls_is_skb_tx_device_offloaded(skb))
return bond_tls_device_xmit(bond, skb, dev);
#endif
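bond_verify_device_path() terminates its tag array with a sentinel in the vlan_proto field, and bond_handle_vlan() stops walking when it reaches that sentinel. The patch replaces VLAN_N_VID (4096, a count of VLAN IDs rather than an EtherType) with BOND_VLAN_PROTO_NONE, defined as cpu_to_be16(0xffff). Since 0xffff reads the same in either byte order, the sentinel compares correctly against the big-endian field on any host. A userspace sketch of the sentinel-terminated walk follows. struct toy_tag, swap16() and count_tags_to_push() are hypothetical, and the count simplifies what bond_handle_vlan() does, ignoring tag order and the actual skb tagging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel: all-ones reads identically in big- and little-endian order. */
#define TOY_VLAN_PROTO_NONE ((uint16_t)0xffff)

/* Hypothetical stand-in for struct bond_vlan_tag. */
struct toy_tag {
	uint16_t vlan_proto;	/* __be16 in the kernel */
	uint16_t vlan_id;
};

/* Swap the two bytes of a 16-bit value (host <-> big-endian on LE hosts). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Count entries before the sentinel that carry a non-zero VLAN ID. */
static size_t count_tags_to_push(const struct toy_tag *tags)
{
	size_t n = 0;

	if (!tags)
		return 0;
	for (; tags->vlan_proto != TOY_VLAN_PROTO_NONE; tags++)
		if (tags->vlan_id)
			n++;
	return n;
}
```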

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-1.0+ */
/*
* Bond several ethernet interfaces into a Cisco, running 'Etherchannel'.
*
@ -7,9 +8,6 @@
* BUT, I'm the one who modified it for ethernet, so:
* (c) Copyright 1999, Thomas Davis, tadavis@lbl.gov
*
* This software may be used and distributed according to the terms
* of the GNU Public License, incorporated herein by reference.
*
*/
#ifndef _BONDING_PRIV_H

@ -153,8 +153,7 @@ config CAN_JANZ_ICAN3
config CAN_KVASER_PCIEFD
depends on PCI
tristate "Kvaser PCIe FD cards"
select CRC32
help
help
This is a driver for the Kvaser PCI Express CAN FD family.
Supported devices:

@ -1346,7 +1346,7 @@ static int at91_can_probe(struct platform_device *pdev)
return err;
}
static int at91_can_remove(struct platform_device *pdev)
static void at91_can_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
struct at91_priv *priv = netdev_priv(dev);
@ -1362,8 +1362,6 @@ static int at91_can_remove(struct platform_device *pdev)
clk_put(priv->clk);
free_candev(dev);
return 0;
}
static const struct platform_device_id at91_can_id_table[] = {
@ -1381,7 +1379,7 @@ MODULE_DEVICE_TABLE(platform, at91_can_id_table);
static struct platform_driver at91_can_driver = {
.probe = at91_can_probe,
.remove = at91_can_remove,
.remove_new = at91_can_remove,
.driver = {
.name = KBUILD_MODNAME,
.of_match_table = of_match_ptr(at91_can_dt_ids),

@ -966,22 +966,16 @@ static int bxcan_probe(struct platform_device *pdev)
}
rx_irq = platform_get_irq_byname(pdev, "rx0");
if (rx_irq < 0) {
dev_err(dev, "failed to get rx0 irq\n");
if (rx_irq < 0)
return rx_irq;
}
tx_irq = platform_get_irq_byname(pdev, "tx");
if (tx_irq < 0) {
dev_err(dev, "failed to get tx irq\n");
if (tx_irq < 0)
return tx_irq;
}
sce_irq = platform_get_irq_byname(pdev, "sce");
if (sce_irq < 0) {
dev_err(dev, "failed to get sce irq\n");
if (sce_irq < 0)
return sce_irq;
}
ndev = alloc_candev(sizeof(struct bxcan_priv), BXCAN_TX_MB_NUM);
if (!ndev) {
@ -1039,7 +1033,7 @@ static int bxcan_probe(struct platform_device *pdev)
return err;
}
static int bxcan_remove(struct platform_device *pdev)
static void bxcan_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
struct bxcan_priv *priv = netdev_priv(ndev);
@@ -1048,7 +1042,6 @@ static int bxcan_remove(struct platform_device *pdev)
clk_disable_unprepare(priv->clk);
can_rx_offload_del(&priv->offload);
free_candev(ndev);
-return 0;
}
static int __maybe_unused bxcan_suspend(struct device *dev)
@@ -1100,7 +1093,7 @@ static struct platform_driver bxcan_driver = {
.of_match_table = bxcan_of_match,
},
.probe = bxcan_probe,
-.remove = bxcan_remove,
+.remove_new = bxcan_remove,
};
module_platform_driver(bxcan_driver);

@@ -410,7 +410,7 @@ static int c_can_plat_probe(struct platform_device *pdev)
return ret;
}
-static int c_can_plat_remove(struct platform_device *pdev)
+static void c_can_plat_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
struct c_can_priv *priv = netdev_priv(dev);
@@ -418,8 +418,6 @@ static int c_can_plat_remove(struct platform_device *pdev)
unregister_c_can_dev(dev);
pm_runtime_disable(priv->device);
free_c_can_dev(dev);
-return 0;
}
#ifdef CONFIG_PM
@@ -487,7 +485,7 @@ static struct platform_driver c_can_plat_driver = {
.of_match_table = c_can_of_table,
},
.probe = c_can_plat_probe,
-.remove = c_can_plat_remove,
+.remove_new = c_can_plat_remove,
.suspend = c_can_suspend,
.resume = c_can_resume,
.id_table = c_can_id_table,

@@ -285,7 +285,7 @@ static int cc770_isa_probe(struct platform_device *pdev)
return err;
}
-static int cc770_isa_remove(struct platform_device *pdev)
+static void cc770_isa_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
struct cc770_priv *priv = netdev_priv(dev);
@@ -303,13 +303,11 @@ static int cc770_isa_remove(struct platform_device *pdev)
release_region(port[idx], CC770_IOSIZE);
}
free_cc770dev(dev);
-return 0;
}
static struct platform_driver cc770_isa_driver = {
.probe = cc770_isa_probe,
-.remove = cc770_isa_remove,
+.remove_new = cc770_isa_remove,
.driver = {
.name = KBUILD_MODNAME,
},

@@ -230,7 +230,7 @@ static int cc770_platform_probe(struct platform_device *pdev)
return err;
}
-static int cc770_platform_remove(struct platform_device *pdev)
+static void cc770_platform_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
struct cc770_priv *priv = netdev_priv(dev);
@@ -242,8 +242,6 @@ static int cc770_platform_remove(struct platform_device *pdev)
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
release_mem_region(mem->start, resource_size(mem));
-return 0;
}
static const struct of_device_id cc770_platform_table[] = {
@@ -259,7 +257,7 @@ static struct platform_driver cc770_platform_driver = {
.of_match_table = cc770_platform_table,
},
.probe = cc770_platform_probe,
-.remove = cc770_platform_remove,
+.remove_new = cc770_platform_remove,
};
module_platform_driver(cc770_platform_driver);

@@ -86,7 +86,7 @@ static int ctucan_platform_probe(struct platform_device *pdev)
* This function frees all the resources allocated to the device.
* Return: 0 always
*/
-static int ctucan_platform_remove(struct platform_device *pdev)
+static void ctucan_platform_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
struct ctucan_priv *priv = netdev_priv(ndev);
@@ -97,8 +97,6 @@ static int ctucan_platform_remove(struct platform_device *pdev)
pm_runtime_disable(&pdev->dev);
netif_napi_del(&priv->napi);
free_candev(ndev);
-return 0;
}
static SIMPLE_DEV_PM_OPS(ctucan_platform_pm_ops, ctucan_suspend, ctucan_resume);
@@ -113,7 +111,7 @@ MODULE_DEVICE_TABLE(of, ctucan_of_match);
static struct platform_driver ctucanfd_driver = {
.probe = ctucan_platform_probe,
-.remove = ctucan_platform_remove,
+.remove_new = ctucan_platform_remove,
.driver = {
.name = DRV_NAME,
.pm = &ctucan_platform_pm_ops,

@@ -78,18 +78,7 @@ unsigned int can_skb_get_frame_len(const struct sk_buff *skb)
else
len = cf->len;
-if (can_is_canfd_skb(skb)) {
-if (cf->can_id & CAN_EFF_FLAG)
-len += CANFD_FRAME_OVERHEAD_EFF;
-else
-len += CANFD_FRAME_OVERHEAD_SFF;
-} else {
-if (cf->can_id & CAN_EFF_FLAG)
-len += CAN_FRAME_OVERHEAD_EFF;
-else
-len += CAN_FRAME_OVERHEAD_SFF;
-}
-return len;
+return can_frame_bytes(can_is_canfd_skb(skb), cf->can_id & CAN_EFF_FLAG,
+false, len);
}
EXPORT_SYMBOL_GPL(can_skb_get_frame_len);

@@ -220,7 +220,7 @@ int can_rx_offload_irq_offload_fifo(struct can_rx_offload *offload)
EXPORT_SYMBOL_GPL(can_rx_offload_irq_offload_fifo);
int can_rx_offload_queue_timestamp(struct can_rx_offload *offload,
-struct sk_buff *skb, u32 timestamp)
+struct sk_buff *skb, u32 timestamp)
{
struct can_rx_offload_cb *cb;

@@ -2218,7 +2218,7 @@ static int flexcan_probe(struct platform_device *pdev)
return err;
}
-static int flexcan_remove(struct platform_device *pdev)
+static void flexcan_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
@@ -2227,8 +2227,6 @@ static int flexcan_remove(struct platform_device *pdev)
unregister_flexcandev(dev);
pm_runtime_disable(&pdev->dev);
free_candev(dev);
-return 0;
}
static int __maybe_unused flexcan_suspend(struct device *device)
@@ -2379,7 +2377,7 @@ static struct platform_driver flexcan_driver = {
.of_match_table = flexcan_of_match,
},
.probe = flexcan_probe,
-.remove = flexcan_remove,
+.remove_new = flexcan_remove,
.id_table = flexcan_id_table,
};

@@ -1696,7 +1696,7 @@ static int grcan_probe(struct platform_device *ofdev)
return err;
}
-static int grcan_remove(struct platform_device *ofdev)
+static void grcan_remove(struct platform_device *ofdev)
{
struct net_device *dev = platform_get_drvdata(ofdev);
struct grcan_priv *priv = netdev_priv(dev);
@@ -1706,8 +1706,6 @@ static int grcan_remove(struct platform_device *ofdev)
irq_dispose_mapping(dev->irq);
netif_napi_del(&priv->napi);
free_candev(dev);
-return 0;
}
static const struct of_device_id grcan_match[] = {
@@ -1726,7 +1724,7 @@ static struct platform_driver grcan_driver = {
.of_match_table = grcan_match,
},
.probe = grcan_probe,
-.remove = grcan_remove,
+.remove_new = grcan_remove,
};
module_platform_driver(grcan_driver);

@@ -1013,15 +1013,13 @@ static int ifi_canfd_plat_probe(struct platform_device *pdev)
return ret;
}
-static int ifi_canfd_plat_remove(struct platform_device *pdev)
+static void ifi_canfd_plat_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
unregister_candev(ndev);
platform_set_drvdata(pdev, NULL);
free_candev(ndev);
-return 0;
}
static const struct of_device_id ifi_canfd_of_table[] = {
@@ -1036,7 +1034,7 @@ static struct platform_driver ifi_canfd_plat_driver = {
.of_match_table = ifi_canfd_of_table,
},
.probe = ifi_canfd_plat_probe,
-.remove = ifi_canfd_plat_remove,
+.remove_new = ifi_canfd_plat_remove,
};
module_platform_driver(ifi_canfd_plat_driver);

@@ -2023,7 +2023,7 @@ static int ican3_probe(struct platform_device *pdev)
return ret;
}
-static int ican3_remove(struct platform_device *pdev)
+static void ican3_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
struct ican3_dev *mod = netdev_priv(ndev);
@@ -2042,8 +2042,6 @@ static int ican3_remove(struct platform_device *pdev)
iounmap(mod->dpm);
free_candev(ndev);
-return 0;
}
static struct platform_driver ican3_driver = {
@@ -2051,7 +2049,7 @@ static struct platform_driver ican3_driver = {
.name = DRV_NAME,
},
.probe = ican3_probe,
-.remove = ican3_remove,
+.remove_new = ican3_remove,
};
module_platform_driver(ican3_driver);

@@ -469,7 +469,7 @@ static void m_can_receive_skb(struct m_can_classdev *cdev,
int err;
err = can_rx_offload_queue_timestamp(&cdev->offload, skb,
-timestamp);
+timestamp);
if (err)
stats->rx_fifo_errors++;
} else {
@@ -895,7 +895,7 @@ static int m_can_handle_bus_errors(struct net_device *dev, u32 irqstatus,
netdev_dbg(dev, "Arbitration phase error detected\n");
work_done += m_can_handle_lec_err(dev, lec);
}
if (is_lec_err(dlec)) {
netdev_dbg(dev, "Data phase error detected\n");
work_done += m_can_handle_lec_err(dev, dlec);

@@ -164,7 +164,7 @@ static __maybe_unused int m_can_resume(struct device *dev)
return m_can_class_resume(dev);
}
-static int m_can_plat_remove(struct platform_device *pdev)
+static void m_can_plat_remove(struct platform_device *pdev)
{
struct m_can_plat_priv *priv = platform_get_drvdata(pdev);
struct m_can_classdev *mcan_class = &priv->cdev;
@@ -172,8 +172,6 @@ static int m_can_plat_remove(struct platform_device *pdev)
m_can_class_unregister(mcan_class);
m_can_class_free_dev(mcan_class->net);
-return 0;
}
static int __maybe_unused m_can_runtime_suspend(struct device *dev)
@@ -223,7 +221,7 @@ static struct platform_driver m_can_plat_driver = {
.pm = &m_can_pmops,
},
.probe = m_can_plat_probe,
-.remove = m_can_plat_remove,
+.remove_new = m_can_plat_remove,
};
module_platform_driver(m_can_plat_driver);

@@ -349,7 +349,7 @@ static int mpc5xxx_can_probe(struct platform_device *ofdev)
return err;
}
-static int mpc5xxx_can_remove(struct platform_device *ofdev)
+static void mpc5xxx_can_remove(struct platform_device *ofdev)
{
const struct of_device_id *match;
const struct mpc5xxx_can_data *data;
@@ -365,8 +365,6 @@ static int mpc5xxx_can_remove(struct platform_device *ofdev)
iounmap(priv->reg_base);
irq_dispose_mapping(dev->irq);
free_candev(dev);
-return 0;
}
#ifdef CONFIG_PM
@@ -437,7 +435,7 @@ static struct platform_driver mpc5xxx_can_driver = {
.of_match_table = mpc5xxx_can_table,
},
.probe = mpc5xxx_can_probe,
-.remove = mpc5xxx_can_remove,
+.remove_new = mpc5xxx_can_remove,
#ifdef CONFIG_PM
.suspend = mpc5xxx_can_suspend,
.resume = mpc5xxx_can_resume,

@@ -824,7 +824,7 @@ static int rcar_can_probe(struct platform_device *pdev)
return err;
}
-static int rcar_can_remove(struct platform_device *pdev)
+static void rcar_can_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
struct rcar_can_priv *priv = netdev_priv(ndev);
@@ -832,7 +832,6 @@ static int rcar_can_remove(struct platform_device *pdev)
unregister_candev(ndev);
netif_napi_del(&priv->napi);
free_candev(ndev);
-return 0;
}
static int __maybe_unused rcar_can_suspend(struct device *dev)
@@ -908,7 +907,7 @@ static struct platform_driver rcar_can_driver = {
.pm = &rcar_can_pm_ops,
},
.probe = rcar_can_probe,
-.remove = rcar_can_remove,
+.remove_new = rcar_can_remove,
};
module_platform_driver(rcar_can_driver);

@@ -2078,7 +2078,7 @@ static int rcar_canfd_probe(struct platform_device *pdev)
return err;
}
-static int rcar_canfd_remove(struct platform_device *pdev)
+static void rcar_canfd_remove(struct platform_device *pdev)
{
struct rcar_canfd_global *gpriv = platform_get_drvdata(pdev);
u32 ch;
@@ -2096,8 +2096,6 @@ static int rcar_canfd_remove(struct platform_device *pdev)
clk_disable_unprepare(gpriv->clkp);
reset_control_assert(gpriv->rstc1);
reset_control_assert(gpriv->rstc2);
-return 0;
}
static int __maybe_unused rcar_canfd_suspend(struct device *dev)
@@ -2130,7 +2128,7 @@ static struct platform_driver rcar_canfd_driver = {
.pm = &rcar_canfd_pm_ops,
},
.probe = rcar_canfd_probe,
-.remove = rcar_canfd_remove,
+.remove_new = rcar_canfd_remove,
};
module_platform_driver(rcar_canfd_driver);

@@ -387,6 +387,16 @@ static void sja1000_rx(struct net_device *dev)
netif_rx(skb);
}
+static irqreturn_t sja1000_reset_interrupt(int irq, void *dev_id)
+{
+struct net_device *dev = (struct net_device *)dev_id;
+netdev_dbg(dev, "performing a soft reset upon overrun\n");
+sja1000_start(dev);
+return IRQ_HANDLED;
+}
static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
{
struct sja1000_priv *priv = netdev_priv(dev);
@@ -397,6 +407,7 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
enum can_state rx_state, tx_state;
unsigned int rxerr, txerr;
uint8_t ecc, alc;
+int ret = 0;
skb = alloc_can_err_skb(dev, &cf);
if (skb == NULL)
@@ -413,6 +424,15 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
stats->rx_over_errors++;
stats->rx_errors++;
sja1000_write_cmdreg(priv, CMD_CDO); /* clear bit */
+/* Some controllers needs additional handling upon overrun
+* condition: the controller may sometimes be totally confused
+* and refuse any new frame while its buffer is empty. The only
+* way to re-sync the read vs. write buffer offsets is to
+* stop any current handling and perform a reset.
+*/
+if (priv->flags & SJA1000_QUIRK_RESET_ON_OVERRUN)
+ret = IRQ_WAKE_THREAD;
}
if (isrc & IRQ_EI) {
@@ -492,7 +512,7 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
netif_rx(skb);
-return 0;
+return ret;
}
irqreturn_t sja1000_interrupt(int irq, void *dev_id)
@@ -501,7 +521,8 @@ irqreturn_t sja1000_interrupt(int irq, void *dev_id)
struct sja1000_priv *priv = netdev_priv(dev);
struct net_device_stats *stats = &dev->stats;
uint8_t isrc, status;
-int n = 0;
+irqreturn_t ret = 0;
+int n = 0, err;
if (priv->pre_irq)
priv->pre_irq(priv);
@@ -546,19 +567,25 @@ irqreturn_t sja1000_interrupt(int irq, void *dev_id)
}
if (isrc & (IRQ_DOI | IRQ_EI | IRQ_BEI | IRQ_EPI | IRQ_ALI)) {
/* error interrupt */
-if (sja1000_err(dev, isrc, status))
+err = sja1000_err(dev, isrc, status);
+if (err == IRQ_WAKE_THREAD)
+ret = err;
+if (err)
break;
}
n++;
}
out:
+if (!ret)
+ret = (n) ? IRQ_HANDLED : IRQ_NONE;
if (priv->post_irq)
priv->post_irq(priv);
if (n >= SJA1000_MAX_IRQ)
netdev_dbg(dev, "%d messages handled in ISR", n);
-return (n) ? IRQ_HANDLED : IRQ_NONE;
+return ret;
}
EXPORT_SYMBOL_GPL(sja1000_interrupt);
@@ -577,8 +604,9 @@ static int sja1000_open(struct net_device *dev)
/* register interrupt handler, if not done by the device driver */
if (!(priv->flags & SJA1000_CUSTOM_IRQ_HANDLER)) {
-err = request_irq(dev->irq, sja1000_interrupt, priv->irq_flags,
-dev->name, (void *)dev);
+err = request_threaded_irq(dev->irq, sja1000_interrupt,
+sja1000_reset_interrupt,
+priv->irq_flags, dev->name, (void *)dev);
if (err) {
close_candev(dev);
return -EAGAIN;

@@ -147,6 +147,7 @@
*/
#define SJA1000_CUSTOM_IRQ_HANDLER BIT(0)
#define SJA1000_QUIRK_NO_CDR_REG BIT(1)
+#define SJA1000_QUIRK_RESET_ON_OVERRUN BIT(2)
/*
* SJA1000 private data structure

@@ -223,7 +223,7 @@ static int sja1000_isa_probe(struct platform_device *pdev)
return err;
}
-static int sja1000_isa_remove(struct platform_device *pdev)
+static void sja1000_isa_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
struct sja1000_priv *priv = netdev_priv(dev);
@@ -241,13 +241,11 @@ static int sja1000_isa_remove(struct platform_device *pdev)
release_region(port[idx], SJA1000_IOSIZE);
}
free_sja1000dev(dev);
-return 0;
}
static struct platform_driver sja1000_isa_driver = {
.probe = sja1000_isa_probe,
-.remove = sja1000_isa_remove,
+.remove_new = sja1000_isa_remove,
.driver = {
.name = DRV_NAME,
},

@@ -106,7 +106,7 @@ static void sp_technologic_init(struct sja1000_priv *priv, struct device_node *o
static void sp_rzn1_init(struct sja1000_priv *priv, struct device_node *of)
{
-priv->flags = SJA1000_QUIRK_NO_CDR_REG;
+priv->flags = SJA1000_QUIRK_NO_CDR_REG | SJA1000_QUIRK_RESET_ON_OVERRUN;
}
static void sp_populate(struct sja1000_priv *priv,
@@ -277,6 +277,9 @@ static int sp_probe(struct platform_device *pdev)
priv->irq_flags = IRQF_SHARED;
}
+if (priv->flags & SJA1000_QUIRK_RESET_ON_OVERRUN)
+priv->irq_flags |= IRQF_ONESHOT;
dev->irq = irq;
priv->reg_base = addr;
@@ -317,19 +320,17 @@ static int sp_probe(struct platform_device *pdev)
return err;
}
-static int sp_remove(struct platform_device *pdev)
+static void sp_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
unregister_sja1000dev(dev);
free_sja1000dev(dev);
-return 0;
}
static struct platform_driver sp_driver = {
.probe = sp_probe,
-.remove = sp_remove,
+.remove_new = sp_remove,
.driver = {
.name = DRV_NAME,
.of_match_table = sp_of_table,

@@ -729,7 +729,7 @@ static const struct attribute_group softing_pdev_group = {
/*
* platform driver
*/
-static int softing_pdev_remove(struct platform_device *pdev)
+static void softing_pdev_remove(struct platform_device *pdev)
{
struct softing *card = platform_get_drvdata(pdev);
int j;
@@ -747,7 +747,6 @@ static int softing_pdev_remove(struct platform_device *pdev)
iounmap(card->dpram);
kfree(card);
-return 0;
}
static int softing_pdev_probe(struct platform_device *pdev)
@@ -855,7 +854,7 @@ static struct platform_driver softing_driver = {
.name = KBUILD_MODNAME,
},
.probe = softing_pdev_probe,
-.remove = softing_pdev_remove,
+.remove_new = softing_pdev_remove,
};
module_platform_driver(softing_driver);

@@ -791,14 +791,12 @@ static const struct of_device_id sun4ican_of_match[] = {
MODULE_DEVICE_TABLE(of, sun4ican_of_match);
-static int sun4ican_remove(struct platform_device *pdev)
+static void sun4ican_remove(struct platform_device *pdev)
{
struct net_device *dev = platform_get_drvdata(pdev);
unregister_netdev(dev);
free_candev(dev);
-return 0;
}
static int sun4ican_probe(struct platform_device *pdev)
@@ -901,7 +899,7 @@ static struct platform_driver sun4i_can_driver = {
.of_match_table = sun4ican_of_match,
},
.probe = sun4ican_probe,
-.remove = sun4ican_remove,
+.remove_new = sun4ican_remove,
};
module_platform_driver(sun4i_can_driver);

@@ -625,7 +625,7 @@ static int ti_hecc_error(struct net_device *ndev, int int_status,
timestamp = hecc_read(priv, HECC_CANLNT);
err = can_rx_offload_queue_timestamp(&priv->offload, skb,
-timestamp);
+timestamp);
if (err)
ndev->stats.rx_fifo_errors++;
}
@@ -963,7 +963,7 @@ static int ti_hecc_probe(struct platform_device *pdev)
return err;
}
-static int ti_hecc_remove(struct platform_device *pdev)
+static void ti_hecc_remove(struct platform_device *pdev)
{
struct net_device *ndev = platform_get_drvdata(pdev);
struct ti_hecc_priv *priv = netdev_priv(ndev);
@@ -973,8 +973,6 @@ static int ti_hecc_remove(struct platform_device *pdev)
clk_put(priv->clk);
can_rx_offload_del(&priv->offload);
free_candev(ndev);
-return 0;
}
#ifdef CONFIG_PM
@@ -1028,7 +1026,7 @@ static struct platform_driver ti_hecc_driver = {
.of_match_table = ti_hecc_dt_ids,
},
.probe = ti_hecc_probe,
-.remove = ti_hecc_remove,
+.remove_new = ti_hecc_remove,
.suspend = ti_hecc_suspend,
.resume = ti_hecc_resume,
};
