linux/net/ipv4
David Ahern 2fbc6e89b2 ipv4: Update exception handling for multipath routes via same device
Kfir reported that pmtu exceptions are not created properly for
deployments where multipath routes use the same device.

After some digging I see 2 compounding problems:
1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
   the route lookup. This is the second use case where this has
   been a problem (the first is related to use of vti devices with
   VRF). I can not find any reason for the oif to be changed after the
   lookup; the code goes back to the start of git. It does not seem
   logical so remove it.

2. fib_lookups for exceptions do not call fib_select_path to handle
   multipath route selection based on the hash.

The end result is that the fib_lookup used to add the exception
always creates it based using the first leg of the route.

An example topology showing the problem:

                 |  host1
             +------+
             | eth0 |  .209
             +------+
                 |
             +------+
     switch  | br0  |
             +------+
                 |
       +---------+---------+
       | host2             |  host3
   +------+             +------+
   | eth0 | .250        | eth0 | 192.168.252.252
   +------+             +------+

   +-----+             +-----+
   | vti | .2          | vti | 192.168.247.3
   +-----+             +-----+
       \                  /
 =================================
 tunnels
         192.168.247.1/24

for h in host1 host2 host3; do
        ip netns add ${h}
        ip -netns ${h} link set lo up
        ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
done

ip netns add switch
ip -netns switch li set lo up
ip -netns switch link add br0 type bridge stp 0
ip -netns switch link set br0 up

for n in 1 2 3; do
        ip -netns switch link add eth-sw type veth peer name eth-h${n}
        ip -netns switch li set eth-h${n} master br0 up
        ip -netns switch li set eth-sw netns host${n} name eth0
done

ip -netns host1 addr add 192.168.252.209/24 dev eth0
ip -netns host1 link set dev eth0 up
ip -netns host1 route add 192.168.247.0/24 \
        nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0

ip -netns host2 addr add 192.168.252.250/24 dev eth0
ip -netns host2 link set dev eth0 up

ip -netns host2 addr add 192.168.252.252/24 dev eth0
ip -netns host3 link set dev eth0 up

ip netns add tunnel
ip -netns tunnel li set lo up
ip -netns tunnel li add br0 type bridge
ip -netns tunnel li set br0 up
for n in $(seq 11 20); do
        ip -netns tunnel addr add dev br0 192.168.247.${n}/24
done

for n in 2 3
do
        ip -netns tunnel link add vti${n} type veth peer name eth${n}
        ip -netns tunnel link set eth${n} mtu 1360 master br0 up
        ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
        ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
done
ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3

ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
ip -netns host1 ro ls cache

Before this patch the cache always shows exceptions against the first
leg in the multipath route; 192.168.252.250 per this example. Since the
hash has an initial random seed, you may need to vary the final octet
more than what is listed. In my tests, using addresses between 11 and 19
usually found 1 that used both legs.

With this patch, the cache will have exceptions for both legs.

Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions")
Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 15:44:09 -07:00
..
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter netfilter: delete repeated words 2020-08-28 20:11:38 +02:00
af_inet.c net: remove compat_sock_common_{get,set}sockopt 2020-07-19 18:16:40 -07:00
ah4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
arp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
bpf_tcp_ca.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
cipso_ipv4.c net: ipv4: kerneldoc fixes 2020-07-13 17:20:39 -07:00
datagram.c inet: stop leaking jiffies on the wire 2019-11-01 14:57:52 -07:00
devinet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-31 17:48:46 -07:00
esp4.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
esp4_offload.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
fib_frontend.c ipv4: Initialize flowi4_multipath_hash in data path 2020-09-14 14:54:56 -07:00
fib_lookup.h net: add net available in build_state 2020-03-29 22:30:57 -07:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c fib: use indirect call wrappers in the most common fib_rules_ops 2020-07-28 17:42:31 -07:00
fib_semantics.c net: Fix the arp error in some cases 2020-06-18 20:21:51 -07:00
fib_trie.c ipv4: Silence suspicious RCU usage warning 2020-08-26 15:58:48 -07:00
fou.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-08 21:25:37 -07:00
gre_offload.c net: gre: recompute gre csum for sctp over gre tunnels 2020-08-03 15:29:44 -07:00
icmp.c icmp6: support rfc 4884 2020-07-24 17:12:41 -07:00
igmp.c ip*_mc_gsfget(): lift copyout of struct group_filter into callers 2020-05-20 20:31:27 -04:00
inet_connection_sock.c net: refactor bind_bucket fastreuse into helper 2020-08-11 15:49:08 -07:00
inet_diag.c inet_diag: support for wider protocol numbers 2020-07-09 12:38:41 -07:00
inet_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
inet_hashtables.c net: initialize fastreuse on inet_inherit_port 2020-08-11 15:49:08 -07:00
inet_timewait_sock.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
inetpeer.c inetpeer: fix data-race in inet_putpeer / inet_putpeer 2019-11-07 16:15:56 -08:00
ip_forward.c ipv4: Revert removal of rt_uses_gateway 2019-09-20 18:23:33 -07:00
ip_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ip_gre.c net: add a new ndo_tunnel_ioctl method 2020-05-19 15:45:11 -07:00
ip_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip_options.c net/ipv4: merge ip_options_get and ip_options_get_from_user 2020-07-24 15:41:54 -07:00
ip_output.c ip: fix tos reflection in ack and reset packets 2020-09-08 20:19:08 -07:00
ip_sockglue.c icmp: prepare rfc 4884 for ipv6 2020-07-24 17:12:41 -07:00
ip_tunnel.c ip_tunnel: fix use-after-free in ip_tunnel_lookup() 2020-06-18 20:12:34 -07:00
ip_tunnel_core.c lwtunnel: only keep the available bits when setting vxlan md->gbp 2020-09-14 16:49:39 -07:00
ip_vti.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2020-07-30 14:39:31 -07:00
ipcomp.c ipcomp: assign if_id to child tunnel from parent tunnel 2020-07-09 12:55:37 +02:00
ipconfig.c Documentation: nfsroot.rst: Fix references to nfsroot.rst 2020-03-02 13:11:46 -07:00
ipip.c net: ipip: implement header_ops->parse_protocol for AF_PACKET 2020-06-30 12:29:39 -07:00
ipmr.c ipmr: Copy option to correct variable 2020-07-27 11:39:55 -07:00
ipmr_base.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
Kconfig net: ipv4: remove duplicate "the the" phrase in Kconfig text 2020-08-18 16:02:16 -07:00
Makefile udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
metrics.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
netfilter.c
netlink.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nexthop.c net: nexthop: don't allow empty NHA_GROUP 2020-08-22 12:39:55 -07:00
ping.c ipv4: fill fl4_icmp_{type,code} in ping_v4_sendmsg 2020-07-07 15:26:37 -07:00
proc.c tcp: add SNMP counter for no. of duplicate segments reported by DSACK 2020-07-17 12:54:30 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw.c net: Fix some comments 2020-08-27 07:55:59 -07:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
route.c ipv4: Update exception handling for multipath routes via same device 2020-09-15 15:44:09 -07:00
syncookies.c tcp: fix build fong CONFIG_MPTCP=n 2020-08-01 11:46:29 -07:00
sysctl_net_ipv4.c tcp: correct read of TFO keys on big endian systems 2020-08-10 12:12:35 -07:00
tcp.c tcp: correct read of TFO keys on big endian systems 2020-08-10 12:12:35 -07:00
tcp_bbr.c tcp_bbr: improve arithmetic division in bbr_update_bw() 2020-01-21 10:45:49 +01:00
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c bpf: tcp: Recv() should return 0 when the peer socket is closed 2020-06-12 15:10:12 -07:00
tcp_cdg.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_cong.c tcp: make sure listeners don't initialize congestion-control state 2020-07-09 13:07:45 -07:00
tcp_cubic.c tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT 2020-06-25 16:08:47 -07:00
tcp_dctcp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_dctcp.h
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-13 20:03:11 -07:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_illinois.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_input.c tcp: apply a floor of 1 for RTT samples from TCP timestamps 2020-08-03 17:54:03 -07:00
tcp_ipv4.c bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t 2020-07-25 20:16:32 -07:00
tcp_lp.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_metrics.c net-tcp: Disable TCP ssthresh metrics cache by default 2019-12-09 20:17:48 -08:00
tcp_minisocks.c mptcp: add new sock flag to deal with join subflows 2020-05-15 12:30:13 -07:00
tcp_nv.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_output.c tcp: rename request_sock cookie_ts bit to syncookie 2020-07-31 16:55:32 -07:00
tcp_rate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
tcp_recovery.c
tcp_scalable.c tcp: fix stretch ACK bugs in Scalable 2020-03-16 18:26:54 -07:00
tcp_timer.c net: ipv4: kerneldoc fixes 2020-07-13 17:20:39 -07:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_vegas.h
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_yeah.c tcp: fix stretch ACK bugs in Yeah 2020-03-16 18:26:55 -07:00
tunnel4.c tunnel4: add cb_handler to struct xfrm_tunnel 2020-07-09 12:51:36 +02:00
udp.c udp, bpf: Ignore connections in reuseport group after BPF sk lookup 2020-07-31 02:00:48 +02:00
udp_bpf.c bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
udp_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c udp: initialize is_flist with 0 in udp_gro_receive 2020-03-30 10:35:03 -07:00
udp_tunnel_core.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp_tunnel_nic.c udp_tunnel: add the ability to hard-code IANA VXLAN 2020-08-03 10:13:54 -07:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udplite.c net/ipv4: remove compat_ip_{get,set}sockopt 2020-07-19 18:16:41 -07:00
xfrm4_input.c xfrm: state: remove extract_input indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_output.c xfrm: fix unused variable warning if CONFIG_NETFILTER=n 2020-05-11 15:12:27 +02:00
xfrm4_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm4_protocol.c xfrm: add route lookup to xfrm4_rcv_encap 2019-12-09 09:59:07 +01:00
xfrm4_state.c xfrm: remove output_finish indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_tunnel.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00