Commit graph

97 commits

Author SHA1 Message Date
Jiri Pirko 686ed3047e rocker: use change upper info
Now that information about the changed upper device is passed along, benefit
from that and use this info directly.

This also fixes possible issues that could happen when non-master device
is added (current code does not distinguish between master and non-master
upper device).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:28:35 -07:00
Jiri Pirko fb4bf21434 rocker: use new helper to figure out master kind
Looking at the rtnl kind string is kind of ugly, so use the new helpers to
do this in a nicer way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:28:35 -07:00
Scott Feldman dd19f83d6c rocker: hook ndo_neigh_destroy to cleanup neigh refs in driver
Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops.  The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down).  This patch hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device.  In response, the driver will
purge the neigh entry from its internal tbl.

I didn't find any in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users.  In any case, it
does what I need here.  An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 17:05:46 -07:00
Scott Feldman c8beb5b261 rocker: print switch ID consistent with phys_switch_id sysfs node
On successful probe, the driver prints the switch ID.  This patch changes the
format of the printed ID to match what's used in sysfs phys_switch_id node.
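For reference, the sysfs phys_switch_id attribute appears to render the ID as contiguous lowercase hex bytes (the `%*phN` printk format). A minimal userspace sketch of that formatting, assuming the ID is available as a byte array; `format_switch_id` is a hypothetical helper, not a rocker function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write id_len bytes as contiguous lowercase hex,
 * two digits per byte and no separators.  out must hold at least
 * 2 * id_len + 1 bytes. */
static void format_switch_id(const unsigned char *id, size_t id_len,
			     char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < id_len; i++) {
		out[2 * i] = hex[id[i] >> 4];		/* high nibble */
		out[2 * i + 1] = hex[id[i] & 0x0f];	/* low nibble */
	}
	out[2 * id_len] = '\0';
	assert(strlen(out) == 2 * id_len);
}
```

For an 8-byte ID the output buffer needs 17 bytes: two hex digits per byte plus the terminator.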

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 17:05:46 -07:00
David S. Miller 182ad468e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cavium/Kconfig

The cavium conflict was overlapping dependency
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 16:23:11 -07:00
Vivien Didelot ce80e7bc57 net: switchdev: support static FDB addresses
This patch adds an ndm_state member to the switchdev_obj_fdb structure,
in order to support static FDB addresses.

Set Rocker ndm_state to NUD_REACHABLE.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:03:19 -07:00
David S. Miller cdf0969763 Revert "Merge branch 'mv88e6xxx-switchdev-fdb'"
This reverts commit f1d5ca4344, reversing
changes made to 4933d85c51.

I applied v2 instead of v3.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:00:37 -07:00
Vivien Didelot 1525c386a1 net: switchdev: change fdb addr for a byte array
The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it with a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).
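To illustrate the difference, here is a simplified stand-in for the FDB object with the address stored by value. The struct and the register-decoding helper are hypothetical; the point is that the address bytes may come from a temporary buffer (such as a register read) because the object keeps its own copy:

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified FDB object: MAC stored by value, not by pointer. */
struct fdb_obj {
	unsigned char addr[ETH_ALEN];
	unsigned short vid;
};

/* Decode a MAC from a (simulated) 48-bit register value into a temporary
 * buffer, then copy it into the object.  Nothing the driver owns has to
 * outlive this call for obj->addr to stay valid. */
static void fdb_obj_fill_from_reg(struct fdb_obj *obj, unsigned long long reg,
				  unsigned short vid)
{
	unsigned char mac[ETH_ALEN];	/* goes out of scope on return */
	int i;

	assert(obj != NULL);
	for (i = 0; i < ETH_ALEN; i++)
		mac[i] = (unsigned char)(reg >> (8 * (ETH_ALEN - 1 - i)));
	memcpy(obj->addr, mac, ETH_ALEN);
	obj->vid = vid;
}
```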

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:48:08 -07:00
Scott Feldman ff14702844 rocker: use netdev_err after register_netdev
After successful register_netdev, we can use netdev_err rather than the more
generic dev_err.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:47:57 -07:00
Scott Feldman 6c4f7780a5 rocker: NULL port if port probe fails
Set port to NULL if port probe fails so we don't try to remove partially
initialized port on port probe err cleanup path.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:47:57 -07:00
Jiri Pirko 95b9be64d1 rocker: linearize skb in case frags would not fit into tx descriptor
Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 14:22:26 -07:00
Ido Schimmel 21518a6eb9 rocker: enable support for scattered packets
rocker supports the transmission of scattered packets, so let the kernel
know about it by setting the NETIF_F_SG bit in the device's features.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 14:22:25 -07:00
Ido Schimmel 1ebd47efa4 rocker: free netdevice during netdevice removal
When removing a port's netdevice in 'rocker_remove_ports', we should
also free the allocated 'net_device' structure. Do that by calling
'free_netdev' after unregistering it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Fixes: 4b8ac9660a ("rocker: introduce rocker switch driver")
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-02 17:19:17 -07:00
Scott Feldman 3f98a8e636 rocker: add offload_fwd_mark support
If the device flags an ingress packet as "fwd offload", mark the
skb->offload_fwd_mark using the ingress port's dev->offload_fwd_mark.  This
will be the hint to the kernel that this packet has already been forwarded
by the device to egress ports matching skb->offload_fwd_mark.

For rocker, derive the port's dev->offload_fwd_mark from the device switch ID
and the port ifindex.  If the port is bridged, use the bridge ifindex rather
than the port ifindex.
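A sketch of the mark derivation described above. The commit says only that the mark is derived from the switch ID and an ifindex; the XOR folding used here is an assumption chosen for brevity, and `struct port`/`port_fwd_mark` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port model: only the fields the mark depends on. */
struct port {
	uint64_t switch_id;	/* device switch ID */
	int ifindex;		/* the port's own ifindex */
	int bridge_ifindex;	/* bridge ifindex, or 0 if not bridged */
};

/* Fold the 64-bit switch ID to 32 bits, then mix in the ifindex of the
 * bridge (if bridged) or of the port itself. */
static uint32_t port_fwd_mark(const struct port *p)
{
	int ifindex = p->bridge_ifindex ? p->bridge_ifindex : p->ifindex;
	uint32_t sw = (uint32_t)(p->switch_id ^ (p->switch_id >> 32));

	assert(ifindex > 0);
	return sw ^ (uint32_t)ifindex;
}
```

Because XOR with the ifindex is one-to-one for a fixed switch ID, ports on the same switch get equal marks exactly when they resolve to the same ifindex. Marks from different switches can still collide under this folding, which a real driver would need to account for.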

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:45 -07:00
Simon Horman 8254973fa3 rocker: forward packets to CPU when port is joined to openvswitch
Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
There is scope to later refine what is passed up as per Open vSwitch flows
on a port.

This does not change the behaviour of rocker ports that are
not joined to Open vSwitch.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:26:03 -07:00
Anuradha Karuppiah c305524617 rocker: Handle protodown notifications.
protodown can be set by user space applications like MLAG on detecting
errors on a switch port. This patch provides sample switch driver changes
for handling protodown. Rocker PHYS disables the port in response to
protodown.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15 21:39:40 -07:00
Scott Feldman 77a58c741d rocker: add change MTU support
Implement ndo_change_mtu: on MTU change, reallocate Rx ring bufs and signal
HW of new port MTU value.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 00:31:14 -07:00
David S. Miller 3a07bd6fea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:58:51 -07:00
Gilad Ben-Yossef a076e6bfe7 rocker: call correct unregister function on error
Use the correct unregister function matching the register
function on the error path.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Fixes: c1beeef7a3 ("rocker: implement IPv4 fib offloading")
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 07:12:26 -07:00
Scott Feldman 7d4f8d871a switchdev; add VLAN support for port's bridge_getlink
One more missing piece of the puzzle.  Add vlan dump support to switchdev
port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device, but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg.  Before this patch, "bridge vlan show":

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			<< bridge side vlans
		 57

	sw1p1				<< device side vlans (missing)

	sw1p2    57

	sw1p2

	sw1p3

	sw1p4

	br0     None

(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port.  The listing above shows no vlans for the device side even though they
are installed).

After this patch:

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			<< bridge side vlan
		 57

	sw1p1    30-34			<< device side vlans
		 57
		 3840 PVID

	sw1p2    57

	sw1p2    57
		 3840 PVID

	sw1p3    3842 PVID

	sw1p4    3843 PVID

	br0     None

I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
switchdev support adds an obj dump for VLAN objects, using the same
call-back scheme as FDB dump.  Support included for both compressed and
un-compressed vlan dumps.
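A rough sketch of the compressed form: consecutive VIDs with identical flags collapse into a vid_begin/vid_end range, while a PVID entry is always reported on its own. That PVID rule and the flag value below are assumptions for illustration, not taken from the bridge code:

```c
#include <assert.h>
#include <stddef.h>

#define VLAN_INFO_PVID 0x1	/* hypothetical flag value for this sketch */

struct vlan {
	unsigned short vid;
	unsigned short flags;
};

struct vlan_range {
	unsigned short vid_begin;
	unsigned short vid_end;
	unsigned short flags;
};

/* Compress a VID-sorted array into ranges.  Returns the number of ranges
 * written to out, which must have room for n entries. */
static size_t vlan_compress(const struct vlan *v, size_t n,
			    struct vlan_range *out)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		struct vlan_range *last = nr ? &out[nr - 1] : NULL;

		/* Extend the current range only for consecutive, same-flag,
		 * non-PVID entries. */
		if (last && !(v[i].flags & VLAN_INFO_PVID) &&
		    !(last->flags & VLAN_INFO_PVID) &&
		    last->flags == v[i].flags &&
		    last->vid_end + 1 == v[i].vid) {
			last->vid_end = v[i].vid;
			continue;
		}
		out[nr].vid_begin = out[nr].vid_end = v[i].vid;
		out[nr].flags = v[i].flags;
		nr++;
	}
	return nr;
}
```

Fed the device-side VIDs of sw1p1 from the listing (30 to 34, 57, and 3840 as PVID), this yields three entries: 30-34, 57, and 3840 PVID.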

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 06:56:18 -07:00
Scott Feldman 3e3a78b495 switchdev: rename vlan vid_start to vid_begin
Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 06:56:18 -07:00
Scott Feldman b4ad7baa01 bridge: del external_learned fdbs from device on flush or ageout
We need to delete externally learned fdbs from the offload device when any
one of these events happens:

1) Bridge ages out fdb.  (When bridge is doing ageing vs. device doing
ageing.  If device is doing ageing, it would send SWITCHDEV_FDB_DEL
directly).

2) STP state change flushes fdbs on port.

3) User uses sysfs interface to flush fdbs from bridge or bridge port:

	echo 1 >/sys/class/net/BR_DEV/bridge/flush
	echo 1 >/sys/class/net/BR_PORT/brport/flush

4) Offload driver sends event SWITCHDEV_FDB_DEL to delete fdb entry.

For rocker, we can now get called to delete fdb entry in wait and nowait
contexts, so set NOWAIT flag when deleting fdb entry.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:08:49 -07:00
Scott Feldman f66feaa98b rocker: move port stop to 'no wait' processing
rocker_port_stop can be called from atomic and non-atomic contexts.  Since
we can't test what context we're getting called in, do the processing as
'no wait', which will cover all cases.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:49 -07:00
Scott Feldman 92014b97ed rocker: move MAC learn event back to 'no wait' processing
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:49 -07:00
Scott Feldman ac28393e85 rocker: mark STP update as 'no wait' processing
We can get STP updates from the bridge driver in atomic and non-atomic
contexts.  Since we can't test what context we're getting called in,
do the STP processing as 'no wait', which will cover all cases.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:49 -07:00
Scott Feldman 02a9fbfc87 rocker: mark neigh update event processing as 'no wait'
Neigh update event handler runs in a context where we can't sleep, so mark
processing in driver with ROCKER_OP_FLAG_NOWAIT.  NOWAIT will use
GFP_ATOMIC for allocations and will queue cmds to the device's cmd ring but
will not wait (sleep) for cmd response back from device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:48 -07:00
Scott Feldman 179f9a2590 rocker: revert back to support for nowait processes
One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as "no wait" for
those contexts where we cannot sleep.  Turns out, we have "no wait"
contexts where we want to program the device.  So re-add the
ROCKER_OP_FLAG_NOWAIT flag to mark such processes, and propagate flags to
mem allocator and to the device cmd executor.  With NOWAIT, mem allocs are
GFP_ATOMIC and device cmds are queued to the device, but the driver will
not wait (sleep) for the response back from the device.
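The effect of the flag can be summarized as a small decision function. `OP_FLAG_NOWAIT` and the enum below are hypothetical stand-ins for ROCKER_OP_FLAG_NOWAIT and the GFP_KERNEL/GFP_ATOMIC choice; the command is queued to the ring either way, and only the wait differs:

```c
#include <assert.h>
#include <stdbool.h>

#define OP_FLAG_NOWAIT 0x1	/* stand-in for ROCKER_OP_FLAG_NOWAIT */

/* Stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC (must not sleep). */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

struct cmd_plan {
	enum alloc_mode alloc;
	bool wait_for_response;
};

/* Decide how to allocate and whether to block on the device's reply,
 * based only on the caller-supplied flags. */
static struct cmd_plan plan_cmd(int flags)
{
	struct cmd_plan plan;
	bool nowait = flags & OP_FLAG_NOWAIT;

	plan.alloc = nowait ? ALLOC_ATOMIC : ALLOC_MAY_SLEEP;
	plan.wait_for_response = !nowait;	/* cmd is queued in both cases */
	return plan;
}
```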

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:06:48 -07:00
Scott Feldman 4d81db4156 rocker: fix neigh tbl index increment race
rocker->neigh_tbl_next_index is used to generate unique indices for neigh
entries programmed into the device.  The way new indices were generated was
racy with the new prepare-commit transaction model.  A simple fix here
removes the race.  The race was with two processes getting the same index,
one process using prepare-commit, the other not:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
neigh_tbl_next_index++

Both A and B got the same index.  The fix is to store and increment
neigh_tbl_next_index in the PREPARE (or NONE) phase and use value in COMMIT
phase:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index
neigh_tbl_next_index++

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
// use value stashed in PREPARE phase
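The fixed ordering can be replayed in a few lines. This sketch assumes, as in the diagram, that each operation keeps its own stash between PREPARE and COMMIT; the names are hypothetical, and the driver's locking around the counter is omitted:

```c
#include <assert.h>

enum trans { TRANS_NONE, TRANS_PREPARE, TRANS_COMMIT };

struct table {
	unsigned int next_index;	/* like neigh_tbl_next_index */
};

/* Per-operation state kept between the PREPARE and COMMIT calls. */
struct neigh_op {
	unsigned int index;
};

/* Fixed scheme: take the index and bump the counter in PREPARE (or NONE);
 * COMMIT only reuses the stashed value and never touches the counter. */
static unsigned int neigh_index(struct table *t, struct neigh_op *op,
				enum trans trans)
{
	if (trans != TRANS_COMMIT)
		op->index = t->next_index++;
	return op->index;
}
```

Replaying the interleaving from the diagram (A PREPARE, B NONE, A COMMIT) gives A index 0 and B index 1, so the two processes no longer collide.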

Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:04:21 -07:00
Scott Feldman a072031084 rocker: guard against NULL rocker_port when removing ports
The ports array is filled in as ports are probed, but if probing doesn't
finish, we need to stop only those ports that were probed successfully.
Check the ports array for NULL to skip un-probed ports when stopping.
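Together with zero-allocating the ports array and leaving a failed port's slot NULL, this makes the cleanup path safe after a partial probe. A minimal sketch of that pattern; the helpers are hypothetical, and calloc/free stand in for the driver's allocations:

```c
#include <assert.h>
#include <stdlib.h>

struct port {
	int id;
};

/* Zero-allocate the pointer array so un-probed slots stay NULL. */
static struct port **ports_alloc(int count)
{
	return calloc(count, sizeof(struct port *));
}

/* Probe ports in order; stop at the first failure, leaving the failed slot
 * and every later one NULL.  fail_at < 0 means every probe succeeds. */
static int ports_probe(struct port **ports, int count, int fail_at)
{
	int i;

	for (i = 0; i < count; i++) {
		if (i == fail_at)
			return -1;
		ports[i] = calloc(1, sizeof(**ports));
		if (!ports[i])
			return -1;
		ports[i]->id = i;
	}
	return 0;
}

/* Cleanup path: skip NULL slots instead of dereferencing them, then free
 * the array itself.  Returns how many ports were actually torn down. */
static int ports_remove(struct port **ports, int count)
{
	int i, removed = 0;

	for (i = 0; i < count; i++) {
		if (!ports[i])
			continue;
		free(ports[i]);
		ports[i] = NULL;
		removed++;
	}
	free(ports);
	return removed;
}
```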

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:03:48 -07:00
Scott Feldman 2aa2ed0864 rocker: remove support for legacy VLAN ndo ops
Remove support for legacy ndo ops
.ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid.  Rocker will use
bridge_setlink/dellink exclusively for VLAN add/del operations.

The legacy ops are needed if using 8021q driver module to setup VLANs on
the port.  But an alternative exists in using bridge_setlink/dellink to
setup VLANs, which doesn't depend on 8021q module.  So rocker will switch
to the newer setlink/dellink ops.  VLANs can be added to or deleted from the
port, regardless of whether the port is bridged, using the bridge commands:

	bridge vlan [add|del] vid VID dev DEV self

(Yes, I agree it's confusing to use the "bridge" command to set a VLAN on a
non-bridged port).

Using setlink/dellink over legacy ops lets us handle the stacked driver
case automatically.  It's built-in.  setlink also passes additional flags
(PVID, egress untagged) that aren't available with the legacy ops.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman 027e00dc0b rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge
When the port joins a bridge, the port's internal VLAN ID needs to change
to the bridge's internal VLAN ID.  Likewise, when leaving the bridge, the
internal VLAN ID reverts back to the port's original internal VLAN ID.  (The
internal VLAN ID is used by device to internally mark untagged pkts with
some VLAN, which will eventually be removed on egress...think PVID).  When
the internal VLAN ID changes, we need to update the VLAN table entries and
the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID.

This patch makes use of the common rocker_port_vlan_add/del functions to
make sure the tables are updated for the current internal VLAN ID.
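One way to picture the internal VLAN ID swap is a refcounted table keyed by ifindex: a port uses its own ifindex until it joins a bridge, then the bridge's. This is a hypothetical sketch; the base value 0x0f00 is an assumption that happens to match the "3840 PVID" entries in the bridge vlan listing elsewhere in this log, and the VLAN table and router MAC updates are left as comments:

```c
#include <assert.h>

#define INTERNAL_VID_BASE 0x0f00	/* assumption, see lead-in */
#define INTERNAL_VID_MAX  16		/* small table for the sketch */

struct ivid_entry {
	int ifindex;	/* 0 = slot free */
	int refcnt;
};

static struct ivid_entry ivids[INTERNAL_VID_MAX];

/* Look up (or allocate) the internal VLAN ID for ifindex, taking a ref. */
static int internal_vid_get(int ifindex)
{
	int i, free_slot = -1;

	assert(ifindex > 0);
	for (i = 0; i < INTERNAL_VID_MAX; i++) {
		if (ivids[i].ifindex == ifindex) {
			ivids[i].refcnt++;
			return INTERNAL_VID_BASE + i;
		}
		if (!ivids[i].ifindex && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;
	ivids[free_slot].ifindex = ifindex;
	ivids[free_slot].refcnt = 1;
	return INTERNAL_VID_BASE + free_slot;
}

/* Drop a ref; free the slot when the last user goes away. */
static void internal_vid_put(int ifindex)
{
	int i;

	for (i = 0; i < INTERNAL_VID_MAX; i++) {
		if (ivids[i].ifindex == ifindex) {
			if (--ivids[i].refcnt == 0)
				ivids[i].ifindex = 0;
			return;
		}
	}
}

struct port {
	int ifindex;
	int internal_vid;
};

static void port_join_bridge(struct port *p, int br_ifindex)
{
	internal_vid_put(p->ifindex);
	p->internal_vid = internal_vid_get(br_ifindex);
	/* driver would now re-add VLAN table and router MAC entries */
}

static void port_leave_bridge(struct port *p, int br_ifindex)
{
	internal_vid_put(br_ifindex);
	p->internal_vid = internal_vid_get(p->ifindex);
	/* driver would now re-add VLAN table and router MAC entries */
}
```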

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman bcfd780144 rocker: install untagged VLAN (vid=0) support for each port
On port probe, install untagged VLAN support by default.  This is
equivalent to running the command:

	bridge vlan add vid 0 dev DEV self

A user could, if they wanted, manually remove untagged support from the
port by running the command:

	bridge vlan del vid 0 dev DEV self

But installing it by default on port initialization gives the normal
expected behavior.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:09 -07:00
Scott Feldman cec04a60bc rocker: cleanup vlan table on error adding vlan
Basic house keeping: If there is an error adding the router MAC for this
vlan, remove the just-installed VLAN table entry to leave the device in the same
state as before failure.
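The cleanup is the usual undo-on-error pattern. A sketch with hypothetical table and helper names, where a test knob forces the router MAC step to fail:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state: one VLAN table bit and one router MAC bit
 * per VID. */
struct dev_tables {
	bool vlan_tbl[4096];
	bool router_mac[4096];
	bool fail_router_mac;	/* test knob: force router MAC add to fail */
};

static int vlan_tbl_add(struct dev_tables *d, int vid)
{
	d->vlan_tbl[vid] = true;
	return 0;
}

static void vlan_tbl_del(struct dev_tables *d, int vid)
{
	d->vlan_tbl[vid] = false;
}

static int router_mac_add(struct dev_tables *d, int vid)
{
	if (d->fail_router_mac)
		return -1;
	d->router_mac[vid] = true;
	return 0;
}

/* Add a VLAN; if the router MAC step fails, remove the VLAN table entry
 * so the device is left as it was before the call. */
static int port_vlan_add(struct dev_tables *d, int vid)
{
	int err;

	assert(vid >= 0 && vid < 4096);
	err = vlan_tbl_add(d, vid);
	if (err)
		return err;
	err = router_mac_add(d, vid);
	if (err)
		vlan_tbl_del(d, vid);	/* roll back the half-done add */
	return err;
}
```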

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:08 -07:00
Scott Feldman 27b808cbc2 rocker: zero allocate ports array
When allocating the array of rocker port pointers, zero the array values so
we can test for !NULL to see if port is allocated/registered.  We'll need
this later when installing untagged VLAN support for each port, during port
probe.  It's a long story, but to install a VLAN (vid=0 for untagged, in
this case) on a port, we'll need to scan other ports to see if the VLAN
group for that VLAN has been setup.  To scan the other ports, we need to
walk the port array.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 17:00:08 -07:00
Simon Horman 534ba6a87d rocker: remove rocker parameter from functions that have rocker_port parameter
The rocker (switch) of a rocker_port may be trivially obtained from
the latter, so it seems cleaner not to pass the former to a function when
the latter is being passed anyway.

rocker_port_rx_proc() is omitted from this change as it is a hot path case.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:04:52 -07:00
Simon Horman e505464355 rocker: mark parameters and local variables as const
Mark parameters and local variables as const where possible.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 18:17:08 -04:00
Simon Horman 0985df7390 rocker: remove unused rocker_port parameter from rocker_port_kfree
Remove unused rocker_port parameter from rocker_port_kfree.
Also remove the rocker_port parameter from callers of rocker_port_kfree
where the parameter is now unused.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 18:17:08 -04:00
Simon Horman df6a206730 rocker: make rocker_port_internal_vlan_id_{get, put}() non-transactional
The motivation for this is that rocker_port_internal_vlan_id_{get,put} appear
to only partially implement the transaction model: memory allocation
and freeing is transactional, but hash and bitmap manipulation is not.

The latter could be fixed. However, as it is not currently exercised,
due to trans always being SWITCHDEV_TRANS_NONE, it seems cleaner
to make rocker_port_internal_vlan_id_get non-transactional.

This problem was introduced by c4f20321d9 ("rocker: support
prepare-commit transaction model").

Found by inspection.
I do not believe that this change should have any run-time effect.

Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:55 -04:00
Simon Horman 550ecc92fe rocker: do not make neighbour entry changes when preparing transactions
rocker_port_ipv4_nh() and in turn rocker_port_ipv4_neigh() may be
called with trans == SWITCHDEV_TRANS_PREPARE and then
trans == SWITCHDEV_TRANS_COMMIT from switchdev_port_obj_set() via
fib_table_insert().

The first time that rocker_port_ipv4_nh() is called, with
trans == SWITCHDEV_TRANS_PREPARE, _rocker_neigh_add() adds a new entry to
the neigh table.

And the second time rocker_port_ipv4_nh() is called, with
trans == SWITCHDEV_TRANS_COMMIT, that entry is found. This causes
rocker_port_ipv4_nh() to believe it is not adding an entry and thus it
frees "entry", which is still present in rocker driver's neigh table.

This problem does not appear to affect deletion as my analysis is that
deletion is always performed with trans == SWITCHDEV_TRANS_NONE.

For completeness _rocker_neigh_{add,del,prepare} are updated not to
manipulate fib table entries if trans == SWITCHDEV_TRANS_PREPARE.
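A simplified model of the fixed add path: the PREPARE call allocates a candidate entry but does not link it into the driver's table, so the COMMIT call still sees "adding" and keeps its entry. Unlike the driver, which carries prepare-phase allocations over to COMMIT, this sketch frees and reallocates per call; all names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

enum trans { TRANS_NONE, TRANS_PREPARE, TRANS_COMMIT };

struct neigh_entry {
	unsigned int ip;
	struct neigh_entry *next;
};

struct neigh_tbl {
	struct neigh_entry *head;
};

static struct neigh_entry *tbl_find(struct neigh_tbl *t, unsigned int ip)
{
	struct neigh_entry *e;

	for (e = t->head; e; e = e->next)
		if (e->ip == ip)
			return e;
	return NULL;
}

static int tbl_count(const struct neigh_tbl *t)
{
	const struct neigh_entry *e;
	int n = 0;

	for (e = t->head; e; e = e->next)
		n++;
	return n;
}

/* Only NONE and COMMIT modify the driver's table. */
static void neigh_add(struct neigh_tbl *t, struct neigh_entry *e,
		      enum trans trans)
{
	if (trans == TRANS_PREPARE)
		return;
	e->next = t->head;
	t->head = e;
}

/* Returns 1 for "adding", 0 for "updating", -1 on allocation failure.
 * An entry is freed only if it was never linked into the table. */
static int port_ipv4_nh(struct neigh_tbl *t, unsigned int ip,
			enum trans trans)
{
	struct neigh_entry *entry = calloc(1, sizeof(*entry));
	int adding;

	if (!entry)
		return -1;
	entry->ip = ip;
	adding = tbl_find(t, ip) == NULL;
	if (adding)
		neigh_add(t, entry, trans);
	if (!adding || trans == TRANS_PREPARE)
		free(entry);
	return adding;
}
```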

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:55 -04:00
Simon Horman 42e9488971 rocker: do not modify fdb table in rocker_port_fdb() when preparing transactions
rocker_port_fdb_flush() may be called with
trans == SWITCHDEV_TRANS_PREPARE and then trans == SWITCHDEV_TRANS_COMMIT from
switchdev_port_attr_set() via switchdev_port_obj_add().

Adding the new entry to the FDB table when trans == SWITCHDEV_TRANS_PREPARE
may result in a memory leak because when trans == SWITCHDEV_TRANS_PREPARE
rocker_flow_tbl_bridge() will allocate memory when called via
rocker_port_fdb_learn(). However, when trans == SWITCHDEV_TRANS_COMMIT
the presence of the FDB entry in the FDB table causes
rocker_port_fdb() to set the ROCKER_OP_FLAG_REFRESH flag which results
in rocker_port_fdb_learn() skipping the call to rocker_flow_tbl_bridge()
which would free the memory allocated by it when
trans == SWITCHDEV_TRANS_PREPARE.

ip link add br0 type bridge
ip link set up dev eth0
ip link set dev eth0 master br0
bridge fdb add 52:54:00:12:35:08 dev eth0
bridge fdb add 52:54:00:12:35:09 dev eth0
[    2.600730] ------------[ cut here ]------------
[    2.601002] kernel BUG at drivers/net/ethernet/rocker/rocker.c:4369!
[    2.601373] invalid opcode: 0000 [#1] SMP
[    2.601963] Modules linked in:
[    2.602355] CPU: 0 PID: 64 Comm: bridge Not tainted 4.1.0-rc3-01048-g6d0f50c50211-dirty #1075
[    2.602721] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org 04/01/2014
[    2.602721] task: ffff880019facef0 ti: ffff88001f96c000 task.ti: ffff88001f96c000
[    2.602721] RIP: 0010:[<ffffffff811f1470>]  [<ffffffff811f1470>] rocker_port_obj_add+0x150/0x160
[    2.602721] RSP: 0018:ffff88001f96fa98  EFLAGS: 00000212
[    2.602721] RAX: ffff880019d4fa68 RBX: ffff88001f96fb18 RCX: 0000000000000000
[    2.602721] RDX: ffff880019d4f000 RSI: ffff88001f96fb18 RDI: ffff880019d4f000
[    2.602721] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff88001f904620
[    2.602721] R10: ffff88001f96fb60 R11: ffff880019e9d100 R12: ffff88001f96fb18
[    2.602721] R13: ffff880019d4f680 R14: ffff88001f904610 R15: ffff8800198f7b80
[    2.602721] FS:  00007f3eee917700(0000) GS:ffff88001b000000(0000) knlGS:0000000000000000
[    2.602721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.602721] CR2: 00007f3eee4a15cb CR3: 000000001f933000 CR4: 00000000000006b0
[    2.602721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.602721] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    2.602721] Stack:
[    2.602721]  0000000000000000 ffff88001f96fb18 ffff880019d4f000 ffff88001f96fb18
[    2.602721]  ffff880019d4f000 ffffffff81332105 ffff88001f96fb50 ffffffff814464c0
[    2.602721]  ffff88001f96fb18 ffff88001f904600 ffff880019d4f000 ffffffff813326e5
[    2.602721] Call Trace:
[    2.602721]  [<ffffffff81332105>] ? __switchdev_port_obj_add+0x25/0x90
[    2.602721]  [<ffffffff813326e5>] ? switchdev_port_obj_add+0x25/0xc0
[    2.602721]  [<ffffffff813327b1>] ? switchdev_port_fdb_add+0x31/0x40
[    2.602721]  [<ffffffff8123911f>] ? rtnl_fdb_add+0xff/0x1e0
[    2.602721]  [<ffffffff81237d8e>] ? rtnetlink_rcv_msg+0x7e/0x250
[    2.602721]  [<ffffffff8121d1ce>] ? __skb_recv_datagram+0xfe/0x4b0
[    2.602721]  [<ffffffff81237d10>] ? rtnetlink_rcv+0x30/0x30
[    2.602721]  [<ffffffff81247958>] ? netlink_rcv_skb+0xa8/0xd0
[    2.602721]  [<ffffffff81237cff>] ? rtnetlink_rcv+0x1f/0x30
[    2.602721]  [<ffffffff81247220>] ? netlink_unicast+0x150/0x200
[    2.602721]  [<ffffffff81247714>] ? netlink_sendmsg+0x374/0x3e0
[    2.602721]  [<ffffffff8120f8df>] ? sock_sendmsg+0xf/0x30
[    2.602721]  [<ffffffff8120ffd3>] ? ___sys_sendmsg+0x1f3/0x200
[    2.602721]  [<ffffffff812100e5>] ? ___sys_recvmsg+0x105/0x140
[    2.602721]  [<ffffffff810a36f0>] ? SyS_readahead+0x90/0x90
[    2.602721]  [<ffffffff81098dfd>] ? filemap_map_pages+0x1ed/0x210
[    2.602721]  [<ffffffff810b77fc>] ? handle_mm_fault+0x5fc/0xe50
[    2.602721]  [<ffffffff81210ef9>] ? __sys_sendmsg+0x39/0x70
[    2.602721]  [<ffffffff8133ce17>] ? system_call_fastpath+0x12/0x6a
[    2.602721] Code: b7 8f a0 06 00 00 48 83 bf 88 06 00 00 00 74 1d 48 83 c4 08 89 ee 4c 89 ef 5b 5d 41 5c 41 5d 0f b7 c9 45 31 c0 e9 51 db ff ff 90 <0f> 0b b8 ea ff ff ff e9 cf fe ff ff 0f 1f 40 00 41 57 41 56 b9
[    2.602721] RIP  [<ffffffff811f1470>] rocker_port_obj_add+0x150/0x160
[    2.602721]  RSP <ffff88001f96fa98>
[    2.615848] ---[ end trace 4f7b4f1c98077108 ]---

The above is resolved by not adding the new FDB entry to the FDB table
if trans == SWITCHDEV_TRANS_PREPARE.

For symmetry this patch also skips deleting FDB entries from the FDB
table if trans == SWITCHDEV_TRANS_PREPARE. However, my analysis is that
this never occurs as trans is always SWITCHDEV_TRANS_NONE when removing
FDB entries.

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:54 -04:00
Simon Horman 3098ac3963 rocker: do not delete fdb entries in rocker_port_fdb_flush() when preparing transactions
rocker_port_fdb_flush() is called by rocker_port_stp_update() which in
turn may be called with trans == SWITCHDEV_TRANS_PREPARE and then
trans == SWITCHDEV_TRANS_COMMIT from switchdev_port_attr_set() via
br_set_state().

When rocker_port_fdb_flush() is called with trans == SWITCHDEV_TRANS_PREPARE
it calls rocker_port_fdb_learn() for each entry in the FDB table which in
turn calls rocker_flow_tbl_bridge() which will allocate memory using
rocker_port_kzalloc(). rocker_port_fdb_flush() will then remove the entry
from the FDB table.

Then when rocker_port_fdb_flush() is called with
trans == SWITCHDEV_TRANS_COMMIT no calls are made to rocker_port_fdb_learn()
because there are no longer any entries present in the FDB table. Thus the
memory previously allocated by rocker_port_fdb_learn() is leaked resulting
in the kernel BUG() below.

Furthermore, it looks like the driver ends up with an incorrect view of the
fdb table as the FDB entries are purged from the driver's table but not the
hardware's table.
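The fix can be modeled with a counter standing in for memory reserved in PREPARE and consumed in COMMIT. Because only the non-PREPARE phases drop entries from the table, both passes walk the same entries and the counter returns to zero. The structure and names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum trans { TRANS_NONE, TRANS_PREPARE, TRANS_COMMIT };

#define FDB_MAX 8

struct fdb_tbl {
	bool used[FDB_MAX];	/* learned FDB entries for one port */
	int pending;		/* reserved in PREPARE, not yet consumed */
};

/* Stand-in for the learn path: PREPARE reserves memory, COMMIT consumes
 * the reservation, NONE does both at once (net effect zero). */
static void fdb_learn(struct fdb_tbl *t, enum trans trans)
{
	if (trans == TRANS_PREPARE)
		t->pending++;
	else if (trans == TRANS_COMMIT)
		t->pending--;
}

/* Fixed flush: both phases walk the same entries; only NONE and COMMIT
 * actually drop them from the driver's table. */
static void fdb_flush(struct fdb_tbl *t, enum trans trans)
{
	int i;

	for (i = 0; i < FDB_MAX; i++) {
		if (!t->used[i])
			continue;
		fdb_learn(t, trans);
		if (trans != TRANS_PREPARE)
			t->used[i] = false;
	}
}
```

A nonzero `pending` after COMMIT corresponds to the leaked prepare-phase memory described above.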

ip link add br0 type bridge
ip link set up dev eth0
sleep 1
ip link set dev eth0 master br0
[    3.704360] ------------[ cut here ]------------
[    3.704611] kernel BUG at drivers/net/ethernet/rocker/rocker.c:4289!
[    3.704962] invalid opcode: 0000 [#1] SMP
[    3.705537] Modules linked in:
[    3.705919] CPU: 0 PID: 63 Comm: ip Not tainted 4.1.0-rc3-01046-gb9fbe709de4d #1044
[    3.706191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org 04/01/2014
[    3.706820] task: ffff880019f70150 ti: ffff88001f92c000 task.ti: ffff88001f92c000
[    3.707138] RIP: 0010:[<ffffffff811f0080>]  [<ffffffff811f0080>] rocker_port_attr_set+0xe0/0xf0
[    3.707990] RSP: 0018:ffff88001f92f808  EFLAGS: 00000212
[    3.708200] RAX: ffff880019d4fa68 RBX: ffff880019d4f000 RCX: 0000000000000000
[    3.708471] RDX: 000000000000000c RSI: ffff88001f92f890 RDI: ffff880019d4f680
[    3.708740] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
[    3.708999] R10: ffff880000034024 R11: 0000000000000000 R12: ffff88001f92f890
[    3.709276] R13: ffff88001f8f1c00 R14: 000000000000000b R15: 0000000000000000
[    3.709303] FS:  00007f8ab66bd700(0000) GS:ffff88001b000000(0000) knlGS:0000000000000000
[    3.709303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.709303] CR2: 0000000000654988 CR3: 000000001f8f3000 CR4: 00000000000006b0
[    3.709303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.709303] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    3.709303] Stack:
[    3.709303]  ffff88001f8f1c00 000000000000000b ffff88001f92f890 ffff880019d4f000
[    3.709303]  ffff88001f92f890 ffffffff813332f5 ffff88001f92f880 0000000000000000
[    3.709303]  ffff88001f92f890 0000000000000001 ffff880019d4f000 ffffffff81333627
[    3.709303] Call Trace:
[    3.709303]  [<ffffffff813332f5>] ? __switchdev_port_attr_set+0x25/0x90
[    3.709303]  [<ffffffff81333627>] ? switchdev_port_attr_set+0x27/0x120
[    3.709303]  [<ffffffff81318e86>] ? br_set_state+0x36/0x50
[    3.709303]  [<ffffffff8131795c>] ? br_add_if+0x37c/0x400
[    3.709303]  [<ffffffff81238ce1>] ? do_setlink+0x7e1/0x800
[    3.709303]  [<ffffffff8111f980>] ? radix_tree_lookup_slot+0x10/0x30
[    3.709303]  [<ffffffff81136fba>] ? nla_parse+0xaa/0x110
[    3.709303]  [<ffffffff81239c98>] ? rtnl_newlink+0x548/0x870
[    3.709303]  [<ffffffff8111f900>] ? __radix_tree_lookup+0x40/0xb0
[    3.709303]  [<ffffffff81136f3e>] ? nla_parse+0x2e/0x110
[    3.709303]  [<ffffffff81237d7e>] ? rtnetlink_rcv_msg+0x7e/0x250
[    3.709303]  [<ffffffff8121d1be>] ? __skb_recv_datagram+0xfe/0x4b0
[    3.709303]  [<ffffffff81237d00>] ? rtnetlink_rcv+0x30/0x30
[    3.709303]  [<ffffffff81247948>] ? netlink_rcv_skb+0xa8/0xd0
[    3.709303]  [<ffffffff81237cef>] ? rtnetlink_rcv+0x1f/0x30
[    3.709303]  [<ffffffff81247210>] ? netlink_unicast+0x150/0x200
[    3.709303]  [<ffffffff81247704>] ? netlink_sendmsg+0x374/0x3e0
[    3.709303]  [<ffffffff8120f8cf>] ? sock_sendmsg+0xf/0x30
[    3.709303]  [<ffffffff8120ffc3>] ? ___sys_sendmsg+0x1f3/0x200
[    3.709303]  [<ffffffff812100d5>] ? ___sys_recvmsg+0x105/0x140
[    3.709303]  [<ffffffff812228d9>] ? dev_get_by_name_rcu+0x69/0x90
[    3.709303]  [<ffffffff812228d9>] ? dev_get_by_name_rcu+0x69/0x90
[    3.709303]  [<ffffffff81217b7d>] ? skb_dequeue+0x4d/0x60
[    3.709303]  [<ffffffff81217bb0>] ? skb_queue_purge+0x20/0x30
[    3.709303]  [<ffffffff810ebdcf>] ? __inode_wait_for_writeback+0x5f/0xb0
[    3.709303]  [<ffffffff810648b0>] ? autoremove_wake_function+0x30/0x30
[    3.709303]  [<ffffffff81210ee9>] ? __sys_sendmsg+0x39/0x70
[    3.709303]  [<ffffffff8133e097>] ? system_call_fastpath+0x12/0x6a
[    3.709303] Code: bb 90 06 00 00 48 c7 04 24 00 00 00 00 45 31 c9 45 31 c0 48 c7 c1 c0 b7 1e 81 89 ea e8 da da ff ff eb 95 0f 1f 84 00 00 00 00 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 fe 15 75
[    3.709303] RIP  [<ffffffff811f0080>] rocker_port_attr_set+0xe0/0xf0
[    3.709303]  RSP <ffff88001f92f808>
[    3.721409] ---[ end trace b7481fcb7cb032aa ]---
Segmentation fault

Fixes: c4f20321d9 ("rocker: support prepare-commit transaction model")
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:20:54 -04:00
Samudrala, Sridhar 45d4122ca7 switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.
- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump()
- use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops.
- add support for fdb obj in switchdev_port_obj_add/del/dump()
- switch rocker to implement fdb ops via switchdev_ops

v3: updated to sync with named union changes.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:49:09 -04:00
Ying Xue 1f9993f682 rocker: fix a neigh entry leak issue
Once we get a neighbour by looking it up in the arp cache or by creating
a new one in rocker_port_ipv4_resolve(), a reference on the neighbour is
already held. Because we never drop that reference after the entry is
used, the neighbour entry is leaked.

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-15 21:20:16 -04:00
Ying Xue 4133fc0952 rocker: fix a neigh entry leak issue
Once we get a neighbour by looking it up in the arp cache or by creating
a new one in rocker_port_ipv4_resolve(), a reference on the neighbour is
already held. Because we never drop that reference after the entry is
used, the neighbour entry is leaked.

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-15 16:58:32 -04:00
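The leak fixed by the two commits above follows a common reference-counting pattern: a lookup-or-create call returns the entry with a reference already held for the caller, so the caller must drop it when done (in the kernel the matching helper is neigh_release()). The sketch below is a userspace model of that pattern only, not rocker or kernel code; `struct fake_neigh`, `fake_neigh_lookup_or_create()`, `fake_neigh_release()` and `resolve()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted neighbour entry (not kernel code). */
struct fake_neigh {
	int refcnt;
	unsigned char ha[6];	/* resolved hardware address */
};

/* Single cache slot; while the entry exists the cache holds one reference. */
static struct fake_neigh *cache_entry;

/* Look up or create the entry. The returned pointer carries an extra
 * reference that now belongs to the caller. */
static struct fake_neigh *fake_neigh_lookup_or_create(void)
{
	if (!cache_entry) {
		cache_entry = calloc(1, sizeof(*cache_entry));
		if (!cache_entry)
			return NULL;
		cache_entry->refcnt = 1;	/* reference held by the cache */
	}
	cache_entry->refcnt++;			/* reference handed to the caller */
	return cache_entry;
}

/* Drop one reference and free the entry when the last one goes away. */
static void fake_neigh_release(struct fake_neigh *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt == 0) {
		if (n == cache_entry)
			cache_entry = NULL;
		free(n);
	}
}

/* Copy out the hardware address, then drop the caller's reference.
 * Without the release, every call would leave refcnt one higher and
 * the entry could never be freed. */
static int resolve(unsigned char *ha_out)
{
	struct fake_neigh *n = fake_neigh_lookup_or_create();
	int i;

	if (!n)
		return -1;
	for (i = 0; i < 6; i++)
		ha_out[i] = n->ha[i];
	fake_neigh_release(n);
	return 0;
}
```

After any number of `resolve()` calls, only the cache's own reference remains (`refcnt == 1`).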
Scott Feldman 42275bd8fc switchdev: don't use anonymous union on switchdev attr/obj structs
Older gcc versions (e.g.  gcc version 4.4.6) don't like anonymous unions
which was causing build issues on the newly added switchdev attr/obj
structs.  Fix this by using named union on structs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:20:59 -04:00
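A minimal sketch of the "named union" shape this commit describes, assuming (as is commonly reported for gcc releases before 4.6) that the failure involves designated initializers naming members of an anonymous union. With a named union `u`, members are written as `.u.member` instead. The type and member names below (`struct demo_attr`, `DEMO_ATTR_*`, `make_stp_attr()`) are hypothetical illustrations, not the actual switchdev definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative attribute IDs; hypothetical, not the real switchdev enum. */
enum demo_attr_id {
	DEMO_ATTR_STP_STATE,
	DEMO_ATTR_BRPORT_FLAGS,
};

/* Named union: members are reached as attr.u.<member>. With an anonymous
 * union they would be reached as attr.<member>. */
struct demo_attr {
	enum demo_attr_id id;
	union {
		uint8_t stp_state;
		unsigned long brport_flags;
	} u;
};

/* Designated initializer that spells out the union's name. */
static struct demo_attr make_stp_attr(uint8_t state)
{
	struct demo_attr attr = {
		.id = DEMO_ATTR_STP_STATE,
		.u.stp_state = state,
	};
	return attr;
}
```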
Scott Feldman 7a7ee5312d switchdev: sparse warning: pass ipv4 fib dst as network-byte order
And let driver convert it to host-byte order as needed.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 12:26:27 -04:00
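To illustrate the convention in this commit (core passes the IPv4 destination in network byte order, driver converts only where it needs host-order arithmetic), here is a userspace sketch. `demo_fib_dst_masked_host()` is a hypothetical driver-side helper; in the kernel the destination would be a `__be32`, modelled here as a plain `uint32_t` holding network-order bytes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver helper: dst_be arrives in network byte order.
 * Convert once with ntohl(), then apply the prefix mask in host order,
 * e.g. before programming a routing table entry. */
static uint32_t demo_fib_dst_masked_host(uint32_t dst_be, int dst_len)
{
	uint32_t dst = ntohl(dst_be);
	/* dst_len == 0 is special-cased: shifting by 32 is undefined. */
	uint32_t mask = dst_len ? ~0u << (32 - dst_len) : 0;

	return dst & mask;
}
```

For example, 192.168.10.5 with a /24 prefix yields 0xC0A80A00 (192.168.10.0) in host order.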
Scott Feldman 4725ceb9b7 rocker: make checkpatch -f clean
Well, almost clean: ignore the CHECKs for space after cast operator and some
longer-than-80-char cases where for readability it's better to keep as-is.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:56 -04:00
Scott Feldman 7889cbee83 switchdev: remove NETIF_F_HW_SWITCH_OFFLOAD feature flag
Roopa said remove the feature flag for this series and she'll work on
bringing it back if needed at a later date.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman 58c2cb16b1 switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del
The IPv4 FIB ops convert nicely to the switchdev objs and we're left with
only four switchdev ops: port get/set and port add/del.  Other objs will
follow, such as FDB.  So go ahead and convert IPv4 FIB over to switchdev
obj for consistency, anticipating more objs to come.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
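The idea in this commit is that per-object ops (such as a dedicated IPv4 FIB add) collapse into one generic "obj add/del" entry point that dispatches on an object ID, so later objects like FDB only add a new case. The sketch below models that dispatch in userspace. `struct demo_obj`, `DEMO_OBJ_*` and `demo_port_obj_add()` are hypothetical names, not the real switchdev_port_obj_add() signature, and the counters merely stand in for hardware table programming.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative object IDs; hypothetical, not the real switchdev enum. */
enum demo_obj_id {
	DEMO_OBJ_IPV4_FIB,
	DEMO_OBJ_PORT_FDB,
};

struct demo_obj {
	enum demo_obj_id id;
	union {
		struct {
			uint32_t dst;		/* network byte order */
			int dst_len;
		} ipv4_fib;
		struct {
			unsigned char addr[6];
			uint16_t vid;
		} fdb;
	} u;
};

/* Counters standing in for entries programmed into hardware tables. */
static int fib_entries, fdb_entries;

/* One generic add op; each object kind is just another case. */
static int demo_port_obj_add(const struct demo_obj *obj)
{
	switch (obj->id) {
	case DEMO_OBJ_IPV4_FIB:
		fib_entries++;
		return 0;
	case DEMO_OBJ_PORT_FDB:
		fdb_entries++;
		return 0;
	default:
		return -EOPNOTSUPP;	/* unknown object kind */
	}
}
```

Returning an error for unknown IDs lets the core try new object kinds without every driver implementing them.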
Scott Feldman 85fdb95672 switchdev: cut over to new switchdev_port_bridge_getlink
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00