Commit graph

3747 commits

Author SHA1 Message Date
Yann Droneaud bfd2793c95 RDMA/cxgb4: set error code on kmalloc() failure
If kmalloc() fails in c4iw_alloc_ucontext(), the function
leaves but does not set an error code in ret variable:
it will return 0 to the caller.

This patch set ret to -ENOMEM in such case.

Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@chelsio.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:55:21 -04:00
Matan Barak e471b40321 mlx4: Use actual number of PCI functions (PF + VFs) for alias GUID logic
The code which is dealing with SRIOV alias GUIDs in the mlx4 IB driver has some
logic which operated according to the maximal possible active functions (PF + VFs).

After the single port VFs code integration this resulted in a flow of false-positive
warnings going to the kernel log after the PF driver started the alias GUID work.

Fix it by referring to the actual number of functions.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 20:48:05 -04:00
Matan Barak 449fc48866 net/mlx4: Adapt code for N-Port VF
Adds support for N-Port VFs, this includes:
1. Adding support in the wrapped FW command
	In wrapped commands, we need to verify and convert
	the slave's port into the real physical port.
	Furthermore, when sending the response back to the slave,
	a reverse conversion should be made.
2. Adjusting sqpn for QP1 para-virtualization
	The slave assumes that sqpn is used for QP1 communication.
	If the slave is assigned to a port != (first port), we need
	to adjust the sqpn that will direct its QP1 packets into the
	correct endpoint.
3. Adjusting gid[5] to modify the port for raw ethernet
	In B0 steering, gid[5] contains the port. It needs
	to be adjusted into the physical port.
4. Adjusting number of ports in the query / ports caps in the FW commands
	When a slave queries the hardware, it needs to view only
	the physical ports it's assigned to.
5. Adjusting the sched_qp according to the port number
	The QP port is encoded in the sched_qp, thus in modify_qp we need
	to encode the correct port in sched_qp.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:30 -04:00
Matan Barak 82373701be IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constant
Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports.

Instead, we should only loop until the number of actual ports in i
the device, which is stored in dev->caps.num_ports.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:29 -04:00
Steve Wise 05eb23893c cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes
The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues.  Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count.  This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps.  This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP.  Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED.  As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events.  Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list.  This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example).  QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
quickly and overflowing the FIFO.  Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops.  As the load lightens though, we
resume to normal DB writes directly by user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
Steve Wise 7a2cea2aaa cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice
Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
David S. Miller 85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Jack Morgenstein aa9a2d51a3 mlx4: Activate RoCE/SRIOV
To activate RoCE/SRIOV, need to remove the following:
1. In mlx4_ib_add, need to remove the error return preventing
   initialization of a RoCE port under SRIOV.
2. In update_vport_qp_params (in resource_tracker.c) need to remove
   the error return when a RoCE RC or UD qp is detected.
   This error return causes the INIT-to-RTR qp transition to fail
   in the wrapper function under RoCE/SRIOV.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Shani Michaelli ceb5433b3a mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCE
* Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID
  in multiplex_cm_handler and demux_cm_handler.

* Handle Service ID Resolution messages and REQ messages
  separately, for their formats are different.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein 5ea8bbfc49 mlx4: Implement IP based gids support for RoCE/SRIOV
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,

This is achieved as follows:
Outgoing MADs:
    1. Source MAC: obtained from the CQ completion structure
       (struct ib_wc, smac field).
    2. Destination MAC: obtained from the tunnel header
    3. vlan_id: obtained from the tunnel header.
Incoming MADs
    1. The source (i.e., remote) MAC and vlan_id are passed in
       the tunnel header to the proxy QP1.

VST mode support:
     For outgoing MADs,  the vlan_id obtained from the header is
        discarded, and the vlan_id specified by the Hypervisor is used
        instead.
     For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
        "invalid" vlan (0xffff)  is substituted when forwarding to the slave.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein 2f5bb47368 mlx4: Add ref counting to port MAC table for RoCE
The IB side of RoCE requires the MAC table index of the
MAC address used by its QPs.

To obtain the real MAC index, the IB side registers the
MAC (increasing its ref count, and also returning the
real MAC index) during the modify-qp sequence.

This protects against the ETH side deleting or modifying
that MAC table entry while the QP is active.

Note that until the modify-qp command returns success,
the MAC and VLAN information only has "candidate" status.
If the modify-qp succeeds, the "candidate" info is promoted
to the operational MAC/VLAN info for the qp. If the modify fails,
the candidate MAC/VLAN is unregistered, and the old qp info
is preserved.

The patch is a bit complex, because there are multiple qp
transitions where the primary-path information may be
modified:  INIT-to-RTR, and SQD-to-SQD.

Similarly for the alternate path information.

Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.

For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
   MAC information to the operational information, and reset
   the candidate MAC info.
2. If there was a previous MAC, unregister it.  Then move
   the MAC information from candidate to operational, and
   reset the candidate info (as in 1. above).

The MAC address failure logic is the same for all cases:
 - Unregister the candidate MAC, and reset the candidate MAC info.

For Vlan registration, the logic is similar.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:15 -04:00
Jack Morgenstein b6ffaeffae mlx4: In RoCE allow guests to have multiple GIDS
The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs:  Remaining GIDS are divided evenly between VFs activated by the driver.
      If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of gids per guest remains as before: one gid per guest.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:14 -04:00
Jack Morgenstein 6ee51a4e86 mlx4: Adjust QP1 multiplexing for RoCE/SRIOV
This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields
2. Adjust mux and demux QP1 flow to support RoCE.

This commit still assumes only one GID per slave for RoCE.
The commit enabling multiple GIDs is a subsequent commit, and
is done separately because of its complexity.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:12 -04:00
Linus Torvalds 66a523db70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This series addresses a number of outstanding issues wrt to active I/O
  shutdown using iser-target.  This includes:

   - Fix a long standing tpg_state bug where a tpg could be referenced
     during explicit shutdown (v3.1+ stable)
   - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks
     work as expected (v3.10+ stable)
   - Fix a isert_conn->state related hung task bug + ensure outstanding
     I/O completes during session shutdown.  (v3.10+ stable)
   - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs
     (v3.10+ stable)
   - Ignore FRWR completions during active I/O shutdown (v3.12+ stable)
   - Fix command leakage for interrupt coalescing during active I/O
     shutdown (v3.13+ stable)

  Also included is another DIF emulation fix from Sagi specific to
  v3.14-rc code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix sbc_copy_prot for offset scatters
  iser-target: Fix command leak for tx_desc->comp_llnode_batch
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
2014-03-09 13:50:14 -07:00
Nicholas Bellinger ebbe442183 iser-target: Fix command leak for tx_desc->comp_llnode_batch
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger 9bb4ca68fc iser-target: Ignore completions for FRWRs in isert_cq_tx_work
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger b6b87a1df6 iser-target: Fix post_send_buf_count for RDMA READ/WRITE
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger defd884845 iscsi/iser-target: Fix isert_conn->state hung shutdown issues
This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger 5159d763f6 iscsi/iser-target: Use list_del_init for ->i_conn_node
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Amir Vadai 169a1d85d0 net,IB/mlx: Bump all Mellanox driver versions
Bump all Mellanox driver versions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 17:34:44 -05:00
Linus Torvalds 946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Roland Dreier c9459388d8 Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next 2014-02-14 09:49:12 -08:00
Devesh Sharma 09de3f1313 RDMA/ocrdma: Fix load time panic during GID table init
We should use rdma_vlan_dev_real_dev() instead of using vlan_dev_real_dev()
when building the GID table for a vlan interface.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:04 -08:00
Devesh Sharma a61d93d92f RDMA/ocrdma: Fix traffic class shift
Use correct value for obtaining traffic class from device
response for Query QP request.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:00 -08:00
Dan Carpenter fd8b48b22a IB/iser: Fix use after free in iser_snd_completion()
We use "tx_desc" again after we free it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:09 -08:00
Roi Dayan 7d9eacf945 IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:03 -08:00
Upinder Malhi f809309a25 IB/usnic: Fix smatch endianness error
Error reported at http://marc.info/?l=linux-rdma&m=138995755801039&w=2

Fix short to int cast for big endian systems.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:47:29 -08:00
Eli Cohen 0861565f50 IB/mlx5: Remove dependency on X86
Remove Kconfig dependency of mlx5_ib/mlx5_core on X86, since there is
no such dependency in reality.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 20:48:02 -08:00
Mike Marciniszyn 2f75e12c44 IB/qib: Add missing serdes init sequence
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.

This patch add that sequence.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:52:44 -08:00
Kumar Sanghvi 0f0132001f RDMA/cxgb4: Add missing neigh_release in LE-Workaround path
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:40 -08:00
Moni Shoua b4a26a2728 IB: Report using RoCE IP based gids in port caps
For userspace RoCE UD QPs we need to know the GID format that the
kernel uses, e.g when working over older kernels. For that end, add a
new port capability IB_PORT_IP_BASED_GIDS and report it when query
port is issued.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:03 -08:00
Moni Shoua ad4885d279 IB/mlx4: Build the port IBoE GID table properly under bonding
When scanning netdevices we need to check a few more conditions and
cases to build the IBoE GID table properly.  For example, under
bonding we must make sure that when a port is down, the bond IP
address isn't programmed as a GID, since doing so will cause failure
with IB core flows that selects ports by GID.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:09 -08:00
Moni Shoua 5071456fe2 IB/mlx4: Do IBoE GID table resets per-port
The IBoE code used to reset the GID table did it for all Ethernet
ports of the device.  Since the whole architecture of generating GIDs
and responding to events is port-based, this is inefficient and can
lead to wrong content in the GID table.  Change the reset flow to be
per-port.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua ddf8bd3491 IB/mlx4: Do IBoE locking earlier when initializing the GID table
Updating the GID table under IBoE requires read/write from/to shared
data structures.  These data structures are protected with the device
iboe lock.  The flows that modify the GID table start from

    1. Initializing the GID table
    2. NETDEV events
    3. INET or INET6 events

This patch makes sure that the flow of initializing the GID table is
consistent with the other two flows w.r.t on what step the lock is taken.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua 4ce5a5744a IB/mlx4: Move rtnl locking to the right place
On the one hand, the invocation of netdev_master_upper_dev_get()
within mlx4_ib_scan_netdevs() must be done with rtnl lock held.  On
the other hand, it's wrong to call rtnl_lock() from within this
function since it's also called by our netdev notifier callback.
Therefore move the locking to mlx4_ib_add() so that both cases are
covered.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua acc4fccf4e IB/mlx4: Make sure GID index 0 is always occupied
Make sure that for Ethernet ports, the port GID table index 0 is always
occupied with a default GID of the relevant IPv6 link-local adderss.

This provides better user experience for legacy applications that don't use
the RDMA CM and were working on index 0 prior to the IP addressing change.

Also, as GIDs are generated from IP addresses of the network devices that
are associated with the port, it's basically possible that the GID table
will be empty if no IP address was assigned.  This doesn't comply with the
IB spec section 4.1.1 "GID usage and properties".

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Matan Barak 4196670be7 IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device
When the device has only Ethernet ports, don't try to allocate range
of steerable UD QPs since they aren't needed.  This fixes an issue
where mlx4 VFs tried to allocate a range of UD steerable QPs, but
failed to do so.

Fixes: c1c9850112 ("IB/mlx4: Add support for steerable IB UD QPs")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 09:00:18 -08:00
Jingoo Han 9d8abf4594 IB/srpt: replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:40 -08:00
Nicholas Bellinger a80e21b3b2 iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
This patch fixes a memory leak for fr_desc upon failure of
isert_create_fr_desc() in isert_conn_create_fastreg_pool()
code.

As reported by Coverity 1166659:

*** CID 1166659:  Resource leak  (RESOURCE_LEAK)
/drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
464                      isert_conn, isert_conn->conn_fr_pool_size);
465
466             return 0;
467
468     err:
469             isert_conn_free_fastreg_pool(isert_conn);
>>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
>>>     Variable "fr_desc" going out of scope leaks the storage it points to.
470             return ret;
471     }
472
473     static int
474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
475     {

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:11 -08:00
Julia Lawall ab576627c8 RDMA/amso1100: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:46 -08:00
Julia Lawall d07875bd0d RDMA/nes: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:09 -08:00
Eli Cohen 1a4c3a3dc5 IB/mlx5: Don't set "block multicast loopback" capability
Currently Connect-IB does not support blocking multicast loopback, so
don't set IB_DEVICE_BLOCK_MULTICAST_LOOPBACK in the device caps.

Reported by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:09:48 -08:00
Eli Cohen 78c0f98cc9 IB/mlx5: Fix binary compatibility with libmlx5
Commit c1be5232d2 ("Fix micro UAR allocator") broke binary compatibility
between libmlx5 and mlx5_ib since it defines a different value to the number
of micro UARs per page, leading to wrong calculation in libmlx5. This patch
defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct
mlx5_ib_alloc_ucontext_req.  The extended size is determined in mlx5_ib_alloc_ucontext()
and in case of old library we use uuarn 0 which works fine -- this is
acheived due to create_user_qp() falling back from high to medium then to
low class where low class will return 0.  For new libraries we use the
more sophisticated allocation algorithm.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Eli Cohen 9e65dc371b IB/mlx5: Fix RC transport send queue overhead computation
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Linus Torvalds 4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Nicholas Bellinger 676687c696 iscsi-target: Convert gfp_t parameter to task state bitmask
This patch propigates the use of task state bitmask now used by
percpu_ida_alloc() up the iscsi-target callchain, replacing the
use of GFP_ATOMIC for TASK_RUNNING, and GFP_KERNEL for
TASK_INTERRUPTIBLE.

Also, drop the unnecessary gfp_t parameter to isert_allocate_cmd(),
and just pass TASK_INTERRUPTIBLE into iscsit_allocate_cmd().

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Roland Dreier fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Roland Dreier 8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Eli Cohen 57761d8df8 IB/mlx5: Verify reserved fields are cleared
Verify that reserved fields in struct mlx5_ib_resize_cq are cleared
before continuing execution of the verb. This is required to allow
making use of this area in future revisions.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00