linux/net/rds
Tina Yang 35b52c7053 RDS: Fix corrupted rds_mrs
On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion.  A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,

Thread #1
t0:
    sock_release
    rds_release
    rds_send_drop_to /* wait on send completion */
t2:
    rds_rdma_drop_keys()   /* destroy & free all mrs */

Thread #2
t1:
    rds_ib_send_cq_comp_handler
    rds_ib_send_unmap_rm
    rds_message_unmapped   /* wake up #1 @ t0 */
t3:
    rds_message_put
    rds_message_purge
    rds_mr_put   /* memory corruption detected */

The problem with the rds_rdma_drop_keys() is it could
remove a mr's refcount more than its due (i.e. repeatedly
as long as it still remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference - reference
by the tree.

        /* Release any MRs associated with this socket */
        while ((node = rb_first(&rs->rs_rdma_keys))) {
                mr = container_of(node, struct rds_mr, r_rb_node);
                if (mr->r_trans == rs->rs_transport)
                        mr->r_invalidate = 0;
                rds_mr_put(mr);
        }

I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one.  Whichever thread
holds the last reference will free the mr via rds_mr_put().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:31 -07:00
..
af_rds.c net: sk_sleep() helper 2010-04-20 16:37:13 -07:00
bind.c RDS: Add a debug message suggesting to load transport modules 2009-08-23 19:13:14 -07:00
cong.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
connection.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ib.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ib.h RDS/IB+IW: Move recv processing to a tasklet 2009-10-30 15:06:39 -07:00
ib_cm.c net/rds: Add missing mutex_unlock 2010-05-29 00:18:48 -07:00
ib_rdma.c RDS: Fix BUG_ONs to not fire when in a tasklet 2010-09-08 18:07:31 -07:00
ib_recv.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
ib_ring.c RDS/IW+IB: Set recv ring low water mark to 1/2 full. 2009-04-09 17:21:14 -07:00
ib_send.c RDS: Properly unmap when getting a remote access error 2010-03-16 21:17:00 -07:00
ib_stats.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2009-09-15 09:39:44 -07:00
ib_sysctl.c sysctl: Drop & in front of every proc_handler. 2009-11-18 08:37:40 -08:00
info.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
info.h RDS: Info and stats 2009-02-26 23:39:25 -08:00
iw.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
iw.h RDS/IB+IW: Move recv processing to a tasklet 2009-10-30 15:06:39 -07:00
iw_cm.c net/rds: Add missing mutex_unlock 2010-05-29 00:18:48 -07:00
iw_rdma.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
iw_recv.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
iw_ring.c RDS/IW+IB: Set recv ring low water mark to 1/2 full. 2009-04-09 17:21:14 -07:00
iw_send.c RDS: Do not BUG() on error returned from ib_post_send 2010-03-16 21:16:53 -07:00
iw_stats.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2009-09-15 09:39:44 -07:00
iw_sysctl.c sysctl: Drop & in front of every proc_handler. 2009-11-18 08:37:40 -08:00
Kconfig RDS: Modularize RDMA and TCP transports 2009-08-23 19:13:09 -07:00
loop.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
loop.h RDS: loopback 2009-02-26 23:39:26 -08:00
Makefile RDS: Modularize RDMA and TCP transports 2009-08-23 19:13:09 -07:00
message.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
page.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
rdma.c RDS: Fix corrupted rds_mrs 2010-09-08 18:07:31 -07:00
rdma.h RDS: Add GET_MR_FOR_DEST sockopt 2009-10-30 15:06:37 -07:00
rdma_transport.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-27 12:49:13 -07:00
rdma_transport.h RDS: Common RDMA transport code 2009-02-26 23:39:33 -08:00
rds.h net: sk_sleep() helper 2010-04-20 16:37:13 -07:00
recv.c rds: fix a leak of kernel memory 2010-08-18 23:40:03 -07:00
send.c net: sk_sleep() helper 2010-04-20 16:37:13 -07:00
stats.c RDS: Export symbols from core RDS 2009-08-23 19:13:07 -07:00
sysctl.c sysctl: Drop & in front of every proc_handler. 2009-11-18 08:37:40 -08:00
tcp.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tcp.h RDS: Add TCP transport to RDS 2009-08-23 19:13:02 -07:00
tcp_connect.c net: Remove unnecessary semicolons after switch statements 2010-05-17 17:44:35 -07:00
tcp_listen.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tcp_recv.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
tcp_send.c RDS/TCP: Wait to wake thread when write space available 2010-03-16 21:16:55 -07:00
tcp_stats.c RDS: Add TCP transport to RDS 2009-08-23 19:13:02 -07:00
threads.c RDS: Enable per-cpu workqueue threads 2010-03-16 21:17:02 -07:00
transport.c RDS: Track transports via an array, not a list 2009-08-23 19:13:12 -07:00