linux/net
Pablo Neira Ayuso 0269ea4937 netfilter: xtables: add cluster match
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of at most 32 nodes (although I have only tested
this with two nodes, so I cannot tell what the real scalability
limit of this solution is in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example of how to do that if your switch does not allow this), the
cluster match decides whether this node has to handle a packet given:

	(1 << (jhash(source IP) % total_nodes)) & node_mask

For related connections, the hash is computed over the master
conntrack. The following example deploys a gateway cluster composed
of two nodes (this is node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

The following commands make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

For TCP connections, the connection pickup facility has to be
disabled so that TCP ACK packets arriving in the reply direction are
not marked as valid:

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type when it detects
PACKET_MULTICAST for a non-multicast address. This could instead be
done by a PKTTYPE target dedicated to that purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 17:10:36 +01:00
9p 9p: fix endian issues [attempt 3] 2009-02-06 22:07:41 -08:00
802 net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
8021q gro: Optimise Ethernet header comparison 2009-02-08 20:22:18 -08:00
appletalk net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
atm lec: convert to net_device_ops 2009-01-21 14:02:00 -08:00
ax25 ax25: more common return path joining 2009-02-06 23:47:14 -08:00
bluetooth bluetooth: driver API update 2009-01-07 17:23:17 -08:00
bridge netfilter: ebtables: remove unneeded initializations 2009-02-18 16:30:38 +01:00
can ip: support for TX timestamps on UDP and RAW sockets 2009-02-15 22:43:38 -08:00
core net: pass new SIOCSHWTSTAMP through to device drivers 2009-02-15 22:43:38 -08:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp dccp: Debugging functions for feature negotiation 2009-01-21 14:34:05 -08:00
decnet net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
dsa net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
econet net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
ethernet eth: Declare an optimized compare_ether_addr_64bits() function 2008-11-23 23:24:32 -08:00
ipv4 netfilter: auto-load ip_queue module when socket opened 2009-03-16 15:31:10 +01:00
ipv6 netfilter: auto-load ip6_queue module when socket opened 2009-03-16 15:30:14 +01:00
ipx net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
irda net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
iucv s390: remove s390_root_dev_*() 2009-01-06 10:44:34 -08:00
key af_key: initialize xfrm encap_oa 2009-01-25 20:49:14 -08:00
lapb [LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT 2008-01-28 14:56:52 -08:00
llc net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
mac80211 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-14 23:12:00 -08:00
netfilter netfilter: xtables: add cluster match 2009-03-16 17:10:36 +01:00
netlabel netlabel: Update kernel configuration API 2008-12-31 12:54:11 -05:00
netlink netlink: change return-value logic of netlink_broadcast() 2009-02-05 23:56:36 -08:00
netrom netrom: convert to net_device_ops 2009-01-21 14:02:02 -08:00
packet net: packet socket packet_lookup_frame fix 2009-02-01 01:53:29 -08:00
phonet Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-14 23:12:00 -08:00
rfkill net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning 2009-01-04 17:11:24 -08:00
rose rose: convert to network_device_ops 2009-01-21 14:02:04 -08:00
rxrpc RxRPC: Fix a potential NULL dereference 2009-02-06 21:50:52 -08:00
sched pkt_sched: sch_multiq: Change errno on non-multiqueue devices use. 2009-02-10 00:11:21 -08:00
sctp sctp: Inherit all socket options from parent correctly. 2009-02-16 00:03:11 -08:00
sunrpc net/sunrpc/xprtsock.c: some common code found 2009-02-06 23:48:33 -08:00
tipc net/tipc/bcast.h: use ARRAY_SIZE 2009-01-11 00:06:33 -08:00
unix introduce new LSM hooks where vfsmount is available. 2008-12-31 18:07:37 -05:00
wanrouter netdevice wanrouter: Convert directly reference of netdev->priv 2008-11-20 04:26:21 -08:00
wimax Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-14 23:12:00 -08:00
wireless cfg80211: add more flexible BSS lookup 2009-02-13 13:45:56 -05:00
x25 net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
xfrm Revert "xfrm: For 32/64 compatability wrt. xfrm_usersa_info" 2009-01-20 09:49:51 -08:00
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
Kconfig Phonet: move to Networking options like other protocol stacks 2009-01-26 21:03:33 -08:00
Makefile wimax: Makefile, Kconfig and docbook linkage for the stack 2009-01-07 10:00:17 -08:00
nonet.c
socket.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE