check_compat_entry_size_and_hooks iterates over the matches and calls
compat_check_calc_match, which loads the match and calculates the
compat offsets, but unlike the non-compat version, doesn't call
->checkentry yet. On error however it calls cleanup_matches, which in
turn calls ->destroy, which can result in crashes if the destroy
function (validly) expects to only get called after the checkentry
function.
Add a compat_release_match function that only drops the module reference
on error and rename compat_check_calc_match to compat_find_calc_match to
reflect the fact that it doesn't call the checkentry function.
Reported by Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a helper module is unloaded all conntracks refering to it have their
helper pointer NULLed out, leading to lots of races. In most places this
can be fixed by proper use of RCU (they do already check for != NULL,
but in a racy way), additionally nf_conntrack_expect_related needs to
bail out when no helper is present.
Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy
and not worth fixing.
Signed-off-by: Patrick McHarrdy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They're the same.
Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the packet size is changed by the FTP NAT helper, the connection
tracking helper adjusts the sequence number of the newline character
by the size difference. This is wrong because NAT sequence number
adjustment happens after helpers are called, so the unadjusted number
is compared to the already adjusted one.
Based on report by YU, Haitao <yuhaitao@tsinghua.org.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The option is named CONFIG_NF_NAT not CONFIG_IP_NF_NAT. Remove the ifdef
completely since helpers also expect defragmented packet even without
NAT.
Noticed by Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
iptables matches and targets expect packets to have at least a full
IP header and a valid header length. Ignore packets sent through
raw sockets for which this isn't true as in the other tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed
do not use the argument 'info', which is actually ct->nat.info.
If they are necessary to access it again, we can use the argument 'ct'
instead.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- move arp_tables initial table structure definitions to arp_tables.h
similar to ip_tables and ip6_tables
- use C99 initializers
- use initializer macros where possible
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I needed to use this recently to talk to a Cisco server. In my case
I only did SNAT while the Cisco server used a different address for
RTP traffic than the one for SIP. I discovered that nf_nat_sip NATed
the RTP address to the SIP one which was unnecessary but OK. However,
in doing so it did not DNAT the destination address on the RTP traffic
to the Cisco back to the original RTP address.
This patch corrects this by noting down the RTP address and using it
when the expectation fires.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also accept the --random option for DNAT to allow randomly selecting a
destination port from the given range.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All LOG targets always use their internal logging function nowadays, so
remove the incorrect error message and handle real errors (!= -EEXIST)
by failing to load.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mangling packets forwarded to a HW checksumming capable device,
offload recalculation of the checksum instead of doing it in software.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the obsolete IPv4 only connection tracking/NAT as scheduled in
feature-removal-schedule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...
Just out of curiosity, here are the idioms found to get the same result:
skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the mac header, it is still legal to
touch skb->mac.raw directly if just adding to, subtracting from or setting it
to another layer header.
This one also converts some more cases to skb_reset_mac_header() that my
regex missed as it had no spaces before nor after '=', ugh.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.
This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16
I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.
As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)
Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two device string comparison loops in arp_packet_match().
The first one goes byte-by-byte but the second one tries to be
clever and cast the string to a long and compare by longs.
The device name strings in the arp table entries are not guarenteed
to be aligned enough to make this value, so just use byte-by-byte
for both cases.
Based upon a report by <drraid@gmail.com>.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use put_unaligned to fix warnings about unaligned accesses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The clusterip_config_find_get() already increases entries reference
counter, so there is no reason to do it twice in checkentry() callback.
This causes the config to be freed before it is removed from the list,
resulting in a crash when adding the next rule.
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently NAT not only reroutes packets in the OUTPUT chain when the
routing key changed, but also if only the non-routing part of the
IPsec policy key changed. This breaks ping -I since it doesn't use
SO_BINDTODEVICE but IP_PKTINFO cmsg to specify the output device, and
this information is lost.
Only do full rerouting if the routing key changed, and just do a new
policy lookup with the old route if only the ports changed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some stacks apparently send packets with SYN|URG set. Linux accepts
these packets, so TCP conntrack should to.
Pointed out by Martijn Posthuma <posthuma@sangine.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK,
but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or
CONFIG_NF_CONNTRACK_NETLINK for ifdefs.
Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:
- unconfirmed entries can not be killed manually, they are removed on
confirmation or final destruction of the conntrack entry, which means
we might iterate forever without making forward progress.
This can happen in combination with the conntrack event cache, which
holds a reference to the conntrack entry, which is only released when
the packet makes it all the way through the stack or a different
packet is handled.
- taking references to an unconfirmed entry and using it outside the
locked section doesn't work, the list entries are not refcounted and
another CPU might already be waiting to destroy the entry
What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.
Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.
Reported and tested by Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TCP reset packet is copied from the original. This
includes all the GSO bits which do not apply to the new
packet. So we should clear those bits.
Spotted by Patrick McHardy.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of depending on internally needed options and letting users
figure out what is needed, select them when needed:
- IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select
NETFILTER_XTABLES
- NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and
IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK
- NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding ip_conntrack_lock.
Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- rename nf_logging to nf_loggers since its an array of registered loggers
- rename nf_log_unregister_logger() to nf_log_unregister() to make it
symetrical to nf_log_register() and convert all users
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead
of self-made RCU and use rcu_read_lock to make sure the conntrack module
doesn't disappear below us while calling it, since this function can be
called from outside the netfilter hooks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>