These macros allow conveniently declaring arrays which use NFT_{RT,CT}_*
values as indexes.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by an SR encapsulated packet. Each SID is encoded as an IPv6 prefix.
When a firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID), which represents the
final destination of the packet in case of inline SR mode.
An example use case is a SID list that includes two firewalls. When the
second firewall receives a packet, it can check whether the packet has
already been processed by the first firewall. Based on that check, it
decides to apply all rules, apply just a subset of the rules, or skip
all rules and forward the packet to the next SID.
This patch extends SRH match to support matching previous SID, next SID,
and last SID.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The modulus in the hash function was limited to > 1, as initially
there was no reason to create a hash over just one element.
Nevertheless, there are certain cases, especially in load balancing,
where this case needs to be addressed.
This patch fixes the following error.
Error: Could not process rule: Numerical result out of range
add rule ip nftlb lb01 dnat to jhash ip saddr mod 1 map { 0: 192.168.0.10 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The solution is to force the hash result to 0 when the modulus is 1.
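
As a rough illustration of why a modulus of 1 is well defined, the
sketch below scales a 32-bit hash into [0, modulus) with a multiply and
shift. The helper name scale_hash() is hypothetical and this is not the
kernel code; it only shows that a modulus of 1 always selects map key 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale a 32-bit hash value into [0, modulus). */
static uint32_t scale_hash(uint32_t hash, uint32_t modulus)
{
	/* A modulus of 0 stays invalid; a modulus of 1 is accepted. */
	assert(modulus >= 1);

	/* The high 32 bits of the 64-bit product are < modulus. */
	return (uint32_t)(((uint64_t)hash * modulus) >> 32);
}
```

With modulus 1 the product never reaches 2^32, so every input maps to
0, which is the only key in the map from the failing rule above.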
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.
For this purpose, different ops have been included to extend the
current numgen inc functions.
Currently this is only supported for numgen incremental operations;
support for random will be added in a follow-up patch.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similarly, tbl->entries is not initialized after kmalloc(),
therefore causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.
Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tbl->entries is not initialized after kmalloc(), therefore
causes an uninit-value warning in ip_vs_lblc_check_expire()
as reported by syzbot.
Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
IPVS Updates for v4.18
please consider these IPVS enhancements for v4.18.
* Whitespace cleanup
* Add Maglev hashing algorithm as an IPVS scheduler
Inju Song says "Implements Google's Maglev hashing algorithm as an
IPVS scheduler. Basically it provides consistent hashing but offers some
special features about disruption and load balancing.
1) minimal disruption: when the set of destinations changes,
a connection will likely be sent to the same destination
as it was before.
2) load balancing: each destination will receive an almost
equal number of connections.
See also: [3.4 Consistent Hashing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
"
* Correct the implementation of Knuth's multiplicative hashing
which is used in the sh/dh/lblc/lblcr algorithms. The
implementation provided by the hash_32() macro is used instead.
====================
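
For readers unfamiliar with multiplicative hashing, here is a minimal
sketch of the idea behind hash_32(): multiply by a golden-ratio constant
and keep the top bits. The name mult_hash_32() is hypothetical, and the
constant is the 32-bit golden-ratio value as I recall it from the
kernel's hash header, so treat it as an assumption rather than a quote.

```c
#include <stdint.h>

/* Assumed 32-bit golden-ratio multiplier (not taken from this series). */
#define GOLDEN_RATIO_32 0x61C88647u

/*
 * Hypothetical stand-in for a hash_32()-style helper.
 * bits must be in the range 1..32.
 */
static uint32_t mult_hash_32(uint32_t val, unsigned int bits)
{
	/* The multiplication wraps modulo 2^32. */
	uint32_t product = val * GOLDEN_RATIO_32;

	/* The high bits of the product are the best mixed, so keep those. */
	return product >> (32 - bits);
}
```

Taking the high bits rather than the low ones is the part that is easy
to get wrong in a hand-rolled version.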
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It overcomplicates things for no reason.
nft_meta_bridge only offers retrieval of bridge port interface name.
Because of this being its own module, we had to export all nft_meta
functions, which we can then make static again (which even reduces
the size of nft_meta -- including bridge port retrieval...):
before:
   text    data     bss     dec     hex filename
   1838     832       0    2670     a6e net/bridge/netfilter/nft_meta_bridge.ko
   6147     936       1    7084    1bac net/netfilter/nft_meta.ko
after:
   5826     936       1    6763    1a6b net/netfilter/nft_meta.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft rejects rules that lack a timeout and a size limit when they're used
to add elements from the packet path.
Pick a sane upper limit instead of rejecting outright.
The upper limit is visible to userspace, just as if it had been
given during set declaration.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Marco De Benedetto says:
I would like to use a timeout of 30 days for elements in a set but it
seems there is some kind of problem above 24d20h31m23s.
Fix this by using 'jiffies64' for timeout handling to get same behaviour
on 32 and 64bit systems.
nftables passes timeouts as u64 in milliseconds to the kernel,
but on the kernel side we used a mixture of 'long' and jiffies
conversions rather than u64 and jiffies64.
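
The reported limit appears to match the largest millisecond count that
fits in a signed 32-bit value, which is what a 'long' holds on 32-bit
systems. The sketch below only checks that arithmetic; struct duration
and split_ms() are hypothetical names, not kernel code.

```c
#include <stdint.h>

/* Hypothetical breakdown of a duration into calendar-style fields. */
struct duration {
	uint64_t days, hours, minutes, seconds;
};

/* Split a millisecond count into days/hours/minutes/seconds. */
static struct duration split_ms(uint64_t ms)
{
	uint64_t s = ms / 1000;	/* drop the sub-second remainder */
	struct duration d = {
		.days = s / 86400,
		.hours = (s % 86400) / 3600,
		.minutes = (s % 3600) / 60,
		.seconds = s % 60,
	};

	return d;
}
```

split_ms(INT32_MAX) gives 24 days, 20 hours, 31 minutes and 23 seconds,
while a 30 day timeout is 2592000000 ms and no longer fits in 31 bits.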
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1237
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ipt_get_target is used to get struct xt_entry_target
and ipt_get_target_c is used to get const struct xt_entry_target.
However, in ipt_do_table, ipt_get_target is used to get
const struct xt_entry_target. It should be replaced by ipt_get_target_c.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebt_get_target is similar to {ip/ip6/arp}t_get_target,
and ebt_get_target_c is similar to {ip/ip6/arp}t_get_target_c.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In the check_target, ip6t_get_target is called twice.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
EBT_MATCH and EBT_NOMATCH are used to invert the return value.
Match functions (ebt_xxx.c) return false when the received frame does not
match and true when it matches, but EBT_MATCH_ITERATE expects the
opposite. EBT_MATCH and EBT_NOMATCH exist only to invert that value,
so we can simply use the '!' operator instead.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebt_free_table_info() frees all of the chainstacks.
It is similar to xt_free_table_info(). This inline function
reduces the number of lines of code.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The exit functions in the helper modules lack the __exit annotation
because they used to be called from the init functions. That is no
longer the case, so we can add the __exit annotation.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)
Currently DNAT only works for single port or identical port ranges. (i.e.
ports 5000-5100 on WAN interface redirected to a LAN host while original
destination port is not altered) When different port ranges are configured,
either 'random' mode should be used, or else all incoming connections are
mapped onto the first port in the redirect range. (in described example
WAN:5000-5100 will all be mapped to 192.168.1.5:2000)
This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).
In described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value 4 which means that the
NAT'ed stream will be using destination port 2004.
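
The offset calculation described above can be sketched as follows. This
is an illustration under stated assumptions, not the patch itself:
shift_port() and its parameter names are hypothetical, and ports are
plain host-order integers here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map an incoming destination port into the
 * redirect range [min, max] using its offset from a base port.
 */
static uint16_t shift_port(uint16_t dport, uint16_t base,
			   uint16_t min, uint16_t max)
{
	assert(max >= min);

	uint32_t range_size = (uint32_t)max - min + 1;
	uint16_t off = (uint16_t)(dport - base);	/* wraps if dport < base */

	/* The modulo keeps the result inside the redirect range. */
	return (uint16_t)(min + off % range_size);
}
```

For the example above, shift_port(5004, 5000, 2000, 2100) yields 2004,
and a port one past the end of the WAN range wraps back to 2000.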
Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
51xx)
This patch does not change any current behavior. It just adds new NAT proto
range functionality, which must be selected explicitly via the specific flag
when it is intended to be used.
A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Drop nft_set_type's ability to act as a container of multiple backend
implementations it chooses from. Instead consolidate the whole selection
logic in nft_select_set_ops() and the actual backend provided estimate()
callback.
This turns nf_tables_set_types into a list containing all available
backends, which is traversed when selecting one that matches the
criteria requested by userspace.
Also, this change allows embedding the nft_set_ops structure into
nft_set_type and pulling the flags field into the latter, as it is only
used during the selection phase.
A crucial part of this change is to make sure the new layout respects
hash backend constraints formerly enforced by nft_hash_select_ops()
function: This is achieved by introduction of a specific estimate()
callback for nft_hash_fast_ops which returns false for key lengths != 4.
In turn, nft_hash_estimate() is changed to return false for key lengths
== 4 so it won't be chosen by accident. Also, both callbacks must return
false for unbounded sets as their size estimate depends on a known
maximum element count.
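
The split between the two hash backends can be pictured with the
following sketch. struct set_desc and both function names are
hypothetical simplifications; the real estimate() callbacks take more
parameters and also fill in a size estimate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced description of a requested set. */
struct set_desc {
	uint32_t klen;	/* key length in bytes */
	uint32_t size;	/* maximum element count, 0 if unbounded */
};

/* Fast hash backend: only for bounded sets with 4 byte keys. */
static bool hash_fast_estimate(const struct set_desc *desc)
{
	if (!desc->size)
		return false;	/* size estimate needs a known maximum */

	return desc->klen == 4;
}

/* Generic hash backend: bounded sets with any other key length. */
static bool hash_estimate(const struct set_desc *desc)
{
	if (!desc->size)
		return false;

	return desc->klen != 4;
}
```

Because the two conditions are complementary, at most one of the two
backends ever volunteers for a given bounded set.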
Note that this patch partially reverts commit 4f2921ca21 ("netfilter:
nf_tables: meter: pick a set backend that supports updates") by making
nft_set_ops_candidate() not explicitly look for an update callback but
make NFT_SET_EVAL a regular backend feature flag which is checked along
with the others. This way all feature requirements are checked in one
go.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Keep it simple to start with, just report attribute offsets that can be
useful to userspace when presenting errors to users.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace the nf_tables_ prefix by nft_ and merge code into a single lookup
function whenever possible. In many cases the long function names push
lines over the 80-character boundary; this saves us ~50 LoC.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pass all NAT types to the flow offload struct, otherwise parts of the
address/port pair do not get translated properly, causing connection
stalls.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow the slow path to handle the shutdown of the connection with proper
timeouts. The packet containing RST/FIN is also sent to the slow path
and the TCP conntrack module will update its state.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since conntrack hasn't seen any packets from the offloaded flow in a
while, and the timeout for offloaded flows is set to an extremely long
value, we need to fix up the state before we can send a flow back to the
slow path.
For TCP, reset td_maxwin in both directions, which makes it resync its
state on the next packets.
Use the regular timeout for TCP and UDP established connections.
This allows the slow path to take over again once the offload state has
been torn down
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On cleanup, this will be treated differently from FLOW_OFFLOAD_DYING:
If FLOW_OFFLOAD_DYING is set, the connection is going away, so both the
offload state and the connection tracking entry will be deleted.
If FLOW_OFFLOAD_TEARDOWN is set, the connection remains alive, but
the offload state is torn down. This is useful for cases that require
more complex state tracking / timeout handling on TCP, or if the
connection has been idle for too long.
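
The difference between the two flags at cleanup time can be sketched as
below. The flag values, struct cleanup_action and flow_cleanup() are
hypothetical stand-ins chosen for illustration only.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the real flow offload flags. */
#define FLOW_DYING	(1u << 0)
#define FLOW_TEARDOWN	(1u << 1)

/* What cleanup should remove for a given flow. */
struct cleanup_action {
	bool drop_offload;	/* remove the offload state */
	bool drop_conntrack;	/* remove the conntrack entry too */
};

static struct cleanup_action flow_cleanup(unsigned int flags)
{
	struct cleanup_action act = { false, false };

	if (flags & FLOW_DYING) {
		/* Connection is going away: delete everything. */
		act.drop_offload = true;
		act.drop_conntrack = true;
	} else if (flags & FLOW_TEARDOWN) {
		/* Connection stays alive, only the offload state goes. */
		act.drop_offload = true;
	}

	return act;
}
```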
Support for sending flows back to the slow path will be implemented in
a following patch
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is too trivial to keep as a separate exported function
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Avoids having nf_flow_table depend on nftables (useful for future
iptables backport work)
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The offload ip hook expects a pointer to the flowtable, not to the
rhashtable. Since the rhashtable is the first member, this is safe for
the moment, but breaks as soon as the structure layout changes
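
To see why relying on the first-member layout is fragile, consider this
small sketch. The struct names are hypothetical stand-ins for the
flowtable and its embedded hash table.

```c
#include <stddef.h>

/* Stand-in for the embedded hash table. */
struct table {
	int buckets;
};

/* Layout today: the hash table is the first member. */
struct flowtable_now {
	struct table rhashtable;
	int flags;
};

/* A possible future layout with another field in front. */
struct flowtable_later {
	int flags;
	struct table rhashtable;
};

/* A member pointer equals the struct pointer only at offset 0. */
#define MEMBER_PTR_IS_STRUCT_PTR(type, member) \
	(offsetof(type, member) == 0)
```

The check holds for struct flowtable_now and fails for struct
flowtable_later, which is the breakage the message warns about.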
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reduces duplication of .gc and .params in flowtable type definitions and
makes the API clearer
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since the offload hook code was moved, this table no longer depends on
the IPv4 and IPv6 flowtable modules
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Useful as preparation for adding iptables support for offload.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allows the function to be shared with the IPv6 hook code
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allows some minor code sharing with the ipv6 hook code and is also
useful as preparation for adding iptables support for offload
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reduces the number of cache lines touched in the offload forwarding
path. This is safe because PMTU limits are bypassed for the forwarding
path (see commit f87c10a8aa for more details).
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just like ip_dst_mtu_maybe_forward(), to avoid a dependency with ipv6.ko.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reduce code duplication and make it much easier to read
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nfulnl_log_packet() was added to make sure that the NFLOG target
works only as a user-space logger. Now, however, nf_log_packet() can find
the proper log function using NF_LOG_TYPE_ULOG and NF_LOG_TYPE_LOG.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.
It complains because of the following chains:
1) sch_direct_xmit()        (lock txq->_xmit_lock)
    dev_hard_start_xmit()
     xmit_one()
      dev_queue_xmit_nit()
       packet_rcv_fanout()
        ip_check_defrag()
         ip_defrag()
          spin_lock()     (lock frag queue spinlock)
2) ip6_input_finish()
    ipv6_frag_rcv()       (lock frag queue spinlock)
     ip6_frag_queue()
      icmpv6_param_prob() (lock txq->_xmit_lock at some point)
We could add lockdep annotations, but we can also make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes the frag queue spinlock a leaf in the lock
hierarchy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the NetVSP v6 and 6.1 message structures, and includes
these versions in the NetVSC/NetVSP version negotiation process.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V, the device naming feature can be enabled
on the host through the GUI or PowerShell, which causes the host to make
the name of the device available to the guest.
This shows up in the RNDIS protocol as the friendly name.
The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.
In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property, where it is visible in ip tools and SNMP.
The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.
Also requires exporting dev_set_alias in netdev core.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
r8169: series with further smaller improvements
This series includes further smaller improvements.
Then I think the basic cleanup has been done and next step would be
preparing the switch to phylib.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.
By implementing this rule we can get rid of configuring tx checksumming
support per chip type.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The region to be used is always the first of type IORESOURCE_MEM.
We can implement this rule directly w/o having to specify which
region is the first one per configuration entry.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>