Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU flag to track an explicitly specified MTU.
Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds a route with rt_mtu > interface MTU, rt_mtu
gets updated.
New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs get updated
with the new MTU unless the RTF_FIXEDMTU flag is set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs get updated with the new
MTU unless the RTF_FIXEDMTU flag is set on them AND rt_mtu is less than the ifp MTU.
route add ... -mtu XXX automatically sets the RTF_FIXEDMTU flag.
route change ... -mtu 0 automatically removes the RTF_FIXEDMTU flag.
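The update rule above can be condensed into a single check per route. What
follows is a minimal sketch, not the kernel code: struct sketch_route, the
sketch_* function names and the flag value are stand-ins invented for
illustration, and what happens to the MTU value itself on "-mtu 0" is not
stated in this message, so the sketch leaves it alone.

```c
#include <stdint.h>

/* Illustrative flag value for this sketch; the real one lives in net/route.h. */
#define SKETCH_RTF_FIXEDMTU	0x80000

/* Hypothetical stand-in holding only the route fields used here. */
struct sketch_route {
	int		flags;	/* RTF_* style flags */
	uint32_t	mtu;	/* plays the role of rt_mtu */
};

/*
 * Apply an interface MTU change to one route.
 * Returns 1 if the route MTU was changed, 0 if it was kept.
 */
static int
sketch_update_route_mtu(struct sketch_route *rt, uint32_t new_if_mtu)
{
	if (rt->mtu == new_if_mtu)
		return (0);
	/* A pinned MTU survives as long as it still fits the interface. */
	if ((rt->flags & SKETCH_RTF_FIXEDMTU) != 0 && rt->mtu < new_if_mtu)
		return (0);
	rt->mtu = new_if_mtu;
	return (1);
}

/*
 * Model of "route add/change ... -mtu N": a non-zero N pins the MTU,
 * 0 only clears the pin (assumption: the stored MTU is left untouched).
 */
static void
sketch_set_user_mtu(struct sketch_route *rt, uint32_t user_mtu)
{
	if (user_mtu != 0) {
		rt->mtu = user_mtu;
		rt->flags |= SKETCH_RTF_FIXEDMTU;
	} else {
		rt->flags &= ~SKETCH_RTF_FIXEDMTU;
	}
}
```

With this rule a pinned route of 1400 stays at 1400 when the interface goes
to 9000, while a pinned route of 9000 is still clamped when the interface
drops to 1500.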
PR: 194238
MFC after: 1 month
CR: D1125
Initially, in_matroute() and in_clsroute() in their current state were introduced by
r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in the route table, setting the RTPRF_OURS flag and some expire time. After that, either
GC came or RTPRF_OURS got removed on the first packet. It was a good solution
in those days (and probably for another decade after that) to keep TCP metrics.
However, after moving metrics to the TCP hostcache in r122922, most of the in_rmx
functionality became unused. It might have been used for flushing ICMP-originated
routes before rte mutexes/refcounting, but I'm not sure about that.
So it looks like it is nearly impossible to make GC do its work nowadays:
in_rtkill() ignores non-RTPRF_OURS routes.
A route can only become RTPRF_OURS after dropping the last reference via rtfree(),
which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes.
Dynamic routes can still be installed via a received redirect, but they
have the default lifetime (no specific rt_expire) and there is no other trie walker
to call RTFREE() on them.
So, the changelist:
* remove custom rnh_match / rnh_close matching function.
* remove all GC functions
* partially revert r256695 (proto3 is no longer used inside the kernel,
it is not possible to use rt_expire from the user's point of view, proto3 support
is not complete)
* Finish r241884 (similar to this commit) and remove remaining IPv6 parts
MFC after: 1 month
Since radix has been ignoring sa_family in passed sockaddrs,
no one has ever bothered filling in a valid sa_family in netmasks.
Additionally, radix adjusts the sa_len field in every netmask so as not to
compare zero bytes at all.
This leaves us with an rt_mask that has sa_family of AF_UNSPEC (-1) and
an arbitrary sa_len field (0 for the default route, for example).
However, rtsock has been passing that rt_mask intact for ages,
requiring all rtsock consumers to make their own local hacks.
We even have an unfixed one in base:
do `route -n monitor` in one window and issue `route -n get addr`
for some directly-connected address. You will probably see the following:
got message of size 304 on Thu May 8 15:06:06 2014
RTM_GET: Report Metrics: len 304, pid: 30493, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
10.0.0.0 link#1 (255) ffff ffff ff em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^
after the change:
got message of size 312 on Thu May 8 15:44:07 2014
RTM_GET: Report Metrics: len 312, pid: 2895, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
10.0.0.0 link#1 255.255.255.0 em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^
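To illustrate the kind of local hack rtsock consumers had to carry, here is
a rough sketch that rebuilds a full IPv4 netmask from a trimmed radix-style
mask. It is not the code from this change: the byte layout constants and
sketch_expand_mask() are hypothetical and only mirror a BSD sockaddr_in
(byte 0 = sa_len, byte 1 = sa_family, bytes 2-3 = port, bytes 4-7 = address).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offsets, mirroring a BSD-style sockaddr_in for this sketch. */
#define SKETCH_ADDR_OFF	4	/* first address byte */
#define SKETCH_ADDR_LEN	4	/* IPv4 address length */

/*
 * Expand a mask whose sa_len was cut after the last non-zero byte into
 * four address bytes in network order. The sa_family byte is ignored,
 * since it cannot be trusted in such masks.
 * Returns 0 on success, -1 if sa_len does not fit the buffer.
 */
static int
sketch_expand_mask(const uint8_t *raw, size_t buflen,
    uint8_t out[SKETCH_ADDR_LEN])
{
	size_t len, avail;

	memset(out, 0, SKETCH_ADDR_LEN);
	if (buflen < 1)
		return (-1);
	len = raw[0];			/* sa_len as left by radix */
	if (len > buflen)
		return (-1);
	if (len <= SKETCH_ADDR_OFF)
		return (0);		/* e.g. sa_len 0: default route */
	avail = len - SKETCH_ADDR_OFF;
	if (avail > SKETCH_ADDR_LEN)
		avail = SKETCH_ADDR_LEN;
	memcpy(out, raw + SKETCH_ADDR_OFF, avail);
	return (0);
}
```

Fed the seven bytes behind the "(255) ffff ffff ff" output shown above, the
sketch yields ff ff ff 00, i.e. 255.255.255.0.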
Sponsored by: Yandex LLC
MFC after: 1 month
After r263152 this leaves unused variables if route(8) is compiled
without INET support.
Switch the remaining variable accesses to flags and remove now obsolete
variables.
Reviewed by: glebius
MFC after: 1 week
AppleTalk was a network transport protocol for Apple Macintosh devices
in the 80s and 90s. Starting with Mac OS X in 2000, AppleTalk became
a legacy protocol and the primary networking protocol was TCP/IP. The last
Mac OS X release to support AppleTalk came out in 2009. The same year,
routing equipment vendors (namely Cisco) ended their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
IPX was a network transport protocol in Novell's NetWare network operating
system in the late 80s and 90s. NetWare itself switched to TCP/IP
as its default transport in 1998. Later, in this century, Novell Open
Enterprise Server became the successor to Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.
Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
This will make it easier to link as a library.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> (older version)
Discussed on: -hackers
- Display an AF_LINK address in link#N form when sdl_{nlen,alen,slen} == 0 and
sdl_index != 0.
- Reduce unnecessary loop in pmsg_addrs().
- Remove iso_ntoa(). This is not used.
- Fix a bug in sodump() which prevented struct sockaddr_in6 from displaying.
- Fix a bug in fiboptlist_csv() which could cause free() of an uninitialized
pointer.
- Style cleanups:
. Add missing "static" keywords.
. Use an array of struct sockaddr_storage instead of sockunion for rtmsg.
. Use err() and errx() instead of pair of fprintf(stderr, "...") + exit(1).
. Use nitems() macro.
. Various style(9) fixes.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need an additional field
in sockaddr to pass the SIN_PROXY flag.
The new kernel is binary compatible with old tools, since the sizes
of sockaddr_inarp and sockaddr_in match, and sa_family is
filled with the same value.
The structure declaration is left for compatibility with
third-party software, but in-tree code no longer uses it.
Reviewed by: ru, andre, net@
- Add a range condition of given FIB number and the related error messages.
- Fix free() problem.
Spotted by: Artyom Mirgorodskiy
Discussed with: glebius
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handling in route(8)[2] and ifconfig(8).
Reported by: rpaulo[1], Mateusz Guzik[1], peter[2]
userland via routing socket or sysctl. This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:
sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);
This behavior can be controlled by net.inet6.ip6.deembed_scopeid. This is
set to 1 by default (sin6_scope_id will be filled in the kernel).
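For reference, the userland routine being eliminated looks roughly like the
sketch below. It is a portable illustration, not code from any utility:
sketch_deembed_scope() is a made-up name, and clearing bytes 2-3 after
reading them is an assumption about the usual companion step. Bytes are read
one at a time to avoid the unaligned u_int16_t cast.

```c
#include <stdint.h>
#include <netinet/in.h>

/*
 * Move a KAME-style embedded scope id (bytes 2-3 of a link-local
 * address, network byte order) into sin6_scope_id.
 * Addresses that are not link-local, or carry no embedded id, are
 * left unchanged.
 */
static void
sketch_deembed_scope(struct sockaddr_in6 *sin6)
{
	uint8_t *b = sin6->sin6_addr.s6_addr;

	if (!IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		return;
	if (b[2] == 0 && b[3] == 0)
		return;
	sin6->sin6_scope_id = ((uint32_t)b[2] << 8) | b[3];
	/* Assumed cleanup: restore the plain fe80::/64 form. */
	b[2] = 0;
	b[3] = 0;
}
```

With the sysctl at its default of 1, the kernel hands out addresses that are
already in this de-embedded form, so utilities no longer need such a helper.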
Reviewed by: bz
comma-separated list and/or range specification:
# route add -inet 192.0.2.0/24 198.51.100.1 -fib 1,3-5,6
Although all of the subcommands support the modifier, "monitor" does not
support the list or range specification at this moment.
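A specification such as "1,3-5,6" could be parsed along the following lines.
This is only a sketch of the syntax, not route(8)'s parser:
sketch_parse_fiblist() and the SKETCH_MAXFIBS bound are hypothetical, and a
real tool would take the upper bound from the running kernel.

```c
#include <stdlib.h>

#define SKETCH_MAXFIBS	16	/* hypothetical upper bound on FIB numbers */

/*
 * Parse a comma-separated list of FIB numbers and ranges and set
 * selected[n] = 1 for every FIB named.
 * Returns 0 on success, -1 on a syntax error or out-of-range number.
 */
static int
sketch_parse_fiblist(const char *spec, unsigned char selected[SKETCH_MAXFIBS])
{
	const char *p = spec;
	char *end;
	long lo, hi, i;

	for (i = 0; i < SKETCH_MAXFIBS; i++)
		selected[i] = 0;
	for (;;) {
		if (*p < '0' || *p > '9')
			return (-1);
		lo = strtol(p, &end, 10);
		hi = lo;
		p = end;
		if (*p == '-') {		/* range "lo-hi" */
			p++;
			if (*p < '0' || *p > '9')
				return (-1);
			hi = strtol(p, &end, 10);
			p = end;
		}
		if (lo > hi || hi >= SKETCH_MAXFIBS)
			return (-1);
		for (i = lo; i <= hi; i++)
			selected[i] = 1;
		if (*p == '\0')
			return (0);
		if (*p != ',')
			return (-1);
		p++;
	}
}
```

For "1,3-5,6" this marks FIBs 1, 3, 4, 5 and 6; the command is then applied
once per marked FIB.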
Reviewed by: bz
- add static and const where appropriate
- check pointers against NULL
- minor styling nits
- it is actually WARNS=6 clean for non-strict alignment platforms
This is shamelessly stolen from DragonflyBSD and reduces our diff.
PR: bin/140078
Approved by: ed (co-mentor)
- add show as alias for get
- add weights to allow mpath to do more than equal cost
- add sticky / nostick to disable / re-enable per-connection load balancing
This adds a field to rt_metrics_lite so network bits of world will need to be re-built.
Reviewed by: jeli & qingli
calculation was too aggressive. Instead we should only
look at each nibble. This makes it so we make
10.2.0.0 become 10.2/16 NOT 10.2/17.
Need to explore the non-CIDR address issue. The two
may not be separable.
MFC after: 1 week
If an entry is not route add -net xxx/bits then we should use
the addr (xxx) to establish the number of bits by looking at
the first non-zero bit. So if we enter
route add -net 10.1.1.0 10.1.3.5
this is the same as doing
route add -net 10.1.1.0/24
Since the 8th bit (zero counting) is set to 1 we set bits
to 32-8.
Users can of course still use the /x to change this behavior
or in cases where the network is in the trailing part
of the address, a "netmask" argument can be supplied to
override what is established from the interpretation of the
address itself. e.g:
route add -net 10.1.1.8 -netmask 0xff00ffff
should override and place the proper CIDR mask in place.
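The "first non-zero bit" rule can be written down in a few lines. This is a
bare sketch of the rule as worded in this message, not the route(8) code;
sketch_prefix_from_addr() is a hypothetical name and the address is assumed
to be in host byte order.

```c
#include <stdint.h>

/*
 * Derive a prefix length from the lowest set bit of an IPv4 network
 * number (host byte order): 32 minus the zero-based index of that bit.
 * An all-zero address yields 0.
 */
static int
sketch_prefix_from_addr(uint32_t addr)
{
	int zeros = 0;

	if (addr == 0)
		return (0);
	while ((addr & 1) == 0) {
		addr >>= 1;
		zeros++;
	}
	return (32 - zeros);
}
```

For 10.1.1.0 (0x0a010100) the lowest set bit is bit 8, so the result is
32 - 8 = 24, matching the example above. An explicit /bits or -netmask
bypasses this guess entirely.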
PR: 131365
MFC after: 1 week
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.
The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle. Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion
'get'. Since rtmsg() always gets called and returns 0 on success and -1
on failure, it's possible to exit with a suitable exit code by calling
exit(ret != 0) instead, as is done at the end of newroute().
PR: bin/112303
Submitted by: bruce@cran.org.uk
MFC after: 1 week
command would add incorrect routing entries if network numbers weren't
fully "spelled" out according to their class. For example:
# route add 128.0/16 (works)
# route add 128/16 (doesn't work)
# route add 193.0.0/24 (works)
# route add 193/24 (doesn't work)
Also, rework the way a netmask is deduced from network number if
it [netmask] is not specified.
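The underlying pitfall is that a lone number such as "128" is not a
left-aligned network when handed to inet_aton(3)-style parsing, which treats
a one-part address as a full 32-bit value. One way to read abbreviated
network numbers is sketched below; sketch_parse_net() is a hypothetical
helper, not the submitted fix, and it expects the "/bits" suffix to have
been stripped by the caller.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "128", "128.0", "193.0.0" or "10.1.2.3" as a network number
 * whose given octets are the leading ones; missing trailing octets
 * are zero. The result is stored in host byte order in *net.
 * Returns 0 on success, -1 on malformed input.
 */
static int
sketch_parse_net(const char *s, uint32_t *net)
{
	uint32_t val = 0;
	int parts = 0;
	char *end;
	long octet;

	for (;;) {
		if (*s < '0' || *s > '9' || parts == 4)
			return (-1);
		octet = strtol(s, &end, 10);
		if (octet > 255)
			return (-1);
		val = (val << 8) | (uint32_t)octet;
		parts++;
		s = end;
		if (*s == '\0')
			break;
		if (*s != '.')
			return (-1);
		s++;
	}
	/* Left-align: "193" becomes 193.0.0.0, not 0.0.0.193. */
	*net = val << (8 * (4 - parts));
	return (0);
}
```

Under this reading "128" and "128.0" name the same network, as do "193" and
"193.0.0", which is the behavior the examples above ask for.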
Submitted by: Nuno Antunes <nuno.antunes@gmail.com> (mostly)
MFC after: 1 week
- Add description for EEXIST.
- Change description for ENOBUFS. Routing socket can return
this error for many different reasons, including general
memory shortage, mbuf memory shortage and rtentry zone.
PR: kern/64090 [1]
to lo(4) interfaces to have an effect, and that this is not needed
when using IP fast forwarding.
Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/>
MFC after: 3 weeks
root is allowed to create raw sockets, then they will be able to create
routing sockets, too. However prison-root is not able to manipulate
routing tables. So when route(8) attempts to write to a routing
socket and receives EPERM from the kernel, exit rather than moving
on with execution.
Approved by: bmilekic (mentor)
prior sysctl due to the structure growing between calls try again.
Also try again for deleting routes if things fail. We've seen
route -f fail this way which does not actually flush all routes.
This fixes it. It will whine but it will do the work.
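The retry pattern is the usual "ask for the size, allocate, fetch, and start
over if the data grew in between" loop. Below is a generic sketch of it, not
the route(8) change itself: the sketch_fetch_fn callback stands in for the
sysctl call and is assumed to fail with ENOMEM when the buffer is too small,
and sketch_growing_fetch() is a fake data source included only to exercise
the loop.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fetch callback: with buf == NULL it reports the size
 * needed in *len; otherwise it fills buf, or fails with ENOMEM if
 * *len is too small.
 */
typedef int (*sketch_fetch_fn)(void *buf, size_t *len, void *ctx);

/* Size, allocate and fetch, retrying while the data keeps growing. */
static void *
sketch_fetch_with_retry(sketch_fetch_fn fetch, void *ctx, size_t *lenp,
    int max_tries)
{
	void *buf = NULL, *nbuf;
	size_t need;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (fetch(NULL, &need, ctx) < 0)
			break;
		nbuf = realloc(buf, need > 0 ? need : 1);
		if (nbuf == NULL)
			break;
		buf = nbuf;
		if (fetch(buf, &need, ctx) == 0) {
			*lenp = need;
			return (buf);	/* caller frees */
		}
		if (errno != ENOMEM)
			break;		/* a real error, not growth */
	}
	free(buf);
	return (NULL);
}

/* Fake source that grows by 8 bytes after each of its first size queries. */
struct sketch_source {
	size_t	size;
	int	grows_left;
};

static int
sketch_growing_fetch(void *buf, size_t *len, void *ctx)
{
	struct sketch_source *src = ctx;

	if (buf == NULL) {
		*len = src->size;
		if (src->grows_left > 0) {
			src->grows_left--;
			src->size += 8;	/* table grew after the size query */
		}
		return (0);
	}
	if (*len < src->size) {
		errno = ENOMEM;
		return (-1);
	}
	memset(buf, 0xAB, src->size);
	*len = src->size;
	return (0);
}
```

Bounding the number of attempts keeps a constantly changing table from
turning the retry into an endless loop.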
PR: 56732
Obtained from: IronPort