Commit graph

122619 commits

Author SHA1 Message Date
Rick Macklem 90d2dfab19 Merge the pNFS server code from projects/pnfs-planb-server into head.
This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate.  Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
    the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
    and Change attributes for the file, so that the MDS doesn't need to
    acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories.  The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
             is. The critical bits of information returned by the FreeBSD
             server is the IP address of the DS and, for the Flexible
             File layout, that NFSv4.1 is to be used and that it is
             "tightly coupled".
             There is a "deviceid" which identifies the DeviceInfo.
Layout     - This is per file and can be recalled by the server when it
             is no longer valid. For the FreeBSD server, there is support
             for two types of layout, call File and Flexible File layout.
             Both allow the client to do I/O on the DS via NFSv4.1 I/O
             operations. The Flexible File layout is a more recent variant
             that allows specification of mirrors, where the client is
             expected to do writes to all mirrors to maintain them in a
             consistent state. The Flexible File layout also allows the
             client to report I/O errors for a DS back to the MDS.
             The Flexible File layout supports two variants referred to as
             "tightly coupled" vs "loosely coupled". The FreeBSD server always
             uses the "tightly coupled" variant where the client uses the
             same credentials to do I/O on the DS as it would on the MDS.
             For the "loosely coupled" variant, the layout specifies a
             synthetic user/group that the client uses to do I/O on the DS.
             The FreeBSD server does not do striping and always returns
             layouts for the entire file. The critical information in a layout
             is Read vs Read/Writea and DeviceID(s) that identify which
             DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by:	james.rose@framestore.com
Relnotes:	yes
2018-06-12 19:36:32 +00:00
Ruslan Bukin 6fdc57357e Include VirtIO devices to the GENERIC configuration file.
These are now available in QEMU/RISC-V.

Sponsored by:	DARPA, AFRL
2018-06-12 17:55:40 +00:00
Ruslan Bukin 2d53a67c2c o Add driver for PLIC (Platform-Level Interrupt Controller) device.
o Convert interrupt machdep support to use INTRNG code.

Sponsored by:	DARPA, AFRL
2018-06-12 17:45:15 +00:00
Ruslan Bukin ebdf0baf3a Add simplebus-like RISC-V SoC bus.
This is required in order to probe and attach devices described under
"riscv-virtio-soc" node of DTS.

Sponsored by:	DARPA, AFRL
2018-06-12 17:07:30 +00:00
Ruslan Bukin f2e299880a Release secondary cores from WFI (wait for interrupt) by sending them
an IPI.

This does not work however yet in QEMU. As a temporary workaround set
software interrupt pending bit manually on a local core to ensure WFI
doesn't halt the hart.

This is required to smpboot in QEMU.

Sponsored by:	DARPA, AFRL
2018-06-12 16:47:33 +00:00
Ruslan Bukin a9063ba1d7 Align virtual addressing entries.
This is required due to C-compressed ISA extension option being turned on.

This fixes SMP operation in QEMU.

Sponsored by:	DARPA, AFRL
2018-06-12 16:19:27 +00:00
Andrew Turner 0c38b2d37c Rework PSCI so it only searches for the call function once.
This is in preperation for supporting newer smccc functions that also use
the same call method.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D15745
2018-06-12 14:54:17 +00:00
Ed Maste 0f69696824 linux64: use linux output target for linux_vdso.so
linux_vdso.so provides the vdso for the linuxulator's amd64 target and
is mapped into a Linux binary's address space.  Thus it should be a
Linux-style .so, which has the ELF OS/ABI unset.

It turns out that ELF Tool Chain elfcopy/objcopy also has a bug where
the OS/ABI field is unset, regardless of the specified --output-target,
so this change is a no-op with the default in-tree toolchain.  This is a
real fix when using external binutils, and the ELF Tool Chain bug will
be fixed in the future.

PR:		228934
Sponsored by:	Turing Robotic Industries
2018-06-12 13:32:42 +00:00
Diane Bruce 5bede50958 Add a driver for the BCM2835 Mini-UART as seen on the RPi3
Reviewed by:	andrew
Approved by:	andrew
Differential Revision:	https://reviews.freebsd.org/D15684
2018-06-12 13:26:31 +00:00
Emmanuel Vadot e34425be26 arm64: rockchip: Correctly set armclk
Parent needs to be the same frequency as the armclk, not twice the freq.
The real divider is incremented by one so write it with - 1
The rate can be at index 0

Pointy Hat To: myself
2018-06-12 11:47:21 +00:00
Konstantin Belousov 6ee7d5afcc All exceptions IDT descriptors must use interrupt gates on 4/4 kernel.
Fix it for #MF.

Noted by:	rlibby
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:43:20 +00:00
Konstantin Belousov 7a18e90447 Fix typo.
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:41:26 +00:00
Hans Petter Selasky 986e3bed8b Implement the ip_eth_mc_map() function in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-06-12 08:43:49 +00:00
Navdeep Parhar 1bb577b4c2 cxgbe(4): Remove homemade version of htobe32 from the driver.
It was needed only for ia64 where it was implemented as a call to
bswapXX, which was always a real function.  htobeXX with a constant
argument is calculated at compile-time everywhere else.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-06-12 06:46:03 +00:00
Jonathan T. Looney 16a227c7c9 Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER
option.

The BPF code was creating a compiled filter in the common filter-creation
path.  However, BPF only uses compiled filters in the read direction.
When creating a write filter, the common filter-creation code was
creating an unneeded write filter and leaking the memory used for that.

MFC after:	2 weeks
Sponsored by:	Netflix
2018-06-11 23:32:06 +00:00
Ed Maste 2c8cf0c505 if_muge: retire lan78xx_eeprom_read
lan78xx_eeprom_read just checked for EEPROM presence then called
lan78xx_eeprom_read_raw if present, and had only one caller.  Introduce
lan78xx_eeprom_present to check for EEPROM presence, and use it in the
one place it is needed.

This is used by r334964, which was accidentally committed out-of-order
from my work tree.

Reported by:	markj
Sponsored by:	The FreeBSD Foundation
2018-06-11 19:34:47 +00:00
Rick Macklem 73b1879c2d Add a couple of safety belt checks to the NFSv4.1 client related to sessions.
There were a couple of cases in newnfs_request() that it assumed that it
was an NFSv4.1 mount with a session. This should always be the case when
a Sequence operation is in the reply or the server replies NFSERR_BADSESSION.
However, if a server was broken and sent an erroneous reply, these safety
belt checks should avoid trouble.
The one check required a small tweak to nfsmnt_mdssession() so that it
returns NULL when there is no session instead of the offset of the field
in the structure (0x8 for i386).
This patch should have no effect on normal operation of the client.
Found by inspection during pNFS server development.

MFC after:	2 weeks
2018-06-11 19:00:07 +00:00
Ed Maste 00ce0c6258 makesyscalls: simplify capenabled pipeline
Replace cat + 2x grep with one grep.

Sponsored by:	Turing Robotic Industries
2018-06-11 18:57:40 +00:00
Ed Maste 2d14fb8bec if_muge: add LAN7850 support
Differences between LAN7800 and LAN7850 from the driver's perspective:

* The LAN7800 muxes EEPROM signals with LEDs, so LED mode needs to be
  disabled when reading/writing EEPROM.  The EEPROM is not muxed on the
  LAN7850.

* The Linux driver enables automatic duplex and speed detection when
  there is no EEPROM, for the LAN7800 only.  With this FreeBSD driver
  LAN7850-based adapters without a configuration EEPROM fail to link
  (with or without the automatic duplex and speed detection code), so
  I have just followed the example of the Linux driver for now.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	Microchip (hardware)
2018-06-11 18:44:56 +00:00
Matt Macy 0ea9d9376e limit change to fixing controlp handling pending review 2018-06-11 17:10:19 +00:00
Matt Macy c34bf30069 soreceive_stream: correctly handle edge cases
- non NULL controlp is not an error, returning EINVAL
  would cause X forwarding to fail

- MSG_PEEK and MSG_WAITALL are fairly exceptional, but we still
  want to handle them - punt to soreceive_generic
2018-06-11 16:31:42 +00:00
Mark Johnston a9336cef39 Use the cached curthread reference in pmc_process_interrupt().
Fix indentation while here.
2018-06-11 16:27:09 +00:00
Hans Petter Selasky 35555d474b Implement the kstrtobool() and kstrtobool_from_user() functions
in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-11 16:26:33 +00:00
Hans Petter Selasky 93ed9ab20b Implement the user_access_begin(), user_access_end(), usafe_get_user() and
unsafe_put_user() function macros in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-11 15:42:29 +00:00
Konstantin Belousov b45e10c3f4 Fix braino in r334799. Maxmem is in pages.
Reported by:	ae, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-11 15:28:20 +00:00
Jonathan T. Looney cff21e484b Change RACK dependency on TCPHPTS from a build-time dependency to a load-
time dependency.

At present, RACK requires the TCPHPTS option to run. However, because
modules can be moved from machine to machine, this dependency is really
best assessed at load time rather than at build time.

Reviewed by:	rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15756
2018-06-11 14:27:19 +00:00
Dimitry Andric 48f00bafa1 Fix build of bxe with base gcc on i386
Casting from rman_res_t to a pointer results in "cast to pointer from
integer of different size" warnings with base gcc on i386, so print
these without casting.  The kva field of struct bxe_bar is of type
vm_offset_t, which can be 32 or 64 bit, so cast it to uintmax_t before
printing.

Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15733
2018-06-11 10:08:22 +00:00
Dimitry Andric ee3d52d730 Disable building aesni with base gcc
Because base gcc does not support the required intrinsics, do not
attempt to compile the aesni module with it.

Noticed by:	Dan Allen <danallen46@gmail.com>
MFC after:	3 days
2018-06-11 08:42:03 +00:00
Dimitry Andric a0e8aab7d8 Fix build of i915kms with base gcc
Base gcc fails to compile sys/dev/drm2/i915/intel_display.c for i386,
with the following -Werror warnings:

cc1: warnings being treated as errors
/usr/src/sys/dev/drm2/i915/intel_display.c:8884: warning:
initialization from incompatible pointer type

This is due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36432, which
incorrectly interprets the [] as a flexible array member.

Because base gcc does not have a -W flag to suppress this particular
warning, it requires a rather ugly cast.  To not influence any other
compiler, put it in a #if/#endif block.

Reviewed by:	kib
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15744
2018-06-11 08:11:35 +00:00
Dimitry Andric bc5ea07c09 Fix build of ocs_fs with base gcc on i386
Add a few intermediate casts to uintptr_t to suppress "cast to pointer
from integer of different size" warnings from gcc.  Also remove a few
incorrect casts.

Reviewed by:	ram
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15747
2018-06-11 07:57:32 +00:00
Eitan Adler 0642b65f38 usbdevs: sort my prior commit 2018-06-11 05:28:00 +00:00
Eitan Adler 876713d563 usbdevs: adding vendor
PR:		228856
Reported by:	hrs, ys-h@imail.earth
2018-06-11 05:27:07 +00:00
Andrew Turner 619e50a657 Remove the psci option from arm64. It is now a standard option as it is
required to boot correctly.

Sponsored by:	DARPA, AFRL
2018-06-10 19:42:44 +00:00
Eitan Adler c8141b92e9 Revert r334929
Apparently some software might depend on a header whose sole contents is
a `#warning` to remove it. Revert pending exp-run.
2018-06-10 19:15:38 +00:00
Rick Macklem 8097753476 Add checks for the Flexible File layout to LayoutRecall callbacks.
The Flexible File layout case wasn't handled by LayoutRecall callbacks
because it just checked for File layout and returned NFSERR_NOMATCHLAYOUT
otherwise. This patch adds the Flexible File layout handling.
Found during testing of the pNFS server.

MFC after:	2 weeks
2018-06-10 19:03:21 +00:00
Eitan Adler 7f7bc5b2f9 include: remove sys/capability.h
This file has only generated a warning for the last 18 months. Its
existence at this point only serves to confuse software looking for
POSIX.1e capabilities and produce actionless warnings.
2018-06-10 18:38:48 +00:00
Andrew Turner dc9b99a884 Clean up handling of unexpected exceptions. Previously we would issue a
breakpoint instruction, however this would lose information that may be
useful for debugging.

These are now handled in a similar way to other exceptions, however it
won't exit out of the exception handler until it is known if we can
handle these exceptions in a useful way.

Sponsored by:	DARPA, AFRL
2018-06-10 16:21:21 +00:00
Bruce Evans 3cd246d9a9 Untangle configuration ifdefs a little. On x86, msi is optional on pci,
and also on apic in common and i386 files (except for xen it is optional
only on xenhvm), but it was not ifdefed except on apic in common and i386
files.

This is all that is left from an attempt to build a (sub-)minimal kernel
without any devices.  The isa "option" is still used without ifdefs in many
standard files even on amd64.  ISAPNP is not optional on at least i386.
ATPIC is not optional on i386 (it is used mainly for Xspuriousint).  But
pci is now supposed to be optional on x86.
2018-06-10 14:49:13 +00:00
Bruce Evans 09e3c9a4ec Fix panics in potentially all x86bios calls on i386 since r332489.
A call to npxsave() in the exception trampolines was not relocated.
This call to a garbage address usually paniced when made, but it is only
made when the thread has used an FPU recently, and this is not the usual
case.

PR:		228755
Reviewed by:	kib
2018-06-10 14:21:01 +00:00
Vladimir Kondratyev 67580198b7 Drop MOUSE_GETVARS and MOUSE_SETVARS ioctls support.
These ioctls are not documented and only stubbed in a few drivers: mse(4),
psm(4) and syscon's sysmouse(4). The only exception is MOUSE_GETVARS
implemented in psm(4)

Given the fact that they were introduced 20 years ago and implementation
has never been completed, remove any related code.

PR:		228718 (exp-run)
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D15726
2018-06-10 10:23:31 +00:00
Rick Macklem be9d155ff4 Delete some macros that are unused.
These macros were added because they were used by the pNFS server last
year. However, they are no longer used by the pNFS server code and
might as well be deleted.
This is a partial reversion of r326735.
2018-06-09 23:38:22 +00:00
Rick Macklem d506aa140d Delete an unused macro and clean up a comment about it.
NFSDEV_MIRRORSTR was defined for the pNFS server, but has not been used,
so this patch deletes it. It also cleans up the comment and hopefully
makes it more readable.
2018-06-09 23:14:59 +00:00
Mark Johnston 8590505f48 Bump __FreeBSD_version after r334881 and force libdwarf to be rebuilt.
Reported by:	O. Hartmann <ohartmann@walstatt.org>
Reviewed by:	bdrewery
2018-06-09 20:01:03 +00:00
Mark Johnston f090f67503 Tell the compiler that rdtscp clobbers %ecx. 2018-06-09 18:31:19 +00:00
Andrew Turner c11862c030 In the ThunderX BGX network driver we were skipping the NULL terminator
when parsing the phy type, however this is included in the length returned
by OF_getprop. To fix this stop ignoring the terminator.

PR:		228828
Reported by:	sbruno
Sponsored by:	DARPA, AFRL
2018-06-09 14:47:49 +00:00
Kristof Provost 0b799353d8 pf: Fix deadlock with route-to
If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR:		228782
MFC after:	1 week
2018-06-09 14:17:06 +00:00
Andrey V. Elsukov 44bcc06816 Explicitly change the link state when we assingn an address.
Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible, that
after this link state information still not initialized properly.
This leads to problems with routing, since now interface has
IFCAP_LINKSTATE capability and a route is considered as working only
when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP()
macro).

Reported by:	Marek Zarychta
MFC after:	3 days
2018-06-09 09:57:14 +00:00
Mateusz Guzik 0001edb823 counter: add a bit missed in r334858
It happens to be a noop.
2018-06-08 22:06:32 +00:00
Stephen Hurd 3ab4a96085 Remove tx task spinning added in r333686
This caused issues with PASTE.  Just remove the reschedule since the DELAY()
should be enough for use cases such as pkt-gen which were failing before the
change.

Reported by:	Michio Honda
Sponsored by:	Limelight Networks
2018-06-08 21:49:19 +00:00
Mateusz Guzik 4e180881ae uma: implement provisional api for per-cpu zones
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.

In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
2018-06-08 21:40:03 +00:00