Commit graph

724483 commits

Author SHA1 Message Date
Serge Semin 6952c6de8a NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology
Since Switchtec patch there has been a new topology added to
the NTB API. It's called NTB_TOPO_SWITCH and dedicated for
PCIe switch chips. Even though topo field isn't used within the
IDT driver much, lets set it for the sake of unification.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 4517a5701c NTB: ntb_test: Update ntb_perf tests
ntb_perf driver has been also updated so to have the multi-port
interface support. User now must specify what peer port is going
to be used to perform the test.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin fa63be5428 NTB: ntb_test: Update ntb_tool MW tests
There are devices (like IDT PCIe switches), which outbound MWs xlat address
is setup on peer side. In this case local side is supposed to allocate
a memory buffer and somehow deliver the xlat DMA address to peer so one
could set the outbound MW up. The MW test is altered so to support both
previous Intel/AMD and new IDT-like devices.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin e770112b2d NTB: ntb_test: Add ntb_tool Message tests
Messages NTB API is now available. ntb_tool driver has been altered
to perform messages send and receive operation. The test of messages
read/write to/from peer device has been added to the script.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 06bd0407d0 NTB: ntb_test: Update ntb_tool Scratchpad tests
Scratchpad NTB API has changed so has the ntb_tool driver. Outbound
Scratchpad DebugFS files have been moved to peer specific directories.
Each scratchpad is now available via separate file. The test code
has been accordingly altered.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 4e1ef427f2 NTB: ntb_test: Update ntb_tool DB tests
DB interface of ntb_tool driver hasn't been changed much, but
db_valid_mask DebugFS file has still been added. In this case
it's much better to test all valid DB bits instead of using
the predefined mask, which may be incorrect in general.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 0298e5b9d0 NTB: ntb_test: Update ntb_tool link tests
Link Up and Down methods are used to change NTB link settings on
local side only for multi-port devices. Link is considered up
only if both sides local and peer set it up. Intel/AMD hardware
acts a bit different by assigning the Primary and Secondary roles,
so Primary device only is able to change the link state. Such behaviour
should be reflected in the test code.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin dd151d7200 NTB: ntb_test: Add ntb_tool port tests
Multi-port interface is now available in ntb_tool driver. According
to the new NTB API, there might be more than two devices connected over
NTB. It means each device can have multiple freely enumerated ports.
Each port got index assigned by NTB hardware driver. This test is
performed to determine the local and peer ports as well as their indexes.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 03aadf9499 NTB: ntb_test: Safely use paths with whitespace
If some of variables like LOC/REM or LOCAL_*/REMOTE_* got
whitespaces, the script may fail with syntax error.

Fixes: a9c59ef774 ("ntb_test: Add a selftest script for the NTB subsystem")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 5648e56d03 NTB: ntb_perf: Add full multi-port NTB API support
Former NTB Performance driver could only work with NTB devices, which
got Scratchpads available and had just two ports. Since there are
devices, which don't have Scratchpads and got more than two peer
ports, the performance measuring tool needs to be rewritten. This
patch adds the ability to test any available NTB peer.
Additionally it allows to set NTB memory windows up using any
available data exchange interface: Scratchpad or Message registers.
Some cleanups are also added here.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 7f46c8b3a5 NTB: ntb_tool: Add full multi-port NTB API support
Former NTB Debugging tool driver supported only the limited
functionality of the recently updated NTB API, which is now available
to work with the truly NTB multi-port devices and devices, which
got NTB Message registers instead of Scratchpads. This patch
fully rewrites the driver so one would fully expose all the new
NTB API interfaces. Particularly it concerns the Message registers,
peer ports API, NTB link settings. Additional cleanups are also added
here.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin c7aeb0afdc NTB: ntb_pp: Add full multi-port NTB API support
Current Ping Pong driver can't truly work with multi-port devices.
Additionally it requires the Scratchpad registers being available
on NTB device. This patches rewrites the driver so one would
perform the cyclic Ping-Pong algorithm around all the available
NTB peers and makes it working with NTB hardware, which doesn't
support Scratchpads, but such alternative as NTB Message register.
Additional cleanups are also added here.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin f1678a4c66 NTB: Fix UB/bug in ntb_mw_get_align()
Simple (1 << pidx) operation causes undefined behaviour when
pidx >= 32. It must be casted to u64 to match the actual return
value of ntb_link_is_up() method, so to have all the possible
peer indexes covered and to get rid of undefined behaviour.
Additionally there are special macros in "linux/bitops.h" to perform
the bit-set-shift operations, so it's recommended to have them used
for proper bit setting.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:24 -05:00
Serge Semin 417cf39cfe NTB: Set dma mask and dma coherent mask to NTB devices
The dma_mask and dma_coherent_mask fields of the NTB struct device
weren't initialized in hardware drivers. In fact it should be done
instead of PCIe interface usage, since NTB clients are supposed to
use NTB API and left unaware of real hardware implementation.
In addition to that ntb_device_register() method shouldn't clear
the passed ntb_dev structure, since it dma_mask is initialized
by hardware drivers.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Serge Semin b87ab21935 NTB: Rename NTB messaging API methods
There is a common methods signature form used over all the NTB API
like functions naming scheme, arguments names and order, etc.
Recently added NTB messaging API IO callbacks were named a bit
different so should be renamed to be in compliance with the rest
of the API.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Arnd Bergmann c6fad21a8d ntb_hw_switchtec: fix logic error
Newer gcc (version 7 and 8 presumably) warn about a statement mixing
the << operator with logical and:

drivers/ntb/hw/mscc/ntb_hw_switchtec.c: In function 'switchtec_ntb_init_sndev':
drivers/ntb/hw/mscc/ntb_hw_switchtec.c:888:24: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]

My interpretation here is that the author must have intended a bitmask
rather than a comparison, so I'm changing the '&&' to '&', which makes
a lot more sense in the context.

Fixes: 1b249475275d ("ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups")
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe 1e2fd202f8 ntb_hw_switchtec: Check for alignment of the buffer in mw_set_trans()
With Switchtec hardware, the buffer used for a memory window must be
aligned to its size (the hardware only replaces the lower bits). In
certain circumstances dma_alloc_coherent() will not provide a buffer
that adheres to this requirement like when using the CMA and
CONFIG_CMA_ALIGNMENT is set lower than the buffer size.

When we get an unaligned buffer mw_set_trans() should return an error.
We also log an error so we know the cause of the problem.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe cbd27448fa ntb_transport: Fix bug with max_mw_size parameter
When using the max_mw_size parameter of ntb_transport to limit the size of
the Memory windows, communication cannot be established and the queues
freeze.

This is because the mw_size that's reported to the peer is correctly
limited but the size used locally is not. So the MW is initialized
with a buffer smaller than the window but the TX side is using the
full window. This means the TX side will be writing to a region of the
window that points nowhere.

This is easily fixed by applying the same limit to tx_size in
ntb_transport_init_queue().

Fixes: e26a5843f7 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Allen Hubbe c3840c7c3b MAINTAINERS: NTB: Update contact info
I am no longer employed by Dell EMC.  For the purposes of NTB driver
development and maintenance, please contact me via my personal email.

Signed-off-by: Allen Hubbe <allenbh@gmail.com>
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe d04be142b8 ntb_hw_switchtec: Force down the link before initializing
If one host crashes and soft reboots, the other host may not see a
link down event. Then when the crashed host comes back up, the
surviving host may not know the link was reset and the NTB clients
may not work without being reset.

To solve this, we send a LINK_FORCE_DOWN message to each peer every
time we come up, before we register the NTB device. If a surviving
host still thinks the link is up it will take it down immediately.
In this way, once the crashed host comes up fully, it will send a
regular link up event as per usual and the link will be properly
restarted.

While we are in the area, this also fixes the MSG_LINK_UP message that
was in the link down function that was reported by Doug Meyers.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reported-by: ThanhTuThai <cruisethai@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe 270d32e63c ntb_hw_switchtec: Crosslink doorbells and messages
In a crosslink configuration doorbells and messages largely work the
same but the NTB registers must be accessed through the reserved LUT
window. Also, as a bonus, seeing there are now two independent sets of
NTB links, both partitions can actually use all 60 doorbell registers
instead of them having to be split into two for each partition.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe 0175250182 ntb_hw_switchtec: Add initialization code for crosslink
Crosslink is a feature of the Switchtec switches that is similar to
the B2B mode of other NTB devices. It allows a system to be designed
that is perfectly symmetric with two identical switches that link
two hosts together.

In order for the system to be symmetric, there is an empty host-less
partition between the two switches which the host must enumerate and
assign BAR addresses to. The firmware in the switch manages this
specially so that the BAR addresses on both sides of the empty
partition will be identical despite being in the same partition with
the same address space.

The driver determines whether crosslink is enabled by a flag set in
the NTB partition info registers which are set by the switch's
configuration file.

When crosslink is enabled, a reserved LUT window is setup to point to
the peer's switch's NTB registers and the local MWs are set to forward
to the host-less partition's BARs. (Yes, this hurts my brain too.)
Once this is setup, largely the same NTB infrastructure is used to
communicate between the two hosts.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe 45f447deb2 ntb_hw_switchtec: Expand PFF CSR registers
The PFF CSR registers actual mirrors the PCI configuration space
for all the ports in the switch. Previously, this was not needed by
the driver but will be used by the crosslink code to enumerate the
bus in an host-less centre partition.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe bbe35ca5aa ntb_hw_switchtec: Make switchtec_ntb_init_req_id_table() more general
This is a prep patch in order to support the crosslink feature which
will require the driver to setup the requester ID table in another
partition as well as it's own. To aid this, create a helper function
which sets up the requester IDs from an array.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe 12cb203b1b ntb_hw_switchtec: Create helper function to setup reserved LUT MWs
This is a prep patch in order to support the crosslink feature which
will require the driver to use another reserved LUT window. To
simplify this we move the code which sets up the reserved LUT window
into a helper function which will be used by the crosslink
initialization.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Logan Gunthorpe c3585cd870 ntb_hw_switchtec: Keep track of the number of LUT windows used by the driver
This is a prep patch in order to support the crosslink feature which will
require the driver to use another reserved LUT window. To simplify this,
we add some code to track the number of reserved LUT windows in use
instead of assuming this is always 1.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Kelvin Cao 3df54c870f ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups
Allow using Switchtec NTB in setups that have more than two partitions.
Note: this does not enable having multi-host communication, it only
allows for a single NTB link between two hosts in a network that might
have more than two.

Use following logic to determine the NT peer partition:

1) If there are 2 partitions, and the target vector is set in
   the Switchtec configuration, use the partition specified in target
   vector.
2) If there are 2 partitions and target vector is unset
   use the only other partition as specified in the NT EP map.
3) If there are more than 2 partitions and target vector is set
   use the other partition specified in target vector.
4) If there are more than 2 partitions and target vector is unset,
   this is invalid and report an error.

Signed-off-by: Kelvin Cao <kelvin.cao@microsemi.com>
[logang@deltatee.com: commit message fleshed out]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Jon Mason 2dd0f6a64a NTB: switchtec_ntb: Add new line on appropriate printks
Trivial addition of "\n" to the dev_* prints where necessary

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Colin Ian King c5ec8b451a NTB: switchtec_ntb: fix spelling mistake: "peforming" -> "performing"
Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-By: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Dave Jiang 3f7756728e ntb: remove Intel Atom NTB driver support
Removing dead code since this is not being used.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Greg Kroah-Hartman 0ed08f829b ntb: remove unneeded DRIVER_LICENSE #defines
There is no need to #define the license of the driver, just put it in
the MODULE_LICENSE() line directly as a text string.

This allows tools that check that the module license matches the source
code license to work properly, as there is no need to unwind the
unneeded dereference, especially when the string is defined just a few
lines above the usage of it.

Reported-and-reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Allen Hubbe <Allen.Hubbe@emc.com>
Cc: Gary R Hook <gary.hook@amd.com>
Cc: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:23 -05:00
Doug Meyer 140eb52277 NTB: ntb_hw_switchtec: Fix peer BAR bug in switchtec_ntb_init_shared_mw
This resolves a bug which may incorrectly configure the peer host's
LUT for shared memory window access. The code was using the local
host's first BAR number, rather than the peer hosts's first BAR
number, to determine what peer NT control register to program.

The bug will cause the Switchtec NTB link to work only if both peers
have the same first NTB BAR configured. In all other configurations,
the link will not come up, failing silently.

When both hosts have the same first BAR, the configuration works only
because the first BAR numbers happent to be the same. When the hosts
do not have the same first BAR, then the LUT translation will not be
configured in the correct peer LUT and will not give the peer the
shared memory window access required for the link to operate.

Signed-off-by: Doug Meyer <dmeyer@gigaio.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Fixes: 678784a44ae8 ("NTB: switchtec_ntb: Initialize hardware for memory windows")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28 22:17:22 -05:00
Linus Torvalds d8a5b80568 Linux 4.15 2018-01-28 13:20:33 -08:00
Linus Torvalds 24b1cccf92 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 retpoline fixlet from Thomas Gleixner:
 "Remove the ESP/RSP thunks for retpoline as they cannot ever work.

  Get rid of them before they show up in a release"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Remove the esp/rsp thunk
2018-01-28 12:24:36 -08:00
Linus Torvalds 32c6cdf75c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for 4.15:

   - Fix vmapped stack synchronization on systems with 4-level paging
     and a large amount of memory caused by a missing 5-level folding
     which made the pgd synchronization logic to fail and causing double
     faults.

   - Add a missing sanity check in the vmalloc_fault() logic on 5-level
     paging systems.

   - Bring back protection against accessing a freed initrd in the
     microcode loader which was lost by a wrong merge conflict
     resolution.

   - Extend the Broadwell micro code loading sanity check.

   - Add a missing ENDPROC annotation in ftrace assembly code which
     makes ORC unhappy.

   - Prevent loading the AMD power module on !AMD platforms. The load
     itself is uncritical, but an unload attempt results in a kernel
     crash.

   - Update Peter Anvins role in the MAINTAINERS file"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ftrace: Add one more ENDPROC annotation
  x86: Mark hpa as a "Designated Reviewer" for the time being
  x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
  x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
  x86/microcode: Fix again accessing initrd after having been freed
  x86/microcode/intel: Extend BDW late-loading further with LLC size check
  perf/x86/amd/power: Do not load AMD power module on !AMD platforms
2018-01-28 12:19:23 -08:00
Linus Torvalds 07b0137c02 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix for a ~10 years old problem which causes high resolution
  timers to stop after a CPU unplug/plug cycle due to a stale flag in
  the per CPU hrtimer base struct.

  Paul McKenney was hunting this for about a year, but the heisenbug
  nature made it resistant against debug attempts for quite some time"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Reset hrtimer cpu base proper on CPU hotplug
2018-01-28 12:17:35 -08:00
Linus Torvalds 624441927f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single bug fix to prevent a subtle deadlock in the scheduler core
  code vs cpu hotplug"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix cpu.max vs. cpuhotplug deadlock
2018-01-28 11:51:45 -08:00
Linus Torvalds 39e383626c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Four patches which all address lock inversions and deadlocks in the
  perf core code and the Intel debug store"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix perf,x86,cpuhp deadlock
  perf/core: Fix ctx::mutex deadlock
  perf/core: Fix another perf,trace,cpuhp lock inversion
  perf/core: Fix lock inversion between perf,trace,cpuhp
2018-01-28 11:48:25 -08:00
Linus Torvalds 8c76e31a6a Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "Two final locking fixes for 4.15:

   - Repair the OWNER_DIED logic in the futex code which got wreckaged
     with the recent fix for a subtle race condition.

   - Prevent the hard lockup detector from triggering when dumping all
     held locks in the system"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
  futex: Fix OWNER_DEAD fixup
2018-01-28 11:20:35 -08:00
Josh Poimboeuf dd085168a7 x86/ftrace: Add one more ENDPROC annotation
When ORC support was added for the ftrace_64.S code, an ENDPROC
for function_hook() was missed. This results in the following warning:

  arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction

Fixes: e2ac83d74a ("x86/ftrace: Fix ORC unwinding from ftrace handlers")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
2018-01-28 09:19:12 +01:00
Thomas Gleixner d5421ea43d hrtimer: Reset hrtimer cpu base proper on CPU hotplug
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
2018-01-27 15:12:22 +01:00
H. Peter Anvin 8a95b74d50 x86: Mark hpa as a "Designated Reviewer" for the time being
Due to some unfortunate events, I have not been directly involved in
the x86 kernel patch flow for a while now.  I have also not been able
to ramp back up by now like I had hoped to, and after reviewing what I
will need to work on both internally at Intel and elsewhere in the near
term, it is clear that I am not going to be able to ramp back up until
late 2018 at the very earliest.

It is not acceptable to not recognize that this load is currently
taken by Ingo and Thomas without my direct participation, so I mark
myself as R: (designated reviewer) rather than M: (maintainer) until
further notice.  This is in fact recognizing the de facto situation
for the past few years.

I have obviously no intention of going away, and I will do everything
within my power to improve Linux on x86 and x86 for Linux.  This,
however, puts credit where it is due and reflects a change of focus.

This patch also removes stale entries for portions of the x86
architecture which have not been maintained separately from arch/x86
for a long time.  If there is a reason to re-introduce them then that
can happen later.

Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-27 10:11:00 +01:00
Linus Torvalds c4e0ca7fa2 RISC-V: We have a new mailing list and git repo!
Sorry to send something essentially as late as possible (Friday after an
 rc9), but we managed to get a mailing list for the RISC-V Linux port.
 We've been using patches@groups.riscv.org for a while, but that list has
 some problems (it's Google Groups and it's shared over all RISC-V
 software projects).  The new infaread.org list is much better.   We just
 got it on Wednesday but I used it a bit on Thursday to shake out all the
 configuration problems and it appears to be in working order.
 
 When I updated the mailing list I noticed that the MAINTAINERS file was
 pointing to our github repo, but now that we have a kernel.org repo I'd
 like to point to that instead so I changed that as well.  We'll be
 centralizing all RISC-V Linux related development here as that seems to
 be the saner way to go about it.
 
 I can understand if it's too late to get this into 4.15, but given that
 it's not a code change I was hoping it'd still be OK.  It would be nice
 to have the new mailing list and git repo in the release tarballs so
 when people start to find bugs they'll get to the right place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlprU08THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQXWGD/967RCJl2NVTq/THdKs371+tpv+8HG7
 hH6ihV8qyCwdiiOR1sYqOklWV9TP/Ci++Y3P27doO4HE7QkUPRy9J4WqvXcKxcvn
 0W8vFyJHIEmZjuuJzaNHOk1OxJTaleH0t4uHwaCK9KJQax5S6NcFd5YG9gNn+26s
 mxwKVQbI/NYvA5Uzc288paLiBBGHH8Ja83SrWTsYrp3PXSlealRj1WIMKxZXL4cp
 Wl9LixD/EY2NPUkYQ7MB63Lc3ym/0r//ij5Lt1+jmZt2fo/8uiuM1pPSFtlVEmYi
 8XUPKztwEq8PGUWJPLpSHy6XrfBMH18vRPQ4GCWEwfLlmCd9v5WCq1HdvzdcYMQD
 g/eg4eZtn4sv2JM7Mjh7saFLbRD8ej0DMJ4bPA0YoRLjZ9DdUqI8UvIV6UL58BDf
 px0HWzcN/xDke32aJ83eg6t3wAipGzcvRIAE++2mBfJgl4cOZGa2eX23bH7b/kPR
 gotRiIeV1vXnuNswGzwzLOVHQ4nSgFaT+G6Lb8OdK3WyK5gKBYoWygo36FMK6WkE
 qxbJqoylRhnmYAtxklvNN753pXvFcPGWg5yDPdmx3bRbGjdryFXmUqe4dv8/EmjC
 a0R061lFvhr5wx2lNNuRrNgQx/AAypsaqwttxmezQHyHQIYvgeG91JXCbEF1TP+Y
 Ve57nAXxqHIVGQ==
 =zdWp
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V update from Palmer Dabbelt:
 "RISC-V: We have a new mailing list and git repo!

  Sorry to send something essentially as late as possible (Friday after
  an rc9), but we managed to get a mailing list for the RISC-V Linux
  port. We've been using patches@groups.riscv.org for a while, but that
  list has some problems (it's Google Groups and it's shared over all
  RISC-V software projects). The new infaread.org list is much better.
  We just got it on Wednesday but I used it a bit on Thursday to shake
  out all the configuration problems and it appears to be in working
  order.

  When I updated the mailing list I noticed that the MAINTAINERS file
  was pointing to our github repo, but now that we have a kernel.org
  repo I'd like to point to that instead so I changed that as well.
  We'll be centralizing all RISC-V Linux related development here as
  that seems to be the saner way to go about it.

  I can understand if it's too late to get this into 4.15, but given
  that it's not a code change I was hoping it'd still be OK. It would be
  nice to have the new mailing list and git repo in the release tarballs
  so when people start to find bugs they'll get to the right place"

* tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  Update the RISC-V MAINTAINERS file
2018-01-26 15:10:50 -08:00
Linus Torvalds ba804bb4b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) The per-network-namespace loopback device, and thus its namespace,
    can have its teardown deferred for a long time if a kernel created
    TCP socket closes and the namespace is exiting meanwhile. The kernel
    keeps trying to finish the close sequence until it times out (which
    takes quite some time).

    Fix this by forcing the socket closed in this situation, from Dan
    Streetman.

 2) Fix regression where we're trying to invoke the update_pmtu method
    on route types (in this case metadata tunnel routes) that don't
    implement the dst_ops method. Fix from Nicolas Dichtel.

 3) Fix long standing memory corruption issues in r8169 driver by
    performing the chip statistics DMA programming more correctly. From
    Francois Romieu.

 4) Handle local broadcast sends over VRF routes properly, from David
    Ahern.

 5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket
    can never be released. From Alexey Kodanev.

 6) Set poll flags properly in VSOCK protocol layer, from Stefan
    Hajnoczi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
  dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
  net: vrf: Add support for sends to local broadcast address
  r8169: fix memory corruption on retrieval of hardware statistics.
  net: don't call update_pmtu unconditionally
  net: tcp: close sock if net namespace is exiting
2018-01-26 09:03:16 -08:00
Linus Torvalds db218549e6 vc4 and nouveau fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaawnZAAoJEAx081l5xIa+HNcP/RiVEwwAfSu0MzMosYvf2P53
 ekyWyHobwBzuGo5xjG0XvVbexXiad1D/JXCiNjoNPS65O/sbCG4nfh/w1mGlrJwi
 QXOnx0CDBwReITbzwMfyJl+gcsHnnKt2jPK5RJUqNy/0fEQKNNhnTsRL/rAzkvDc
 8Bc06blwXoCmPMAUsju3htzxCOKS+AgIqYH8qEDcZ+6aOA2f2/LU/hnpYdhl//CE
 IOujzKIoJhmNqvndAb7conik+PiKzlq3GEpx966QMZajnu6LKH8iFoFt5M7Jg6cD
 vcTEzLGZCIimb5wOqeLq3t9rgS05oScNKRryCxOGB9nTrgwhqAgRUQH3MCUVx+GZ
 OybTr3QmS/Oq7a0XjB8LU2M86zR192Kvl5xzmUgT9bhbPdvzR65e6C/I/02+75BY
 2FXrn1nGTFXApGPGKjmUo2hyRsSyVudfD6f4JUJX5rlbXiwyv2kfZv/2pCjnLYZt
 sAlawMKp+rv628Tx9rPD/dWvMR5Ftrqp55b4eNEZPnsqNMjlEvZjgy+fHJ4VPIII
 x9TJYTMlHqjy/tpWWn21qzMbRST1bNB1AaQnLNY10DaRkelEN2lPeNrG2xzC7+YG
 8Y/p3Tezmu15dlIx4KUcC+aFUntDS5mdzJuOgnc970DkKTMuPqMMSX433McCNKFG
 h3AaLo0IGQh+Dz2z1H0A
 =r4NK
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A fairly urgent nouveau regression fix for broken irqs across
  suspend/resume came in. This was broken before but a patch in 4.15 has
  made it much more obviously broken and now s/r fails a lot more often.

  The fix removes freeing the irq across s/r which never should have
  been done anyways.

  Also two vc4 fixes for a NULL deference and some misrendering /
  flickering on screen"

* tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: Move irq setup/teardown to pci ctor/dtor
  drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
  drm/vc4: Flush the caches before the bin jobs, as well.
2018-01-26 08:59:57 -08:00
Stefan Hajnoczi ba3169fc75 VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
select(2) with wfds but no rfds must return when the socket is shut down
by the peer.  This way userspace notices socket activity and gets -EPIPE
from the next write(2).

Currently select(2) does not return for virtio-vsock when a SEND+RCV
shutdown packet is received.  This is because vsock_poll() only sets
POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
socket is in when the shutdown is received.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:16:27 -05:00
Alexey Kodanev dd5684ecae dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9dabaf ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 2a91aa3967 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:15:00 -05:00
Palmer Dabbelt 6572cc2bf2
Update the RISC-V MAINTAINERS file
Now that we're upstream in Linux we've been able to make some
infrastructure changes so our port works a bit more like other ports.
Specifically:

* We now have a mailing list specific to the RISC-V Linux port, hosted
  at lists.infreadead.org.
* We now have a kernel.org git tree where work on our port is
  coordinated.

This patch changes the RISC-V maintainers entry to reflect these new
bits of infrastructure.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-01-26 08:01:24 -08:00
Andy Lutomirski 36b3a77268 x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
On a 5-level kernel, if a non-init mm has a top-level entry, it needs to
match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON()
that would have checked it.

While we're at it, get rid of the rather confusing 4-level folded "pgd"
logic.

Cleans-up: b50858ce3e ("x86/mm/vmalloc: Add 5-level paging support")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Neil Berrington <neil.berrington@datacore.com>
Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
2018-01-26 15:56:23 +01:00
Andy Lutomirski 5beda7d54e x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses
large amounts of vmalloc space with PTI enabled.

The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd
folding code into account, so, on a 4-level kernel, the pgd synchronization
logic compiles away to exactly nothing.

Interestingly, the problem doesn't trigger with nopti.  I assume this is
because the kernel is mapped with global pages if we boot with nopti.  The
sequence of operations when we create a new task is that we first load its
mm while still running on the old stack (which crashes if the old stack is
unmapped in the new mm unless the TLB saves us), then we call
prepare_switch_to(), and then we switch to the new stack.
prepare_switch_to() pokes the new stack directly, which will populate the
mapping through vmalloc_fault().  I assume that we're getting lucky on
non-PTI systems -- the old stack's TLB entry stays alive long enough to
make it all the way through prepare_switch_to() and switch_to() so that we
make it to a valid stack.

Fixes: b50858ce3e ("x86/mm/vmalloc: Add 5-level paging support")
Reported-and-tested-by: Neil Berrington <neil.berrington@datacore.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
2018-01-26 15:56:23 +01:00