Commit graph

4761 commits

Author SHA1 Message Date
Sebastian Ott f9dc447ec8 s390/pci: fmb enhancements
Implement the function type specific function measurement block used
in new machines.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:16:40 +02:00
Jan Höppner 8fd575200d s390/dasd: Add new ioctl BIODASDCHECKFMT
Implement new DASD IOCTL BIODASDCHECKFMT to check a range of tracks on a
DASD volume for correct formatting. The following characteristics are
checked:
- Block size
- ECKD key length
- ECKD record ID
- Number of records per track

Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:16:39 +02:00
Sebastian Ott 368704a65b s390/pci: add report_error attribute
Provide an report_error attribute to send an adapter-error
notification associated with a PCI function.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:16:39 +02:00
Sebastian Ott 12283a4035 s390/sclp: add error notification command
Add SCLP event 24 "Adapter-error notification".

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:16:38 +02:00
Peter Zijlstra 0227f7c42d s390: Clarify pagefault interrupt
While looking at set_task_state() users I stumbled over the s390 pfault
interrupt code.  Since Heiko provided a great explanation on how it
worked, I figured we ought to preserve this.

Also make a few little tweaks to the code to aid in readability and
explicitly comment the unusual blocking scheme.

Based-on-text-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:16:37 +02:00
Heiko Carstens 2fd9227364 s390: add CPU_BIG_ENDIAN config option
Make sure that s390 appears to be a big endian machine by defining
this config option.

Without this s390 appears to be little endian as seen by e.g. the
recordmount script: "perl ./scripts/recordmcount.pl "s390" "little"
"64""
This has no practical impact within the script since the endian
variable is only evaluated for mips. However there are already a
couple of common code places which evaluate this config option. None
of them is relevant for s390 currently though.

To avoid any issues in the future (and fix the recordmcount oddity)
add the new config option.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:01:52 +02:00
Heiko Carstens 8497695243 s390/spinlock: avoid yield to non existent cpu
arch_spin_lock_wait_flags() checks if a spinlock is not held before
trying a compare and swap instruction. If the lock is unlocked it
tries the compare and swap instruction, however if a different cpu
grabbed the lock in the meantime the instruction will fail as
expected.

Subsequently the arch_spin_lock_wait_flags() incorrectly tries to
figure out if the cpu that holds the lock is running. However it is
using the wrong cpu number for this (-1) and then will also yield the
current cpu to the wrong cpu.

Fix this by adding a missing continue statement.

Fixes: 470ada6b1a ("s390/spinlock: refactor arch_spin_lock_wait[_flags]")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15 18:01:48 +02:00
Michal Hocko 4edab14ec6 locking/rwsem, s390: Provide __down_write_killable()
Introduce ___down_write() for the fast path and reuse it for __down_write()
resp. __down_write_killable() each using the respective generic slow path
(rwsem_down_write_failed() resp. rwsem_down_write_failed_killable()).

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-10-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:22 +02:00
Michal Hocko f8e04d8545 locking/rwsem: Get rid of __down_write_nested()
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:16 +02:00
Linus Torvalds 541d8f4d59 Miscellaneous bugfixes. ARM and s390 are new from the merge window,
others are usual stable material.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXA8x6AAoJEL/70l94x66D0x8H/RcBnc75994RQ++WmHSvD9GF
 yruGB8soLDdjX+Oceol0aEPHokrBu3JtcdoTBe0GwbCKV/F5NkQZ4EfLxDtR3tte
 7ILkPULLy5GElFpJNQuT4pmXzTEspFvXpqHhFik7WVBga3W9wMFQcjbrgmGBUzLE
 p2aJVhZyErpKxGFkUYWhDnlqWsguTTIzv/pqNhLY4VVc0UrXN9AA0fq9RkvgU3KS
 Hxk4/A6SV/b7dyzvttzITww0f1iu8FmlLj2TXapIEoOz7AnInD6KIN0RYpxbDjxN
 bEzEfpahUtuDeM87/t2kHEj0Gn09iHK7/BbCC1Hrwo1CQhbAQ/D0GIvqYAQixf4=
 =NugZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Miscellaneous bugfixes.

  The ARM and s390 fixes are for new regressions from the merge window,
  others are usual stable material"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  compiler-gcc: disable -ftracer for __noclone functions
  kvm: x86: make lapic hrtimer pinned
  s390/mm/kvm: fix mis-merge in gmap handling
  kvm: set page dirty only if page has been writable
  KVM: x86: reduce default value of halt_poll_ns parameter
  KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
  KVM: x86: Inject pending interrupt even if pending nmi exist
  arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
2016-04-05 16:16:00 -07:00
Christian Borntraeger 9c650d09a9 s390/mm/kvm: fix mis-merge in gmap handling
commit 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c") dropped
some changes from commit a3a92c31bf ("KVM: s390: fix mismatch
between user and in-kernel guest limit") - this breaks KVM for some
memory sizes (kvm-s390: failed to commit memory region) like
exactly 2GB.

Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 14:19:07 +02:00
Linus Torvalds 4a2d057e4f Merge branch 'PAGE_CACHE_SIZE-removal'
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
 "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
  ago with promise that one day it will be possible to implement page
  cache with bigger chunks than PAGE_SIZE.

  This promise never materialized.  And unlikely will.

  Let's stop pretending that pages in page cache are special.  They are
  not.

  The first patch with most changes has been done with coccinelle.  The
  second is manual fixups on top.

  The third patch removes macros definition"

[ I was planning to apply this just before rc2, but then I spaced out,
  so here it is right _after_ rc2 instead.

  As Kirill suggested as a possibility, I could have decided to only
  merge the first two patches, and leave the old interfaces for
  compatibility, but I'd rather get it all done and any out-of-tree
  modules and patches can trivially do the converstion while still also
  working with older kernels, so there is little reason to try to
  maintain the redundant legacy model.    - Linus ]

* PAGE_CACHE_SIZE-removal:
  mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
  mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
  mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
2016-04-04 10:50:24 -07:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Sudip Mukherjee 4f375903cc s390/seccomp: include generic seccomp header file
Fixes this build error on linux-next:

kernel/seccomp.c: In function '__secure_computing_strict':
kernel/seccomp.c:526:3: error: implicit declaration of function
				'get_compat_mode1_syscalls'

The retrieval of compat syscall numbers were moved into inline function
defined in asm-generic header but the asm-generic header is not being
used by s390.

[heiko.carstens@de.ibm.com]: even though the build error will trigger
only in the next merge window it makes sense to include the generic
header file already now.

Fixes: ("seccomp: Get compat syscalls from asm-generic header")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-01 17:20:55 +02:00
Sebastian Ott 9d89d9e61d s390/pci: add extra padding to function measurement block
Newer machines might use a different (larger) format for function
measurement blocks. To ensure that we comply with the alignment
requirement on these machines and prevent memory corruption (when
firmware writes more data than we expect) add 16 padding bytes
at the end of the fmb.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-01 17:20:55 +02:00
Jessica Yu 425595a7fc livepatch: reuse module loader code to write relocations
Reuse module loader code to write relocations, thereby eliminating the need
for architecture specific relocation code in livepatch. Specifically, reuse
the apply_relocate_add() function in the module loader to write relocations
instead of duplicating functionality in livepatch's arch-dependent
klp_write_module_reloc() function.

In order to accomplish this, livepatch modules manage their own relocation
sections (marked with the SHF_RELA_LIVEPATCH section flag) and
livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
index). To apply livepatch relocation sections, livepatch symbols
referenced by relocs are resolved and then apply_relocate_add() is called
to apply those relocations.

In addition, remove x86 livepatch relocation code and the s390
klp_write_module_reloc() function stub. They are no longer needed since
relocation work has been offloaded to module loader.

Lastly, mark the module as a livepatch module so that the module loader
canappropriately identify and initialize it.

Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>   # for s390 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-01 15:00:11 +02:00
Jessica Yu f31e0960f3 module: s390: keep mod_arch_specific for livepatch modules
Livepatch needs to utilize the symbol information contained in the
mod_arch_specific struct in order to be able to call the s390
apply_relocate_add() function to apply relocations. Keep a reference to
syminfo if the module is a livepatch module. Remove the redundant vfree()
in module_finalize() since module_arch_freeing_init() (which also frees
those structures) is called in do_init_module(). If the module isn't a
livepatch module, we free the structures in module_arch_freeing_init() as
usual.

Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-01 15:00:11 +02:00
Linus Torvalds dc8a64ee1a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 - A proper fix for the locking issue in the dasd driver
 - Wire up the new preadv2 nad pwritev2 system calls
 - Add the mark_rodata_ro function and set DEBUG_RODATA=y
 - A few more bug fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: wire up preadv2/pwritev2 syscalls
  s390/pci: PCI function group 0 is valid for clp_query_pci_fn
  s390/crypto: provide correct file mode at device register.
  s390/mm: handle PTE-mapped tail pages in fast gup
  s390: add DEBUG_RODATA support
  s390: disable postinit-readonly for now
  s390/dasd: reorder lcu and device lock
  s390/cpum_sf: Fix cpu hotplug notifier transitions
  s390/cpum_cf: Fix missing cpu hotplug notifier transition
2016-04-01 07:15:54 -05:00
Heiko Carstens 3358999a8e s390: wire up preadv2/pwritev2 syscalls
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-01 08:42:38 +02:00
Pierre Morel aa624886b6 s390/pci: PCI function group 0 is valid for clp_query_pci_fn
The PCI function group 0 is a valid function group,
it is wrong to reject it.

Let's accept PCI function group 0.

Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-01 08:42:35 +02:00
Harald Freudenberger 74b2375e67 s390/crypto: provide correct file mode at device register.
When the prng device driver calls misc_register() there is the possibility
to also provide the recommented file permissions. This fix now gives
useful values (0644) where previously just the default was used (resulting
in 0600 for the device file).

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-29 10:55:12 +02:00
Alexander Potapenko be7635e728 arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Ard Biesheuvel c352e8b6de s390/extable: use generic search and sort routines
Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Linus Torvalds 643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Linus Torvalds f0691533b7 virtio/vhost: new features, performance improvements, cleanups
This adds basic polling support for vhost.
 Reworks virtio to optionally use DMA API, fixing it on Xen.
 Balloon stats gained a new entry.
 Using the new napi_alloc_skb speeds up virtio net.
 virtio blk stats can now be read while another VCPU
 us busy inflating or deflating the balloon.
 Plus misc cleanups in various places.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW7qRJAAoJECgfDbjSjVRpVNoH/A7z+lZ6nooSJ9fUBtAAlwit
 mE1VKi8g0G6naV1NVLFVe7hPAejExGiHfR3ZUrVoenJKj2yeW/DFojFC10YR/KTe
 ac7Imuc+owA3UOE/QpeGBs59+EEWKTZUYt6r8HSJVwoodeosw9v2ecP/Iwhbax8H
 a4V3HqOADjKnHg73R9o3u+bAgA1GrGYHeK0AfhCBSTNwlPdxkvf0463HgfOpM4nl
 /sNoFWO3vOyekk+loIk+jpmWVIoIfG2NFzW4lPwEPkfqUBX7r0ei/NR23hIqHL7r
 QZ6vMj1Ew9qctUONbJu4kXjuV2Vk9NhxwbDjoJtm8plKL2hz2prJynUEogkHh2g=
 =VMD0
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost updates from Michael Tsirkin:
 "New features, performance improvements, cleanups:

   - basic polling support for vhost
   - rework virtio to optionally use DMA API, fixing it on Xen
   - balloon stats gained a new entry
   - using the new napi_alloc_skb speeds up virtio net
   - virtio blk stats can now be read while another VCPU is busy
     inflating or deflating the balloon

  plus misc cleanups in various places"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
  vhost_net: basic polling support
  vhost: introduce vhost_vq_avail_empty()
  vhost: introduce vhost_has_work()
  virtio_balloon: Allow to resize and update the balloon stats in parallel
  virtio_balloon: Use a workqueue instead of "vballoon" kthread
  virtio/s390: size of SET_IND payload
  virtio/s390: use dev_to_virtio
  vhost: rename vhost_init_used()
  vhost: rename cross-endian helpers
  virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
  vring: Use the DMA API on Xen
  virtio_pci: Use the DMA API if enabled
  virtio_mmio: Use the DMA API if enabled
  virtio: Add improved queue allocation API
  virtio_ring: Support DMA APIs
  vring: Introduce vring_use_dma_api()
  s390/dma: Allow per device dma ops
  alpha/dma: use common noop dma ops
  dma: Provide simple noop dma ops
2016-03-20 13:28:18 -07:00
Linus Torvalds 1200b6809d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support more Realtek wireless chips, from Jes Sorenson.

   2) New BPF types for per-cpu hash and arrap maps, from Alexei
      Starovoitov.

   3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

   4) Allow the use of SO_REUSEPORT in order to do per-thread processing
   of incoming TCP/UDP connections.  The muxing can be done using a
   BPF program which hashes the incoming packet.  From Craig Gallek.

   5) Add a multiplexer for TCP streams, to provide a messaged based
      interface.  BPF programs can be used to determine the message
      boundaries.  From Tom Herbert.

   6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

   7) Avoid factorial complexity when taking down an inetdev interface
      with lots of configured addresses.  We were doing things like
      traversing the entire address less for each address removed, and
      flushing the entire netfilter conntrack table for every address as
      well.

   8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

   9) Allow offloading u32 classifiers to hardware, and implement for
      ixgbe, from John Fastabend.

  10) Allow configuring IRQ coalescing parameters on a per-queue basis,
      from Kan Liang.

  11) Extend ethtool so that larger link mode masks can be supported.
      From David Decotigny.

  12) Introduce devlink, which can be used to configure port link types
      (ethernet vs Infiniband, etc.), port splitting, and switch device
      level attributes as a whole.  From Jiri Pirko.

  13) Hardware offload support for flower classifiers, from Amir Vadai.

  14) Add "Local Checksum Offload".  Basically, for a tunneled packet
      the checksum of the outer header is 'constant' (because with the
      checksum field filled into the inner protocol header, the payload
      of the outer frame checksums to 'zero'), and we can take advantage
      of that in various ways.  From Edward Cree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-19 10:05:34 -07:00
Linus Torvalds 814a2bf957 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - a couple of hotfixes

 - the rest of MM

 - a new timer slack control in procfs

 - a couple of procfs fixes

 - a few misc things

 - some printk tweaks

 - lib/ updates, notably to radix-tree.

 - add my and Nick Piggin's old userspace radix-tree test harness to
   tools/testing/radix-tree/.  Matthew said it was a godsend during the
   radix-tree work he did.

 - a few code-size improvements, switching to __always_inline where gcc
   screwed up.

 - partially implement character sets in sscanf

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  sscanf: implement basic character sets
  lib/bug.c: use common WARN helper
  param: convert some "on"/"off" users to strtobool
  lib: add "on"/"off" support to kstrtobool
  lib: update single-char callers of strtobool()
  lib: move strtobool() to kstrtobool()
  include/linux/unaligned: force inlining of byteswap operations
  include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
  include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
  usb: common: convert to use match_string() helper
  ide: hpt366: convert to use match_string() helper
  ata: hpt366: convert to use match_string() helper
  power: ab8500: convert to use match_string() helper
  power: charger_manager: convert to use match_string() helper
  drm/edid: convert to use match_string() helper
  pinctrl: convert to use match_string() helper
  device property: convert to use match_string() helper
  lib/string: introduce match_string() helper
  radix-tree tests: add test for radix_tree_iter_next
  radix-tree tests: add regression3 test
  ...
2016-03-18 19:26:54 -07:00
Linus Torvalds 0f49fc95b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching update from Jiri Kosina:

 - cleanup of module notifiers; this depends on a module.c cleanup which
   has been acked by Rusty; from Jessica Yu

 - small assorted fixes and MAINTAINERS update

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch/module: remove livepatch module notifier
  modules: split part of complete_formation() into prepare_coming_module()
  livepatch: Update maintainers
  livepatch: Fix the error message about unresolvable ambiguity
  klp: remove CONFIG_LIVEPATCH dependency from klp headers
  klp: remove superfluous errors in asm/livepatch.h
2016-03-17 21:46:32 -07:00
Kees Cook 4cc7ecb7f2 param: convert some "on"/"off" users to strtobool
This changes several users of manual "on"/"off" parsing to use
strtobool.

Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Linus Torvalds 70477371dc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.6:

  API:
   - Convert remaining crypto_hash users to shash or ahash, also convert
     blkcipher/ablkcipher users to skcipher.
   - Remove crypto_hash interface.
   - Remove crypto_pcomp interface.
   - Add crypto engine for async cipher drivers.
   - Add akcipher documentation.
   - Add skcipher documentation.

  Algorithms:
   - Rename crypto/crc32 to avoid name clash with lib/crc32.
   - Fix bug in keywrap where we zero the wrong pointer.

  Drivers:
   - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
   - Add PIC32 hwrng driver.
   - Support BCM6368 in bcm63xx hwrng driver.
   - Pack structs for 32-bit compat users in qat.
   - Use crypto engine in omap-aes.
   - Add support for sama5d2x SoCs in atmel-sha.
   - Make atmel-sha available again.
   - Make sahara hashing available again.
   - Make ccp hashing available again.
   - Make sha1-mb available again.
   - Add support for multiple devices in ccp.
   - Improve DMA performance in caam.
   - Add hashing support to rockchip"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: qat - remove redundant arbiter configuration
  crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
  crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
  crypto: qat - Change the definition of icp_qat_uof_regtype
  hwrng: exynos - use __maybe_unused to hide pm functions
  crypto: ccp - Add abstraction for device-specific calls
  crypto: ccp - CCP versioning support
  crypto: ccp - Support for multiple CCPs
  crypto: ccp - Remove check for x86 family and model
  crypto: ccp - memset request context to zero during import
  lib/mpi: use "static inline" instead of "extern inline"
  lib/mpi: avoid assembler warning
  hwrng: bcm63xx - fix non device tree compatibility
  crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
  crypto: qat - The AE id should be less than the maximal AE number
  lib/mpi: Endianness fix
  crypto: rockchip - add hash support for crypto engine in rk3288
  crypto: xts - fix compile errors
  crypto: doc - add skcipher API documentation
  crypto: doc - update AEAD AD handling
  ...
2016-03-17 11:22:54 -07:00
Gerald Schaefer fc897c95e9 s390/mm: handle PTE-mapped tail pages in fast gup
With the THP refcounting rework it is possible to see THP compound tail
pages mapped with PTEs during a THP split. This needs to be considered
when using page_cache_get_speculative(), which will always fail on tail
pages because ->_count is always zero. commit 7aef4172 "mm: handle
PTE-mapped tail pages in gerneric fast gup implementaiton" fixed it for
the generic fast gup code by using compound_head(page) instead of page,
but not for s390.

This patch is a 1:1 adaption of commit 7aef4172 for the s390 fast gup
code. Without this fix, gup will fall back to the slow path or fail
in the unlikely scenario that we hit a THP under splitting in-between
the page table split and the compound page split.

Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17 16:42:14 +01:00
Heiko Carstens 91d3721176 s390: add DEBUG_RODATA support
git commit d2aa1acad2 ("mm/init: Add 'rodata=off' boot cmdline
parameter to disable read-only kernel mappings") adds a bogus warning
to the console which states that s390 does not support kernel memory
protection.

This however is not true. We do support that since a couple of years
however in a different way than the author of the above named patch
expected.

To get rid of the misleading message implement the mark_rodata_ro
function and emit a message which states the amount of memory which
was write protected already earlier.

This is the same what parisc currently does.

We currently do not support the kernel parameter "rodata=off" which
would allow to write to the rodata section again. However since we
have this feature since years without any problems there is no reason
to add support for this.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17 13:18:25 +01:00
Kees Cook df9ceff906 s390: disable postinit-readonly for now
This is a temporary fix to let lkdtm run again on s390, though it'll
still fail the ro_after_init tests. Until rodata and ro_after_init
sections can be split on s390, disable special handling of ro_after_init.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17 13:18:25 +01:00
Anna-Maria Gleixner 1e3c1dd15d s390/cpum_sf: Fix cpu hotplug notifier transitions
The cpumf_pmu_notfier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the PMC
of the CPU is not setup again. Furthermore the CPU_ONLINE_FROZEN case
will never be processed because of masking the switch expression with
CPU_TASKS_FROZEN.

Add handling for CPU_DOWN_FAILED transition to setup the PMC of the
CPU. Remove CPU_ONLINE_FROZEN case.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17 13:18:24 +01:00
Anna-Maria Gleixner a987884295 s390/cpum_cf: Fix missing cpu hotplug notifier transition
The cpumf_pmu_notfier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the PMC
of the CPU is not setup again.

Add handling for CPU_DOWN_FAILED transition to setup the PMC of the
CPU.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17 13:18:24 +01:00
Linus Torvalds 63e30271b0 PCI changes for the v4.6 merge window:
Enumeration
     Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
     Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas
 
   Resource management
     Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     Don't assign or reassign immutable resources (Bjorn Helgaas)
     Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
     Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
     Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
     ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
     ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
     Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
     rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
     designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
 
   Virtualization
     Wait for up to 1000ms after FLR reset (Alex Williamson)
     Support SR-IOV on any function type (Kelly Zytaruk)
     Add ACS quirk for all Cavium devices (Manish Jaggi)
 
   AER
     Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
     Restore pci_ops pointer while calling original pci_ops (David Daney)
     Fix aer_inject error codes (Jean Delvare)
     Use dev_warn() in aer_inject (Jean Delvare)
     Log actual error causes in aer_inject (Jean Delvare)
     Log aer_inject error injections (Jean Delvare)
 
   VPD
     Prevent VPD access for buggy devices (Babu Moger)
     Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
     Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
     Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
     Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
     Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
     Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
     Update VPD definitions (Hannes Reinecke)
     Allow access to VPD attributes with size 0 (Hannes Reinecke)
     Determine actual VPD size on first access (Hannes Reinecke)
 
   Generic host bridge driver
     Move structure definitions to separate header file (David Daney)
     Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
     Expose pci_host_common_probe() for use by other drivers (David Daney)
 
   Altera host bridge driver
     Fix altera_pcie_link_is_up() (Ley Foon Tan)
 
   Cavium ThunderX host bridge driver
     Add PCIe host driver for ThunderX processors (David Daney)
     Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)
 
   Freescale i.MX6 host bridge driver
     Add DT bindings to configure PHY Tx driver settings (Justin Waters)
     Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
     Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
     Remove broken Gen2 workaround (Lucas Stach)
     Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)
 
   Freescale Layerscape host bridge driver
     Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)
 
   Intel VMD host bridge driver
     Attach VMD resources to parent domain's resource tree (Jon Derrick)
     Set bus resource start to 0 (Keith Busch)
 
   Microsoft Hyper-V host bridge driver
     Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
     Look up IRQ domain by fwnode_handle (Jake Oshins)
     Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)
 
   NVIDIA Tegra host bridge driver
     Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
     Implement ->{add,remove}_bus() callbacks (Thierry Reding)
     Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
     Track bus -> CPU mapping (Thierry Reding)
     Remove misleading PHYS_OFFSET (Thierry Reding)
 
   Renesas R-Car host bridge driver
     Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)
 
   Synopsys DesignWare host bridge driver
     ARC: Add PCI support (Joao Pinto)
     Add generic dw_pcie_wait_for_link() (Joao Pinto)
     Add default link up check if sub-driver doesn't override (Joao Pinto)
     Add driver for prototyping kits based on ARC SDP (Joao Pinto)
 
   TI Keystone host bridge driver
     Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)
 
   Xilinx AXI host bridge driver
     Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
     Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
     Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
     Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
     microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)
 
   Xilinx NWL host bridge driver
     Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)
 
   Miscellaneous
     Check device_attach() return value always (Bjorn Helgaas)
     Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
     Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
     Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
     unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
     Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
     Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
     Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
     frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
     Move pci_dma_* helpers to common code (Christoph Hellwig)
     Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
     Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
     Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6XgMAAoJEFmIoMA60/r8Yq4P/1nNwwZPikU+9Z8k0HyGPll6
 vqXBOYj/wlbAxJTzH2weaoyUamFrwvsKaO3Vap3xHkAeTFPD/Dp0TipCCNMrZ82Z
 j1y83JJpenkRyX6ifLARCNYpOtvnvgzSrO9x7Sb2Xfqb64dPb7+jGAfOpGNzhKsO
 n1nj/L7RGx8Q6fNFGf8ANMXKTsdkdL+1pdwegjUXmD5WdOT+oW8DmqVbhyfSKwl0
 E8r4Ml2lIg7Qd5Wu5iKMIBsR0+5HEyrwV7ch92wXChwKfoRwG70qnn7FGdc0y5ZB
 XvJuj8UD5UeMxEUeoRa9SwU6wWQT3Q9e6BzMS+P+43z36SPYjMfy/Xffv054z/bY
 rQomLjuGxNLESpmfNK5JfKxWoe2YNXjHQIDWMrAHyNlwdKJbYiwPcxnZJhvOa/eB
 p0QYcGS7O43STjibG9PZhzeq8tuSJRshxi0W6iB9QlqO8qs8nJQxIO+sZj/vl4yz
 lSnswWcV9062KITl8Fe9xDw244/RTz1xSVCdldlSoDhJyeMOjRvzS8raUMyyVmbA
 YULsI3l2iCl+fwDm/T21o7hJG966oYdAmgEv7lc7BWfgEAMg//LZXvMzVvrPFB2D
 R77u/0idtOciVJrmnO/x9DnQO2hzro9SLmVH6m0+0YU4wSSpZfGn98PCrtkatOAU
 c8zT9dJgyJVE3Z7cnPJ4
 =otsF
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for v4.6:

  Enumeration:
   - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
   - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas

  Resource management:
   - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
   - Don't assign or reassign immutable resources (Bjorn Helgaas)
   - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
   - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
   - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
   - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
   - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
   - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
   - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
   - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)

  Virtualization:
   - Wait for up to 1000ms after FLR reset (Alex Williamson)
   - Support SR-IOV on any function type (Kelly Zytaruk)
   - Add ACS quirk for all Cavium devices (Manish Jaggi)

  AER:
   - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
   - Restore pci_ops pointer while calling original pci_ops (David Daney)
   - Fix aer_inject error codes (Jean Delvare)
   - Use dev_warn() in aer_inject (Jean Delvare)
   - Log actual error causes in aer_inject (Jean Delvare)
   - Log aer_inject error injections (Jean Delvare)

  VPD:
   - Prevent VPD access for buggy devices (Babu Moger)
   - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
   - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
   - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
   - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
   - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
   - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
   - Update VPD definitions (Hannes Reinecke)
   - Allow access to VPD attributes with size 0 (Hannes Reinecke)
   - Determine actual VPD size on first access (Hannes Reinecke)

  Generic host bridge driver:
   - Move structure definitions to separate header file (David Daney)
   - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
   - Expose pci_host_common_probe() for use by other drivers (David Daney)

  Altera host bridge driver:
   - Fix altera_pcie_link_is_up() (Ley Foon Tan)

  Cavium ThunderX host bridge driver:
   - Add PCIe host driver for ThunderX processors (David Daney)
   - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)

  Freescale i.MX6 host bridge driver:
   - Add DT bindings to configure PHY Tx driver settings (Justin Waters)
   - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
   - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
   - Remove broken Gen2 workaround (Lucas Stach)
   - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)

  Freescale Layerscape host bridge driver:
   - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)

  Intel VMD host bridge driver:
   - Attach VMD resources to parent domain's resource tree (Jon Derrick)
   - Set bus resource start to 0 (Keith Busch)

  Microsoft Hyper-V host bridge driver:
   - Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
   - Look up IRQ domain by fwnode_handle (Jake Oshins)
   - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)

  NVIDIA Tegra host bridge driver:
   - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
   - Implement ->{add,remove}_bus() callbacks (Thierry Reding)
   - Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
   - Track bus -> CPU mapping (Thierry Reding)
   - Remove misleading PHYS_OFFSET (Thierry Reding)

  Renesas R-Car host bridge driver:
   - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)

  Synopsys DesignWare host bridge driver:
   - ARC: Add PCI support (Joao Pinto)
   - Add generic dw_pcie_wait_for_link() (Joao Pinto)
   - Add default link up check if sub-driver doesn't override (Joao Pinto)
   - Add driver for prototyping kits based on ARC SDP (Joao Pinto)

  TI Keystone host bridge driver:
   - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)

  Xilinx AXI host bridge driver:
   - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
   - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
   - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
   - Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
   - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)

  Xilinx NWL host bridge driver:
   - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)

  Miscellaneous:
   - Check device_attach() return value always (Bjorn Helgaas)
   - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
   - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
   - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
   - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
   - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
   - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
   - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
   - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
   - Move pci_dma_* helpers to common code (Christoph Hellwig)
   - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
   - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
   - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)"

* tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
  PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
  PCI: designware: Add driver for prototyping kits based on ARC SDP
  PCI: designware: Add default link up check if sub-driver doesn't override
  PCI: designware: Add generic dw_pcie_wait_for_link()
  PCI: Cleanup pci/pcie/Kconfig whitespace
  PCI: Simplify pci_create_attr() control flow
  PCI: Don't leak memory if sysfs_create_bin_file() fails
  PCI: Simplify sysfs ROM cleanup
  PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
  MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
  MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
  ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
  ia64/PCI: Use ioremap() instead of open-coded equivalent
  ia64/PCI: Use temporary struct resource * to avoid repetition
  PCI: Clean up pci_map_rom() whitespace
  PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
  PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
  PCI: thunder: Add PCIe host driver for ThunderX processors
  PCI: generic: Expose pci_host_common_probe() for use by other drivers
  PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
  ...
2016-03-16 14:45:55 -07:00
Linus Torvalds 271ecc5253 Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - some misc things

 - ofs2 updates

 - about half of MM

 - checkpatch updates

 - autofs4 update

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  autofs4: fix string.h include in auto_dev-ioctl.h
  autofs4: use pr_xxx() macros directly for logging
  autofs4: change log print macros to not insert newline
  autofs4: make autofs log prints consistent
  autofs4: fix some white space errors
  autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
  autofs4: fix coding style line length in autofs4_wait()
  autofs4: fix coding style problem in autofs4_get_set_timeout()
  autofs4: coding style fixes
  autofs: show pipe inode in mount options
  kallsyms: add support for relative offsets in kallsyms address table
  kallsyms: don't overload absolute symbol type for percpu symbols
  x86: kallsyms: disable absolute percpu symbols on !SMP
  checkpatch: fix another left brace warning
  checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
  checkpatch: warn on bare unsigned or signed declarations without int
  checkpatch: exclude asm volatile from complex macro check
  mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
  mm: migrate: consolidate mem_cgroup_migrate() calls
  mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  ...
2016-03-16 11:51:08 -07:00
Linus Torvalds 72aafdf01d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - Add the CPU id for the new z13s machine

 - Add a s390 specific XOR template for RAID-5 checksumming based on the
   XC instruction.  Remove all other alternatives, XC is always faster

 - The merge of our four different stack tracers into a single one

 - Tidy up the code related to page tables, several large inline
   functions are now out-of-line.  Bloat-o-meter reports ~11K text size
   reduction

 - A binary interface for the priviledged CLP instruction to retrieve
   the hardware view of the installed PCI functions

 - Improvements for the dasd format code

 - Bug fixes and cleanups

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits)
  s390/pci: enforce fmb page boundary rule
  s390: fix floating pointer register corruption (again)
  s390/cpumf: add missing lpp magic initialization
  s390: Fix misspellings in comments
  s390/mm: split arch/s390/mm/pgtable.c
  s390/mm: uninline pmdp_xxx functions from pgtable.h
  s390/mm: uninline ptep_xxx functions from pgtable.h
  s390/pci: add ioctl interface for CLP
  s390: Use pr_warn instead of pr_warning
  s390/dasd: remove casts to dasd_*_private
  s390/dasd: Refactor dasd format functions
  s390/dasd: Simplify code in format logic
  s390/dasd: Improve dasd format code
  s390/percpu: remove this_cpu_cmpxchg_double_4
  s390/cpumf: Improve guest detection heuristics
  s390/fault: merge report_user_fault implementations
  s390/dis: use correct escape sequence for '%' character
  s390/kvm: simplify set_guest_storage_key
  s390/oprofile: add z13/z13s model numbers
  s390: add z13s model number to z13 elf platform
  ...
2016-03-16 10:47:45 -07:00
Linus Torvalds 10dc374766 One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes.
 
 * ARM:
 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - various optimizations to the vgic save/restore code.
 
 * PPC:
 - enabled KVM-VFIO integration ("VFIO device")
 - optimizations to speed up IPIs between vcpus
 - in-kernel handling of IOMMU hypercalls
 - support for dynamic DMA windows (DDW).
 
 * s390:
 - provide the floating point registers via sync regs;
 - separated instruction vs. data accesses
 - dirty log improvements for huge guests
 - bugfixes and documentation improvements.
 
 * x86:
 - Hyper-V VMBus hypercall userspace exit
 - alternative implementation of lowest-priority interrupts using vector
 hashing (for better VT-d posted interrupt support)
 - fixed guest debugging with nested virtualizations
 - improved interrupt tracking in the in-kernel IOAPIC
 - generic infrastructure for tracking writes to guest memory---currently
 its only use is to speedup the legacy shadow paging (pre-EPT) case, but
 in the future it will be used for virtual GPUs as well
 - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
 oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
 tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
 Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
 WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
 =OUZZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One of the largest releases for KVM...  Hardly any generic
  changes, but lots of architecture-specific updates.

  ARM:
   - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
   - PMU support for guests
   - 32bit world switch rewritten in C
   - various optimizations to the vgic save/restore code.

  PPC:
   - enabled KVM-VFIO integration ("VFIO device")
   - optimizations to speed up IPIs between vcpus
   - in-kernel handling of IOMMU hypercalls
   - support for dynamic DMA windows (DDW).

  s390:
   - provide the floating point registers via sync regs;
   - separated instruction vs.  data accesses
   - dirty log improvements for huge guests
   - bugfixes and documentation improvements.

  x86:
   - Hyper-V VMBus hypercall userspace exit
   - alternative implementation of lowest-priority interrupts using
     vector hashing (for better VT-d posted interrupt support)
   - fixed guest debugging with nested virtualizations
   - improved interrupt tracking in the in-kernel IOAPIC
   - generic infrastructure for tracking writes to guest
     memory - currently its only use is to speedup the legacy shadow
     paging (pre-EPT) case, but in the future it will be used for
     virtual GPUs as well
   - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
  KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
  KVM: x86: disable MPX if host did not enable MPX XSAVE features
  arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
  arm64: KVM: vgic-v3: Reset LRs at boot time
  arm64: KVM: vgic-v3: Do not save an LR known to be empty
  arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
  arm64: KVM: vgic-v3: Avoid accessing ICH registers
  KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
  KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
  KVM: arm/arm64: vgic-v2: Reset LRs at boot time
  KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
  KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
  KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
  KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
  KVM: s390: allocate only one DMA page per VM
  KVM: s390: enable STFLE interpretation only if enabled for the guest
  KVM: s390: wake up when the VCPU cpu timer expires
  KVM: s390: step the VCPU timer while in enabled wait
  KVM: s390: protect VCPU cpu timer with a seqcount
  KVM: s390: step VCPU cpu timer during kvm_run ioctl
  ...
2016-03-16 09:55:35 -07:00
Christian Borntraeger 10917b8393 s390: query dynamic DEBUG_PAGEALLOC setting
We can use debug_pagealloc_enabled() to check if we can map the identity
mapping with 1MB/2GB pages as well as to print the current setting in
dump_stack.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Linus Torvalds 710d60cbf1 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug updates from Thomas Gleixner:
 "This is the first part of the ongoing cpu hotplug rework:

   - Initial implementation of the state machine

   - Runs all online and prepare down callbacks on the plugged cpu and
     not on some random processor

   - Replaces busy loop waiting with completions

   - Adds tracepoints so the states can be followed"

More detailed commentary on this work from an earlier email:
 "What's wrong with the current cpu hotplug infrastructure?

   - Asymmetry

     The hotplug notifier mechanism is asymmetric versus the bringup and
     teardown.  This is mostly caused by the notifier mechanism.

   - Largely undocumented dependencies

     While some notifiers use explicitely defined notifier priorities,
     we have quite some notifiers which use numerical priorities to
     express dependencies without any documentation why.

   - Control processor driven

     Most of the bringup/teardown of a cpu is driven by a control
     processor.  While it is understandable, that preperatory steps,
     like idle thread creation, memory allocation for and initialization
     of essential facilities needs to be done before a cpu can boot,
     there is no reason why everything else must run on a control
     processor.  Before this patch series, bringup looks like this:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

       bring the rest up

   - All or nothing approach

     There is no way to do partial bringups.  That's something which is
     really desired because we waste e.g.  at boot substantial amount of
     time just busy waiting that the cpu comes to life.  That's stupid
     as we could very well do preparatory steps and the initial IPI for
     other cpus and then go back and do the necessary low level
     synchronization with the freshly booted cpu.

   - Minimal debuggability

     Due to the notifier based design, it's impossible to switch between
     two stages of the bringup/teardown back and forth in order to test
     the correctness.  So in many hotplug notifiers the cancel
     mechanisms are either not existant or completely untested.

   - Notifier [un]registering is tedious

     To [un]register notifiers we need to protect against hotplug at
     every callsite.  There is no mechanism that bringup/teardown
     callbacks are issued on the online cpus, so every caller needs to
     do it itself.  That also includes error rollback.

  What's the new design?

     The base of the new design is a symmetric state machine, where both
     the control processor and the booting/dying cpu execute a well
     defined set of states.  Each state is symmetric in the end, except
     for some well defined exceptions, and the bringup/teardown can be
     stopped and reversed at almost all states.

     So the bringup of a cpu will look like this in the future:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

                                       bring itself up

     The synchronization step does not require the control cpu to wait.
     That mechanism can be done asynchronously via a worker or some
     other mechanism.

     The teardown can be made very similar, so that the dying cpu cleans
     up and brings itself down.  Cleanups which need to be done after
     the cpu is gone, can be scheduled asynchronously as well.

  There is a long way to this, as we need to refactor the notion when a
  cpu is available.  Today we set the cpu online right after it comes
  out of the low level bringup, which is not really correct.

  The proper mechanism is to set it to available, i.e. cpu local
  threads, like softirqd, hotplug thread etc. can be scheduled on that
  cpu, and once it finished all booting steps, it's set to online, so
  general workloads can be scheduled on it.  The reverse happens on
  teardown.  First thing to do is to forbid scheduling of general
  workloads, then teardown all the per cpu resources and finally shut it
  off completely.

  This patch series implements the basic infrastructure for this at the
  core level.  This includes the following:

   - Basic state machine implementation with well defined states, so
     ordering and prioritization can be expressed.

   - Interfaces to [un]register state callbacks

     This invokes the bringup/teardown callback on all online cpus with
     the proper protection in place and [un]installs the callbacks in
     the state machine array.

     For callbacks which have no particular ordering requirement we have
     a dynamic state space, so that drivers don't have to register an
     explicit hotplug state.

     If a callback fails, the code automatically does a rollback to the
     previous state.

   - Sysfs interface to drive the state machine to a particular step.

     This is only partially functional today.  Full functionality and
     therefor testability will be achieved once we converted all
     existing hotplug notifiers over to the new scheme.

   - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
     processor:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu
       wait for boot
                                       bring itself up

                                       Signal completion to control cpu

     In a previous step of this work we've done a full tree mechanical
     conversion of all hotplug notifiers to the new scheme.  The balance
     is a net removal of about 4000 lines of code.

     This is not included in this series, as we decided to take a
     different approach.  Instead of mechanically converting everything
     over, we will do a proper overhaul of the usage sites one by one so
     they nicely fit into the symmetric callback scheme.

     I decided to do that after I looked at the ugliness of some of the
     converted sites and figured out that their hotplug mechanism is
     completely buggered anyway.  So there is no point to do a
     mechanical conversion first as we need to go through the usage
     sites one by one again in order to achieve a full symmetric and
     testable behaviour"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  cpu/hotplug: Document states better
  cpu/hotplug: Fix smpboot thread ordering
  cpu/hotplug: Remove redundant state check
  cpu/hotplug: Plug death reporting race
  rcu: Make CPU_DYING_IDLE an explicit call
  cpu/hotplug: Make wait for dead cpu completion based
  cpu/hotplug: Let upcoming cpu bring itself fully up
  arch/hotplug: Call into idle with a proper state
  cpu/hotplug: Move online calls to hotplugged cpu
  cpu/hotplug: Create hotplug threads
  cpu/hotplug: Split out the state walk into functions
  cpu/hotplug: Unpark smpboot threads from the state machine
  cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
  cpu/hotplug: Implement setup/removal interface
  cpu/hotplug: Make target state writeable
  cpu/hotplug: Add sysfs state interface
  cpu/hotplug: Hand in target state to _cpu_up/down
  cpu/hotplug: Convert the hotplugged cpu work to a state machine
  cpu/hotplug: Convert to a state machine for the control processor
  cpu/hotplug: Add tracepoints
  ...
2016-03-15 13:50:29 -07:00
Bjorn Helgaas 18e5e6913b Merge branches 'pci/aer', 'pci/enumeration', 'pci/kconfig', 'pci/misc', 'pci/virtualization' and 'pci/vpd' into next
* pci/aer:
  PCI/AER: Log aer_inject error injections
  PCI/AER: Log actual error causes in aer_inject
  PCI/AER: Use dev_warn() in aer_inject
  PCI/AER: Fix aer_inject error codes

* pci/enumeration:
  PCI: Fix broken URL for Dell biosdevname

* pci/kconfig:
  PCI: Cleanup pci/pcie/Kconfig whitespace
  PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
  PCI: Include pci/pcie/Kconfig directly from pci/Kconfig

* pci/misc:
  PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
  PCI: Add QEMU top-level IDs for (sub)vendor & device
  unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition
  PCI: Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h
  PCI: Move pci_dma_* helpers to common code
  frv/PCI: Remove stray pci_{alloc,free}_consistent() declaration

* pci/virtualization:
  PCI: Wait for up to 1000ms after FLR reset
  PCI: Support SR-IOV on any function type

* pci/vpd:
  PCI: Prevent VPD access for buggy devices
  PCI: Sleep rather than busy-wait for VPD access completion
  PCI: Fold struct pci_vpd_pci22 into struct pci_vpd
  PCI: Rename VPD symbols to remove unnecessary "pci22"
  PCI: Remove struct pci_vpd_ops.release function pointer
  PCI: Move pci_vpd_release() from header file to pci/access.c
  PCI: Move pci_read_vpd() and pci_write_vpd() close to other VPD code
  PCI: Determine actual VPD size on first access
  PCI: Use bitfield instead of bool for struct pci_vpd_pci22.busy
  PCI: Allow access to VPD attributes with size 0
  PCI: Update VPD definitions
2016-03-15 08:55:02 -05:00
Linus Torvalds d4e796152a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Make schedstats a runtime tunable (disabled by default) and
     optimize it via static keys.

     As most distributions enable CONFIG_SCHEDSTATS=y due to its
     instrumentation value, this is a nice performance enhancement.
     (Mel Gorman)

   - Implement 'simple waitqueues' (swait): these are just pure
     waitqueues without any of the more complex features of full-blown
     waitqueues (callbacks, wake flags, wake keys, etc.).  Simple
     waitqueues have less memory overhead and are faster.

     Use simple waitqueues in the RCU code (in 4 different places) and
     for handling KVM vCPU wakeups.

     (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
     Marcelo Tosatti)

   - sched/numa enhancements (Rik van Riel)

   - NOHZ performance enhancements (Rik van Riel)

   - Various sched/deadline enhancements (Steven Rostedt)

   - Various fixes (Peter Zijlstra)

   - ... and a number of other fixes, cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  sched/cputime: Fix steal_account_process_tick() to always return jiffies
  sched/deadline: Remove dl_new from struct sched_dl_entity
  Revert "kbuild: Add option to turn incompatible pointer check into error"
  sched/deadline: Remove superfluous call to switched_to_dl()
  sched/debug: Fix preempt_disable_ip recording for preempt_disable()
  sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
  time, acct: Drop irq save & restore from __acct_update_integrals()
  acct, time: Change indentation in __acct_update_integrals()
  sched, time: Remove non-power-of-two divides from __acct_update_integrals()
  sched/rt: Kick RT bandwidth timer immediately on start up
  sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
  sched/debug: Move sched_domain_sysctl to debug.c
  sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
  sched/rt: Fix PI handling vs. sched_setscheduler()
  sched/core: Remove duplicated sched_group_set_shares() prototype
  sched/fair: Consolidate nohz CPU load update code
  sched/fair: Avoid using decay_load_missed() with a negative value
  sched/deadline: Always calculate end of period on sched_yield()
  sched/cgroup: Fix cgroup entity load tracking tear-down
  rcu: Use simple wait queues where possible in rcutree
  ...
2016-03-14 19:14:06 -07:00
Linus Torvalds fbed0bc091 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
 "Various updates:

   - Futex scalability improvements: remove page lock use for shared
     futex get_futex_key(), which speeds up 'perf bench futex hash'
     benchmarks by over 40% on a 60-core Westmere.  This makes anon-mem
     shared futexes perform close to private futexes.  (Mel Gorman)

   - lockdep hash collision detection and fix (Alfredo Alvarez
     Fernandez)

   - lockdep testing enhancements (Alfredo Alvarez Fernandez)

   - robustify lockdep init by using hlists (Andrew Morton, Andrey
     Ryabinin)

   - mutex and csd_lock micro-optimizations (Davidlohr Bueso)

   - small x86 barriers tweaks (Michael S Tsirkin)

   - qspinlock updates (Waiman Long)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
  locking/csd_lock: Explicitly inline csd_lock*() helpers
  futex: Replace barrier() in unqueue_me() with READ_ONCE()
  locking/lockdep: Detect chain_key collisions
  locking/lockdep: Prevent chain_key collisions
  tools/lib/lockdep: Fix link creation warning
  tools/lib/lockdep: Add tests for AA and ABBA locking
  tools/lib/lockdep: Add userspace version of READ_ONCE()
  tools/lib/lockdep: Fix the build on recent kernels
  locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
  locking/mutex: Allow next waiter lockless wakeup
  locking/pvqspinlock: Enable slowpath locking count tracking
  locking/qspinlock: Use smp_cond_acquire() in pending code
  locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
  locking/mcs: Fix mcs_spin_lock() ordering
  futex: Remove requirement for lock_page() in get_futex_key()
  futex: Rename barrier references in ordering guarantees
  locking/atomics: Update comment about READ_ONCE() and structures
  locking/lockdep: Eliminate lockdep_init()
  locking/lockdep: Convert hash tables to hlists
  ...
2016-03-14 15:50:44 -07:00
Linus Torvalds d37a14bb5f Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ram resource handling changes from Ingo Molnar:
 "Core kernel resource handling changes to support NVDIMM error
  injection.

  This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
  for System RAM while keeping the current IORESOURCE_MEM type bit set
  for all memory-mapped ranges (including System RAM) for backward
  compatibility.

  With this resource flag it no longer takes a strcmp() loop through the
  resource tree to find "System RAM" resources.

  The new resource type is then used to extend ACPI/APEI error injection
  facility to also support NVDIMM"

* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI/EINJ: Allow memory error injection to NVDIMM
  resource: Kill walk_iomem_res()
  x86/kexec: Remove walk_iomem_res() call with GART type
  x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
  resource: Add walk_iomem_res_desc()
  memremap: Change region_intersects() to take @flags and @desc
  arm/samsung: Change s3c_pm_run_res() to use System RAM type
  resource: Change walk_system_ram() to use System RAM type
  drivers: Initialize resource entry to zero
  xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
  kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
  arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
  ia64: Set System RAM type and descriptor
  x86/e820: Set System RAM type and descriptor
  resource: Add I/O resource descriptor
  resource: Handle resource flags properly
  resource: Add System RAM resource type
2016-03-14 15:15:51 -07:00
Sebastian Ott 80c544ded2 s390/pci: enforce fmb page boundary rule
The function measurement block must not cross a page boundary. Ensure
that by raising the alignment requirement to the smallest power of 2
larger than the size of the fmb.

Fixes: d0b088531 ("s390/pci: performance statistics and debug infrastructure")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-14 16:19:09 +01:00
Alexander Duyck 01cfbad79a ipv4: Update parameters for csum_tcpudp_magic to their original types
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs.  For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value.  The length is usually populated based
on skb->len which is an unsigned integer.

This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.

With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.

I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.

I also updated a few spots in terms of whitespace and type differences for
the addresses.  Most of this was just to make sure all of the definitions
were in sync going forward.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 23:55:13 -04:00
Linus Torvalds f2c1242194 A few simple fixes for ARM, x86, PPC and generic code. The x86 MMU fix
is a bit larger because the surrounding code needed a cleanup, but
 nothing worrisome.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW4UwZAAoJEL/70l94x66DG3YH/0PfUr4sW0jnWRVXmYlPVka4
 sNFYrdtYnx08PwXu2sWMm1F+OBXlF/t0ZSJXJ9OBF8WdKIu8TU4yBOINRAvGO/oE
 slrivjktLTKgicTtIXP5BpRR14ohwHIGcuiIlppxvnhmQz1/rMtig7fvhZxYI545
 lJyIbyquNR86tiVdUSG9/T9+ulXXXCvOspYv8jPXZx7VKBXKTvp5P5qavSqciRb+
 O9RqY+GDCR/5vrw+MV0J7H9ZydeEJeD02LcWguTGMATTm0RCrhydvSbou42UcKfY
 osWii0kwt2LhcM/sTOz+cWnLJ6gwU9T+ZtJTTbLvYWXWDLP/+icp9ACMkwNciNo=
 =/y4V
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "A few simple fixes for ARM, x86, PPC and generic code.

  The x86 MMU fix is a bit larger because the surrounding code needed a
  cleanup, but nothing worrisome"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  kvm: cap halt polling at exactly halt_poll_ns
  KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
  KVM: VMX: disable PEBS before a guest entry
  KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
2016-03-10 10:42:15 -08:00
Paolo Bonzini f958ee745f KVM: s390: Fixes and features for kvm/next (4.6) part 2
- add watchdog diagnose to trace event decoder
 - better handle the cpu timer when not inside the guest
 - only provide STFLE if the CPU model has STFLE
 - reduce DMA page usage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJW3/XrAAoJEBF7vIC1phx8D7cP/RGZqjVQF8S0tOpzGl2EHRYd
 8MyTA2b53wUQx1cR35U1Tx1mKoTBoAcwADVJljCsCzCuGp/Bae9kT+T1kz4/aWEv
 vlQqE7+63VkPsgQxFCcKA6CKagFLz+2QBmNFMdYStjHjN/MNBDwfo58I9Y9g2DdB
 r7cjdExKC1WhCUguDv8eREtNWGRGC8IDV8ID02xTABTSk7LYcIGok0YIRjsZaV5b
 W1gchW5GicOTp/mktkCsNPhnFg28owsG2LVcb5zeGQuq0TowvrSvRzxCOrHM1lKS
 WU28FDvFn64bPvSAu1w894YKMsMxvm9SX5SWQ7nA5JkDko5lQ49H77M2YipMLUXY
 F+KN9TlS352fzGr5i7TaPUOLAtFDhdtM0NAicUf5+ydR73DLetQOvMqqenmmP1Mw
 K9hgAviFtKfnScIomCPrmeel5ddIl0OTshiZMGIbS6/lAGRZ5aJuXPxUgMhFiz95
 YgjH2tb16eR0eYAs2PIt2bsJVis4x2MfFi+PExYzWP5vERxzMthE879Ra42NBfT1
 tswxtvvQtXgXrkakwYkCTXqnja0JLNOhfKvIL73/am9DbwUmikWfVYOoq/yEe/w9
 Gf5ZHzACrokHDqeKbMT6xzGEk4C1QEn7VRc4H4MX47xseVVXb2GxPYFNASgL2SSU
 Qe6+IILH9uDhDkXVHIJD
 =n2MS
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and features for kvm/next (4.6) part 2

- add watchdog diagnose to trace event decoder
- better handle the cpu timer when not inside the guest
- only provide STFLE if the CPU model has STFLE
- reduce DMA page usage
2016-03-10 17:06:07 +01:00
Martin Schwidefsky e370e47694 s390: fix floating pointer register corruption (again)
There is a tricky interaction between the machine check handler
and the critical sections of load_fpu_regs and save_fpu_regs
functions. If the machine check interrupts one of the two
functions the critical section cleanup will complete the function
before the machine check handler s390_do_machine_check is called.
Trouble is that the machine check handler needs to validate the
floating point registers *before* and not *after* the completion
of load_fpu_regs/save_fpu_regs.

The simplest solution is to rewind the PSW to the start of the
load_fpu_regs/save_fpu_regs and retry the function after the
return from the machine check handler.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 14:35:42 +01:00
Heiko Carstens 8f100bb1ff s390/cpumf: add missing lpp magic initialization
Add the missing lpp magic initialization for cpu 0. Without this all
samples on cpu 0 do not have the most significant bit set in the
program parameter field, which we use to distinguish between guest and
host samples if the pid is also 0.

We did initialize the lpp magic in the absolute zero lowcore but
forgot that when switching to the allocated lowcore on cpu 0 only.

Reported-by: Shu Juan Zhang <zhshuj@cn.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: e22cf8ca6f ("s390/cpumf: rework program parameter setting to detect guest samples")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 14:35:35 +01:00
Martin Schwidefsky 3446c13b26 s390/mm: four page table levels vs. fork
The fork of a process with four page table levels is broken since
git commit 6252d702c5 "[S390] dynamic page tables."

All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.

The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.

This fixes CVE-2016-2143.

Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 09:21:24 +01:00
Paolo Bonzini ab92f30875 KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - Various optimizations to the vgic save/restore code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW36xjAAoJECPQ0LrRPXpDGQkQAMDppzcTOixT3e8VPdHAX09a
 Z5PO0gyTMVV7Jyz5Ul3pedPJA2GSK9mxOCwqvIFbdxLAR6ZB00juO5FrTHkSdI91
 1XLPj4bKoMWcVvhL/g5A4Glp/pVMW1k/9Yq8zZAtYlsLRlqG5rLOutSadcqHcYaJ
 cTD/pFf7b2oPtkTPyoFml75KgHBT/8uvAvFDOWA66Id2z6T11+PsBT/6XnGDiwKg
 tpGTNzx3kPIKIzOAOHqVW6UBxFOeabebXLT8wUz3VwNn/UbG6gkumMNApMAyF2q1
 zU0nAh8+7Ek6Dr4OFWE6BfW6sgg/l7i1lA8XoAmqG7ZTrSptCc59fvaZJxPruG+Q
 dMsU6QgR77JJjbZTinf9a1jReZ/liZrx2gZXedVKdILrjmDSq0UnGcxjUOEDZOGy
 2/dbrlJhv+LhpcJtuPpxPCfoqbW5L0ynzmuYuXRdRz3lTHiOWIRx5gugrhO+wH4D
 4gvZhbw3XCiYfpYHYhl8A1EH5kanKgdXDocz9yIm7mZm89gngufF/HkeXS3ZU25T
 yThyBGulGjqN4FCdgf1HolkTfFjnfSx4qJovJ58eHga+HNLXRkTecZZcbFy2OOHv
 8Bx0PIlwj4RgSaRLWQUudAhdhKS2g22DKDDljxFwhkMPNghvqkYMJCRDKLu6GBXQ
 4YsLKM+TaShHFjSpx+ao
 =rpvb
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.6

- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code

Conflicts:
	include/uapi/linux/kvm.h
2016-03-09 11:50:42 +01:00
Bjorn Helgaas e7e127e3c7 PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/hotplug/Kconfig.

Note that this effectively adds pci/hotplug/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/hotplug/Kconfig:

  alpha
  arm
  avr32
  frv
  m68k
  microblaze
  mn10300
  sparc
  unicore32

Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:10:48 -06:00
Bogicevic Sasa 5f8fc43217 PCI: Include pci/pcie/Kconfig directly from pci/Kconfig
Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/pcie/Kconfig.

Note that this effectively adds pci/pcie/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/pcie/Kconfig:

  alpha
  avr32
  blackfin
  frv
  m32r
  m68k
  microblaze
  mn10300
  parisc
  sparc
  unicore32
  xtensa

[bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace]
Signed-off-by: Sasa Bogicevic <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 14:36:48 -06:00
David S. Miller 810813c47a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 12:34:12 -05:00
Adam Buchbinder 7eb792bf7c s390: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 15:00:17 +01:00
Martin Schwidefsky 1e133ab296 s390/mm: split arch/s390/mm/pgtable.c
The pgtable.c file is quite big, before it grows any larger split it
into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 15:00:15 +01:00
Martin Schwidefsky 227be799c3 s390/mm: uninline pmdp_xxx functions from pgtable.h
The pmdp_xxx function are smaller than their ptep_xxx counterparts
but to keep things symmetrical unline them as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 15:00:14 +01:00
Martin Schwidefsky ebde765c0e s390/mm: uninline ptep_xxx functions from pgtable.h
The code in the various ptep_xxx functions has grown quite large,
consolidate them to four out-of-line functions:
  ptep_xchg_direct to exchange a pte with another with immediate flushing
  ptep_xchg_lazy to exchange a pte with another in a batched update
  ptep_modify_prot_start to begin a protection flags update
  ptep_modify_prot_commit to commit a protection flags update

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 15:00:12 +01:00
David Hildenbrand c54f0d6ae0 KVM: s390: allocate only one DMA page per VM
We can fit the 2k for the STFLE interpretation and the crypto
control block into one DMA page. As we now only have to allocate
one DMA page, we can clean up the code a bit.

As a nice side effect, this also fixes a problem with crycbd alignment in
case special allocation debug options are enabled, debugged by Sascha
Silbe.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:54 +01:00
David Hildenbrand 80bc79dc0b KVM: s390: enable STFLE interpretation only if enabled for the guest
Not setting the facility list designation disables STFLE interpretation,
this is what we want if the guest was told to not have it.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:54 +01:00
David Hildenbrand b3c17f10fa KVM: s390: wake up when the VCPU cpu timer expires
When the VCPU cpu timer expires, we have to wake up just like when the ckc
triggers. For now, setting up a cpu timer in the guest and going into
enabled wait will never lead to a wakeup. This patch fixes this problem.
Just as for the ckc, we have to take care of waking up too early. We
have to recalculate the sleep time and go back to sleep.

Please note that the timer callback calls kvm_s390_get_cpu_timer() from
interrupt context. As the timer is canceled when leaving handle_wait(),
and we don't do any VCPU cpu timer writes/updates in that function, we can
be sure that we will never try to read the VCPU cpu timer from the same cpu
that is currentyl updating the timer (deadlock).

Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Tested-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:53 +01:00
David Hildenbrand 5ebda31686 KVM: s390: step the VCPU timer while in enabled wait
The cpu timer is a mean to measure task execution time. We want
to account everything for a VCPU for which it is responsible. Therefore,
if the VCPU wants to sleep, it shall be accounted for it.

We can easily get this done by not disabling cpu timer accounting when
scheduled out while sleeping because of enabled wait.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:53 +01:00
David Hildenbrand 9c23a1318e KVM: s390: protect VCPU cpu timer with a seqcount
For now, only the owning VCPU thread (that has loaded the VCPU) can get a
consistent cpu timer value when calculating the delta. However, other
threads might also be interested in a more recent, consistent value. Of
special interest will be the timer callback of a VCPU that executes without
having the VCPU loaded and could run in parallel with the VCPU thread.

The cpu timer has a nice property: it is only updated by the owning VCPU
thread. And speaking about accounting, a consistent value can only be
calculated by looking at cputm_start and the cpu timer itself in
one shot, otherwise the result might be wrong.

As we only have one writing thread at a time (owning VCPU thread), we can
use a seqcount instead of a seqlock and retry if the VCPU refreshed its
cpu timer. This avoids any heavy locking and only introduces a counter
update/check plus a handful of smp_wmb().

The owning VCPU thread should never have to retry on reads, and also for
other threads this might be a very rare scenario.

Please note that we have to use the raw_* variants for locking the seqcount
as lockdep will produce false warnings otherwise. The rq->lock held during
vcpu_load/put is also acquired from hardirq context. Lockdep cannot know
that we avoid potential deadlocks by disabling preemption and thereby
disable concurrent write locking attempts (via vcpu_put/load).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:53 +01:00
David Hildenbrand db0758b297 KVM: s390: step VCPU cpu timer during kvm_run ioctl
Architecturally we should only provide steal time if we are scheduled
away, and not if the host interprets a guest exit. We have to step
the guest CPU timer in these cases.

In the first shot, we will step the VCPU timer only during the kvm_run
ioctl. Therefore all time spent e.g. in interception handlers or on irq
delivery will be accounted for that VCPU.

We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent
  value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend
  kvm_clock_sync() and disable preemption accordingly
- During any call to disable/enable/start/stop we could get premeempted
  and therefore get start/stop calls. Therefore we have to make sure we
  don't get into an inconsistent state.

Whenever a VCPU is scheduled out, sleeping, in user space or just about
to enter the SIE, the guest cpu timer isn't stepped.

Please note that all primitives are prepared to be called from both
environments (cpu timer accounting enabled or not), although not completely
used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
while cpu timer accounting is enabled).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:52 +01:00
David Hildenbrand 4287f247f6 KVM: s390: abstract access to the VCPU cpu timer
We want to manually step the cpu timer in certain scenarios in the future.
Let's abstract any access to the cpu timer, so we can hide the complexity
internally.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:52 +01:00
David Hildenbrand 01a745ac8b KVM: s390: store cpu id in vcpu->cpu when scheduled in
By storing the cpu id, we have a way to verify if the current cpu is
owning a VCPU.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:51 +01:00
Alexander Yarygin dce382b6c7 KVM: s390: Add diag "watchdog functions" to trace event decoding
DIAG 0x288 may occur now. Let's add its code to the diag table in
sie.h.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:51 +01:00
David Hildenbrand 9522b37f5a KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
With MACHINE_HAS_VX, we convert the floating point registers from the
vector registeres when storing the status. For other VCPUs, these are
stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs,
which resolves to the currently loaded VCPU.

So kvm_s390_store_status_unloaded() currently writes the wrong floating
point registers (converted from the vector registers) when called from
another VCPU on a z13.

This is only the case for old user space not handling SIGP STORE STATUS and
SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All
other calls come from the loaded VCPU via kvm_s390_store_status().

Fixes: 9abc2a08a7 (KVM: s390: fix memory overwrites when vx is disabled)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08 12:47:01 +01:00
Christian Borntraeger 7a76aa95f6 s390/cpumf: Fix lpp detection
we have to check bit 40 of the facility list before issuing LPP
and not bit 48. Otherwise a guest running on a system with
"The decimal-floating-point zoned-conversion facility" and without
the "The set-program-parameters facility" might crash on an lpp
instruction.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: e22cf8ca6f ("s390/cpumf: rework program parameter setting to detect guest samples")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 10:38:06 +01:00
Christoph Hellwig bc4b024a8b PCI: Move pci_dma_* helpers to common code
For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.

Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-07 10:40:02 -06:00
Martin Schwidefsky 988b86e69d s390/pci: add ioctl interface for CLP
Provide a user space interface to issue call logical-processor instructions.
Only selected CLP commands are allowed, enough to get the full overview of
the installed PCI functions.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-07 16:54:32 +01:00
Joe Perches baebc70a4d s390: Use pr_warn instead of pr_warning
Convert the uses of pr_warning to pr_warn so there are fewer
uses of the old pr_warning.

Miscellanea:

o Align arguments
o Coalesce formats

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-07 13:12:04 +01:00
Jiri Kosina 335e073faa klp: remove CONFIG_LIVEPATCH dependency from klp headers
There is no need for livepatch.h (generic and arch-specific) to depend
on CONFIG_LIVEPATCH. Remove that superfluous dependency.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:22:10 +01:00
Miroslav Benes b24b78a113 klp: remove superfluous errors in asm/livepatch.h
There is an #error in asm/livepatch.h for both x86 and s390 in
!CONFIG_LIVEPATCH cases. It does not make much sense as pointed out by
Michael Ellerman. One can happily include asm/livepatch.h with
CONFIG_LIVEPATCH. Remove it as useless.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:19:26 +01:00
Ingo Molnar bc94b99636 Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into core/resources, to resolve conflict

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-04 12:12:08 +01:00
Christian Borntraeger e82becfc18 s390/dma: Allow per device dma ops
As virtio-ccw will have dma ops, we can no longer default to the
zPCI ones. Make use of dev_archdata to keep the dma_ops per device.
The pci devices now use that to override the default, and the
default is changed to use the noop ops for everything that does not
specify a device specific one.
To compile without PCI support we will enable HAS_DMA all the time,
via the default config in lib/Kconfig.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02 17:01:56 +02:00
Heiko Carstens f369b98ee3 s390/percpu: remove this_cpu_cmpxchg_double_4
git commit 26f15caaf9 ("s390/cmpxchg: simplify cmpxchg_double")
removed support for cmpxchg_double for two consecutive four byte
values, for which it would generate a cds instruction.

However I forgot to remove the corresponding define in our percpu
header file, which means that this_cpu_cmpxchg_double would now
incorrectly generate a cdsg instruction if being used on a double four
byte location. Therefore remove the percpu define as well.

There is currently no user and therefore no bug fixed with
this. Obviously any such user could and should simply use cmpxchg.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:30 -06:00
Christian Borntraeger b1685ab9bd s390/cpumf: Improve guest detection heuristics
commit e22cf8ca6f ("s390/cpumf: rework program parameter setting
to detect guest samples") requires guest changes to get proper
guest/host. We can do better: We can use the primary asn value,
which is set on all Linux variants to compare this with the host
pp value.
We now have the following cases:
1. Guest using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp != 0 --> guest
2. Guest not using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp == 0, asn != hpp --> guest

As soon as the host no longer sets CR4, we must back out
this heuristics - let's add a comment in switch_to.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:28 -06:00
Heiko Carstens 5d7eccecf8 s390/fault: merge report_user_fault implementations
We have two close to identical report_user_fault functions.
Add a parameter to one and get rid of the other one in order
to reduce code duplication.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:27 -06:00
Heiko Carstens ee8479bb97 s390/dis: use correct escape sequence for '%' character
The double escape character sequence introduced with commit
272fa59ccb ("s390/dis: Fix handling of format specifiers") is not
necessary anymore since commit 561e103002 ("s390/dis: Fix printing
of the register numbers").

Instead this now generates an extra '%' character:

lg      %%r1,160(%%r11)

So fix this and basically revert 272fa59ccb.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:25 -06:00
Martin Schwidefsky 443a813304 s390/kvm: simplify set_guest_storage_key
Git commit ab3f285f22
"KVM: s390/mm: try a cow on read only pages for key ops"
added a fixup_user_fault to set_guest_storage_key force a copy on
write if the page is mapped read-only. This is supposed to fix the
problem of differing storage keys for shared mappings, e.g. the
empty_zero_page.
But if the storage key is set before the pte is mapped the storage
key update is done on the pgste. A later fault will happily map the
shared page with the key from the pgste.

Eventually git commit 2faee8ff9d
"s390/mm: prevent and break zero page mappings in case of storage keys"
fixed this problem for the empty_zero_page. The commit makes sure that
guests enabled for storage keys will not use the empty_zero_page at all.

As the call to fixup_user_fault in set_guest_storage_key depends on the
order of the storage key operation vs. the fault that maps the pte
it does not really fix anything. Just remove it.

Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:22 -06:00
Thomas Gleixner fc6d73d674 arch/hotplug: Call into idle with a proper state
Let the non boot cpus call into idle with the corresponding hotplug state, so
the hotplug core can handle the further bringup. That's a first step to
convert the boot side of the hotplugged cpus to do all the synchronization
with the other side through the state machine. For now it'll only start the
hotplug thread and kick the full bringup of the cpu.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Rik van Riel <riel@redhat.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20160226182341.614102639@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-01 20:36:57 +01:00
Ingo Molnar 39a1142dbb Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:55:22 +01:00
Ingo Molnar 6aa447bcbb Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:42:07 +01:00
Tom Herbert a87cb3e48e net: Facility to report route quality of connected sockets
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 22:01:22 -05:00
Marcelo Tosatti 8577370fb0 KVM: Use simple waitqueue for vcpu->wq
The problem:

On -rt, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04:

  ./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:

"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed."

Before:

               min             max         mean           std
count  1000.000000     1000.000000  1000.000000   1000.000000
mean   5162.596000  2019270.084000  5824.491541  20681.645558
std      75.431231   622607.723969    89.575700   6492.272062
min    4466.000000    23928.000000  5537.926500    585.864966
25%    5163.000000  1613252.750000  5790.132275  16683.745433
50%    5175.000000  2281919.000000  5834.654000  23151.990026
75%    5190.000000  2382865.750000  5861.412950  24148.206168
max    5228.000000  4175158.000000  6254.827300  46481.048691

After
               min            max         mean           std
count  1000.000000     1000.00000  1000.000000   1000.000000
mean   5143.511000  2076886.10300  5813.312474  21207.357565
std      77.668322   610413.09583    86.541500   6331.915127
min    4427.000000    25103.00000  5529.756600    559.187707
25%    5148.000000  1691272.75000  5784.889825  17473.518244
50%    5160.000000  2308328.50000  5832.025000  23464.837068
75%    5172.000000  2393037.75000  5853.177675  24223.969976
max    5222.000000  3922458.00000  6186.720500  42520.379830

[Patch was originaly based on the swait implementation found in the -rt
 tree. Daniel ported it to mainline's version and gathered the
 benchmark numbers for tscdeadline_latency test.]

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 11:27:16 +01:00
Heiko Carstens 993e068108 s390/oprofile: add z13/z13s model numbers
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-24 12:18:10 +01:00
Heiko Carstens bb3aa61401 s390: add z13s model number to z13 elf platform
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-24 12:18:07 +01:00
Martin Schwidefsky 13c6a790d7 s390/mm: correct comment about segment table entries
The comment describing the bit encoding for segment table entries
is incorrect in regard to the read and write bits. The segment
read bit is 0x0002 and write is 0x0001, not the other way around.

Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:20 +01:00
Heiko Carstens 758d39ebd3 s390/dumpstack: merge all four stack tracers
We have four different stack tracers of which three had bugs. So it's
time to merge them to a single stack tracer which allows to specify a
call back function which will be called for each step.

This patch changes behavior a bit:

- the "nosched" and "in_sched_functions" check within
  save_stack_trace_tsk did work only for the last stack frame within a
  context. Now it considers the check for each stack frame like it
  should.

- both the oprofile variant and the perf_events variant did save a
  return address twice if a zero back chain was detected, which
  indicates an interrupt frame. The new dump_trace function will call
  the oprofile and perf_events backends with the psw address that is
  contained within the corresponding pt_regs structure instead.

- the original show_trace and save_context_stack functions did already
  use the psw address of the pt_regs structure if a zero back chain
  was detected. However now we ignore the psw address if it is a user
  space address. After all we trace the kernel stack and not the user
  space stack. This way we also get rid of the garbage user space
  address in case of warnings and / or panic call traces.

So this should make life easier since now there is only one stack
tracer left which we can break.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:20 +01:00
Martin Schwidefsky 3c2c126a67 s390/mm: remove unnecessary indirection with pgste_update_all
The first parameter of pgste_update_all is a pointer to a pte.
Simplify the code by passing the pte value.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:19 +01:00
Heiko Carstens 1f021ea0da s390/dumpstack: use bit fields to decode psw mask in show_registers()
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:19 +01:00
Heiko Carstens ca0fd75a54 s390/dumpstack: add missing ri bit to show_registers() output
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:18 +01:00
Heiko Carstens 76737ce17a s390: add current_stack_pointer() helper function
Implement current_stack_pointer() helper function and use it
everywhere, instead of having several different inline assembly
variants.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:18 +01:00
Heiko Carstens 77ec8a5451 s390/stacktrace: use nosched instead of savesched parameter
Use the inversed "nosched" logic like all other architectures.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:17 +01:00
Martin Schwidefsky 007ccec53d s390/pageattr: do a single TLB flush for change_page_attr
The change of the access rights for an address range in the kernel
address space is currently done with a loop of IPTE + a store of the
modified PTE. Between the IPTE and the store the PTE will be invalid,
this intermediate state can cause problems with concurrent accesses.

Consider a change of a kernel area from read-write to read-only, a
concurrent reader of that area should be fine but with the invalid
PTE it might get an unexpected exception.

Remove the IPTEs for each PTE and do a global flush after all PTEs
have been modified.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:17 +01:00
Martin Schwidefsky 2cfc5f9ce7 s390/xor: optimized xor routing using the XC instruction
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:17 +01:00
Sebastian Ott 9a99649f2a s390/pci: remove pdev pointer from arch data
For each PCI function we need to maintain arch specific data in
struct zpci_dev which also contains a pointer to struct pci_dev.

When a function is registered or deregistered (which is triggered by PCI
common code) we need to adjust that pointer which could interfere with
the machine check handler (triggered by FW) using zpci_dev->pdev.

Since multiple instances of the same pdev could exist at a time this can't
be solved with locking.

Fix that by ditching the pdev pointer and use a bus walk to reach
struct pci_dev (only one instance of a pdev can be registered at the bus
at a time).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:16 +01:00
Martin Schwidefsky 1b17cb796f s390/fpu: signals vs. floating point control register
git commit 904818e2f2
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helper. These function
fail to save and restore the floating pointer control registers.

The effect is that the FPC is not correctly handled on signal
delivery and signal return.

Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-22 09:29:35 +01:00
Martin Schwidefsky 342300cc9c s390/compat: correct restore of high gprs on signal return
git commit 8070361799
"s390: add support for vector extension"
broke 31-bit compat processes in regard to signal handling.

The restore_sigregs_ext32() function is used to restore the additional
elements from the user space signal frame. Among the additional elements
are the upper registers halves for 64-bit register support for 31-bit
processes. The copy_from_user that is used to retrieve the high-gprs
array from the user stack uses an incorrect length, 8 bytes instead of
64 bytes. This causes incorrect upper register halves to get loaded.

Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-22 09:29:35 +01:00
Linus Torvalds ff5f16820f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Several bug fixes:

   - There are four different stack tracers, and three of them have
     bugs.  For 4.5 the bugs are fixed and we prepare a cleanup patch
     for the next merge window.

   - Three bug fixes for the dasd driver in regard to parallel access
     volumes and the new max_dev_sectors block device queue limit

   - The irq restore optimization needs a fixup for memcpy_real

   - The diagnose trace code has a conflict with lockdep"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: fix performance drop
  s390/maccess: reduce stnsm instructions
  s390/diag: avoid lockdep recursion
  s390/dasd: fix refcount for PAV reassignment
  s390/dasd: prevent incorrect length error under z/VM after PAV changes
  s390: fix DAT off memory access, e.g. on kdump
  s390/oprofile: fix address range for asynchronous stack
  s390/perf_event: fix address range for asynchronous stack
  s390/stacktrace: add save_stack_trace_regs()
  s390/stacktrace: save full stack traces
  s390/stacktrace: add missing end marker
  s390/stacktrace: fix address ranges for asynchronous and panic stack
  s390/stacktrace: fix save_stack_trace_tsk() for current task
2016-02-19 08:33:12 -08:00
Linus Torvalds 705d43dbe1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - regression (from 4.4) fix for ordering issue, introduced by an
   earlier ftrace change, that broke live patching of modules.

   The fix replaces the ftrace module notifier by direct call in order
   to make the ordering guaranteed and well-defined.  The patch, from
   Jessica Yu, has been acked both by Steven and Rusty

 - error message fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  ftrace/module: remove ftrace module notifier
  livepatch: change the error message in asm/livepatch.h header files
2016-02-18 16:34:15 -08:00
Dave Hansen d61172b4b6 mm/core, x86/mm/pkeys: Differentiate instruction fetches
As discussed earlier, we attempt to enforce protection keys in
software.

However, the code checks all faults to ensure that they are not
violating protection key permissions.  It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.

But, there is a third category of faults for protection keys:
instruction faults.  Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.

So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.

We also add a new FAULT_FLAG_INSTRUCTION.  This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:29 +01:00
Dave Hansen 1b2ee1266e mm/core: Do not enforce PKEY permissions on remote mm access
We try to enforce protection keys in software the same way that we
do in hardware.  (See long example below).

But, we only want to do this when accessing our *own* process's
memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory.  PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.

This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process.  We want to avoid that.

To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.

Thanks to Jerome Glisse for pointing out this scenario.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:28 +01:00
Dave Hansen 33a709b25a mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

	mprotect(ptr, size, PROT_NONE);
	*ptr = foo;

they see the same effects with protection keys when they do this:

	mprotect(ptr, size, PROT_READ|PROT_WRITE);
	set_pkey(ptr, size, 4);
	wrpkru(0xffffff3f); // access disable pkey 4
	*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

	arch_pte_access_permitted(pte_flags, write)
	arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:32:44 +01:00
Heiko Carstens 52499d93d6 s390/maccess: reduce stnsm instructions
When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.

This saves us a couple of cycles which could improve the speed of
memcpy_real.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-17 09:05:04 +01:00
Stephan Mueller 49abc0d2e1 crypto: xts - fix compile errors
Commit 28856a9e52 missed the addition of the crypto/xts.h include file
for different architecture-specific AES implementations.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-17 15:53:02 +08:00
Stephan Mueller 28856a9e52 crypto: xts - consolidate sanity check for keys
The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
a sanity check for the XTS keys similar to the other arches.

In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-17 04:07:51 +08:00
Dave Hansen d4edcf0d56 mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm
We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called.  For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

	get_user_pages()
	get_user_pages_unlocked()
	get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:12 +01:00
Heiko Carstens f6c9b16023 s390/diag: avoid lockdep recursion
The diagnose tracer will indirectly call back into the lockdep code
when lockdep does not expect it (arch_spinlock). This causes lockdep
to disable itself and therefore we don't have a working lock
dependency validator anymore.

This patch effectively disables tracing of diag 0x9c and 0x44 if
lockdep is enabled.  If however lockdep is enabled spinlocks are
mainly implemented using a trylock variant, which will not issue any
diag 0x9c or 0x44. So this change has hardly any effect on tracing
except when arch_spinlock and friends are explicitly used.

Reported-and-Tested-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-11 13:05:56 +01:00
Christian Borntraeger 0986d97741 s390: fix DAT off memory access, e.g. on kdump
commit 204ee2c564 ("s390/irqflags: optimize irq restore") optimized
irqrestore to really only care about interrupts and adapted the
remaining low level users. One spot (memcpy_real) was not touched,
though - fix it. Otherwise a kdump kernel will fail while reading
the old kernel. As we re-enable irqs with a non-standard function
we have to tell lockdep about that.

Fixes: 204ee2c564 ("s390/irqflags: optimize irq restore")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-11 13:05:51 +01:00
Christian Borntraeger 1763f8d09d KVM: s390: bail out early on fatal signal in dirty logging
A KVM_GET_DIRTY_LOG ioctl might take a long time.
This can result in fatal signals seemingly being ignored.
Lets bail out during the dirty bit sync, if a fatal signal
is pending.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:57 +01:00
Christian Borntraeger 70c88a00fb KVM: s390: do not block CPU on dirty logging
When doing dirty logging on huge guests (e.g.600GB) we sometimes
get rcu stall timeouts with backtraces like

[ 2753.194083] ([<0000000000112fb2>] show_trace+0x12a/0x130)
[ 2753.194092]  [<0000000000113024>] show_stack+0x6c/0xe8
[ 2753.194094]  [<00000000001ee6a8>] rcu_pending+0x358/0xa48
[ 2753.194099]  [<00000000001f20cc>] rcu_check_callbacks+0x84/0x168
[ 2753.194102]  [<0000000000167654>] update_process_times+0x54/0x80
[ 2753.194107]  [<00000000001bdb5c>] tick_sched_handle.isra.16+0x4c/0x60
[ 2753.194113]  [<00000000001bdbd8>] tick_sched_timer+0x68/0x90
[ 2753.194115]  [<0000000000182a88>] __run_hrtimer+0x88/0x1f8
[ 2753.194119]  [<00000000001838ba>] hrtimer_interrupt+0x122/0x2b0
[ 2753.194121]  [<000000000010d034>] do_extint+0x16c/0x170
[ 2753.194123]  [<00000000005e206e>] ext_skip+0x38/0x3e
[ 2753.194129]  [<000000000012157c>] gmap_test_and_clear_dirty+0xcc/0x118
[ 2753.194134] ([<00000000001214ea>] gmap_test_and_clear_dirty+0x3a/0x118)
[ 2753.194137]  [<0000000000132da4>] kvm_vm_ioctl_get_dirty_log+0xd4/0x1b0
[ 2753.194143]  [<000000000012ac12>] kvm_vm_ioctl+0x21a/0x548
[ 2753.194146]  [<00000000002b57f6>] do_vfs_ioctl+0x30e/0x518
[ 2753.194149]  [<00000000002b5a9c>] SyS_ioctl+0x9c/0xb0
[ 2753.194151]  [<00000000005e1ae6>] sysc_tracego+0x14/0x1a
[ 2753.194153]  [<000003ffb75f3972>] 0x3ffb75f3972

We should do a cond_resched in here.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:57 +01:00
Christian Borntraeger ab99a1cc7a KVM: s390: do not take mmap_sem on dirty log query
Dirty log query can take a long time for huge guests.
Holding the mmap_sem for very long times  can cause some unwanted
latencies.
Turns out that we do not need to hold the mmap semaphore.
We hold the slots_lock for gfn->hva translation and walk the page
tables with that address, so no need to look at the VMAs. KVM also
holds a reference to the mm, which should prevent other things
going away. During the walk we take the necessary ptl locks.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:56 +01:00
David Hildenbrand efa48163b8 KVM: s390: remove old fragment of vector registers
Since commit 9977e886cb ("s390/kernel: lazy restore fpu registers"),
vregs in struct sie_page is unsed. We can safely remove the field and
the definition.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:54 +01:00
David Hildenbrand 9b0d721a07 KVM: s390: instruction-fetching exceptions on SIE faults
On instruction-fetch exceptions, we have to forward the PSW by any
valid ilc and correctly use that ilc when injecting the irq. Injection
will already take care of rewinding the PSW if we injected a nullifying
program irq, so we don't need special handling prior to injection.

Until now, autodetection would have guessed an ilc of 0.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:54 +01:00
David Hildenbrand 5631792053 KVM: s390: provide prog irq ilc on SIE faults
On SIE faults, the ilc cannot be detected automatically, as the icptcode
is 0. The ilc indicated in the program irq will always be 0. Therefore we
have to manually specify the ilc in order to tell the guest which ilen was
used when forwarding the PSW.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:53 +01:00
David Hildenbrand eaa4f41642 KVM: s390: irq delivery should not rely on icptcode
Program irq injection during program irq intercepts is the last candidates
that injects nullifying irqs and relies on delivery to do the right thing.

As we should not rely on the icptcode during any delivery (because that
value will not be migrated), let's add a flag, telling prog IRQ delivery
to not rewind the PSW in case of nullifying prog IRQs.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:53 +01:00
David Hildenbrand f6af84e7e7 KVM: s390: clean up prog irq injection on prog irq icpts
__extract_prog_irq() is used only once for getting the program check data
in one place. Let's combine it with an injection function to avoid a memset
and to prevent misuse on injection by simplifying the interface to only
have the VCPU as parameter.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:52 +01:00
David Hildenbrand 6597732275 KVM: s390: read the correct opcode on SIE faults
Let's use our fresh new function read_guest_instr() to access
guest storage via the correct addressing schema.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:51 +01:00
David Hildenbrand 34346b9a93 KVM: s390: gaccess: implement instruction fetching mode
When an instruction is to be fetched, special handling applies to
secondary-space mode and access-register mode. The instruction is to be
fetched from primary space.

We can easily support this by selecting the right asce for translation.
Access registers will never be used during translation, so don't
include them in the interface. As we only want to read from the current
PSW address for now, let's also hide that detail.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:51 +01:00
David Hildenbrand 92c9632119 KVM: s390: gaccess: introduce access modes
We will need special handling when fetching instructions, so let's
introduce new guest access modes GACC_FETCH and GACC_STORE instead
of a write flag. An additional patch will then introduce GACC_IFETCH.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:50 +01:00
David Hildenbrand 634790b827 KVM: s390: migration / injection of prog irq ilc
We have to migrate the program irq ilc and someday we will have to
specify the ilc without KVM trying to autodetect the value.

Let's reuse one of the spare fields in our program irq that should
always be set to 0 by user space. Because we also want to make use
of 0 ilcs ("not available"), we need a validity indicator.

If no valid ilc is given, we try to autodetect the ilc via the current
icptcode and icptstatus + parameter and store the valid ilc in the
irq structure.

This has a nice effect: QEMU's making use of KVM_S390_IRQ /
KVM_S390_SET_IRQ_STATE / KVM_S390_GET_IRQ_STATE for migration will
directly migrate the ilc without any changes.

Please note that we use bit 0 as validity and bit 1,2 for the ilc, so
by applying the ilc mask we directly get the ilen which is usually what
we work with.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:50 +01:00
David Hildenbrand 0e8bc06a2f KVM: s390: PSW forwarding / rewinding / ilc rework
We have some confusion about ilc vs. ilen in our current code. So let's
correctly use the term ilen when dealing with (ilc << 1).

Program irq injection didn't take care of the correct ilc in case of
irqs triggered by EXECUTE functions, let's provide one function
kvm_s390_get_ilen() to take care of all that.

Also, manually specifying in intercept handlers the size of the
instruction (and sometimes overwriting that value for EXECUTE internally)
doesn't make too much sense. So also provide the functions:
- kvm_s390_retry_instr to retry the currently intercepted instruction
- kvm_s390_rewind_psw to rewind the PSW without internal overwrites
- kvm_s390_forward_psw to forward the PSW

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:49 +01:00
David Hildenbrand 6fd8e67dd8 KVM: s390: sync of fp registers via kvm_run
As we already store the floating point registers in the vector save area
in floating point register format when we don't have MACHINE_HAS_VX, we can
directly expose them to user space using a new sync flag.

The floating point registers will be valid when KVM_SYNC_FPRS is set. The
fpc will also be valid when KVM_SYNC_FPRS is set.

Either KVM_SYNC_FPRS or KVM_SYNC_VRS will be enabled, never both.

Let's also change two positions where we access vrs, making the code easier
to read and one comment superfluous.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:49 +01:00
David Hildenbrand f6aa6dc449 KVM: s390: allow sync of fp registers via vregs
If we have MACHINE_HAS_VX, the floating point registers are stored
in the vector register format, event if the guest isn't enabled for vector
registers. So we can allow KVM_SYNC_VRS as soon as MACHINE_HAS_VX is
available.

This can in return be used by user space to support floating point
registers via struct kvm_run when the machine has vector registers.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:48 +01:00
Heiko Carstens 232f5dd785 s390/oprofile: fix address range for asynchronous stack
git commit dc7ee00d47 ("s390: lowcore stack pointer offsets")
introduced a regression in regard to s390_backtrace(). The stack
pointer for the asynchronous stack in the lowcore now has an
additional offset applied. This offset needs to be taken into account
in the calculation for the low and high address for the stack.

This bug was already partially fixed with commit 9cc5c206d9
("s390/dumpstack: fix address ranges for asynchronous and panic
stack"). This patch fixes it also for the oprofile code.

Fixes: dc7ee00d47 ("s390: lowcore stack pointer offsets")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:22 +01:00
Heiko Carstens 1f8cbb9c83 s390/perf_event: fix address range for asynchronous stack
git commit dc7ee00d47 ("s390: lowcore stack pointer offsets")
introduced a regression in regard to perf_callchain_kernel(). The
stack pointer for the asynchronous stack in the lowcore now has an
additional offset applied. This offset needs to be taken into account
in the calculation for the low and high address for the stack.

This bug was already partially fixed with 9cc5c206d9
("s390/dumpstack: fix address ranges for asynchronous and panic
stack"). This patch fixes it also for the perf_event code.

Fixes: dc7ee00d47 ("s390: lowcore stack pointer offsets")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:22 +01:00
Pratyush Anand e0115875c0 s390/stacktrace: add save_stack_trace_regs()
Implement save_stack_trace_regs, so that a stack trace of a kprobe
event can be obtained.

Without this we see following warning:
"save_stack_trace_regs() not implemented yet."
when we execute:
echo stacktrace > /sys/kernel/debug/tracing/trace_options
echo "p kfree" >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
[heiko.carstens@de.ibm.com]: changed patch to use __save_stack_trace()
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:22 +01:00
Heiko Carstens 66adce8f1f s390/stacktrace: save full stack traces
save_stack_trace() only saves the stack trace of the current context
(interrupt or process context). This is different to what other
architectures like x86 do, which save the full stack trace across
different contexts.

Also extract a __save_stack_trace() helper function which will be used
by a follow on patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:21 +01:00
Heiko Carstens f6331aaccb s390/stacktrace: add missing end marker
save_stack_trace() did not write the ULONG_MAX end marker if there is
enough space left. So simply follow x86 and arm64.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:21 +01:00
Heiko Carstens 9900c48c46 s390/stacktrace: fix address ranges for asynchronous and panic stack
git commit dc7ee00d47 ("s390: lowcore stack pointer offsets")
introduced a regression in regard to save_stack_trace(). The stack
pointer for the asynchronous and the panic stack in the lowcore now
have an additional offset applied to them. This offset needs to be
taken into account in the calculation for the low and high address for
the stacks.

This bug was already partially fixed with 9cc5c206d9
("s390/dumpstack: fix address ranges for asynchronous and panic
stack"). This patch fixes it also for the stacktrace code.

Fixes: dc7ee00d47 ("s390: lowcore stack pointer offsets")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:20 +01:00
Heiko Carstens 665ca9187c s390/stacktrace: fix save_stack_trace_tsk() for current task
The function save_stack_trace_tsk() did not consider that it can be
used for tsk == current, for which the current stack pointer obviously
cannot be found in the thread structure.

Fix this and get the stack pointer with an inline assembly.

This fixes e.g. the output of "cat /proc/self/stack".

Before:
[<0000000000000000>]           (null)
[<ffffffffffffffff>] 0xffffffffffffffff

After:
[<000000000011b3ee>] save_stack_trace_tsk+0x56/0x98
[<0000000000366cde>] proc_pid_stack+0xae/0x108
[<00000000003636f0>] proc_single_show+0x70/0xc0
[<0000000000311fbc>] seq_read+0xcc/0x448
[<00000000002e7716>] __vfs_read+0x36/0x100
[<00000000002e872e>] vfs_read+0x76/0x130
[<00000000002e975e>] SyS_read+0x66/0xd8
[<000000000089490e>] system_call+0xd6/0x264
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-10 09:25:20 +01:00
Andrey Ryabinin 06bea3dbfe locking/lockdep: Eliminate lockdep_init()
Lockdep is initialized at compile time now.  Get rid of lockdep_init().

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: mm-commits@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 12:03:25 +01:00
Toshi Kani 35d98e93fe arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
"System RAM", "Kernel code", "Kernel data", and "Kernel bss".

Note that:

 - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
   IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
   as (IORESOURCE_MEM|IORESOURCE_SYSRAM).

 - Some archs do not set 'flags' for children nodes, such as
   "Kernel code".  This patch does not change 'flags' in this
   case.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 09:49:57 +01:00
Linus Torvalds 6b292a8abd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "An optimization for irq-restore, the SSM instruction is quite a bit
  slower than an if-statement and a STOSM.

  The copy_file_range system all is added.

  Cleanup for PCI and CIO.

  And a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: update measurement characteristics
  s390/cio: ensure consistent measurement state
  s390/cio: fix measurement characteristics memleak
  s390/zcrypt: Fix cryptographic device id in kernel messages
  s390/pci: remove iomap sanity checks
  s390/pci: set error state for unusable functions
  s390/pci: fix bar check
  s390/pci: resize iomap
  s390/pci: improve ZPCI_* macros
  s390/pci: provide ZPCI_ADDR macro
  s390/pci: adjust IOMAP_MAX_ENTRIES
  s390/numa: move numa_init_late() from device to arch_initcall
  s390: remove all usages of PSW_ADDR_INSN
  s390: remove all usages of PSW_ADDR_AMODE
  s390: wire up copy_file_range syscall
  s390: remove superfluous memblock_alloc() return value checks
  s390/numa: allocate memory with correct alignment
  s390/irqflags: optimize irq restore
  s390/mm: use TASK_MAX_SIZE where applicable
2016-01-29 16:05:18 -08:00
Linus Torvalds f0ce3ff42e s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWp5FIAAoJEL/70l94x66DozoIAKWFLnwi6PQ7Sm+Tvr0Sl/mp
 yGDM+PuVyG4C6bPS66xd4XkXEFIwfpIJQzq1Zt7Xg0m27t50DRtmInLJU4ql7MFD
 vYn3h6lOudtUR0i5kPYSy6SMehFjx8wS0GX+O+iX9tMlYI0vGWEJ+7C06FmnqpGP
 RUUjnVvljAMih8wsAHLOjH8rwZHnO+EfHNi+V7Q+eFBHJnn2R06IqK8z5DmTBeTQ
 ZecNdZslX5qZvMNulTo7nC2P6VShkdRuMJkLEFeSaTlCqx5WcnzgwEiMMavUeYI8
 aIWuE51v0RMxRhsvRXUQOeRpRgU0AF2OYJEKXBRbO3P6IRikFohJq4QH67zN2HA=
 =54tv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on
  s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM doc: Fix KVM_SMI chapter number
  KVM: s390: fix memory overwrites when vx is disabled
  KVM: s390: Enable the KVM-VFIO device
  KVM: s390: fix guest fprs memory leak
  KVM: PPC: Fix ONE_REG AltiVec support
  KVM: PPC: Increase memslots to 512
  KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s'
  KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
  KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
2016-01-27 10:50:42 -08:00
David Hildenbrand 9abc2a08a7 KVM: s390: fix memory overwrites when vx is disabled
The kernel now always uses vector registers when available, however KVM
has special logic if support is really enabled for a guest. If support
is disabled, guest_fpregs.fregs will only contain memory for the fpu.
The kernel, however, will store vector registers into that area,
resulting in crazy memory overwrites.

Simply extending that area is not enough, because the format of the
registers also changes. We would have to do additional conversions, making
the code even more complex. Therefore let's directly use one place for
the vector/fpu registers + fpc (in kvm_run). We just have to convert the
data properly when accessing it. This makes current code much easier.

Please note that vector/fpu registers are now always stored to
vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
used for migration, we only guarantee valid values to user space  when
KVM_SYNC_VRS is set. As that is only the case when we have vector
register support, we are on the safe side.

Fixes: b5510d9b68 ("s390/fpu: always enable the vector facility if it is available")
Cc: stable@vger.kernel.org # v4.4 d9a3a09af5 s390/kvm: remove dependency on struct save_area definition
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[adopt to d9a3a09af5]
2016-01-26 15:40:21 +01:00
Dong Jia Shi 14b0b4ac37 KVM: s390: Enable the KVM-VFIO device
The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them.
While we don't need this on s390 currently, let's try to be like
everyone else.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-26 15:40:17 +01:00
David Hildenbrand 9c7ebb613b KVM: s390: fix guest fprs memory leak
fprs is never freed, therefore resulting in a memory leak if
kvm_vcpu_init() fails or the vcpu is destroyed.

Fixes: 9977e886cb ("s390/kernel: lazy restore fpu registers")
Cc: stable@vger.kernel.org # v4.3+
Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-26 15:40:09 +01:00
Sebastian Ott f5e44f82c1 s390/pci: remove iomap sanity checks
Since each iomap_entry handles only one bar of one pci function
(even when disjunct ranges of a bar are mapped) the sanity check
in pci_iomap_range is not needed and can be removed.

Also convert the remaining BUG_ONs to WARN_ONs.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:46:45 +01:00
Sebastian Ott 8ead7efb63 s390/pci: set error state for unusable functions
We receive special notifications from firmware when an error was detected
and a pci function became unusable. Set the error_state accordingly to give
device drivers a hint that they don't need to try error recovery.

Suggested-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:46:28 +01:00
Sebastian Ott c0cabaddee s390/pci: fix bar check
Fix the check which bar space we should map to allow available bars only.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:46:17 +01:00
Sebastian Ott c506fff3d3 s390/pci: resize iomap
On s390 we need to maintain a mapping between iomem addresses
and arch specific function identifiers. Currently the mapping
table is created as such that we could span the whole iomem
address space. Since we can only map each bar space from each
possible function we have an upper bound for the number of
mapping entries.

This reduces the size of the iomap from 256K to less than 4K
(using the defconfig).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:58 +01:00
Sebastian Ott bf19c94d5c s390/pci: improve ZPCI_* macros
Most of the constants defined in pci_io.h depend on each other
and thus can be calculated.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:49 +01:00
Sebastian Ott 9e00caaea1 s390/pci: provide ZPCI_ADDR macro
Provide and use a ZPCI_ADDR macro as the complement of ZPCI_IDX
to get rid of some constants in the code.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:41 +01:00
Sebastian Ott c2e1fcf3ec s390/pci: adjust IOMAP_MAX_ENTRIES
ZPCI_IOMAP_MAX_ENTRIES is off by one. Let's adjust this
for the sake of correctness.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:33 +01:00
Michael Holzheu 2d0f76a6ca s390/numa: move numa_init_late() from device to arch_initcall
Commit 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
moves hugetlb_init() from module_init to subsys_initcall.

The hugetlb_init()->hugetlb_register_node() code accesses "node->dev.kobj"
which is initialized in numa_init_late().

Since numa_init_late() is a device_initcall which is called *after*
subsys_initcall the above mentioned patch breaks NUMA on s390.

So fix this and move numa_init_late() to arch_initcall.

Fixes: 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:24 +01:00
Al Viro 5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Christoph Hellwig e1c7e32453 dma-mapping: always provide the dma_map_ops based implementation
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Heiko Carstens 9cb1ccecb6 s390: remove all usages of PSW_ADDR_INSN
Yet another leftover from the 31 bit era. The usual operation
"y = x & PSW_ADDR_INSN" with the PSW_ADDR_INSN mask is a nop for
CONFIG_64BIT.

Therefore remove all usages and hope the code is a bit less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2016-01-19 12:14:03 +01:00
Heiko Carstens fecc868a66 s390: remove all usages of PSW_ADDR_AMODE
This is a leftover from the 31 bit area. For CONFIG_64BIT the usual
operation "y = x | PSW_ADDR_AMODE" is a nop. Therefore remove all
usages of PSW_ADDR_AMODE and make the code a bit less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2016-01-19 12:14:02 +01:00
Heiko Carstens 696aafd369 s390: wire up copy_file_range syscall
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19 12:14:02 +01:00
Heiko Carstens cd17e153af s390: remove superfluous memblock_alloc() return value checks
memblock_alloc() and memblock_alloc_base() will panic on their own if
they can't find free memory. Therefore remove some pointless checks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19 12:14:02 +01:00
Heiko Carstens ef1f7fd7eb s390/numa: allocate memory with correct alignment
Allocating memory with a requested minimum alignment of 1 is wrong
since pg_data_t contains a spinlock which requires an alignment of 4
bytes.

Therefore fix this and ask for an alignment of 8 bytes like it is
guarenteed for all kmalloc requests.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19 12:14:01 +01:00
Christian Borntraeger 204ee2c564 s390/irqflags: optimize irq restore
The ssm instruction takes longer that stnsm/stosm as it is often
used to modify DAT and PER. We know that irqsave/irqrestore only
deals with external and I/O interrupts and we know that irqrestore
can transition only from disabled->disabled or disabled->enabled,
so we can use the faster stosm.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-19 12:14:01 +01:00
Dominik Dingel a9d7ab9781 s390/mm: use TASK_MAX_SIZE where applicable
To improve readability we can use TASK_MAX_SIZE when we just check for the
upper limit.  All places explicitly dealing with 3 vs 4 level pgtables
were left unchanged.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
2016-01-19 12:14:01 +01:00
Linus Torvalds a200dcb346 virtio: barrier rework+fixes
This adds a new kind of barrier, and reworks virtio and xen
 to use it.
 Plus some fixes here and there.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z
 DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H
 YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq
 ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz
 k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4
 gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc=
 =ukaV
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio barrier rework+fixes from Michael Tsirkin:
 "This adds a new kind of barrier, and reworks virtio and xen to use it.

  Plus some fixes here and there"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
  checkpatch: add virt barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch.pl: add missing memory barriers
  virtio: make find_vqs() checkpatch.pl-friendly
  virtio_balloon: fix race between migration and ballooning
  virtio_balloon: fix race by fill and leak
  s390: more efficient smp barriers
  s390: use generic memory barriers
  xen/events: use virt_xxx barriers
  xen/io: use virt_xxx barriers
  xenbus: use virt_xxx barriers
  virtio_ring: use virt_store_mb
  sh: move xchg_cmpxchg to a header by itself
  sh: support 1 and 2 byte xchg
  virtio_ring: update weak barriers to use virt_xxx
  Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
  asm-generic: implement virt_xxx memory barriers
  x86: define __smp_xxx
  xtensa: define __smp_xxx
  tile: define __smp_xxx
  ...
2016-01-18 16:44:24 -08:00
Miroslav Benes 383bf44d1a livepatch: change the error message in asm/livepatch.h header files
If anyone includes asm/livepatch.h when CONFIG_LIVEPATCH=n the build
fails with the existing error message. Change it to something saner.

[jkosina@suse.cz: fixed changelog typo spotted by Josh]
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-01-18 21:35:43 +01:00
Will Deacon da48d094ce Kconfig: remove HAVE_LATENCYTOP_SUPPORT
As illustrated by commit a3afe70b83 ("[S390] latencytop s390
support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
advertise an implementation of save_stack_trace_tsk.

However, as of 9212ddb5ea ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y.  Given
that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Helge Deller <deller@gmx.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 11:17:23 -08:00
Dominik Dingel fef8953ae4 s390/mm: enable fixup_user_fault retrying
By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd.  As during these retries we might drop the mmap_sem
we need to check if that happened and redo the complete chain of
actions.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dominik Dingel 4a9e1cda27 mm: bring in additional flag for fixup_user_fault to signal unlock
During Jason's work with postcopy migration support for s390 a problem
regarding gmap faults was discovered.

The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault.  Till now we never cared about retries, but as the
userfaultfd code kind of relies on it.  this needs some fix.

This patchset does not take care of the futex code.  I will now look
closer at this.

This patch (of 2):

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
faulting we ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and
so a caller wasn't able to handle that case.  So we now changed the
behaviour to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov fecffad254 s390, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov ddc58f27f9 mm: drop tail page refcounting
Tail page refcounting is utterly complicated and painful to support.

It uses ->_mapcount on tail pages to store how many times this page is
pinned.  get_page() bumps ->_mapcount on tail page in addition to
->_count on head.  This information is required by split_huge_page() to
be able to distribute pins from head of compound page to tails during
the split.

We will need ->_mapcount to account PTE mappings of subpages of the
compound page.  We eliminate need in current meaning of ->_mapcount in
tail pages by forbidding split entirely if the page is pinned.

The only user of tail page refcounting is THP which is marked BROKEN for
now.

Let's drop all this mess.  It makes get_page() and put_page() much
simpler.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Linus Torvalds 875fc4f5dd Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - A few hotfixes which missed 4.4 becasue I was asleep.  cc'ed to
   -stable

 - A few misc fixes

 - OCFS2 updates

 - Part of MM.  Including pretty large changes to page-flags handling
   and to thp management which have been buffered up for 2-3 cycles now.

  I have a lot of MM material this time.

[ It turns out the THP part wasn't quite ready, so that got dropped from
  this series  - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  zsmalloc: reorganize struct size_class to pack 4 bytes hole
  mm/zbud.c: use list_last_entry() instead of list_tail_entry()
  zram/zcomp: do not zero out zcomp private pages
  zram: pass gfp from zcomp frontend to backend
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  mm: add tracepoint for scanning pages
  drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
  mm/page_isolation: use macro to judge the alignment
  mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
  mm: rework virtual memory accounting
  include/linux/memblock.h: fix ordering of 'flags' argument in comments
  mm: move lru_to_page to mm_inline.h
  Documentation/filesystems: describe the shared memory usage/accounting
  memory-hotplug: don't BUG() in register_memory_resource()
  hugetlb: make mm and fs code explicitly non-modular
  mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
  mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
  mm: make sure isolate_lru_page() is never called for tail page
  vmstat: make vmstat_updater deferrable again and shut down on idle
  ...
2016-01-15 11:41:44 -08:00
Linus Torvalds 0f0836b7eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:

 - RO/NX attribute fixes for patch module relocations from Josh
   Poimboeuf.  As part of this effort, module.c has been cleaned up as
   well and livepatching is piggy-backing on this cleanup.  Rusty is OK
   with this whole lot going through livepatching tree.

 - symbol disambiguation support from Chris J Arges.  That series is
   also

        Reviewed-by: Miroslav Benes <mbenes@suse.cz>

   but this came in only after I've alredy pushed out.  Didn't want to
   rebase because of that, hence I am mentioning it here.

 - symbol lookup fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Cleanup module page permission changes
  module: keep percpu symbols in module's symtab
  module: clean up RO/NX handling.
  module: use a structure to encapsulate layout.
  gcov: use within_module() helper.
  module: Use the same logic for setting and unsetting RO/NX
  livepatch: function,sympos scheme in livepatch sysfs directory
  livepatch: add sympos as disambiguator field to klp_reloc
  livepatch: add old_sympos as disambiguator field to klp_func
2016-01-14 16:38:02 -08:00
Jerome Marchand eca56ff906 mm, shmem: add internal shmem resident memory accounting
Currently looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem pages
are mapped to /dev/zero), even though their implication in actual memory
use is quite different.

The internal accounting currently counts shmem pages together with
regular files.  As a preparation to extend the userspace interfaces,
this patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages separately from MM_FILEPAGES.  The next patch will expose it
to userspace - this patch doesn't change the exported values yet, by
adding up MM_SHMEMPAGES to MM_FILEPAGES at places where MM_FILEPAGES was
used before.  The only user-visible change after this patch is the OOM
killer message that separates the reported "shmem-rss" from "file-rss".

[vbabka@suse.cz: forward-porting, tweak changelog]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Linus Torvalds d080827f85 libnvdimm for 4.5
1/ Media error handling: The 'badblocks' implementation that originated
    in md-raid is up-levelled to a generic capability of a block device.
    This initial implementation is limited to being consulted in the pmem
    block-i/o path.  Later, 'badblocks' will be consulted when creating
    dax mappings.
 
 2/ Raw block device dax: For virtualization and other cases that want
    large contiguous mappings of persistent memory, add the capability to
    dax-mmap a block device directly.
 
 3/ Increased /dev/mem restrictions: Add an option to treat all io-memory
    as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is
    actively using an address range.  This behavior is controlled via the
    new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the
    existing "iomem=relaxed" kernel command line option.
 
 4/ Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
    block device shutdown crash fix, and other small libnvdimm fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWlrhjAAoJEB7SkWpmfYgCFbAQALKsQfFwT6JFS+zlPgiNpbqw
 2VMNKEH0AfGYGj96mT02j2q+vSUmXLMIDMTsbe0sDdtwFZtQbFmhmryzPWUVppSu
 KGTlLPW8vuEhQVs91+UI3BQKkvpi0+tbR8hPOh9W6QhjpRT+lyHFKnsNR5HZy5wB
 K4/VMaT5ffd5/pXRTjkYiPQYTwWyfcvNjICj0YtqhPvOwS031m77JpFsWJ8HSpEX
 K99VlzNUPMXd1pYkHmFNXWw52fhRGNhwAEomLeKMdQfKms+KnbKp8BOSA0aCqU8E
 kpujQcilDXJwykFQZOFI3Z5Dxvrv8lxFTU8HRMBvo3ESzfTWjfqcvyjGOjDUcruw
 ihESFSJtdZzhrBiMnf9RRqSpMFJvAT8MVT6Q4D3mZUHCMPbUqFJsQjMPt9hEH3ho
 4F0D2lesOCkubUKFTZmjMoDb+szuKbVhYK8TeFVVEhizinc/Aj0NKuazJqi+CXB/
 xh0ER4ZxD8wvzqFFWvS5UvR1G9I5fr7+3jGRUrqGLHlSdeXP9dkEg28ao3QbWk3x
 1dPOen6ZqQ9WJ/E7eGmXbVEz2R4Xd79hMXQzdQwmKDk/KbxRoAp7hyU8BslAyrBf
 HCdmVt+RAgrxZYfFRXuLhqwEBThJnNrgZA3qu74FUpkpFg6xRUu1bAYBiF7N+bFi
 82b5UbMkveBTtkXjJoiR
 =7V5r
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this has appeared in -next and independently received a
  build success notification from the kbuild robot.  The 'for-4.5/block-
  dax' topic branch was rebased over the weekend to drop the "block
  device end-of-life" rework that Al would like to see re-implemented
  with a notifier, and to address bug reports against the badblocks
  integration.

  There is pending feedback against "libnvdimm: Add a poison list and
  export badblocks" received last week.  Linda identified some localized
  fixups that we will handle incrementally.

  Summary:

   - Media error handling: The 'badblocks' implementation that
     originated in md-raid is up-levelled to a generic capability of a
     block device.  This initial implementation is limited to being
     consulted in the pmem block-i/o path.  Later, 'badblocks' will be
     consulted when creating dax mappings.

   - Raw block device dax: For virtualization and other cases that want
     large contiguous mappings of persistent memory, add the capability
     to dax-mmap a block device directly.

   - Increased /dev/mem restrictions: Add an option to treat all
     io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access
     while a driver is actively using an address range.  This behavior
     is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be
     overridden by the existing "iomem=relaxed" kernel command line
     option.

   - Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
     block device shutdown crash fix, and other small libnvdimm fixes"

* tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits)
  block: kill disk_{check|set|clear|alloc}_badblocks
  libnvdimm, pmem: nvdimm_read_bytes() badblocks support
  pmem, dax: disable dax in the presence of bad blocks
  pmem: fail io-requests to known bad blocks
  libnvdimm: convert to statically allocated badblocks
  libnvdimm: don't fail init for full badblocks list
  block, badblocks: introduce devm_init_badblocks
  block: clarify badblocks lifetime
  badblocks: rename badblocks_free to badblocks_exit
  libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h
  libnvdimm: Add a poison list and export badblocks
  nfit_test: Enable DSMs for all test NFITs
  md: convert to use the generic badblocks code
  block: Add badblock management for gendisks
  badblocks: Add core badblock management code
  block: fix del_gendisk() vs blkdev_ioctl crash
  block: enable dax for raw block devices
  block: introduce bdev_file_inode()
  restrict /dev/mem to idle io memory ranges
  arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
  ...
2016-01-13 19:15:14 -08:00
Linus Torvalds cbd88cd4c0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "Among the traditional bug fixes and cleanups are some improvements:

   - A tool to generated the facility lists, generating the bit fields
     by hand has been a source of bugs in the past

   - The spinlock loop is reordered to avoid bursts of hypervisor calls

   - Add support for the open-for-business interface to the service
     element

   - The get_cpu call is added to the vdso

   - A set of tracepoints is defined for the common I/O layer

   - The deprecated sclp_cpi module is removed

   - Update default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (56 commits)
  s390/sclp: fix possible control register corruption
  s390: fix normalization bug in exception table sorting
  s390/configs: update default configurations
  s390/vdso: optimize getcpu system call
  s390: drop smp_mb in vdso_init
  s390: rename struct _lowcore to struct lowcore
  s390/mem_detect: use unsigned longs
  s390/ptrace: get rid of long longs in psw_bits
  s390/sysinfo: add missing SYSIB 1.2.2 multithreading fields
  s390: get rid of CONFIG_SCHED_MC and CONFIG_SCHED_BOOK
  s390/Kconfig: remove pointless 64 bit dependencies
  s390/dasd: fix failfast for disconnected devices
  s390/con3270: testing return kzalloc retval
  s390/hmcdrv: constify hmcdrv_ftp_ops structs
  s390/cio: add NULL test
  s390/cio: Change I/O instructions from inline to normal functions
  s390/cio: Introduce common I/O layer tracepoints
  s390/cio: Consolidate inline assemblies and related data definitions
  s390/cio: Fix incorrect xsch opcode specification
  s390/cio: Remove unused inline assemblies
  ...
2016-01-13 13:16:16 -08:00
Linus Torvalds aee3bfa330 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from Davic Miller:

 1) Support busy polling generically, for all NAPI drivers.  From Eric
    Dumazet.

 2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

 3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

 6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

 9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers.  From Kalle Valo.

10) Provide a way for administrators "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation.  From Lorenzo Colitti.

11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

14) Dead route support in MPLS, from Roopa Prabhu.

15) TSO support for thunderx chips, from Sunil Goutham.

16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack.  From Tom Herbert.

18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
  net: bnxt: always return values from _bnxt_get_max_rings
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  phy: remove an unneeded condition
  mdio: remove an unneed condition
  mdio_bus: NULL dereference on allocation error
  net: Fix typo in netdev_intersect_features
  net: freescale: mac-fec: Fix build error from phy_device API change
  net: freescale: ucc_geth: Fix build error from phy_device API change
  bonding: Prevent IPv6 link local address on enslaved devices
  IB/mlx5: Add flow steering support
  net/mlx5_core: Export flow steering API
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Managing root flow table
  ...
2016-01-12 18:57:02 -08:00
Linus Torvalds 33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Linus Torvalds 1baa5efbeb * s390: Support for runtime instrumentation within guests,
support of 248 VCPUs.
 
 * ARM: rewrite of the arm64 world switch in C, support for
 16-bit VM identifiers.  Performance counter virtualization
 missed the boat.
 
 * x86: Support for more Hyper-V features (synthetic interrupt
 controller), MMU cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu
 kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT
 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD
 /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY
 YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i
 cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ=
 =PIj1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC changes will come next week.

   - s390: Support for runtime instrumentation within guests, support of
     248 VCPUs.

   - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM
     identifiers.  Performance counter virtualization missed the boat.

   - x86: Support for more Hyper-V features (synthetic interrupt
     controller), MMU cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits)
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  kvm/x86: Hyper-V SynIC timers tracepoints
  kvm/x86: Hyper-V SynIC tracepoints
  kvm/x86: Update SynIC timers on guest entry only
  kvm/x86: Skip SynIC vector check for QEMU side
  kvm/x86: Hyper-V fix SynIC timer disabling condition
  kvm/x86: Reorg stimer_expiration() to better control timer restart
  kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
  kvm/x86: Drop stimer_stop() function
  kvm/x86: Hyper-V timers fix incorrect logical operation
  KVM: move architecture-dependent requests to arch/
  KVM: renumber vcpu->request bits
  KVM: document which architecture uses each request bit
  KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests
  kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock
  KVM: s390: implement the RI support of guest
  kvm/s390: drop unpaired smp_mb
  kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa
  KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
  arm/arm64: KVM: Detect vGIC presence at runtime
  ...
2016-01-12 13:22:12 -08:00
Michael S. Tsirkin 779a6a3696 s390: more efficient smp barriers
As per: lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
atomics imply a barrier on s390, so s390 should change
smp_mb__before_atomic and smp_mb__after_atomic to barrier() instead of
smp_mb() and hence should not use the generic versions.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:47:05 +02:00
Michael S. Tsirkin a677f48695 s390: use generic memory barriers
The s390 kernel is SMP to 99.99%, we just didn't bother with a
non-smp variant for the memory-barriers. If the generic header
is used we'd get the non-smp version for free. It will save a
small amount of text space for CONFIG_SMP=n.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:47:04 +02:00
Michael S. Tsirkin 82b44496ab s390: define __smp_xxx
This defines __smp_xxx barriers for s390,
for use by virtualization.

Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barriers.h

Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
unconditionally on this architecture.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:46:56 +02:00
Michael S. Tsirkin 21535aaed9 s390: reuse asm-generic/barrier.h
On s390 read_barrier_depends, smp_read_barrier_depends
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:46:48 +02:00
Davidlohr Bueso 5a1b26d7c6 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-01-12 20:46:46 +02:00
Linus Torvalds 24af98c4cf Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "So we have a laundry list of locking subsystem changes:

   - continuing barrier API and code improvements

   - futex enhancements

   - atomics API improvements

   - pvqspinlock enhancements: in particular lock stealing and adaptive
     spinning

   - qspinlock micro-enhancements"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
  futex: Cleanup the goto confusion in requeue_pi()
  futex: Remove pointless put_pi_state calls in requeue()
  futex: Document pi_state refcounting in requeue code
  futex: Rename free_pi_state() to put_pi_state()
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
  lcoking/barriers, arch: Use smp barriers in smp_store_release()
  locking/cmpxchg, arch: Remove tas() definitions
  locking/pvqspinlock: Queue node adaptive spinning
  locking/pvqspinlock: Allow limited lock stealing
  locking/pvqspinlock: Collect slowpath lock statistics
  sched/core, locking: Document Program-Order guarantees
  locking, sched: Introduce smp_cond_acquire() and use it
  locking/pvqspinlock, x86: Optimize the PV unlock code path
  locking/qspinlock: Avoid redundant read of next pointer
  locking/qspinlock: Prefetch the next node cacheline
  locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
  atomics: Add test for atomic operations with _relaxed variants
2016-01-11 14:18:38 -08:00
Ard Biesheuvel bcb7825a77 s390: fix normalization bug in exception table sorting
The normalization pass in the sorting routine of the relative exception
table serves two purposes:
- it ensures that the address fields of the exception table entries are
  fully ordered, so that no ambiguities arise between entries with
  identical instruction offsets (i.e., when two instructions that are
  exactly 8 bytes apart each have an exception table entry associated with
  them)
- it ensures that the offsets of both the instruction and the fixup fields
  of each entry are relative to their final location after sorting.

Commit eb608fb366 ("s390/exceptions: switch to relative exception table
entries") ported the relative exception table format from x86, but modified
the sorting routine to only normalize the instruction offset field and not
the fixup offset field. The result is that the fixup offset of each entry
will be relative to the original location of the entry before sorting,
likely leading to crashes when those entries are dereferenced.

Fixes: eb608fb366 ("s390/exceptions: switch to relative exception table entries")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 13:02:28 +01:00
Heiko Carstens cb14def6a9 s390/configs: update default configurations
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 13:01:28 +01:00
Martin Schwidefsky 249c543b97 s390/vdso: optimize getcpu system call
Add the CPU number to the per-cpu vdso data page and add the
__kernel_getcpu function to the vdso object to retrieve the
CPU number in user space.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 13:01:24 +01:00
Michael S. Tsirkin 0dab3e0e59 s390: drop smp_mb in vdso_init
The initial s390 vdso code is heavily influenced by the powerpc version
which does have a smp_wmb in vdso_init right before the vdso_ready=1
assignment. s390 has no need for that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <1452010645-25380-1-git-send-email-mst@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:22 +01:00
Heiko Carstens c667aeacc1 s390: rename struct _lowcore to struct lowcore
Finally get rid of the leading underscore. I tried this already two or
three years ago, however Michael Holzheu objected since this would
break the crash utility (again).

However Michael integrated support for the new name into the crash
utility back then, so it doesn't break if the name will be changed
now.  So finally get rid of the ever confusing leading underscore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:15 +01:00
Heiko Carstens 423d5b364c s390/mem_detect: use unsigned longs
The memory detection code historically had to use unsigned long long
since the machine reported the true memory size (>4GB) even if the
virtual machine was running in ESA/390 mode.

Since the old code is gone use unsigned long everywhere and also get
rid of an unused ADDR2G define.

(this patch converts all long longs within sclp_info to longs)

There are many more possible conversions, however that can be done if
somebody touches the corresponding code.  Since people started to
convert unrelated long types to long longs because of the types within
struct sclp_info convert this now.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:11 +01:00
Heiko Carstens cb951785a7 s390/ptrace: get rid of long longs in psw_bits
The long longs were introduced by me in order to have a working
definition of the struct psw_bits also in 31 bit mode. Since that is
gone also get rid of the long longs.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:07 +01:00
Heiko Carstens 7022ec4960 s390/sysinfo: add missing SYSIB 1.2.2 multithreading fields
Add missing multithreading fields of SYSIB 1.2.2 (Basic-Machine CPUs)
to the output of /proc/sysinfo.

Also use bitfields for SYSIB 2.2.2 to simplify the C code a bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:00 +01:00
Dan Williams 21266be9ed arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
Let all the archs that implement devmem_is_allowed() opt-in to a common
definition of CONFIG_STRICT_DEVM in lib/Kconfig.debug.

Cc: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko: drop 'default y' for s390]
Acked-by: Ingo Molnar <mingo@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-09 06:30:49 -08:00
Al Viro bdb97e91e0 [s390] page_to_phys() always returns a multiple of PAGE_SIZE
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:16:04 -05:00
Paolo Bonzini 2860c4b167 KVM: move architecture-dependent requests to arch/
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08 19:04:36 +01:00
Fan Zhang c6e5f16637 KVM: s390: implement the RI support of guest
This patch adds runtime instrumentation support for KVM guest. We need to
setup a save area for the runtime instrumentation-controls control block(RICCB)
and implement the necessary interfaces to live migrate the guest settings.

We setup the sie control block in a way, that the runtime
instrumentation instructions of a guest are handled by hardware.

We also add a capability KVM_CAP_S390_RI to make this feature opt-in as
it needs migration support.

Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-07 14:48:26 +01:00
Michael S. Tsirkin c57ee5faf4 kvm/s390: drop unpaired smp_mb
smp_mb on vcpu destroy isn't paired with anything, violating pairing
rules, and seems to be useless.

Drop it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <1452010811-25486-1-git-send-email-mst@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-07 14:48:26 +01:00
Craig Gallek 538950a1b7 soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
David S. Miller c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Heiko Carstens 9236b4dd6b s390: get rid of CONFIG_SCHED_MC and CONFIG_SCHED_BOOK
Use CONFIG_TOPOLOGY which selects CONFIG_SCHED_* all over the place to
reduce the random usage of the previous config options.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-30 10:34:57 +01:00
Heiko Carstens c095a94932 s390/Kconfig: remove pointless 64 bit dependencies
s390 is always 64 bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-30 10:34:47 +01:00
Daniel Borkmann 8b614aebec bpf: move clearing of A/X into classic to eBPF migration prologue
Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3 ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 16:04:51 -05:00
Peter Oberparleiter 2ab59de7c5 s390/cio: Consolidate inline assemblies and related data definitions
Replace the current semi-arbitrary distribution of inline assemblies:
 - Inline assemblies used by CIO go into ioasm.h
 - Data definitions used by inline assemblies go into cio.h

Beyond cleaning up the current structure this is also required for
use of tracepoints in inline assemblies introduced by a follow-on
patch.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:34 +01:00
Christian Borntraeger d7ae65aec0 s390/setup: cleanup machine flags
Over time some machine flags got unused (e.g. MACHINE_FLAG_MVPG)
or are available on all 64bit systems (MACHINE_FLAG_CSP,
MACHINE_FLAG_IEEE) - let's remove them.
Reorder the other ones to match the order of the MACHINE_HAS_*
macros and renumber all bits to avoid holes.
Also fix the comment about where the flags are detected.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:32 +01:00
Heiko Carstens 3dbc78d3a1 s390/smp: save timestamp on external calls
This is supposed to make debugging easier: if within a dump we can see
that an external call or emergency signal IPI is pending but all cpus
are idle, we have no idea for how long the interrupt is outstanding.

Therefore save a timestamp into the per cpu pcpu array of the target
cpu whenever such an IPI is sent.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:31 +01:00
Christian Borntraeger e527aec434 s390/sysinfo: Remove unused variables
max_mnest and rc are never used.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:30 +01:00
Christian Borntraeger 90f405e859 s390/extmem: remove unused variable
findseg_scode is assigned, but never used.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:29 +01:00
Christian Borntraeger 292d8d7157 s390/fault: remove unused variable
address is assigned but never used.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:28 +01:00
Christian Borntraeger f9101e639c s390/traps: Remove unused variable
location is assigned but never used.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:27 +01:00
Heiko Carstens 6527f6e0cb s390: compile head.S always with -march=z900
head.S on s390 contains some sanity checks if the kernel will run on a
machine or if the machine is too old, e.g. if the kernel contains
instructions not available on the machine. If so, it will emit an error
message to the console before it stops execution.

Therefore head.S contains only instructions which are availanble with the
earliest machine generation (z900).  In order to make sure we don't
accidently add instructions which are not available on z900, always compile
with -march=z900. This makes sure compilation will fail if wrong
instructions are used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:26 +01:00
Heiko Carstens f3d05f9de7 s390/facilities: add z13 als bit
If configured for z13 assume the kernel makes use of the instructions
that are part of the load-and-zero-rightmost-byte facility and
load/store-on-condition facility 2.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:24 +01:00
Heiko Carstens 6ab6c31a54 s390/facilities: optimize test_facility()
test_facility() can be optimized for bits which must be set anyway,
due to the check in head.S. This removes a couple of superfluous
runtime checks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:23 +01:00
Heiko Carstens 185edd4431 s390/facilities: remove unneeded facility bits
The facility lists contain a lot of bits which are not necessary to
run the kernel.  Therefore remove them and keep only those bits which
are required for the kernel.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:22 +01:00
Heiko Carstens 0358ecf732 s390/facilities: make use of generated facility list
Change head.S to make use of the generated facility list.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:21 +01:00
Heiko Carstens c30f6828fe s390/facilities: add helper tool to generate facility lists
Modifying the architecture level set facility lists was always very
error prone. Given the numbering of the facility bits within the
Principles of Operation, where the most significant bit number is 0,
it happened a lot of times that wrong bits were set or cleared.

Therefore this patch adds a tool "gen_facilities" which generates
include/generated/facilites.h.  The definition of the bits to be set
is contained within arch/s390/include/asm/facilities_src.h and can be
easily extended to e.g. also generate such lists for the KVM module.

The generated file looks like this:

 #define FACILITIES_ALS _AC(0xc1006450f0040000,UL)
 #define FACILITIES_ALS_DWORDS 1

The facility bits defined in this patch match 1:1 to the current masks
that can be found in head.S.

That is if the tool gets executed with -march=z990 then the generated
masks will equal the masks in head.S for CONFIG_MARCH_Z990.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:20 +01:00
Heiko Carstens 76cdd44c2e s390/facilities: always use lowcore's stfle field for storing facility bits
head.s contains an stfle instruction which stores it result at the
storage location that is assigned to the stfl instruction.

This is currently no problem, since we only care about one double
word. However if the number of double words in the ALS bitfield grows
the current code is not very stable.

E.g. before issuing the stfle command the memory to which it stores
must be cleared, since the instruction may or may not clear memory
contents where no bits are set.

In order to simplify the code a bit always use the storage location
that we reserved for the stfle result.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:19 +01:00
Heiko Carstens 9552a66fe6 s390/facilities: use stfl mnemonic instead of insn magic
Now that 31 bit support is gone, the assembler always knows about the
stfl instruction. Therefore lets use a readable mnemonic.  Also remove
the not needed extable entry for the inline assembly and fix the
output constraint.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:18 +01:00
Michael Holzheu 272fa59ccb s390/dis: Fix handling of format specifiers
The print_insn() function returns strings like "lghi %r1,0". To escape the
'%' character in sprintf() a second '%' is used. For example "lghi %%r1,0"
is converted into "lghi %r1,0".

After print_insn() the output string is passed to printk(). Because format
specifiers like "%r" or "%f" are ignored by printk() this works by chance
most of the time. But for instructions with control registers like
"lctl %c6,%c6,780" this fails because printk() interprets "%c" as
character format specifier.

Fix this problem and escape the '%' characters twice.

For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf()
into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780".

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:43:21 +01:00
Guenther Hutzl 32e6b236d2 KVM: s390: consider system MHA for guest storage
Verify that the guest maximum storage address is below the MHA (maximum
host address) value allowed on the host.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
[adopt to match recent limit,size changes]

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-12-15 17:08:22 +01:00
Dominik Dingel a3a92c31bf KVM: s390: fix mismatch between user and in-kernel guest limit
While the userspace interface requests the maximum size the gmap code
expects to get a maximum address.

This error resulted in bigger page tables than necessary for some guest
sizes, e.g. a 2GB guest used 3 levels instead of 2.

At the same time we introduce KVM_S390_NO_MEM_LIMIT, which allows in a
bright future that a guest spans the complete 64 bit address space.

We also switch to TASK_MAX_SIZE for the initial memory size, this is a
cosmetic change as the previous size also resulted in a 4 level pagetable
creation.

Reported-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-12-15 17:08:21 +01:00
Christian Borntraeger 8335713ad0 KVM: s390: obey kptr_restrict in traces
The s390dbf and trace events provide a debugfs interface.
If kptr_restrict is active, we should not expose kernel
pointers. We can fence the debugfs output by using %pK
instead of %p.

Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-12-15 17:06:32 +01:00
Christian Borntraeger 7ec7c8c70b KVM: s390: use assignment instead of memcpy
Replace two memcpy with proper assignment.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-12-15 16:06:48 +01:00
Rusty Russell 7523e4dc50 module: use a structure to encapsulate layout.
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.

It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:25 +01:00
Davidlohr Bueso d5a73cadf3 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 11:39:51 +01:00
Christian Borntraeger 2f8a43d45d KVM: s390: remove redudant assigment of error code
rc already contains -ENOMEM, no need to assign it twice.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2015-11-30 12:47:13 +01:00
Heiko Carstens a6aacc3f87 KVM: s390: remove pointless test_facility(2) check
This evaluates always to 'true'.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:12 +01:00
David Hildenbrand 07197fd05f KVM: s390: don't load kvm without virtualization support
If we don't have support for virtualization (SIE), e.g. when running under
a hypervisor not supporting execution of the SIE instruction, we should
immediately abort loading the kvm module, as the SIE instruction cannot
be enabled dynamically.

Currently, the SIE instructions fails with an exception on a non-SIE
host, resulting in the guest making no progress, instead of failing hard.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:12 +01:00
David Hildenbrand 7f16d7e787 s390: show virtualization support in /proc/cpuinfo
This patch exposes the SIE capability (aka virtualization support) via
/proc/cpuinfo -> "features" as "sie".

As we don't want to expose this hwcap via elf, let's add a second,
"internal"/non-elf capability list. The content is simply concatenated
to the existing features when printing /proc/cpuinfo.

We also add the defines to elf.h to keep the hwcap stuff at a common
place.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:11 +01:00
David Hildenbrand 8dfd523f85 s390/sclp: introduce check for SIE
This patch adds a way to check if the SIE with zArchitecture support is
available.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:11 +01:00
David Hildenbrand 4215825eeb KVM: s390: don't switch to ESCA for ucontrol
sca_add_vpcu is not called for ucontrol guests. We must also not
apply the sca checking for sca_can_add_vcpu as ucontrol guests
do not have to follow the sca limits.

As common code already checks that id < KVM_MAX_VCPUS all other
data structures are safe as well.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:11 +01:00
David Hildenbrand eaa78f3432 KVM: s390: cleanup sca_add_vcpu
Now that we already have kvm and the VCPU id set for the VCPU, we can
convert sda_add_vcpu to look much more like sda_del_vcpu.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:10 +01:00
David Hildenbrand 10ce32d5b0 KVM: s390: always set/clear the SCA sda field
Let's always set and clear the sda when enabling/disabling a VCPU.
Dealing with sda being set to something else makes no sense anymore
as we enable a VCPU in the SCA now after it has been registered at
the VM.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:10 +01:00
David Hildenbrand 2550882449 KVM: s390: fix SCA related races and double use
If something goes wrong in kvm_arch_vcpu_create, the VCPU has already
been added to the sca but will never be removed. Trying to create VCPUs
with duplicate ids (e.g. after a failed attempt) is problematic.

Also, when creating multiple VCPUs in parallel, we could theoretically
forget to set the correct SCA when the switch to ESCA happens just
before the VCPU is registered.

Let's add the VCPU to the SCA in kvm_arch_vcpu_postcreate, where we can
be sure that no duplicate VCPU with the same id is around and the VCPU
has already been registered at the VM. We also have to make sure to update
ECB at that point.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:09 +01:00
David Hildenbrand 5f3fe620a5 KVM: s390: we always have a SCA
Having no sca can never happen, even when something goes wrong when
switching to ESCA. Otherwise we would have a serious bug.
Let's remove this superfluous check.

Acked-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:09 +01:00
David Hildenbrand 2c1bb2be98 KVM: s390: fast path for sca_ext_call_pending
If CPUSTAT_ECALL_PEND isn't set, we can't have an external call pending,
so we can directly avoid taking the lock.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:09 +01:00
Eugene (jno) Dvurechenski fe0edcb731 KVM: s390: Enable up to 248 VCPUs per VM
This patch allows s390 to have more than 64 VCPUs for a guest (up to
248 for memory usage considerations), if supported by the underlaying
hardware (sclp.has_esca).

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski 5e04431523 KVM: s390: Introduce switching code
This patch adds code that performs transparent switch to Extended
SCA on addition of 65th VCPU in a VM. Disposal of ESCA is added too.
The entier ESCA functionality, however, is still not enabled.
The enablement will be provided in a separate patch.

This patch also uses read/write lock protection of SCA and its subfields for
possible disposal at the BSCA-to-ESCA transition. While only Basic SCA needs such
a protection (for the swap), any SCA access is now guarded.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski 7d43bafcff KVM: s390: Make provisions for ESCA utilization
This patch updates the routines (sca_*) to provide transparent access
to and manipulation on the data for both Basic and Extended SCA in use.
The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases.
Also the kvm.arch.use_esca flag is provided.
The actual functionality is kept the same.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski bc784ccee5 KVM: s390: Introduce new structures
This patch adds new structures and updates some existing ones to
provide the base for Extended SCA functionality.

The old sca_* structures were renamed to bsca_* to keep things uniform.

The access to fields of SIGP controls were turned into bitfields instead
of hardcoded bitmasks.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:07 +01:00
Eugene (jno) Dvurechenski a6e2f683e7 KVM: s390: Provide SCA-aware helpers for VCPU add/del
This patch provides SCA-aware helpers to create/delete a VCPU.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:07 +01:00
Eugene (jno) Dvurechenski a5bd764734 KVM: s390: Generalize access to SIGP controls
This patch generalizes access to the SIGP controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:06 +01:00
Eugene (jno) Dvurechenski 605145103a KVM: s390: Generalize access to IPTE controls
This patch generalizes access to the IPTE controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:06 +01:00
Eugene (jno) Dvurechenski f7ba1d3426 s390/sclp: introduce checks for ESCA and HVS
Introduce sclp.has_hvs and sclp.has_esca to provide a way for kvm to check
whether the extended-SCA and the home-virtual-SCA facilities are available.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:06 +01:00
David Hildenbrand 71f116bfed KVM: s390: rewrite vcpu_post_run and drop out early
Let's rewrite this function to better reflect how we actually handle
exit_code. By dropping out early we can save a few cycles. This
especially speeds up sie exits caused by host irqs.

Also, let's move the special -EOPNOTSUPP for intercepts to
the place where it belongs and convert it to -EREMOTE.

Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:05 +01:00
David Hildenbrand e09fefdeeb KVM: Use common function for VCPU lookup by id
Let's reuse the new common function for VPCU lookup by id.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split out the new function into a separate patch]
2015-11-30 12:47:04 +01:00
Martin Schwidefsky 419123f900 s390/spinlock: do not yield to a CPU in udelay/mdelay
It does not make sense to try to relinquish the time slice with diag 0x9c
to a CPU in a state that does not allow to schedule the CPU. The scenario
where this can happen is a CPU waiting in udelay/mdelay while holding a
spin-lock.

Add a CIF bit to tag a CPU in enabled wait and use it to detect that the
yield of a CPU will not be successful and skip the diagnose call.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:18 +01:00
Heiko Carstens 9eb31be33c s390: remove is_32bit_task() helper
is_32bit_task() used to be helpful when we still had CONFIG_32BIT.
Since that is gone, it is nowadays identical to is_compat_task().
So remove it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:17 +01:00
Michael Holzheu b8eecf36a4 s390: add 'install' target to 'make help'
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:17 +01:00
Sascha Silbe 3f975df69d s390/sclp: Add VT220 support to early sclp console
When running under qemu with the default configuration (-nographic),
there is only a VT220 SCLP console, no line-mode SCLP console. Add
VT220 support to the early SCLP console so the user has a chance to
see critical error messages during early boot.

None of the existing users of _sclp_print_early() check the return
code. Instead of trying to come up with return code semantics when
printing to multiple consoles (any or all of which may fail), we just
drop the return code entirely.

Tested on z/VM (line mode console) and LPAR (VT220 and line mode
console). Tested on qemu/KVM with VT220 console and / or line mode
console.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:17 +01:00
Christian Borntraeger 561e103002 s390/dis: Fix printing of the register numbers
Since commit b006f19b05 ("lib/vsprintf.c: handle invalid format
specifiers more robustly") I get errors like
[...]
Krnl Code: 00000000004e2410: c00400000000        brcl 0,4e2410
Please remove unsupported %r in format string
[    8.179483] ------------[ cut here ]------------
[    8.179484] WARNING: at lib/vsprintf.c:1781

Turns out that our disassembler relied on %r not being used as format
string. Let's do the proper escaping of our decode buffers.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:16 +01:00
Gerald Schaefer 69eea95c48 s390/pci_dma: fix DMA table corruption with > 4 TB main memory
DMA addresses returned from map_page() are calculated by using an iommu
bitmap plus a start_dma offset. The size of this bitmap is based on the main
memory size. If we have more than (4 TB - start_dma) main memory, the DMA
address calculation will also produce addresses > 4 TB. Such addresses
cannot be inserted in the 3-level DMA page table, instead the entries
modulo 4 TB will be overwritten.

Fix this by restricting the iommu bitmap size to (4 TB - start_dma).
Also set zdev->end_dma to the actual end address of the usable
range, instead of the theoretical maximum as reported by the hardware,
which fixes a sanity check in dma_map() and also the IOMMU API domain
geometry aperture calculation.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:15 +01:00
David Hildenbrand 406123517b s390: get_user_pages_fast() might sleep
Let's annotate it correctly, so we directly get a warning if
we ever were to use it in atomic/preempt_disable/spinlock environment.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:15 +01:00
Martin Schwidefsky db1c45154a s390/spinlock: avoid diagnose loop
The spinlock implementation calls the diagnose 0x9c / 0x44 immediately
if the SIGP sense running reported the target CPU as not running.

The diagnose 0x9c is a hint to the hypervisor to schedule the target
CPU in preference to the source CPU that issued the diagnose. It can
happen that on return from the diagnose the target CPU has not been
scheduled yet, e.g. if the target logical CPU is on another physical
CPU and the hypervisor did not want to migrate the logical CPU.

Avoid the immediate repeat of the diagnose instruction, instead do
the retry loop before the next invocation of diagnose 0x9c.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:15 +01:00
Martin Schwidefsky 1a2c5840ac s390/dump: cleanup CPU save area handling
Introduce save_area_alloc(), save_area_boot_cpu(), save_area_add_regs()
and save_area_add_vxrs to deal with storing the CPU state in case of a
system dump. Remove struct save_area and save_area_ext, and create a new
struct save_area as a local definition to arch/s390/kernel/crash_dump.c.
Copy each individual field from the hardware status area to the save area,
storing the minimum of required data.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:14 +01:00
Martin Schwidefsky 1a36a39e22 s390/dump: rework CPU register dump code
To collect the CPU registers of the crashed system allocated a single
page with memblock_alloc_base and use it as a copy buffer. Replace the
stop-and-store-status sigp with a store-status-at-address sigp in
smp_save_dump_cpus() and smp_store_status(). In both cases the target
CPU is already stopped and store-status-at-address avoids the detour
via the absolute zero page.

For kexec simplify s390_reset_system and call store_status() before
the prefix register of the boot CPU has been set to zero. Use STPX
to store the prefix register and remove dump_prefix_page.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:14 +01:00
Martin Schwidefsky f08b841463 s390/dump: remove SAVE_AREA_BASE
Replace the SAVE_AREA_BASE offset calculations in reipl.S with the
assembler constant for the location of each register status area.

Use __LC_FPREGS_SAVE_AREA instead of SAVE_AREA_BASE in the three
remaining code locations and remove the definition of SAVE_AREA_BASE.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:14 +01:00
Martin Schwidefsky d9a3a09af5 s390/kvm: remove dependency on struct save_area definition
Replace the offsets based on the struct area_area with the offset
constants from asm-offsets.c based on the struct _lowcore.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:13 +01:00
Martin Schwidefsky df9694c797 s390/dump: streamline oldmem copy functions
Introduce two copy functions for the memory of the dumped system,
copy_oldmem_kernel() to copy to the virtual kernel address space
and copy_oldmem_user() to copy to user space.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky 8a07dd02d7 s390/kdump: remove code to create ELF notes in the crashed system
The s390 architecture can store the CPU registers of the crashed system
after the kdump kernel has been started and this is the preferred way.
Remove the remaining code fragments that deal with storing CPU registers
while the crashed system is still active.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky ffa52d02c5 s390/zcore: remove /sys/kernel/debug/zcore/mem
New versions of the SCSI dumper use the /dev/vmcore interface instead
of zcore mem. Remove the outdated interface.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky bbfed511c2 s390/zcore: copy vector registers into the image data
The /sys/kernel/debug/zcore/mem interface delivers the memory of the
old system with the CPU registers stored to the assigned locations in
each prefix page.

For the vector registers the prefix page of each CPU has an address of
a 1024 byte save area at 0x11b0. But the /sys/kernel/debug/zcore/mem
interface fails copy the vector registers saved at boot of the zfcpdump
kernel into the dump image.

Copy the saved vector registers of a CPU to the outout buffer if the
memory area that is read via /sys/kernel/debug/zcore/mem intersects
with the vector register save area of this CPU.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:11 +01:00
Paolo Bonzini 8bd142c016 KVM/ARM Fixes for v4.4-rc3.
Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
 tweaks to the EL2 panic code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWVJ7gAAoJEEtpOizt6ddyD5MH/3M/nhtZTnT6v0RPDvHWJo7s
 5BQmITJYPHFkTO14OHWTVLXiGgLws8gPZnWHxC4jjHjpuJnL+/MM551FpCOqDDd7
 vweYgVlSqD8ANH5nKbv1PPnzjrqhTVN+yi3ZItXy2pxsfvu63FC6Z43B2axelLvw
 XYmHoMZaeWBBw2gHi3djGfju3Yj/2SOe+ozuvAXpxA5+NhSiPHHnMefGy5k3wKnJ
 sETwshPdjiMeK4ItfMhveFTDRjl4uh9uQyORfaa5gqG0uePt3EalYynw+gEjZ6RX
 Bpc3nLwboIfRIa/WwyoHm+nmLIUYjU8dAgLwUOIbdeG0igpdALdvsB0aBHCgngk=
 =+7ED
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/ARM Fixes for v4.4-rc3.

Includes some timer fixes, properly unmapping PTEs, an errata fix, and two
tweaks to the EL2 panic code.
2015-11-24 19:34:40 +01:00
David Hildenbrand 152e9f65d6 KVM: s390: fix wrong lookup of VCPUs by array index
For now, VCPUs were always created sequentially with incrementing
VCPU ids. Therefore, the index in the VCPUs array matched the id.

As sequential creation might change with cpu hotplug, let's use
the correct lookup function to find a VCPU by id, not array index.

Let's also use kvm_lookup_vcpu() for validation of the sending VCPU
on external call injection.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # db27a7a KVM: Provide function for VCPU lookup by id
2015-11-19 14:47:43 +01:00
David Hildenbrand b85de33a1a KVM: s390: avoid memory overwrites on emergency signal injection
Commit 383d0b0501 ("KVM: s390: handle pending local interrupts via
bitmap") introduced a possible memory overwrite from user space.

User space could pass an invalid emergency signal code (sending VCPU)
and therefore exceed the bitmap. Let's take care of this case and
check that the id is in the valid range.

Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.19+ db27a7a KVM: Provide function for VCPU lookup by id
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-19 14:47:32 +01:00
Heiko Carstens 03c02807e2 KVM: s390: fix pfmf intercept handler
The pfmf intercept handler should check if the EDAT 1 facility
is installed in the guest, not if it is installed in the host.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-19 11:08:17 +01:00
David Hildenbrand 5967c17b11 KVM: s390: enable SIMD only when no VCPUs were created
We should never allow to enable/disable any facilities for the guest
when other VCPUs were already created.

kvm_arch_vcpu_(load|put) relies on SIMD not changing during runtime.
If somebody would create and run VCPUs and then decides to enable
SIMD, undefined behaviour could be possible (e.g. vector save area
not being set up).

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # 4.1+
2015-11-19 11:08:16 +01:00
Heiko Carstens f52c74fee9 s390: remove SALIPL loader
There is no known user, therefore remove the code.

Acked-by: Rob Van Der Heij <robvdheij@nl.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:51:11 +01:00
Heiko Carstens 932f608193 s390: wire up mlock2 system call
Passes mlock2-tests test case in 64 bit and compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:51:07 +01:00
Heiko Carstens ddfd4a054b s390: remove g5 elf platform support
Remove dead code, since this could only happen on a 31 bit machine
where the kernel wouldn't IPL.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:04:41 +01:00
Martin Schwidefsky c7e8b2c21c s390: avoid cache aliasing under z/VM and KVM
commit 1f6b83e5e4 ("s390: avoid z13 cache aliasing") checks for the
machine type to optimize address space randomization and zero page
allocation to avoid cache aliases.

This check might fail under a hypervisor with migration support.
z/VMs "Single System Image and Live Guest Relocation" facility will
"fake" the machine type of the oldest system in the group. For example
in a group of zEC12 and Z13 the guest appears to run on a zEC12
(architecture fencing within the relocation domain)

Remove the machine type detection and always use cache aliasing
rules that are known to work for all machines. These are the z13
aliasing rules.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:04:18 +01:00
Sascha Silbe f07f21b3e2 s390/sclp: _sclp_wait_int(): retain full PSW mask
There's no reason to clear all PSW mask bits other than the addressing
mode bits. Just use the previous PSW mask as-is.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-12 13:08:00 +01:00
Sebastian Ott 18e22a1772 s390: add support for ipl devices in subchannel sets > 0
Allow to ipl from CCW based devices residing in any subchannel set.

Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-11 13:56:27 +01:00
Sebastian Ott e0bedada3a s390/ipl: fix out of bounds access in scpdata_write
The input buffer in reipl_fcp_scpdata_write is accessed out of bounds
when an offset is specified. The problem is that the offset refers to
the data we should write to and not to the buffer we read from.

So instead of
        memcpy(scp_data, buf + off, count);
we could just do
        memcpy(scp_data + off, buf, count);

However we not only modify the data but also store its length. For this to
work we'd need to remember a state per open FH. Since that's not possible
with sysfs callbacks let's just fail when an offset is specified.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-11 09:07:06 +01:00
Sebastian Ott 52d43d8184 s390/pci_dma: improve debugging of errors during dma map
Improve debugging to find out what went wrong during a failed
dma map/unmap operation.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:49 +01:00
Sebastian Ott 66728eeea6 s390/pci_dma: handle dma table failures
We use lazy allocation for translation table entries but don't handle
allocation (and other) failures during translation table updates.

Handle these failures and undo translation table updates when it's
meaningful.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:49 +01:00
Sebastian Ott 4d5a6b7295 s390/pci_dma: unify label of invalid translation table entries
Newly allocated translation table entries are flagged as invalid
and protected. If an existing translation table entry is invalidated,
the protection flag is left unchanged.

If a page (with invalid and protection flag set) is accessed it's
undefined which type of exception we'll receive.

Make sure to always set the invalid flag only.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:49 +01:00
Heiko Carstens 86b68c3873 s390/syscalls: remove system call number calculation
Explicitly write the system call number for each define instead of
calculating it. This makes it easier to parse the file when generating
system call tables for various tools and libraries.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:48 +01:00
Martin Schwidefsky 230ccb370f s390/diag: add a s390 prefix to the diagnose trace point
Documentation/trace/tracepoints.txt states that the naming scheme
for tracepoints is "subsys_event" to avoid collisions. Rename
the 'diagnose' tracepoint to 's390_diagnose'.

Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:47 +01:00
Sascha Silbe c6eafbf990 s390/head: fix error message on unsupported hardware
startup calls the C function _sclp_print_early() if the machine we're
running on is not supported by the kernel. sclp.c is getting built
with -m64, so _sclp_print_early() expects the zSeries ELF ABI to be
used.

We previously called _sclp_print_early() using the S/390 ELF ABI, with
a stack frame size of 96 bytes and while being in 31-bit address
mode. This caused _sclp_wait_int() (called indirectly from
_sclp_print_early()) to jump to an undefined address. While
_sclp_wait_int() contained some code to deal with being called in
31-bit addressing mode, it didn't quite work. While fixing this is
possible, the code would still only work by chance and could break any
time.

Ensure compliance with the zSeries ELF ABI by switching to 64-bit
addressing mode early and using a minimum stack frame size of 160
bytes.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:47 +01:00
Linus Torvalds 933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Linus Torvalds 39cf7c3981 IOMMU Updates for Linux v4.4
This time including:
 
 	* A new IOMMU driver for s390 pci devices
 
 	* Common dma-ops support based on iommu-api for ARM64. The plan is to
 	  use this as a basis for ARM32 and hopefully other architectures as
 	  well in the future.
 
 	* MSI support for ARM-SMMUv3
 
 	* Cleanups and dead code removal in the AMD IOMMU driver
 
 	* Better RMRR handling for the Intel VT-d driver
 
 	* Various other cleanups and small fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
 pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
 qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
 1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
 12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
 K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
 EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
 ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
 gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
 kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
 nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
 SecikoTtH1Q4RVtqOcAQ
 =jLlB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "This time including:

   - A new IOMMU driver for s390 pci devices

   - Common dma-ops support based on iommu-api for ARM64.  The plan is
     to use this as a basis for ARM32 and hopefully other architectures
     as well in the future.

   - MSI support for ARM-SMMUv3

   - Cleanups and dead code removal in the AMD IOMMU driver

   - Better RMRR handling for the Intel VT-d driver

   - Various other cleanups and small fixes"

* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
  iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
  iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
  iommu: Move default domain allocation to iommu_group_get_for_dev()
  iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
  iommu/arm-smmu: Switch to device_group call-back
  iommu/fsl: Convert to device_group call-back
  iommu: Add device_group call-back to x86 iommu drivers
  iommu: Add generic_device_group() function
  iommu: Export and rename iommu_group_get_for_pci_dev()
  iommu: Revive device_group iommu-ops call-back
  iommu/amd: Remove find_last_devid_on_pci()
  iommu/amd: Remove first/last_device handling
  iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
  iommu/amd: Cleanup buffer allocation
  iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
  iommu/amd: Align DTE flag definitions
  iommu/amd: Remove old alias handling code
  iommu/amd: Set alias DTE in do_attach/do_detach
  iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
  ...
2015-11-05 16:12:10 -08:00
Linus Torvalds e627078a0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There is only one new feature in this pull for the 4.4 merge window,
  most of it is small enhancements, cleanup and bug fixes:

   - Add the s390 backend for the software dirty bit tracking.  This
     adds two new pgtable functions pte_clear_soft_dirty and
     pmd_clear_soft_dirty which is why there is a hit to
     arch/x86/include/asm/pgtable.h in this pull request.

   - A series of cleanup patches for the AP bus, this includes the
     removal of the support for two outdated crypto cards (PCICC and
     PCICA).

   - The irq handling / signaling on buffer full in the runtime
     instrumentation code is dropped.

   - Some micro optimizations: remove unnecessary memory barriers for a
     couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for
     spin_unlock.  Use the builtin bswap if available and make
     test_and_set_bit_lock more cache friendly.

   - Statistics and a tracepoint for the diagnose calls to the
     hypervisor.

   - The CPU measurement facility support to sample KVM guests is
     improved.

   - The vector instructions are now always enabled for user space
     processes if the hardware has the vector facility.  This simplifies
     the FPU handling code.  The fpu-internal.h header is split into fpu
     internals, api and types just like x86.

   - Cleanup and improvements for the common I/O layer.

   - Rework udelay to solve a problem with kprobe.  udelay has busy loop
     semantics but still uses an idle processor state for the wait"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
  s390: remove runtime instrumentation interrupts
  s390/cio: de-duplicate subchannel validation
  s390/css: unneeded initialization in for_each_subchannel
  s390/Kconfig: use builtin bswap
  s390/dasd: fix disconnected device with valid path mask
  s390/dasd: fix invalid PAV assignment after suspend/resume
  s390/dasd: fix double free in dasd_eckd_read_conf
  s390/kernel: fix ptrace peek/poke for floating point registers
  s390/cio: move ccw_device_stlck functions
  s390/cio: move ccw_device_call_handler
  s390/topology: reduce per_cpu() invocations
  s390/nmi: reduce size of percpu variable
  s390/nmi: fix terminology
  s390/nmi: remove casts
  s390/nmi: remove pointless error strings
  s390: don't store registers on disabled wait anymore
  s390: get rid of __set_psw_mask()
  s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
  s390/dasd: fix list_del corruption after lcu changes
  s390/spinlock: remove unneeded serializations at unlock
  ...
2015-11-04 11:31:31 -08:00
Linus Torvalds b0f85fa11a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

Changes of note:

 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.

 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
    David Ahern.

 3) Allow the user to ask for the statistics to be filtered out of
    ipv4/ipv6 address netlink dumps.  From Sowmini Varadhan.

 4) More work to pass the network namespace context around deep into
    various packet path APIs, starting with the netfilter hooks.  From
    Eric W Biederman.

 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
    Richter.

 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.

 7) Support Very High Throughput in wireless MESH code, from Bob
    Copeland.

 8) Allow setting the ageing_time in switchdev/rocker.  From Scott
    Feldman.

 9) Properly autoload L2TP type modules, from Stephen Hemminger.

10) Fix and enable offload features by default in 8139cp driver, from
    David Woodhouse.

11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
    Jiri Benc.

12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
    Opstad.

13) Fix IPSEC flowcache overflows on large systems, from Steffen
    Klassert.

14) Convert bridging to track VLANs using rhashtable entries rather than
    a bitmap.  From Nikolay Aleksandrov.

15) Make TCP listener handling completely lockless, this is a major
    accomplishment.  Incoming request sockets now live in the
    established hash table just like any other socket too.

    From Eric Dumazet.

15) Provide more bridging attributes to netlink, from Nikolay
    Aleksandrov.

16) Use hash based algorithm for ipv4 multipath routing, this was very
    long overdue.  From Peter Nørlund.

17) Several y2038 cures, mostly avoiding timespec.  From Arnd Bergmann.

18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.

19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet.  This
    influences the port binding selection logic used by SO_REUSEPORT.

20) Add ipv6 support to VRF, from David Ahern.

21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.

22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.

23) Implement RACK loss recovery in TCP, from Yuchung Cheng.

24) Support multipath routes in MPLS, from Roopa Prabhu.

25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
    Dumazet.

26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
    Sudarsana Kalluru.

27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
    Sowa.

28) Support ipv6 geneve tunnels, from John W Linville.

29) Add flood control support to switchdev layer, from Ido Schimmel.

30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
    Hannes Frederic Sowa.

31) Support persistent maps and progs in bpf, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
  sh_eth: use DMA barriers
  switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
  net: sched: kill dead code in sch_choke.c
  irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
  net: dsa: mv88e6xxx: include DSA ports in VLANs
  net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
  net/core: fix for_each_netdev_feature
  vlan: Invoke driver vlan hooks only if device is present
  arcnet/com20020: add LEDS_CLASS dependency
  bpf, verifier: annotate verbose printer with __printf
  dp83640: Only wait for timestamps for packets with timestamping enabled.
  ptp: Change ptp_class to a proper bitmask
  dp83640: Prune rx timestamp list before reading from it
  dp83640: Delay scheduled work.
  dp83640: Include hash in timestamp/packet matching
  ipv6: fix tunnel error handling
  net/mlx5e: Fix LSO vlan insertion
  net/mlx5e: Re-eanble client vlan TX acceleration
  net/mlx5e: Return error in case mlx5e_set_features() fails
  net/mlx5e: Don't allow more than max supported channels
  ...
2015-11-04 09:41:05 -08:00
Linus Torvalds ccc9d4a6d6 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Add support for cipher output IVs in testmgr
   - Add missing crypto_ahash_blocksize helper
   - Mark authenc and des ciphers as not allowed under FIPS.

Algorithms:

   - Add CRC support to 842 compression
   - Add keywrap algorithm
   - A number of changes to the akcipher interface:
      + Separate functions for setting public/private keys.
      + Use SG lists.

Drivers:

   - Add Intel SHA Extension optimised SHA1 and SHA256
   - Use dma_map_sg instead of custom functions in crypto drivers
   - Add support for STM32 RNG
   - Add support for ST RNG
   - Add Device Tree support to exynos RNG driver
   - Add support for mxs-dcp crypto device on MX6SL
   - Add xts(aes) support to caam
   - Add ctr(aes) and xts(aes) support to qat
   - A large set of fixes from Russell King for the marvell/cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
  crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
  crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
  hwrng: exynos - Add Device Tree support
  hwrng: exynos - Fix missing configuration after suspend to RAM
  hwrng: exynos - Add timeout for waiting on init done
  dt-bindings: rng: Describe Exynos4 PRNG bindings
  crypto: marvell/cesa - use __le32 for hardware descriptors
  crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
  crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
  crypto: marvell/cesa - use gfp_t for gfp flags
  crypto: marvell/cesa - use dma_addr_t for cur_dma
  crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
  crypto: caam - fix indentation of close braces
  crypto: caam - only export the state we really need to export
  crypto: caam - fix non-block aligned hash calculation
  crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
  crypto: caam - print errno code when hash registration fails
  crypto: marvell/cesa - fix memory leak
  crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
  crypto: marvell/cesa - rearrange handling for sw padded hashes
  ...
2015-11-04 09:11:12 -08:00
Paolo Bonzini 197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Martin Schwidefsky b38feccd66 s390: remove runtime instrumentation interrupts
The external interrupts for runtime instrumentation buffer-full
and runtime instrumentation halted are unused and have no current
user. Remove the support and ignore the second parameter of the
s390_runtime_instr system call from now on.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03 14:40:51 +01:00
Christian Borntraeger 295d8fa961 s390/Kconfig: use builtin bswap
Depending on the gcc version we can use builtin_bswap instead of
architecture functions. Doing so is better than the inline assembly
version of load reverse for two reasons:
- the sequence of load reversed, apply constant mask, save reversed can
  be optimized to load, apply reversed mask, save
- builtins are slightly better to optimize e.g. gcc instruction
  scheduler cannot optimize grouping on inline assemblies.

To enable set we have to ARCH_USE_BUILTIN_BSWAP.
bloat-o-meter results:
add/remove: 1/1 grow/shrink: 75/533 up/down: 1711/-9394 (-7683)

Suggested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03 14:40:48 +01:00
Martin Schwidefsky 55a423b6f1 s390/kernel: fix ptrace peek/poke for floating point registers
git commit 155e839a81
"s390/kernel: dynamically allocate FP register save area"
introduced a regression in regard to ptrace.

If the vector register extension is not present or unused the
ptrace peek of a floating pointer register return incorrect data
and the ptrace poke to a floating pointer register overwrites the
task structure starting at task->thread.fpu.fprs.

Cc: stable@kernel.org # v4.3
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03 14:40:42 +01:00
Joerg Roedel b67ad2f7c7 Merge branches 'x86/vt-d', 'arm/omap', 'arm/smmu', 's390', 'core' and 'x86/amd' into next
Conflicts:
	drivers/iommu/amd_iommu_types.h
2015-11-02 20:03:34 +09:00
Christian Borntraeger 46b708ea87 KVM: s390: use simple switch statement as multiplexer
We currently do some magic shifting (by exploiting that exit codes
are always a multiple of 4) and a table lookup to jump into the
exit handlers. This causes some calculations and checks, just to
do an potentially expensive function call.

Changing that to a switch statement gives the compiler the chance
to inline and dynamically decide between jump tables or inline
compare and branches. In addition it makes the code more readable.

bloat-o-meter gives me a small reduction in code size:

add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
function                                     old     new   delta
kvm_handle_sie_intercept                      72    1058    +986
handle_prog                                  704     696      -8
handle_noop                                   54       -     -54
handle_partial_execution                      60       -     -60
intercept_funcs                              120       -    -120
handle_instruction                           198       -    -198
handle_validity                              210       -    -210
handle_stop                                  316       -    -316
handle_external_interrupt                    368       -    -368

Right now my gcc does conditional branches instead of jump tables.
The inlining seems to give us enough cycles as some micro-benchmarking
shows minimal improvements, but still in noise.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29 15:59:11 +01:00
Christian Borntraeger 58c383c62e KVM: s390: drop useless newline in debugging data
the s390 debug feature does not need newlines. In fact it will
result in empty lines. Get rid of 4 leftovers.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29 15:58:54 +01:00
David Hildenbrand c5c2c39346 KVM: s390: SCA must not cross page boundaries
We seemed to have missed a few corner cases in commit f6c137ff00
("KVM: s390: randomize sca address").

The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
some unlucky numbers, we exceed the page.

0x7c0 (1984) -> Fits exactly
0x7d0 (2000) -> 16 bytes out
0x7e0 (2016) -> 32 bytes out
0x7f0 (2032) -> 48 bytes out

One VCPU entry is 32 bytes long.

For the last two cases, we actually write data to the other page.
1. The address of the VCPU.
2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.

Especially the 2. happens regularly. So this could produce two problems:
1. The guest losing/getting external calls.
2. Random memory overwrites in the host.

So this problem happens on every 127 + 128 created VM with 64 VCPUs.

Cc: stable@vger.kernel.org # v3.15+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-29 15:58:41 +01:00
Heiko Carstens 439eb131f7 s390/topology: reduce per_cpu() invocations
Each per_cpu() invocation generates extra code. Since there are a lot
of similiar calls in the topology code we can avoid a lot of them.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:34:39 +01:00
Heiko Carstens 36324963a3 s390/nmi: reduce size of percpu variable
Change the flag fields within struct mcck_struct to simple bit fields
to reduce the size of the structure which is used as percpu variable.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:57 +01:00
Heiko Carstens 975be635d9 s390/nmi: fix terminology
According to the architecture registers are validated and not
revalidated. So change comments and functions names to match.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:56 +01:00
Heiko Carstens dc6e15556a s390/nmi: remove casts
Remove all the casts to and from the machine check interruption code.
This patch changes struct mci to a union, which contains an anonymous
structure with the already known bits and in addition an unsigned
long field, which contains the raw machine check interruption code.

This allows to simply assign and decoce the interruption code value
without the need for all those casts we had all the time.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:55 +01:00
Heiko Carstens 3d68286a43 s390/nmi: remove pointless error strings
s390_handle_damage() has character string parameter which was used as
a pointer to verbose error message. The hope was (a lot of years ago)
when analyzing dumps that register R2 would still contain the pointer
and therefore it would be rather easy to tell what went wrong.

However gcc optimizes the strings away since a long time. And even if
it wouldn't it is necessary to have a close look at the machine check
interruption code to tell what's wrong.

So remove the pointless error strings.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:54 +01:00
Heiko Carstens f9e6edfb9c s390: don't store registers on disabled wait anymore
The current disabled wait code stores register contents into their
save areas, however it is (at least) missing the new vector registers.

Given the fact that the whole exercise seems to be rather pointless
simply don't save any registers anymore.

In a "live" system it is always possible to inspect register contents,
and in case of a dump the register contents will be stored by the
dump mechanism.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:48 +01:00
Heiko Carstens ecbafda853 s390: get rid of __set_psw_mask()
With the removal of 31 bit code we can always assume that the epsw
instruction is available. Therefore use the __extract_psw() function
to disable and enable machine checks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:44 +01:00
Christoffer Dall 3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
David S. Miller 26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Hendrik Brueckner b0753902d4 s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
Split the API and FPU type definitions into separate header files
similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e54b).

The new header files and their meaning are:

asm/fpu/types.h:
	FPU related data types, needed for 'struct thread_struct' and
	'struct task_struct'.

asm/fpu/api.h:
	FPU related 'public' functions for other subsystems and device
	drivers.

asm/fpu/internal.h:
	FPU internal functions mainly used to convert
	FPU register contents in signal handling.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-16 09:41:12 +02:00
LABBE Corentin 19b14e7e22 crypto: s390/sha - replace raw value by their coresponding define
SHA_MAX_STATE_SIZE is just the number of u32 word for SHA512.
So replace the raw value "16" by their meaning (SHA512_DIGEST_SIZE / 4)

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-10-15 21:05:11 +08:00
Christian Borntraeger fdbbe8e791 s390/spinlock: remove unneeded serializations at unlock
the kernel locks have aqcuire/release semantics. No operation done
after the lock can be "moved" before the lock and no operation before
the unlock can be moved after the unlock. But it is perfectly fine
that memory accesses which happen code wise after unlock are performed
within the critical section.
On s390x, reads are in-order with other reads (PoP section
"Storage-Operand Fetch References") and writes are in-order with
other writes (PoP section "Storage-Operand Store References"). Writes
are also in-order with reads to the same memory location (PoP section
"Storage-Operand Store References"). To other CPUs (and the channel
subsystem), reads additionally appear to be performed prior to reads or
writes that happen after them in the conceptual sequence (PoP section
"Relation between Operand Accesses").
So at least as observed by other CPUs and the channel subsystem, reads
inside the critical sections will not happen after unlock (and writes
are in-order anyway). That's exactly what we need for "RELEASE
operations" (memory-barriers.txt): "It guarantees that all memory
operations before the RELEASE operation will appear to happen before the
RELEASE operation with respect to the other components of the system."

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
[cross-reading and lot of improvements for the patch description]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:25 +02:00
Heiko Carstens 29b0a8250b s390/etr,stp: fix possible deadlock on machine check
The first level machine check handler for etr and stp machine checks may
call queue_work() while in nmi context. This may deadlock e.g. if the
machine check happened when the interrupted context did hold a lock, that
also will be acquired by queue_work().
Therefore split etr and stp machine check handling into first and second
level handling. The second level handling will then issue the queue_work()
call in process context which avoids the potential deadlock.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:18 +02:00
Sebastian Ott 7cc8944e13 s390/pci: reshuffle struct used to write debug data
zpci_err_insn writes stale stack content to the debugfs.

Ensure that the struct in zpci_err_insn is ordered in a way that
we don't have uninitialized holes in it. In addition to that
add the packed attribute.

Fixes: 3d8258e (s390/pci: move debug messages to debugfs)
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:17 +02:00
Heiko Carstens 48002bd5af s390/bitops: remove 31 bit related comments
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:15 +02:00
Heiko Carstens e4165dcbc0 s390/cmpxchg: remove dead code
With the removal of 31 bit support a couple of defines became unused.
Remove them.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:15 +02:00
Heiko Carstens 004f0bba19 s390/nmi: change type of mcck_interruption_code lowcore field
For some unknown reason the mcck_interruption_code field is defined
as array of two 32 bit values. Given that this actually is a 64 bit
field according to the architecture, change the type to u64.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:15 +02:00
Heiko Carstens 92778b9920 s390/flags: use _BITUL macro
The defines that are used in entry.S have been partially converted to
use the _BITUL macro (setup.h). This patch converts the rest.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:14 +02:00
Heiko Carstens ac25e790d9 s390/flags: fix flag handling
The cpu flags and pt_regs flags fields are each 64 bits in size. A flag can
be set with helper functions like set_cpu_flags().

These functions create a mask using "1U << flag". This doesn't work if flag
is larger than 31, since 1U << 32 == 0.

So fix this in case we ever will have such flag numbers.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:14 +02:00
Heiko Carstens db7e007fd6 s390/udelay: make udelay have busy loop semantics
When using systemtap it was observed that our udelay implementation is
rather suboptimal if being called from a kprobe handler installed by
systemtap.

The problem observed when a kprobe was installed on lock_acquired().
When the probe was hit the kprobe handler did call udelay, which set
up an (internal) timer and reenabled interrupts (only the clock comparator
interrupt) and waited for the interrupt.
This is an optimization to avoid that the cpu is busy looping while waiting
that enough time passes. The problem is that the interrupt handler still
does call irq_enter()/irq_exit() which then again can lead to a deadlock,
since some accounting functions may take locks as well.

If one of these locks is the same, which caused lock_acquired() to be
called, we have a nice deadlock.

This patch reworks the udelay code for the interrupts disabled case to
immediately leave the low level interrupt handler when the clock
comparator interrupt happens. That way no C code is being called and the
deadlock cannot happen anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:13 +02:00
Christian Borntraeger e22cf8ca6f s390/cpumf: rework program parameter setting to detect guest samples
The program parameter can be used to mark hardware samples with
some token.  Previously, it was used to mark guest samples only.

Improve the program parameter doubleword by combining two parts,
the leftmost LPP part and the rightmost PID part.  Set the PID
part for processes by using the task PID.
To distinguish host and guest samples for the kernel (PID part
is zero), the guest must always set the program paramater to a
non-zero value.  Use the leftmost bit in the LPP part of the
program parameter to be able to detect guest kernel samples.

[brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
__LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
And updated the commit message accordingly.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:12 +02:00
Martin Schwidefsky 6a62b485ea s390/asm: make use of the OFFSET macro to define assember constants
The use of OFFSET instead of DEFINE makes the definitions in asm-offsets.c
more readable. While we are at it sort the defines for struct _lowcore
according to the field order and remove some unneeded defines.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:10 +02:00
Hendrik Brueckner 83abeffbd5 s390/entry: add assembler macro to conveniently tests under mask
Various functions in entry.S perform test-under-mask instructions
to test for particular bits in memory.  Because test-under-mask uses
a mask value of one byte, the mask value and the offset into the
memory must be calculated manually.  This easily introduces errors
and is hard to review and read.

Introduce the TSTMSK assembler macro to specify a mask constant and
let the macro calculate the offset and the byte mask to generate a
test-under-mask instruction.  The benefit is that existing symbolic
constants can now be used for tests.  Also the macro checks for
zero mask values and mask values that consist of multiple bytes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:09 +02:00
Hendrik Brueckner 0ac277790e s390/fpu: add static FPU save area for init_task
Previously, the init task did not have an allocated FPU save area and
saving an FPU state was not possible.  Now if the vector extension is
always enabled, provide a static FPU save area to save FPU states of
vector instructions that can be executed quite early.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Hendrik Brueckner b5510d9b68 s390/fpu: always enable the vector facility if it is available
If the kernel detects that the s390 hardware supports the vector
facility, it is enabled by default at an early stage.  To force
it off, use the novx kernel parameter.  Note that there is a small
time window, where the vector facility is enabled before it is
forced to be off.

With enabling the vector facility by default, the FPU save and
restore functions can be improved.  They do not longer require
to manage expensive control register updates to enable or disable
the vector enablement control for particular processes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Martin Schwidefsky 395e6aa1d0 s390/mm: try to avoid storage key operation in ptep_set_access_flags
The call to pgste_set_key in ptep_set_access_flags can be avoided
if the old pte is found to be valid at the time the new access
rights are set. The function that created the old, valid pte already
completed the required storage key operation.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:07 +02:00
Martin Schwidefsky 5da7667c03 s390/barrier: remove unnecessary serialization in atomics and bitops
The principles of operation states reads are in order, writes are in
order, writes can be reordered after reads, but no reads can be
reordered after writes.

The atomic and bitops variantes for z196 use the interlocked-access
facility instructions with a memory barrier before and after the
instruction. Because of the memory ordering the first barrier is
unnecessary and can be removed.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:07 +02:00
Martin Schwidefsky b5a6b71b19 s390/diag: add tracepoint for diagnose calls
To be able to analyse problems in regard to hypervisor overhead
add a tracepoing for diagnose calls. It reports the number of
the diagnose issued, e.g.

            sshd-1385  [002] ....    42.701431: diagnose: nr=0x9c
          <idle>-0     [001] ..s.    43.587528: diagnose: nr=0x9c

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky 1ec2772e0c s390/diag: add a statistic for diagnose calls
Introduce /sys/debug/kernel/diag_stat with a statistic how many diagnose
calls have been done by each CPU in the system.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky acdc9fc9a8 s390/bitops: implement cache friendly test_and_set_bit_lock
The generic implementation for test_and_set_bit_lock in include/asm-generic
uses the standard test_and_set_bit operation. This is done with either a
'csg' or a 'loag' instruction. For both version the cache line is fetched
exclusively, even if the bit is already set. The result is an increase in
cache traffic, for a contented lock this is a bad idea.

Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky 5614dd920a s390/mm: implement soft-dirty bits for user memory change tracking
Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty
bit. The fault mechanism to do dirty tracking is already in place.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:05 +02:00
Sebastian Ott 9d49f86dab s390/cio: introduce pathmask_to_pos
We often need to correlate an 8 bit path mask with the position
in a channel path array. Introduce and use pathmask_to_pos for
that task.

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:04 +02:00
Sebastian Ott 1bc6664bdf s390/cio: use device_lock during cmb activation
Hold the device_lock during [de]activation of the channel measurement
block to synchronize concurrent usage of these functions.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:02 +02:00
Alexander Kuleshov 3c4aac86cb s390/crash_dump: use for_each_mem_range
The <linux/memblock.h> already provides for_each_mem_range() macro that
iterates through memblock areas from type_a and not included in type_b.
We can remove custom for_each_dump_mem_range() macro and use the
for_each_mem_range() instead.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger 1afc82aee4 s390/barrier: avoid serialization in [smp_]rmb and [smp_]wmb
The principles of operation says:

The storage-operand fetch references of one instruction
occur after those of all preceding instructions and
before those of subsequent instructions, as observed
by other CPUs and by channel programs.
[...]
The CPU may fetch the operands of instructions before the
instructions are executed.
[...]
The CPU may delay placing results in storage.
[...]
the results of one instruction are placed in storage after
the results of all preceding instructions have been placed
in storage and before any results of the succeeding
instructions are stored, as observed by other CPUs and by
the channel subsystem.

which boils down to:
- reads are in order
- writes are in order
- reads can happen earlier
- writes can happen later

By definition (see memory-barrier.txt) read barriers orders
reads vs reads and write barriers orders writes agains writes.
but neither of these orders reads vs. writes.

That means we can implement smp_wmb,smp_rmb,wmb and rmb as
simple compiler barriers. To avoid reviewing all driver code
for correct barrier usage we keep dma_[rw]mb as serialization
for now.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger 33b5881d11 s390/vdso: use correct memory barrier
By definition smp_wmb only orders writes against writes. (Finish all
previous writes, and do not start any future write). To protect the
vdso init code against early reads on other CPUs, let's use a full
smp_mb at the end of vdso init. As right now smp_wmb is implemented
as full serialization, this needs no stable backport, but this change
will be necessary if we reimplement smp_wmb.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger e0af21c56d s390/spinlock: use correct barriers
_raw_write_lock_wait first sets the high order bit to indicate a
pending writer and then waits for the reader to drop to zero.
smp_rmb by definition only orders reads against reads. Let's use
a full smp_mb instead. As right now smp_rmb is implemented
as full serialization, this needs no stable backport, but this
patch will be necessary if we reimplement smp_rmb.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:00 +02:00
Michael Holzheu b02064a9b8 s390/numa: write kernel message when emu_size has been increased
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:31:59 +02:00
David Hildenbrand 60417fcc2b KVM: s390: factor out reading of the guest TOD clock
Let's factor this out and always use get_tod_clock_fast() when
reading the guest TOD.

STORE CLOCK FAST does not do serialization and, therefore, might
result in some fuzziness between different processors in a way
that subsequent calls on different CPUs might have time stamps that
are earlier. This semantics is fine though for all KVM use cases.
To make it obvious that the new function has STORE CLOCK FAST
semantics we name it kvm_s390_get_tod_clock_fast.

With this patch, we only have a handful of places were we
have to care about STP sync (using preempt_disable() logic).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 25ed167596 KVM: s390: factor out and fix setting of guest TOD clock
Let's move that whole logic into one function. We now always use unsigned
values when calculating the epoch (to avoid over/underflow defined).
Also, we always have to get all VCPUs out of SIE before doing the update
to avoid running differing VCPUs with different TODs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 5a3d883a59 KVM: s390: switch to get_tod_clock() and fix STP sync races
Nobody except early.c makes use of store_tod_clock() to handle the
cc. So if we would get a cc != 0, we would be in more trouble.

Let's replace all users with get_tod_clock(). Returning a cc
on an ioctl sounded strange either way.

We can now also easily move the get_tod_clock() call into the
preempt_disable() section. This is in fact necessary to make the
STP sync work as expected. Otherwise the host TOD could change
and we would end up with a wrong epoch calculation.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 238293b14d KVM: s390: correctly handle injection of pgm irqs and per events
PER events can always co-exist with other program interrupts.

For now, we always overwrite all program interrupt parameters when
injecting any type of program interrupt.

Let's handle that correctly by only overwriting the relevant portion of
the program interrupt parameters. Therefore we can now inject PER events
and ordinary program interrupts concurrently, resulting in no loss of
program interrupts. This will especially by helpful when manually detecting
PER events later - as both types might be triggered during one SIE exit.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 66933b78e3 KVM: s390: simplify in-kernel program irq injection
The main reason to keep program injection in kernel separated until now
was that we were able to do some checking, if really only the owning
thread injects program interrupts (via waitqueue_active(li->wq)).

This BUG_ON was never triggered and the chances of really hitting it, if
another thread injected a program irq to another vcpu, were very small.

Let's drop this check and turn kvm_s390_inject_program_int() and
kvm_s390_inject_prog_irq() into simple inline functions that makes use of
kvm_s390_inject_vcpu().

__must_check can be dropped as they are implicitely given by
kvm_s390_inject_vcpu(), to avoid ugly long function prototypes.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 4d32ad6bec KVM: s390: drop out early in kvm_s390_has_irq()
Let's get rid of the local variable and exit directly if we found
any pending interrupt. This is not only faster, but also better
readable.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 118b862b15 KVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts
We can remove that double check.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 5f94c58ed0 KVM: s390: set interception requests for all floating irqs
No need to separate pending and floating irqs when setting interception
requests. Let's do it for all equally.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand fee0e0fdb2 KVM: s390: disabled wait cares about machine checks, not PER
We don't care about program event recording irqs (synchronous
program irqs) but asynchronous irqs when checking for disabled
wait. Machine checks were missing.

Let's directly switch to the functions we have for that purpose
instead of testing once again for magic bits.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:32 +02:00
Christian Borntraeger f59922b47e KVM: s390: remove unused variable in __inject_vm
the float int structure is no longer used in __inject_vm.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:27 +02:00
Ingo Molnar d3df65c198 Merge branch 'perf/urgent' into perf/core, before pulling new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-08 10:52:18 +02:00
Linus Torvalds 3ec20e2e61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes and an update to the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/defconfig: set SCSI_DH=y
  s390/vtime: correct scaled cputime of partially idle CPUs
  s390/boot/decompression: disable floating point in decompressor
  s390/numa: use correct type for node_to_cpumask_map
2015-10-06 14:59:36 +01:00
Gerald Schaefer 8128f23c43 iommu/s390: Add iommu api for s390 pci devices
This adds an IOMMU API implementation for s390 PCI devices.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-10-06 12:20:24 +02:00
Linus Torvalds 30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
Daniel Borkmann a91263d520 ebpf: migrate bpf_prog's flags to bitfield
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:39 -07:00
Sebastian Ott daad0bf149 s390/defconfig: set SCSI_DH=y
Fix this warning:
arch/s390/configs/performance_defconfig:380:warning: symbol value 'm' invalid for SCSI_DH

Introduced via 086b91d052
(scsi_dh: integrate into the core SCSI code)

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-01 10:48:36 +02:00
Martin Schwidefsky 72d38b1978 s390/vtime: correct scaled cputime of partially idle CPUs
The calculation for the SMT scaling factor for a hardware thread
which has been partially idle needs to disregard the cycles spent
by the other threads of the core while the thread is idle.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-30 16:22:38 +02:00
Christian Borntraeger adc0b7fbf6 s390/boot/decompression: disable floating point in decompressor
my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for
spilling/filling into a floating point register in our decompressor.

This will cause an AFP-register data exception as the decompressor
did not setup the additional floating point registers via cr0.
That causes a program check loop that looked like a hang with
one "Uncompressing Linux... " message (directly booted via kvm)
or a loop of "Uncompressing Linux... " messages (when booted via
zipl boot loader).

The offending code in my build was

   48e400:       e3 c0 af ff ff 71       lay     %r12,-1(%r10)
-->48e406:       b3 c1 00 1c             ldgr    %f1,%r12
   48e40a:       ec 6c 01 22 02 7f       clij    %r6,2,12,0x48e64e

but gcc could do spilling into an fpr at any function. We can
simply disable floating point support at that early stage.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
2015-09-29 14:45:10 +02:00
Ingo Molnar 6afc0c269c Merge branch 'linus' into perf/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:06:57 +02:00
Linus Torvalds b6d980f493 AMD fixes for bugs introduced in the 4.2 merge window,
and a few PPC bug fixes too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2
 AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz
 KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ
 tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg
 Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6
 hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM=
 =Ywqn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
  bug fixes too"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: disable halt_poll_ns as default for s390x
  KVM: x86: fix off-by-one in reserved bits check
  KVM: x86: use correct page table format to check nested page table reserved bits
  KVM: svm: do not call kvm_set_cr0 from init_vmcb
  KVM: x86: trap AMD MSRs for the TSeg base and mask
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
  kvm: svm: reset mmu on VCPU reset
2015-09-25 10:51:40 -07:00
David Hildenbrand 920552b213 KVM: disable halt_poll_ns as default for s390x
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25 10:31:30 +02:00
Martin Schwidefsky 22be9cd9f2 s390/numa: use correct type for node_to_cpumask_map
With CONFIG_CPUMASK_OFFSTACK=y cpumask_var_t is a pointer to a CPU mask.
Replace the incorrect type for node_to_cpumask_map with cpumask_t.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-23 09:18:56 +02:00
Linus Torvalds 90a835f5d1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "A couple of system call updates.  The two new system calls userfaultfd
  and membarrier have been added, as well as the 17 direct calls for the
  multiplexed socket system calls.

  In addition the system call compat wrappers have been flagged as
  notrace functions and a few wrappers could be removed.

  And bug fixes for the vector register handling, cpu_mf, suspend/resume,
  compat signals, SMT cputime accounting and the zfcp dumper"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: wire up separate socketcalls system calls
  s390/compat: remove superfluous compat wrappers
  s390/compat: do not trace compat wrapper functions
  s390/s390x: allocate sys_membarrier system call number
  s390/configs//zfcpdump_defconfig: Remove CONFIG_MEMSTICK
  s390: wire up userfaultfd system call
  s390/vtime: correct scaled cputime for SMT
  s390/cpum_cf: Corrected return code for unauthorized counter sets
  s390/compat: correct uc_sigmask of the compat signal frame
  s390: fix floating point register corruption
  s390/hibernate: fix save and restore of vector registers
2015-09-21 09:53:30 -07:00
Heiko Carstens 977108f89c s390: wire up separate socketcalls system calls
As discussed on linux-arch all architectures should wire up the separate
system calls that are hidden behind the socketcall multiplexer system call.

It's just a couple more system calls and gives us a very small performance
improvement.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-18 11:16:53 +02:00