Commit graph

782387 commits

Author SHA1 Message Date
Nicholas Piggin 5434ae7462 powerpc/64s/hash: Add a SLB preload cache
When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security masures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%), it's unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin 425d331462 powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup
This will be used by the SLB code in the next patch, but for now this
sets the slb_addr_limit to the correct size for 32-bit tasks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin 126b11b294 powerpc/64s/hash: Add SLB allocation status bitmaps
Add 32-entry bitmaps to track the allocation status of the first 32
SLB entries, and whether they are user or kernel entries. These are
used to allocate free SLB entries first, before resorting to the round
robin allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin 48e7b76957 powerpc/64s/hash: Convert SLB miss handlers to C
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory must not be accessed when handling kernel
space SLB misses, so care should be taken there. However user SLB
misses can access any kernel memory, which can be used to move some
fields out of the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to
  bad address handling, etc ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Disallow tracing for all of slb.c for now.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin 4c2de74cc8 powerpc/64: Interrupts save PPR on stack rather than thread_struct
PPR is the odd register out when it comes to interrupt handling, it is
saved in current->thread.ppr while all others are saved on the stack.

The difficulty with this is that accessing thread.ppr can cause a SLB
fault, but the SLB fault handler implementation in C change had
assumed the normal exception entry handlers would not cause an SLB
fault.

Fix this by allocating room in the interrupt stack to save PPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Ellerman 3eeacd9f4e powerpc/ptrace: Don't use sizeof(struct pt_regs) in ptrace code
Now that we've split the user & kernel versions of pt_regs we need to
be more careful in the ptrace code.

For now we've ensured the location of the fields in both structs is
the same, so most of the ptrace code doesn't need updating.

But there are a few places where we use sizeof(pt_regs), and these
will be wrong as soon as we increase the size of the kernel structure.

So flip them all to use sizeof(user_pt_regs).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Ellerman 002af9391b powerpc: Split user/kernel definitions of struct pt_regs
We use a shared definition for struct pt_regs in uapi/asm/ptrace.h.
That means the layout of the structure is ABI, ie. we can't change it.

That would be fine if it was only used to describe the user-visible
register state of a process, but it's also the struct we use in the
kernel to describe the registers saved in an interrupt frame.

We'd like more flexibility in the content (and possibly layout) of the
kernel version of the struct, but currently that's not possible.

So split the definition into a user-visible definition which remains
unchanged, and a kernel internal one.

At the moment they're still identical, and we check that at build
time. That's because we have code (in ptrace etc.) that assumes that
they are the same. We will fix that code in future patches, and then
we can break the strict symmetry between the two structs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Benjamin Herrenschmidt 7f995d3ba6 powerpc/prom_init: Make "default_colors" const
It's never modified.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Benjamin Herrenschmidt 30c69ca048 powerpc/prom_init: Make "fake_elf" const
It is never modified

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Benjamin Herrenschmidt 3bad719b49 powerpc/prom_init: Make of_workarounds static
It's not used anywhere else.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 1b2443a547 powerpc/book3s64: Avoid multiple endian conversion in pte helpers
In the same spirit as already done in pte query helpers,
this patch changes pte setting helpers to perform endian
conversions on the constants rather than on the pte value.

In the meantime, it changes pte_access_permitted() to use
pte helpers for the same reason.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy ff00552578 powerpc/8xx: change name of a few page flags to avoid confusion
_PAGE_PRIVILEGED corresponds to the SH bit which doesn't protect
against user access but only disables ASID verification on kernel
accesses. User access is controlled with _PMD_USER flag.

Name it _PAGE_SH instead of _PAGE_PRIVILEGED

_PAGE_HUGE corresponds to the SPS bit which doesn't really tells
that's it is a huge page but only that it is not a 4k page.

Name it _PAGE_SPS instead of _PAGE_HUGE

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 5662315384 powerpc/mm: Get rid of pte-common.h
Do not include pte-common.h in nohash/32/pgtable.h

As that was the last includer, get rid of pte-common.h

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy cbcbbf4afd powerpc/mm: Define platform default caches related flags
Cache related flags like _PAGE_COHERENT and _PAGE_WRITETHRU
are defined on most platforms. The platforms not defining
them don't define any alternative. So we can give them a NUL
value directly for those platforms directly.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy a0da4bc166 powerpc/mm: Allow platforms to redefine some helpers
The 40xx defines _PAGE_HWWRITE while others don't.
The 8xx defines _PAGE_RO instead of _PAGE_RW.
The 8xx defines _PAGE_PRIVILEGED instead of _PAGE_USER.
The 8xx defines _PAGE_HUGE and _PAGE_NA while others don't.

Lets those platforms redefine pte_write(), pte_wrprotect() and
pte_mkwrite() and get _PAGE_RO and _PAGE_HWWRITE off the common
helpers.

Lets the 8xx redefine pte_user(), pte_mkprivileged() and pte_mkuser()
and get rid of _PAGE_PRIVILEGED and _PAGE_USER default values.

Lets the 8xx redefine pte_mkhuge() and get rid of
_PAGE_HUGE default value.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 6c5d2d3fd3 powerpc/nohash/64: do not include pte-common.h
nohash/64 only uses book3e PTE flags, so it doesn't need pte-common.h

This also allows to drop PAGE_SAO and H_PAGE_4K_PFN from pte_common.h
as they are only used by PPC64

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy d82fd29c5a powerpc/mm: Distribute platform specific PAGE and PMD flags and definitions
The base kernel PAGE_XXXX definition sets are more or less platform
specific. Lets distribute them close to platform _PAGE_XXX flags
definition, and customise them to their exact platform flags.

Also defines _PAGE_PSIZE and _PTE_NONE_MASK for each platform
allthough they are defined as 0.

Do the same with _PMD flags like _PMD_USER and _PMD_PRESENT_MASK

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy e0f57031ca powerpc/mm: Move pte_user() into nohash/pgtable.h
Now the pte-common.h is only for nohash platforms, lets
move pte_user() helper out of pte-common.h to put it
together with other helpers.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy b2133bd7a5 powerpc/book3s/32: do not include pte-common.h
As done for book3s/64, add necessary flags/defines in
book3s/32/pgtable.h and do not include pte-common.h

It allows in the meantime to remove all related hash
definitions from pte-common.h and to also remove
_PAGE_EXEC default as _PAGE_EXEC is defined on all
platforms except book3s/32.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy f4805785f0 powerpc/mm: move __P and __S tables in the common pgtable.h
__P and __S flags are the same for all platform and should remain
as is in the future, so avoid duplication.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 093d7ca229 powerpc/mm: drop unused page flags
The following page flags in pte-common.h can be dropped:

_PAGE_ENDIAN is only used in mm/fsl_booke_mmu.c and is defined in
asm/nohash/32/pte-fsl-booke.h

_PAGE_4K_PFN is nowhere defined nor used

_PAGE_READ, _PAGE_WRITE and _PAGE_PTE are only defined and used
in book3s/64

The following page flags in book3s/64/pgtable.h can be dropped as
they are not used on this platform nor by common code.

_PAGE_NA, _PAGE_RO, _PAGE_USER and _PAGE_PSIZE

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 97026b5a5a powerpc/mm: Split dump_pagelinuxtables flag_array table
To reduce the complexity of flag_array, and allow the removal of
default 0 value of non existing flags, lets have one flag_array
table for each platform family with only the really existing flags.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 26973fa5ac powerpc/mm: use pte helpers in generic code
Get rid of platform specific _PAGE_XXXX in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 34eb138ed7 powerpc/mm: don't use _PAGE_EXEC for calling hash_preload()
The 'access' parameter of hash_preload() is either 0 or _PAGE_EXEC.
Among the two versions of hash_preload(), only the PPC64 one is
doing something with this 'access' parameter.

In order to remove the use of _PAGE_EXEC outside platform code,
'access' parameter is replaced by 'is_exec' which will be either
true of false, and the PPC64 version of hash_preload() creates
the access flag based on 'is_exec'.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy daba790242 powerpc/mm: add pte helpers to query and change pte flags
In order to avoid using generic _PAGE_XXX flags in powerpc
core functions, define helpers for all needed flags:
- pte_mkuser() and pte_mkprivileged() to set/unset and/or
unset/set _PAGE_USER and/or _PAGE_PRIVILEGED
- pte_hashpte() to check if _PAGE_HASHPTE is set.
- pte_ci() check if cache is inhibited (already existing on book3s/64)
- pte_exprotect() to protect against execution
- pte_exec() and pte_mkexec() to query and set page execution
- pte_mkpte() to set _PAGE_PTE flag.
- pte_hw_valid() to check _PAGE_PRESENT since pte_present does
something different on book3s/64.

On book3s/32 there is no exec protection, so pte_mkexec() and
pte_exprotect() are nops and pte_exec() returns always true.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy aa9cd505e3 powerpc/mm: move some nohash pte helpers in nohash/[32:64]/pgtable.h
In order to allow their use in nohash/32/pgtable.h, we have to move the
following helpers in nohash/[32:64]/pgtable.h:
- pte_mkwrite()
- pte_mkdirty()
- pte_mkyoung()
- pte_wrprotect()

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy d81e6f8b7c powerpc/mm: don't use _PAGE_EXEC in book3s/32
book3s/32 doesn't define _PAGE_EXEC, so no need to use it.

All other platforms define _PAGE_EXEC so no need to check
it is not NUL when not book3s/32.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy c766ee7223 powerpc: handover page flags with a pgprot_t parameter
In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.

Do the same for __ioremap_caller() and __ioremap_at().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 56f3c1413f powerpc/mm: properly set PAGE_KERNEL flags in ioremap()
Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy aa91796ec4 powerpc: don't use ioremap_prot() nor __ioremap() unless really needed.
In many places, ioremap_prot() and __ioremap() can be replaced with
higher level functions like ioremap(), ioremap_coherent(),
ioremap_cache(), ioremap_wc() ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy 402a5698b4 soc/fsl/qbman: use ioremap_cache() instead of ioremap_prot(0)
ioremap_prot() with flag set to 0 relies on a hack in
__ioremap_caller() which adds PAGE_KERNEL flags when the
handed flags don't look like a valid set of flags
(ie don't include _PAGE_PRESENT)

The intention being to map cached memory, use ioremap_cache() instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy ed18e423a3 drivers/block/z2ram: use ioremap_wt() instead of __ioremap(_PAGE_WRITETHRU)
_PAGE_WRITETHRU is a target specific flag. Prefer generic functions.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 23:52:45 +11:00
Christophe Leroy e04e39507c drivers/video/fbdev: use ioremap_wc/wt() instead of __ioremap()
_PAGE_NO_CACHE is a platform specific flag. In addition, this flag
is misleading because one would think it requests a noncached page
whereas a noncached page is _PAGE_NO_CACHE | _PAGE_GUARDED

_PAGE_NO_CACHE alone means write combined noncached page, so lets
use ioremap_wc() instead.

_PAGE_WRITETHRU is also platform specific flag. Use ioremap_wt()
instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 23:52:45 +11:00
Christophe Leroy 86c391bd5f powerpc/32: Add ioremap_wt() and ioremap_coherent()
Other arches have ioremap_wt() to map IO areas write-through.
Implement it on PPC as well in order to avoid drivers using
__ioremap(_PAGE_WRITETHRU)

Also implement ioremap_coherent() to avoid drivers using
__ioremap(_PAGE_COHERENT)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Gautham R. Shenoy dfd718a2ed powerpc/rtas: Fix a potential race between CPU-Offline & Migration
Live Partition Migrations require all the present CPUs to execute the
H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
before initiating the migration for this purpose.

The commit 85a88cabad
("powerpc/pseries: Disable CPU hotplug across migrations")
disables any CPU-hotplug operations once all the offline CPUs are
brought online to prevent any further state change. Once the
CPU-Hotplug operation is disabled, the code assumes that all the CPUs
are online.

However, there is a minor window in rtas_ibm_suspend_me() between
onlining the offline CPUs and disabling CPU-Hotplug when a concurrent
CPU-offline operations initiated by the userspace can succeed thereby
nullifying the the aformentioned assumption. In this unlikely case
these offlined CPUs will not call H_JOIN, resulting in a system hang.

Fix this by verifying that all the present CPUs are actually online
after CPU-Hotplug has been disabled, failing which we restore the
state of the offline CPUs in rtas_ibm_suspend_me() and return an
-EBUSY.

Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Gautham R. Shenoy 500fe5f550 powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores
Currently on POWER9 SMT8 cores systems, in sysfs, we report the
shared_cache_map for L1 caches (both data and instruction) to be the
cpu-ids of the threads in SMT8 cores. This is incorrect since on
POWER9 SMT8 cores there are two groups of threads, each of which
shares its own L1 cache.

This patch addresses this by reporting the shared_cpu_map correctly in
sysfs for L1 caches.

Before the patch
   /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000ff

After the patch
   /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 00000055
   /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 00000055
   /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000aa
   /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000aa

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Gautham R. Shenoy 8e8a31d7fd powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores
POWER9 SMT8 cores consist of two groups of threads, where threads in
each group shares L1-cache. The scheduler is not aware of this
distinction as the current sched-domain hierarchy has all the threads
of the core defined at the SMT domain.

	SMT  [Thread siblings of the SMT8 core]
	DIE  [CPUs in the same die]
	NUMA [All the CPUs in the system]

Due to this, we can observe run-to-run variance when we run a
multi-threaded benchmark bound to a single core based on how the
scheduler spreads the software threads across the two groups in the
core.

We fix this in this patch by defining each group of threads which
share L1-cache to be the SMT level. The group of threads in the SMT8
core is defined to be the CACHE level. The sched-domain hierarchy
after this patch will be :

	SMT	[Thread siblings in the core that share L1 cache]
	CACHE 	[Thread siblings that are in the SMT8 core]
	DIE  	[CPUs in the same die]
	NUMA 	[All the CPUs in the system]

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Gautham R. Shenoy 425752c63b powerpc: Detect the presence of big-cores via "ibm, thread-groups"
On IBM POWER9, the device tree exposes a property array identifed by
"ibm,thread-groups" which will indicate which groups of threads share
a particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch adds helper functions to parse the contents of
"ibm,thread-groups" and populate a per-cpu variable to cache
information about siblings of each CPU that share the L1, traslation
cache and instruction data-flow.

It also defines a new global variable named "has_big_cores" which
indicates if the cores on this configuration have multiple groups of
threads that share L1 cache.

For each online CPU, it maintains a cpu_smallcore_mask, which
indicates the online siblings which share the L1-cache with it.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Michael Ellerman bf6cbd0c87 powerpc: Fix stackprotector detection for non-glibc toolchains
If GCC is not built with glibc support then we must explicitly tell it
which register to use for TLS mode stack protector, otherwise it will
error out and the cc-option check will fail.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Michael Ellerman 50530f5eac powerpc/xmon: Show the stack protector canary in xmon
This is helpful for debugging stack protector crashes.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Joel Stanley ed9e84a4d7 powerpc: Use SWITCH_FRAME_SIZE for prom and rtas entry
Commit 6c1719942e ("powerpc/of: Remove useless register save/restore
when calling OF back") removed the saving of srr0 and srr1 when calling
into OpenFirmware. Commit e31aa453bb ("powerpc: Use LOAD_REG_IMMEDIATE
only for constants on 64-bit") did the same for rtas.

This means we don't need to save the extra stack space and can use
the common SWITCH_FRAME_SIZE.

There were already no users of _SRR0 and _SRR1 so we can remove them
too.

Link: https://github.com/linuxppc/linux/issues/83
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Michael Bringmann 65b9fdadfc powerpc/pseries/mobility: Extend start/stop topology update scope
The powerpc mobility code may receive RTAS requests to perform PRRN
(Platform Resource Reassignment Notification) topology changes at any
time, including during LPAR migration operations.

In some configurations where the affinity of CPUs or memory is being
changed on that platform, the PRRN requests may apply or refer to
outdated information prior to the complete update of the device-tree.

This patch changes the duration for which topology updates are
suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
/ 'ibm,suspend-me' call(s) to cover the entire migration_store()
operation to allow all changes to the device-tree to be applied prior
to accepting and applying any PRRN requests.

For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Joel Stanley 960e300298 powerpc/Makefile: Fix PPC_BOOK3S_64 ASFLAGS
Ever since commit 15a3204d24 ("powerpc/64s: Set assembler machine type
to POWER4") we force -mpower4 to be passed to the assembler
irrespective of the CFLAGS used (for Book3s 64).

When building a powerpc64 kernel with clang, clang will not add -many
to the assembler flags, so any instructions that the compiler has
generated that are not available on power4 will cause an error:

  /usr/bin/as -a64 -mppc64 -mlittle-endian -mpower8 \
   -I ./arch/powerpc/include -I ./arch/powerpc/include/generated \
   -I ./include -I ./arch/powerpc/include/uapi \
   -I ./arch/powerpc/include/generated/uapi -I ./include/uapi \
   -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc \
   -maltivec -mpower4 -o init/do_mounts.o /tmp/do_mounts-3b0a3d.s
  /tmp/do_mounts-51ce54.s:748: Error: unrecognized opcode: `isel'

GCC does include -many, so the GCC driven gas call will succeed:

  as -v -I ./arch/powerpc/include -I ./arch/powerpc/include/generated -I
  ./include -I ./arch/powerpc/include/uapi
  -I ./arch/powerpc/include/generated/uapi -I ./include/uapi
  -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc
   -a64 -mpower8 -many -mlittle -maltivec -mpower4 -o init/do_mounts.o

Note that isel is power7 and above for IBM CPUs. GCC only generates it
for Power9 and above, but the above test was run against the clang
generated assembly.

Peter Bergner explains:

  When using -many -mpower4, gas will first try and find a matching
  power4 mnemonic and failing that, it will then allow any valid
  mnemonic that gas knows about. GCC's use of -many predates me
  though.

  IIRC, Alan looked at trying to remove it, but I forget why he
  didn't. Could be either a gcc or gas issue at the time. I'm not sure
  whether issue still exists or not. He and I have modified how gas
  works internally a fair amount since he tried removing gcc use of
  -many.

  I will also note that when using -many, gas will choose the first
  mnemonic that matches in the mnemonic table and we have (mostly)
  sorted the table so that server mnemonics show up earlier in the
  table than other mnemonics, so they'll be seen/chosen first.

By explicitly setting -many we can build with Clang and GCC while
retaining the -mpower4 option.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
YueHaibing b45e9d761b powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index
The variable 'aa_index' is defined as an unsigned value in
update_lmb_associativity_index(), but find_aa_index() may return -1
when dlpar_clone_property() fails. So change find_aa_index() to return
a bool, which indicates whether 'aa_index' was found or not.

Fixes: c05a5a4096 ("powerpc/pseries: Dynamic add entires to associativity lookup array")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com>
[mpe: Tweak changelog, rename is_found to just found]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff b90484ec11 powerpc/eeh: Cleanup control flow in eeh_handle_normal_event()
Rather than mixing "if (state)" blocks and gotos, convert entirely to
"if (state)" blocks to make the state machine behaviour clearer.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff fef7f90552 powerpc/eeh: Cleanup eeh_ops.wait_state()
The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.

While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
  EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
  it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
  because that's what it is.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff e762bb891a powerpc/eeh: Cleanup eeh_pe_state_mark()
Currently, eeh_pe_state_mark() marks a PE (and it's children) with a
state and then performs additional processing if that state included
EEH_PE_ISOLATED.

The state parameter is always a constant at the call site, so
rearrange eeh_pe_state_mark() into two functions and just call the
appropriate one at each site.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff eed4bdbeec powerpc/eeh: Cleanup unnecessary eeh_pe_state_mark_with_cfg()
The function eeh_pe_state_mark_with_cfg() just performs the work of
eeh_pe_state_mark() and then, conditionally, the work of
eeh_pe_state_clear(). However it is only ever called with a constant
state such that the condition is always true, so replace it by direct
calls.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff 54644927a0 powerpc/eeh: Cleanup eeh_enabled()
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff 9a3eda266f powerpc/eeh: Cleanup logic in eeh_rmv_from_parent_pe()
Move the call to eeh_dev_to_pe() up, so that later it's clear that
"pe" isn't NULL.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00