Commit graph

16 commits

Author SHA1 Message Date
Johannes Weiner 759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
David S. Miller 0fbebed682 sparc64: Fix tsb_grow() in atomic context.
If our first THP installation for an MM is via the set_pmd_at() done
during khugepaged's collapsing we'll end up in tsb_grow() trying to do
a GFP_KERNEL allocation with several locks held.

Simply using GFP_ATOMIC in this situation is not the best option
because we really can't have this fail, so we'd really like to keep
this an order 0 GFP_KERNEL allocation if possible.

Also, doing the TSB allocation from khugepaged is a really bad idea
because we'll allocate it potentially from the wrong NUMA node in that
context.

So what we do is defer the hugepage TSB allocation until the first TLB
miss we take on a hugepage.  This is slightly tricky because we have
to handle two unusual cases:

1) Taking the first hugepage TLB miss in the window trap handler.
   We'll call the winfix_trampoline when that is detected.

2) An initial TSB allocation via TLB miss races with a hugetlb
   fault on another cpu running the same MM.  We handle this by
   unconditionally loading the TSB we see into the current cpu
   even if it's non-NULL at hugetlb_setup time.

Reported-by: Meelis Roos <mroos@ut.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 09:46:08 -08:00
David S. Miller f88620b9c5 sparc64: Fix deficiencies in sun4v error reporting.
Missing error types, attributes, and report fields.  Pad out
to 64-bytes.

Make string reporting cleaner and easier to extend in the future using
"const char *" arrays that index by either bit position, or absolute
field value.

Report the raw 64-byte error report as a sequence of u64s before the
annotated version.

Only report fields which are valid, given the context and the
attribute bits which are set.

For shutdown requests, use the local copy of the error report not the
one we just freed up back to the queue.  Also, use orderly_poweroff()
just like the Domain Services shutdown request code does.

If the real-address reported is "-1" (unknown) try to disassemble the
instruction to report the effective address of the access.  Only do
this in privileged mode.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 17:19:32 -07:00
David Miller 9e695d2ecc sparc64: Support transparent huge pages.
This is relatively easy since PMD's now cover exactly 4MB of memory.

Our PMD entries are 32-bits each, so we use a special encoding.  The
lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
because sparc64's page tables are purely software entities so we can use
whatever encoding scheme we want.  We just have to make the TLB miss
assembler page table walkers aware of the layout.

set_pmd_at() works much like set_pte_at() but it has to operate in two
page from a table of non-huge PTEs, so we have to queue up TLB flushes
based upon what mappings are valid in the PTE table.  In the second regime
we are going from huge-page to non-huge-page, and in that case we need
only queue up a single TLB flush to push out the huge page mapping.

We still have 5 bits remaining in the huge PMD encoding so we can very
likely support any new pieces of THP state tracking that might get added
in the future.

With lots of help from Johannes Weiner.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:06 +09:00
Shaohua Li 45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Kautuk Consul 7358e51082 sparc/mm/fault_64.c: Port OOM changes to do_sparc64_fault
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to 64-bit sparc.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 15:42:24 -07:00
Peter Zijlstra a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
David S. Miller 4b17764737 sparc: Support show_unhandled_signals.
Just faults right now, will add other traps later.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-01 00:02:23 -08:00
David S. Miller a084b6678a sparc: Add missing SW perf fault events.
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-20 16:23:03 -08:00
David S. Miller 4ed5d5e429 sparc64: Add some missing __kprobes annotations to kernel fault paths.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-10 18:08:29 -08:00
David S. Miller 135d082171 sparc64: Use kprobes_built_in() to avoid ifdefs in fault_64.c
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-10 18:02:19 -08:00
David S. Miller a923c28fc5 sparc: Use page_fault_out_of_memory() for VM_FAULT_OOM.
As noted by Nick Piggin.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 19:17:15 -07:00
Linus Torvalds d06063cc22 Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault().  All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:08:22 -07:00
David S. Miller 9b02605826 sparc64: Kill bogus TPC/address truncation during 32-bit faults.
This builds upon eeabac7386
("sparc64: Validate kernel generated fault addresses on sparc64.")

Upon further consideration, we actually should never see any
fault addresses for 32-bit tasks with the upper 32-bits set.

If it does every happen, by definition it's a bug.  Whatever
context created that fault would only have that fault satisfied
if we used the full 64-bit address.  If we truncate it, we'll
always fault the wrong address and we'll always loop faulting
forever.

So catch such conditions and mark them as errors always.  Log
the error and fail the fault.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-03 16:28:23 -08:00
David S. Miller eeabac7386 sparc64: Validate kernel generated fault addresses on sparc64.
In order to handle all of the cases of address calculation overflow
properly, we run sparc 32-bit processes in "address masking" mode
when running on a 64-bit kernel.

Address masking mode zeros out the top 32-bits of the address
calculated for every load and store instruction.

However, when we're in privileged mode we have to run with that
address masking mode disabled even when accessing userspace from
the kernel.

To "simulate" the address masking mode we clear the top-bits by
hand for 32-bit processes in the fault handler.

It is the responsibility of code in the compat layer to properly
zero extend addresses used to access userspace.  If this isn't
followed properly we can get into a fault loop.

Say that the user address is 0xf0000000 but for whatever reason
the kernel code sign extends this to 64-bit, and then the kernel
tries to access the result.

In such a case we'll fault on address 0xfffffffff0000000 but the fault
handler will process that fault as if it were to address 0xf0000000.
We'll loop faulting forever because the fault never gets satisfied.

So add a check specifically for this case, when the kernel is faulting
on a user address access and the addresses don't match up.

This code path is sufficiently slow path, and this bug is sufficiently
painful to diagnose, that this kind of bug check is warranted.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 22:08:15 -08:00
Sam Ravnborg 27137e5285 sparc,sparc64: unify mm/
- move all sparc64/mm/ files to arch/sparc/mm/
- commonly named files are named _64.c
- add files to sparc/mm/Makefile preserving link order
- delete now unused sparc64/mm/Makefile
- sparc64 now finds mm/ in sparc

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:16:59 -08:00
Renamed from arch/sparc64/mm/fault.c (Browse further)