* 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (113 commits)
KVM: VMX: Don't allow uninhibited access to EFER on i386
KVM: Correct deassign device ioctl to IOW
KVM: ppc: e500: Fix the bug that KVM is unstable in SMP
KVM: ppc: e500: Fix the bug that mas0 update to wrong value when read TLB entry
KVM: Fix missing smp tlb flush in invlpg
KVM: Get support IRQ routing entry counts
KVM: fix sparse warnings: Should it be static?
KVM: fix sparse warnings: context imbalance
KVM: is_long_mode() should check for EFER.LMA
KVM: VMX: Update necessary state when guest enters long mode
KVM: ia64: Fix the build errors due to lack of macros related to MSI.
ia64: Move the macro definitions related to MSI to one header file.
KVM: fix kvm_vm_ioctl_deassign_device
KVM: define KVM_CAP_DEVICE_DEASSIGNMENT
KVM: ppc: Add emulation of E500 register mmucsr0
KVM: Report IRQ injection status for MSI delivered interrupts
KVM: MMU: Fix another largepage memory leak
KVM: SVM: set accessed bit for VMCB segment selectors
KVM: Report IRQ injection status to userspace.
KVM: MMU: remove assertion in kvm_mmu_alloc_page
...
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (29 commits)
crypto: sha512-s390 - Add missing block size
hwrng: timeriomem - Breaks an allyesconfig build on s390:
nlattr: Fix build error with NET off
crypto: testmgr - add zlib test
crypto: zlib - New zlib crypto module, using pcomp
crypto: testmgr - Add support for the pcomp interface
crypto: compress - Add pcomp interface
netlink: Move netlink attribute parsing support to lib
crypto: Fix dead links
hwrng: timeriomem - New driver
crypto: chainiv - Use kcrypto_wq instead of keventd_wq
crypto: cryptd - Per-CPU thread implementation based on kcrypto_wq
crypto: api - Use dedicated workqueue for crypto subsystem
crypto: testmgr - Test skciphers with no IVs
crypto: aead - Avoid infinite loop when nivaead fails selftest
crypto: skcipher - Avoid infinite loop when cipher fails selftest
crypto: api - Fix crypto_alloc_tfm/create_create_tfm return convention
crypto: api - crypto_alg_mod_lookup either tested or untested
crypto: amcc - Add crypt4xx driver
crypto: ansi_cprng - Add maintainer
...
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Impact: use new API
Use the accessors rather than frobbing bits directly. Most of this is
in arch code I haven't even compiled, but is straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Impact: cleanup, futureproof
In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in various
places (I also updated the immediate sites to use the new cpumask_
operators).
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Impact: cleanup
cpu_coregroup_mask is the New Hotness.
As S/390 uses theirs internally, so we just make it static.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The lowcore clock comparator save area on 64 bit machines is defined to
contain only the seven most significant bits of the register.
That's also why it starts at an uneven address (0x1331).
The current code however writes eight bytes to the address and
therefore overwrites the first byte of the access register save area.
Fix this and write only seven bytes to the save area.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The Extended Translation Facility 3 (ETF3) added instructions which
allow conversions between different unicode character maps (UTF-8,
UTF-32 ...). These instructions got enhanced with a later version of
the ETF3 allowing malformed multibyte chars to be recognized and
reported correctly. The attached patch reserves bit 8 in the elf
hwcaps vector for the enhanced version of ETF3. The bit corresponds to
the stfle bits 22 and 30 and will only be set if both of the stfle
bits are set.
Signed-off-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 ipl panic notifier will stop the system or trigger a system dump.
This should be done as final action on the panic path. All other panic
notifiers should be executed before. Currently we use priority 0 for the ipl
notifier. In order to be called late, this patch changes the priority to
INT_MIN which is the lowest possible priority.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The old dfp detection wanted to check bit 43 (dfp high performance), but due
to a wrong calculation always used to check bit 42. Additionally the
"userspace expectation" is, that the dfp capability bit is set is if facility
bit 42 (decimal floating point facility available) and bit 44 (perform floating
point operation facility avail).
The patch fixes the bit calculation and extends the check to work like:
elf hw cap dfp bit = facility bits 42 (dfp) & 44 (pfpo) available
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Performing an initial cpu reset makes sure all registers and tlbs of
the targeted cpu are initialized and flushed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If sigp_set_prefix fails on __cpu_up we leak the lowcore structures
and async+panic stacks for the failed cpu.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A code analysis tool reported two warnings:
"The expression `ipl_info.type == IPL_TYPE_FCP' is true whenever evaluated."
and "Default is not possible". This patch improves the corresponding if
statement logic and removes the unnecessary switch defaults.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The check for NULL pointer has already be done before. Therefore we can remove
the second check.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use kzfree() instead of memset() + kfree().
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The zcore code switches to real addressing mode when creating a kernel dump.
This is not possible, if it is built as a kernel module. With this patch
zcore (zfcpdump) can't be built as a kernel module any more.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cksm function in system.h is duplicate to csum_partial in checksum.h.
Remove cksm and use csum_partial instead.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The lowcore.h header has quite a lot of whitespace damage and a rather
wild collection of entries. Remove all that whitespace and tidy up the
order of the lowcore fields.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes two addresses in the comments for the
lowcore structure. Looks like an copy-paste bug.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We need to use this value in the checkpoint/restart code and would like to
have a constant instead of a magic '3'.
Cc: linux-s390@vger.kernel.org
Signed-off-by: Dan Smith <danms@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use builtin variants if gcc 4 or newer is used to compile the kernel.
Generates better code than the asm variants.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid the detour over the PLT if the branch target of a function call
in a module is in the range of the bras (16-bit) or brasl (32-bit)
instruction. The PLT is still generated but it is unused.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
likely/unlikely profiling revealed that none of the branches in bitops
is taken likely or unlikely. So remove the annotations.
In addition the generated code is shorter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
strlcpy() does already NUL-terminate the destination string.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In introducing a trivial "strstarts()" function in linux/string.h, we
hit the following error on s390:
In file included from include/linux/bitmap.h:8,
from include/linux/cpumask.h:142,
from include/linux/smp.h:12,
from /home/rusty/devel/kernel/patches/linux-2.6/arch/s390/include/asm/spinlock.h:14,
from include/linux/spinlock.h:88,
from include/linux/seqlock.h:29,
from include/linux/time.h:8,
from include/linux/stat.h:60,
from include/linux/module.h:10,
from arch/s390/lib/string.c:13:
include/linux/string.h: In function 'strstarts':
include/linux/string.h:124: error: implicit declaration of function 'strlen'
include/linux/string.h:124: warning: incompatible implicit declaration of built-in function 'strlen'
Because when including asm/string.h from arch/s390/lib/string.c we
don't declare the string ops we are about to define, and
linux/string.h barfs.
The fix is to declare them in this IN_ARCH_STRING_C case, but in
general I wonder if there's a neater fix.
Reported-by: linux-next
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Errors from SIGA instructions are stored in the per queue qdio_error
and reported back when the queue handler is called. That opens a race
when multiple error conditions occur simultanously.
Report SIGA errors immediately in the return value of do_QDIO so the
upper layer can react and SIGA errors no longer interfere with other
errors.
Move the SIGA error handling in qeth from the outbound handler to
qeth_flush_buffers.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Disable switch_amode by default because pagetable walk on pre z9
hardware has negative performance impact.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clock sync mode flag CLOCK_SYNC_STP is not cleared when stp
is set offline. In this case the get_sync_clock() function returns
-EACCESS and the dasd driver will block all i/o until stp is enabled
again. In addition get_sync_clock can return -EACCESS if the clock is
not in sync instead of -EAGAIN.
Rework the stp/etr online handling to fix these problems.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move all EXPORT_SYMBOLs to their corresponding definitions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Everybody enables it so there is no point for an extra config option.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split machine check handler code and move it to cio and kernel code
where it belongs to. No functional change.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_NET not set appldata build breaks on s390.
arch/s390/appldata/built-in.o: In function appldata_get_net_sum_data:
appldata_net_sum.c:(.text+0x2684): undefined reference to dev_get_stats
appldata_net_sum.c:(.text+0x2688): undefined reference to init_net
appldata_net_sum.c:(.text+0x268c): undefined reference to init_net
appldata_net_sum.c:(.text+0x2694): undefined reference to dev_base_lock
The following patch fixes the issue for me.
Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently we use the cpuid (via STIDP instruction) to recognize LPAR,
z/VM and KVM.
The architecture states, that bit 0-7 of STIDP returns all zero, and
if STIDP is executed in a virtual machine, the VM operating system
will replace bits 0-7 with FF.
KVM should not use FE to distinguish z/VM from KVM for interested
guests. The proper way to detect the hypervisor is the STSI (Store
System Information) instruction, which return information about the
hypervisors via function code 3, selector1=2, selector2=2.
This patch changes the detection routine of Linux to use STSI instead
of STIDP. This detection is earlier than bootmem, we have to use a
static buffer. Since STSI expects a 4kb block (4kb aligned) this
patch also changes the init.data alignment for s390. As this section
will be freed during boot, this should be no problem.
Patch is tested with LPAR, z/VM, KVM on LPAR, and KVM under z/VM.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sie instruction requires address spaces to be switched
to run proper. This patch verifies that this is the case
in s390_enable_sie, otherwise the kernel would crash badly
as soon as the process runs into sie.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With lockdep we got the following trace after a panic:
Badness at /home/autobuild/BUILD/linux-2.6.28-20090204/kernel/lockdep.c:2878
[...]
Call Trace:
[<0000000000176334>] lock_acquire+0x54/0xbc
[<000000000050b4fe>] __atomic_notifier_call_chain+0x6e/0xdc
[<000000000050b59c>] atomic_notifier_call_chain+0x30/0x44
[<0000000000504274>] panic+0xd0/0x1e8
[...]
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<0000000000170e62>] check_flags+0xae/0x15c
possible reason: unannotated irqs-off.
lockdep is right. We missed a trace_hardirq_off in our smp_send_stop
function and smp_send_stop is called before the panic call chain.
Reported-by: Mijo <Safradin mijo@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Initialize per thread timer values instead of just copying them from
the parent. That way it is easily possible to tell how much time a
thread spent in user/system context.
Doesn't fix a bug, this is just for debugging purposes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix all the whitespace damage in process.c, especially copy_thread().
Next patch will add code to copy_thread() which needs to 'fixed' first.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All in sysinfo.c is core kernel code and not driver code. So move it
to arch/s390/kernel. Also includes some small cleanups.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To support High Performance FICON, the DASD device driver has to
translate I/O requests into the new transport mode control words (TCW)
instead of the traditional (command mode) CCW requests.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd device driver will now support ECKD devices with more then
65520 cylinders.
In the traditional ECKD adressing scheme each track is addressed
by a 16-bit cylinder and 16-bit head number. The new addressing
scheme makes use of the fact that the actual number of heads is
never larger then 15, so 12 bits of the head number can be redefined
to be part of the cylinder address.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide new shutdown action "dump_reipl" for automatic ipl after dump.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This patch fixes the SET PREFIX interrupt if triggered by userspace.
Until now, it was not necessary, but life migration will need it. In
addition, it helped me creating SMP support for my kvm_crashme tool
(lets kvm execute random guest memory content).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The kernel handles some priviledged instruction exits. While I was
unable to trigger such an exit from guest userspace, the code should
check for supervisor state before emulating a priviledged instruction.
I also renamed kvm_s390_handle_priv to kvm_s390_handle_b2. After all
there are non priviledged b2 instructions like stck (store clock).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM on s390 does not support the ESA/390 architecture. We refuse to
change the architecture mode and print a warning. This patch removes
the printk for several reasons:
o A malicious guest can flood host dmesg
o The old message had no newline
o there is no connection between the message and the failing guest
This patch simply removes the printk. We already set the condition
code to 3 - the guest knows that something went wrong.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Two KVM archs support irqchips and two don't. Add a Kconfig item to
make selecting between the two models easier.
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the remaining arch fragments of the old guest debug interface
that now break non-x86 builds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
I missed the block size when converting sha512-s390 to shash.
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Impact: build fix on s390 !CONFIG_SMP
Remove arch specific smp_send_stop for !CONFIG_SMP since it conflicts
with a new generic version.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <20090320092410.30d2bac3@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After TASK_SIZE now gives the current size of the address space the
upgrade of a 64 bit process from 3 to 4 levels of page table needs
to use the arch_mmap_check hook to catch large mmap lengths. The
get_unmapped_area* functions need to check for -ENOMEM from the
arch_get_unmapped_area*, upgrade the page table and retry.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make page table walking on s390 more robust. The current code requires
that the pgd/pud/pmd/pte loop is only done for address ranges that are
below the end address of the last vma of the address space. But this
is not always true, e.g. the generic page table walker does not guarantee
this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
address space. This makes the generic page table walker happy but it
breaks the upgrade of a 3 level page table to a 4 level page table.
To make the upgrade work again another fix is required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
pfn_valid() actually checks for a valid struct page and not for a
valid pfn. Using xip mappings w/o struct pages, this will result in
-EFAULT returned by the (page table walk) user copy functions,
even though there is valid memory. Those user copy functions don't
need a struct page, so this patch just removes the pfn_valid() check.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With packed stack the backchain is at a different location.
Just use __SF_BACKCHAIN as an offset to store the backchain.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The default values for SD_MC_INIT cause an additional cpu usage of up
to 40% on some network benchmarks compared to the plain SD_CPU_INIT
values. So just define SD_MC_INIT to SD_CPU_INIT.
More tuning needs to be done.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The implementation of __div64_31 for G5 machines is broken. The comments
in __div64_31 are correct, only the code does not do what the comments
say. The part "If the remainder has overflown subtract base and increase
the quotient" is only partially realized, the base is subtracted correctly
but the quotient is only increased if the dividend had the last bit set.
Using the correct instruction fixes the problem.
Cc: stable@kernel.org
Reported-by: Frans Pop <elendil@planet.nl>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With the mandatory algorithm testing at registration, we have
now created a deadlock with algorithms requiring fallbacks.
This can happen if the module containing the algorithm requiring
fallback is loaded first, without the fallback module being loaded
first. The system will then try to test the new algorithm, find
that it needs to load a fallback, and then try to load that.
As both algorithms share the same module alias, it can attempt
to load the original algorithm again and block indefinitely.
As algorithms requiring fallbacks are a special case, we can fix
this by giving them a different module alias than the rest. Then
it's just a matter of using the right aliases according to what
algorithms we're trying to find.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit aa5e97ce4b
[PATCH] improve precision of process accounting.
Introduced a timing regression:
-bash-3.2# time ls
real 0m0.006s
user 0m1.754s
sys 0m1.094s
The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch converts the S390 sha algorithms to the new shash interface.
With fixes by Jan Glauber.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The vdso_per_cpu_data entry in the lowcore structure uses __u32
instead of __u64. If the data page is above 4GB the pointer is
truncated and the kernel crashes.
Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add wrapper functions for the following compat system calls:
* readahead
* sendfile64
* tkill
* tgkill
* keyctl
This ensures that the high order bits of the parameter registers are correctly
sign extended.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Precreate stop_machine threads in case the machine supports ETR/STP.
Otherwise we might deadlock if a time sync operation gets scheduled
and the creation of stop_machine threads would cause disk I/O.
This is just the minimal fix.
The real fix would be to only precreate stop_machine threads if
ETR/STP is actually used. But that would be a rather large and
complicated patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
On (initial) cpu hotplug the lowcore values for user_timer and
system_timer don't get initialized like they would get on each
process schedule.
On initial start of secondary cpus this leads to the situation
where per thread user/system_timer values are larger than the
corresponding contents of the lowcore. When later calculating
time spent in user/system context the result can be negative.
So for cpu hotplug we should manually initialize lowcore values.
Fixes this bug:
Kernel BUG at 000ec080 [verbose debug info unavailable]
fixpoint divide exception: 0009 [#1] PREEMPT SMP
Modules linked in:
CPU: 10 Not tainted 2.6.28 #4
Process sysctl (pid: 975, task: 3fa752e0, ksp: 3fbebca0)
Krnl PSW : 070c1000 800ec080 (show_stat+0x390/0x5fc)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
Krnl GPRS: 7fffffff fefc7ce5 3faec080 003879ae
00000001 01388000 7fffffff 01388000
00000000 00000000 0049ad50 3fbebcf8
01388000 002f51a8 800ec1fe 3fbebcf8
Krnl Code: 800ec076: 9001b188 stm %r0,%r1,392(%r11)
800ec07a: 9801b0c0 lm %r0,%r1,192(%r11)
800ec07e: 1d05 dr %r0,%r5
>800ec080: 9001b0c0 stm %r0,%r1,192(%r11)
800ec084: 5860b0c4 l %r6,196(%r11)
800ec088: 1806 lr %r0,%r6
800ec08a: 8c800001 srdl %r8,1
800ec08e: 1d87 dr %r8,%r7
Call Trace:
([<00000000000ec1ee>] show_stat+0x4fe/0x5fc)
[<00000000000c13e8>] seq_read+0xc4/0x3ac
[<00000000000e4796>] proc_reg_read+0x6e/0x9c
[<00000000000a6a44>] vfs_read+0x78/0x100
[<00000000000a6ba8>] sys_read+0x40/0x80
[<00000000000234a8>] sysc_do_restart+0x1a/0x1e
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
When 31 bit user space programs call sigaltstack on a 64 bit Linux
OS, the system call returns -1 with errno=EFAULT. The 31 bit pointer passed
to the system call is extended to 64 bit, but the high order bits are not
set to zero. The kernel detects the invalid user space pointer and
returns -EFAULT. To solve the problem, sys32_sigaltstack_wrapper()
instead of sys32_sigaltstack() has to be called. The wrapper function sets
the high order bits to zero.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Use the personality() macro to mask out all bits that are not
relevant for the personality type.
The personality field contains bits for other things as well,
so without masking out the not relevalent bits the comparison
won't do what is expected.
Reported-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
[CVE-2009-0029] s390 specific system call wrappers
[CVE-2009-0029] System call wrappers part 33
[CVE-2009-0029] System call wrappers part 32
[CVE-2009-0029] System call wrappers part 31
[CVE-2009-0029] System call wrappers part 30
[CVE-2009-0029] System call wrappers part 29
[CVE-2009-0029] System call wrappers part 28
[CVE-2009-0029] System call wrappers part 27
[CVE-2009-0029] System call wrappers part 26
[CVE-2009-0029] System call wrappers part 25
[CVE-2009-0029] System call wrappers part 24
[CVE-2009-0029] System call wrappers part 23
[CVE-2009-0029] System call wrappers part 22
[CVE-2009-0029] System call wrappers part 21
[CVE-2009-0029] System call wrappers part 20
[CVE-2009-0029] System call wrappers part 19
[CVE-2009-0029] System call wrappers part 18
[CVE-2009-0029] System call wrappers part 17
[CVE-2009-0029] System call wrappers part 16
[CVE-2009-0029] System call wrappers part 15
...
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove __attribute__((weak)) from common code sys_pipe implemantation.
IA64, ALPHA, SUPERH (32bit) and SPARC (32bit) have own implemantations
with the same name. Just rename them.
For sys_pipe2 there is no architecture specific implementation.
Cc: Richard Henderson <rth@twiddle.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
As requested by Andrew. Same as what sparc did.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
!CONFIG_SMP:
arch/s390/kernel/vdso.c: In function 'vdso_init':
arch/s390/kernel/vdso.c:325: error: incompatible type for argument 2 of 'vdso_alloc_per_cpu'
Also move the code out of the BUG_ON statement since it won't be
executed on !CONFIG_BUG. And that would be a bug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The system call isn't wired up on s390. Just delete the dead code.
Also we use the common code sys_ptrace system call, so the sys_ptrace
declaration is pointless is well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
/include/asm/chpid.h:12: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/chsc.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/cmb.h:28: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/dasd.h:195: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/kvm.h:16: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/kvm.h:30: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/qeth.h:24: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/schid.h:5: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/swab.h:12: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/swab.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the connection between host and storage server is lost, the
dasd device driver usually blocks all I/O on affected devices and
waits for them to reappear. In some setups however it would be
better if the I/O is returned as error so that device can be
recovered by some other means, eg. in a raid or multipath setup.
Signed-off-by: Holger Smolinski <Holger.Smolinski@de.ibm.com>
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Bring s390 in line with all the other ports. Not sure how s390 missed
this change as all the other arches were being updated ...
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux390@de.ibm.com
CC: linux-s390@vger.kernel.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
/include/asm/ptrace.h:275: extern's make no sense in userspace
/include/asm/ptrace.h:279: extern's make no sense in userspace
/include/asm/ptrace.h:280: extern's make no sense in userspace
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h. Move the type definition
to linux/types.h to break the loop.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Show node to memory section relationship with symlinks in sysfs
Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.
Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.
In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.
Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
KVM: MMU: handle large host sptes on invlpg/resync
KVM: Add locking to virtual i8259 interrupt controller
KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
MAINTAINERS: Maintainership changes for kvm/ia64
KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
KVM: fix handling of ACK from shared guest IRQ
KVM: MMU: check for present pdptr shadow page in walk_shadow
KVM: Consolidate userspace memory capability reporting into common code
KVM: Advertise the bug in memory region destruction as fixed
KVM: use cpumask_var_t for cpus_hardware_enabled
KVM: use modern cpumask primitives, no cpumask_t on stack
KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
KVM: set owner of cpu and vm file operations
anon_inodes: use fops->owner for module refcount
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
KVM: MMU: prepopulate the shadow on invlpg
KVM: MMU: skip global pgtables on sync due to cr3 switch
KVM: MMU: collapse remote TLB flushes on root sync
...
Impact: New API
The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
The s390 backend of kvm never calls kvm_vcpu_uninit. This causes
a memory leak of vcpu->run pages.
Lets call kvm_vcpu_uninit in kvm_arch_vcpu_destroy to free
the vcpu->run.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently it is impossible to unload the kvm module on s390.
This patch fixes kvm_arch_destroy_vm to release all cpus.
This make it possible to unload the module.
In addition we stop messing with the module refcount in arch code.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The extract cpu time instruction (ectg) instruction allows the user
process to get the current thread cputime without calling into the
kernel. The code that uses the instruction needs to switch to the
access registers mode to get access to the per-cpu info page that
contains the two base values that are needed to calculate the current
cputime from the CPU timer with the ectg instruction.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Distinguish the cputime of the idle process where idle is actually using
cpu cycles from the cputime where idle is sleeping on an enabled wait psw.
The former is accounted as system time, the later as idle time.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Increase the precision of the idle time calculation that is exported
to user space via /sys/devices/system/cpu/cpu<x>/idle_time_us
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The unit of the cputime accouting values that are stored per process is
currently a microsecond. The CPU timer has a maximum granularity of
2**-12 microseconds. There is no benefit in storing the per process values
in the lesser precision and there is the disadvantage that the backend
has to do the rounding to microseconds. The better solution is to use
the maximum granularity of the CPU timer as cputime unit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: struct device - replace bus_id with dev_name()
lguest: move the initial guest page table creation code to the host
kvm-s390: implement config_changed for virtio on s390
virtio_console: support console resizing
virtio: add PCI device release() function
virtio_blk: fix type warning
virtio: block: dynamic maximum segments
virtio: set max_segment_size and max_sectors to infinite.
virtio: avoid implicit use of Linux page size in balloon interface
virtio: hand virtio ring alignment as argument to vring_new_virtqueue
virtio: use KVM_S390_VIRTIO_RING_ALIGN instead of relying on pagesize
virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize
virtio: Don't use PAGE_SIZE for vring alignment in virtio_pci.
virtio: rename 'pagesize' arg to vring_init/vring_size
virtio: Don't use PAGE_SIZE in virtio_pci.c
virtio: struct device - replace bus_id with dev_name(), dev_set_name()
virtio-pci queue allocation not page-aligned
This doesn't really matter, since s390 pagesize is 4k anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.
There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
Like cpu_coregroup_map, but returns a (const) pointer.
Compile-tested on s390 (defconfig).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
The loop above the modified code only terminates when rc is a valid pointer.
A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x@p1 == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)
// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@
... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)
@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@
(
- if ((x@p2 != NULL) || ...)
S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch sets the default console device for s390.
The console= kernel parameter can be still used to switch the preferred
console to some other device. In that case, console messages are also
printed on the default console device (ttyS0).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tell the compile that the clear_table inline assembly writes to the
memory referenced by *s.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 we always want to run with precise cputime accounting.
Remove the config options VIRT_TIMER and VIRT_CPU_ACCOUNTING.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Interrupts haven't been implemented. So remove the dead code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a topology=[on|off] kernel parameter which allows to switch
cpu topology on/off. Default will be off, since it looks like that for
some workloards this doesn't behave very well (on s390).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the machine types for z9-bc, z10-ec and z10-bc to the elf_platform
detection in setup_hwcaps.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Early init code clears the backchain of the initial kernel stack frame.
This is not necessary since it is pre initialized with zeros. Plus it
was broken on 64 bit since it cleared only four of eight bytes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds the code generation option for IBM System z10 and
adds a check in head[31,64].S to prevents the execution of a kernel
compiled for a new processor type on an old machine.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Functions which end in a BUG() statement and skip the return statement
cause compile warnings on s390, e.g.:
mm/bootmem.c: In function 'mark_bootmem':
mm/bootmem.c:321: warning: control reaches end of non-void function
To avoid the warning add an endless loop to the BUG() macro.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
disabled_wait() won't return, so add an __attribute__((noreturn)).
This will remove a false positive finding which our internal code
checker reports.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Values for the sac field have changed - update code accordingly.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A kernel compile on 31 bit gives the following warnings in ptrace.c:
arch/s390/kernel/ptrace.c: In function 'peek_user':
arch/s390/kernel/ptrace.c:207: warning: unused variable 'dummy'
arch/s390/kernel/ptrace.c: In function 'poke_user':
arch/s390/kernel/ptrace.c:315: warning: unused variable 'dummy'
Getting rid of the dummy variables removes the warnings.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This implements just the basic function tracer (_mcount) backend for s390.
The dynamic variant will come later.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new proc interface /proc/service_levels that allows any code
to report a relevant service level, e.g. the microcode level of
devices, the service level of the hypervisor, etc.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- make qdio_trace a per device view
- remove s390dbf exceptions
- remove CONFIG_QDIO_DEBUG, not needed anymore if we check for the level
before calling sprintf
- use snprintf for dbf entries
- add start markers to see if the dbf view wrapped
- add a global error view for all queues
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
qeth needs to get the port count information before
qdio has allocated a page for the chsc operation.
Extend qdio_get_ssqd_desc() to store the data in the
specified structure.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the machine supports AP adapter interrupts polling will be
switched off at module initialization and the driver will work in
interrupt mode.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
stfle will be needed by the ap_bus module to figure out wether the AP
queue adapter interruption facility is installed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since etr/stp don't need the old smp_call_function semantics anymore
we can convert s390 to the generic IPI infrastructure.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The work function dispatched with schedule_work() can be run twice
on different cpus because run_workqueue clears the WORK_STRUCT_PENDING
bit and then executes the function. Another cpu can call schedule_work()
again and run the work function a second time before the first call
is completed. This patch serialized the etr and stp work function with
a mutex.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This converts the etr and stp code to the new stop_machine interface
which allows to synchronize all cpus without allocating any memory.
This way we get rid of the only reason why we haven't converted s390
to the generic IPI interface yet.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a vdso to speed up gettimeofday and clock_getres/clock_gettime for
CLOCK_REALTIME/CLOCK_MONOTONIC.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Call rebuild_sched_domains instead of arch_reinit_sched_domains if
cpu topology changes. This leaves cpu sets alone which otherwise would
be destroyed.
If and how it makes sense to define cpu sets on a virtualized
architecture is another question.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 we have ret_from_fork jump not to the "do all work we
normally do on return from syscall" as on x86, ppc, etc., but to the
"do all such work except audit". Historical reasons - the codepath
triggered when we have AUDIT process flag set is separated from the
normall one and they converge at sysc_return, which is the common
part of post-syscall work. And does not include calling audit_syscall_exit() -
that's done in the end of sysc_tracesys path, just before that path jumps
to sysc_return.
IOW, the child returning from fork()/clone()/vfork() doesn't
call audit_syscall_exit() at all, so no matter what we do with its
audit context, we are not going to see the audit entry.
The fix is simple: have ret_from_fork go to the point just past
the call of sys_.... in the 'we have AUDIT flag set' path. There we
have (64bit variant; for 31bit the situation is the same):
sysc_tracenogo:
tm __TI_flags+7(%r9),(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT)
jz sysc_return
la %r2,SP_PTREGS(%r15) # load pt_regs
larl %r14,sysc_return # return point is sysc_return
jg do_syscall_trace_exit
which is precisely what we need - check the flag, bugger off to sysc_return
if not set, otherwise call do_syscall_trace_exit() and bugger off to
sysc_return. r9 has just been properly set by ret_from_fork itself,
so we are fine.
Tested on s390x, seems to work fine. WARNING: it's been about
16 years since my last contact with 3X0 assembler[1], so additional
review would be very welcome. I don't think I've managed to screw it
up, but...
[1] that *was* in another country and besides, the box is dead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Common code doesn't call arch_update_cpu_topology() anymore on
cpu hotplug. But our architecture backend relied on that in order to
update the cpu_core_map. For machines without cpu topology support
this leads uninitialized cpu_core_maps for later on added cpus.
To solve this just initialize the maps with cpu_possible_map, since
that will be always valid for machines without topology support.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Impact: change calling convention of existing clock_event APIs
struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.
Another single-patch change. For safety, we BUG_ON() in
clockevents_register_device() if it's not set.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
Change arch_update_cpu_topology so it returns 1 if the cpu topology changed
and 0 if it didn't change. This will be useful for the next patch which adds
a call to this function in partition_sched_domains.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: MMU: avoid creation of unreachable pages in the shadow
KVM: ppc: stop leaking host memory on VM exit
KVM: MMU: fix sync of ptes addressed at owner pagetable
KVM: ia64: Fix: Use correct calling convention for PAL_VPS_RESUME_HANDLER
KVM: ia64: Fix incorrect kbuild CFLAGS override
KVM: VMX: Fix interrupt loss during race with NMI
KVM: s390: Fix problem state handling in guest sigp handler
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).
Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need an alignment of 16384 bytes for the initial kernel stack if
the kernel is configured for 16384 bytes stacks but the linker script
currently guarantees only an alignment of 8192 bytes.
So fix this and simply use THREAD_SIZE as alignment value which will
always do the right thing.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When running several kvm processes with lots of memory overcommitment,
we have seen an oops during process shutdown:
------------[ cut here ]------------
Kernel BUG at 0000000000193434 [verbose debug info unavailable]
addressing exception: 0005 [#1] PREEMPT SMP
Modules linked in: kvm sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 10 Not tainted 2.6.28-rc4-kvm-bigiron-00521-g0ccca08-dirty #8
Process kuli (pid: 14460, task: 0000000149822338, ksp: 0000000024f57650)
Krnl PSW : 0704e00180000000 0000000000193434 (unmap_vmas+0x884/0xf10)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000002 0000000000000000 000000051008d000 000003e05e6034e0
00000000001933f6 00000000000001e9 0000000407259e0a 00000002be88c400
00000200001c1000 0000000407259608 0000000407259e08 0000000024f577f0
0000000407259e09 0000000000445fa8 00000000001933f6 0000000024f577f0
Krnl Code: 0000000000193426: eb22000c000d sllg %r2,%r2,12
000000000019342c: a7180000 lhi %r1,0
0000000000193430: b2290012 iske %r1,%r2
>0000000000193434: a7110002 tmll %r1,2
0000000000193438: a7840006 brc 8,193444
000000000019343c: 9602c000 oi 0(%r12),2
0000000000193440: 96806000 oi 0(%r6),128
0000000000193444: a7110004 tmll %r1,4
Call Trace:
([<00000000001933f6>] unmap_vmas+0x846/0xf10)
[<0000000000199680>] exit_mmap+0x210/0x458
[<000000000012a8f8>] mmput+0x54/0xfc
[<000000000012f714>] exit_mm+0x134/0x144
[<000000000013120c>] do_exit+0x240/0x878
[<00000000001318dc>] do_group_exit+0x98/0xc8
[<000000000013e6b0>] get_signal_to_deliver+0x30c/0x358
[<000000000010bee0>] do_signal+0xec/0x860
[<0000000000112e30>] sysc_sigpending+0xe/0x22
[<000002000013198a>] 0x2000013198a
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000000001a68d0>] free_swap_and_cache+0x1a0/0x1a4
<4>---[ end trace bc19f1d51ac9db7c ]---
The faulting instruction is the storage key operation (iske) in
ptep_rcp_copy (called by pte_clear, called by unmap_vmas). iske
reads dirty and reference bit information for a physical page and
requires a valid physical address. Since we are in pte_clear, we
cannot rely on the pte containing a valid address. Fortunately we
dont need these information in pte_clear - after all there is no
mapping. The best fix is to remove the needless call to ptep_rcp_copy
that contains the iske.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
CONFIG_PRINTK_TIME reveals that sched_clock has a wrong offset during boot:
..
[ 0.000000] Movable zone: 0 pages used for memmap
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 775679
[ 0.000000] Kernel command line: dasd=4b6c root=/dev/dasda1 ro noinitrd
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[6920575.975232] console [ttyS0] enabled
[6920575.987586] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[6920575.991404] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
..
The s390 implementation of sched_clock uses the store clock instruction and
subtracts jiffies_timer_cc.
jiffies_timer_cc is a local variable in arch/s390/kernel/time.c and only used
for sched_clock and monotonic clock. For historical reasons there is an offset
on that value. With todays code this offset is unnecessary. By removing that
offset we can get a sched_clock which returns the nanoseconds after time_init.
This improves CONFIG_PRINTK_TIME.
Since sched_clock is the only user, I have also renamed jiffies_timer_cc to
sched_clock_base_cc. In addition, the local variable init_timer_cc is redundant
and can be romved as well.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
syscall_get_nr() currently returns a valid result only if the call
chain of the traced process includes do_syscall_trace_enter(). But
collect_syscall() can be called for any sleeping task, the result of
syscall_get_nr() in general is completely bogus.
To make syscall_get_nr() work for any sleeping task the traps field
in pt_regs is replace with svcnr - the system call number the process
is executing. If svcnr == 0 the process is not on a system call path.
The syscall_get_arguments and syscall_set_arguments use regs->gprs[2]
for the first system call parameter. This is incorrect since gprs[2]
may have been overwritten with the system call number if the call
chain includes do_syscall_trace_enter. Use regs->orig_gprs2 instead.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We can get an exit for instructions starting with 0xae, even if the guest is
in userspace. Lets make sure, that the signal processor handler is only called
in guest supervisor mode. Otherwise, send a program check.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.
Note: change return code to make it clear that the caller should not
go changing the returned statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The uname system call for 64 bit compares current->personality without
masking the upper 16 bits. If e.g. READ_IMPLIES_EXEC is set the result
of a uname system call will always be s390x even if the process uses
the s390 personality.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
cpu_coregroup_map used to grab a mutex on s390 since it was only
called from process context.
Since c7c22e4d5c "block: add support
for IO CPU affinity" this is not true anymore.
It now also gets called from softirq context.
To prevent possible deadlocks change this in architecture code and
use a spinlock instead of a mutex.
Cc: stable@kernel.org
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_IRQSOFF_TRACER the trace_hardirqs_off() function includes
a call to __builtin_return_address(1). But we calltrace_hardirqs_off()
from early entry code. There we have just a single stack frame.
So this results in a kernel stack backchain walk that would walk beyond
the kernel stack. Following the NULL terminated backchain this results
in a lowcore read access.
To fix this we simply call trace_hardirqs_off_caller() and pass the
current instruction pointer.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Disable tracing on idle psw. Otherwise it would give us huge
preempt off times for idle. Which is rather pointless.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/kernel/built-in.o: In function `cleanup_io_leave_insn':
mem_detect.c:(.text+0x10592): undefined reference to `lockdep_sys_exit'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
add_active_range() expects start_pfn + size as end_pfn value, i.e. not
the pfn of the last page frame but the one behind that.
We used the pfn of the last page frame so far, which can lead to a
BUG_ON in move_freepages(), when the kernelcore parameter is specified
(page_zone(start_page) != page_zone(end_page)).
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.
Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.
With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
The s390 kernel does not compile if virtio console is enabled, but guest
support is disabled:
LD .tmp_vmlinux1
arch/s390/kernel/built-in.o: In function `setup_arch':
/space/linux-2.5/arch/s390/kernel/setup.c:773: undefined reference to
`s390_virtio_console_init'
The fix is related to
commit 99e65c92f2
Author: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Fri Jul 25 15:50:04 2008 +0200
KVM: s390: Fix guest kconfig
Which changed the build process to build kvm_virtio.c only if CONFIG_S390_GUEST
is set. We must ifdef the prototype in the header file accordingly.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
allyesconfig and allmodconfig built kernels have a tape IPL record.
A the vmreader record makes much more sense, since hardly anybody will
ever IPL a kernel from tape. So change the default.
As I side effect I can test these kernels without fiddling around with
the kernel config ;)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use sysdev_class_create_file() to create create sysdev class attributes
instead of sysfs_create_file(). Using sysfs_create_file() wasn't a very
good idea since the show and store functions have a different amount of
parameters for sysfs files and sysdev class files.
In particular the pointer to the buffer is the last argument and
therefore accesses to random memory regions happened.
Still worked surprisingly well until we got a kernel panic.
Cc: stable@kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current enable_sie code sets the mm->context.pgstes bit to tell
dup_mm that the new mm should have extended page tables. This bit is also
used by the s390 specific page table primitives to decide about the page
table layout - which means context.pgstes has two meanings. This can cause
any kind of bugs. For example - e.g. shrink_zone can call
ptep_clear_flush_young while enable_sie is running. ptep_clear_flush_young
will test for context.pgstes. Since enable_sie changed that value of the old
struct mm without changing the page table layout ptep_clear_flush_young will
do the wrong thing.
The solution is to split pgstes into two bits
- one for the allocation
- one for the current state
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.
The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.
* Examples of usage :
# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
RUNNING
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING
This is the basic mechanism which should do the right thing for user space
task in a simple scenario.
It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:
1) Userspace cancels the freezing operation by writing "RUNNING" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series introduces a cgroup subsystem that utilizes the swsusp
freezer to freeze a group of tasks. It's immediately useful for batch job
management scripts. It should also be useful in the future for
implementing container checkpoint/restart.
The freezer subsystem in the container filesystem defines a cgroup file
named freezer.state. Reading freezer.state will return the current state
of the cgroup. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup.
* Examples of usage :
# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
RUNNING
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING
This patch:
The first step in making the refrigerator() available to all
architectures, even for those without power management.
The purpose of such a change is to be able to use the refrigerator() in a
new control group subsystem which will implement a control group freezer.
[akpm@linux-foundation.org: fix sparc]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove. Instead of duplicating it in every architecture,
collapse them into arch neutral function.
[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
KVM: ia64: Add intel iommu support for guests.
KVM: ia64: add directed mmio range support for kvm guests
KVM: ia64: Make pmt table be able to hold physical mmio entries.
KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
KVM: Move device assignment logic to common code
KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
KVM: VMX: enable invlpg exiting if EPT is disabled
KVM: x86: Silence various LAPIC-related host kernel messages
KVM: Device Assignment: Map mmio pages into VT-d page table
KVM: PIC: enhance IPI avoidance
KVM: MMU: add "oos_shadow" parameter to disable oos
KVM: MMU: speed up mmu_unsync_walk
KVM: MMU: out of sync shadow core
KVM: MMU: mmu_convert_notrap helper
KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
KVM: MMU: mmu_parent_walk
KVM: x86: trap invlpg
KVM: MMU: sync roots on mmu reload
...
Nothing arch specific in get/settimeofday. The details of the timeval
conversion varied a little from arch to arch, but all with the same
results.
Also add an extern declaration for sys_tz to linux/time.h because externs
in .c files are fowned upon. I'll kill the externs in various other files
in a sparate patch.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct stat / compat_stat is the same on all architectures, so
cp_compat_stat should be, too.
Turns out it is, except that various architectures have slightly and some
high2lowuid/high2lowgid or the direct assignment instead of the
SET_UID/SET_GID that expands to the correct one anyway.
This patch replaces the arch-specific cp_compat_stat implementations with
a common one based on the x86-64 one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [ parisc bits ]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SET_PERSONALITY macro is always called with a second argument of 0.
Remove the ibcs argument and the various tests to set the PER_SVR4
personality.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current help text for CONFIG_S390_GUEST is not very helpful.
Lets add more text.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Heiko Carstens pointed out, that its safer to activate working facilities
instead of disabling problematic facilities. The new code uses the host
facility bits and masks it with known good ones.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
chsc_sstpc returns -EIO on error and 0 on success but stp_reset checks
against 1 instead of 0. chsc_sstpc used to return 1 on success, one
call location has not been updated ..
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch defines a dirty bit in the PGSTE that can be used to implement
dirty pages logging for KVM's live migration. The bit is set in the
ptep_rcp_copy function, which is called to save dirty and referenced information
from the storage key in the PGSTE. The bit can be tested and reset by KVM using
the kvm_s390_test_and_clear_page_dirty function that is introduced by this patch.
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Florian Funke <ffunke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
EMC Symmetrix Subsystem Control I/O through CKD dasd requires a
specific parameter list sent to the array via a Perform Subsystem
Function CCW. The Symmetrix response is retrieved from the array
via a Read Subsystem Data CCW.
Signed-off-by: Nigel Hislop <hislop_nigel@emc.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move cio's private simple udelay function to lib/delay.c and turn it
into something much more readable. So we have all implementations
at one place.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The DCSS block device driver is modified to add >2G DCSSs support and
allow a DCSS block device to map to a set of contiguous DCSSs. The
extmem code is also modified to use new Diagnose x'64' subcodes for
>2G DCSSs.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* System call parameter and result access functions
* Add tracehook calls
* Split syscall_trace into two functions do_syscall_trace_enter and
do_syscall_trace_exit
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sys32_pause is a useless copy of the generic sys_pause.
(and it's certainly not there for old sparc32 binaries..)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for z10 HiperSockets multiwrite SBALs on output
queues. This is used on LPAR with EDDP enabled devices.
Signed-off-by: Klaus-Dieter Wacker <kdwacker@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes a regression that came with 934b2857cc
("[S390] nohz/sclp: disable timer on synchronous waits.").
If udelay() gets called from a disabled context it sets the clock comparator
to a value where it expects the next interrupt. When the interrupt happens
the clock comparator gets not reset and therefore the interrupt condition
doesn't get cleared. The result is an endless timer interrupt loop.
In addition this patch fixes also the following:
rcutorture reveals that our __udelay implementation is still buggy,
since it might schedule tasklets, but prevents their execution:
NOHZ: local_softirq_pending 42
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 142
NOHZ: local_softirq_pending 02
To fix this we make sure that only the clock comparator interrupt
is enabled when the enabled wait psw is loaded.
Also no code gets called anymore which might schedule tasklets.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When running a 31-bit ptrace, on either an s390 or s390x kernel,
reads and writes into a padding area in struct user_regs_struct32
will result in a kernel panic.
This is also known as CVE-2008-1514.
Test case available here:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/user-area-padding.c?cvsroot=systemtap
Steps to reproduce:
1) wget the above
2) gcc -o user-area-padding-31bit user-area-padding.c -Wall -ggdb2 -D_GNU_SOURCE -m31
3) ./user-area-padding-31bit
<panic>
Test status
-----------
Without patch, both s390 and s390x kernels panic. With patch, the test case,
as well as the gdb testsuite, pass without incident, padding area reads
returning zero, writes ignored.
Nb: original version returned -EINVAL on write attempts, which broke the
gdb test and made the test case slightly unhappy, Jan Kratochvil suggested
the change to return 0 on write attempts.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Right now, there is no notifier that is called on a new cpu, before the new
cpu begins processing interrupts/softirqs.
Various kernel function would need that notification, e.g. kvm works around
by calling smp_call_function_single(), rcu polls cpu_online_map.
The patch adds a CPU_STARTING notification. It also adds a helper function
that sends the message to all cpu_chain handlers.
Tested on x86-64.
All other archs are untested. Especially on sparc, I'm not sure if I got
it right.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ext4 does not work on s390 because ext2_find_next_bit is broken. Fortunately
this function is only used by ext4. The function uses ffs which does not work
analog to ffz. The result of ffs has an offset of 1 which is not taken into
account. To fix this use the low level __ffs_word function directly instead
of the ill defined ffs.
In addition the patch improves find_next_zero_bit and ext2_find_next_zero_bit
by passing the bit offset into __ffz_word instead of adding it after the
function call returned.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the now unneeded s390_idle.lock spinlock initialization after
Josef Sipek did it the right way in arch/s390/kernel/process.c.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: s390: Fix kvm on IBM System z10
KVM: Advertise synchronized mmu support to userspace
KVM: Synchronize guest physical memory map to host virtual memory map
KVM: Allow browsing memslots with mmu_lock
KVM: Allow reading aliases with mmu_lock
Fix these two (false positive) warnings by adding an __init annoation:
WARNING: vmlinux.o(.text+0x7e6a): Section mismatch in reference from the function stp_reset() to the function .init.text:__alloc_bootmem()
The function stp_reset() references
the function __init __alloc_bootmem().
This is often because stp_reset lacks a __init
annotation or the annotation of __alloc_bootmem is wrong.
WARNING: vmlinux.o(.text+0x7ece): Section mismatch in reference from the function stp_reset() to the function .init.text:free_bootmem()
The function stp_reset() references
the function __init free_bootmem().
This is often because stp_reset lacks a __init
annotation or the annotation of free_bootmem is wrong.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The result of the diag 0x260 call is not always what one would expect.
So just remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sclp_sync_wait wait synchronously for an sclp interrupt and disables
timer interrupts. However on the irq enter paths there is an extra
check if a timer interrupt would be due and calls the timer callback.
This would schedule softirqs in the wrong context.
So introduce local_tick_enable/disable which prevents this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During startup we check if diag308 works using diag 308 subcode 6,
which stores the actual ipl information. This fails with rc = 0x102, if
the system has been ipled from the HMC using load from CD or load from file.
In the case of rc = 0x102 we have to assume that diag 308 is working,
since it still can be used to ipl from an alternative device.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The z10 system supports large pages, kvm-s390 doesnt.
Make sure that we dont advertise large pages to avoid the guest crashing as
soon as the guest kernel activates DAT.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The lctl(g) instructions require a specific alignment for the parameters.
The architecture requires a specification program check if these alignments
are not used. Enforcing this alignment also removes a possible host BUG,
since the get_guest functions check for proper alignment and emits a BUG.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Lets fix the name for the lctlg instruction...
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current interrupt handling on s390 misbehaves on an error case. On s390
each cpu has the prefix area (lowcore) for interrupt delivery. This memory
must always be available. If we fail to access the prefix area for a guest
on interrupt delivery the configuration is completely unusable. There is no
point in sending another program interrupt to an inaccessible lowcore.
Furthermore, we should not bug the host kernel, because this can be triggered
by userspace. I think the guest kernel itself can not trigger the problem, as
SET PREFIX and SIGNAL PROCESSOR SET PREFIX both check that the memory is
available and sane. As this is a userspace bug (e.g. setting the wrong guest
offset, unmapping guest memory) we should kill the userspace process instead
of BUGing the host kernel.
In the long term we probably should notify the userspace process about this
problem.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
All registers are unsigned long types. This patch changes all occurences
of guestaddr in gaccess from u64 to unsigned long.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM_CAP_USER_MEMORY is used by s390, therefore, we should advertise it.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Remove arch-specific show_mem() in favor of the generic version.
This also removes the following redundant information display:
- pages in swapcache, printed by show_swap_cache_info()
where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that it is safe to use get_online_cpus() we can revert
[S390] cpu topology: Fix possible deadlock.
commit: fd781fa25c
and call arch_reinit_sched_domains() directly from topology_work_fn().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enables virtio_console as the default console on kvm for
s390. We currently use the same notify hack as lguest for early
console output. I will try to address this for lguest and s390 later.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
Straight forward extensions for huge pages located in the PUD instead of
PMDs.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The goal of this patchset is to support multiple hugetlb page sizes. This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg. huge page
size, number of huge pages currently allocated, etc).
The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.
This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.
Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.
[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Flush the shadow mmu before removing regions to avoid stale entries.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
While doing some tests with our lcrash implementation I have seen a
naming conflict with prefix_info in kvm_host.h vs. addrconf.h
To avoid future conflicts lets rename private definitions in
asm/kvm_host.h by adding the kvm_s390 prefix.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Some machines do not accept 16EB as guest storage limit. Lets change the
default for the guest storage limit to a sane value. We also should set
the guest_origin to what userspace thinks it is. This allows guests
starting at an address != 0.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch fixes a memory leak, we want to free the physmem when destroying
the vm.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add missing module.h include to fix this:
CC arch/s390/kernel/stacktrace.o
arch/s390/kernel/stacktrace.c:84: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:84: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:84: warning: parameter names (without types) in function declaration
arch/s390/kernel/stacktrace.c:97: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:97: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:97: warning: parameter names (without types) in function declaration
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Compiling a kernel with allmodconfig or allyesconfig results in tons
of gcc warnings, because the default maximum stacksize from which on
gcc will emit a warning is just 256 bytes.
Increase this to 2048, so these warnings don't distract from the real
warnings that we need to watch at.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Most likely it is broken anyway because of the changes in memory
detection. Since we can't test it and there are probably better ways
that using a P390 card, remove support for it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move memory detection code to own file and also simplify it.
Also add an interface which can be called at any time to get the
current memory layout. This interface is needed by our kernel
internal system dumper.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As noted by Akinobu Mita in patch b1fceac2b9,
alloc_bootmem and related functions never return NULL and always return a
zeroed region of memory. Thus a NULL test or memset after calls to these
functions is unnecessary.
arch/s390/kernel/topology.c | 2 --
1 file changed, 2 deletions(-)
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now it is possible to specify additional kernel parameters on the IPL
command line using the IPL PARM option.
If the Linux system is already running, the new reipl sysfs attribute
'parm' can be used to change kernel parameters for the next reboot.
Examples:
IPL C PARM dasd=1234 root=/dev/dasda1
IPL 1234 PARM savesys=mylnxnss
echo "init=/bin/bash" > /sys/firmware/reipl/ccw/parm
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The idle notifier chain consists of at most one element. So there's
no point in having a notifier chain. Remove it and directly call the
function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds a driver for subchannels of type chsc.
A device /dev/chsc is created which may be used to issue ioctls to:
- obtain information about the machine's I/O configuration
- dynamically change the machine's I/O configuration via
asynchronous chsc commands
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add support for clock synchronization with the server time protocol.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
In case the initrd is located within the bss section it will be
overwritten when the section is cleared. To prevent this just move
the initrd right behind the bss section if it starts within the
section.
The current code already moves the initrd if the bootmem allocator
bitmap would overwrite it. With this patch we should be safe against
initrd corruptions.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the user_regset definitions for normal and compat processes, replace
the dump_regs core dump cruft with the generic CORE_DUMP_USER_REGSET and
replace binfmt_elf32.c with the generic compat_binfmt_elf.c implementation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Using the ipldelay kernel parameter leads to a crash at IPL time.
Since this is broken since a long time it looks like nobody is using
it anymore. So remove it instead of fixing it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid compile error by using EXPORT_SYMBOL_GPL(si_swapinfo) only if
CONFIG_SWAP is set.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Bevore issuing any s390 crypto operation check whether the
CPACF facility is enabled in the facility list. That way a
virtualization layer can prevent usage of the CPACF facility
regardless of the availability of the crypto instructions.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: MMU: Fix is_empty_shadow_page() check
KVM: MMU: Fix printk() format string
KVM: IOAPIC: only set remote_irr if interrupt was injected
KVM: MMU: reschedule during shadow teardown
KVM: VMX: Clear CR4.VMXE in hardware_disable
KVM: migrate PIT timer
KVM: ppc: Report bad GFNs
KVM: ppc: Use a read lock around MMU operations, and release it on error
KVM: ppc: Remove unmatched kunmap() call
KVM: ppc: add lwzx/stwz emulation
KVM: ppc: Remove duplicate function
KVM: s390: Fix race condition in kvm_s390_handle_wait
KVM: s390: Send program check on access error
KVM: s390: fix interrupt delivery
KVM: s390: handle machine checks when guest is running
KVM: s390: fix locking order problem in enable_sie
KVM: s390: use yield instead of schedule to implement diag 0x44
KVM: x86 emulator: fix hypercall return value on AMD
KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
The first argument to __ctl_store() should be the array to store
stuff in, not just the first element of that array. With the
current code in __cpu_up(), mainline GCC dies with an internal
compiler error. I didn't diagnose that further, but just fixed
the kernel bug.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
If a memory range is supposed to be added to the 1:1 mapping and it
ends just below the maximum supported physical address it won't
succeed. This is because a test doesn't consider that the end address
is 1 smaller than start + size.
Fix the comparison.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of !64BIT kernel we end up with a zero sized mem_section array.
This happens because NR_MEM_SECTIONS is smaller than SECTIONS_PER_ROOT
but we have:
#define NR_SECTION_ROOTS (NR_MEM_SECTIONS / SECTIONS_PER_ROOT)
and
struct mem_section *mem_section[NR_SECTION_ROOTS];
So fix this by selecting SPARSEMEM_STATIC which makes sure
that SECTIONS_PER_ROOT is 1.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The call to add_timer was issued before local_int.lock was taken and before
timer_due was set to 0. If the timer expires before the lock is being taken,
the timer function will set timer_due to 1 and exit before the vcpu falls
asleep. Depending on other external events, the vcpu might sleep forever.
This fix pulls setting timer_due to the beginning of the function before
add_timer, which ensures correct behavior.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If the guest accesses non-existing memory, the sie64a function returns
-EFAULT. We must check the return value and send a program check to the
guest if the sie instruction faulted, otherwise the guest will loop at
the faulting code.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current code delivers pending interrupts before it checks for
need_resched. On a busy host, this can lead to a longer interrupt
latency if the interrupt is injected while the process is scheduled
away. This patch moves delivering the interrupt _after_ schedule(),
which makes more sense.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The low-level interrupt handler on s390 checks for _TIF_WORK_INT and
exits the guest context, if work is pending.
TIF_WORK_INT is defined as_TIF_SIGPENDING | _TIF_NEED_RESCHED |
_TIF_MCCK_PENDING. Currently the sie loop checks for signals and
reschedule, but it does not check for machine checks. That means that
we exit the guest context if a machine check is pending, but we do not
handle the machine check.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
There are potential locking problem in enable_sie. We take the task_lock
and the mmap_sem. As exit_mm uses the same locks vice versa, this triggers
a lockdep warning.
The second problem is that dup_mm and mmput might sleep, so we must not
hold the task_lock at that moment.
The solution is to dup the mm unconditional and use the task_lock before and
afterwards to check if we can use the new mm. dup_mm and mmput are called
outside the task_lock, but we run update_mm while holding the task_lock,
protection us against ptrace.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
diag 0x44 is the common way on s390 to yield the cpu to the hypervisor.
It is called by the guest in cpu_relax and in the spinlock code to
yield to other guest cpus.
This semantic is similar to yield. Lets replace the call to schedule with
yield to make sure that current is really yielding.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The correct instruction format of idte is "idte r1,r3,r2" with
r1 at bit 24, r3 at bit 16 and r2 at bit 28.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert show_mem() so its nearly the same as on x86/powerpc.
Gives us proper locking and we get also rid of the only use of max_mapnr.
Also the number of pages was contained in an int which might not be
sufficient not too far in the future.
Cc: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use get_online_cpus() to prevent cpu hotplug in situations where
for_each_online_cpu() is called.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes the last remaining section mismatch warnings in s390
architecture code. It reveals also a real bug introduced by... me
with git commit 2069e978d5
("[S390] sparsemem vmemmap: initialize memmap.")
Calling the generic vmemmap_alloc_block() function to get initialized
memory is a nice idea, however that function is __meminit annotated
and therefore the function might be gone if we try to call it later.
This can happen if a DCSS segment gets added.
So basically revert the patch and clear the memmap explicitly to fix
the original bug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 make allnoconfig fails with the following build error:
arch/s390/mm/init.c: In function 'show_mem':
arch/s390/mm/init.c:55: error: implicit declaration of function 'pfn_valid'
make[1]: *** [arch/s390/mm/init.o] Error 1
make: *** [arch/s390/mm] Error 2
This problem can by fixed ensuring that ARCH_SELECT_MEMORY_MODEL
is always turned on.
Signed-off-by: Hans-Joachim Picht <hans@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Surround all the code withing show_interrupts() with
get/put_online_cpus() to prevent strange results wrt cpu hotplug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Both smp_call_function() and __smp_call_function_map() access
cpu_online_map. Both functions run with preemption disabled which
protects for cpus going offline. However new cpus can be added and
therefore the cpu_online_map can change unexpectedly.
So use the call_lock to protect against changes to the cpu_online_map
in start_secondary() and all smp_call_* functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We should use const char * for passing the name of the debug feature
around since it will not be changed.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let's just use the generic vmmemmap_alloc_block() function which
always returns initialized memory.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the existing arch_alloc_page/arch_free_page callbacks to do
the guest page state transitions between stable and unused.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This removes redundant arch code for generic ptrace requests
already handled by ptrace_request and compat_ptrace_request.
It simplifies things to just have the standard entry points,
and use the generic compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a bug with cpu bound guest on kvm-s390. Sometimes it
was impossible to deliver a signal to a spinning guest. We used
preemption as a circumvention. The preemption notifiers called
vcpu_load, which checked for pending signals and triggered a host
intercept. But even with preemption, a sigkill was not delivered
immediately.
This patch changes the low level host interrupt handler to check for the
SIE instruction, if TIF_WORK is set. In that case we change the
instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
code sees an intercept reason 0 if that happens. This patch adds accounting
for these types of intercept as well.
The advantages:
- works with and without preemption
- signals are delivered immediately
- much better host latencies without preemption
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On return from syscall or interrupt, we have to check if we return to
userspace (likely) and if there is work todo (less likely) to decide
if we handle the work. We can optimize this check: we first check for
the less likely work case and then check for userspace.
This patch is also a preparation for an additional patch, that fixes a bug
in KVM dealing with cpu bound guests.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low
bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to
free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the PT_IEEE_IP hack has been removed s390 can now use
the common code sys_ptrace function.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The self referential PT_IEEE_IP ptrace peek & poke calls have been
broken for that last 6 years. For peek the code always returns 0
instead of the last ieee fault and for poke the code does nothing.
Since nobody noticed the code seems to be superfluous. So lets
remove it.
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select
of SPARSEMEM_VMEMMAP since it is configurable. This is because
SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken
include dependencies that I don't want to fix.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Carsten Otte <cotte@de.ibm.com>
This lets us use defines for the magic bits in machine flags instead
of using plain numbers all over the place.
In addition on newer machines features/facilities are indicated by the
result of the stfl instruction. So we use these bits instead of trying
to execute new instructions and check wether we get an exception or
not.
Also the mvpg instruction is always available when in zArch mode,
whereas the idte instruction is only available in zArch mode. This
results in some minor optimizations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Always use clear_table to initialise page tables. The overlapping
memcpy is just a leftover of a previous version that wasn't fully
converted to clear_table.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/lib/uaccess_mvcos.c:166:
warning: 'strnlen_user_mvcos' defined but not used
arch/s390/lib/uaccess_mvcos.c:186:
warning: 'strncpy_from_user_mvcos' defined but not used
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we get a notification that cpu topology changed, we schedule a
work struct which just calls arch_reinit_sched_domains. This function
in turn calls get_online_cpus() which results int the lockdep warning
below.
After all it turnded out that it's not legal to call get_online_cpus()
from the context of a multi-threaded work queue.
It could deadlock this way:
process 0 (events/cpu-x):
-> run_workqueue
-> removes my work_struct from the work queue
-> calls work_struct->fn
-> get_online_cpus()
-> locks on cpu_hotplug.lock since process 1 below is doing cpu hotplug
process 1:
-> cpu_down (for cpu-x)
-> cpu_hotplug_begin (holds cpu_hotplug.lock now)
-> cpu-x dead
-> notifier_call_chain with CPU_DEAD
-> cleanup_workqueue_thread
-> flush_cpu_workqueue (succeeds)
-> kthread_stop for events/cpu-x
-> now kthread_stop waits for my work_struct to complete from within
process 0. -> dead.
A single threaded workqueue wouldn't have such problems, however there is
no such common queue available and it's not worth to create one for the
very rare calls to arch_reinit_sched_domains.
So we just create a kernel thread from our work struct which calls
arch_reinit_sched_domains and are done with it.
Thanks to Oleg Nesterov and Peter Zijlstra for helping me figuring out
that this isn't a false positive lockdep warning:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25-03562-g3dc5063-dirty #12
-------------------------------------------------------
events/3/14 is trying to acquire lock:
(&cpu_hotplug.lock){--..}, at: [<0000000000076094>] get_online_cpus+0x50/0x78
but task is already holding lock:
(topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (topology_work){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<0000000000059d48>] run_workqueue+0x170/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
-> #1 (events){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<000000000005a23c>] cleanup_workqueue_thread+0x60/0xa8
[<00000000003b2ab8>] workqueue_cpu_callback+0xbc/0x170
[<00000000003bba80>] notifier_call_chain+0x5c/0xa4
[<00000000000655a2>] __raw_notifier_call_chain+0x26/0x38
[<00000000000655e2>] raw_notifier_call_chain+0x2e/0x40
[<0000000000075e00>] cpu_down+0x228/0x31c
[<00000000003b1dd8>] store_online+0x64/0xb8
[<00000000001e7128>] sysdev_store+0x48/0x58
[<0000000000121cd2>] sysfs_write_file+0x126/0x1c0
[<00000000000c1944>] vfs_write+0xb0/0x15c
[<00000000000c20e6>] sys_write+0x56/0x88
[<0000000000027a68>] sys32_write+0x34/0x4c
[<0000000000023f70>] sysc_noemu+0x10/0x16
[<0000000077f3f186>] 0x77f3f186
-> #0 (&cpu_hotplug.lock){--..}:
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
other info that might help us debug this:
2 locks held by events/3/14:
#0: (events){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
#1: (topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
stack backtrace:
CPU: 3 Not tainted 2.6.25-03562-g3dc5063-dirty #12
Process events/3 (pid: 14, task: 000000002fb04038, ksp: 000000002fb0bd70)
0400000000000000 000000002fb0ba40 0000000000000002 0000000000000000
000000002fb0bae0 000000002fb0ba58 000000002fb0ba58 0000000000016488
0000000000000000 000000002fb0bd70 0000000000000000 0000000000000000
000000002fb0ba40 000000000000000c 000000002fb0ba40 000000002fb0bab0
00000000003c99e0 0000000000016488 000000002fb0ba40 000000002fb0ba90
Call Trace:
([<00000000000163fc>] show_trace+0x138/0x158)
[<00000000000164e2>] show_stack+0xc6/0xf8
[<0000000000016624>] dump_stack+0xb0/0xc0
[<000000000006cd36>] print_circular_bug_tail+0xa2/0xb4
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On some smp sysfs store attributes get_online_cpus() may block on
cpu_hotplug.lock, but we hold already smp_cpu_state_mutex. Since the
locking order on cpu hotplug via arch_update_cpu_topology is inverse
this might lead to deadlocks.
So make sure locking order is always the same.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is where it should be and we can get rid of some externs
and a static inline function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
New version that does not preserve the marker. Arch maintainers indicate
that the marker functionality is is not needed anymore.
Note you may simplify the s390 asm-offsets.c code further if you use the
OFFSET() macro instead of the DEFINE. See kbuild.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 has a strange marker in DEFINE. Undefine the DEFINE from kbuild.h and
define it the way s390 wants it to preserve things as they were.
May be good if the arch maintainer could go over this and check if this
workaround is really necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for __do_softirq() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So userspace can save/restore the mpstate during migration.
[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.
Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:
CPU0 CPU1
if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
schedule()
atomic_inc(pit_timer->pending)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Temporarily rename this function to avoid merge conflicts and/or
dependencies. This function will be removed as soon as git-s390
and kvm.git are finally upstream.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself
In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.
Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
of the remote cpu
- sigp emergency sends an external interrupt to the remove cpu
- sigp stop stops a remove cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
state to the cpus lowcore
- sigp set arch sets the architecture mode of the remote cpu. setting to
ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu
For implementation of this, the stop intercept indication starts to get reused
on purpose: a set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
recognized
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of some intercepts for privileged
instructions:
handle_set_prefix() sets the prefix register of the local cpu
handle_store_prefix() stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey() just decrements the instruction address and retries
handle_stsch() delivers condition code 3 "operation not supported"
handle_chsc() same here
handle_stfl() stores the facility list which contains the
capabilities of the cpu
handle_stidp() stores cpu type/model/revision and such
handle_stsi() stores information about the system topology
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).
In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.
This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.
The following interrupts are supported:
SIGP STOP - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
(stopped) remote cpu
INT EMERGENCY - interprocessor interrupt, usually used to signal need_reshed
and for smp_call_function() in the guest.
PROGRAM INT - exception during program execution such as page fault, illegal
instruction and friends
RESTART - interprocessor signal that starts a stopped cpu
INT VIRTIO - floating interrupt for virtio signalisation
INT SERVICE - floating interrupt for signalisations from the system
service processor
struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carrys parameter data for interrupts along with the interrupt
type. Interrupts on s390 usually have a state that represents the current
operation, or identifies which device has caused the interruption on s390.
kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.
[christian: change virtio interrupt to 0x2603]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This path introduces handling of sie intercepts in three flavors: Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(),
or passed to userspace with corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP.
In case of partial execution in kernel with the need of userspace support,
kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
-EREMOTE.
The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
to userland
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
(aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.
The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c similar to arch/x86/kvm/x86.c, this implements all
arch callbacks for kvm. __vcpu_run calls back into
sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S assembler function sie64a, which enters guest
context via SIE, and switches world before and after that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
virtual machines on the mainframe
include/asm-s390/kvm.h defines kvm_regs and friends for user access to
guest register content
arch/s390/kvm/gaccess.h functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h header file for kvm-s390 internals, extended by
later patches
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[HWRNG] omap: Minor updates
[CRYPTO] kconfig: Ordering cleanup
[CRYPTO] all: Clean up init()/fini()
[CRYPTO] padlock-aes: Use generic setkey function
[CRYPTO] aes: Export generic setkey
[CRYPTO] api: Make the crypto subsystem fully modular
[CRYPTO] cts: Add CTS mode required for Kerberos AES support
[CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
[CRYPTO] tcrypt: Change the XTEA test vectors
[CRYPTO] tcrypt: Shrink the tcrypt module
[CRYPTO] tcrypt: Change the usage of the test vectors
[CRYPTO] api: Constify function pointer tables
[CRYPTO] aes-x86-32: Remove unused return code
[CRYPTO] tcrypt: Shrink speed templates
[CRYPTO] tcrypt: Group common speed templates
[CRYPTO] sha512: Rename sha512 to sha512_generic
[CRYPTO] sha384: Hardware acceleration for s390
[CRYPTO] sha512: Hardware acceleration for s390
[CRYPTO] s390: Generic sha_update and sha_final
[CRYPTO] api: Switch to proc_create()
Exploit the System z10 hardware acceleration for SHA384.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Exploit the System z10 hardware acceleration for SHA512.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sha_{update|final} functions are similar for every sha variant.
Since that is error-prone and redundant replace these functions by
a shared generic implementation for s390.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Move the function that prints the segment warning messages found in the
monreader driver and the dcssblk driver to the extmem base code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Newer s390 models have a breaking-event-address-recording register.
Each time an instruction causes a break in the sequential instruction
execution, the address is saved in that hardware register. On a program
interrupt the address is copied to the lowcore address 272-279, which
makes it software accessible.
This patch changes the program check handler and the stack overflow
checker to copy the value into the pt_regs argument.
The oops output is enhanced to show the last known breaking address.
It might give additional information if the stack trace is corrupted.
The feature is only available on 64 bit.
The new oops output looks like:
[---------snip----------]
Modules linked in: vmcp sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 2 Not tainted 2.6.24zlive-host #8
Process modprobe (pid: 4788, task: 00000000bf3d8718, ksp: 00000000b2b0b8e0)
Krnl PSW : 0704200180000000 000003e000020028 (vmcp_init+0x28/0xe4 [vmcp])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000004000002 000003e000020000 0000000000000000 0000000000000001
000000000015734c ffffffffffffffff 000003e0000b3b00 0000000000000000
000003e00007ca30 00000000b5bb5d40 00000000b5bb5800 000003e0000b3b00
000003e0000a2000 00000000003ecf50 00000000b2b0bd50 00000000b2b0bcb0
Krnl Code: 000003e000020018: c0c000040ff4 larl %r12,3e0000a2000
000003e00002001e: e3e0f0000024 stg %r14,0(%r15)
000003e000020024: a7f40001 brc 15,3e000020026
>000003e000020028: e310c0100004 lg %r1,16(%r12)
000003e00002002e: c020000413dc larl %r2,3e0000a27e6
000003e000020034: c0a00004aee6 larl %r10,3e0000b5e00
000003e00002003a: a7490001 lghi %r4,1
000003e00002003e: a75900f0 lghi %r5,240
Call Trace:
([<000000000014b300>] blocking_notifier_call_chain+0x2c/0x40)
[<000000000015735c>] sys_init_module+0x19d8/0x1b08
[<0000000000110afc>] sysc_noemu+0x10/0x16
[<000002000011cda2>] 0x2000011cda2
Last Breaking-Event-Address:
[<000003e000020024>] vmcp_init+0x24/0xe4 [vmcp]
[---------snip----------]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The current uaccess page table walk code assumes at a few places that
any access is a user space access. This is not correct if somebody
has issued a set_fs(KERNEL_DS) in advance.
Add code which checks which address space we are in and with this make
sure we access the correct address space. This way we get also rid of
the dirty
if (!currrent-mm)
return -EFAULT;
hack in futex_atomic_cmpxchg_pt.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Most noteable part of this commit is the new local header file entry.h
which contains all the function declarations of functions that get only
called from asm code or are arch internal. That way we can avoid extern
declarations in C files.
This is more or less the same that was done for sparc64.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
variant instead. In addition we get high resolution timers for free.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Remove the program check generating monitor calls and use function
calls instead. Theres is no real advantage in using monitor calls,
but they do make debugging harder, because of all the program checks
it generates.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The new function supports setting of permissions for the debugfs files
created by the debug feature. In addition to that, the function provides
uid and gid as parameters for future use. Currently only root is allowed
for uid and gid.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Not very helpful when code dies in "init".
See also http://lkml.org/lkml/2008/3/26/557 .
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add get_clock_xt to read an 8 byte clock value using store clock
extended (STCKE) and use get_clock_xt for sched_clock. STCKE should
be faster than STCK on newer machines.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If vertical cpu polarization is active then the hypervisor will
dispatch certain cpus for a longer time than other cpus for maximum
performance. For example if a guest would have three virtual cpus,
each of them with a share of 33 percent, then in case of vertical
cpu polarization all of the processing time would be combined to a
single cpu which would run all the time, while the other two cpus
would get nearly no cpu time.
There are three different types of vertical cpus: high, medium and
low. Low cpus hardly get any real cpu time, while high cpus get a
full real cpu. Medium cpus get something in between.
In order to switch between the two possible modes (default is
horizontal) a 0 for horizontal polarization or a 1 for vertical
polarization must be written to the dispatching sysfs attribute:
/sys/devices/system/cpu/dispatching
The polarization of each single cpu can be figured out by the
polarization sysfs attribute of each cpu:
/sys/devices/system/cpu/cpuX/polarization
horizontal, vertical:high, vertical:medium, vertical:low or unknown.
When switching polarization the polarization attribute may contain
the value unknown until the configuration change is done and the
kernel has figured out the new polarization of each cpu.
Note that running a system with different types of vertical cpus may
result in significant performance regressions. If possible only one
type of vertical cpus should be used. All other cpus should be
offlined.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add s390 backend so we can give the scheduler some hints about the
cpu topology.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Make stfle visible so other code can call this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
sys_sigreturn and sys_rt_sigreturn don't take any arguments. So luckily
this resulted only in unneeded instead of incorrect code.
But still this clearly shows why one should not put extern declarations
in C files (will be fixed with a larger sparse patch).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This is just a port of 83bd01024b
"x86: protect against sigaltstack wraparound".
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
a0c1e9073e "futex: runtime enable pi and
robust functionality" introduces a test wether futex in atomic stuff
works or not.
It does that by writing to address 0 of the kernel address space. This
will crash on older machines where addressing mode switching is enabled
but where the mvcos instruction is not available. Page table walking is
done by hand and therefore the code tries to access current->mm which
is NULL.
Therefore add an extra check, so we survive the early test.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
List of major changes and improvements:
no manipulation of the global ARP constructor
clean code split into core, layer 2 and layer 3 functionality
better exploitation of the ethtool interface
better representation of the various hardware capabilities
fix packet socket support (tcpdump), no fake_ll required
osasnmpd notification via udev events
coding style and beautification
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
/sys/firmware/reipl/nss/name contains the nss name when defsys or
savesys command has been executed. If the defsys or savesys command
fails the kernel_nss_name has to be cleared since a reipl on that
nss name won't be possible.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Normally this should not happen, but it's cleaner to do it that way.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
IPL from NSS didn't work because the memory detection routine omits any
memory sections with a size lower than what MAX_ORDER defines.
This causes the detection routine to skip the first memory segment which
has a size of 1MB. Which later on will let the kernel think that there
is no memory available at all.
Since in addition the z/VM memory increment size is 1MB force MAX_ORDER
to be 9, so we can support 1MB segments.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Compile smp.o with -Wno-nonnull so gcc stops warning about memcpy
being used with a null parameter. Also remove the workaround code
and use a char * cast instead of a void * cast to do computations.
Cc: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a machine check handling is pending when the idle loop is entered
default_idle will be left with timer ticks and virtual timer disabled.
Fix this by "calling" the idle_chain. Also a BUG_ON(!in_interrupt) in
start_hz_timer must be removed since the function now gets called from
non interrupt context as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add missing exception table entry so that the kernel can handle
proctection exceptions as well on the cs instruction. Currently only
specification exceptions are handled correctly.
The missing entry allows user space to crash the kernel.
Cc: stable <stable@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since a5fbb6d106
"KVM: fix !SMP build error" smp_call_function isn't a define anymore
that folds into nothing but a define that calls up_smp_call_function
with all parameters. Hence we cannot #ifdef out the unused code
anymore...
This seems to be the preferred method, so do this for s390 as well.
arch/s390/kernel/time.c: In function 'etr_sync_clock':
arch/s390/kernel/time.c:825: error: 'clock_sync_cpu_start' undeclared
arch/s390/kernel/time.c:862: error: 'clock_sync_cpu_end' undeclared
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just copy the first 512 read-only bytes of the current cpu lowcore if
a new cpu gets onlined. The rest is zeroed out and must be explicitly
initialized. Current code just copies the entire lowcore and
initializes the needed fields.
This should reveal bugs in future enhancements quite early.
Also when the lowcore of the first cpu is replaced this is now done
atomically (no interrupts, no machine checks).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If both NO_IDLE_HZ and VIRT_TIMER are disabled default_idle won't load
an enabled wait psw and busy loop instead. This is because the
idle_chain is empty and the return value of atomic_notifier_call_chain
will be NOTIFY_DONE, which causes default_idle to return instead of
loading an enabled wait psw.
Fix this by calling __atomic_notifier_call_chain instead and add proper
return value handling.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for different number of page table levels dependent
on the highest address used for a process. This will cause a 31 bit
process to use a two level page table instead of the four level page
table that is the default after the pud has been introduced. Likewise
a normal 64 bit process will use three levels instead of four. Only
if a process runs out of the 4 tera bytes which can be addressed with
a three level page table the fourth level is dynamically added. Then
the process can use up to 8 peta byte.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Background: I've implemented 1K/2K page tables for s390. These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM. The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste). The pgstes are only required if the process is using the SIE
instruction. The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).
Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch. For everybody else it will be a (struct page *). The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor. The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we possibly lookup the pid in the wrong pid namespace. So
seq_file convert proc_pid_status which ensures the proper pid namespaces is
passed in.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: s390 build fix]
[akpm@linux-foundation.org: fix task_name() output]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] dcss: Initialize workqueue before using it.
[S390] Remove BUILD_BUG_ON() in vmem code.
[S390] sclp_tty/sclp_vt220: Fix scheduling while atomic
[S390] dasd: fix panic caused by alias device offline
[S390] dasd: add ifcc handling
[S390] latencytop s390 support.
[S390] Implement ext2_find_next_bit.
[S390] Cleanup & optimize bitops.
[S390] Define GENERIC_LOCKBREAK.
[S390] console: allow vt220 console to be the only console
[S390] Fix couple of section mismatches.
[S390] Fix smp_call_function_mask semantics.
[S390] Fix linker script.
[S390] DEBUG_PAGEALLOC support for s390.
[S390] cio: Add shutdown callback for ccwgroup.
[S390] cio: Update documentation.
[S390] cio: Clean up chsc response code handling.
[S390] cio: make sense id procedure work with partial hardware response
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove BUILD_BUG_ON() in vmem code since it causes build failures if
the size of struct page increases. Instead calculate at compile time
the address of the highest physical address that can be added to the
1:1 mapping.
This supposed to fix a build failure with the page owner tracking leak
detector patches as reported by akpm.
page-owner-tracking-leak-detector-broken-on-s390.patch can be removed
from -mm again when this is merged.
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix compile error:
CC arch/s390/kernel/asm-offsets.s
In file included from
arch/s390/kernel/asm-offsets.c:7:
include/linux/sched.h: In function 'spin_needbreak':
include/linux/sched.h:1931: error: implicit declaration of function '__raw_spin_is_contended'
make[2]: *** [arch/s390/kernel/asm-offsets.s] Error 1
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix console detection logic to support configurations in which the
vt220 console is the only available Linux console.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix couple of section mismatches. And since we touch the code
anyway change the IPL code to use C99 initializers.
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make sure func isn't called on the local cpu just like on all other
architectures that implement this function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes this warning:
vmlinux: warning: allocated section `.text' not in segment
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After seeing the filename I'd have expected something about the
implementation of SMP in the Linux kernel - not some notes on kernel
configuration and building trivialities noone would search at this
place.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config KPROBES_SUPPORT
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
- Use HAVE_KPROBES
- Use a select
- Yet another update :
Moving to HAVE_* now.
- Update ARM for kprobes support.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config ARCH_SUPPORTS_KPROBES
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
Changelog :
- Moving to HAVE_*.
- Add AVR32 oprofile.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Git commit 86ef5c9a8e forgot a few
lock_cpu_hotplug/unlock_cpu_hotplug pairs in arch/s390/kernel/smp.c
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In s390's spin_lock_irqsave, interrupts remain disabled while
spinning. In other architectures like x86 and powerpc, interrupts are
re-enabled while spinning if IRQ is not masked before spin_lock_irqsave
is called.
The following patch re-enables interrupts through local_irq_restore
while spinning for a lock acquisition.
This can improve system response.
[heiko.carstens@de.ibm.com: removed saving of pc]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds the s390 variant for smp_call_function_mask(). The
implementation is pretty straight forward using the wrapper
__smp_call_function_map() which already takes a cpumask_t argument.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch removes TOPDIR from arch/s390/kernel/Makefile.
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the NOTES and BUG_TABLE section in the linker script to the
read-only sections right after the text section.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a problem with the following scenario:
1. Linux booted from DASD "A"
2. Reboot from DASD "B" using "/sys/firmware/reipl/ccw/device"
3. Reboot DASD "B"
Without this patch in step 3 on newer s390 systems under LPAR instead of
DASD "B", DASD "A" will be booted. The reason is that in step 2 we use CCW
reipl and in step 3 we use DIAG308 (subcode 3) reipl. DIAG308 does not
notice the CCW reipl and still thinks that it has to reboot DASD "A".
Before applying this fix, ensure to have MCF RJ9967101E or z9 GA3 base driver
installed.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have seen an oops in an OOM situation, where show_mem tried to
access the struct page of a dcss segment. The vmemmap code has
already created the 1:1 mapping but failed allocating the struct
pages. In the OOM case, show_mem now walks the memory. It uses
pfn_valid to detect if it may access the struct page. In the case
described above, the mapping was established and pfn_valid returned
true. As the struct pages were not allocated, the kernel oopsed.
We have to ensure that we have created the struct pages, before we
add a mapping pointing to the pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sclp ipl information has not been initialized. Therefore the ipl loadparm
and the "has_dump" flag have not been set correctly.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
No need to preallocate the per cpu lowcores and stacks.
Savings are 28-32k per offline cpu.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of a kernel panic it is currently possible to specify that a dump
should be created, the system should be rebooted or stopped. Virtual sysfs
files under the directory /sys/firmware/ are used for that configuration.
In addition to that, there are kernel parameters 'vmhalt', 'vmpoff'
and 'vmpanic', which can be used to specify z/VM commands, which are
automatically executed in case of halt, power off or a kernel panic.
This patch combines both functionalities and allows to specify the z/VM CP
commands also via sysfs attributes. In addition to that, it enhances the
existing handling of shutdown triggers (e.g. halt or panic) and associated
shutdown actions (e.g. dump or reipl) and makes it more flexible.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move s390 crypto Kconfig options to drivers/crypto/Kconfig to have all
hardware crypto devices in one place.
This also makes messing up the kernel source tree easier for some people.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It caused only a lot of confusion. From now on cpu hotplug of up to
NR_CPUS will work by default. If somebody wants to limit that then
the possible_cpus parameter can be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Used to contain the address of the holder of the lock. But since the
spinlock code is not inlined anymore all locks contain the same address
anyway. And since in addtition nobody complained about that for ages
its obviously unused. So remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Align everything to MAX_ORDER so we can get rid of the extra checks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Also print PREEMPT and/or SMP if the kernel was configured that way.
Makes s390 look a bit more like other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the vmalloc area starts at a dynamic address depending on
the memory size. There was also an 8MB security hole after the
physical memory to catch out-of-bounds accesses.
We can simplify the code by putting the vmalloc area explicitely at
the top of the kernel mapping and setting the vmalloc size to a fixed
value of 128MB/128GB for 31bit/64bit systems. Part of the vmalloc
area will be used for the vmem_map. This leaves an area of 96MB/1GB
for normal vmalloc allocations.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clear-by-asce operation of the idte instruction gets an asce
(address-space-control-element) as argument to specify which TLBs
need to get flushed. The current code passes a plain pointer to
the start of the pgd without the additional bits which would make
the pointer an asce. The current machines don't mind the difference
but a future model might want to use the designation type control
bits in the asce as a filter for the TLBs to flush.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new interface so that cpus can be put into standby state and
configured state.
Only offline cpus can be put into standby state or configured state.
For that the new percpu sysfs attribute "configure" must be used.
To put a cpu in standby state a "0" must be written to the attribute.
In order to switch it into configured state a "1" must be written to
the attribute.
Only cpus in configured state can be brought online.
In addition this patch introduces a static mapping of physical to
logical cpus. As a result only the sysfs directories of present cpus
will be created. To scan for new cpus the new sysfs attribute "rescan"
must be used.
Writing to /sys/devices/system/cpu/rescan will trigger a rescan of
cpus and will create directories for new cpus.
On IPL only configured cpus will be used. And on reboot/shutdown all
cpus will remain in their current state (configured/standby).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (125 commits)
[CRYPTO] twofish: Merge common glue code
[CRYPTO] hifn_795x: Fixup container_of() usage
[CRYPTO] cast6: inline bloat--
[CRYPTO] api: Set default CRYPTO_MINALIGN to unsigned long long
[CRYPTO] tcrypt: Make xcbc available as a standalone test
[CRYPTO] xcbc: Remove bogus hash/cipher test
[CRYPTO] xcbc: Fix algorithm leak when block size check fails
[CRYPTO] tcrypt: Zero axbuf in the right function
[CRYPTO] padlock: Only reset the key once for each CBC and ECB operation
[CRYPTO] api: Include sched.h for cond_resched in scatterwalk.h
[CRYPTO] salsa20-asm: Remove unnecessary dependency on CRYPTO_SALSA20
[CRYPTO] tcrypt: Add select of AEAD
[CRYPTO] salsa20: Add x86-64 assembly version
[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)
[CRYPTO] gcm: Introduce rfc4106
[CRYPTO] api: Show async type
[CRYPTO] chainiv: Avoid lock spinning where possible
[CRYPTO] seqiv: Add select AEAD in Kconfig
[CRYPTO] scatterwalk: Handle zero nbytes in scatterwalk_map_and_copy
[CRYPTO] null: Allow setkey on digest_null
...
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no firmware "subsystem" it's just a directory in /sys that
other portions of the kernel want to hook into. So make it a kobject
not a kset to help alivate anyone who tries to do some odd kset-like
things with this.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically.
This makes the kobject attributes now work properly that I broke in the
previous patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Volker Sameske <sameske@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes the code a bit simpler and and gets us one step closer to
deleting the deprecated subsys_attr code.
NOTE, this needs the next patch in the series in order to work properly.
This will build, but the sysfs files will not properly operate.
Thanks to Cornelia for the build fix on this patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Volker Sameske <sameske@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.
Thanks to Cornelia for the build fix.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.
We also rename hypervisor_subsys to hypervisor_kset to catch all users
of the variable.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.
This patch is based on a lot of help from Kay Sievers.
Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
crypto_blkcipher_decrypt is wrong because it does not care about
the IV.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>