The current timeslice code mixes 'jiffies' up with 'spesched ticks'. This
change correctly defines the number of time slices each SPE contexts is
given, and clarifies the comment.
This brings the default timeslice for SPE contexts into a reasonable
range.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable preemptive scheduling for non-RT contexts.
We use the same algorithms as the CPU scheduler to calculate the time
slice length, and for now we also use the same timeslice length as the
CPU scheduler. This might be not enough for good performance and can be
changed after some benchmarking.
Note that currently we do not boost the priority for contexts waiting
on the runqueue for a long time, so contexts with a higher nice value
could starve ones with less priority. This could easily be fixed once
the rework of the spu lists that Luke and I discussed is done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Get rid of the scheduler workqueues that complicated things a lot to
a dedicated spu scheduler thread that gets woken by a traditional
scheduler tick. By default this scheduler tick runs a HZ * 10, aka
one spu scheduler tick for every 10 cpu ticks.
Currently the tick is not disabled when we have less context than
available spus, but I will implement this later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a bit define from book, and replace one hex number with a
symbol, for clarity.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently it fails with gcc from sdk 2.1 because of a spec change [1].
Maybe we should start using the definitions from spu_mfcio.h.
[1] http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01598.html
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The recent change to cell_defconfig to enable cpufreq on Cell exposed
the fact that the cbe_cpufreq driver currently needs the PMI interface
code to compile, but Kconfig doesn't make sure that the PMI interface
code gets built if cbe_cpufreq is enabled.
In fact cbe_cpufreq can work without PMI, so this ifdefs out the code
that deals with PMI. This is a minimal solution for 2.6.22; a more
comprehensive solution will be merged for 2.6.23.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a shutdown method to spu_sysdev_class to allow proper spu resource
cleanup on system shutdown. This is needed to support kexec on the PS3
platform.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a "capabilities" file to spu contexts consisting of a
list of linefeed separated capability names. The current exposed
capabilities are "sched" (the context is scheduleable) and
"step" (the context supports single stepping).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for SPU single stepping. The single
step bit is set in the SPU when the current process is
being single-stepped via ptrace. The spu then stops and
returns with a specific flag set and the syscall exit code
will generate the SIGTRAP.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This rewrites pretty much from scratch the handling of MMIO and PIO
space allocations on powerpc64. The main goals are:
- Get rid of imalloc and use more common code where possible
- Simplify the current mess so that PIO space is allocated and
mapped in a single place for PCI bridges
- Handle allocation constraints of PIO for all bridges including
hot plugged ones within the 2GB space reserved for IO ports,
so that devices on hotplugged busses will now work with drivers
that assume IO ports fit in an int.
- Cleanup and separate tracking of the ISA space in the reserved
low 64K of IO space. No ISA -> Nothing mapped there.
I booted a cell blade with IDE on PIO and MMIO and a dual G5 so
far, that's it :-)
With this patch, all allocations are done using the code in
mm/vmalloc.c, though we use the low level __get_vm_area with
explicit start/stop constraints in order to manage separate
areas for vmalloc/vmap, ioremap, and PCI IOs.
This greatly simplifies a lot of things, as you can see in the
diffstat of that patch :-)
A new pair of functions pcibios_map/unmap_io_space() now replace
all of the previous code that used to manipulate PCI IOs space.
The allocation is done at mapping time, which is now called from
scan_phb's, just before the devices are probed (instead of after,
which is by itself a bug fix). The only other caller is the PCI
hotplug code for hot adding PCI-PCI bridges (slots).
imalloc is gone, as is the "sub-allocation" thing, but I do beleive
that hotplug should still work in the sense that the space allocation
is always done by the PHB, but if you unmap a child bus of this PHB
(which seems to be possible), then the code should properly tear
down all the HPTE mappings for that area of the PHB allocated IO space.
I now always reserve the first 64K of IO space for the bridge with
the ISA bus on it. I have moved the code for tracking ISA in a separate
file which should also make it smarter if we ever are capable of
hot unplugging or re-plugging an ISA bridge.
This should have a side effect on platforms like powermac where VGA IOs
will no longer work. This is done on purpose though as they would have
worked semi-randomly before. The idea at this point is to isolate drivers
that might need to access those and fix them by providing a proper
function to obtain an offset to the legacy IOs of a given bus.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The error path in spufs_fill_dir() is broken. If d_alloc_name() or
spufs_new_file() fails, spufs_prune_dir() is getting called. At this time
dir->inode is not set and a NULL pointer is dereferenced by mutex_lock().
This bugfix replaces spufs_prune_dir() with a shorter version that does
not touch dir->inode but simply removes all children.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Nosched context sould never be scheduled out, thus we must not
deactivate them in spu_yield ever.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
... and get rid of cpufreq_set_policy call that caused a build
failure due interfering commits.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix the race between checking for contexts on the runqueue and actually
waking them in spu_deactive and spu_yield.
The guts of spu_reschedule are split into a new helper called
grab_runnable_context which shows if there is a runnable thread below
a specified priority and if yes removes if from the runqueue and uses
it. This function is used by the new __spu_deactivate hepler shared
by preemption and spu_yield to grab a new context before deactivating
a specified priority and if yes removes if from the runqueue and uses
it. This function is used by the new __spu_deactivate hepler shared
by preemption and spu_yield to grab a new context before deactivating
the old one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make sure the mapping_lock also protects access to the various address_space
pointers used for tearing down the ptes on a spu context switch.
Because unmap_mapping_range can sleep we need to turn mapping_lock from
a spinlock into a sleeping mutex.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In case spufs_fill_dir() fails only put_spu_context()
gets called for cleanup and the acquired mm_struct never gets freed.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Previously, closing a SPE gang that still has contexts would trigger
a WARN_ON, and leak the allocated gang.
This change fixes the problem by using the gang's reference counts to
destroy the gang instead. The gangs will persist until their last
reference (be it context or open file handle) is gone.
Also, avoid using statements with side-effects in a WARN_ON().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently spufs_mem_release and the mem file doesn't have any release
method hooked up, leading to leaks everytime is used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noticed by David Woodhouse, it's currently possible to mount
spufs on any machine, which means that it actually will get
mounted by fedora.
This refuses to load the module on platforms that have no
support for SPUs.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch renames the raw hard_irq_{enable,disable} into
__hard_irq_{enable,disable} and introduces a higher level hard_irq_disable()
function that can be used by any code to enforce that IRQs are fully disabled,
not only lazy disabled.
The difference with the __ versions is that it will update some per-processor
fields so that the kernel keeps track and properly re-enables them in the next
local_irq_disable();
This prepares powerpc for my next patch that introduces hard_irq_disable()
generically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Further fixes for the removal of 4level-fixup hack from ppc32
[POWERPC] EEH: log all PCI-X and PCI-E AER registers
[POWERPC] EEH: capture and log pci state on error
[POWERPC] EEH: Split up long error msg
[POWERPC] EEH: log error only after driver notification.
[POWERPC] fsl_soc: Make mac_addr const in fs_enet_of_init().
[POWERPC] Don't use SLAB/SLUB for PTE pages
[POWERPC] Spufs support for 64K LS mappings on 4K kernels
[POWERPC] Add ability to 4K kernel to hash in 64K pages
[POWERPC] Introduce address space "slices"
[POWERPC] Small fixes & cleanups in segment page size demotion
[POWERPC] iSeries: Make HVC_ISERIES the default
[POWERPC] iSeries: suppress build warning in lparmap.c
[POWERPC] Mark pages that don't exist as nosave
[POWERPC] swsusp: Introduce register_nosave_region_late
This adds an option to spufs when the kernel is configured for
4K page to give it the ability to use 64K pages for SPE local store
mappings.
Currently, we are optimistic and try order 4 allocations when creating
contexts. If that fails, the code will fallback to 4K automatically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The basic issue is to be able to do what hugetlbfs does but with
different page sizes for some other special filesystems; more
specifically, my need is:
- Huge pages
- SPE local store mappings using 64K pages on a 4K base page size
kernel on Cell
- Some special 4K segments in 64K-page kernels for mapping a dodgy
type of powerpc-specific infiniband hardware that requires 4K MMU
mappings for various reasons I won't explain here.
The main issues are:
- To maintain/keep track of the page size per "segment" (as we can
only have one page size per segment on powerpc, which are 256MB
divisions of the address space).
- To make sure special mappings stay within their allotted
"segments" (including MAP_FIXED crap)
- To make sure everybody else doesn't mmap/brk/grow_stack into a
"segment" that is used for a special mapping
Some of the necessary mechanisms to handle that were present in the
hugetlbfs code, but mostly in ways not suitable for anything else.
The patch relies on some changes to the generic get_unmapped_area()
that just got merged. It still hijacks hugetlb callbacks here or
there as the generic code hasn't been entirely cleaned up yet but
that shouldn't be a problem.
So what is a slice ? Well, I re-used the mechanism used formerly by our
hugetlbfs implementation which divides the address space in
"meta-segments" which I called "slices". The division is done using
256MB slices below 4G, and 1T slices above. Thus the address space is
divided currently into 16 "low" slices and 16 "high" slices. (Special
case: high slice 0 is the area between 4G and 1T).
Doing so simplifies significantly the tracking of segments and avoids
having to keep track of all the 256MB segments in the address space.
While I used the "concepts" of hugetlbfs, I mostly re-implemented
everything in a more generic way and "ported" hugetlbfs to it.
Slices can have an associated page size, which is encoded in the mmu
context and used by the SLB miss handler to set the segment sizes. The
hash code currently doesn't care, it has a specific check for hugepages,
though I might add a mechanism to provide per-slice hash mapping
functions in the future.
The slice code provide a pair of "generic" get_unmapped_area() (bottomup
and topdown) functions that should work with any slice size. There is
some trickiness here so I would appreciate people to have a look at the
implementation of these and let me know if I got something wrong.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
for consistency with other Open Firmware interfaces (and Sparc).
This is just a straight replacement.
This leaves the compatibility define in place.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use DEFINE_SPINLOCK instead of initializing spinlocks to
SPIN_LOCK_UNLOCKED, since DEFINE_SPINLOCK is better for lockdep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Just another pass through arch/powerpc for old usages.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
check_legacy_ioport makes only sense on PREP, CHRP and pSeries.
They may have an isa node with PS/2, parport, floppy and serial ports.
Remove the check_legacy_ioport call from ppc_md, it's not needed
anymore. Hardware capabilities come from the device-tree.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable Periodic Recalibration (PTCAL) support for Cell XDR memory,
using the new ibm,cbe-start-ptcal and ibm,cbe-stop-ptcal RTAS calls.
Tested on QS20 and QS21 (by Thomas Huth). It seems that SLOF has
problems disabling, at least on QS20; this patch should only be
used once these problems have been addressed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for a proper device-tree.
A porper device-tree on cell contains be nodes
for each CBE containg nodes for SPEs and all the
other special devices on it.
Ofcourse oldschool devicetree is still supported.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The new PMI driver was added in order to support
cpufreq on blades that require the frequency to
be controlled by the service processor, so use it
on those.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds some attributes the cpu and spu nodes:
/sys/devices/system/[c|s]pu/[c|s]pu*/thermal/throttle_begin
/sys/devices/system/[c|s]pu/[c|s]pu*/thermal/throttle_end
/sys/devices/system/[c|s]pu/[c|s]pu*/thermal/throttle_full_stop
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch introduces a little function for transforming
register values into temperature.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds code to deal with conversion of
logical cpu to cbe nodes. It removes code that
assummed there were two logical CPUs per CBE.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This change fixes the case where spu_base and spufs are initialised on a
system with no SPEs - unconditionally create the spu_lists so spu_alloc
doesn't explode, and check for spu_management ops before starting spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
arch/powerpc/platforms/cell/spu_base.c | 7 ++++---
arch/powerpc/platforms/cell/spufs/inode.c | 5 +++++
2 files changed, 9 insertions(+), 3 deletions(-)
spu_base.c is always built into the kernel image, so there is no need
for a cleanup function. And some of the things it does are in the
way for my following patches, so I'd rather get rid of it ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
- remove the spu_acquire_runnable from spu_run_init. I need to
opencode it in spufs_run_spu in the next patch
- remove various inline attributes, we don't really want to inline
long functions with multiple callsites
- cleanup return values and runcntl_write calls in spu_run_init
- use normal kernel codingstyle in spu_reacquire_runnable
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spu_coredump_calls.owner is NULL in case of a builtin spufs,
so the checks in here break.
Check for the availability of the spu_coredump_calls variable
instead.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Dynamically allocated read/write buffer in spufs_arch_write_note() will
not be freed. Convert it to get_free_page at the same time.
Cc: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Change the loop in spu_wait to be a little more straightforward.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Add a 'mode=' option to spufs mount arguments. This allows more
control over access to the top-level spufs directory.
Tested on Cell.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
GCC may generates inline copy loop to handle memcpy() function
instead of kernel defined memcpy(). But this inlined version of memcpy()
causes an alignment interrupt when copying from local store.
This patch uses memcpy_fromio() and memcpy_toio to copy local store
to prevent memcpy() being inlined.
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
We now have proper locking around assignets of the mapping pointers,
and the spin_unlock implies enough of a barrier to get rid of the
explicit one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
When SPU isolation mode enabled, isolated_loader would be
allocated by spufs_init_isolated_loader() on module_init().
But anyone do not free it.
This patch introduces spufs_exit_isolated_loader() which is
the opposite of spufs_init_isolated_loader() and called on
module_exit().
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spufs module_init forgot to call a few cleanup functions
on error path. This patch also includes cosmetic changes in
spu_sched_init() (identation fix and return error code).
[modified by hch to apply ontop of the latest schedule changes]
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch checks return value of spu_acquire_runnable() in
spufs_mfc_write().
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no reason for run_sema to be a struct semaphore. Changing
it to a mutex and rename it accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This change populates a siginfo struct for SPE application exceptions
(ie, invalid DMAs and illegal instructions).
Tested on an IBM Cell Blade.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Until now, we have always entered the spu page fault handler
with a mutex for the spu context held. This has multiple
bad side-effects:
- it becomes impossible to suspend the context during
page faults
- if an spu program attempts to access its own mmio
areas through DMA, we get an immediate livelock when
the nopage function tries to acquire the same mutex
This patch makes the page fault logic operate on a
struct spu_context instead of a struct spu, and moves it
from spu_base.c to a new file fault.c inside of spufs.
We now also need to copy the dar and dsisr contents
of the last fault into the saved context to have it
accessible in case we schedule out the context before
activating the page fault handler.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no reason to execute spu_init_channels under spu_mutex
after the spu has been taken off the freelist it's ours.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition to stop_wq needs to happen before adding to the runqeueue and
under the same lock so that we don't have a race window for a lost
wake up in the spu scheduler.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
For quite a while now spu state is protected by a simple mutex instead
of the old rw_semaphore, and this means we can simplify the locking
around spu_setup_isolated a lot.
Instead of doing an spu_release before entering spu_setup_isolated and
then calling the complicated spu_acquire_exclusive we can now simply
enter the function locked an in guaranteed runnable state, so that the
only bit of spu_acquire_exclusive that's left is the call to
spu_unmap_mappings.
Similarly there's no more need to unlock and reacquire the state_mutex
when spu_setup_isolated is done, but we can always return with the
lock held and only drop it in spu_run_init in the failure case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
A single context should only be woken once, and we should not have
more wakeups for a given priority than the number of contexts on
that runqueue position.
Also add some asserts to trap future problems in this area more
easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
set_bit does not guarantee ordering on powerpc, so using it
for communication between threads requires explicit
mb() calls.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
To not lose a spu thread we need to make sure it always gets put back
on the runqueue. In find_victim aswell as in the scheduler tick as done
in the previous patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
To not lose a spu thread we need to make sure it always gets put back
on the runqueue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Make sure the pointers to various mappings are cleared once the last
user stopped using them. This avoids accessing freed memory when
tearing down the gang directory aswell as optimizing away
pte invalidations if no one uses these.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The scheduler workqueue may rearm itself and deadlock when we try to stop
it. Put a flag in place to avoid skip the work if we're tearing down
the context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This also fixes a bug where a property value was being modified
in place.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is no reason to yield the CPU in spu_yield - if the backing
thread reenters spu_run it gets added to the end of the runqueue for
it's priority. So the yield is just a slowdown for the case where
we have higher priority contexts waiting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I wanted to enable CBE_THERM on PS3. So I had to enable CBE_RAS first.
But the resulting kernel doesn't link, as cbe_regs.c isn't compiled for
non-PPC_CELL_NATIVE.
CBE_RAS should depend on PPC_CELL_NATIVE; this makes it so.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPU code doesn't properly invalidate SPUs SLBs when necessary,
for example when changing a segment size from the hugetlbfs code. In
addition, it saves and restores the SLB content on context switches
which makes it harder to properly handle those invalidations.
This patch removes the saving & restoring for now, something more
efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
that can be used by the core mm code to flush the SLBs of all SPEs that
are running a given mm at the time of the flush.
In order to do that, it adds a spinlock to the list of all SPEs and move
some bits & pieces from spufs to spu_base.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Due to a buggy unsigned comparison, it was possible to write
beyond the end of the local store file in spufs under some
circumstances.
This rewrites the buggy function to look more like
simple_copy_from_buffer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
This will allow us to build without PCI easier.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is a clean up patch that includes the following changes:
-Some comments were added to clarify the code based on feedback
from the community.
-The write_pm_cntrl() and set_count_mode() were passed a
structure element from a global variable. The argument was
removed so the functions now just operate on the global directly.
-The set_pm_event() function call in the cell_virtual_cntr()
routine was moved to a for-loop before the for_each_cpu loop
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
I found an exploit in current kernel.
Currently, there is no range check about mmapping "/mem" node in
spufs. Thus, an application can access privilege memory region.
In case this kernel already worked on a public server, I send this
information only here.
If there are such servers in somewhere, please replace it, ASAP.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
For SCHED_RR tasks we can do some really trivial timeslicing. Basically
we fire up a time for every scheduler tick that searches for a higher
or same priority thread that is on the runqueue and if there is one
context switches to it. Because we can't lock spus from timer context
we actually run this from a delayed runqueue instead of a timer.
A nice optimization would be to skip the actual priority bitmap search
when there are less contexts than physical spus available. To implement
this I need a so far unpublished patch from Andre, and it will be added
after we have that patch in.
Note that right now we only do the time slicing for SCHED_RR tasks.
The code would work for SCHED_OTHER tasks aswell, but their prio
value is defered from the one the PPU thread has at time of spu_run,
and using this for spu scheduling decisions would make the code very
unfair. SCHED_OTHER support will be enabled once we the spu scheduler
knows how to calculcate cpu_context.prio (very soon)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
use DECLARE_BITMAP in the spu scheduler instead of reimplementing it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
If we start a spu context with realtime priority we want it to run
immediately and not wait until some other lower priority thread has
finished. Try to find a suitable victim and use it's spu in this
case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Give spu_yield a kerneldoc comment and remove the old comment
documenting spu_activate, spu_deactive and spu_yield as all of them
now have descriptive kerneldoc comments of their own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
If we call spu_remove_from_active_list that spu is always guaranteed
to be on the active list and in runnable state, so we can simply
do a list_del to remove it and unconditionally take the was_active
codepath.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no need to directly wake up contexts in spu_activate when
called from spu_run, so add a flag to surpress this wakeup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This is the biggest patch in this series, and it reworks the guts of
the spu scheduler runqueue mechanism:
- instead of embedding a waitqueue in the runqueue there is now a
simple doubly-linked list, the actual wakeups happen by reusing
the stop_wq in the spu context (maybe we should rename it one day)
- spu_free and spu_prio_wakeup are merged into a single spu_reschedule
function
- various functionality is split out into small helpers, and kerneldoc
comments are added in various places to document what's going on.
- spu_activate is rewritten into a tight loop by removing test for
various impossible conditions and using the infrastructure in this
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It doesn't make any sense to have a priority field in the physical spu
structure. Move it into the spu context instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups in code surrounding the state semaphore:
- inline spu_acquire/spu_release
- cleanup spu_acquire_* and add kerneldoc comments to these functions
- remove spu_release_exclusive and replace it with spu_release
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The r/w semaphore to lock the spus was overkill and can be replaced
with a mutex to make it faster, simpler and easier to debug. It also
helps to allow making most spufs interruptible in future patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups to sched.c that don't change the global control flow:
- add kerneldoc comments to various functions
- add spu_ prefixes to various functions
- add/remove context from the runqueue in bind/unbind_context as
it's part of the logical operation
- add a call to put_active_spu to spu_unbind_contex as it's logically
part of the unbind operation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Only bind_context/unbind_context change the spu context state. Thus
we can move all assignents of SPU_STATE_RUNNABLE into bind_context,
which parallels the unbind side aswell.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
unbind_context already sets the context state to SPU_STATE_SAVED, thus
the spu_deactivate callers don't need to do it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the empty last line in arch/powerpc/platforms/cell/spufs/run.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the SPU_CONTEXT_PREEMPT define. It's unused and won't be used
in this form after the scheduler rework.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It looks like we've had some serious bitrot there mostly due to tracking
of address_space's of mmap'ed files getting out of sync with the actual
mmap code. The mfc, mss and psmap were not tracked properly and thus
not invalidated on context switches (oops !)
I also removed the various file->f_mapping = inode->i_mapping;
assignments that were done in the other open() routines since that
is already done for us by __dentry_open.
One improvement we might want to do later is to assign the various
ctx-> fields at mmap time instead of file open/close time so that we
don't call unmap_mapping_range() on thing that have not been mmap'ed
Finally, I added some smp_wmb's after assigning the ctx-> fields to make
sure they are visible to other CPUs. I don't think this is really
necessary as I suspect locking in the fs layer will make that happen
anyway but better safe than sorry.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the need for struct page for SPE local store
and registers from spufs. It also makes the locking much more
obvious and no longer relying on the truncate logic black magic
for protecting against races between unmap_mapping_range() and
new pages faulted in. It does so by switching to a nopfn() handler
and using the new vm_insert_pfn() to setup the PTEs itself while
holding a lock on the SPE.
The nice thing is that this patch actually removes a lot more code
than it adds :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spu management ops in arch/platforms/cell/spu_priv1_mmio.h can be used
commonly in of based platform. This patch separates spu management ops
from native cell code and uses on celleb platform.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>