Commit graph

57487 commits

Author SHA1 Message Date
Jiri Kosina 1deb9d341d HID: debug: fix RCU preemption issue
Commit 2353f2bea ("HID: protect hid_debug_list") introduced mutex
locking around debug_list access to prevent SMP races when debugfs
nodes are being operated upon by multiple userspace processes.

A mutex is not a proper synchronization primitive here though, as the
hid-debug callbacks are being called from atomic contexts.

We also have to be careful about disabling IRQs when taking the lock
to prevent deadlock against IRQ handlers.
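
A userspace sketch of the idea, not the kernel patch itself: a lock that
busy-waits instead of sleeping can be taken from a context that must not
block. All names here (demo_lock, demo_event_list, demo_push_event) are
hypothetical, and a C11 atomic_flag stands in for a kernel spinlock. The
IRQ-disabling part mentioned above has no userspace equivalent and is only
noted in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a userspace analogy only. */
struct demo_lock {
    atomic_flag held;
};

#define DEMO_MAX_EVENTS 8

struct demo_event_list {
    struct demo_lock lock;
    int events[DEMO_MAX_EVENTS];
    size_t count;
};

static void demo_list_init(struct demo_event_list *list)
{
    assert(list != NULL);
    atomic_flag_clear(&list->lock.held);
    list->count = 0;
}

/* Busy-wait until the flag is free. Unlike a mutex, this never sleeps. */
static void demo_lock_acquire(struct demo_lock *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire))
        ; /* spin */
}

static void demo_lock_release(struct demo_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/*
 * Append one event under the lock. Returns 0 on success, -1 if the list
 * is full. In kernel code the lock would presumably also be taken with
 * local interrupts disabled, so that an IRQ handler cannot spin forever
 * on a lock held by the code it interrupted.
 */
static int demo_push_event(struct demo_event_list *list, int event)
{
    int ret = -1;

    demo_lock_acquire(&list->lock);
    if (list->count < DEMO_MAX_EVENTS) {
        list->events[list->count++] = event;
        ret = 0;
    }
    demo_lock_release(&list->lock);
    return ret;
}
```

The critical section is kept short on purpose: a spinning lock is only
reasonable when the holder releases it quickly.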

Benjamin reports this has also been reported in RH bugzilla as bug #958935.

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.9.0+ #94 Not tainted
 -------------------------------
 include/linux/rcupdate.h:476 Illegal context switch in RCU read-side critical section!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 4 locks held by Xorg/5502:
  #0:  (&evdev->mutex){+.+...}, at: [<ffffffff81512c3d>] evdev_write+0x6d/0x160
  #1:  (&(&dev->event_lock)->rlock#2){-.-...}, at: [<ffffffff8150dd9b>] input_inject_event+0x5b/0x230
  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff8150dd82>] input_inject_event+0x42/0x230
  #3:  (&(&usbhid->lock)->rlock){-.....}, at: [<ffffffff81565289>] usb_hidinput_input_event+0x89/0x120

 stack backtrace:
 CPU: 0 PID: 5502 Comm: Xorg Not tainted 3.9.0+ #94
 Hardware name: Dell Inc. OptiPlex 390/0M5DCD, BIOS A09 07/24/2012
  0000000000000001 ffff8800689c7c38 ffffffff816f249f ffff8800689c7c68
  ffffffff810acb1d 0000000000000000 ffffffff81a03ac7 000000000000019d
  0000000000000000 ffff8800689c7c90 ffffffff8107cda7 0000000000000000
 Call Trace:
  [<ffffffff816f249f>] dump_stack+0x19/0x1b
  [<ffffffff810acb1d>] lockdep_rcu_suspicious+0xfd/0x130
  [<ffffffff8107cda7>] __might_sleep+0xc7/0x230
  [<ffffffff816f7770>] mutex_lock_nested+0x40/0x3a0
  [<ffffffff81312ac4>] ? vsnprintf+0x354/0x640
  [<ffffffff81553cc4>] hid_debug_event+0x34/0x100
  [<ffffffff81554197>] hid_dump_input+0x67/0xa0
  [<ffffffff81556430>] hid_set_field+0x50/0x120
  [<ffffffff8156529a>] usb_hidinput_input_event+0x9a/0x120
  [<ffffffff8150d89e>] input_handle_event+0x8e/0x530
  [<ffffffff8150df10>] input_inject_event+0x1d0/0x230
  [<ffffffff8150dd82>] ? input_inject_event+0x42/0x230
  [<ffffffff81512cae>] evdev_write+0xde/0x160
  [<ffffffff81185038>] vfs_write+0xc8/0x1f0
  [<ffffffff81185535>] SyS_write+0x55/0xa0
  [<ffffffff81704482>] system_call_fastpath+0x16/0x1b
 BUG: sleeping function called from invalid context at kernel/mutex.c:413
 in_atomic(): 1, irqs_disabled(): 1, pid: 5502, name: Xorg
 INFO: lockdep is turned off.
 irq event stamp: 1098574
 hardirqs last  enabled at (1098573): [<ffffffff816fb53f>] _raw_spin_unlock_irqrestore+0x3f/0x70
 hardirqs last disabled at (1098574): [<ffffffff816faaf5>] _raw_spin_lock_irqsave+0x25/0xa0
 softirqs last  enabled at (1098306): [<ffffffff8104971f>] __do_softirq+0x18f/0x3c0
 softirqs last disabled at (1097867): [<ffffffff81049ad5>] irq_exit+0xa5/0xb0
 CPU: 0 PID: 5502 Comm: Xorg Not tainted 3.9.0+ #94
 Hardware name: Dell Inc. OptiPlex 390/0M5DCD, BIOS A09 07/24/2012
  ffffffff81a03ac7 ffff8800689c7c68 ffffffff816f249f ffff8800689c7c90
  ffffffff8107ce60 0000000000000000 ffff8800689c7fd8 ffff88006a62c800
  ffff8800689c7d10 ffffffff816f7770 ffff8800689c7d00 ffffffff81312ac4
 Call Trace:
  [<ffffffff816f249f>] dump_stack+0x19/0x1b
  [<ffffffff8107ce60>] __might_sleep+0x180/0x230
  [<ffffffff816f7770>] mutex_lock_nested+0x40/0x3a0
  [<ffffffff81312ac4>] ? vsnprintf+0x354/0x640
  [<ffffffff81553cc4>] hid_debug_event+0x34/0x100
  [<ffffffff81554197>] hid_dump_input+0x67/0xa0
  [<ffffffff81556430>] hid_set_field+0x50/0x120
  [<ffffffff8156529a>] usb_hidinput_input_event+0x9a/0x120
  [<ffffffff8150d89e>] input_handle_event+0x8e/0x530
  [<ffffffff8150df10>] input_inject_event+0x1d0/0x230
  [<ffffffff8150dd82>] ? input_inject_event+0x42/0x230
  [<ffffffff81512cae>] evdev_write+0xde/0x160
  [<ffffffff81185038>] vfs_write+0xc8/0x1f0
  [<ffffffff81185535>] SyS_write+0x55/0xa0
  [<ffffffff81704482>] system_call_fastpath+0x16/0x1b

Reported-by: majianpeng <majianpeng@gmail.com>
Reported-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-06 13:07:33 +02:00
Linus Torvalds 19b344efa3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:

 - hid driver transport cleanup, finalizing the long-desired decoupling
   of core from transport layers, by Benjamin Tissoires and Henrik
   Rydberg

 - support for hybrid finger/pen multitouch HID devices, by Benjamin
   Tissoires

 - fix for long-standing issue in Logitech unifying driver sometimes not
   initializing properly due to device specifics, by Andrew de los Reyes

 - Wii remote driver updates to support 2nd generation of devices, by
   David Herrmann

 - support for Apple IR remote

 - roccat driver now supports new devices (Roccat Kone Pure, IskuFX), by
   Stefan Achatz

 - debugfs locking fixes in hid debug interface, by Jiri Kosina

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (43 commits)
  HID: protect hid_debug_list
  HID: debug: break out hid_dump_report() into hid-debug
  HID: Add PID for Japanese version of NE4K keyboard
  HID: hid-lg4ff add support for new version of DFGT wheel
  HID: icade: u16 which never < 0
  HID: clarify Magic Mouse Kconfig description
  HID: appleir: add support for Apple ir devices
  HID: roccat: added media key support for Kone
  HID: hid-lenovo-tpkbd: remove doubled hid_get_drvdata
  HID: i2c-hid: fix length for set/get report in i2c hid
  HID: wiimote: parse reduced status reports
  HID: wiimote: add 2nd generation Wii Remote IDs
  HID: wiimote: use unique battery names
  HID: hidraw: warn if userspace headers are outdated
  HID: multitouch: force BTN_STYLUS for pen devices
  HID: multitouch: append " Pen" to the name of the stylus input
  HID: multitouch: add handling for pen in dual-sensors device
  HID: multitouch: change touch sensor detection in mt_input_configured()
  HID: multitouch: do not map usage from non used reports
  HID: multitouch: breaks out touch handling in specific functions
  ...
2013-04-30 09:37:55 -07:00
Linus Torvalds 5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
Linus Torvalds 5a5a1bf099 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:

 - Add an Intel CMCI hotplug fix

 - Add AMD family 16h EDAC support

 - Make the AMD MCE banks code more flexible for virtual environments

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  amd64_edac: Add Family 16h support
  x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
  x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD
  x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper
2013-04-30 08:42:45 -07:00
Linus Torvalds ab86e974f0 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Ingo Molnar:
 "The main changes in this cycle's merge are:

   - Implement shadow timekeeper to shorten in-kernel reader-side
     blocking, by Thomas Gleixner.

   - Posix timers enhancements by Pavel Emelyanov:

      - allocate timer ID per process, so that exact timer ID allocations
        can be re-created by checkpoint/restore code.

      - debuggability and tooling (/proc/PID/timers, etc.) improvements.

   - suspend/resume enhancements by Feng Tang: on certain new Intel Atom
     processors (Penwell and Cloverview), there is a feature that the
     TSC won't stop in S3 state, so the TSC value won't be reset to 0
     after resume.  This can be taken advantage of by the generic code via
     the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
     recover/approximate sleep time, the main (and precise) clocksource
     can be used.

   - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
     CPUs the file goes beyond 4MB of size and thus the current
     simplistic seqfile approach fails.  Convert /proc/timer_list to a
     proper seq_file with its own iterator.

   - Cleanups and refactorings of the core timekeeping code by John
     Stultz.

   - International Atomic Clock time is managed by the NTP code
     internally currently but not exposed externally.  Separate the TAI
     code out and add CLOCK_TAI support and TAI support to the hrtimer
     and posix-timer code, by John Stultz.

   - Add deep idle support enhancement to the broadcast clockevents core
     timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
     clockevents feature (which will be utilized by future clockevents
     driver updates), which allows the use of IRQ affinities to avoid
     spurious wakeups of idle CPUs - the right CPU with an expiring
     timer will be woken.

   - Add new ARM bcm281xx clocksource driver, by Christian Daudt

   - ... various other fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  clockevents: Set dummy handler on CPU_DEAD shutdown
  timekeeping: Update tk->cycle_last in resume
  posix-timers: Remove unused variable
  clockevents: Switch into oneshot mode even if broadcast registered late
  timer_list: Convert timer list to be a proper seq_file
  timer_list: Split timer_list_show_tickdevices
  posix-timers: Show sigevent info in proc file
  posix-timers: Introduce /proc/PID/timers file
  posix timers: Allocate timer id per process (v2)
  timekeeping: Make sure to notify hrtimers when TAI offset changes
  hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
  hrtimer: Add expiry time overflow check in hrtimer_interrupt
  timekeeping: Shorten seq_count region
  timekeeping: Implement a shadow timekeeper
  timekeeping: Delay update of clock->cycle_last
  timekeeping: Store cycle_last value in timekeeper struct as well
  ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
  timekeeping: Simplify tai updating from do_adjtimex
  timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
  timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
  ...
2013-04-30 08:15:40 -07:00
Linus Torvalds 8700c95adb Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
 "This is a pretty large, multi-arch series unifying and generalizing
  the various disjunct pieces of idle routines that architectures have
  historically copied from each other and have grown in random, wildly
  inconsistent and sometimes buggy directions:

   101 files changed, 455 insertions(+), 1328 deletions(-)

  this went through a number of review and test iterations before it was
  committed, it was tested on various architectures, was exposed to
  linux-next for quite some time - nevertheless it might cause problems
  on architectures that don't read the mailing lists and don't regularly
  test linux-next.

  This cat herding exercise was motivated by the -rt kernel, and was
  brought to you by Thomas "the Whip" Gleixner."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  idle: Remove GENERIC_IDLE_LOOP config switch
  um: Use generic idle loop
  ia64: Make sure interrupts enabled when we "safe_halt()"
  sparc: Use generic idle loop
  idle: Remove unused ARCH_HAS_DEFAULT_IDLE
  bfin: Fix typo in arch_cpu_idle()
  xtensa: Use generic idle loop
  x86: Use generic idle loop
  unicore: Use generic idle loop
  tile: Use generic idle loop
  tile: Enter idle with preemption disabled
  sh: Use generic idle loop
  score: Use generic idle loop
  s390: Use generic idle loop
  powerpc: Use generic idle loop
  parisc: Use generic idle loop
  openrisc: Use generic idle loop
  mn10300: Use generic idle loop
  mips: Use generic idle loop
  microblaze: Use generic idle loop
  ...
2013-04-30 07:50:17 -07:00
Linus Torvalds 16fa94b532 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this development cycle were:

   - full dynticks preparatory work by Frederic Weisbecker

   - factor out the cpu time accounting code better, by Li Zefan

   - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

   - various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  sched: Fix init NOHZ_IDLE flag
  sched: Prevent to re-select dst-cpu in load_balance()
  sched: Rename load_balance_tmpmask to load_balance_mask
  sched: Move up affinity check to mitigate useless redoing overhead
  sched: Don't consider other cpus in our group in case of NEWLY_IDLE
  sched: Explicitly cpu_idle_type checking in rebalance_domains()
  sched: Change position of resched_cpu() in load_balance()
  sched: Fix wrong rq's runnable_avg update with rt tasks
  sched: Document task_struct::personality field
  sched/cpuacct/UML: Fix header file dependency bug on the UML build
  cgroup: Kill subsys.active flag
  sched/cpuacct: No need to check subsys active state
  sched/cpuacct: Initialize cpuacct subsystem earlier
  sched/cpuacct: Initialize root cpuacct earlier
  sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
  sched/cpuacct: Clean up cpuacct.h
  sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
  sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
  sched/cpuacct: Add cpuacct_acount_field()
  sched/cpuacct: Add cpuacct_init()
  ...
2013-04-30 07:43:28 -07:00
Linus Torvalds e0972916e8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Features:

   - Add "uretprobes" - an optimization to uprobes, like kretprobes are
     an optimization to kprobes.  "perf probe -x file sym%return" now
     works like kretprobes.  By Oleg Nesterov.

   - Introduce per core aggregation in 'perf stat', from Stephane
     Eranian.

   - Add memory profiling via PEBS, from Stephane Eranian.

   - Event group view for 'annotate' in --stdio, --tui and --gtk, from
     Namhyung Kim.

   - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.

   - Add Ivy Bridge-EP uncore support, by Zheng Yan

   - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.

   - Add perf test entries for checking breakpoint overflow signal
     handler issues, from Jiri Olsa.

   - Add perf test entry for checking number of EXIT events, from
     Namhyung Kim.

   - Add perf test entries for checking --cpu in record and stat, from
     Jiri Olsa.

   - Introduce perf stat --repeat forever, from Frederik Deweerdt.

   - Add --no-demangle to report/top, from Namhyung Kim.

   - PowerPC fixes plus a couple of cleanups/optimizations in uprobes
     and trace_uprobes, by Oleg Nesterov.

  Various fixes and refactorings:

   - Fix dependency of the python binding wrt libtraceevent, from
     Naohiro Aota.

   - Simplify some perf_evlist methods and allow 'stat' to share code
     with 'record' and 'trace', by Arnaldo Carvalho de Melo.

   - Remove dead code related to libtraceevent integration, from
     Namhyung Kim.

   - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
     sched lat' back working, by Arnaldo Carvalho de Melo

   - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
     de Melo.

   - Kill a bunch of die() calls, from Namhyung Kim.

   - Fix build on non-glibc systems due to libio.h absence, from Cody P
     Schafer.

   - Remove some perf_session and tracing dead code, from David Ahern.

   - Honor parallel jobs, fix from Borislav Petkov

   - Introduce tools/lib/lk library, initially just removing duplication
     among tools/perf and tools/vm, from Borislav Petkov

  ... and many more I missed to list, see the shortlog and git log for
  more details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
  perf/x86/intel/P4: Robistify P4 PMU types
  perf/x86/amd: Fix AMD NB and L2I "uncore" support
  perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
  perf/x86: Check all MSRs before passing hw check
  perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
  perf/x86/intel: Add Ivy Bridge-EP uncore support
  perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
  perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
  uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
  uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
  uprobes/tracing: Change create_trace_uprobe() to support uretprobes
  uprobes/tracing: Make seq_printf() code uretprobe-friendly
  uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
  uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
  uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
  uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
  uprobes/tracing: Generalize struct uprobe_trace_entry_head
  uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
  uprobes/tracing: Kill the pointless seq_print_ip_sym() call
  uprobes/tracing: Kill the pointless task_pt_regs() calls
  ...
2013-04-30 07:41:01 -07:00
Linus Torvalds 1f889ec62c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle are mostly related to preparatory work
  for the full-dynticks work:

   - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
     advantage of numbered callbacks, do callback accelerations based on
     numbered callbacks.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/960

   - RCU documentation updates.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/570

   - Miscellaneous fixes.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/594"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rcu: Make rcu_accelerate_cbs() note need for future grace periods
  rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
  rcu: Rename n_nocb_gp_requests to need_future_gp
  rcu: Push lock release to rcu_start_gp()'s callers
  rcu: Repurpose no-CBs event tracing to future-GP events
  rcu: Rearrange locking in rcu_start_gp()
  rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
  rcu: Accelerate RCU callbacks at grace-period end
  rcu: Export RCU_FAST_NO_HZ parameters to sysfs
  rcu: Distinguish "rcuo" kthreads by RCU flavor
  rcu: Add event tracing for no-CBs CPUs' grace periods
  rcu: Add event tracing for no-CBs CPUs' callback registration
  rcu: Introduce proper blocking to no-CBs kthreads GP waits
  rcu: Provide compile-time control for no-CBs CPUs
  rcu: Tone down debugging during boot-up and shutdown.
  rcu: Add softirq-stall indications to stall-warning messages
  rcu: Documentation update
  rcu: Make bugginess of code sample more evident
  rcu: Fix hlist_bl_set_first_rcu() annotation
  rcu: Delete unused rcu_node "wakemask" field
  ...
2013-04-30 07:39:01 -07:00
Jiri Kosina 72c16d9a5c Merge branch 'for-3.10/mt-hybrid-finger-pen' into for-linus
Conflicts:
	drivers/hid/hid-multitouch.c
2013-04-30 10:17:48 +02:00
Jiri Kosina 4f5a810429 Merge branches 'for-3.10/appleir', 'for-3.10/hid-debug', 'for-3.10/hid-driver-transport-cleanups', 'for-3.10/i2c-hid' and 'for-3.10/logitech' into for-linus
2013-04-30 10:12:44 +02:00
Jiri Kosina 2353f2bea3 HID: protect hid_debug_list
Accesses to hid_device->hid_debug_list are not serialized properly, which
could result in SMP concurrency issues when HID debugfs events are accessed
by multiple userspace processes.

Serialize all the list operations by a mutex.

Spotted by Al Viro.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-30 10:09:31 +02:00
Benjamin Tissoires a5f04b9df1 HID: debug: break out hid_dump_report() into hid-debug
No semantic changes, but hid_dump_report should be in hid-debug.c, not
in hid-core.c

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-30 10:09:06 +02:00
Linus Torvalds 56847d857c Merge branch 'akpm' (incoming from Andrew)
Merge second batch of fixes from Andrew Morton:

 - various misc bits

 - some printk updates

 - a new "SRAM" driver.

 - MAINTAINERS updates

 - the backlight driver queue

 - checkpatch updates

 - a few init/ changes

 - a huge number of drivers/rtc changes

 - fatfs updates

 - some lib/idr.c work

 - some renaming of the random driver interfaces

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (285 commits)
  net: rename random32 to prandom
  net/core: remove duplicate statements by do-while loop
  net/core: rename random32() to prandom_u32()
  net/netfilter: rename random32() to prandom_u32()
  net/sched: rename random32() to prandom_u32()
  net/sunrpc: rename random32() to prandom_u32()
  scsi: rename random32() to prandom_u32()
  lguest: rename random32() to prandom_u32()
  uwb: rename random32() to prandom_u32()
  video/uvesafb: rename random32() to prandom_u32()
  mmc: rename random32() to prandom_u32()
  drbd: rename random32() to prandom_u32()
  kernel/: rename random32() to prandom_u32()
  mm/: rename random32() to prandom_u32()
  lib/: rename random32() to prandom_u32()
  x86: rename random32() to prandom_u32()
  x86: pageattr-test: remove srandom32 call
  uuid: use prandom_bytes()
  raid6test: use prandom_bytes()
  sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic()
  ...
2013-04-29 19:47:50 -07:00
Linus Torvalds 191a712090 Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Fixes and a lot of cleanups.  Locking cleanup is finally complete.
   cgroup_mutex is no longer exposed to individual controllers which
   used to cause nasty deadlock issues.  Li fixed and cleaned up quite a
   bit including long standing ones like racy cgroup_path().

 - device cgroup now supports proper hierarchy thanks to Aristeu.

 - perf_event cgroup now supports proper hierarchy.

 - A new mount option "__DEVEL__sane_behavior" is added.  As indicated
   by the name, this option is to be used for development only at this
   point and generates a warning message when used.  Unfortunately,
   cgroup interface currently has too many breakages and inconsistencies
   to implement a consistent and unified hierarchy on top.  The new flag
   is used to collect the behavior changes which are necessary to
   implement consistent unified hierarchy.  It's likely that this flag
   won't be used verbatim when it becomes ready but will be enabled
   implicitly along with unified hierarchy.

   The option currently disables some of broken behaviors in cgroup core
   and also .use_hierarchy switch in memcg (will be routed through -mm),
   which can be used to make very unusual hierarchy where nesting is
   partially honored.  It will also be used to implement hierarchy
   support for blk-throttle which would be impossible otherwise without
   introducing a full separate set of control knobs.

   This is essentially versioning of interface which isn't very nice but
   at this point I can't see any other options which would allow keeping
   the interface the same while moving towards hierarchy behavior which
   is at least somewhat sane.  The planned unified hierarchy is likely
   to require some level of adaptation from userland anyway, so I think
   it'd be best to take the chance and update the interface such that
   it's supportable in the long term.

   Maintaining the existing interface does complicate cgroup core but
   shouldn't put too much strain on individual controllers and I think
   it'd be manageable for the foreseeable future.  Maybe we'll be able
   to drop it in a decade.

Fix up conflicts (including a semantic one adding a new #include to ppc
that was uncovered by the header file changes) as per Tejun.

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
  cpuset: fix compile warning when CONFIG_SMP=n
  cpuset: fix cpu hotplug vs rebuild_sched_domains() race
  cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
  cgroup: restore the call to eventfd->poll()
  cgroup: fix use-after-free when umounting cgroupfs
  cgroup: fix broken file xattrs
  devcg: remove parent_cgroup.
  memcg: force use_hierarchy if sane_behavior
  cgroup: remove cgrp->top_cgroup
  cgroup: introduce sane_behavior mount option
  move cgroupfs_root to include/linux/cgroup.h
  cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
  cgroup: make cgroup_path() not print double slashes
  Revert "cgroup: remove bind() method from cgroup_subsys."
  perf: make perf_event cgroup hierarchical
  cgroup: implement cgroup_is_descendant()
  cgroup: make sure parent won't be destroyed before its children
  cgroup: remove bind() method from cgroup_subsys.
  devcg: remove broken_hierarchy tag
  cgroup: remove cgroup_lock_is_held()
  ...
2013-04-29 19:14:20 -07:00
Linus Torvalds 46d9be3e5e Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "A lot of activity on the workqueue side this time.  The changes achieve
  the following.

   - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are
     updated to be able to interface with multiple backend worker pools.
     This involved a lot of churning but the end result seems actually
     neater as unbound workqueues are now a lot closer to per-cpu ones.

   - The ability to interface with multiple backend worker pools is
     used to implement unbound workqueues with custom attributes.
     Currently the supported attributes are the nice level and CPU
     affinity.  It may be expanded to include cgroup association in
     future.  The attributes can be specified either by calling
     apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
     the workqueue in question is exported through sysfs.

     The backend worker pools are keyed by the actual attributes and
     shared by any workqueues which share the same attributes.  When
     attributes of a workqueue are changed, the workqueue binds to the
     worker pool with the specified attributes while leaving the work
     items which are already executing in its previous worker pools
     alone.

     This allows converting custom worker pool implementations which
     want worker attribute tuning to use workqueues.  The writeback pool
     is already converted in block tree and a couple of others are
     likely to follow, including btrfs io workers.

   - WQ_UNBOUND's ability to bind to multiple worker pools is also used
     to make it NUMA-aware.  Because there's no association between work
     item issuer and the specific worker assigned to execute it, before
     this change, using unbound workqueue led to unnecessary cross-node
     bouncing and it couldn't be helped by autonuma as it requires tasks
     to have implicit node affinity and workers are assigned randomly.

     After these changes, an unbound workqueue now binds to multiple
     NUMA-affine worker pools so that queued work items are executed in
     the same node.  This is turned on by default but can be disabled
     system-wide or for individual workqueues.

     Crypto was requesting NUMA affinity as encrypting data across
     different nodes can contribute noticeable overhead and doing it
     per-cpu was too limiting for certain cases and IO throughput could
     be bottlenecked by one CPU being fully occupied while others have
     idle cycles.

  While the new features required a lot of changes including
  restructuring locking, it didn't complicate the execution paths much.
  The unbound workqueue handling is now closer to per-cpu ones and the
  new features are implemented by simply associating a workqueue with
  different sets of backend worker pools without changing queue,
  execution or flush paths.

  As such, even though the amount of change is very high, I feel
  relatively safe in that it isn't likely to cause subtle issues with
  basic correctness of work item execution and handling.  If something
  is wrong, it's likely to show up as being associated with worker pools
  with the wrong attributes or OOPS while workqueue attributes are being
  changed or during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many
  more workers unless, of course, there are many workqueues with unique
  combinations of attributes.  Assuming everything else is the same,
  NUMA awareness costs an extra worker pool per NUMA node with online
  CPUs.

  There are also a couple things which are being routed outside the
  workqueue tree.

   - block tree pulled in workqueue for-3.10 so that writeback worker
     pool can be converted to unbound workqueue with sysfs control
     exposed.  This simplifies the code, makes writeback workers
     NUMA-aware and allows tuning nice level and CPU affinity via sysfs.

   - The conversion to workqueue means that there's no 1:1 association
     between a specific worker, which makes writeback folks unhappy as
     they want to be able to tell which filesystem caused a problem from
     backtrace on systems with many filesystems mounted.  This is
     resolved by allowing work items to set debug info string which is
     printed when the task is dumped.  As this change involves unifying
     implementations of dump_stack() and friends in arch codes, it's
     being routed through Andrew's -mm tree."
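
The pool-sharing scheme described above (backend worker pools keyed by
their attributes and shared by every workqueue with equal attributes) can
be illustrated with a small lookup-or-create table. This is only a sketch
under simplifying assumptions: the names (demo_attrs, demo_pool,
demo_get_pool) are hypothetical, a plain bitmask stands in for a CPU mask,
and a fixed array replaces whatever hashing the real code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute set: nice level plus a bitmask of allowed CPUs. */
struct demo_attrs {
    int nice;
    unsigned long cpumask;
};

/* One backend pool per distinct attribute set, shared via a refcount. */
struct demo_pool {
    struct demo_attrs attrs;
    int refcnt;
};

#define DEMO_MAX_POOLS 16

static struct demo_pool demo_pools[DEMO_MAX_POOLS];
static size_t demo_nr_pools;

static int demo_attrs_equal(const struct demo_attrs *a,
                            const struct demo_attrs *b)
{
    return a->nice == b->nice && a->cpumask == b->cpumask;
}

/*
 * Return the pool matching @attrs, creating it on first use.
 * Returns NULL when the fixed-size table is full.
 */
static struct demo_pool *demo_get_pool(const struct demo_attrs *attrs)
{
    size_t i;

    assert(attrs != NULL);

    /* Reuse an existing pool when the attributes match exactly. */
    for (i = 0; i < demo_nr_pools; i++) {
        if (demo_attrs_equal(&demo_pools[i].attrs, attrs)) {
            demo_pools[i].refcnt++;
            return &demo_pools[i];
        }
    }

    if (demo_nr_pools == DEMO_MAX_POOLS)
        return NULL;

    /* No match: create a new pool keyed by these attributes. */
    demo_pools[demo_nr_pools].attrs = *attrs;
    demo_pools[demo_nr_pools].refcnt = 1;
    return &demo_pools[demo_nr_pools++];
}
```

Two callers asking for the same nice level and CPU mask get the same pool
back, which matches the point made above that extra pools only appear for
unique combinations of attributes.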

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...
2013-04-29 19:07:40 -07:00
Linus Torvalds ce8aa48929 Merge branch 'for-3.10-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull async update from Tejun Heo:
 "This contains three cleanup patches for async from Lai.  All three
  patches are essentially cosmetic."

* 'for-3.10-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  async: rename and redefine async_func_ptr
  async: remove unused @node from struct async_domain
  async: simplify lowest_in_progress()
2013-04-29 19:06:59 -07:00
Akinobu Mita 8d564368a9 net: rename random32 to prandom
Commit 496f2f93b1 ("random32: rename random32 to prandom") renamed
random32() and srandom32() to prandom_u32() and prandom_seed()
respectively.

net_random() and net_srandom() need to be redefined with prandom_* in
order to finish the naming transition.

While I'm at it, enclose the macro argument of net_srandom() in parentheses.
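
The parentheses matter because a macro argument is substituted as text, so
a cast in the macro body can end up applying to only part of the argument
expression.  A minimal userspace sketch follows; TO_U8_BAD and TO_U8_GOOD
are hypothetical macros made up for illustration and are not the kernel's
net_srandom() definition.  It assumes an 8-bit unsigned char.

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#define TO_U8_BAD(x)  ((unsigned char)x)    /* cast binds to the first operand only */
#define TO_U8_GOOD(x) ((unsigned char)(x))  /* cast applies to the whole argument */

/* Show how the two macros expand for a compound argument. */
static void check_macro_expansion(void)
{
	/* Expands to ((unsigned char)200 + 100): nothing is truncated. */
	assert(TO_U8_BAD(200 + 100) == 300);

	/* Expands to ((unsigned char)(200 + 100)): 300 is truncated to 8 bits. */
	assert(TO_U8_GOOD(200 + 100) == 44);
}
```

Callers that only ever pass a single identifier would not notice the
difference, which is presumably why this kind of cleanup tends to ride
along with another change.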

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:44 -07:00
Jeff Layton a66c04b453 inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:41 -07:00
Jeff Layton 3e6628c4b3 idr: introduce idr_alloc_cyclic()
As Tejun points out, there are several users of the IDR facility that
attempt to use it in a cyclic fashion.  These users are likely to see
-ENOSPC errors after the counter wraps one or more times however.

This patchset adds a new idr_alloc_cyclic routine and converts several
of these users to it.  Many of these users are in obscure parts of the
kernel, and I don't have a good way to test some of them.  The change is
pretty straightforward though, so hopefully it won't be an issue.

There is one other cyclic user of idr_alloc that I didn't touch in
ipc/util.c.  That one is doing some strange stuff that I didn't quite
understand, but it looks like it should probably be converted later
somehow.

This patch:

Thus spake Tejun Heo:

    Ooh, BTW, the cyclic allocation is broken.  It's prone to -ENOSPC
    after the first wraparound.  There are several cyclic users in the
    kernel and I think it probably would be best to implement cyclic
    support in idr.

This patch does that by adding a new idr_alloc_cyclic() function that such
users in the kernel can use.  With this, there's no need for a caller to
keep track of the last value used as that's now tracked internally.  This
should prevent the ENOSPC problems that can hit when the "last allocated"
counter exceeds INT_MAX.

Later patches will convert existing cyclic users to the new interface.
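
The idea of cyclic allocation with an internally tracked cursor can be
sketched in userspace roughly as follows.  This is a toy model under
stated assumptions, not the idr implementation: struct cyc_ids,
cyc_alloc(), cyc_free() and the tiny fixed ID space are all hypothetical
names chosen for illustration, and CYC_ENOSPC merely stands in for
-ENOSPC.

```c
#include <assert.h>
#include <stdbool.h>

#define CYC_MAX_IDS 8     /* hypothetical, tiny ID space for illustration */
#define CYC_ENOSPC  (-1)  /* stand-in for -ENOSPC */

struct cyc_ids {
	bool used[CYC_MAX_IDS];  /* which IDs are currently allocated */
	int next;                /* where the next search begins */
};

/* Find the lowest free ID in [from, to), or CYC_ENOSPC if there is none. */
static int cyc_find_free(const struct cyc_ids *ids, int from, int to)
{
	for (int id = from; id < to; id++)
		if (!ids->used[id])
			return id;
	return CYC_ENOSPC;
}

/*
 * Allocate an ID in [start, end), preferring IDs at or above the cursor
 * and wrapping around to start when the upper part of the range is full.
 */
static int cyc_alloc(struct cyc_ids *ids, int start, int end)
{
	int from, id;

	assert(start >= 0 && start < end && end <= CYC_MAX_IDS);

	from = ids->next > start ? ids->next : start;
	id = cyc_find_free(ids, from, end);
	if (id == CYC_ENOSPC)
		id = cyc_find_free(ids, start, end);  /* wrap around */
	if (id == CYC_ENOSPC)
		return CYC_ENOSPC;                    /* the range really is full */

	ids->used[id] = true;
	ids->next = id + 1;  /* the caller never has to remember this */
	return id;
}

/* Release an ID so that a later wraparound can hand it out again. */
static void cyc_free(struct cyc_ids *ids, int id)
{
	assert(id >= 0 && id < CYC_MAX_IDS);
	ids->used[id] = false;
}
```

In this model a freed ID is not reused immediately; it is handed out
again only once the cursor has wrapped, and an error is returned only
when every ID in the requested range is in use.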

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:41 -07:00
Namjae Jeon ea3983ace6 fat: restructure export_operations
Define two nfs export_operation structures, one for 'stale_rw' mounts and
the other for 'nostale_ro'.  The latter uses i_pos as a basis for encoding
and decoding file handles.

Also, assign i_pos to kstat->ino.  The logic for rebuilding the inode is
added in the subsequent patches.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:40 -07:00
Jingoo Han 6636a9944b drivers/rtc/class.c: use struct device as the first argument for devm_rtc_device_register()
Other devm_* APIs use 'struct device *dev' as the first argument.  Thus,
in order to sync with other devm_* functions, struct device is used as
the first argument for devm_rtc_device_register().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:22 -07:00
Jingoo Han 3e217b6602 rtc: add devm_rtc_device_{register,unregister}()
These functions allow the driver core to automatically clean up any
allocation made by rtc drivers.  Thus it simplifies the error paths.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:21 -07:00
Andy Shevchenko 2e0fb404c8 lib, net: make isodigit() public and use it
There are at least two users of isodigit().  Let's make it a public
function of ctype.h.
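
For illustration, a userspace sketch of such a helper and one possible
caller is shown below.  It assumes the helper simply tests for the range
'0' to '7'; parse_octal_escape() is a hypothetical caller invented for
this example and is not taken from the patch.

```c
#include <assert.h>

/* Userspace sketch of an octal-digit test; non-zero for '0'..'7'. */
static inline int isodigit(const char c)
{
	return c >= '0' && c <= '7';
}

/*
 * Hypothetical caller: parse up to three octal digits, as found in a
 * "\ooo" escape.  Returns the number of digits consumed.
 */
static int parse_octal_escape(const char *s, unsigned int *value)
{
	int n = 0;

	assert(s && value);
	*value = 0;
	while (n < 3 && isodigit(s[n])) {
		*value = *value * 8 + (unsigned int)(s[n] - '0');
		n++;
	}
	return n;
}
```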

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:19 -07:00
Matus Ujhelyi 4d22f8c306 drivers/video/backlight/tps65217_bl.c add default brightness value option
Signed-off-by: Matus Ujhelyi <matus.ujhelyi@streamunlimited.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:19 -07:00
Kim, Milo c365e59d47 backlight: lp855x: remove duplicate platform data
The 'load_new_rom_data' flag was used to check whether new ROM data
should be updated or not.

However, this can be decided from the 'size_program' data.  If the size is
greater than 0, the ROM area needs to be updated.  Otherwise, the
default ROM data will be used.  Therefore, this duplicate platform data
can be removed.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:19 -07:00
Kim, Milo 98e35be2ba backlight: lp855x: fix initial brightness type
The valid range of the brightness is from 0 to 255, so the type of the
initial brightness is changed from integer to u8.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:18 -07:00
Kim, Milo 0b81857339 backlight: lp855x: move backlight mode platform data
The brightness of LP855x devices is controlled by I2C register or PWM
input.  This mode was selected through the platform data, but it can be
chosen by the driver internally without platform data configuration.

How to decide the control mode:
  If the PWM period has a specific value, the mode is PWM input.
  Otherwise, the mode is register-based.
  This mode selection is done in _probe().
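
The decision rule above can be sketched as follows.  All names here
(bl_ctrl_mode, bl_platform_data, period_ns, bl_select_mode) are
hypothetical and not the driver's actual definitions, and the sketch
assumes that "has a specific value" means a non-zero period.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the driver's definitions. */
enum bl_ctrl_mode {
	BL_MODE_PWM,       /* brightness driven by the PWM input */
	BL_MODE_REGISTER,  /* brightness written to an I2C register */
};

struct bl_platform_data {
	unsigned int period_ns;  /* PWM period; assumed 0 when no PWM is configured */
};

/* Derive the control mode from the platform data instead of storing it there. */
static enum bl_ctrl_mode bl_select_mode(const struct bl_platform_data *pdata)
{
	assert(pdata);
	return pdata->period_ns > 0 ? BL_MODE_PWM : BL_MODE_REGISTER;
}
```

Deriving the mode this way means a board file cannot end up with a mode
that contradicts its PWM configuration.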

Move 'mode' from a header file to the driver private data structure,
'lp855x'.  The related code is updated accordingly.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:18 -07:00
Kim, Milo 600ffd33d0 backlight: lp855x: convert a type of device name
The backlight device name, which is configurable data, is changed to a
constant character type.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:18 -07:00
Andrew Bresticker 46e1915eef drivers/video/backlight/platform_lcd.c: introduce probe callback
Platform LCD devices may need to do some device-specific initialization
before they can be used (regulator or GPIO setup, for example), but
currently the driver does not support any way of doing this.  This patch
adds a probe() callback to plat_lcd_data which platform LCD devices can
set to indicate that device-specific initialization is needed.

Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:18 -07:00
Andrew Morton 1b2c289b4f include/linux/printk.h: include stdarg.h
printk.h uses va_list but doesn't include stdarg.h.  Hence printk.h is
unusable unless its includer has already included kernel.h (which includes
stdarg.h).

Remove the dependency by including stdarg.h in printk.h.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Thomas Gleixner d0380e6c3c early_printk: consolidate random copies of identical code
The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

[akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
[paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
zhangwei(Jovi) 07c65f4d1a printk/tracing: rework console tracing
Commit 7ff9554bb5 ("printk: convert byte-buffer to variable-length
record buffer") removed start and end parameters from
call_console_drivers, but those parameters still exist in
include/trace/events/printk.h.

Without start and end parameter handling, printk tracing becomes
simpler: trace_console(text, len);

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Guenter Roeck 2fb0815c9e gcc4: disable __compiletime_object_size for GCC 4.6+
__builtin_object_size is known to be broken on gcc 4.6+.
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48880 for details.

This causes unnecessary build warnings and errors such as

  In function 'copy_from_user', inlined from 'sb16_copy_from_user'
	at sound/oss/sb_audio.c:878:22:
  arch/x86/include/asm/uaccess_32.h:211:26: error: call to 'copy_from_user_overflow'
	declared with attribute error: copy_from_user() buffer size is not provably correct
  make[3]: [sound/oss/sb_audio.o] Error 1 (ignored)

Disable it where broken.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Philipp Zabel 657eee7d25 media: coda: use genalloc API
This patch depends on "genalloc: add devres support, allow to find a
managed pool by device", which provides the of_get_named_gen_pool and
dev_get_gen_pool functions.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Javier Martin <javier.martin@vista-silicon.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Dong Aisheng <dong.aisheng@linaro.org>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Shijie <shijie8@gmail.com>
Cc: Matt Porter <mporter@ti.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Philipp Zabel 9375db07ad genalloc: add devres support, allow to find a managed pool by device
This patch adds three exported functions to lib/genalloc.c:
devm_gen_pool_create, dev_get_gen_pool, and of_get_named_gen_pool.

devm_gen_pool_create is a managed version of gen_pool_create that keeps
track of the pool via devres and allows the management code to
automatically destroy it after device removal.

dev_get_gen_pool retrieves the gen_pool for a given device, if it was
created with devm_gen_pool_create, using devres_find.

of_get_named_gen_pool retrieves the gen_pool for a given device node and
property name, where the property must contain a phandle pointing to a
platform device node.  The corresponding platform device is then fed into
dev_get_gen_pool and the resulting gen_pool is returned.

[akpm@linux-foundation.org: make the of_get_named_gen_pool() stub static, fixing a zillion link errors]
[akpm@linux-foundation.org: squish "struct device declared inside parameter list" warning]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Matt Porter <mporter@ti.com>
Cc: Dong Aisheng <dong.aisheng@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Javier Martin <javier.martin@vista-silicon.com>
Cc: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Linus Torvalds 73154383f0 Merge branch 'akpm' (incoming from Andrew)
Merge first batch of fixes from Andrew Morton:

 - A couple of kthread changes

 - A few minor audit patches

 - A number of fbdev patches.  Florian remains AWOL so I'm picking up
   some of these.

 - A few kbuild things

 - ocfs2 updates

 - Almost all of the MM queue

(And in the meantime, I already have the second big batch from Andrew
pending in my mailbox ;^)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits)
  memcg: take reference before releasing rcu_read_lock
  mem hotunplug: fix kfree() of bootmem memory
  mmKconfig: add an option to disable bounce
  mm, nobootmem: do memset() after memblock_reserve()
  mm, nobootmem: clean-up of free_low_memory_core_early()
  fs/buffer.c: remove unnecessary init operation after allocating buffer_head.
  numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU
  mm: fix memory_hotplug.c printk format warning
  mm: swap: mark swap pages writeback before queueing for direct IO
  swap: redirty page if page write fails on swap file
  mm, memcg: give exiting processes access to memory reserves
  thp: fix huge zero page logic for page with pfn == 0
  memcg: avoid accessing memcg after releasing reference
  fs: fix fsync() error reporting
  memblock: fix missing comment of memblock_insert_region()
  mm: Remove unused parameter of pages_correctly_reserved()
  firmware, memmap: fix firmware_map_entry leak
  mm/vmstat: add note on safety of drain_zonestat
  mm: thp: add split tail pages to shrink page list in page reclaim
  mm: allow for outstanding swap writeback accounting
  ...
2013-04-29 17:29:08 -07:00
Linus Torvalds 362ed48dee Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux
Pull clock framework update from Michael Turquette:
 "The common clock framework changes for 3.10 include many fixes for
  existing platforms, as well as adoption of the framework by new
  platforms and devices.

  Some long-needed fixes to the core framework are here as well as new
  features such as improved initialization of clocks from DT as well as
  framework reentrancy for nested clock operations."

* tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits)
  clk: add clk_ignore_unused option to keep boot clocks on
  clk: ux500: fix mismatched types
  clk: vexpress: Add separate SP810 driver
  clk: si5351: make clk-si5351 depend on CONFIG_OF
  clk: export __clk_get_flags for modular clock providers
  clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate.
  clk: sunxi: Unify oscillator clock
  clk: composite: allow fixed rates & fixed dividers
  clk: composite: rename 'div' references to 'rate'
  clk: add si5351 i2c common clock driver
  clk: add device tree fixed-factor-clock binding support
  clk: Properly handle notifier return values
  clk: ux500: abx500: Define clock tree for ab850x
  clk: ux500: Add support for sysctrl clocks
  clk: mvebu: Fix valid value range checking for cpu_freq_select
  clk: Fixup locking issues for clk_set_parent
  clk: Fixup errorhandling for clk_set_parent
  clk: Restructure code for __clk_reparent
  clk: sunxi: drop an unnecesary kmalloc
  clk: sunxi: drop CLK_IGNORE_UNUSED
  ...
2013-04-29 16:43:54 -07:00
Linus Torvalds 61f3d0a988 Merge tag 'spi-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
 "A fairly quiet release for SPI, mainly driver work.  A few highlights:

   - Supports bits per word compatibility checking in the core.
   - Allow use of the IP used in Freescale SPI controllers outside
     Freescale SoCs.
   - DMA support for the Atmel SPI driver.
   - New drivers for the BCM2835 and Tegra114"

* tag 'spi-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits)
  spi-topcliff-pch: fix to use list_for_each_entry_safe() when delete list items
  spi-topcliff-pch: missing platform_driver_unregister() on error in pch_spi_init()
  ARM: dts: add pinctrl property for spi node for atmel SoC
  ARM: dts: add spi nodes for the atmel boards
  ARM: dts: add spi nodes for atmel SoC
  ARM: at91: add clocks for spi dt entries
  spi/spi-atmel: add dmaengine support
  spi/spi-atmel: add flag to controller data for lock operations
  spi/spi-atmel: add physical base address
  spi/sirf: fix MODULE_DEVICE_TABLE
  MAINTAINERS: Add git repository and update my address
  spi/s3c64xx: Check for errors in dmaengine prepare_transfer()
  spi/s3c64xx: Fix non-dmaengine usage
  spi: omap2-mcspi: fix error return code in omap2_mcspi_probe()
  spi/s3c64xx: let device core setup the default pin configuration
  MAINTAINERS: Update Grant's email address and maintainership
  spi: omap2-mcspi: Fix transfers if DMADEVICES is not set
  spi: s3c64xx: move to generic dmaengine API
  spi-gpio: init CS before spi_bitbang_setup()
  spi: spi-mpc512x-psc: let transmiter/receiver enabled when in xfer loop
  ...
2013-04-29 16:38:41 -07:00
Linus Torvalds 8ded8d4e4f Merge tag 'regulator-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
 "The diffstat and changelog here is dominated by Lee Jones' heroic
  efforts to sync the ab8500 driver that's been maintained out of tree
  with mainline (plus Axel's cleanup work on the results) but there's a
  few other things here:

   - Axel Lin added regulator_map_voltage_ascend() optimising a common
     pattern for drivers using the core code.
   - Milo Kim taught the regulator core to handle regulators sharing an
     enable GPIO, avoiding the need to do hacks to support such systems.
   - Andrew Bresticker added code to handle missing supplies for
     regulators more sensibly for device tree systems, reducing the need
     for stubbing there.

  plus the usual batch of driver specific updates and fixes"

* tag 'regulator-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (152 commits)
  regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel
  regulator: Remove NULL test before calling regulator_unregister()
  regulator: mc13783: Add device tree probe support
  regulator: mc13xxx: Add warning of incorrect names of regulators
  regulator: max77686: Don't update max77686->opmode if update register fails
  regulator: max8952: Add missing config.of_node setting for regulator register
  regulator: ab3100: Fix regulator register error handling
  regulator: tps6524x: Use regulator_map_voltage_ascend
  regulator: lp8788-buck: Use regulator_map_voltage_ascend
  regulator: lp872x: Use regulator_map_voltage_ascend
  regulator: mc13892: Use regulator_map_voltage_ascend for mc13892_sw_regulator_ops
  regulator: tps65023: Use regulator_map_voltage_ascend
  regulator: tps65023: Merge tps65020 ldo1 and ldo2 vsel table
  regulator: tps6507x: Use regulator_map_voltage_ascend
  regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel
  regulator: ab3100: device tree support
  regulator: ab3100: refactor probe to use IDs
  regulator: max8973: Don't override control1 variable when set ramp delay bits
  regulator: tps80031: Convert tps80031_dcdc_ops to [get|set]_voltage_sel_regmap
  regulator: tps80031: Fix LDO2 track mode for TPS80031 or TPS80032-ES1.0
  ...
2013-04-29 16:32:25 -07:00
Linus Torvalds 7b053842b9 Merge tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
 "In user visible terms just a couple of enhancements here, though there
  was a moderate amount of refactoring required in order to support the
  register cache sync performance improvements.

   - Support for block and asynchronous I/O during register cache
     syncing; this provides a use-case-dependent performance
     improvement.
   - Additional debugfs information on the memory consumption and
     register set"

* tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits)
  regmap: don't corrupt work buffer in _regmap_raw_write()
  regmap: cache: Fix format specifier in dev_dbg
  regmap: cache: Make regcache_sync_block_raw static
  regmap: cache: Write consecutive registers in a single block write
  regmap: cache: Split raw and non-raw syncs
  regmap: cache: Factor out block sync
  regmap: cache: Factor out reg_present support from rbtree cache
  regmap: cache: Use raw I/O to sync rbtrees if we can
  regmap: core: Provide regmap_can_raw_write() operation
  regmap: cache: Provide a get address of value operation
  regmap: Cut down on the average # of nodes in the rbtree cache
  regmap: core: Make raw write available to regcache
  regmap: core: Warn on invalid operation combinations
  regmap: irq: Clarify error message when we fail to request primary IRQ
  regmap: rbtree Expose total memory consumption in the rbtree debugfs entry
  regmap: debugfs: Add a registers `range' file
  regmap: debugfs: Simplify calculation of `c->max_reg'
  regmap: cache: Store caches in native register format where possible
  regmap: core: Split out in place value parsing
  regmap: cache: Use regcache_get_value() to check if we updated
  ...
2013-04-29 16:31:26 -07:00
Joonsoo Kim b4def3509d mm, nobootmem: clean-up of free_low_memory_core_early()
Remove unused argument and make function static, because there is no user
outside of nobootmem.c

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:39 -07:00
Shaohua Li 5bc7b8aca9 mm: thp: add split tail pages to shrink page list in page reclaim
During page reclaim, a huge page is split, and split_huge_page() adds
the tail pages to the LRU list.  Since we are reclaiming a huge page, it
is better to reclaim all subpages of the huge page instead of just the
head page.  This patch adds the split tail pages to the shrink page list
so that they can be reclaimed soon.

Before this patch, running a swap workload gives:
  thp_fault_alloc 3492
  thp_fault_fallback 608
  thp_collapse_alloc 6
  thp_collapse_alloc_failed 0
  thp_split 916

With this patch:
  thp_fault_alloc 4085
  thp_fault_fallback 16
  thp_collapse_alloc 90
  thp_collapse_alloc_failed 0
  thp_split 1272

Fallback allocations are reduced considerably (thp_fault_fallback drops
from 608 to 16).

[akpm@linux-foundation.org: fix CONFIG_SWAP=n build]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
Seth Jennings 1eec6702a8 mm: allow for outstanding swap writeback accounting
To prevent flooding the swap device with writebacks, frontswap backends
need to count and limit the number of outstanding writebacks.  The
incrementing of the counter can be done before the call to
__swap_writepage().  However, the caller must receive a notification
when the writeback completes in order to decrement the counter.

To achieve this functionality, this patch modifies __swap_writepage() to
take the bio completion callback function as an argument.

end_swap_bio_write(), the normal bio completion function, is also made
non-static so that code doing the accounting can call it after the
accounting is done.
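
The accounting pattern described above can be sketched in user space
as follows.  This is only an illustration under stated assumptions:
struct fake_bio, fake_swap_writepage(), fake_end_swap_bio_write(),
MAX_OUTSTANDING and the pending queue are hypothetical stand-ins, not
the kernel's types or signatures.  The backend increments its counter
before submitting, and its completion callback decrements the counter
and then chains to the normal completion function.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bio; not the kernel's struct bio. */
struct fake_bio { int error; };
typedef void (*end_write_fn)(struct fake_bio *bio);

#define MAX_OUTSTANDING 2            /* assumed backend limit */

int outstanding_writebacks;          /* counter kept by the backend */
int normal_completions;              /* work done by the normal path */

/* Simulated in-flight I/O: each bio with its completion callback. */
struct fake_bio *pending_bio[MAX_OUTSTANDING];
end_write_fn pending_cb[MAX_OUTSTANDING];
int npending;

/* Stand-in for end_swap_bio_write(), the normal completion function. */
void fake_end_swap_bio_write(struct fake_bio *bio)
{
    (void)bio;
    normal_completions++;
}

/* Backend callback: do the accounting, then call the normal function. */
void backend_end_write(struct fake_bio *bio)
{
    outstanding_writebacks--;
    fake_end_swap_bio_write(bio);
}

/* Stand-in for __swap_writepage() taking the completion callback. */
void fake_swap_writepage(struct fake_bio *bio, end_write_fn end_write)
{
    pending_bio[npending] = bio;
    pending_cb[npending] = end_write;
    npending++;
}

/* Submit a writeback unless the limit is reached; count it first. */
bool backend_writeback(struct fake_bio *bio)
{
    if (outstanding_writebacks >= MAX_OUTSTANDING)
        return false;
    outstanding_writebacks++;
    fake_swap_writepage(bio, backend_end_write);
    return true;
}

/* Simulate the device finishing every queued write. */
void complete_all(void)
{
    for (int i = 0; i < npending; i++)
        pending_cb[i](pending_bio[i]);
    npending = 0;
}
```

Passing the callback as an argument is what lets the backend observe
completion without changing what existing callers do: they keep passing
the normal completion function.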

There should be no behavioural change to existing code.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
Seth Jennings 2f772e6cad mm: break up swap_writepage() for frontswap backends
swap_writepage() is currently where frontswap hooks into the swap write
path to capture pages with the frontswap_store() function.  However, if
a frontswap backend wants to "resume" the writeback of a page to the
swap device, it can't call swap_writepage() as the page will simply
reenter the backend.

This patch separates swap_writepage() into a top and bottom half, the
bottom half named __swap_writepage() to allow a frontswap backend, like
zswap, to resume writeback beyond the frontswap_store() hook.
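
A minimal user-space sketch of the split, assuming a backend that
always accepts the page.  struct fake_page and all function names
below are hypothetical stand-ins for illustration, not the kernel's
definitions.  The point is that the resume path calls the bottom half
directly and therefore never passes through the store hook again.

```c
#include <stdbool.h>

/* Hypothetical page state; not the kernel's struct page. */
struct fake_page {
    bool in_backend;   /* captured by the frontswap backend */
    bool on_device;    /* written to the swap device */
};

int store_calls;       /* how often the store hook ran */

/* Stand-in for frontswap_store(): returns 0 if the backend took it. */
int fake_frontswap_store(struct fake_page *page)
{
    store_calls++;
    page->in_backend = true;
    return 0;
}

/* Bottom half: write to the swap device, no frontswap hook. */
int fake_bottom_swap_writepage(struct fake_page *page)
{
    page->on_device = true;
    return 0;
}

/* Top half: offer the page to the backend, fall through otherwise. */
int fake_top_swap_writepage(struct fake_page *page)
{
    if (fake_frontswap_store(page) == 0)
        return 0;
    return fake_bottom_swap_writepage(page);
}

/* Backend resuming writeback: skips the hook by using the bottom half. */
int backend_resume_writeback(struct fake_page *page)
{
    page->in_backend = false;
    return fake_bottom_swap_writepage(page);
}
```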

__add_to_swap_cache() is also made non-static so that the page for which
writeback is to be resumed can be added to the swap cache.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
Anton Vorontsov 70ddf637ee memcg: add memory.pressure_level events
With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the pressure level
notifications.  The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations.  Monitoring this reclaiming activity might be useful for
maintaining cache level.  Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (e.g.
shut down unimportant services early).

The "medium" level means that the system is experiencing medium memory
pressure: the system might be swapping, paging out active file
caches, etc.  Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing: it is
about to run out of memory (OOM), or the in-kernel OOM killer is even
on its way to trigger.  Applications should do whatever they can to
help the system.  It might be too late to consult vmstat or any other
statistics, so it's advisable to take immediate action.

The events are propagated upward until the event is handled, i.e.  the
events are not pass-through.  Here is what this means: for example you
have three cgroups: A->B->C.  Now you set up an event listener on
cgroups A, B and C, and suppose group C experiences some pressure.  In
this situation, only group C will receive the notification, i.e.  groups
A and B will not receive it.  This is done to avoid excessive
"broadcasting" of messages, which disturbs the system and which is
especially bad if we are low on memory or thrashing.  So, organize the
cgroups wisely, or propagate the events manually (or ask us to
implement the pass-through events, explaining why you would need them).
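
The propagation rule can be modelled with a short user-space sketch.
It assumes that "A->B->C" means A is the parent of B and B the parent
of C; struct group and deliver_pressure() are hypothetical names used
only for illustration and are not part of memcg.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup in a hierarchy. */
struct group {
    const char *name;
    struct group *parent;   /* NULL for the topmost group */
    bool has_listener;      /* an event listener is registered */
    int notified;           /* number of notifications received */
};

/*
 * Walk upward from the group under pressure and deliver the event to
 * the first group that has a listener, then stop (no pass-through).
 * Returns the group that handled the event, or NULL if none did.
 */
struct group *deliver_pressure(struct group *g)
{
    for (; g != NULL; g = g->parent) {
        if (g->has_listener) {
            g->notified++;
            return g;
        }
    }
    return NULL;
}
```

With listeners on all three groups, pressure in C is handled by C
alone.  If C had no listener, the same event would stop at B instead.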

Performance-wise, the memory pressure notifications feature itself is
lightweight and does not require much bookkeeping, in contrast to the
rest of the memcg features.  Unfortunately, in the current memcg
implementation, page accounting is an inseparable part and cannot be
turned off.  The good news is that there are some efforts[1] to improve
the situation; plus, implementing the same, fully API-compatible[2]
interface for the CONFIG_MEMCG=n case (e.g.  embedded) is also a viable
option, so it will not require any changes on the userland side.

[1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
[2] http://lkml.org/lkml/2013/2/21/454

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_CGROUPS=n warnings]
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Leonid Moiseichuk <leonid.moiseichuk@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
David Rientjes 4edd7ceff0 mm, hotplug: avoid compiling memory hotremove functions when disabled
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86).  remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Toshi Kani 825f787bb4 resource: add release_mem_region_adjustable()
Add release_mem_region_adjustable(), which releases a requested region
from a currently busy memory resource.  This interface adjusts the
matched memory resource accordingly even if the requested region does
not match it exactly but still fits within it.

This new interface is intended for memory hot-delete.  During bootup,
memory resources are inserted from the boot descriptor table, such as
the EFI Memory Table and e820.  Each memory resource entry usually
covers the whole contiguous memory range.  A memory hot-delete request,
on the other hand, may target a particular range of a memory resource,
and its size can be much smaller than the whole contiguous memory.
Since the existing release interfaces like __release_region() require a
requested region to match a resource entry exactly, they do not allow a
partial resource to be released.

This new interface is restrictive (i.e.  it releases only under certain
conditions), which is consistent with other release interfaces,
__release_region() and __release_resource().  Additional release
conditions, such as a region overlapping a resource entry, can be
supported after they are confirmed as valid cases.
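
The adjustment of a matched resource can be illustrated with a
user-space sketch.  struct range and release_adjust() are hypothetical
names, and the return convention is an assumption made for this
example; the sketch only models the range arithmetic for a request
that fits inside one entry, not the kernel's locking or allocation.

```c
#include <stdint.h>

/* Inclusive address range, modelled on a resource's start and end. */
struct range { uint64_t start, end; };

/*
 * Release [start, start + size - 1] from res.  Writes what remains of
 * res into out[] and returns the number of remaining ranges (0, 1 or
 * 2), or -1 if the request is empty, wraps, or does not fit in res.
 */
int release_adjust(struct range res, uint64_t start, uint64_t size,
                   struct range out[2])
{
    uint64_t end;

    if (size == 0)
        return -1;
    end = start + size - 1;
    if (end < start)                        /* address wrap-around */
        return -1;
    if (start < res.start || end > res.end) /* overlap is rejected */
        return -1;

    if (start == res.start && end == res.end)
        return 0;                           /* exact match: all gone */
    if (start == res.start) {               /* trim the head */
        out[0] = (struct range){ end + 1, res.end };
        return 1;
    }
    if (end == res.end) {                   /* trim the tail */
        out[0] = (struct range){ res.start, start - 1 };
        return 1;
    }
    /* Request is in the middle: split the entry in two. */
    out[0] = (struct range){ res.start, start - 1 };
    out[1] = (struct range){ end + 1, res.end };
    return 2;
}
```

The middle case is the one that needs a second entry to describe the
remainder, so it is the only case that produces two ranges.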

There is no change to the existing interfaces since their restriction is
valid for I/O resources.

[akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()]
[akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily]
[akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: T Makphaibulchoke <tmac@hp.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Yijing Wang f1cb08798e mm: remove CONFIG_HOTPLUG ifdefs
CONFIG_HOTPLUG is going away as an option, so clean up the
CONFIG_HOTPLUG ifdefs in mm files.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Andrew Shewmaker 4eeab4f558 mm: replace hardcoded 3% with admin_reserve_pages knob
Add an admin_reserve_kbytes knob to allow admins to change the hardcoded
memory reserve to something other than 3%, which may be multiple
gigabytes on large memory systems.  Only about 8MB is necessary to
enable recovery in the default mode, and only a few hundred MB are
required even when overcommit is disabled.

This affects OVERCOMMIT_GUESS and OVERCOMMIT_NEVER.

admin_reserve_kbytes is initialized to min(3% of free pages, 8MB).
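
As a rough illustration of that initialization, the following
user-space sketch computes the default in kilobytes.  The function
name default_admin_reserve_kb() is hypothetical, and approximating 3%
as 1/32 of free memory is an assumption of this sketch.

```c
/* 8MB cap, expressed in kilobytes. */
#define ADMIN_RESERVE_CAP_KB (8UL * 1024)

/*
 * Default reserve: the smaller of roughly 3% of free memory and 8MB.
 * free_kb is the amount of free memory in kilobytes.
 */
unsigned long default_admin_reserve_kb(unsigned long free_kb)
{
    unsigned long about_three_percent = free_kb / 32; /* assumed ~3% */

    return about_three_percent < ADMIN_RESERVE_CAP_KB
               ? about_three_percent
               : ADMIN_RESERVE_CAP_KB;
}
```

On a machine with gigabytes free the 8MB cap applies, while on a small
system the percentage term is the one that limits the reserve.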

I arrived at 8MB by summing the RSS of sshd or login, bash, and top.

Please see first patch in this series for full background, motivation,
testing, and full changelog.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make init_admin_reserve() static]
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00