If an MSR based monitor is run in parallel this is not needed. This is the
default case on all/most Intel machines.
But when only sysfs info is read via cpupower monitor -m Idle_Stats (typically
the case for non root users) or when other monitors are PCI based (AMD),
Idle_Stats, read from sysfs can be totally bogus:
cpupower monitor -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.24| 99.81
0| 0| 32| 0.00| 0.00| 0.00| 100.7
...
0| 17| 20| 0.00| 0.00| 0.00| 173.1
0| 17| 52| 0.00| 0.00| 0.07| 173.0
0| 18| 68| 0.00| 0.00| 0.00| 0.00
0| 18| 76| 0.00| 0.00| 0.00| 0.00
...
With the -c option all cores are woken up and the kernel
did update cpuidle statistics before reading out sysfs.
This causes some overhead. Therefore avoid if possible, use
if needed:
cpupower monitor -c -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.00| 100.2
0| 0| 32| 0.00| 0.00| 0.00| 100.2
...
0| 8| 8| 0.00| 0.00| 0.00| 99.82
0| 8| 40| 0.00| 0.00| 0.00| 99.81
0| 9| 24| 0.00| 0.00| 0.00| 100.3
0| 9| 56| 0.00| 0.00| 0.00| 100.2
0| 16| 4| 0.00| 0.00| 0.00| 99.75
0| 16| 36| 0.00| 0.00| 0.00| 99.38
...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pkgs member of cpupower_topology is being used as the number of
cpu packages. As the comment in get_cpu_topology notes, the package ids
are not guaranteed to be contiguous. So, simply setting pkgs to the value
of the highest physical_package_id doesn't actually provide a count of
the number of cpu packages. Instead, calculate pkgs by setting it to
the number of distinct physical_packge_id values which is pretty easy
to do after the core_info structs are sorted. Calculating pkgs this
way also has the nice benefit of getting rid of a sign comparison warning
that GCC 4.6 was reporting.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpu_info member of cpupower_topology was being declared as an unnamed
structure. This member was then being malloced using the size of the
parent cpupower_topology * the number of cpus. This works
because cpu_info is smaller than cpupower_topology. However, there is
no guarantee that will always be the case. Making cpu_info its own
top level structure (named cpuid_core_info) allows for mallocing the actual
size of this structure. This also lets us get rid of a redefinition of
the structure in topology.c with slightly different field names.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix a variety of issues with sysfs_topology_read_file:
* The return value of sysfs_topology_read_file function was not properly
being checked for failure.
* The function was reading int valued sysfs variables and then returning
their value. So, even if a function was trying to check the return value
of this function, a caller would not be able to tell an failure code apart
from reading a negative value. This also conflicted with the comment on the
function which said that a return value of 0 indicated success.
* The function was parsing int valued sysfs values with strtoul instead of
strtol.
* The function was non-static even though it was only used in the
file it was declared in.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix minor warnings reported with GCC 4.6:
* The sysfs_write_file function is unused - remove it.
* The pr_mon_len in the print_header function is unsed - remove it.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The files generated by the Makefiles in the debug directories aren't listed
in the .gitignore file in the root of the cpupower tool which causes these
files to show up in the output of 'git status'.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The clean targets from the cpupower tools' Makefiles use brace expansion to
remove some generated files. However, the default shells on many systems do
not support this feature resulting in some generated files not being removed
by clean.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This fix black screen on resume issue that some people are
experiencing. There is a bug in the atombios code regarding
pll/crtc mapping. The atombios code reverse the logic for
the pll and crtc mapping.
agd5f: drop unnecessary crtc id check, cc stable in case
we miss 3.7.
This fixes the root cause that was worked around by commits:
drm/radeon: allocate PPLLs from low to high
drm/radeon/dce3: switch back to old pll allocation order for discrete
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull media fixes from Mauro Carvalho Chehab:
"For some media fixes:
- dvb_usb_v2: some fixes at the core
- Some fixes on some embedded drivers: soc_camera, adv7604, omap3isp,
exynos/s5p
- Several Exynos4/5 camera fixes
- a fix at stv0900 driver
- a few USB ID additions to detect more variants of rtl28xxu-based
sticks"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits)
[media] rtl28xxu: 0ccd:00d7 TerraTec Cinergy T Stick+
[media] rtl28xxu: 1d19:1102 Dexatek DK mini DVB-T Dongle
[media] mt9v022: fix the V4L2_CID_EXPOSURE control
[media] mx2_camera: fix missing unlock on error in mx2_start_streaming()
[media] media: omap1_camera: fix const cropping related warnings
[media] media: mx1_camera: use the default .set_crop() implementation
[media] media: mx2_camera: fix const cropping related warnings
[media] media: mx3_camera: fix const cropping related warnings
[media] media: pxa_camera: fix const cropping related warnings
[media] media: sh_mobile_ceu_camera: fix const cropping related warnings
[media] media: sh_vou: fix const cropping related warnings
[media] adv7604: restart STDI once if format is not found
[media] adv7604: use presets where possible
[media] adv7604: Replace prim_mode by mode
[media] adv7604: cleanup references
[media] dvb_usb_v2: switch interruptible mutex to normal
[media] dvb_usb_v2: fix pid_filter callback error logging
[media] exynos-gsc: change driver compatible string
[media] omap3isp: Fix warning caused by bad subdev events operations prototypes
[media] omap3isp: video: Fix warning caused by bad vidioc_s_crop prototype
...
Commit eddb079deb created a regression in the writepages codepath.
Previously, whenever it needed to check the size of the file, it did so
by consulting the inode->i_size field directly. With that patch, the
i_size was fetched once on entry into the writepages code and that value
was used henceforth.
If the file is changing size though (for instance, if someone is writing
to it or has truncated it), then that value is likely to be wrong. This
can lead to data corruption. Pages past the EOF at the time that the
writepages call was issued may be silently dropped and ignored because
cifs_writepages wrongly assumes that the file must have been truncated
in the interim.
Fix cifs_writepages to properly fetch the size from the inode->i_size
field instead to properly account for this possibility.
Original bug report is here:
https://bugzilla.kernel.org/show_bug.cgi?id=50991
Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
On some platforms, _TMP and _CRT/_HOT/_PSV/_ACx have dependency.
And there is no way for OS to detect this dependency.
commit 9bcb811896 shows us a problem
that _TMP must be evaluate after _CRT/_HOT/_PSV/_ACx, or else
firmware will shutdown the system.
But the machine in https://bugzilla.kernel.org/show_bug.cgi?id=43284
shows us that _PSV would return valid value only if _TMP has been
evaluated once.
With this patch, all of the control methods will be evaluated once,
in the _CRT/_HOT/_PSV/_CRT/_TMP order, before they are actually used.
[rjw: Added a local variable for the handle and modified the loop
slightly.]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: katabami <katabami@lavabit.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There appear to have been some 486 clones, including the "enhanced"
version of Am486, which have CPUID but not CR4. These 486 clones had
only the FPU flag, if any, unlike the Intel 486s with CPUID, which
also had VME and therefore needed CR4.
Therefore, look at the basic CPUID flags and require at least one bit
other than bit 0 before we modify CR4.
Thanks to Christian Ludloff of sandpile.org for confirming this as a
problem.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.
If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.
This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix cpufreq_gov_ondemand to skip CPU where another governor is used.
The bug present itself as NULL pointer access on the mutex_lock() call,
an can be reproduced on an SMP machine by setting the default governor
to anything other than ondemand, setting a single CPU's governor to
ondemand, then changing the sample rate by writing on:
> /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
Backtrace:
Nov 26 17:36:54 balto kernel: [ 839.585241] BUG: unable to handle kernel NULL pointer dereference at (null)
Nov 26 17:36:54 balto kernel: [ 839.585311] IP: [<ffffffff8174e082>] __mutex_lock_slowpath+0xb2/0x170
[snip]
Nov 26 17:36:54 balto kernel: [ 839.587005] Call Trace:
Nov 26 17:36:54 balto kernel: [ 839.587030] [<ffffffff8174da82>] mutex_lock+0x22/0x40
Nov 26 17:36:54 balto kernel: [ 839.587067] [<ffffffff81610b8f>] store_sampling_rate+0xbf/0x150
Nov 26 17:36:54 balto kernel: [ 839.587110] [<ffffffff81031e9c>] ? __do_page_fault+0x1cc/0x4c0
Nov 26 17:36:54 balto kernel: [ 839.587153] [<ffffffff813309bf>] kobj_attr_store+0xf/0x20
Nov 26 17:36:54 balto kernel: [ 839.587192] [<ffffffff811bb62d>] sysfs_write_file+0xcd/0x140
Nov 26 17:36:54 balto kernel: [ 839.587234] [<ffffffff8114c12c>] vfs_write+0xac/0x180
Nov 26 17:36:54 balto kernel: [ 839.587271] [<ffffffff8114c472>] sys_write+0x52/0xa0
Nov 26 17:36:54 balto kernel: [ 839.587306] [<ffffffff810321ce>] ? do_page_fault+0xe/0x10
Nov 26 17:36:54 balto kernel: [ 839.587345] [<ffffffff81751202>] system_call_fastpath+0x16/0x1b
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
SPEAr is an ARM based family of SoCs. This patch adds in support of cpufreq
driver for SPEAr SoCs. It is supported via DT only and so bindings are present
in binding document.
Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no need to initialize the node before appending it to the list.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The internal.h declares the acpi_create_platform_device(). Without
that include we get a following warning:
drivers/acpi/acpi_platform.c:133:24: warning: symbol 'acpi_create_platform_device' was not declared. Should it be static?
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Drivers usually expect that the devices they are supposed to handle
will be operational when their .probe() routines are called, but that
need not be the case on some ACPI-based systems with ACPI-based
device enumeration where the BIOSes don't put devices into D0 by
default. To work around this problem it is sufficient to change
bus type .probe() routines to ensure that devices will be powered
on before the drivers' .probe() routines run (and their .remove()
and .shutdown() routines accordingly).
Modify platform_drv_probe() to run acpi_dev_pm_attach() for devices
whose ACPI handles are present, so that ACPI power management is used
to change their power states. Analogously, modify
platform_drv_remove() and platform_drv_shutdown() to call
acpi_dev_pm_detach() for those devices, so that they are not subject
to ACPI PM any more.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Daniel writes:
- Unbreak mbp retina, this time with a much more fine-grained approach
(since the previous "completely ignore edp vbt bpp value" regressed some
machines even after fixing a bug in our dp bw code).
- Disable cloning on sdvo. It just doesn't work (yeah took us a while to
figure out), leading to jittery outputs in the best case.
- Revert rc6 for ilk again. It seems to help a few of the gpu hang
reporters at least, and it's definitely the best we've got.
Head-against-the-wall-banging is still ongoing for what really breaks
(and how we can reproduce the non-rc6 hangs and how to reproduce on
gen4).
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
Revert "drm/i915: enable rc6 on ilk again"
drm/i915: do not default to 18 bpp for eDP if missing from VBT
drm/i915: disable cloning on sdvo
Merge misc fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
futex: avoid wake_futex() for a PI futex_q
watchdog: using u64 in get_sample_period()
writeback: put unused inodes to LRU after writeback completion
mm: vmscan: check for fatal signals iff the process was throttled
Revert "mm: remove __GFP_NO_KSWAPD"
proc: check vma->vm_file before dereferencing
UAPI: strip the _UAPI prefix from header guards during header installation
include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
Here is a single fix for a reported regression in 3.7-rc5 for the tty
layer. This fix has been in the linux-next tree and solves the reported
problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlCz9SgACgkQMUfUDdst+ylwJACgrm6WSMdZy+Dcg+4lY+zLftUq
UhQAn1Y00nwte19cNvJLqWYgJlqkcUBw
=NAlw
-----END PGP SIGNATURE-----
Merge tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY fix from Greg Kroah-Hartman:
"Here is a single fix for a reported regression in 3.7-rc5 for the tty
layer. This fix has been in the linux-next tree and solves the
reported problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty vt: Fix a regression in command line edition
We have:
- A twl fix preventing a buffer overflow.
- A wm5102 register patch fix.
- A wm5110 error misreport fix.
- Arizona fixes: Use the right array size when adding subdevices, correctly
report underclocked events, synchronize register cache after reset.
- A twl4030 fix for preventing the system to hang from an interrupt flood.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQs33KAAoJEIqAPN1PVmxKlI8P/2wz99BLHEGG4uEZyRprOnA1
B1I0TYmrHidz9XqX9rB/ytWnyVg3ZumpwJOWETGcpGE842/jw9yKtYcra9R2JZ7H
IdobU2dQxlB+7J5Yoj7SuK9gY4eAN3/krKb7F0x8YKYjHleAwWwjzBVHXsK4I0oJ
wPuN5NnmEs7wqT9y74WgLwZCZmbvjDYBFNW3ZrwD28KJAHKak1FE2RxtiTuwUv4o
21j4s9XKlyI0mNgNPXQ0v5P6KZdceGBwXrjDa1+1onEuOsdXQF0ZHXfIfU24j/TB
rRzcWo8sycAf35HvPdfHOd6vwdmIEx36i9nuJLVUJEAxkkEGKar0ZfIhDXuvGFCm
bVS05+ZdZTbEWgiRRivpOuiL9P5KFElG5N/QhVhpNRkcOCKo3eGn/NDjIuv7hl/p
Y+nJFrqSmdzvdN2MUCh+dXoe4LTb3jdB1qRpNr3VMlyL0I/8w7YW9ypB2bitN+IE
0rZ4oK/BDHHRvQLIr4+frNbhG4OwH7XZmmtdiB10GMLw12akOkO2L0Zm2TTG/68g
asVECywpqEGLBvyn96n0PYO489Obk72VU9oUz1KftIdRTxd9xj+Eo0v9W4lIMb47
rscgjCrWTZs/kar52Em3Wc3q6eHfTG/h8uqN/R/ONwQzPwBXonHk1UxJI5+a1VXK
tvHiz+RvjfH0UW58Bc5w
=tHAv
-----END PGP SIGNATURE-----
Merge tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFD fixes from Samuel Ortiz:
- A twl fix preventing a buffer overflow.
- A wm5102 register patch fix.
- A wm5110 error misreport fix.
- Arizona fixes: Use the right array size when adding subdevices,
correctly report underclocked events, synchronize register cache
after reset.
- A twl4030 fix for preventing the system to hang from an interrupt
flood.
* tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: twl4030: Fix chained irq handling on resume from suspend
mfd: arizona: Sync regcache after reset
mfd: arizona: Correctly report when AIF2/AIF1 is underclocked
mfd: arizona: Use correct array for ARRAY_SIZE in mfd_add_devices call
mfd: wm5110: Disable control interface error report for WM5110 rev B
mfd: wm5102: Update register patch for latest evaluation
mfd: twl-core: Fix chip ID for the twl6030-pwm module
Pull ARM fixes from Russell King:
"Not much here, just a couple minor/cosmetic fixes and a patch for the
decompressor which fixes problems with modern GCC and CPUs."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above
ARM: 7572/1: proc-v6.S: fix comment
ARM: 7570/1: quiet down the non make -s output
Pull ext3 regression fix from Jan Kara:
"Fix an ext3 regression introduced during 3.7 merge window. It leads
to deadlock if you stress the filesystem in the right way (luckily
only if blocksize < pagesize)."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
jbd: Fix lock ordering bug in journal_unmap_buffer()
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.
[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In get_sample_period(), unsigned long is not enough:
watchdog_thresh * 2 * (NSEC_PER_SEC / 5)
case1:
watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800
case2:
set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000
In case2, we need use u64 to express the sample period. Otherwise,
changing the threshold thru proc often can not be successful.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 169ebd9013 ("writeback: Avoid iput() from flusher thread")
removed iget-iput pair from inode writeback. As a side effect, inodes
that are dirty during iput_final() call won't be ever added to inode LRU
(iput_final() doesn't add dirty inodes to LRU and later when the inode
is cleaned there's noone to add the inode there). Thus inodes are
effectively unreclaimable until someone looks them up again.
The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned. But
still the bug can have nasty consequences leading up to OOM conditions
under certain circumstances. Following can easily reproduce the
problem:
for (( i = 0; i < 1000; i++ )); do
mkdir $i
for (( j = 0; j < 1000; j++ )); do
touch $i/$j
echo 2 > /proc/sys/vm/drop_caches
done
done
then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
We fix the issue by inserting unused clean inodes into the LRU after
writeback finishes in inode_sync_complete().
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5515061d22 ("mm: throttle direct reclaimers if PF_MEMALLOC
reserves are low and swap is backed by network storage") introduced a
check for fatal signals after a process gets throttled for network
storage. The intention was that if a process was throttled and got
killed that it should not trigger the OOM killer. As pointed out by
Minchan Kim and David Rientjes, this check is in the wrong place and too
broad. If a system is in am OOM situation and a process is exiting, it
can loop in __alloc_pages_slowpath() and calling direct reclaim in a
loop. As the fatal signal is pending it returns 1 as if it is making
forward progress and can effectively deadlock.
This patch moves the fatal_signal_pending() check after throttling to
throttle_direct_reclaim() where it belongs. If the process is killed
while throttled, it will return immediately without direct reclaim
except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
reserves.
Minchan pointed out that it may be better to direct reclaim before
returning to avoid using the reserves because there may be pages that
can easily reclaim that would avoid using the reserves. However, we do
no such targetted reclaim and there is no guarantee that suitable pages
are available. As it is expected that this throttling happens when
swap-over-NFS is used there is a possibility that the process will
instead swap which may allocate network buffers from the PFMEMALLOC
reserves. Hence, in the swap-over-nfs case where a process can be
throtted and be killed it can use the reserves to exit or it can
potentially use reserves to swap a few pages and then exit. This patch
takes the option of using the reserves if necessary to allow the process
exit quickly.
If this patch passes review it should be considered a -stable candidate
for 3.6.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
Call Trace:
preempt_schedule+0x42/0x60
_raw_spin_unlock+0x55/0x60
put_super+0x31/0x40
drop_super+0x22/0x30
prune_super+0x149/0x1b0
shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time. This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.
The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7b540d0646 ("proc_map_files_readdir(): don't bother with
grabbing files") switched proc_map_files_readdir() to use @f_mode
directly instead of grabbing @file reference, but same time the test for
@vm_file presence was lost leading to nil dereference. The patch brings
the test back.
The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped
(which is set to 'n' by default) so the bug doesn't affect regular
kernels.
The regression is 3.7-rc1 only as far as I can tell.
[gorcunov@openvz.org: provided changelog]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Strip the _UAPI prefix from header guards during header installation so
that any userspace dependencies aren't affected. glibc, for example,
checks for linux/types.h, linux/kernel.h, linux/compiler.h and
linux/list.h by their guards - though the last two aren't actually
exported.
libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -Wall -Werror -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fno-delete-null-pointer-checks -fstack-protector -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c child.c -fPIC -DPIC -o .libs/child.o
In file included from cli.c:20:0:
common.h:152:8: error: redefinition of 'struct sysinfo'
In file included from /usr/include/linux/kernel.h:4:0,
from /usr/include/linux/sysctl.h:25,
from /usr/include/sys/sysctl.h:43,
from common.h:50,
from cli.c:20:
/usr/include/linux/sysinfo.h:7:8: note: originally defined here
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit baf05aa927 ("bug: introduce BUILD_BUG_ON_INVALID() macro")
introduces this macro only when _CHECKER_ is not defined. Define a
silent macro in the else condition to fix following sparse warning:
mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: not a function <noident>
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the raid1 or raid10 unplug function gets called
from a make_request function (which is very possible) when
there are bios on the current->bio_list list, then it will not
be able to successfully call bitmap_unplug() and it could
need to submit more bios and wait for them to complete.
But they won't complete while current->bio_list is non-empty.
So detect that case and handle the unplugging off to another thread
just like we already do when called from within the scheduler.
RAID1 version of bug was introduced in 3.6, so that part of fix is
suitable for 3.6.y. RAID10 part won't apply.
Cc: stable@vger.kernel.org
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de>
Signed-off-by: NeilBrown <neilb@suse.de>
In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output
constraints, and *then* use "a" and "d" as input constraints. This is
incorrect, but happens to work on some versions of gcc.
However, it breaks gcc with -O0 and icc, and may break on future
versions of gcc.
Reported-and-tested-by: Melanie Blower <melanie.blower@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.com
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
A comment in entry.S incorrectly stated that interrupt vectors
called __do_IRQ() and that int6 vector was used for syscalls.
Both statements are incorrect for the current kernel, so this
patch cleans up the wording to reflect current reality.
Signed-off-by: Mark Salter <msalter@redhat.com>
Name of pimreg devices are built from following format :
char name[IFNAMSIZ]; // IFNAMSIZ == 16
sprintf(name, "pimreg%u", mrt->id);
We must therefore limit mrt->id to 9 decimal digits
or risk a buffer overflow and a crash.
Restrict table identifiers in [0 ... 999999999] interval.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_getpeer_v4() can return NULL under OOM conditions, and while
inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
crash.
This code path now uses the same idiom as the others from:
1d861aa4b3 ("inet: Minimize use of
cached route inetpeer.").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The skb->tstamp is set to the hardware timestamp when available in the USB
urb message. This leads to user visible timestamps which contain the 'uptime'
of the USB adapter - and not the usual system generated timestamp.
Fix this wrong assignment by applying the available hardware timestamp to the
skb_shared_hwtstamps data structure - which is intended for this purpose.
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When one input buffer has multiple frames, it should be fed
again to the hardware with the remaining bytes. Removed the
check for P frame in this scenario as this condition can come with
all frame types.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: ARUN MANKUZHI <arun.m@samsung.com>
Acked-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Modified the function s5p_mfc_get_dec_y_adr_v6 to access the
decode Y address register instead of display Y address.
Signed-off-by: Sunil Mazhavanchery <sunilm@samsung.com>
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Acked-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
vfl_dir should be set to VFL_DIR_M2M so valid ioctls for this
mem-to-mem device can be properly determined in the v4l2 core.
Signed-off-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Macros used to set input and output RGB type aren't correct.
Updating the macros as per register manual.
Signed-off-by: Shaik Ameer Basha <shaik.ameer@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use uninterruptible mutex_lock in the release() file op to make
sure all resources are properly freed when a process is being
terminated. Returning -ERESTARTSYS has no effect for a terminating
process and this may cause driver resources not to be released.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use uninterruptible mutex_lock in the release() file op to make
sure all resources are properly freed when a process is being
terminated. Returning -ERESTARTSYS has no effect for a terminating
process and this may cause driver resources not to be released.
This patch is required for stable kernels v3.5+.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use uninterruptible mutex_lock in the release() file op to make
sure all resources are properly freed when a process is being
terminated. Returning -ERESTARTSYS has no effect for a terminating
process and this caused driver resources not to be released. Not
releasing the buffer queue also prevented other drivers to free
memory, e.g. in MMAP -> USERPTR scenario.
This patch is required for stable kernels v3.6+.
Reported-by: Kamil Debski <k.debski@samsung.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>