Commit graph

240 commits

Author SHA1 Message Date
Rafael J. Wysocki 1c3d85dd4e cpufreq: Revert incorrect commit 5800043
Commit 5800043 (cpufreq: convert cpufreq_driver to using RCU) causes
the following call trace to be spit on boot:

 BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179
 in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd
 2 locks held by systemd-udevd/292:
  #0:  (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0
  #1:  (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0
 Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323
 Call Trace:
  [<ffffffff81072c90>] __might_sleep+0x140/0x1f0
  [<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0
  [<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130
  [<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110
  [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
  [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
  [<ffffffff811e647d>] sysfs_add_file+0xd/0x10
  [<ffffffff811e6541>] sysfs_create_file+0x21/0x30
  [<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0
  [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
  [<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor]
  [<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq]
  [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
  [<ffffffff8106c97e>] ? up_read+0x1e/0x40
  [<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0
  [<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0
  [<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0
  [<ffffffff81468569>] subsys_interface_register+0x99/0xe0
  [<ffffffffa014d000>] ? 0xffffffffa014cfff
  [<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200
  [<ffffffffa014d000>] ? 0xffffffffa014cfff
  [<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq]
  [<ffffffff810002fa>] do_one_initcall+0x11a/0x170
  [<ffffffff810b4b87>] load_module+0x1cf7/0x2920
  [<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0
  [<ffffffff816baee0>] ? retint_restore_args+0xe/0xe
  [<ffffffff810b5887>] sys_init_module+0xd7/0x120
  [<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b

which is quite obvious, because that commit put (multiple instances
of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(),
although sysfs_create_file() may cause memory to be allocated with
GFP_KERNEL and that may sleep, which is not permitted in RCU read
critical section.

Revert the buggy commit altogether along with some changes on top
of it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-29 00:08:16 +02:00
Viresh Kumar 820c6ca293 cpufreq: Don't call __cpufreq_governor() for drivers without target()
Some cpufreq drivers implement their own governor and so don't need
us to call generic governors interface via __cpufreq_governor(). Few
recent commits haven't obeyed this law well and we saw some
regressions.

This patch is an attempt to fix the above issue.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-22 00:48:03 +02:00
Viresh Kumar e4969ebac8 cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
__cpufreq_governor() must be called with a correct policy->cpus mask.
In __cpufreq_remove_dev() we initially clear policy->cpus with
cpumask_clear_cpu() and then call
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). If the governor
is doing some per-cpu stuff in EXIT callback, this can create
uncertain behavior.

Generic governors in drivers/cpufreq/ doesn't do any per-cpu stuff
in EXIT callback and so we don't face any issues currently. But its
better to keep the code clean, so we don't face any issues in future.

Now, we call cpumask_clear_cpu() only when multiple cpus are managed
by policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-11 22:50:09 +02:00
Nathan Zimmer 5800043b24 cpufreq: convert cpufreq_driver to using RCU
We eventually would like to remove the rwlock cpufreq_driver_lock or
convert it back to a spinlock and protect the read sections with RCU.
The first step in that direction is to make cpufreq_driver use RCU.
I don't see an easy wasy to protect the cpufreq_cpu_data structure
with RCU, so I am leaving it with the rwlock for now since under
certain configs __cpufreq_cpu_get is a hot spot with 256+ cores.

[rjw: Subject, changelog, white space]
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-10 13:19:26 +02:00
Viresh Kumar b43a7ffbf3 cpufreq: Notify all policy->cpus in cpufreq_notify_transition()
policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.

Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.

This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02 15:24:00 +02:00
Viresh Kumar 4d5dcc4211 cpufreq: governor: Implement per policy instances of governors
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and
implements init/exit routines for ondemand and conservative
governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Viresh Kumar 7bd353a995 cpufreq: Add per policy governor-init/exit infrastructure
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because
we are required to allocate governor's resources dynamically now, we
must do it at policy creation and end. And so got
CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Nathan Zimmer 0d1857a1b9 cpufreq: Convert the cpufreq_driver_lock to a rwlock
This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by the rcu.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Rafael J. Wysocki 4419fbd4b4 Merge branch 'pm-cpufreq'
* pm-cpufreq: (55 commits)
  cpufreq / intel_pstate: Fix 32 bit build
  cpufreq: conservative: Fix typos in comments
  cpufreq: ondemand: Fix typos in comments
  cpufreq: exynos: simplify .init() for setting policy->cpus
  cpufreq: kirkwood: Add a cpufreq driver for Marvell Kirkwood SoCs
  cpufreq/x86: Add P-state driver for sandy bridge.
  cpufreq_stats: do not remove sysfs files if frequency table is not present
  cpufreq: Do not track governor name for scaling drivers with internal governors.
  cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()
  cpufreq: Retrieve current frequency from scaling drivers with internal governors
  cpufreq: Fix locking issues
  cpufreq: Create a macro for unlock_policy_rwsem{read,write}
  cpufreq: Remove unused HOTPLUG_CPU code
  cpufreq: governors: Fix WARN_ON() for multi-policy platforms
  cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold
  cpufreq / stats: Get rid of CPUFREQ_STATDEVICE_ATTR
  cpufreq: Don't check cpu_online(policy->cpu)
  cpufreq: add imx6q-cpufreq driver
  cpufreq: Don't remove sysfs link for policy->cpu
  cpufreq: Remove unnecessary use of policy->shared_type
  ...
2013-02-15 13:59:07 +01:00
Dirk Brandewie fa69e33f7d cpufreq: Do not track governor name for scaling drivers with internal governors.
Scaling drivers that implement internal governors do not have governor
structures assocaited with them.  Only track the name of the governor
associated with the CPU if the driver does not implement
cpufreq_driver.setpolicy()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:53 +01:00
Dirk Brandewie f6b0515b07 cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()
Scaling drivers that implement cpufreq_driver.setpolicy() have
internal governors that do not signal changes via
cpufreq_notify_transition() so the frequncy in the policy will almost
certainly be different than the current frequncy.  Only call
cpufreq_out_of_sync() when the underlying driver implements
cpufreq_driver.target()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:47 +01:00
Dirk Brandewie 9e21ba8bd8 cpufreq: Retrieve current frequency from scaling drivers with internal governors
Scaling drivers that implement the cpufreq_driver.setpolicy() versus
the cpufreq_driver.target() interface do not set policy->cur.

Normally policy->cur is set during the call to cpufreq_driver.target()
when the frequnecy request is made by the governor.

If the scaling driver implements cpufreq_driver.setpolicy() and
cpufreq_driver.get() interfaces use cpufreq_driver.get() to retrieve
the current frequency.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:03 +01:00
Viresh Kumar 2eaa3e2df1 cpufreq: Fix locking issues
cpufreq core uses two locks:
- cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array.
- cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure
  all cpufreq/hotplug/workqueue/etc related lock issues.

These locks were not used properly and are placed against their principle
(present before their definition) at various places. This patch is an attempt to
fix their use.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:22:57 +01:00
Viresh Kumar fa1d8af47f cpufreq: Create a macro for unlock_policy_rwsem{read,write}
On the lines of macro: lock_policy_rwsem, we can create another macro for
unlock_policy_rwsem. Lets do it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:22:06 +01:00
Viresh Kumar 65922465b5 cpufreq: Remove unused HOTPLUG_CPU code
Because the sibling cpu of any online cpu is identified very early in
cpufreq_add_dev(), below code is never executed. And so can be removed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:21:37 +01:00
Viresh Kumar 8e53695f7f cpufreq: governors: Fix WARN_ON() for multi-policy platforms
On multi-policy systems there is a single instance of governor for both the
policies (if same governor is chosen for both policies). With the code update
from following patches:

8eeed09 cpufreq: governors: Get rid of dbs_data->enable field
b394058 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()

We are creating/removing sysfs directory of governor for for every call to
GOV_START and STOP. This would fail for multi-policy system as there is a
per-policy call to START/STOP.

This patch reuses the governor->initialized variable to detect total users of
governor.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:21:13 +01:00
Viresh Kumar 3361b7b173 cpufreq: Don't check cpu_online(policy->cpu)
policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:18:34 +01:00
Viresh Kumar 73bf0fc2b0 cpufreq: Don't remove sysfs link for policy->cpu
"cpufreq" directory in policy->cpu is never created using
sysfs_create_link(), but using kobject_init_and_add(). And so we
shouldn't call sysfs_remove_link() for policy->cpu().  sysfs stuff
for policy->cpu is automatically removed when we call kobject_put()
for dying policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-05 22:21:14 +01:00
Viresh Kumar b394058f06 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()
Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event
we reset few tunables of governor. Which isn't correct, as this routine is
called for every cpu hot-[un]plugging event. We should actually be resetting
these only when the governor module is removed and re-installed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 01:29:31 +01:00
Viresh Kumar fcf8058296 cpufreq: Simplify cpufreq_add_dev()
Currently cpufreq_add_dev() firsts allocates policy, calls
driver->init() and then checks if this CPU is already managed or not.
And if it is already managed, its policy is freed.

We can save all this if we somehow know that CPU is managed or not in
advance.  policy->related_cpus contains the list of all valid sibling
CPUs of policy->cpu. We can check this to see if the current CPU is
already managed.

From now on, platforms don't really need to set related_cpus from
their init() routines, as the same work is done by core too.

If a platform driver needs to set the related_cpus mask with some
additional CPUs, other than CPUs present in policy->cpus, they are
free to do it, though, as we don't override anything.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:16 +01:00
Viresh Kumar b26f72042e cpufreq: Revert "cpufreq: Don't use cpu removed during cpufreq_driver_unregister"
This reverts commit 956f339 "cpufreq: Don't use cpu removed during
cpufreq_driver_unregister".

With the addition of the following commit, this change/variable is not
required any more:

commit b9ba2725343ae57add3f324dfa5074167f48de96
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Mon Jan 14 13:23:03 2013 +0000

    cpufreq: Simplify __cpufreq_remove_dev()

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:16 +01:00
Borislav Petkov 9d95046e5d cpufreq: Add a get_current_driver helper
Add a helper function to return cpufreq_driver->name.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:15 +01:00
Dirk Brandewie d5aaffa9dd cpufreq: handle cpufreq being disabled for all exported function.
When disable_cpufreq() is called some exported functions are still
being used that do not have a check for cpufreq being disabled.

Add a disabled check into cpufreq_cpu_get() to return NULL if
cpufreq is disabled this covers most of the exported functions. For
the exported functions that do not call cpufreq_cpu_get() add an
explicit check.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar b8eed8af94 cpufreq: Simplify __cpufreq_remove_dev()
__cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver
unregister and cpu removals.

Current implementation of this routine is overly complex without much need. If
the cpu to be removed is the policy->cpu, we remove the policy first and add all
other cpus again from policy->cpus and then finally call __cpufreq_remove_dev()
again to remove the cpu to be deleted. Haahhhh..

There exist a simple solution to removal of a cpu:
- Simply use the old policy structure
- update its fields like: policy->cpu, etc.
- notify any users of cpufreq, which depend on changing policy->cpu

Hence this patch, which tries to implement the above theory. It is tested well
by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both
A15's share same struct policy and all A7's share same policy structure.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar 6954ca9c8b cpufreq: Don't use cpu removed during cpufreq_driver_unregister
This is how the core works:
cpufreq_driver_unregister()
 - subsys_interface_unregister()
   - for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we
     unregister.

cpufreq_remove_dev():
 - Remove policy node
 - Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu.
   i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2,
   we call it for cpu 3.
   - cpufreq_add_dev() would call cpufreq_driver->init()
   - init would return mask as AND of 2, 3 and 4 for cluster A7.
   - cpufreq core would do online_cpu && policy->cpus
     Here is the BUG(). Because cpu hasn't died but we have just unregistered
     the cpufreq driver, online cpu would still have cpu 2 in it. And so thing
     go bad again.

Solution: Keep cpumask of cpus that are registered with cpufreq core and clear
	  cpus when we get a call from subsys_interface_unregister() via
	  cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar f6a7409cab cpufreq: Notify governors when cpus are hot-[un]plugged
Because cpufreq core and governors worry only about the online cpus, if a cpu is
hot [un]plugged, we must notify governors about it, otherwise be ready to expect
something unexpected.

We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we
just need to call them now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar 643ae6e81d cpufreq: Manage only online cpus
cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.

We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Paul Gortmaker 43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Jingoo Han f55c9c2627 cpufreq: Remove unnecessary initialization of a local variable
Remove an unnecessary initializer for the 'ret' variable in
__cpufreq_set_policy().

[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:09 +01:00
Viresh Kumar 7249924e53 cpufreq: Make sure target freq is within limits
__cpufreq_driver_target() must not pass target frequency beyond the
limits of current policy.

Today most of cpufreq platform drivers are doing this check in their
target routines. Why not move it to __cpufreq_driver_target()?

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:09 +01:00
Viresh Kumar 5a1c022850 cpufreq: Avoid calling cpufreq driver's target() routine if target_freq == policy->cur
Avoid calling cpufreq driver's target() routine if new frequency is same as
policies current frequency.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:09 +01:00
Viresh Kumar da58445570 cpufreq: Fix sparse warning by making local function static
cpufreq_disabled() is a local function, so should be marked static.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:08 +01:00
Viresh Kumar 0676f7f2e7 cpufreq: return early from __cpufreq_driver_getavg()
There is no need to do cpufreq_get_cpu() and cpufreq_put_cpu() for drivers that
don't support getavg() routine.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:07 +01:00
Viresh Kumar db7011516c cpufreq: Improve debug prints
With debug options on, it is difficult to locate cpufreq core's debug prints.
Fix this by prefixing debug prints with KBUILD_MODNAME.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:06 +01:00
viresh kumar 4b972f0b04 cpufreq / core: Fix printing of governor and driver name
Arrays for governer and driver name are of size CPUFREQ_NAME_LEN or 16.
i.e. 15 bytes for name and 1 for trailing '\0'.

When cpufreq driver print these names (for sysfs), it includes '\n' or ' ' in
the fmt string and still passes length as CPUFREQ_NAME_LEN. If the driver or
governor names are using all 15 fields allocated to them, then the trailing '\n'
or ' ' will never be printed. And so commands like:

root@linaro-developer# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

will print something like:

cpufreq_foodrvroot@linaro-developer#

Fix this by increasing print length by one character.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:06 +01:00
viresh kumar 8bf1ac7236 cpufreq / core: Fix typo in comment describing show_bios_limit()
show_bios_limit is mistakenly written as show_scaling_driver in a comment
describing purpose of show_bios_limit() routine.

Fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:33:05 +01:00
Stephen Boyd a914443627 cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 #37 Tainted: G        W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0:  (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
 #1:  (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).

Fix this deadlock by introducing two new functions:

	struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
	void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)

which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.

To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-20 21:39:25 +02:00
Konrad Rzeszutek Wilk a7b422cda5 provide disable_cpufreq() function to disable the API.
useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2012-03-14 14:45:03 -04:00
Linus Torvalds 02d929502c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (23 commits)
  [CPUFREQ] EXYNOS: Removed useless headers and codes
  [CPUFREQ] EXYNOS: Make EXYNOS common cpufreq driver
  [CPUFREQ] powernow-k8: Update copyright, maintainer and documentation information
  [CPUFREQ] powernow-k8: Fix indexing issue
  [CPUFREQ] powernow-k8: Avoid Pstate MSR accesses on systems supporting CPB
  [CPUFREQ] update lpj only if frequency has changed
  [CPUFREQ] cpufreq:userspace: fix cpu_cur_freq updation
  [CPUFREQ] Remove wall variable from cpufreq_gov_dbs_init()
  [CPUFREQ] EXYNOS4210: cpufreq code is changed for stable working
  [CPUFREQ] EXYNOS4210: Update frequency table for cpu divider
  [CPUFREQ] EXYNOS4210: Remove code about bus on cpufreq
  [CPUFREQ] s3c64xx: Use pr_fmt() for consistent log messages
  cpufreq: OMAP: fixup for omap_device changes, include <linux/module.h>
  cpufreq: OMAP: fix freq_table leak
  cpufreq: OMAP: put clk if cpu_init failed
  cpufreq: OMAP: only supports OPP library
  cpufreq: OMAP: dont support !freq_table
  cpufreq: OMAP: deny initialization if no mpudev
  cpufreq: OMAP: move clk name decision to init
  cpufreq: OMAP: notify even with bad boot frequency
  ...
2012-01-11 18:53:33 -08:00
Afzal Mohammed d08de0c19c [CPUFREQ] update lpj only if frequency has changed
During scaling up of cpu frequency, loops_per_jiffy
is updated upon invoking PRECHANGE notifier.
If setting to new frequency fails in cpufreq driver,
lpj is left at incorrect value.

Hence update lpj only if cpu frequency is changed,
i.e. upon invoking POSTCHANGE notifier.

Penalty would be that during time period between
changing cpu frequency & invocation of POSTCHANGE
notifier, udelay(x) may not gurantee minimal delay
of 'x' us for frequency scaling up operation.

Perhaps a better solution would be to define
CPUFREQ_ABORTCHANGE & handle accordingly, but then
it would be more intrusive (using ABORTCHANGE may
help drivers also; if any has registered notifier
and expect POST for a PRECHANGE, their needs can
be taken care using ABORT)

Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2012-01-06 10:10:53 -05:00
Kay Sievers 8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Jesse Barnes 3d73710880 cpufreq: expose a cpufreq_quick_get_max routine
This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-28 13:54:26 -07:00
Kees Cook 1a8e1463a4 [CPUFREQ] remove redundant sprintf from request_module call.
Since format string handling is part of request_module, there is no
need to construct the module name. As such, drop the redundant sprintf
and heap usage.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-05-04 11:50:58 -04:00
Dominik Brodowski 2d06d8c49a [CPUFREQ] use dynamic debug instead of custom infrastructure
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enabled debugging during runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

	ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-05-04 11:50:57 -04:00
Jacob Shin 27ecddc2a9 [CPUFREQ] CPU hotplug, re-create sysfs directory and symlinks
When we discover CPUs that are affected by each other's
frequency/voltage transitions, the first CPU gets a sysfs directory
created, and rest of the siblings get symlinks. Currently, when we
hotplug off only the first CPU, all of the symlinks and the sysfs
directory gets removed. Even though rest of the siblings are still
online and functional, they are orphaned, and no longer governed by
cpufreq.

This patch, given the above scenario, creates a sysfs directory for
the first sibling and symlinks for the rest of the siblings.

Please note the recursive call, it was rather too ugly to roll it
out. And the removal of redundant NULL setting (it is already taken
care of near the top of the function).

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
2011-05-04 11:50:57 -04:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Rafael J. Wysocki e00e56dfd3 cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU.  In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.

To remove the confusion and to prepare for elimiating sysdev
suspend and resume operations from the kernel enirely, convernt
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose.  In addition, put some explanatory remarks
into their kerneldoc comments.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-23 22:16:32 +01:00
Rafael J. Wysocki 7ca64e2d28 [CPUFREQ] Remove the pm_message_t argument from driver suspend
None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-03-16 17:54:33 -04:00
Jiri Slaby 8f5bc2abfd [CPUFREQ] fix BUG on cpufreq policy init failure
cpufreq_register_driver sets cpufreq_driver to a structure owned (and
placed) in the caller's memory. If cpufreq policy fails in its ->init
function, sysdev_driver_register returns nonzero in
cpufreq_register_driver. Now, cpufreq_register_driver returns an error
without setting cpufreq_driver back to NULL.

Usually cpufreq policy modules are unloaded because they propagate the
error to the module init function and return that.

So a later access to any member of cpufreq_driver causes bugs like:
BUG: unable to handle kernel paging request at ffffffffa00270a0
IP: [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
PGD 1805067 PUD 1809063 PMD 1c3f90067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/tun0/statistics/collisions
CPU 0
Modules linked in: ...
Pid: 5677, comm: thunderbird-bin Tainted: G        W   2.6.38-rc4-mm1_64+ #1389 To be filled by O.E.M./To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8145eca3>]  [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
RSP: 0018:ffff8801aec37d98  EFLAGS: 00010086
RAX: 0000000000000202 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffffa00270a0 RSI: 0000000000001000 RDI: ffffffff8199ece8
...
Call Trace:
 [<ffffffff8145f490>] cpufreq_quick_get+0x10/0x30
 [<ffffffff8103f12b>] show_cpuinfo+0x2ab/0x300
 [<ffffffff81136292>] seq_read+0xf2/0x3f0
 [<ffffffff8126c5d3>] ? __strncpy_from_user+0x33/0x60
 [<ffffffff8116850d>] proc_reg_read+0x6d/0xa0
 [<ffffffff81116e53>] vfs_read+0xc3/0x180
 [<ffffffff81116f5c>] sys_read+0x4c/0x90
 [<ffffffff81030dbb>] system_call_fastpath+0x16/0x1b
...

It's all cause by weird fail path handling in cpufreq_register_driver.
To fix that, shuffle the code to do proper handling with gotos.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-03-01 18:49:44 -05:00
Thomas Renninger 25e41933b5 perf: Clean up power events by introducing new, more generic ones
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Julia Lawall bec037aa6c [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation
Indent the body of for_each_cpu.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@

(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

if (p1[0].column == p2[0].column):
  cocci.print_main("branch",p1)
  cocci.print_secs("after",p2)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-10-22 11:44:47 -04:00
Neal Buckendahl 9c36f746d7 [CPUFREQ] fix brace coding style issue.
This patch fixes up a brace warning found by the checkpatch.pl tool

Signed-off-by: Neal Buckendahl <nealb001@tbcnet.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Thomas Renninger 6f4f2723d0 [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Amerigo Wang 226528c610 [CPUFREQ] unexport (un)lock_policy_rwsem* functions
lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled
to be unexported when 2.6.33. Now there are no other callers of them
out of cpufreq.c, unexport them and make them static.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:04 -04:00
Xiaotian Feng cad70a6ae5 [CPUFREQ] fix memory leak in cpufreq_add_dev
We didn't free policy->related_cpus in error path err_unlock_policy.
This is catched by following kmemleak report:

unreferenced object 0xffff88022a0b96d0 (size 512):
  comm "modprobe", pid 886, jiffies 4294689177 (age 780.694s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8111ebe5>] create_object+0x186/0x281
    [<ffffffff814fad4f>] kmemleak_alloc+0x60/0xa7
    [<ffffffff8111127a>] kmem_cache_alloc_node_notrace+0x120/0x142
    [<ffffffff81262e4f>] alloc_cpumask_var_node+0x2c/0xd7
    [<ffffffff81262f0b>] alloc_cpumask_var+0x11/0x13
    [<ffffffff81262f1c>] zalloc_cpumask_var+0xf/0x11
    [<ffffffff8140fac0>] cpufreq_add_dev+0x11f/0x547
    [<ffffffff81334bda>] sysdev_driver_register+0xc2/0x11d
    [<ffffffff8140e334>] cpufreq_register_driver+0xcb/0x1b8
    [<ffffffffa032e040>] 0xffffffffa032e040
    [<ffffffff810021ba>] do_one_initcall+0x5e/0x15c
    [<ffffffff81087f94>] sys_init_module+0xa6/0x1e6
    [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Andrej Gelenberg ffe6275f90 [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"
395913d0b1 ("[CPUFREQ] remove rwsem lock
from CPUFREQ_GOV_STOP call (second call site)") is not needed, because
there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative
anymore.  Lock should not be released until the work done.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=1594

Signed-off-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:01 -04:00
H. Peter Anvin d7be0ce6af Merge commit 'v2.6.34-rc6' into x86/cpu 2010-05-08 14:59:58 -07:00
Borislav Petkov 6dad2a2964 cpufreq: Unify sysfs attribute definition macros
Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:07:56 -07:00
Amerigo Wang 499bca9b6d [CPUFREQ] fix a lockdep warning
There is no need to do sysfs_remove_link() or kobject_put() etc.
when policy_rwsem_write is held, move them after releasing the lock.

This fixes the lockdep warning:

halt/4071 is trying to acquire lock:
 (s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0

but task is already holding lock:
 (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-03-31 12:00:21 -04:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Linus Torvalds d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Thomas Renninger e2f74f355e [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.

Why is this needed:

Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
  - any userspace prog writing to scaling_max_freq
  - thermal limitations
  - hardware (_PPC in ACPI case) limitiations

Therefore export bios_limit (in kHz) to:
  - Point the user that it's the BIOS (broken or intended) which limits
    frequency
  - Export it as a sysfs interface for userspace progs.
    While this was a rarely used feature on laptops, there will appear
    more and more server implemenations providing "Green IT" features like
    allowing the service processor to limit the frequency. People want
    to know about HW/BIOS frequency limitations.

All ACPI P-state driven cpufreq drivers are covered with this patch:
  - powernow-k8
  - powernow-k7
  - acpi-cpufreq

Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations

# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation

CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:34 -05:00
Alex Chiang cf3289d0e7 [CPUFREQ] make internal cpufreq_add_dev_* static
No need to export these symbols; make them static.

	cpufreq_add_dev_policy
	cpufreq_add_dev_symlink
	cpufreq_add_dev_interface

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:34 -05:00
Prarit Bhargava 90e41bac10 [CPUFREQ] Fix stale cpufreq_cpu_governor pointer
Dave,

Attached is an update of my patch against the cpufreq fixes branch.

Before applying the patch I compiled and booted the tree to see if the panic
was still there -- to my surprise it was not.  This is because of the conversion
of cpufreq_cpu_governor to a char[].

While the panic is kaput, the problem of stale data continues and my patch is
still valid.  It is possible to end up with the wrong governor after hotplug
events because CPUFREQ_DEFAULT_GOVERNOR is statically linked to a default,
while the cpu siblings may have had a different governor assigned by a user.

ie) the patch is still needed in order to keep the governors assigned
properly when hotplugging devices

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:04 -05:00
Dmitry Monakhov e77b89f13a [CPUFREQ] Fix use after free on governor restore
Currently on governer backup/restore path we storing governor's pointer.
This is wrong because one may unload governor's module after cpu goes
offline. As result use-after-free will take place on restored cpu.
It is not easy to exploit this bug, but still we have to close this
issue ASAP. Issue was introduced by following commit
084f349394

##TESTCASE##
#!/bin/sh -x
modprobe acpi_cpufreq
# Any non default governor, in may case it is "ondemand"
modprobe cpufreq_ondemand
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
rmmod acpi_cpufreq
rmmod cpufreq_ondemand
modprobe acpi_cpufreq  # << use-after-free here.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:04 -05:00
Tejun Heo f16250669d percpu: make percpu symbols in cpufreq unique
This patch updates percpu related symbols in cpufreq such that percpu
symbols are unique and don't clash with local symbols.  This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.

* drivers/cpufreq/cpufreq.c: s/policy_cpu/cpufreq_policy_cpu/
* drivers/cpufreq/freq_table.c: s/show_table/cpufreq_show_table/
* arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: s/drv_data/acfreq_data/
  					      s/old_perf/acfreq_old_perf/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-10-29 22:34:13 +09:00
Mathieu Desnoyers 395913d0b1 [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)

commit	42a06f2166

Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.

Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.

However, the rwlock write-lock is not needed upon governor stop.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: trenn@suse.de
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:18 -04:00
Thomas Renninger 8aa84ad8d6 [CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq
Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.

Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Thomas Renninger 4bfa042cd3 [CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online

on a managed CPU will result in:
Jul 22 15:15:37 linux kernel: [   80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6()
Jul 22 15:15:37 linux kernel: [   80.013866] Hardware name: To Be Filled By O.E.M.
Jul 22 15:15:37 linux kernel: [   80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Jul 22 15:15:37 linux kernel: [   80.013870] Modules linked in: powernow_k8
Jul 22 15:15:37 linux kernel: [   80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40
Jul 22 15:15:37 linux kernel: [   80.013876] Call Trace:
Jul 22 15:15:37 linux kernel: [   80.013879]  [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [   80.013884]  [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 15:15:37 linux kernel: [   80.013888]  [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e
Jul 22 15:15:37 linux kernel: [   80.013891]  [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [   80.013894]  [<ffffffff8112f213>] create_dir+0x58/0x87
Jul 22 15:15:37 linux kernel: [   80.013898]  [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f
Jul 22 15:15:37 linux kernel: [   80.013902]  [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de
Jul 22 15:15:37 linux kernel: [   80.013905]  [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e
Jul 22 15:15:37 linux kernel: [   80.013908]  [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57
Jul 22 15:15:37 linux kernel: [   80.013913]  [<ffffffff810667bc>] ? mark_lock+0x22/0x228
Jul 22 15:15:37 linux kernel: [   80.013918]  [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4
...

This bug slipped in by git commit:
150b06f7f223cfd0f808737a5243cceca8ea47fa

When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function
is not left anymore, only cpufreq_add_dev_policy.
This patch should reconstruct the identical functionality again as it
was before the split.

CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Dave Jones ecf7e4611c [CPUFREQ] Factor out policy setting from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Dave Jones 909a694e33 [CPUFREQ] Factor out interface creation from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:13 -04:00
Dave Jones 19d6f7ec3e [CPUFREQ] Factor out symlink creation from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:12 -04:00
Dave Jones 059019a3c3 [CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:09 -04:00
Dave Jones 54e6fe167b [CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:09 -04:00
Dominik Brodowski ce6c3997c2 [CPUFREQ] Re-enable cpufreq suspend and resume code
Commit 4bc5d34135 is broken and causes regressions:

(1) cpufreq_driver->resume() and ->suspend() were only called on
__powerpc__, but you could set them on all architectures. In fact,
->resume() was defined and used before the PPC-related commit
42d4dc3f4e complained about in 4bc5d34135.

(2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
would never be called.

(3) This means speedstep-smi would be unusuable after suspend or resume.

The _real_ problem was calling cpufreq_driver->get() with interrupts
off, but it re-enabling interrupts on some platforms. Why is ->get()
necessary?

Some systems like to change the CPU frequency behind our
back, especially during BIOS-intensive operations like suspend or
resume. If such systems also use a CPU frequency-dependant timing loop,
delays might be off by large factors. Therefore, we need to ascertain
as soon as possible that the CPU frequency is indeed at the speed we
think it is. You can do this two ways: either setting it anew, or trying
to get it. The latter is what was done, the former also has the same IRQ
issue.

So, let's try something different: defer the checking to after interrupts
are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
Timings may be off until this later stage, so let's watch out for
resume regressions caused by the deferred handling of frequency changes
behind the kernel's back.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:08 -04:00
Dave Jones 4bc5d34135 [CPUFREQ] Make cpufreq suspend code conditional on powerpc.
The suspend code runs with interrupts disabled, and the powerpc workaround we
do in the cpufreq suspend hook calls the drivers ->get method.

powernow-k8's ->get does an smp_call_function_single
which needs interrupts enabled

cpufreq's suspend/resume code was added in 42d4dc3f4e to work around
a hardware problem on ppc powerbooks.  If we make all this code
conditional on powerpc, we avoid the issue above.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:11 -04:00
Thomas Renninger d5194decd0 [CPUFREQ] Fix a kobject reference bug related to managed CPUs
The first offline/online cycle is successful, the second not.
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
echo 0 >cpu1/online

The last command will trigger:
Jul 22 14:39:50 linux kernel: [  593.210125] ------------[ cut here ]------------
Jul 22 14:39:50 linux kernel: [  593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b()
Jul 22 14:39:50 linux kernel: [  593.210144] Hardware name: To Be Filled By O.E.M.
Jul 22 14:39:50 linux kernel: [  593.210148] Modules linked in: powernow_k8
Jul 22 14:39:50 linux kernel: [  593.210158] Pid: 378, comm: kondemand/2 Tainted: G        W  2.6.31-rc2 #38
Jul 22 14:39:50 linux kernel: [  593.210163] Call Trace:
Jul 22 14:39:50 linux kernel: [  593.210171]  [<ffffffff812008e8>] ? kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210181]  [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 14:39:50 linux kernel: [  593.210190]  [<ffffffff81041962>] warn_slowpath_null+0xf/0x11
Jul 22 14:39:50 linux kernel: [  593.210198]  [<ffffffff812008e8>] kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210206]  [<ffffffff811ffa19>] kobject_get+0x1a/0x22
Jul 22 14:39:50 linux kernel: [  593.210214]  [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb
Jul 22 14:39:50 linux kernel: [  593.210222]  [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67
Jul 22 14:39:50 linux kernel: [  593.210231]  [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f
Jul 22 14:39:50 linux kernel: [  593.210240]  [<ffffffff810529ea>] worker_thread+0x200/0x313
...

The output continues on every do_dbs_timer ondemand freq checking poll.
This regression was introduced by git commit:
3f4a782b5c

The policy is released when the cpufreq device is removed in:
__cpufreq_remove_dev():
	/* if this isn't the CPU which is the parent of the kobj, we
	 * only need to unlink, put and exit
	 */

Not creating the symlink is not sever at all.
As long as:
sysfs_remove_link(&sys_dev->kobj, "cpufreq");
handles it gracefully that the symlink did not exist.
Possibly no error should be returned at all, because ondemand
governor would still provide the same functionality.
Userspace in userspace gov case might be confused if the link
is missing.

Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903

CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:11 -04:00
Prarit Bhargava 42c74b84c6 [CPUFREQ] Do not set policy for offline cpus
Suspend/Resume fails on multi socket, multi core systems because the cpufreq
code erroneously sets the per_cpu policy_cpu value when a logical cpu is
offline.

This most notably results in missing sysfs files that are used to set the
cpu frequencies of the various cpus.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:10 -04:00
Dave Jones 5e1596f753 [CPUFREQ] Fix compile failure in cpufreq.c
managed_policy is out of scope for the non-smp case.
Declare it locally where used (twice)

Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-08 19:04:23 -04:00
Mathieu Desnoyers 3f4a782b5c [CPUFREQ] fix (utter) cpufreq_add_dev mess
OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.

This is step one of fixing the overall locking dependency mess in cpufreq.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
CC: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:28 -04:00
venkatesh.pallipadi@intel.com 7d26e2d5e2 [CPUFREQ] Eliminate the recent lockdep warnings in cpufreq
Commit b14893a62c although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.

Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here

http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

and few others..

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:27 -04:00
Yinghai Lu eaa958402e cpumask: alloc zeroed cpumask for static cpumask_var_ts
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already.  Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-09 22:30:27 +09:30
Mathieu Desnoyers 42a06f2166 [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/

The patches linked above depend on the following patch to remove
circular locking dependency :

cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

(the following issue was faced when using cancel_delayed_work_sync() in the
timer teardown (which fixes a race).

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
>
> A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
>
>

Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.

The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().

Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :

cpufreq.c: __cpufreq_remove_dev()
...
        if (cpufreq_driver->target)
                __cpufreq_governor(data, CPUFREQ_GOV_STOP);

        unlock_policy_rwsem_write(cpu);

To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.

Applies on top of the 2.6.30-rc5 tree.

Required to remove circular dep in teardown of both conservative and
ondemande governors so they can use cancel_delayed_work_sync().
CPUFREQ_GOV_STOP does not modify the policy, therefore this locking seemed
unneeded.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Linus Torvalds ada19a31a9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
  [CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
  [CPUFREQ] Make cpufreq-nforce2 less obnoxious
  [CPUFREQ] p4-clockmod reports wrong frequency.
  [CPUFREQ] powernow-k8: Use a common exit path.
  [CPUFREQ] Change link order of x86 cpufreq modules
  [CPUFREQ] conservative: remove 10x from def_sampling_rate
  [CPUFREQ] conservative: fixup governor to function more like ondemand logic
  [CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
  [CPUFREQ] conservative: amend author's email address
  [CPUFREQ] Use swap() in longhaul.c
  [CPUFREQ] checkpatch cleanups for acpi-cpufreq
  [CPUFREQ] powernow-k8: Only print error message once, not per core.
  [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
  [CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
  [CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
  [CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
  [CPUFREQ] checkpatch cleanups for powernow-k8
  [CPUFREQ] checkpatch cleanups for ondemand governor.
  [CPUFREQ] checkpatch cleanups for powernow-k7
  [CPUFREQ] checkpatch cleanups for speedstep related drivers.
  ...
2009-03-26 11:04:08 -07:00
Dave Jones 129f8ae9b1 Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."
This reverts commit e088e4c9cd.

Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.

Course of action:
 - Find out the remaining causes of overheating, and fix them
   if possible. ACPI should be doing the right thing automatically.
   If it isn't, we need to fix that.
 - mark p4-clockmod ui as deprecated
 - try again with the removal in six months.

It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-03-09 15:07:33 -04:00
Thomas Renninger ed12978453 [CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
It's not only useful for the ondemand and conservative governors, but
also for userspace daemons to know about the HW transition latency of
the CPU.
It is especially useful for userspace to know about this value when
the ondemand or conservative governors are run. The sampling rate
control value depends on it and for userspace being able to set sane
tuning values there it has to know about the transition latency.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Dave Jones 29464f2813 [CPUFREQ] checkpatch cleanups for cpufreq core
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Rusty Russell 835481d9bc cpumask: convert struct cpufreq_policy to cpumask_var_t
Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:31 +01:00
Mike Chan 187d9f4ed4 [CPUFREQ] Fix on resume, now preserves user policy min/max.
Previously driver resume would always set the current policy min/max with
the cpuinfo min/max, defined by user_policy.min/max. Resulting in a reset
of policy settings when policy.min/max != cpuinfo.min/max when coming out
of suspend. Now user_policy is saved as the policy instead of cpuinfo to
preserve what the user actually set.

Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:11 -05:00
Matthew Garrett e088e4c9cd [CPUFREQ] Disable sysfs ui for p4-clockmod.
p4-clockmod has a long history of abuse.   It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power.  This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.

However p4-clockmod does have a purpose.  It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.

This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:10 -05:00
venkatesh.pallipadi@intel.com bf0b90e357 [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.

A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.

Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:43 -04:00
Julia Lawall f1829e4a37 [CPUFREQ] drivers/cpufreq/cpufreq.c: Adjust error handling code involving cpufreq_cpu_put
After calling cpufreq_cpu_get, error handling code should call
cpufreq_cpu_put.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@

(
if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
|
x = cpufreq_cpu_get@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != cpufreq_cpu_put(x)
                  when != if (x) { ... cpufreq_cpu_put(x); ...}
    return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
cpufreq_cpu_put(x)
)

@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@

* x = cpufreq_cpu_get@p1(...)
  <...
* if@p3 (...)
  S
  ...>
* return@p2 \(NULL\|ret\);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:43 -04:00
Thomas Renninger a1531acd43 cpufreq acpi: only call _PPC after cpufreq ACPI init funcs got called already
Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec74)

But it can still happen that _PPC is called at processor driver
initialization time.

This patch should make sure that this is not possible anymore.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30 09:41:43 -07:00
Linus Torvalds 26dcce0fab Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
  cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
  NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
  NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
  cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
  cpumask: Provide a generic set of CPUMASK_ALLOC macros
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
  cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
  cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
  cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
  Revert "cpumask: introduce new APIs"
  cpumask: make for_each_cpu_mask a bit smaller
  net: Pass reference to cpumask variable in net/sunrpc/svc.c
  ...

Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23 18:37:44 -07:00
Linus Torvalds 6d52dcbe56 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] cpufreq: remove CVS keywords
  [CPUFREQ] change cpu freq arrays to per_cpu variables
2008-07-21 15:10:37 -07:00
Ingo Molnar 68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
Linus Torvalds cc55875e26 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix format string bug.
2008-06-09 11:27:55 -07:00
Chris Wright 326f6a5c9c [CPUFREQ] Fix format string bug.
Format string bug.  Not exploitable, as this is only writable by root,
but worth fixing all the same.

Spotted-by: Ilja van Sprundel <ilja@netric.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-06-09 13:24:55 -04:00
CHIKAMA masaki 879000f944 cpufreq: fix null object access on Transmeta CPU
If cpu specific cpufreq driver(i.e.  longrun) has "setpolicy" function,
governor object isn't set into cpufreq_policy object at "__cpufreq_set_policy"
function in driver/cpufreq/cpufreq.c .

This causes a null object access at "store_scaling_setspeed" and
"show_scaling_setspeed" function in driver/cpufreq/cpufreq.c when reading or
writing through /sys interface (ex.  cat
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed)

Addresses:
	http://bugzilla.kernel.org/show_bug.cgi?id=10654
	https://bugzilla.redhat.com/show_bug.cgi?id=443354

Signed-off-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-06 11:29:11 -07:00
Lothar Waßmann dca0261393 [CPUFREQ] fix double unlock of cpu_policy_rwsem in drivers/cpufreq/cpufreq.c
In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the
error exit 'err_out_unregister' from different places once with the
'cpu_policy_rwsem' lock held, once with the lock released:
|		if (ret)
|			goto err_out_unregister;
|	}
|
|	policy->governor = NULL; /* to assure that the starting sequence is
|				  * run in cpufreq_set_policy */
|
|	/* set default policy */
|	ret = __cpufreq_set_policy(policy, &new_policy);
|	policy->user_policy.policy = policy->policy;
|	policy->user_policy.governor = policy->governor;
|
|	unlock_policy_rwsem_write(cpu);
|
|	if (ret) {
|		dprintk("setting policy failed\n");
|		goto err_out_unregister;
|	}

This leads to the following error message in case of a failing
__cpufreq_set_policy() call:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
[<c01b4564>] unlock_policy_rwsem_write+0x30/0x40
but there are no more locks to release!

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130

stack backtrace:
[<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104)
[<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c)
 r6:00000028 r5:c3c1ab80 r4:c01b4564
[<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c)
 r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80
[<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30)
 r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c
[<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40)
 r4:c3c34884
[<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398)
[<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130)
[<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174)

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-29 12:10:12 -04:00