Skip to content

Commit

Permalink
Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/…
Browse files Browse the repository at this point in the history
…git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The big ticket items here are the rework of suspend-to-idle in order
  to add proper support for power button wakeup from it on recent Dell
  laptops and the rework of interfaces exporting the current CPU
  frequency on x86.

  In addition to that, support for a few new pieces of hardware is
  added, the PCI/ACPI device wakeup infrastructure is simplified
  significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
  bus locking infrastructure.

  Also, there are some functional improvements for intel_pstate, tools
  updates and small fixes and cleanups all over.

  Specifics:

   - Rework suspend-to-idle to allow it to take wakeup events signaled
     by the EC into account on ACPI-based platforms in order to properly
     support power button wakeup from suspend-to-idle on recent Dell
     laptops (Rafael Wysocki).

     That includes the core suspend-to-idle code rework, support for the
     Low Power S0 _DSM interface, and support for the ACPI INT0002
     Virtual GPIO device from Hans de Goede (required for USB keyboard
     wakeup from suspend-to-idle to work on some machines).

   - Stop trying to export the current CPU frequency via /proc/cpuinfo
     on x86 as that is inaccurate and confusing (Len Brown).

   - Rework the way in which the current CPU frequency is exported by
     the kernel (over the cpufreq sysfs interface) on x86 systems with
     the APERF and MPERF registers by always using values read from
     these registers, when available, to compute the current frequency
     regardless of which cpufreq driver is in use (Len Brown).

   - Rework the PCI/ACPI device wakeup infrastructure to remove the
     questionable and artificial distinction between "devices that can
     wake up the system from sleep states" and "devices that can
     generate wakeup signals in the working state" from it, which allows
     the code to be simplified quite a bit (Rafael Wysocki).

   - Fix the wakeup IRQ framework by making it use SRCU instead of RCU
     which doesn't allow sleeping in the read-side critical sections,
     but which in turn is expected to be allowed by the IRQ bus locking
     infrastructure (Thomas Gleixner).

   - Modify some computations in the intel_pstate driver to avoid
     rounding errors resulting from them (Srinivas Pandruvada).

   - Reduce the overhead of the intel_pstate driver in the HWP
     (hardware-managed P-states) mode and when the "performance" P-state
     selection algorithm is in use by making it avoid registering
     scheduler callbacks in those cases (Len Brown).

   - Rework the energy_performance_preference sysfs knob in intel_pstate
     by changing the values that correspond to different symbolic hint
     names used by it (Len Brown).

   - Make it possible to use more than one cpuidle driver at the same
     time on ARM (Daniel Lezcano).

   - Make it possible to prevent the cpuidle menu governor from using
     the 0 state by disabling it via sysfs (Nicholas Piggin).

   - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
     AMD systems (Yazen Ghannam).

   - Make the CPPC cpufreq driver take the lowest nonlinear performance
     information into account (Prashanth Prakash).

   - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
     driver and clean up the sfi, exynos5440 and intel_pstate drivers
     (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
     Wysocki, Tao Wang).

   - Fix a few minor issues in the generic power domains (genpd)
     framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
     Perttunen, Viresh Kumar).

   - Fix a couple of minor issues in the operating performance points
     (OPP) framework and clean it up somewhat (Viresh Kumar).

   - Fix a CONFIG dependency in the hibernation core and clean it up
     slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).

   - Add rk3228 support to the rockchip-io adaptive voltage scaling
     (AVS) driver (David Wu).

   - Fix an incorrect bit shift operation in the RAPL power capping
     driver (Adam Lessnau).

   - Add support for the EPP field in the HWP (hardware managed
     P-states) control register, HWP.EPP, to the x86_energy_perf_policy
     tool and update msr-index.h with HWP.EPP values (Len Brown).

   - Fix some minor issues in the turbostat tool (Len Brown).

   - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
     minor issue in it (Sherry Hurwitz).

   - Assorted cleanups, mostly related to the constification of some
     data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
     Kozlowski)"

* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  PM: hibernate: constify attribute_group structures.
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  PM / Domains: Fix missing default_power_down_ok comment
  PM / Domains: Fix unsafe iteration over modified list of domains
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
  PM / Domains: Call driver's noirq callbacks
  PM / core: Drop run_wake flag from struct dev_pm_info
  PCI / PM: Simplify device wakeup settings code
  PCI / PM: Drop pme_interrupt flag from struct pci_dev
  ACPI / PM: Consolidate device wakeup settings code
  ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
  PM / QoS: constify *_attribute_group.
  PM / AVS: rockchip-io: add io selectors and supplies for rk3228
  powercap/RAPL: prevent overridding bits outside of the mask
  PM / sysfs: Constify attribute groups
  ...
  • Loading branch information
torvalds committed Jul 4, 2017
2 parents b39de27 + 8f8e5c3 commit 408c986
Show file tree
Hide file tree
Showing 76 changed files with 2,887 additions and 925 deletions.
12 changes: 6 additions & 6 deletions Documentation/admin-guide/pm/cpufreq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -269,16 +269,16 @@ are the following:
``scaling_cur_freq``
Current frequency of all of the CPUs belonging to this policy (in kHz).

For the majority of scaling drivers, this is the frequency of the last
P-state requested by the driver from the hardware using the scaling
In the majority of cases, this is the frequency of the last P-state
requested by the scaling driver from the hardware using the scaling
interface provided by it, which may or may not reflect the frequency
the CPU is actually running at (due to hardware design and other
limitations).

Some scaling drivers (e.g. |intel_pstate|) attempt to provide
information more precisely reflecting the current CPU frequency through
this attribute, but that still may not be the exact current CPU
frequency as seen by the hardware at the moment.
Some architectures (e.g. ``x86``) may attempt to provide information
more precisely reflecting the current CPU frequency through this
attribute, but that still may not be the exact current CPU frequency as
seen by the hardware at the moment.

``scaling_driver``
The scaling driver currently in use.
Expand Down
6 changes: 2 additions & 4 deletions Documentation/admin-guide/pm/intel_pstate.rst
Original file line number Diff line number Diff line change
Expand Up @@ -157,10 +157,8 @@ Without HWP, this P-state selection algorithm is always the same regardless of
the processor model and platform configuration.

It selects the maximum P-state it is allowed to use, subject to limits set via
``sysfs``, every time the P-state selection computations are carried out by the
driver's utilization update callback for the given CPU (that does not happen
more often than every 10 ms), but the hardware configuration will not be changed
if the new P-state is the same as the current one.
``sysfs``, every time the driver configuration for the given CPU is updated
(e.g. via ``sysfs``).

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
Expand Down
38 changes: 19 additions & 19 deletions Documentation/devicetree/bindings/opp/opp.txt
Original file line number Diff line number Diff line change
Expand Up @@ -186,20 +186,20 @@ Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
compatible = "operating-points-v2";
opp-shared;

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <975000 970000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
opp@1100000000 {
opp-1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <1000000 980000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
opp@1200000000 {
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
clock-latency-ns = <290000>;
Expand Down Expand Up @@ -265,20 +265,20 @@ independently.
* independently.
*/

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <975000 970000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
opp@1100000000 {
opp-1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <1000000 980000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
opp@1200000000 {
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
opp-microamp = <90000;
Expand Down Expand Up @@ -341,20 +341,20 @@ DVFS state together.
compatible = "operating-points-v2";
opp-shared;

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <975000 970000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
opp@1100000000 {
opp-1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <1000000 980000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
opp@1200000000 {
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
opp-microamp = <90000>;
Expand All @@ -367,20 +367,20 @@ DVFS state together.
compatible = "operating-points-v2";
opp-shared;

opp@1300000000 {
opp-1300000000 {
opp-hz = /bits/ 64 <1300000000>;
opp-microvolt = <1050000 1045000 1055000>;
opp-microamp = <95000>;
clock-latency-ns = <400000>;
opp-suspend;
};
opp@1400000000 {
opp-1400000000 {
opp-hz = /bits/ 64 <1400000000>;
opp-microvolt = <1075000>;
opp-microamp = <100000>;
clock-latency-ns = <400000>;
};
opp@1500000000 {
opp-1500000000 {
opp-hz = /bits/ 64 <1500000000>;
opp-microvolt = <1100000 1010000 1110000>;
opp-microamp = <95000>;
Expand Down Expand Up @@ -409,7 +409,7 @@ Example 4: Handling multiple regulators
compatible = "operating-points-v2";
opp-shared;

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000>, /* Supply 0 */
<960000>, /* Supply 1 */
Expand All @@ -422,7 +422,7 @@ Example 4: Handling multiple regulators

/* OR */

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <975000 970000 985000>, /* Supply 0 */
<965000 960000 975000>, /* Supply 1 */
Expand All @@ -435,7 +435,7 @@ Example 4: Handling multiple regulators

/* OR */

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <975000 970000 985000>, /* Supply 0 */
<965000 960000 975000>, /* Supply 1 */
Expand Down Expand Up @@ -467,7 +467,7 @@ Example 5: opp-supported-hw
status = "okay";
opp-shared;

opp@600000000 {
opp-600000000 {
/*
* Supports all substrate and process versions for 0xF
* cuts, i.e. only first four cuts.
Expand All @@ -478,7 +478,7 @@ Example 5: opp-supported-hw
...
};

opp@800000000 {
opp-800000000 {
/*
* Supports:
* - cuts: only one, 6th cut (represented by 6th bit).
Expand Down Expand Up @@ -510,15 +510,15 @@ Example 6: opp-microvolt-<name>, opp-microamp-<name>:
compatible = "operating-points-v2";
opp-shared;

opp@1000000000 {
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt-slow = <915000 900000 925000>;
opp-microvolt-fast = <975000 970000 985000>;
opp-microamp-slow = <70000>;
opp-microamp-fast = <71000>;
};

opp@1200000000 {
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt-slow = <915000 900000 925000>, /* Supply vcc0 */
<925000 910000 935000>; /* Supply vcc1 */
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ SoC is on the same page.
Required properties:
- compatible: should be one of:
- "rockchip,rk3188-io-voltage-domain" for rk3188
- "rockchip,rk3228-io-voltage-domain" for rk3228
- "rockchip,rk3288-io-voltage-domain" for rk3288
- "rockchip,rk3328-io-voltage-domain" for rk3328
- "rockchip,rk3368-io-voltage-domain" for rk3368
Expand Down Expand Up @@ -59,6 +60,12 @@ Possible supplies for rk3188:
- vccio1-supply: The supply connected to VCCIO1.
Sometimes also labeled VCCIO1 and VCCIO2.

Possible supplies for rk3228:
- vccio1-supply: The supply connected to VCCIO1.
- vccio2-supply: The supply connected to VCCIO2.
- vccio3-supply: The supply connected to VCCIO3.
- vccio4-supply: The supply connected to VCCIO4.

Possible supplies for rk3288:
- audio-supply: The supply connected to APIO4_VDD.
- bb-supply: The supply connected to APIO5_VDD.
Expand Down
7 changes: 2 additions & 5 deletions Documentation/power/runtime_pm.txt
Original file line number Diff line number Diff line change
Expand Up @@ -105,9 +105,9 @@ knows what to do to handle the device).

In particular, if the driver requires remote wakeup capability (i.e. hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_run_wake() returns 'false' for the
PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY. On the other hand, if
device_run_wake() returns 'true' for the device and the device is put into a
device_can_wakeup() returns 'true' for the device and the device is put into a
low-power state during the execution of the suspend callback, it is expected
that remote wakeup will be enabled for the device. Generally, remote wakeup
should be enabled for all input devices put into low-power states at run time.
Expand Down Expand Up @@ -253,9 +253,6 @@ defined in include/linux/pm.h:
being executed for that device and it is not practical to wait for the
suspend to complete; means "start a resume as soon as you've suspended"

unsigned int run_wake;
- set if the device is capable of generating runtime wake-up events

enum rpm_status runtime_status;
- the runtime PM status of the device; this field's initial value is
RPM_SUSPENDED, which means that each device is initially regarded by the
Expand Down
18 changes: 12 additions & 6 deletions arch/x86/include/asm/msr-index.h
Original file line number Diff line number Diff line change
Expand Up @@ -251,9 +251,13 @@
#define HWP_MIN_PERF(x) (x & 0xff)
#define HWP_MAX_PERF(x) ((x & 0xff) << 8)
#define HWP_DESIRED_PERF(x) ((x & 0xff) << 16)
#define HWP_ENERGY_PERF_PREFERENCE(x) ((x & 0xff) << 24)
#define HWP_ACTIVITY_WINDOW(x) ((x & 0xff3) << 32)
#define HWP_PACKAGE_CONTROL(x) ((x & 0x1) << 42)
#define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24)
#define HWP_EPP_PERFORMANCE 0x00
#define HWP_EPP_BALANCE_PERFORMANCE 0x80
#define HWP_EPP_BALANCE_POWERSAVE 0xC0
#define HWP_EPP_POWERSAVE 0xFF
#define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32)
#define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42)

/* IA32_HWP_STATUS */
#define HWP_GUARANTEED_CHANGE(x) (x & 0x1)
Expand Down Expand Up @@ -476,9 +480,11 @@
#define MSR_MISC_PWR_MGMT 0x000001aa

#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
#define ENERGY_PERF_BIAS_PERFORMANCE 0
#define ENERGY_PERF_BIAS_NORMAL 6
#define ENERGY_PERF_BIAS_POWERSAVE 15
#define ENERGY_PERF_BIAS_PERFORMANCE 0
#define ENERGY_PERF_BIAS_BALANCE_PERFORMANCE 4
#define ENERGY_PERF_BIAS_NORMAL 6
#define ENERGY_PERF_BIAS_BALANCE_POWERSAVE 8
#define ENERGY_PERF_BIAS_POWERSAVE 15

#define MSR_IA32_PACKAGE_THERM_STATUS 0x000001b1

Expand Down
5 changes: 2 additions & 3 deletions arch/x86/include/asm/suspend_64.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,7 @@ struct saved_context {
set_debugreg((thread)->debugreg##register, register)

/* routines for saving/restoring kernel state */
extern int acpi_save_state_mem(void);
extern char core_restore_code;
extern char restore_registers;
extern char core_restore_code[];
extern char restore_registers[];

#endif /* _ASM_X86_SUSPEND_64_H */
3 changes: 2 additions & 1 deletion arch/x86/kernel/acpi/cstate.c
Original file line number Diff line number Diff line change
Expand Up @@ -167,7 +167,8 @@ static int __init ffh_cstate_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;

if (c->x86_vendor != X86_VENDOR_INTEL)
if (c->x86_vendor != X86_VENDOR_INTEL &&
c->x86_vendor != X86_VENDOR_AMD)
return -1;

cpu_cstate_entry = alloc_percpu(struct cstate_entry);
Expand Down
1 change: 1 addition & 0 deletions arch/x86/kernel/cpu/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ obj-y += common.o
obj-y += rdrand.o
obj-y += match.o
obj-y += bugs.o
obj-$(CONFIG_CPU_FREQ) += aperfmperf.o

obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
Expand Down
79 changes: 79 additions & 0 deletions arch/x86/kernel/cpu/aperfmperf.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
/*
* x86 APERF/MPERF KHz calculation for
* /sys/.../cpufreq/scaling_cur_freq
*
* Copyright (C) 2017 Intel Corp.
* Author: Len Brown <[email protected]>
*
* This file is licensed under GPLv2.
*/

#include <linux/jiffies.h>
#include <linux/math64.h>
#include <linux/percpu.h>
#include <linux/smp.h>

struct aperfmperf_sample {
unsigned int khz;
unsigned long jiffies;
u64 aperf;
u64 mperf;
};

static DEFINE_PER_CPU(struct aperfmperf_sample, samples);

/*
* aperfmperf_snapshot_khz()
* On the current CPU, snapshot APERF, MPERF, and jiffies
* unless we already did it within 10ms
* calculate kHz, save snapshot
*/
static void aperfmperf_snapshot_khz(void *dummy)
{
u64 aperf, aperf_delta;
u64 mperf, mperf_delta;
struct aperfmperf_sample *s = this_cpu_ptr(&samples);

/* Don't bother re-computing within 10 ms */
if (time_before(jiffies, s->jiffies + HZ/100))
return;

rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);

aperf_delta = aperf - s->aperf;
mperf_delta = mperf - s->mperf;

/*
* There is no architectural guarantee that MPERF
* increments faster than we can read it.
*/
if (mperf_delta == 0)
return;

/*
* if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then
* khz = (cpu_khz * aperf_delta) / mperf_delta
*/
if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta)
s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
else /* khz = aperf_delta / (mperf_delta / cpu_khz) */
s->khz = div64_u64(aperf_delta,
div64_u64(mperf_delta, cpu_khz));
s->jiffies = jiffies;
s->aperf = aperf;
s->mperf = mperf;
}

unsigned int arch_freq_get_on_cpu(int cpu)
{
if (!cpu_khz)
return 0;

if (!static_cpu_has(X86_FEATURE_APERFMPERF))
return 0;

smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);

return per_cpu(samples.khz, cpu);
}
Loading

0 comments on commit 408c986

Please sign in to comment.