Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timekeeping, timers and clockevent/source drivers:

  Core:

   - Yet another round of improvements to make the clocksource watchdog
     more robust:

       - Relax the clocksource-watchdog skew criteria to match the NTP
         criteria.

       - Temporarily skip the watchdog when high memory latencies are
         detected which can lead to false-positives.

       - Provide an option to enable TSC skew detection even on systems
         where TSC is marked as reliable.

     Sigh!

   - Initialize the restart block in the nanosleep syscalls to be
     directed to the no restart function instead of doing a partial
     setup on entry.

     This prevents an erroneous restart_syscall() invocation from
     corrupting user space data. While such a situation is clearly a
     user space bug, preventing this is a correctness issue and caters
   to the least surprise principle.

   - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
     to align it with the nanosleep semantics.

  Drivers:

   - The obligatory new driver bindings for Mediatek, Rockchip and
     RISC-V variants.

   - Add support for the C3STOP misfeature to the RISC-V timer to handle
     the case where the timer stops in deeper idle states.

   - Set up a static key in the RISC-V timer correctly before first use.

   - The usual small improvements and fixes all over the place"

* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
  clocksource/drivers/em_sti: Mark driver as non-removable
  clocksource/drivers/sh_tmu: Mark driver as non-removable
  clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
  clocksource/drivers/timer-microchip-pit64b: Add delay timer
  clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
  dt-bindings: timer: sifive,clint: add compatibles for T-Head's C9xx
  dt-bindings: timer: mediatek,mtk-timer: add MT8365
  clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
  clocksource/drivers/sh_cmt: Mark driver as non-removable
  clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
  clocksource/drivers/riscv: Increase the clock source rating
  clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
  dt-bindings: timer: Add bindings for the RISC-V timer device
  RISC-V: time: initialize hrtimer based broadcast clock event device
  dt-bindings: timer: rk-timer: Add rktimer for rv1126
  time/debug: Fix memory leak with using debugfs_lookup()
  clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
  posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
  clocksource: Verify HPET and PMTMR when TSC unverified
  ...
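
The restart-block initialization described in the pull message is easiest to see from user space. A minimal sketch of the erroneous usage it hardens against (hypothetical user space code, not part of this series):

/*
 * nanosleep() stores its rmtp pointer in the task's restart block on
 * entry, but used to leave restart_block.fn alone unless the sleep was
 * actually interrupted. A stale fn from an earlier interrupted syscall
 * could then be invoked by a bogus restart_syscall() and misinterpret
 * the freshly overwritten union. With the fix, fn is pointed at
 * do_no_restart_syscall() on every entry, so the bogus call below
 * simply returns -EINTR.
 */
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = 1000000 };

	nanosleep(&req, &req);		/* sets up nanosleep restart state */
	syscall(SYS_restart_syscall);	/* user space bug: direct restart */
	return 0;
}
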
commit 560b803067
Author: Linus Torvalds
Date:   2023-02-21 09:45:13 -08:00

27 changed files with 252 additions and 77 deletions

Documentation/admin-guide/kernel-parameters.txt

@@ -6369,6 +6369,16 @@
in situations with strict latency requirements (where
interruptions from clocksource watchdog are not
acceptable).
[x86] recalibrate: force recalibration against a HW timer
(HPET or PM timer) on systems whose TSC frequency was
obtained from HW or FW using either an MSR or CPUID(0x15).
Warn if the difference is more than 500 ppm.
[x86] watchdog: Use TSC as the watchdog clocksource with
which to check other HW timers (HPET or PM timer), but
only on systems where TSC has been deemed trustworthy.
This will be suppressed by an earlier tsc=nowatchdog and
can be overridden by a later tsc=nowatchdog. A console
message will flag any such suppression or overriding.
tsc_early_khz= [X86] Skip early TSC calibration and use the given
value instead. Useful when the early TSC frequency discovery
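
For instance (illustrative command line, not from the documentation), booting with

    tsc=watchdog tsc=recalibrate

both checks HPET/PM timer against the TSC and forces the 500 ppm recalibration check of a CPUID/MSR-provided TSC frequency; appending a later tsc=nowatchdog would suppress the watchdog part again, as described above.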

Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt

@@ -33,6 +33,7 @@ Required properties:
For those SoCs that use CPUX
* "mediatek,mt6795-systimer" for MT6795 compatible timers (CPUX)
* "mediatek,mt8365-systimer" for MT8365 compatible timers (CPUX)
- reg: Should contain location and length for timer register.
- clocks: Should contain system clock.

Documentation/devicetree/bindings/timer/riscv,timer.yaml (new file)

@@ -0,0 +1,52 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/timer/riscv,timer.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: RISC-V timer

maintainers:
  - Anup Patel <anup@brainfault.org>

description: |+
  RISC-V platforms always have a RISC-V timer device for the supervisor-mode
  based on the time CSR defined by the RISC-V privileged specification. The
  timer interrupts of this device are configured using the RISC-V SBI Time
  extension or the RISC-V Sstc extension.

  The clock frequency of RISC-V timer device is specified via the
  "timebase-frequency" DT property of "/cpus" DT node which is described
  in Documentation/devicetree/bindings/riscv/cpus.yaml

properties:
  compatible:
    enum:
      - riscv,timer

  interrupts-extended:
    minItems: 1
    maxItems: 4096   # Should be enough?

  riscv,timer-cannot-wake-cpu:
    type: boolean
    description:
      If present, the timer interrupt cannot wake up the CPU from one or
      more suspend/idle states.

additionalProperties: false

required:
  - compatible
  - interrupts-extended

examples:
  - |
    timer {
      compatible = "riscv,timer";
      interrupts-extended = <&cpu1intc 5>,
                            <&cpu2intc 5>,
                            <&cpu3intc 5>,
                            <&cpu4intc 5>;
    };
...
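
A variant of the example above with the new wakeup limitation flagged (illustrative DTS, not part of the schema):

    timer {
      compatible = "riscv,timer";
      interrupts-extended = <&cpu1intc 5>, <&cpu2intc 5>;
      riscv,timer-cannot-wake-cpu;
    };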

Documentation/devicetree/bindings/timer/rockchip,rk-timer.yaml

@@ -17,6 +17,7 @@ properties:
- items:
- enum:
- rockchip,rv1108-timer
- rockchip,rv1126-timer
- rockchip,rk3036-timer
- rockchip,rk3128-timer
- rockchip,rk3188-timer

Documentation/devicetree/bindings/timer/sifive,clint.yaml

@@ -20,6 +20,10 @@ description:
property of "/cpus" DT node. The "timebase-frequency" DT property is
described in Documentation/devicetree/bindings/riscv/cpus.yaml
T-Head C906/C910 CPU cores include an implementation of CLINT too, however
their implementation lacks a memory-mapped MTIME register, thus not
compatible with SiFive ones.
properties:
compatible:
oneOf:
@@ -29,6 +33,10 @@ properties:
- starfive,jh7100-clint
- canaan,k210-clint
- const: sifive,clint0
- items:
- enum:
- allwinner,sun20i-d1-clint
- const: thead,c900-clint
- items:
- const: sifive,clint0
- const: riscv,clint0

arch/riscv/Kconfig

@@ -12,7 +12,6 @@ config 32BIT
config RISCV
def_bool y
select ARCH_CLOCKSOURCE_INIT
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_HAS_BINFMT_FLAT

arch/riscv/kernel/time.c

@@ -5,6 +5,7 @@
*/
#include <linux/of_clk.h>
#include <linux/clockchips.h>
#include <linux/clocksource.h>
#include <linux/delay.h>
#include <asm/sbi.h>
@@ -29,13 +30,6 @@ void __init time_init(void)
of_clk_init(NULL);
timer_probe();
}
void clocksource_arch_init(struct clocksource *cs)
{
#ifdef CONFIG_GENERIC_GETTIMEOFDAY
cs->vdso_clock_mode = VDSO_CLOCKMODE_ARCHTIMER;
#else
cs->vdso_clock_mode = VDSO_CLOCKMODE_NONE;
#endif
tick_setup_hrtimer_broadcast();
}

arch/x86/include/asm/time.h

@@ -8,6 +8,7 @@
extern void hpet_time_init(void);
extern void time_init(void);
extern bool pit_timer_init(void);
extern bool tsc_clocksource_watchdog_disabled(void);
extern struct clock_event_device *global_clock_event;

arch/x86/kernel/hpet.c

@@ -1091,6 +1091,8 @@ int __init hpet_enable(void)
if (!hpet_counting())
goto out_nohpet;
if (tsc_clocksource_watchdog_disabled())
clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY;
clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);
if (id & HPET_ID_LEGSUP) {

arch/x86/kernel/tsc.c

@@ -48,6 +48,8 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
int tsc_clocksource_reliable;
static int __read_mostly tsc_force_recalibrate;
static u32 art_to_tsc_numerator;
static u32 art_to_tsc_denominator;
static u64 art_to_tsc_offset;
@@ -291,6 +293,7 @@ __setup("notsc", notsc_setup);
static int no_sched_irq_time;
static int no_tsc_watchdog;
static int tsc_as_watchdog;
static int __init tsc_setup(char *str)
{
@@ -300,8 +303,22 @@ static int __init tsc_setup(char *str)
no_sched_irq_time = 1;
if (!strcmp(str, "unstable"))
mark_tsc_unstable("boot parameter");
if (!strcmp(str, "nowatchdog"))
if (!strcmp(str, "nowatchdog")) {
no_tsc_watchdog = 1;
if (tsc_as_watchdog)
pr_alert("%s: Overriding earlier tsc=watchdog with tsc=nowatchdog\n",
__func__);
tsc_as_watchdog = 0;
}
if (!strcmp(str, "recalibrate"))
tsc_force_recalibrate = 1;
if (!strcmp(str, "watchdog")) {
if (no_tsc_watchdog)
pr_alert("%s: tsc=watchdog overridden by earlier tsc=nowatchdog\n",
__func__);
else
tsc_as_watchdog = 1;
}
return 1;
}
@@ -1184,6 +1201,12 @@ static void __init tsc_disable_clocksource_watchdog(void)
clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
}
bool tsc_clocksource_watchdog_disabled(void)
{
return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY) &&
tsc_as_watchdog && !no_tsc_watchdog;
}
static void __init check_system_tsc_reliable(void)
{
#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1372,6 +1395,25 @@ static void tsc_refine_calibration_work(struct work_struct *work)
else
freq = calc_pmtimer_ref(delta, ref_start, ref_stop);
/* Will hit this only if tsc_force_recalibrate has been set */
if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
/* Warn if the deviation exceeds 500 ppm */
if (abs(tsc_khz - freq) > (tsc_khz >> 11)) {
pr_warn("Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!\n");
pr_info("Previous calibrated TSC freq:\t %lu.%03lu MHz\n",
(unsigned long)tsc_khz / 1000,
(unsigned long)tsc_khz % 1000);
}
pr_info("TSC freq recalibrated by [%s]:\t %lu.%03lu MHz\n",
hpet ? "HPET" : "PM_TIMER",
(unsigned long)freq / 1000,
(unsigned long)freq % 1000);
return;
}
/* Make sure we're within 1% */
if (abs(tsc_khz - freq) > tsc_khz/100)
goto out;
@@ -1405,8 +1447,10 @@ static int __init init_tsc_clocksource(void)
if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
return 0;
if (tsc_unstable)
goto unreg;
if (tsc_unstable) {
clocksource_unregister(&clocksource_tsc_early);
return 0;
}
if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
@@ -1419,9 +1463,10 @@ static int __init init_tsc_clocksource(void)
if (boot_cpu_has(X86_FEATURE_ART))
art_related_clocksource = &clocksource_tsc;
clocksource_register_khz(&clocksource_tsc, tsc_khz);
unreg:
clocksource_unregister(&clocksource_tsc_early);
return 0;
if (!tsc_force_recalibrate)
return 0;
}
schedule_delayed_work(&tsc_irqwork, 0);

drivers/clocksource/Kconfig

@@ -706,7 +706,7 @@ config INGENIC_OST
config MICROCHIP_PIT64B
bool "Microchip PIT64B support"
depends on OF || COMPILE_TEST
depends on OF && ARM
select TIMER_OF
help
This option enables Microchip PIT64B timer for Atmel

drivers/clocksource/acpi_pm.c

@@ -23,6 +23,7 @@
#include <linux/pci.h>
#include <linux/delay.h>
#include <asm/io.h>
#include <asm/time.h>
/*
* The I/O port the PMTMR resides at.
@@ -210,8 +211,9 @@ static int __init init_acpi_pm_clocksource(void)
return -ENODEV;
}
return clocksource_register_hz(&clocksource_acpi_pm,
PMTMR_TICKS_PER_SEC);
if (tsc_clocksource_watchdog_disabled())
clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY;
return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);
}
/* We use fs_initcall because we want the PCI fixups to have run

drivers/clocksource/em_sti.c

@@ -333,11 +333,6 @@ static int em_sti_probe(struct platform_device *pdev)
return 0;
}
static int em_sti_remove(struct platform_device *pdev)
{
return -EBUSY; /* cannot unregister clockevent and clocksource */
}
static const struct of_device_id em_sti_dt_ids[] = {
{ .compatible = "renesas,em-sti", },
{},
@@ -346,10 +341,10 @@ MODULE_DEVICE_TABLE(of, em_sti_dt_ids);
static struct platform_driver em_sti_device_driver = {
.probe = em_sti_probe,
.remove = em_sti_remove,
.driver = {
.name = "em_sti",
.of_match_table = em_sti_dt_ids,
.suppress_bind_attrs = true,
}
};

drivers/clocksource/sh_cmt.c

@@ -1145,17 +1145,12 @@ static int sh_cmt_probe(struct platform_device *pdev)
return 0;
}
static int sh_cmt_remove(struct platform_device *pdev)
{
return -EBUSY; /* cannot unregister clockevent and clocksource */
}
static struct platform_driver sh_cmt_device_driver = {
.probe = sh_cmt_probe,
.remove = sh_cmt_remove,
.driver = {
.name = "sh_cmt",
.of_match_table = of_match_ptr(sh_cmt_of_table),
.suppress_bind_attrs = true,
},
.id_table = sh_cmt_id_table,
};

drivers/clocksource/sh_tmu.c

@@ -632,11 +632,6 @@ static int sh_tmu_probe(struct platform_device *pdev)
return 0;
}
static int sh_tmu_remove(struct platform_device *pdev)
{
return -EBUSY; /* cannot unregister clockevent and clocksource */
}
static const struct platform_device_id sh_tmu_id_table[] = {
{ "sh-tmu", SH_TMU },
{ "sh-tmu-sh3", SH_TMU_SH3 },
@@ -652,10 +647,10 @@ MODULE_DEVICE_TABLE(of, sh_tmu_of_table);
static struct platform_driver sh_tmu_device_driver = {
.probe = sh_tmu_probe,
.remove = sh_tmu_remove,
.driver = {
.name = "sh_tmu",
.of_match_table = of_match_ptr(sh_tmu_of_table),
.suppress_bind_attrs = true,
},
.id_table = sh_tmu_id_table,
};

drivers/clocksource/timer-microchip-pit64b.c

@@ -9,6 +9,7 @@
#include <linux/clk.h>
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
@@ -92,6 +93,8 @@ struct mchp_pit64b_clksrc {
static void __iomem *mchp_pit64b_cs_base;
/* Default cycles for clockevent timer. */
static u64 mchp_pit64b_ce_cycles;
/* Delay timer. */
static struct delay_timer mchp_pit64b_dt;
static inline u64 mchp_pit64b_cnt_read(void __iomem *base)
{
@@ -169,6 +172,11 @@ static u64 notrace mchp_pit64b_sched_read_clk(void)
return mchp_pit64b_cnt_read(mchp_pit64b_cs_base);
}
static unsigned long notrace mchp_pit64b_dt_read(void)
{
return mchp_pit64b_cnt_read(mchp_pit64b_cs_base);
}
static int mchp_pit64b_clkevt_shutdown(struct clock_event_device *cedev)
{
struct mchp_pit64b_timer *timer = clkevt_to_mchp_pit64b_timer(cedev);
@@ -376,6 +384,10 @@ static int __init mchp_pit64b_init_clksrc(struct mchp_pit64b_timer *timer,
sched_clock_register(mchp_pit64b_sched_read_clk, 64, clk_rate);
mchp_pit64b_dt.read_current_timer = mchp_pit64b_dt_read;
mchp_pit64b_dt.freq = clk_rate;
register_current_timer_delay(&mchp_pit64b_dt);
return 0;
}
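
Note that struct delay_timer and register_current_timer_delay() are provided by the ARM architecture code (arch/arm/include/asm/delay.h), which is why the Kconfig hunk above now restricts this driver to ARM.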

drivers/clocksource/timer-riscv.c

@@ -28,6 +28,7 @@
#include <asm/timex.h>
static DEFINE_STATIC_KEY_FALSE(riscv_sstc_available);
static bool riscv_timer_cannot_wake_cpu;
static int riscv_clock_next_event(unsigned long delta,
struct clock_event_device *ce)
@@ -73,10 +74,15 @@ static u64 notrace riscv_sched_clock(void)
static struct clocksource riscv_clocksource = {
.name = "riscv_clocksource",
.rating = 300,
.rating = 400,
.mask = CLOCKSOURCE_MASK(64),
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.read = riscv_clocksource_rdtime,
#if IS_ENABLED(CONFIG_GENERIC_GETTIMEOFDAY)
.vdso_clock_mode = VDSO_CLOCKMODE_ARCHTIMER,
#else
.vdso_clock_mode = VDSO_CLOCKMODE_NONE,
#endif
};
static int riscv_timer_starting_cpu(unsigned int cpu)
@@ -85,6 +91,8 @@ static int riscv_timer_starting_cpu(unsigned int cpu)
ce->cpumask = cpumask_of(cpu);
ce->irq = riscv_clock_event_irq;
if (riscv_timer_cannot_wake_cpu)
ce->features |= CLOCK_EVT_FEAT_C3STOP;
clockevents_config_and_register(ce, riscv_timebase, 100, 0x7fffffff);
enable_percpu_irq(riscv_clock_event_irq,
@@ -139,6 +147,13 @@ static int __init riscv_timer_init_dt(struct device_node *n)
if (cpuid != smp_processor_id())
return 0;
child = of_find_compatible_node(NULL, NULL, "riscv,timer");
if (child) {
riscv_timer_cannot_wake_cpu = of_property_read_bool(child,
"riscv,timer-cannot-wake-cpu");
of_node_put(child);
}
domain = NULL;
child = of_get_compatible_child(n, "riscv,cpu-intc");
if (!child) {
@@ -177,6 +192,11 @@ static int __init riscv_timer_init_dt(struct device_node *n)
return error;
}
if (riscv_isa_extension_available(NULL, SSTC)) {
pr_info("Timer interrupt in S-mode is available via sstc extension\n");
static_branch_enable(&riscv_sstc_available);
}
error = cpuhp_setup_state(CPUHP_AP_RISCV_TIMER_STARTING,
"clockevents/riscv/timer:starting",
riscv_timer_starting_cpu, riscv_timer_dying_cpu);
@@ -184,11 +204,6 @@ static int __init riscv_timer_init_dt(struct device_node *n)
pr_err("cpu hp setup state failed for RISCV timer [%d]\n",
error);
if (riscv_isa_extension_available(NULL, SSTC)) {
pr_info("Timer interrupt in S-mode is available via sstc extension\n");
static_branch_enable(&riscv_sstc_available);
}
return error;
}
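
The two hunks above move the Sstc setup so the static branch is patched before the first clock event can be programmed. A generic sketch of why the ordering matters (illustrative code with hypothetical names, not the driver itself):

#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(feature_key);

static void hot_path(void)
{
	/* Compiled as a patchable jump: callers that run before
	 * static_branch_enable() still take the default (false) side. */
	if (static_branch_likely(&feature_key))
		return;
}

static int __init my_init(void)
{
	static_branch_enable(&feature_key);	/* patch the branch first... */
	/* ...then start whatever may call hot_path(), e.g. the cpuhp
	 * callback that registers the per-CPU clockevent. */
	return 0;
}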

drivers/clocksource/timer-sun4i.c

@@ -144,7 +144,8 @@ static struct timer_of to = {
.clkevt = {
.name = "sun4i_tick",
.rating = 350,
.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT |
CLOCK_EVT_FEAT_DYNIRQ,
.set_state_shutdown = sun4i_clkevt_shutdown,
.set_state_periodic = sun4i_clkevt_set_periodic,
.set_state_oneshot = sun4i_clkevt_set_oneshot,

include/linux/bits.h

@@ -6,7 +6,6 @@
#include <vdso/bits.h>
#include <asm/bitsperlong.h>
#define BIT_ULL(nr) (ULL(1) << (nr))
#define BIT_MASK(nr) (UL(1) << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
#define BIT_ULL_MASK(nr) (ULL(1) << ((nr) % BITS_PER_LONG_LONG))

include/vdso/bits.h

@@ -5,5 +5,6 @@
#include <vdso/const.h>
#define BIT(nr) (UL(1) << (nr))
#define BIT_ULL(nr) (ULL(1) << (nr))
#endif /* __VDSO_BITS_H */

kernel/time/Kconfig

@@ -200,10 +200,14 @@ config CLOCKSOURCE_WATCHDOG_MAX_SKEW_US
int "Clocksource watchdog maximum allowable skew (in μs)"
depends on CLOCKSOURCE_WATCHDOG
range 50 1000
default 100
default 125
help
Specify the maximum amount of allowable watchdog skew in
microseconds before reporting the clocksource to be unstable.
The default is based on a half-second clocksource watchdog
interval and NTP's maximum frequency drift of 500 parts
per million. If the clocksource is good enough for NTP,
it is good enough for the clocksource watchdog!
endmenu
endif
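
Worked out from the defaults above (a reading of the code, not patch text): the watchdog samples every 0.5 s, and NTP's 500 ppm frequency tolerance permits

    0.0005 * 500000 us = 250 us

of accumulated skew per interval. Since the skew check sums the uncertainty margins of the two clocks being compared, the per-clocksource default of 125 us yields exactly that 250 us budget.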

kernel/time/clocksource.c

@@ -95,6 +95,11 @@ static char override_name[CS_NAME_LEN];
static int finished_booting;
static u64 suspend_start;
/*
* Interval: 0.5sec.
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
/*
* Threshold: 0.0312s, when doubled: 0.0625s.
* Also a default for cs->uncertainty_margin when registering clocks.
@@ -106,11 +111,14 @@ static u64 suspend_start;
* clocksource surrounding a read of the clocksource being validated.
* This delay could be due to SMIs, NMIs, or to VCPU preemptions. Used as
* a lower bound for cs->uncertainty_margin values when registering clocks.
*
* The default of 500 parts per million is based on NTP's limits.
* If a clocksource is good enough for NTP, it is good enough for us!
*/
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US
#define MAX_SKEW_USEC CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US
#else
#define MAX_SKEW_USEC 100
#define MAX_SKEW_USEC (125 * WATCHDOG_INTERVAL / HZ)
#endif
#define WATCHDOG_MAX_SKEW (MAX_SKEW_USEC * NSEC_PER_USEC)
@@ -140,11 +148,6 @@ static inline void clocksource_watchdog_unlock(unsigned long *flags)
static int clocksource_watchdog_kthread(void *data);
static void __clocksource_change_rating(struct clocksource *cs, int rating);
/*
* Interval: 0.5sec.
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
static void clocksource_watchdog_work(struct work_struct *work)
{
/*
@@ -257,8 +260,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow,
goto skip_test;
}
pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
smp_processor_id(), watchdog->name, wd_delay, nretries);
pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd excessive read-back delay of %lldns vs. limit of %ldns, wd-wd read-back delay only %lldns, attempt %d, marking %s unstable\n",
smp_processor_id(), cs->name, wd_delay, WATCHDOG_MAX_SKEW, wd_seq_delay, nretries, cs->name);
return WD_READ_UNSTABLE;
skip_test:
@@ -384,6 +387,15 @@ void clocksource_verify_percpu(struct clocksource *cs)
}
EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
static inline void clocksource_reset_watchdog(void)
{
struct clocksource *cs;
list_for_each_entry(cs, &watchdog_list, wd_list)
cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
}
static void clocksource_watchdog(struct timer_list *unused)
{
u64 csnow, wdnow, cslast, wdlast, delta;
@@ -391,6 +403,7 @@ static void clocksource_watchdog(struct timer_list *unused)
int64_t wd_nsec, cs_nsec;
struct clocksource *cs;
enum wd_read_status read_ret;
unsigned long extra_wait = 0;
u32 md;
spin_lock(&watchdog_lock);
@@ -410,13 +423,30 @@ static void clocksource_watchdog(struct timer_list *unused)
read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
if (read_ret != WD_READ_SUCCESS) {
if (read_ret == WD_READ_UNSTABLE)
/* Clock readout unreliable, so give it up. */
__clocksource_unstable(cs);
if (read_ret == WD_READ_UNSTABLE) {
/* Clock readout unreliable, so give it up. */
__clocksource_unstable(cs);
continue;
}
/*
* When WD_READ_SKIP is returned, it means the system is likely
* under very heavy load, where the latency of reading
* watchdog/clocksource is very big, and affect the accuracy of
* watchdog check. So give system some space and suspend the
* watchdog check for 5 minutes.
*/
if (read_ret == WD_READ_SKIP) {
/*
* As the watchdog timer will be suspended, and
* cs->last could keep unchanged for 5 minutes, reset
* the counters.
*/
clocksource_reset_watchdog();
extra_wait = HZ * 300;
break;
}
/* Clocksource initialized ? */
if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
atomic_read(&watchdog_reset_pending)) {
@@ -443,12 +473,20 @@ static void clocksource_watchdog(struct timer_list *unused)
/* Check the deviation from the watchdog clocksource. */
md = cs->uncertainty_margin + watchdog->uncertainty_margin;
if (abs(cs_nsec - wd_nsec) > md) {
u64 cs_wd_msec;
u64 wd_msec;
u32 wd_rem;
pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
smp_processor_id(), cs->name);
pr_warn(" '%s' wd_nsec: %lld wd_now: %llx wd_last: %llx mask: %llx\n",
watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask);
pr_warn(" '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n",
cs->name, cs_nsec, csnow, cslast, cs->mask);
cs_wd_msec = div_u64_rem(cs_nsec - wd_nsec, 1000U * 1000U, &wd_rem);
wd_msec = div_u64_rem(wd_nsec, 1000U * 1000U, &wd_rem);
pr_warn(" Clocksource '%s' skewed %lld ns (%lld ms) over watchdog '%s' interval of %lld ns (%lld ms)\n",
cs->name, cs_nsec - wd_nsec, cs_wd_msec, watchdog->name, wd_nsec, wd_msec);
if (curr_clocksource == cs)
pr_warn(" '%s' is current clocksource.\n", cs->name);
else if (curr_clocksource)
@@ -512,7 +550,7 @@ static void clocksource_watchdog(struct timer_list *unused)
* pair clocksource_stop_watchdog() clocksource_start_watchdog().
*/
if (!timer_pending(&watchdog_timer)) {
watchdog_timer.expires += WATCHDOG_INTERVAL;
watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait;
add_timer_on(&watchdog_timer, next_cpu);
}
out:
@@ -537,14 +575,6 @@ static inline void clocksource_stop_watchdog(void)
watchdog_running = 0;
}
static inline void clocksource_reset_watchdog(void)
{
struct clocksource *cs;
list_for_each_entry(cs, &watchdog_list, wd_list)
cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
}
static void clocksource_resume_watchdog(void)
{
atomic_inc(&watchdog_reset_pending);

kernel/time/hrtimer.c

@@ -2089,7 +2089,7 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
u64 slack;
slack = current->timer_slack_ns;
if (dl_task(current) || rt_task(current))
if (rt_task(current))
slack = 0;
hrtimer_init_sleeper_on_stack(&t, clockid, mode);
@@ -2126,6 +2126,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
if (!timespec64_valid(&tu))
return -EINVAL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
current->restart_block.nanosleep.rmtp = rmtp;
return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL,
@@ -2147,6 +2148,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp,
if (!timespec64_valid(&tu))
return -EINVAL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
current->restart_block.nanosleep.compat_rmtp = rmtp;
return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL,
@@ -2270,7 +2272,7 @@ void __init hrtimers_init(void)
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
* @mode: timer mode
* @clock_id: timer clock to be used
*/
@@ -2297,6 +2299,13 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
return -EINTR;
}
/*
* Override any slack passed by the user if under
* rt contraints.
*/
if (rt_task(current))
delta = 0;
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
@@ -2316,7 +2325,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
@@ -2324,7 +2333,8 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
* the current task state has been set (see set_current_state()).
*
* The @delta argument gives the kernel the freedom to schedule the
* actual wakeup to a time that is both power and performance friendly.
* actual wakeup to a time that is both power and performance friendly
* for regular (non RT/DL) tasks.
* The kernel give the normal best effort behavior for "@expires+@delta",
* but may decide to fire the timer earlier, but no earlier than @expires.
*
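
A minimal usage sketch of the semantics documented above (illustrative kernel-style code, not from this series); the 100 us slack below is honored for a SCHED_OTHER caller, while an RT caller now has it forced to zero internally:

#include <linux/hrtimer.h>
#include <linux/sched.h>

static int wait_until(ktime_t deadline)
{
	set_current_state(TASK_INTERRUPTIBLE);
	/* The slack argument only widens the wakeup window for non-RT tasks. */
	return schedule_hrtimeout_range(&deadline, 100 * NSEC_PER_USEC,
					HRTIMER_MODE_ABS);
}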

kernel/time/posix-cpu-timers.c

@@ -243,13 +243,12 @@ static void proc_sample_cputime_atomic(struct task_cputime_atomic *at,
*/
static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
{
u64 curr_cputime;
retry:
curr_cputime = atomic64_read(cputime);
if (sum_cputime > curr_cputime) {
if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
goto retry;
}
u64 curr_cputime = atomic64_read(cputime);
do {
if (sum_cputime <= curr_cputime)
return;
} while (!atomic64_try_cmpxchg(cputime, &curr_cputime, sum_cputime));
}
static void update_gt_cputime(struct task_cputime_atomic *cputime_atomic,
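
The conversion is the standard try_cmpxchg loop idiom: on failure, atomic64_try_cmpxchg() refreshes curr_cputime with the value it actually observed, so the explicit re-read and the retry label disappear. A user-space analog of the same monotonic-maximum pattern (C11 atomics, illustrative):

#include <stdatomic.h>
#include <stdint.h>

/* Raise *target to at least sum, giving up once it is already bigger. */
static void update_gt(_Atomic uint64_t *target, uint64_t sum)
{
	uint64_t cur = atomic_load(target);

	do {
		if (sum <= cur)
			return;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(target, &cur, sum));
}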

kernel/time/posix-stubs.c

@@ -147,6 +147,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
return -EINVAL;
if (flags & TIMER_ABSTIME)
rmtp = NULL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
current->restart_block.nanosleep.rmtp = rmtp;
texp = timespec64_to_ktime(t);
@@ -240,6 +241,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
return -EINVAL;
if (flags & TIMER_ABSTIME)
rmtp = NULL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
current->restart_block.nanosleep.compat_rmtp = rmtp;
texp = timespec64_to_ktime(t);

kernel/time/posix-timers.c

@@ -1270,6 +1270,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
return -EINVAL;
if (flags & TIMER_ABSTIME)
rmtp = NULL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
current->restart_block.nanosleep.rmtp = rmtp;
@@ -1297,6 +1298,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
return -EINVAL;
if (flags & TIMER_ABSTIME)
rmtp = NULL;
current->restart_block.fn = do_no_restart_syscall;
current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
current->restart_block.nanosleep.compat_rmtp = rmtp;

kernel/time/test_udelay.c

@@ -149,7 +149,7 @@ module_init(udelay_test_init);
static void __exit udelay_test_exit(void)
{
mutex_lock(&udelay_test_lock);
debugfs_remove(debugfs_lookup(DEBUGFS_FILENAME, NULL));
debugfs_lookup_and_remove(DEBUGFS_FILENAME, NULL);
mutex_unlock(&udelay_test_lock);
}
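
Design note on the hunk above: debugfs_lookup() returns the dentry with an extra reference held, and the old code never dropped it, so each lookup leaked a reference; debugfs_lookup_and_remove() performs the lookup, the removal and the final reference drop internally.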