Commit graph

812158 commits

Author SHA1 Message Date
Ming Lei 6dc4f100c1 block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei 2e1f4f4d24 bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()
bch_bio_alloc_pages() is always called on one new bio, so it is safe
to access the bvec table directly. Given it is the only kind of this
case, open code the bvec table access since bio_for_each_segment_all()
will be changed to support for iterating over multipage bvec.

Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei 86af5952a8 block: loop: pass multi-page bvec to iov_iter
iov_iter is implemented on bvec itererator helpers, so it is safe to pass
multi-page bvec to it, and this way is much more efficient than passing one
page in each bvec.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei c3a7ce7380 btrfs: use mp_bvec_last_segment to get bio's last page
Preparing for supporting multi-page bvec.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei f70f446407 fs/buffer.c: use bvec iterator to truncate the bio
Once multi-page bvec is enabled, the last bvec may include more than one
page, this patch use mp_bvec_last_segment() to truncate the bio.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei 45a3fb9529 block: introduce mp_bvec_last_segment()
BTRFS and guard_bio_eod() need to get the last singlepage segment
from one multipage bvec, so introduce this helper to make them happy.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei 862e5a5e6f block: use bio_for_each_bvec() to map sg
It is more efficient to use bio_for_each_bvec() to map sg, meantime
we have to consider splitting multipage bvec as done in blk_bio_segment_split().

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei dcebd75592 block: use bio_for_each_bvec() to compute multi-page bvec count
First it is more efficient to use bio_for_each_bvec() in both
blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
many multi-page bvecs there are in the bio.

Secondly once bio_for_each_bvec() is used, the bvec may need to be
splitted because its length can be very longer than max segment size,
so we have to split the big bvec into several segments.

Thirdly when splitting multi-page bvec into segments, the max segment
limit may be reached, so the bio split need to be considered under
this situation too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Ming Lei d18d91740a block: introduce bio_for_each_bvec() and rq_for_each_bvec()
bio_for_each_bvec() is used for iterating over multi-page bvec for bio
split & merge code.

rq_for_each_bvec() can be used for drivers which may handle the
multi-page bvec directly, so far loop is one perfect use case.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:10 -07:00
Ming Lei 3d75ca0ade block: introduce multi-page bvec helpers
This patch introduces helpers of 'mp_bvec_iter_*' for multi-page bvec
support.

The introduced helpers treate one bvec as real multi-page segment,
which may include more than one pages.

The existed helpers of bvec_iter_* are interfaces for supporting current
bvec iterator which is thought as single-page by drivers, fs, dm and
etc. These introduced helpers will build single-page bvec in flight, so
this way won't break current bio/bvec users, which needn't any change.

Follows some multi-page bvec background:

- bvecs stored in bio->bi_io_vec is always multi-page style

- bvec(struct bio_vec) represents one physically contiguous I/O
  buffer, now the buffer may include more than one page after
  multi-page bvec is supported, and all these pages represented
  by one bvec is physically contiguous. Before multi-page bvec
  support, at most one page is included in one bvec, we call it
  single-page bvec.

- .bv_page of the bvec points to the 1st page in the multi-page bvec

- .bv_offset of the bvec is the offset of the buffer in the bvec

The effect on the current drivers/filesystem/dm/bcache/...:

- almost everyone supposes that one bvec only includes one single
  page, so we keep the sp interface not changed, for example,
  bio_for_each_segment() still returns single-page bvec

- bio_for_each_segment_all() will return single-page bvec too

- during iterating, iterator variable(struct bvec_iter) is always
  updated in multi-page bvec style, and bvec_iter_advance() is kept
  not changed

- returned(copied) single-page bvec is built in flight by bvec
  helpers from the stored multi-page bvec

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:10 -07:00
Ming Lei 19d62f6d00 block: remove bvec_iter_rewind()
Commit 7759eb23fd ("block: remove bio_rewind_iter()") removes
bio_rewind_iter(), then no one uses bvec_iter_rewind() any more,
so remove it.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:10 -07:00
Ming Lei 1a67356e9a block: don't use bio->bi_vcnt to figure out segment number
It is wrong to use bio->bi_vcnt to figure out how many segments
there are in the bio even though CLONED flag isn't set on this bio,
because this bio may be splitted or advanced.

So always use bio_segments() in blk_recount_segments(), and it shouldn't
cause any performance loss now because the physical segment number is figured
out in blk_queue_split() and BIO_SEG_VALID is set meantime since
bdced438ac ("block: setup bi_phys_segments after splitting").

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 76d8137a31 ("blk-merge: recaculate segment if it isn't less than max segments")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:10 -07:00
Christoph Hellwig 8a2ee44a37 btrfs: look at bi_size for repair decisions
bio_readpage_error currently uses bi_vcnt to decide if it is worth
retrying an I/O.  But the vector count is mostly an implementation
artifact - it really should figure out if there is more than a
single sector worth retrying.  Use bi_size for that and shift by
PAGE_SHIFT.  This really should be blocks/sectors, but given that
btrfs doesn't support a sector size different from the PAGE_SIZE
using the page size keeps the changes to a minimum.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:10 -07:00
Aleksei Zakharov fbd72127c9 block: avoid setting none scheduler if it's already none
There's no reason to freeze queue and remove scheduler
if there's no scheduler already.

Signed-off-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:21:40 -07:00
Aleksei Zakharov b7143fe67b block: avoid setting wbt_lat_usec to current value
There's no reason to set wbt min lat and freeze request queue
if current value is the same.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:20:14 -07:00
Heiner Litz 0586942f03 lightnvm: pblk: fix race condition on GC
This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update
is performed, the sectors are ignored by the GC logic and the line is
freed before all sectors are moved. When the L2P is finally updated, it
contains a mapping to a freed line, subsequent reads of the
corresponding LBAs fail.

This patch introduces a per line counter specifying the number of
sectors that are synced to the device but have not been updated in the
L2P. Lines with a counter of greater than zero will not be selected
for GC.

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:08 -07:00
Javier González b4cdc4260e lightnvm: pblk: prevent stall due to wb threshold
In order to respect mw_cuinits, pblk's write buffer maintains a
backpointer to protect data not yet persisted; when writing to the write
buffer, this backpointer defines a threshold that pblk's rate-limiter
enforces.

On small PU configurations, the following scenarios might take place: (i)
the threshold is larger than the write buffer and (ii) the threshold is
smaller than the write buffer, but larger than the maximun allowed
split bio - 256KB at this moment (Note that writes are not always
split - we only do this when we the size of the buffer is smaller
than the buffer). In both cases, pblk's rate-limiter prevents the I/O to
be written to the buffer, thus stalling.

This patch fixes the original backpointer implementation by considering
the threshold both on buffer creation and on the rate-limiters path,
when bio_split is triggered (case (ii) above).

Fixes: 766c8ceb16 ("lightnvm: pblk: guarantee that backpointer is respected on writer stall")
Signed-off-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:08 -07:00
Hans Holmberg aa8759d80a lightnvm: pblk: extend line wp balance check
pblk stripes writes of minimal write size across all non-offline chunks
in a line, which means that the maximum write pointer delta should not
exceed the minimal write size.

Extend the line write pointer balance check to cover this case, and
ignore the offline chunk wps.

This will render us a warning during recovery if something unexpected
has happened to the chunk write pointers (i.e. powerloss,  a spurious
chunk reset, ..).

Reported-by: Zhoujie Wu <zjwu@marvell.com>
Tested-by: Zhoujie Wu <zjwu@marvell.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Masahiro Yamada b7fce8f79d lightnvm: pblk: fix TRACE_INCLUDE_PATH
As the comment block in include/trace/define_trace.h says,
TRACE_INCLUDE_PATH should be a relative path to the define_trace.h

../../drivers/lightnvm is the correct relative path.

../../../drivers/lightnvm is working by coincidence because the top
Makefile adds -I$(srctree)/arch/$(SRCARCH)/include as a header
search path, but we should not rely on it.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Andy Shevchenko 7e0a0847ed lightnvm: pblk: Switch to use new generic UUID API
There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Andy Shevchenko e74ecf63ef lightnvm: Use u64 instead of __le64 for CPU visible side
Sparse complains about using strict data types:

drivers/lightnvm/pblk-read.c:254:43: warning: incorrect type in assignment (different base types)
drivers/lightnvm/pblk-read.c:254:43:    expected restricted __le64 <noident>
drivers/lightnvm/pblk-read.c:254:43:    got unsigned long long [unsigned] [usertype] <noident>
drivers/lightnvm/pblk-read.c:255:29: warning: cast from restricted __le64
drivers/lightnvm/pblk-read.c:268:29: warning: cast from restricted __le64
drivers/lightnvm/pblk-read.c:328:41: warning: incorrect type in assignment (different base types)
drivers/lightnvm/pblk-read.c:328:41:    expected restricted __le64 <noident>
drivers/lightnvm/pblk-read.c:328:41:    got unsigned long long [unsigned] [usertype] <noident>

In the code it seems explicit that lba_list_mem and lba_list_media members
of struct pblk_pr_ctx are used on CPU side, which means they should not be
of strict types.

Change types of lba_list_mem and lba_list_media members to be u64.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Hans Holmberg 6916cf5426 lightnvm: pblk: use vfree to free metadata on error path
As chunk metadata is allocated using vmalloc, we need to free it
using vfree.

Fixes: 090ee26fd5 ("lightnvm: use internal allocation for chunk log page")
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Hans Holmberg f9324980d7 lightnvm: pblk: stop taking the free lock in in pblk_lines_free
pblk_line_meta_free might sleep (it can end up calling vfree, depending
on how we allocate lba lists), and this can lead to a BUG()
if we wake up on a different cpu and release the lock.

As there is no point of grabbing the free lock when pblk has shut down,
remove the lock.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11 08:18:07 -07:00
Linus Torvalds d13937116f Linux 5.0-rc6 2019-02-10 14:42:20 -08:00
Linus Torvalds 68d94a8424 dmaengine-fix-5.0-rc6
dmaengine fixes for v5.0-rc6
 
  - Fix in at_xdmac fr wrongful channel state
  - Fix for imx driver for wrong callback invocation
  - Fix to bcm driver for interrupt race & transaction abort.
  - Fix in dmatest to abort in mapping error
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcYD8SAAoJEHwUBw8lI4NHqc0P/At5/XRhS60fnsibTEnGH3ee
 I3TQY3DrGXHXUAq8LqVIjaql7NUQ+I07bRtP6JMX9GYrTguUVVDPau6E4CtWt571
 +TnKXQefgY/UlP9f5K127Dal5YNaYsBZqbEvhE2xgYKrsFStiIYDFIPGFeIE841E
 J0cm8mjM+dQxZOXCCf/vSxA9h2Aqj7Fa4RRY3Sj4r/CWahaKpZt+OM6bdDpmUxmq
 PzDLSE4DdHLWZSTcvAFQQtwlMc4Dq8a7cs2MmCvLC2jwGq0IgLr0bfzj6pZOqcuN
 O05TNcyhQHZ5YRnBHTgJUSgR2abhKuNnBWN/pA1Tbsn7+OISsvt4erjITAFv6fwR
 mkWgyAzH3tZ1PnO1YO43KEfy/nrT6Veg1xePTzWaaYAsPJ9FJeQJuZ5YxPysOLcn
 lux2/B19HeWVb02vredn5eUXH7b8C0WQhfEax4hgbbPcaDX+y3WFN0n2gxOAFQ8H
 xloKu511Iuu0Uteov4wys6LucqX61vPPank9Ma2N7RLdZ4x2Ya8gDjG/kfQOFHDa
 jRbbjzw8Q5tuU2++GXwSpWq55LOKYnaXQbUIRWxrrdwAJahKAQFW++KR2ehB1kvS
 pCaIWwpSHXmCbat/A1JfcYMagAtw6QSgbJlNwru1B7OYujFMLml+GXtqeA/kJ8yo
 3+oF8PB0xA+++o83USaa
 =NTGY
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 - Fix in at_xdmac fr wrongful channel state
 - Fix for imx driver for wrong callback invocation
 - Fix to bcm driver for interrupt race & transaction abort.
 - Fix in dmatest to abort in mapping error

* tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dmatest: Abort test in case of mapping error
  dmaengine: bcm2835: Fix abort of transactions
  dmaengine: bcm2835: Fix interrupt race on RT
  dmaengine: imx-dma: fix wrong callback invoke
  dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
2019-02-10 10:39:37 -08:00
Linus Torvalds aadaa80611 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A handful of fixes:

   - Fix an MCE corner case bug/crash found via MCE injection testing

   - Fix 5-level paging boot crash

   - Fix MCE recovery cache invalidation bug

   - Fix regression on Xen guests caused by a recent PMD level mremap
     speedup optimization"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Make set_pmd_at() paravirt aware
  x86/mm/cpa: Fix set_mce_nospec()
  x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
  x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
2019-02-10 09:57:42 -08:00
Linus Torvalds 73a4c52184 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "irqchip driver fixes: most of them are race fixes for ARM GIC (General
  Interrupt Controller) variants, but also a fix for the ARM MMP
  (Marvell PXA168 et al) irqchip affecting OLPC keyboards"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Fix ITT_entry_size accessor
  irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
  irqchip/gic-v3-its: Gracefully fail on LPI exhaustion
  irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
  irqchip/gic-v4: Fix occasional VLPI drop
2019-02-10 09:54:19 -08:00
Linus Torvalds 212146f080 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A couple of kernel side fixes:

   - Fix the Intel uncore driver on certain hardware configurations

   - Fix a CPU hotplug related memory allocation bug

   - Remove a spurious WARN()

  ... plus also a handful of perf tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf script python: Add Python3 support to tests/attr.py
  perf trace: Support multiple "vfs_getname" probes
  perf symbols: Filter out hidden symbols from labels
  perf symbols: Add fallback definitions for GELF_ST_VISIBILITY()
  tools headers uapi: Sync linux/in.h copy from the kernel sources
  perf clang: Do not use 'return std::move(something)'
  perf mem/c2c: Fix perf_mem_events to support powerpc
  perf tests evsel-tp-sched: Fix bitwise operator
  perf/core: Don't WARN() for impossible ring-buffer sizes
  perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
  perf/x86/intel/uncore: Add Node ID mask
2019-02-10 09:48:18 -08:00
Linus Torvalds d2a6aae99f Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "An rtmutex (PI-futex) deadlock scenario fix, plus a locking
  documentation fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Handle early deadlock return correctly
  futex: Fix barrier comment
2019-02-10 09:44:52 -08:00
Marcos Paulo de Souza 1e93642837 blk-sysfs: Rework documention of __blk_release_queue
The Notes section of the comment was removed, because now
blk_release_queue can only be executed from blk_cleanup_queue (being
called when the q->kobj reaches zero), and also blk_init_queue was removed
in a1ce35fa49.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-10 10:23:29 -07:00
Marcos Paulo de Souza 7585d5082e blk-cgroup: Fix doc related to blkcg_exit_queue
Since 4cf6324b17, a portion of function blk_cleanup_queue was moved to
a newly created function called blk_exit_queue, including the call of
blkcg_exit_queue. So, adjust the documenation according.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-10 08:24:08 -07:00
Juergen Gross 20e55bc17d x86/mm: Make set_pmd_at() paravirt aware
set_pmd_at() calls native_set_pmd() unconditionally on x86. This was
fine as long as only huge page entries were written via set_pmd_at(),
as Xen pv guests don't support those.

Commit 2c91bd4a4e ("mm: speed up mremap by 20x on large regions")
introduced a usage of set_pmd_at() possible on pv guests, leading to
failures like:

BUG: unable to handle kernel paging request at ffff888023e26778
#PF error: [PROT] [WRITE]
RIP: e030:move_page_tables+0x7c1/0xae0
move_vma.isra.3+0xd1/0x2d0
__se_sys_mremap+0x3c6/0x5b0
 do_syscall_64+0x49/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Make set_pmd_at() paravirt aware by just letting it use set_pmd().

Fixes: 2c91bd4a4e ("mm: speed up mremap by 20x on large regions")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Cc: sstabellini@kernel.org
Cc: hpa@zytor.com
Cc: bp@alien8.de
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
2019-02-10 08:47:12 +01:00
Jens Axboe eca7abf31a block: queue flag cleanup
We have QUEUE_FLAG_DEFAULT defined, but it's not used anymore since
the legacy IO stack is gone. Kill it.

Sanitize the queue flags in general, they use spaces (for some
reason), and the space is pretty sparse. With the flags renumbered,
we can more clearly see how many we have available.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 15:42:07 -07:00
Jens Axboe d11a399898 block: kill QUEUE_FLAG_FLUSH_NQ
We have various helpers for setting/clearing this flag, and also
a helper to check if the queue supports queueable flushes or not.
But nobody uses them anymore, kill it with fire.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 15:40:24 -07:00
Linus Torvalds df3865f8f5 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "One PM related driver bugfix and a MAINTAINERS update"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
  i2c: omap: Use noirq system sleep pm ops to idle device for suspend
2019-02-09 13:43:12 -08:00
Linus Torvalds e8b50608f6 A batch of MIPS fixes for 5.0, nothing too scary.
- A workaround for a Loongson 3 CPU bug is the biggest change, but
    still fairly straightforward. It adds extra memory barriers (sync
    instructions) around atomics to avoid a CPU bug that can break
    atomicity.
 
  - Loongson64 also sees a fix for powering off some systems which would
    incorrectly reboot rather than waiting for the power down sequence to
    complete.
 
  - We have DT fixes for the Ingenic JZ4740 SoC & the JZ4780-based Ci20
    board, and a DT warning fix for the Nexsys4/MIPSfpga board.
 
  - The Cavium Octeon platform sees a further fix to the behaviour of the
    pcie_disable command line argument that was introduced in v3.3.
 
  - The VDSO, introduced in v4.4, sees build fixes for configurations of
    GCC that were built using the --with-fp-32= flag to specify a default
    32-bit floating point ABI.
 
  - get_frame_info() sees a fix for configurations with
    CONFIG_KALLSYMS=n, for which it previously always returned an error.
 
  - If the MIPS Coherence Manager (CM) reports an error then we'll now
    clear that error correctly so that the GCR_ERROR_CAUSE register will
    be updated with information about any future errors.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXF8rDhUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN0PngEAyEGqu+OyUjLGZApcM49LDMcr5RHq
 V6ED2HApfNbxsAYBAN+Uv6fm4H1LAbU+sIoHUp7DGQ1f1oFsbij4V/bqP4IA
 =nv8r
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_5.0_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A batch of MIPS fixes for 5.0, nothing too scary.

   - A workaround for a Loongson 3 CPU bug is the biggest change, but
     still fairly straightforward. It adds extra memory barriers (sync
     instructions) around atomics to avoid a CPU bug that can break
     atomicity.

   - Loongson64 also sees a fix for powering off some systems which
     would incorrectly reboot rather than waiting for the power down
     sequence to complete.

   - We have DT fixes for the Ingenic JZ4740 SoC & the JZ4780-based Ci20
     board, and a DT warning fix for the Nexsys4/MIPSfpga board.

   - The Cavium Octeon platform sees a further fix to the behaviour of
     the pcie_disable command line argument that was introduced in v3.3.

   - The VDSO, introduced in v4.4, sees build fixes for configurations
     of GCC that were built using the --with-fp-32= flag to specify a
     default 32-bit floating point ABI.

   - get_frame_info() sees a fix for configurations with
     CONFIG_KALLSYMS=n, for which it previously always returned an
     error.

   - If the MIPS Coherence Manager (CM) reports an error then we'll now
     clear that error correctly so that the GCR_ERROR_CAUSE register
     will be updated with information about any future errors"

* tag 'mips_fixes_5.0_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: cm: reprime error cause
  mips: loongson64: remove unreachable(), fix loongson_poweroff().
  MIPS: Remove function size check in get_frame_info()
  MIPS: Use lower case for addresses in nexys4ddr.dts
  MIPS: Loongson: Introduce and use loongson_llsc_mb()
  MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
  MIPS: VDSO: Use same -m%-float cflag as the kernel proper
  MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
  DTS: CI20: Fix bugs in ci20's device tree.
  MIPS: DTS: jz4740: Correct interrupt number of DMA core
2019-02-09 12:41:14 -08:00
Linus Torvalds e5a8a11632 for-linus-20190209
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxfARoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjsgEACP8vQzbvsOZOxHKi9Vcd8ziwyjyBebNh4F
 cKOx2Blgv0ReVAqLOVp9VJOJQoVQumV1btaA2YrmevxnCMpNUBpbP6G02tAqe9Z+
 D75FSpZXy4UvcMSlhfc/iB/RMI06benI9LnuL7zbzIQtrbtu+OFRnO6fpQOVGLxT
 Qa1wt/Rgahc48L4aHnIgPn0nyBRsEvuhC6FjI2D8akDaNiaHzwtGbpx7yDdmLNml
 fCzC2uSRJ31bXsO/5/fJorinaJ56r5N8aHaINYwXDv8zd8i94nQZhITAasXub1Km
 0nyuAg/fSzIdkrGmPINTKFaGYsOfRwpS4C4vagreBhzjfolPY0z9sQEQ63gZzDrd
 mAjHPxLTd165OLlR/RxoMC8AjZCZ0/YQaucxUOPkaIHfth5/dy5BFaCkWyA/I7/Z
 VnAyq0SqeL4hgIOGxZM0HeehKx+palNdJNZTcY7vF/7MVPuh5WM6z/FWsFa8k+ss
 B9YN4wchh7I8EVbLmfz9s/eqabRWF3Agh1dE+yAKwt1KIWHaMXWZTRQnj/69fs2e
 s3pwVMiiSz6K/Xnoe12nmQ4K0XeyKNROO78IIGY/Oa0Pe/hzCAaJMRMDsLp5EcJj
 dxpoi1OfGHMGoqYhL6tx6Atq5f6CMDrS28k/D44DHfO7T1qQGVy1A9SY7ZCfM5+c
 HKxTuRh8mg==
 =tuL6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph, fixing namespace locking when
   dealing with the effects log, and a rapid add/remove issue (Keith)

 - blktrace tweak, ensuring requests with -1 sectors are shown (Jan)

 - link power management quirk for a Smasung SSD (Hans)

 - m68k nfblock dynamic major number fix (Chengguang)

 - series fixing blk-iolatency inflight counter issue (Liu)

 - ensure that we clear ->private when setting up the aio kiocb (Mike)

 - __find_get_block_slow() rate limit print (Tetsuo)

* tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
  blk-mq: remove duplicated definition of blk_mq_freeze_queue
  Blk-iolatency: warn on negative inflight IO counter
  blk-iolatency: fix IO hang due to negative inflight counter
  blktrace: Show requests without sector
  fs: ratelimit __find_get_block_slow() failure message.
  m68k: set proper major_num when specifying module param major_num
  libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
  nvme-pci: fix rapid add remove sequence
  nvme: lock NS list changes while handling command effects
  aio: initialize kiocb private in case any filesystems expect it.
2019-02-09 10:26:09 -08:00
Linus Torvalds 5610789ad0 - Fix a problem with the imx28 ECC engine
- Remove a debug trace introduced in 2b6f0090a3 ("mtd: Check
   add_mtd_device() ret code")
 - Make sure partitions of size 0 can be registered
 - Fix kernel-doc warning in the rawnand core
 - Fix the error path of spinand_init() (missing manufacturer cleanup
   in a few places)
 - Address a problem with the SPI NAND PROGRAM LOAD operation which
   does not work as expected on some parts.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcXqZKAAoJEGXtNgF+CLcA01sQAJSSf/dnaie76vt9icUezM7b
 ulHdnGJEDazEYj5aIbFbT8S8TMpPA8FRR1zOsW1DbNA+khw82p8n2wykFzl2IYHh
 /qWzWcNGhr/s1tTwdS4lpR+lOyKJ3izk+AkL/vYwK1XlHu8uR+1/uazEiE/qNSKO
 grq+jIy5z5n8iVyhd6vjKumbkmWQlueTLVLGWyQeFIlSxYXyig1x3xhcvCcSBmGL
 +LpO+vB8T7G60nOAN5MTM0SDEoeg3R6clKeWWbVfZ5tkFoD8SvXagn3fYHIA6lDA
 pqGWUk6/HmUxec3DxLYZqfoaqi9b3u182H5sY6gOKApPF0HiBGJslo/tiplfnZVe
 3cKm2Mf+LJ4fkqdmCT/UcSm6GuP19CF0rSWPW9Xdq2jgOr5RJ5FlUWAucqM0QPmp
 q3LfQIiwegHh0PQ2dRPAFG1J6btnCxX+pnitNwskbU9XbnDvpPWnaUHcg/qXwhTt
 nImdqBr401d+bv9NgiPSgTQPltmSPZ1t+SI8AE6G1tdzarB2KQ0tOQL+CCMftGZ0
 /+eEY1+U05JrWzJhn7VnqszkNy9kvcszWqVkxomw+ZWQiiN7Nboaadzh/TJ8Sxog
 lOg3EGU3BWBSG3gD/G6v4DuuK9RpKuIzxBP/7ROJJsJRM9N+I4iI1babPdTVbUc0
 MX70tM0a9R5/y0EscaK0
 =QL6/
 -----END PGP SIGNATURE-----

Merge tag 'mtd/fixes-for-5.0-rc6' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Boris Brezillon:

 - Fix a problem with the imx28 ECC engine

 - Remove a debug trace introduced in 2b6f0090a3 ("mtd: Check
   add_mtd_device() ret code")

 - Make sure partitions of size 0 can be registered

 - Fix kernel-doc warning in the rawnand core

 - Fix the error path of spinand_init() (missing manufacturer cleanup in
   a few places)

 - Address a problem with the SPI NAND PROGRAM LOAD operation which does
   not work as expected on some parts.

* tag 'mtd/fixes-for-5.0-rc6' of git://git.infradead.org/linux-mtd:
  mtd: rawnand: gpmi: fix MX28 bus master lockup problem
  mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
  mtd: Remove a debug trace in mtdpart.c
  mtd: rawnand: fix kernel-doc warnings
  mtd: spinand: Fix the error/cleanup path in spinand_init()
  mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache
2019-02-09 10:17:01 -08:00
Linus Torvalds 3e5e692fcd xen: fixes for 5.0-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXF7kVAAKCRCAXGG7T9hj
 vqnqAP9eokv+M7IIhKMyOPg9L1H8pspaFvtvI3VDckJOv1yCqQD/VFi0aITkGdMU
 0W7LCawec41dRwCMQ2bg6XB16ArfLww=
 =Stew
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two very minor fixes: one remove of a #include for an unused header
  and a fix of the xen ML address in MAINTAINERS"

* tag 'for-linus-5.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  MAINTAINERS: unify reference to xen-devel list
  arch/arm/xen: Remove duplicate header
2019-02-09 09:44:08 -08:00
Coly Li dc7292a5bc bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
In 'commit 752f66a75a ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META to mark
metadata bio other than REQ_PRIO. This is why Nix noticed that bcache
does not cache metadata for XFS after the above commit.

Thanks to Dave Chinner, he explains the difference between REQ_META and
REQ_PRIO from view of file system developer. Here I quote part of his
explanation from mailing list,
   REQ_META is used for metadata. REQ_PRIO is used to communicate to
   the lower layers that the submitter considers this IO to be more
   important that non REQ_PRIO IO and so dispatch should be expedited.

   IOWs, if the filesystem considers metadata IO to be more important
   that user data IO, then it will use REQ_PRIO | REQ_META rather than
   just REQ_META.

Then it seems bios with REQ_META or REQ_PRIO should both be cached for
performance optimation, because they are all probably low I/O latency
demand by upper layer (e.g. file system).

So in this patch, when we want to decide whether to bypass the cache,
REQ_META and REQ_PRIO are both checked. Then both metadata and
high priority I/O requests will be handled properly.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Andre Noll <maan@tuebingen.mpg.de>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:33 -07:00
Coly Li a91fbda49f bcache: fix input overflow to cache set sysfs file io_error_halflife
Cache set sysfs entry io_error_halflife is used to set c->error_decay.
c->error_decay is in type unsigned int, and it is converted by
strtoul_or_return(), therefore overflow to c->error_decay is possible
for a large input value.

This patch fixes the overflow by using strtoul_safe_clamp() to convert
input string to an unsigned long value in range [0, UINT_MAX], then
divides by 88 and set it to c->error_decay.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:33 -07:00
Coly Li b15008403b bcache: fix input overflow to cache set io_error_limit
c->error_limit is in type unsigned int, it is set via cache set sysfs
file io_error_limit. Inside the bcache code, input string is converted
by strtoul_or_return() and set the converted value to c->error_limit.

Because the converted value is unsigned long, and c->error_limit is
unsigned int, if the input is large enought, overflow will happen to
c->error_limit.

This patch uses sysfs_strtoul_clamp() to convert input string, and set
the range in [0, UINT_MAX] to avoid the potential overflow.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li 453745fbbe bcache: fix input overflow to journal_delay_ms
c->journal_delay_ms is in type unsigned short, it is set via sysfs
interface and converted by sysfs_strtoul() from input string to
unsigned short value. Therefore overflow to unsigned short might be
happen when the converted value exceed USHRT_MAX. e.g. writing
65536 into sysfs file journal_delay_ms, c->journal_delay_ms is set to
0.

This patch uses sysfs_strtoul_clamp() to convert the input string and
limit value range in [0, USHRT_MAX], to avoid the input overflow.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li dab71b2db9 bcache: fix input overflow to writeback_rate_minimum
dc->writeback_rate_minimum is type unsigned integer variable, it is set
via sysfs interface, and converte from input string to unsigned integer
by d_strtoul_nonzero(). When the converted input value is larger than
UINT_MAX, overflow to unsigned integer happens.

This patch fixes the overflow by using sysfs_strotoul_clamp() to
convert input string and limit the value in range [1, UINT_MAX], then
the overflow can be avoided.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li 5b5fd3c94e bcache: fix potential div-zero error of writeback_rate_p_term_inverse
Current code already uses d_strtoul_nonzero() to convert input string
to an unsigned integer, to make sure writeback_rate_p_term_inverse
won't be zero value. But overflow may happen when converting input
string to an unsigned integer value by d_strtoul_nonzero(), then
dc->writeback_rate_p_term_inverse can still be set to 0 even if the
sysfs file input value is not zero, e.g. 4294967296 (a.k.a UINT_MAX+1).

If dc->writeback_rate_p_term_inverse is set to 0, it might cause a
dev-zero error in following code from __update_writeback_rate(),
	int64_t proportional_scaled =
		div_s64(error, dc->writeback_rate_p_term_inverse);

This patch replaces d_strtoul_nonzero() by sysfs_strtoul_clamp() and
limit the value range in [1, UINT_MAX]. Then the unsigned integer
overflow and dev-zero error can be avoided.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li c3b75a2199 bcache: fix potential div-zero error of writeback_rate_i_term_inverse
dc->writeback_rate_i_term_inverse can be set via sysfs interface. It is
in type unsigned int, and convert from input string by d_strtoul(). The
problem is d_strtoul() does not check valid range of the input, if
4294967296 is written into sysfs file writeback_rate_i_term_inverse,
an overflow of unsigned integer will happen and value 0 is set to
dc->writeback_rate_i_term_inverse.

In writeback.c:__update_writeback_rate(), there are following lines of
code,
      integral_scaled = div_s64(dc->writeback_rate_integral,
                      dc->writeback_rate_i_term_inverse);
If dc->writeback_rate_i_term_inverse is set to 0 via sysfs interface,
a div-zero error might be triggered in the above code.

Therefore we need to add a range limitation in the sysfs interface,
this is what this patch does, use sysfs_stroul_clamp() to replace
d_strtoul() and restrict the input range in [1, UINT_MAX].

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li 369d21a73a bcache: fix input overflow to writeback_delay
Sysfs file writeback_delay is used to configure dc->writeback_delay
which is type unsigned int. But bcache code uses sysfs_strtoul() to
convert the input string, therefore it might be overflowed if the input
value is too large. E.g. input value is 4294967296 but indeed 0 is
set to dc->writeback_delay.

This patch uses sysfs_strtoul_clamp() to convert the input string and
set the result value range in [0, UINT_MAX] to avoid such unsigned
integer overflow.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li f5c0b95d2e bcache: use sysfs_strtoul_bool() to set bit-field variables
When setting bcache parameters via sysfs, there are some variables are
defined as bit-field value. Current bcache code in sysfs.c uses either
d_strtoul() or sysfs_strtoul() to convert the input string to unsigned
integer value and set it to the corresponded bit-field value.

The problem is, the bit-field value only takes the lowest bit of the
converted value. If input is 2, the expected value (like bool value)
of the bit-field value should be 1, but indeed it is 0.

The following sysfs files for bit-field variables have such problem,
	bypass_torture_test,	for dc->bypass_torture_test
	writeback_metadata,	for dc->writeback_metadata
	writeback_running,	for dc->writeback_running
	verify,			for c->verify
	key_merging_disabled,	for c->key_merging_disabled
	gc_always_rewrite,	for c->gc_always_rewrite
	btree_shrinker_disabled,for c->shrinker_disabled
	copy_gc_enabled,	for c->copy_gc_enabled

This patch uses sysfs_strtoul_bool() to set such bit-field variables,
then if the converted value is non-zero, the bit-field variables will
be set to 1, like setting a bool value like expensive_debug_checks.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li e4db37fb69 bcache: add sysfs_strtoul_bool() for setting bit-field variables
When setting bool values via sysfs interface, e.g. writeback_metadata,
if writing 1 into writeback_metadata file, dc->writeback_metadata is
set to 1, but if writing 2 into the file, dc->writeback_metadata is
0. This is misleading, a better result should be 1 for all non-zero
input value.

It is because dc->writeback_metadata is a bit-field variable, and
current code simply use d_strtoul() to convert a string into integer
and takes the lowest bit value. To fix such error, we need a routine
to convert the input string into unsigned integer, and set target
variable to 1 if the converted integer is non-zero.

This patch introduces a new macro called sysfs_strtoul_bool(), it can
be used to convert input string into bool value, we can use it to set
bool value for bit-field vairables.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00
Coly Li 8c27a3953e bcache: fix input overflow to sequential_cutoff
People may set sequential_cutoff of a cached device via sysfs file,
but current code does not check input value overflow. E.g. if value
4294967295 (UINT_MAX) is written to file sequential_cutoff, its value
is 4GB, but if 4294967296 (UINT_MAX + 1) is written into, its value
will be 0. This is an unexpected behavior.

This patch replaces d_strtoi_h() by sysfs_strtoul_clamp() to convert
input string to unsigned integer value, and limit its range in
[0, UINT_MAX]. Then the input overflow can be fixed.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-09 07:18:32 -07:00