This action is meant to be passive, i.e. we should not alter
skb->nfct: If nfct is present just leave it alone.
Compile tested only.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the 0x0x prefix in an integer constant.
In this case, while at it, also fix a typo (s/unitcast/unicast/).
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the 0x0x prefix in an integer constant.
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Chris Snook <chris.snook@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt says:
====================
bnx2x: minor cleanups related to TPA bits
I noticed some simplification possibilities while looking into the bug
fixed by "bnx2x: really disable TPA if 'disable_tpa' option is set".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These flags are redundant with dev->features. Remove them.
Just make sure to set dev->features ourselves in bnx2x_set_features()
before performing the reload of the card.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is simpler to have the TPA mode as one three-state variable.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If disable_tpa is set, remove NETIF_F_LRO from hw_features, so ethtool sees
it as "off [fixed]".
Note that setting the NETIF_F_LRO bit in dev->features in the 'else'
branch is not needed, because the bit was already set by
bnx2x_init_dev().
Then the check for disable_tpa in bnx2x_fix_features() becomes unnecessary.
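The feature-mask idea above can be sketched as follows. This is only a toy
model: struct toy_dev, TOY_F_LRO and TOY_F_GRO are hypothetical stand-ins for
struct net_device and the NETIF_F_* bits, and clearing the bit in both masks
is an assumption about what the driver does when disable_tpa is set.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F_LRO (1u << 0) /* stand-in for NETIF_F_LRO, not the real bit */
#define TOY_F_GRO (1u << 1) /* stand-in for NETIF_F_GRO, not the real bit */

struct toy_dev {
	uint32_t hw_features; /* features the user is allowed to toggle */
	uint32_t features;    /* features currently enabled */
};

/* When disable_tpa is set, LRO is dropped from both masks, so it is off
 * and cannot be switched back on. Otherwise nothing extra is needed,
 * because the bit is already set in both masks. */
void toy_init_features(struct toy_dev *dev, bool disable_tpa)
{
	dev->hw_features = TOY_F_LRO | TOY_F_GRO;
	dev->features = dev->hw_features;
	if (disable_tpa) {
		dev->hw_features &= ~TOY_F_LRO;
		dev->features &= ~TOY_F_LRO;
	}
}

/* A feature that is off and missing from hw_features corresponds to what
 * ethtool reports as "off [fixed]". */
bool toy_is_off_fixed(const struct toy_dev *dev, uint32_t bit)
{
	return !(dev->features & bit) && !(dev->hw_features & bit);
}
```

With the bit absent from hw_features, a later "fix features" hook has nothing
left to veto, which is why the extra check becomes redundant.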
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Introduce netif-msg to netvsc to control debug logging output
and keep msg_enable in netvsc_device_context so that it is
kept persistently.
2. Only call dump_rndis_message() when NETIF_MSG_RX_ERR or above
is specified in netvsc module debug param.
In the current code, dump_rndis_message() dumps nothing in non-debug
mode, but it still initializes some local variables and runs the
switch logic. That work is unnecessary, especially under high
network throughput.
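A minimal sketch of the gating described in point 2, assuming a hypothetical
struct toy_ctx in place of netvsc_device_context and a hypothetical
TOY_MSG_RX_ERR bit in place of NETIF_MSG_RX_ERR. It is not the driver's code;
it only shows that the dump helper is skipped entirely when the bit is clear.

```c
#include <stdint.h>

#define TOY_MSG_RX_ERR 0x0040u /* stand-in for NETIF_MSG_RX_ERR */

struct toy_ctx {
	uint32_t msg_enable;     /* message-level bitmap kept per device */
	unsigned int dump_calls; /* how many times the dump helper ran */
};

/* Placeholder for the expensive dump routine. */
void toy_dump_message(struct toy_ctx *ctx)
{
	ctx->dump_calls++;
}

/* Receive path: only pay for the dump when RX error logging is enabled. */
void toy_receive(struct toy_ctx *ctx)
{
	if (ctx->msg_enable & TOY_MSG_RX_ERR)
		toy_dump_message(ctx);
}
```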
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 3cdaa5be9e ("ipv4: Don't
increase PMTU with Datagram Too Big message") broke PMTU in cases
where the rt_pmtu value has expired but is smaller than the new
PMTU value.
This obsolete rt_pmtu then prevents the new PMTU value from being
installed.
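The failure mode can be illustrated with a toy model. Everything here
(struct toy_route, toy_effective_mtu(), toy_update_pmtu()) is hypothetical and
not the kernel's routing code; the sketch only assumes that an expired PMTU
should be ignored when deciding whether a new value may be installed.

```c
#include <stdbool.h>

struct toy_route {
	unsigned int dev_mtu;  /* MTU of the outgoing device */
	unsigned int pmtu;     /* learned path MTU, 0 if none */
	unsigned long expires; /* time at which pmtu stops being valid */
};

/* Effective MTU: an expired pmtu is ignored in favour of dev_mtu. */
unsigned int toy_effective_mtu(const struct toy_route *rt, unsigned long now)
{
	if (rt->pmtu && now < rt->expires)
		return rt->pmtu;
	return rt->dev_mtu;
}

/* Accept a new PMTU only if it does not raise the effective MTU.
 * Comparing against the raw rt->pmtu instead would let a stale, smaller
 * value block the update. */
bool toy_update_pmtu(struct toy_route *rt, unsigned int new_mtu,
		     unsigned long now, unsigned long lifetime)
{
	if (toy_effective_mtu(rt, now) < new_mtu)
		return false;
	rt->pmtu = new_mtu;
	rt->expires = now + lifetime;
	return true;
}
```

In this model a route with an expired pmtu of 1000 and a device MTU of 1500
accepts a new PMTU of 1400, whereas a check against the raw value would
reject it.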
Fixes: 3cdaa5be9e ("ipv4: Don't increase PMTU with Datagram Too Big message")
Reported-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix a crash in nf_tables when dictionaries are used from the ruleset,
due to memory corruption, from Florian Westphal.
2) Fix another crash in nf_queue when used with br_netfilter. Also from
Florian.
Both fixes are related to new stuff that got in 4.0-rc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ALU64_DIV instruction should be dividing 64-bit by 64-bit,
whereas do_div() does 64-bit by 32-bit divide.
x64 and arm64 JITs correctly implement 64 by 64 unsigned divide.
llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64.
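The difference can be shown in plain userspace C. The two helpers below are
hypothetical; the 64-by-32 case is modelled by truncating the divisor to its
low 32 bits, which is an assumption about how a 64-bit divisor ends up being
treated when it is passed to a 64-by-32 divide.

```c
#include <stdint.h>

/* Models a 64-by-32 divide: the divisor is truncated to 32 bits first.
 * The caller must make sure the truncated divisor is nonzero. */
uint64_t div_64_by_32(uint64_t dividend, uint64_t divisor)
{
	return dividend / (uint32_t)divisor;
}

/* Full 64-by-64 unsigned divide. */
uint64_t div_64_by_64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

For a dividend of 1 << 40 and a divisor of (1 << 32) + 2, the full divide
gives 255, while the truncating version divides by 2 and gives 1 << 39.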
Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when the interface type is MAC to Phy, netif_carrier_(on/off)
is called. This is not needed, as the Phy lib already updates the carrier
status to the net stack. The call is needed only for other interface types.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) mlx4 doesn't check fully for supported valid RSS hash function, fix
from Amir Vadai
2) Off by one in ibmveth_change_mtu(), from David Gibson
3) Prevent altera chip from reporting false error interrupts in some
circumstances, from Chee Nouk Phoon
4) Get rid of that stupid endless loop trying to allocate a FIN packet
in TCP, and in the process kill deadlocks. From Eric Dumazet
5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
Eric Dumazet
6) Fix two bugs in async rhashtable resizing, from Thomas Graf
7) Fix topology server listener socket namespace bug in TIPC, from Ying
Xue
8) Add some missing HAS_DMA kconfig dependencies, from Geert
Uytterhoeven
9) bgmac driver intends to force re-polling but does so by returning
the wrong value from its ->poll() handler. Fix from Rafał Miłecki
10) When the creator of an rhashtable configures a max size for it,
don't bark in the logs and drop insertions when that is exceeded.
Fix from Johannes Berg
11) Recover from out of order packets in ppp mppe properly, from Sylvain
Rochet
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
bnx2x: really disable TPA if 'disable_tpa' option is set
net:treewide: Fix typo in drivers/net
net/mlx4_en: Prevent setting invalid RSS hash function
mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
netfilter; Add some missing default cases to switch statements in nft_reject.
ppp: mppe: discard late packet in stateless mode
ppp: mppe: sanity error path rework
net/bonding: Make DRV macros private
net: rfs: fix crash in get_rps_cpus()
altera tse: add support for fixed-links.
pxa168: fix double deallocation of managed resources
net: fix crash in build_skb()
net: eth: altera: Resolve false errors from MSGDMA to TSE
ehea: Fix memory hook reference counting crashes
net/tg3: Release IRQs on permanent error
net: mdio-gpio: support access that may sleep
inet: fix possible panic in reqsk_queue_unlink()
rhashtable: don't attempt to grow when at max_size
bgmac: fix requests for extra polling calls from NAPI
tcp: avoid looping in tcp_send_fin()
...
bnx2x's 'disable_tpa=1' module option is not respected properly and TPA
(transparent packet aggregation) remains enabled. Even though the
module option causes LRO to be disabled, TPA is enabled in GRO mode.
Additionally, disabling GRO via ethtool then has no effect. One can
still observe tpa_* statistics increase and large packets being received
in tcpdump.
The bug was an unintended consequence of commit aebf6244cd "bnx2x: Be
more forgiving toward SW GRO".
Fix it by following the bp->disable_tpa flag when initializing fp's.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_check_rxfh_func() checked for hardware support before
setting a known RSS hash function, but did no check at all before
setting an unknown RSS hash function. Make it fail on such values.
While at it, move the actual setting of the new value from the
check function into mlx4_en_set_rxfh().
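The check/set split can be sketched like this. The enum, struct toy_caps and
both functions are hypothetical, and the particular error codes are an
assumption; the point is only that the checker rejects unknown values and
leaves storing the value to the setter.

```c
#include <errno.h>
#include <stdbool.h>

enum toy_hfunc { TOY_HASH_TOP, TOY_HASH_XOR };

struct toy_caps {
	bool top_supported;
	bool xor_supported;
};

/* Pure check: 0 if hfunc is known and supported, negative errno if not.
 * It stores nothing. */
int toy_check_hfunc(const struct toy_caps *caps, int hfunc)
{
	switch (hfunc) {
	case TOY_HASH_TOP:
		return caps->top_supported ? 0 : -EOPNOTSUPP;
	case TOY_HASH_XOR:
		return caps->xor_supported ? 0 : -EOPNOTSUPP;
	default:
		return -EINVAL; /* unknown values are rejected */
	}
}

/* The setter, not the checker, applies the new value. */
int toy_set_hfunc(const struct toy_caps *caps, int *current, int hfunc)
{
	int err = toy_check_hfunc(caps, hfunc);

	if (err)
		return err;
	*current = hfunc;
	return 0;
}
```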
Fixes: 947cbb0 ("net/mlx4_en: Support for configurable RSS hash function")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new gpiod_get_array and gpiod_put_array functions
(added to mainline in the v4.1 merge window) for obtaining and
disposing of GPIO descriptors.
Cc: David Miller <davem@davemloft.net>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes:
====================
net/netfilter/nft_reject.c: In function ‘nft_reject_dump’:
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
switch (priv->type) {
^
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswitch]
net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’:
net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
switch (priv->type) {
^
====================
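A sketch of the kind of change that silences these warnings. The enum and
function are hypothetical stand-ins, and which enumerators the real dump
routines handle explicitly is an assumption; the point is that a default case
covers the values the switch does not care about.

```c
enum toy_reject_type {
	TOY_REJECT_ICMP_UNREACH,
	TOY_REJECT_TCP_RST,
	TOY_REJECT_ICMPX_UNREACH,
};

/* Returns 1 if the dump routine emits an extra attribute for this type.
 * Without the default case, -Wswitch warns about every enumerator that
 * has no case label. */
int toy_dump_has_code(enum toy_reject_type type)
{
	switch (type) {
	case TOY_REJECT_ICMP_UNREACH:
		return 1;
	default:
		return 0;
	}
}
```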
Signed-off-by: David S. Miller <davem@davemloft.net>
Sylvain Rochet says:
====================
ppp: mppe: fixes MPPE desync on links which don't guarantee packet ordering
I am currently having an issue with PPP over L2TP (UDP) and MPPE in
stateless mode (the default mode). UDP does not guarantee packet ordering,
so we might get out-of-order packets. MPPE needs to be continuously synched,
so we should drop late UDP packets.
I added a printk on the number of times we rekeyed in the MPPE decompressor.
This is what we currently get if we receive a slightly out-of-order UDP
packet:
[1731001.049206] mppe_decompress[1]: ccount 1559
[1731001.049216] mppe_decompress[1]: rekeyed 1 times
[1731001.049228] mppe_decompress[1]: ccount 1560
[1731001.049232] mppe_decompress[1]: rekeyed 1 times
[1731001.050170] mppe_decompress[1]: ccount 1562
[1731001.050182] mppe_decompress[1]: rekeyed 2 times
[1731001.050191] mppe_decompress[1]: ccount 1561
[1731001.062576] mppe_decompress[1]: rekeyed 4095 times
^^^^
This is obviously wrong: we missed packet 1561 and we already rekeyed 2
times for 1562, which we received earlier. We can't recover the decryption
key we need for 1561, so we should drop it instead of rekeying 4095 times.
This patch series drops any packets which are not within the 4096/2 forward
window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When PPP is used over a link which does not guarantee packet ordering,
we might get late MPPE packets. This is a problem because MPPE must be
kept synchronized, and the current implementation does not drop them but
rekeys 4095 times instead of 0, which is wrong.
To avoid rekeying for nearly the whole count space (~ 4095
times), drop packets which are not within the forward 4096/2 window and
increase the sanity error counter.
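The window test can be sketched in a few lines. The helper names and the
CCOUNT_SPACE constant are hypothetical; the sketch assumes a 4096-value count
space, as implied by the 4096/2 window above, and is not the decompressor's
actual code.

```c
#include <stdbool.h>

#define CCOUNT_SPACE 0x1000u /* 4096 possible count values */

/* Number of rekeys needed to advance from the decompressor's count to the
 * received one, modulo the count space. Unsigned wraparound makes a count
 * that is one behind come out as 4095. */
unsigned int rekeys_needed(unsigned int state_ccount, unsigned int rx_ccount)
{
	return (rx_ccount - state_ccount) % CCOUNT_SPACE;
}

/* A packet is treated as late when it falls outside the forward half of
 * the count space. */
bool is_late_packet(unsigned int state_ccount, unsigned int rx_ccount)
{
	return rekeys_needed(state_ccount, rx_ccount) > CCOUNT_SPACE / 2;
}
```

With the decompressor at count 1562, a packet with count 1561 would need 4095
rekeys and is classified as late, while going from 1560 to 1562 needs only 2.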
Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are going to need the sanity error path a little further on. Rework it
so that it can be used anywhere in the decompressor.
Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding module currently defines four macros with
general names that pollute the global namespace:
DRV_VERSION
DRV_RELDATE
DRV_NAME
DRV_DESCRIPTION
Fix that by defining a private bonding_priv.h
header file which includes those defines.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.
Work around the issue by setting SS to __KERNEL_DS in __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.
This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull intel drm fixes from Dave Airlie.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers
Pull intel iommu updates from David Woodhouse:
"This lays a little of the groundwork for upcoming Shared Virtual
Memory support — fixing some bogus #defines for capability bits and
adding the new ones, and starting to use the new wider page tables
where we can, in anticipation of actually filling in the new fields
therein.
It also allows graphics devices to be assigned to VM guests again.
This got broken in 3.17 by disallowing assignment of RMRR-afflicted
devices. Like USB, we do understand why there's an RMRR for graphics
devices — and unlike USB, it's actually sane. So we can make an
exception for graphics devices, just as we do USB controllers.
Finally, tone down the warning about the X2APIC_OPT_OUT bit, due to
persistent requests. X2APIC_OPT_OUT was added to the spec as a nasty
hack to allow broken BIOSes to forbid us from using X2APIC when they
do stupid and invasive things and would break if we did.
Someone noticed that since Windows doesn't have full IOMMU support for
DMA protection, setting the X2APIC_OPT_OUT bit made Windows avoid
initialising the IOMMU on the graphics unit altogether.
This means that it would be available for use in "driver mode", where
the IOMMU registers are made available through a BAR of the graphics
device and the graphics driver can do SVM all for itself.
So they started setting the X2APIC_OPT_OUT bit on *all* platforms with
SVM capabilities. And even the platforms which *might*, if the
planets had been aligned correctly, possibly have had SVM capability
but which in practice actually don't"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: support extended root and context entries
iommu/vt-d: Add new extended capabilities from v2.3 VT-d specification
iommu/vt-d: Allow RMRR on graphics devices too
iommu/vt-d: Print x2apic opt out info instead of printing a warning
iommu/vt-d: kill bogus ecap_niotlb_iunits()
Pull i2c fixes from Wolfram Sang:
"This has a mixture of merge window cleanups and bugfixes"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: st: add include for pinctrl
i2c: mux: use proper dev when removing "channel-X" symlinks
i2c: digicolor: remove duplicate include
i2c: Mark adapter devices with pm_runtime_no_callbacks
i2c: pca-platform: fix broken email address
i2c: mxs: fix broken email address
i2c: rk3x: report number of messages transmitted
Pull btrfs fixes from Chris Mason:
"Filipe hit two problems in my block group cache patches. We finalized
the fixes last week and ran through more tests"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: prevent list corruption during free space cache processing
Btrfs: fix inode cache writeout
three fixes for i915.
* tag 'drm-intel-next-fixes-2015-04-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers
Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.
Highlights include:
Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping
Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()
Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"
* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Merge tag 'pm+acpi-4.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
"These are fixes mostly (intel_pstate, ACPI core, ACPI EC driver,
cpupower tool), a new CPU ID for the Intel RAPL driver and one
intel_pstate driver improvement that didn't make it to my previous
pull requests due to timing.
Specifics:
- Fix a build warning in the intel_pstate driver showing up in
non-SMP builds (Borislav Petkov)
- Change one of the intel_pstate's P-state selection parameters for
Baytrail and Cherrytrail CPUs to significantly improve performance
at the cost of a small increase in energy consumption (Kristen
Carlson Accardi)
- Fix a NULL pointer dereference in the ACPI EC driver due to an
unsafe list walk in the query handler removal routine (Chris
Bainbridge)
- Get rid of a false-positive lockdep warning in the ACPI container
hot-remove code (Rafael J Wysocki)
- Prevent the ACPI device enumeration code from creating device
objects of a wrong type in some cases (Rafael J Wysocki)
- Add Skylake processors support to the Intel RAPL power capping
driver (Brian Bian)
- Drop the stale MAINTAINERS entry for the ACPI dock driver that is
regarded as part of the ACPI core and maintained along with it now
(Chao Yu)
- Fix cpupower tool breakage caused by a library API change in libpci
3.3.0 (Lucas Stach)"
* tag 'pm+acpi-4.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / scan: Add a scan handler for PRP0001
ACPI / scan: Annotate physical_node_lock in acpi_scan_is_offline()
ACPI / EC: fix NULL pointer dereference in acpi_ec_remove_query_handler()
MAINTAINERS: remove maintainship entry of docking station driver
powercap / RAPL: Add support for Intel Skylake processors
cpufreq: intel_pstate: Fix an annoying !CONFIG_SMP warning
intel_pstate: Change the setpoint for Atom params
cpupower: fix breakage from libpci API change
Pull crypto fixes from Herbert Xu:
"This push fixes a build problem with img-hash under non-standard
configurations and a serious regression with sha512_ssse3 which can
lead to boot failures"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA
crypto: x86/sha512_ssse3 - fixup for asm function prototype change
Merge tag 'platform-drivers-x86-v4.1-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
"This series includes significant updates to the toshiba_acpi driver
and the reintroduction of the dell-laptop keyboard backlight additions
I had to revert previously. Also included are various fixes for
typos, warnings, correctness, and minor bugs.
Specifics:
dell-laptop:
- add support for keyboard backlight.
toshiba_acpi:
- adaptive keyboard, hotkey, USB sleep and charge, and backlight
updates. Update sysfs documentation.
toshiba_bluetooth:
- fix enabling/disabling loop on recent devices
apple-gmux:
- lock iGP IO to protect from vgaarb changes
other:
- Fix typos, clear gcc warnings, clarify pr_* messages, correct
return types, update MAINTAINERS"
* tag 'platform-drivers-x86-v4.1-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (25 commits)
toshiba_acpi: Do not register vendor backlight when acpi_video bl is available
MAINTAINERS: Add me on list of Dell laptop drivers
platform: x86: dell-laptop: Add support for keyboard backlight
Documentation/ABI: Update sysfs-driver-toshiba_acpi entry
toshiba_acpi: Fix pr_* messages from USB Sleep Functions
toshiba_acpi: Update and fix USB Sleep and Charge modes
wmi: Use bool function return values of true/false not 1/0
toshiba_bluetooth: Fix enabling/disabling loop on recent devices
toshiba_bluetooth: Clean up *_add function and disable BT device at removal
toshiba_bluetooth: Add three new functions to the driver
toshiba_acpi: Fix the enabling of the Special Functions
toshiba_acpi: Use the Hotkey Event Type function for keymap choosing
toshiba_acpi: Add Hotkey Event Type function and definitions
x86/wmi: delete unused wmi_data_lock mutex causing gcc warning
apple-gmux: lock iGP IO to protect from vgaarb changes
MAINTAINERS: Add missing Toshiba devices and add myself as maintainer
toshiba_acpi: Update events in toshiba_acpi_notify
intel-oaktrail: Fix trivial typo in comment
thinkpad_acpi: off by one in adaptive_keyboard_hotkey_notify_hotkey()
thinkpad_acpi: signedness bugs getting current_mode
...
Merge tag 'chrome-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson:
"Here's a set of updates to the Chrome OS platform drivers for this
merge window.
The main new things this cycle are:
- Driver changes to expose the lightbar to users. With this, you can
make your own blinkenlights on Chromebook Pixels.
- Changes in the way that the atmel_mxt trackpads are probed. The
laptop driver is trying to be smart and not instantiate the devices
that don't answer to probe. For the trackpad that can come up in
two modes (bootloader or regular), this gets complicated since the
driver already knows how to handle the two modes including the
actual addresses used. So now the laptop driver needs to know more
too, instantiating the regular address even if the bootloader one
is the probe that passed.
- mfd driver improvements by Javier Martinez Canillas, and a few
bugfixes from him, kbuild and myself"
* tag 'chrome-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
platform/chrome: chromeos_laptop - instantiate Atmel at primary address
platform/chrome: cros_ec_lpc - Depend on X86 || COMPILE_TEST
platform/chrome: cros_ec_lpc - Include linux/io.h header file
platform/chrome: fix platform_no_drv_owner.cocci warnings
platform/chrome: cros_ec_lightbar - fix duplicate const warning
platform/chrome: cros_ec_dev - fix Unknown escape '%' warning
platform/chrome: Expose Chrome OS Lightbar to users
platform/chrome: Create sysfs attributes for the ChromeOS EC
mfd: cros_ec: Instantiate ChromeOS EC character device
platform/chrome: Add Chrome OS EC userspace device interface
platform/chrome: Add cros_ec_lpc driver for x86 devices
mfd: cros_ec: Add char dev and virtual dev pointers
mfd: cros_ec: Use fixed size arrays to transfer data with the EC
Merge tag 'cris-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris
Pull arch/cris updates from Jesper Nilsson:
"Some much needed love for the CRIS-port.
There's a bunch of changes this time, giving the CRISv32 port a bit of
modern makeover with device-tree, irq domain and gpiolib support, and
more switchover to generic frameworks.
Some small fixes and removal of the theoretical SMP support brings up
the rear"
* tag 'cris-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris:
cris: fix integer overflow in ELF_ET_DYN_BASE
CRISv32: use GENERIC_SCHED_CLOCK
CRISv32: use MMIO clocksource
CRISv32: use generic clockevents
CRIS: use generic headers via Kbuild
CRIS: use generic cmpxchg.h
CRIS: use generic atomic.h
CRIS: use generic atomic bitops
CRISv10: remove redundant macros from system.h
CRIS: remove SMP code
CRISv32: don't enable irqs in INIT_THREAD
CRISv32: handle multiple signals
CRISv32: prevent bogus restarts on sigreturn
CRISv32: don't attempt syscall restart on irq exit
Add binding documentation for CRIS
CRIS: add Axis 88 board device tree
CRISv32: add device tree support
CRISv32: add irq domains support
CRIS: enable GPIOLIB
Merge tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- fix for mm_dec_nr_pmds() from Scott.
- fixes for oopses seen with KVM + THP from Aneesh.
- build fixes from Aneesh & Shreyas.
* tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error
powerpc/mm/thp: Return pte address if we find trans_splitting.
powerpc/mm/thp: Make page table walk safe against thp split/collapse
KVM: PPC: Remove page table walk helpers
KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
Commit 567e4b7973 ("net: rfs: add hash collision detection") had one
mistake:
RPS_NO_CPU is no longer the marker for an invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu is the result of an AND with
rps_cpu_mask.
This bug showed up on a host with 72 cpus:
next_cpu was 0x7f, and the code was trying to access percpu data of a
nonexistent cpu.
In a follow-up patch, we might get rid of the compares against
nr_cpu_ids if we init the tables with 0. It is silly to test for a very
unlikely condition that exists only shortly after table initialization,
as we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle: when the table is
old enough, it never contains this value anymore.
Fixes: 567e4b7973 ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
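The arithmetic behind the 0x7f value can be reproduced in a small
userspace sketch. It assumes that rps_cpu_mask is one less than the next
power of two at or above nr_cpu_ids and that RPS_NO_CPU is 0xffff; the
helper names below are hypothetical and are not the kernel's functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define RPS_NO_CPU 0xffffu	/* assumed value of the marker named above */

/* Assumption: the mask is (next power of two >= nr_cpu_ids) - 1. */
static uint32_t rps_cpu_mask_for(uint32_t nr_cpu_ids)
{
	uint32_t pow2 = 1;

	while (pow2 < nr_cpu_ids)
		pow2 <<= 1;
	return pow2 - 1;
}

/* Old-style check: a masked value no longer equals RPS_NO_CPU. */
static bool cpu_valid_old(uint32_t next_cpu)
{
	return next_cpu != RPS_NO_CPU;
}

/* Check against the number of cpu ids instead of the magic marker. */
static bool cpu_valid_new(uint32_t next_cpu, uint32_t nr_cpu_ids)
{
	return next_cpu < nr_cpu_ids;
}
```

With 72 cpus the mask is 0x7f, so a stored RPS_NO_CPU reads back as
0x7f: the old comparison accepts it, while the comparison against
nr_cpu_ids rejects it.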
Add support for fixed-links in configurations without PHY.
(e.g. connection to a switch, SGMII point to point, SFPs)
Check: Documentation/devicetree/bindings/net/fixed-link.txt.
Signed-off-by: Andreas Oetken <ennoerlangen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM changes from Paolo Bonzini:
"This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and
some cleanups). But there are also bug fixes and small cleanups for
ARM, x86 and s390.
The task_migration_notifier revert and real fix is still pending
review, but I'll send it as soon as possible after -rc1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm/arm64: check IRQ number on userland injection
KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
KVM: VMX: Preserve host CR4.MCE value while in guest mode.
KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
KVM: PPC: Book3S HV: Streamline guest entry and exit
KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
KVM: PPC: Book3S HV: Use decrementer to wake napping threads
KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
KVM: PPC: Book3S HV: Minor cleanups
KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
KVM: PPC: Book3S HV: Add ICP real mode counters
KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
KVM: PPC: Book3S HV: Add guest->host real mode completion counters
KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
...
Commit 43d3ddf87a ("net: pxa168_eth: add device tree support") starts
to use managed resources by adding devm_clk_get() and
devm_ioremap_resource(), but it leaves explicit iounmap() and clk_put()
in pxa168_eth_remove() and in failure handling code of pxa168_eth_probe().
As a result, a double free can happen.
The patch removes explicit resource deallocation. Also it converts
clk_disable() to clk_disable_unprepare() to make it symmetrical with
clk_prepare_enable().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch resolves false errors from MSGDMA in TX mSGDMA MM to ST
mode, and is a continuation of the patch recently submitted by Andreas
Oetken. The MSGDMA had a logic bug that masked detection of this issue
prior to Quartus 14.1/Build 164. When the MSGDMA logic bug was addressed
in Quartus 14.1/Build 164, the driver problem was exposed.
The problem is corrected by making sure MSGDMA_DESC_CTL_TR_ERR_IRQ is not
set for any of the transmit DMA descriptors, and only used for receive
descriptors.
Fixes: 71cd26e altera tse: Error-Bit on tx-avalon-stream always set.
Signed-off-by: Chee Nouk Phoon <cnphoon@altera.com>
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Cc: Andreas Oetken <ennoerlangen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent commit to only register the EHEA memory hotplug hooks on
adapter probe has a few problems.
Firstly the reference counting is wrong for multiple adapters, in that
the hooks are registered multiple times. Secondly the check in the tear
down path is backward. Finally the error path doesn't decrement the
count.
The multiple registration of the hooks is the biggest problem, as it
leads to oopses when the system is rebooted, and/or errors during memory
hotplug, e.g.:
$ ./mem-on-off-test.sh -r 2
...
ehea: memory is going offline
ehea: LPAR memory changed - re-initializing driver
ehea: re-initializing driver complete
ehea: memory is going offline
ehea: LPAR memory changed - re-initializing driver
ehea: opcode=26c ret=fffffffffffffffc arg1=8000000003000003 arg2=0 arg3=700000060000d600 arg4=3fded0000 arg5=200 arg6=0 arg7=0
ehea: register_rpage_mr failed
ehea: registering mr failed
ehea: register MR failed - driver inoperable!
ehea: memory is going offline
Fixes: aa18332331 ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
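The three reference-counting rules described above (register once, tear
down when the last adapter goes away, drop the count on a failed probe)
can be sketched in userspace C. All names here are hypothetical
stand-ins, not the ehea driver's actual functions or variables.

```c
#include <stdbool.h>

static int adapter_count;	/* hypothetical: adapters currently probed */
static int hooks_registered;	/* how many times the hooks are registered */

static void register_hooks(void)   { hooks_registered++; }
static void unregister_hooks(void) { hooks_registered--; }

/* Probe: only the first adapter registers the hooks. */
static int adapter_probe(bool setup_ok)
{
	if (adapter_count++ == 0)
		register_hooks();

	if (!setup_ok) {
		/* Error path: give the reference back. */
		if (--adapter_count == 0)
			unregister_hooks();
		return -1;
	}
	return 0;
}

/* Remove: only the last adapter unregisters the hooks. */
static void adapter_remove(void)
{
	if (--adapter_count == 0)
		unregister_hooks();
}
```

In this sketch the hooks stay registered exactly once regardless of how
many adapters are probed, and a failed probe leaves the count unchanged.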
When a permanent EEH error occurs, the PCI device is removed from the
system. In this case we must not set pcierr_recovery to true, because
that prevents the driver from releasing the allocated interrupts and
their handlers. Eventually, MSI or MSI-X cannot be disabled because the
MSI or MSI-X interrupts still have associated interrupt actions, which
results in the following stack dump.
Oops: Exception in kernel mode, sig: 5 [#1]
:
[c0000000003b76a8] .free_msi_irqs+0x80/0x1a0 (unreliable)
[c00000000039f388] .pci_remove_bus_device+0x98/0x110
[c0000000000790f4] .pcibios_remove_pci_devices+0x9c/0x128
[c000000000077b98] .handle_eeh_events+0x2d8/0x4b0
[c0000000000782d0] .eeh_event_handler+0x130/0x1c0
[c000000000022bd4] .kernel_thread+0x54/0x70
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new Atmel MXT driver expects the i2c client's address to contain the
primary (main) address of the chip, and calculates the expected
bootloader address from the primary address. Unfortunately, chrome_laptop
does probe the devices, and if the touchpad (or touchscreen, or both)
comes up in bootloader mode, the i2c device gets instantiated with the
bootloader address, which confuses the driver.
To work around this issue, probe the primary address first. If the
device is not detected at the primary address, probe the alternative
addresses as "dummy" devices. If any of them is found, destroy the dummy
client and still instantiate the client with the proper name at the
primary address.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
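The probing order described above can be sketched as a small userspace
model. The probe callback, the mock bus and every address used with it
are hypothetical; the real code works on i2c clients, which this sketch
does not attempt to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an i2c probe: true if a device answers. */
typedef bool (*probe_fn)(unsigned short addr);

/* Mock bus for illustration: exactly one address answers. */
static unsigned short present_addr;

static bool mock_probe(unsigned short addr)
{
	return addr == present_addr;
}

/*
 * Returns the address the real client should be instantiated at,
 * or 0 if no device answered at any candidate address.
 */
static unsigned short pick_client_addr(probe_fn probe,
				       unsigned short primary,
				       const unsigned short *alt,
				       size_t n_alt)
{
	size_t i;

	if (probe(primary))
		return primary;

	for (i = 0; i < n_alt; i++) {
		/* Answered in bootloader mode: still use the primary. */
		if (probe(alt[i]))
			return primary;
	}
	return 0;
}
```

Whichever address answers, the client always ends up at the primary
address, which is what the driver expects.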
Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something. And in such cases we end up
with excessive mntput().
Cc: stable@vger.kernel.org # since 2.6.39
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I_DIO_WAKEUP is never directly used, but fix it up anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.
For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:
clat percentiles (usec):
| 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34],
| 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35],
| 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80],
| 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155],
| 99.99th=[ 165]
After:
clat percentiles (usec):
| 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149],
| 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171],
| 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270],
| 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422],
| 99.99th=[ 438]
In other setups, Robert Elliott reported seeing good performance
improvements:
https://lkml.org/lkml/2015/4/3/557
The more applications accessing the device, the worse it gets.
Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
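The effect of the flag can be illustrated with a minimal userspace
sketch using C11 atomics. The struct, the helper names and the flag's
numeric value are assumptions made for illustration; only the flag name
and the i_dio_count field name come from the message above.

```c
#include <stdatomic.h>

#define DIO_SKIP_DIO_COUNT 0x08	/* name from the commit; value is illustrative */

/* Hypothetical stand-in for the part of struct inode that matters here. */
struct inode_sketch {
	atomic_int i_dio_count;
};

/* Start of an IO: touch the shared counter only if the caller needs it. */
static void dio_begin(struct inode_sketch *inode, int flags)
{
	if (!(flags & DIO_SKIP_DIO_COUNT))
		atomic_fetch_add(&inode->i_dio_count, 1);
}

/* End of an IO: mirror of dio_begin(). */
static void dio_end(struct inode_sketch *inode, int flags)
{
	if (!(flags & DIO_SKIP_DIO_COUNT))
		atomic_fetch_sub(&inode->i_dio_count, 1);
}
```

Callers that pass the flag never write to the shared counter, so it
stops being a point of contention between them.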