Commit graph

753598 commits

Author SHA1 Message Date
Heiner Kallweit
9a3c81fa61 r8169: replace magic number for INTT mask with a constant
Use a proper constant for INTT bit mask.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:38:19 -04:00
Heiner Kallweit
a3984578bf r8169: improve rtl8169_set_features
__rtl8169_set_features is used in rtl8169_set_features only, so we
can inline it. In addition:
- Remove check (features ^ dev->features), __netdev_update_features
  check's already that requested features differ from current ones.
- Don't mask out unsupported flags, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:38:19 -04:00
Heiner Kallweit
e66267483e r8169: remove unneeded call to __rtl8169_set_features in rtl_open
RxChkSum and RxVlan aren't touched outside __rtl8169_set_features
(except in probe), so they are always in sync with dev->features.
And the RxConfig flags are set in rtl_set_rx_mode() which is
called via dev_set_rx_mode() from __dev_open().
Therefore we can safely remove this call.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:38:19 -04:00
Colin Ian King
f944ad1b2b net: ethernet: ucc: fix spelling mistake: "tx-late-collsion" -> "tx-late-collision"
Trivial fix to spelling mistake in tx_fw_stat_gstrings text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:29:39 -04:00
Colin Ian King
76c2a96d42 liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"
Trivial fix to spelling mistake in oct_stats_strings text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:29:09 -04:00
Colin Ian King
ff81de73e4 qed: fix spelling mistake: "checksumed" -> "checksummed"
Trivial fix to spelling mistake in DP_INFO message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:28:22 -04:00
David S. Miller
dbafb0c488 Merge branch 'liquidio-enhanced-ethtool-set-channels-feature'
Intiyaz Basha says:

====================
liquidio: enhanced ethtool --set-channels feature

For the ethtool --set-channels feature, the liquidio driver currently
accepts max combined value as the queue count configured during driver
load time, where max combined count is the total count of input and output
queues. This limitation is applicable only when SR-IOV is enabled, that
is, when VFs are created for PF. If SR-IOV is not enabled, the driver can
configure max supported (64) queues.

This series of patches are for enhancing driver to accept
max supported queues for ethtool --set-channels.

Changes in V2:
  Only patch #6 was changed to fix these Sparse warnings reported by kbuild
  test robot:
    lio_ethtool.c:848:5: warning: symbol 'lio_23xx_reconfigure_queue_count'
                         was not declared. Should it be static?
    lio_ethtool.c:877:22: warning: incorrect type in assignment (different
                          base types)
    lio_ethtool.c:878:22: warning: incorrect type in assignment (different
                          base types)
    lio_ethtool.c:879:22: warning: incorrect type in assignment (different
                          base types)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:29 -04:00
Intiyaz Basha
c33c997346 liquidio: enhanced ethtool --set-channels feature
Enhancing driver to accept max supported queues for ethtool --set-channels

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:29 -04:00
Intiyaz Basha
128ea39439 liquidio: Moved common function setup_glists to lio_core.c
Moved common function setup_glists to lio_core.c
and reamed it to lio_setup_glists

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:28 -04:00
Intiyaz Basha
a72b2c8ced liquidio: Moved common definition octnic_gather to octeon_network.h
Moving common definition octnic_gather to octeon_network.h

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:28 -04:00
Intiyaz Basha
fd311f1e75 liquidio: Moved common function delete_glists to lio_core.c
Moved common function delete_glists to lio_core.c
and renamed it to lio_delete_glists

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:28 -04:00
Intiyaz Basha
85a0cd8186 liquidio: Moved common function list_delete_head to octeon_network.h
Moved common function list_delete_head to octeon_network.h
and renamed it to lio_list_delete_head

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:28 -04:00
Intiyaz Basha
592a4cebc2 liquidio: Moved common function if_cfg_callback to lio_core.c
Moved common function if_cfg_callback to lio_core.c
and renamed it to lio_if_cfg_callback.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 09:26:28 -04:00
Daniel Borkmann
081023a311 Merge branch 'bpf-formatting-fixes-helpers'
Quentin Monnet says:

====================
Here is a follow-up set for eBPF helper documentation, with two patches
to fix formatting issues:

- One to fix an error I introduced with the initial set for the doc.
- One for the newly added bpf_get_stack(), that is currently not parsed
  correctly with the Python script (function signature is fine, but
  description and return value appear as empty).

There is no change to text contents in this set.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:13 +02:00
Quentin Monnet
a56497d3d5 bpf: update bpf.h uapi header for tools
Bring fixes for eBPF helper documentation formatting to bpf.h under
tools/ as well.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:12 +02:00
Quentin Monnet
79552fbc0f bpf: fix formatting for bpf_get_stack() helper doc
Fix formatting (indent) for bpf_get_stack() helper documentation, so
that the doc is rendered correctly with the Python script.

Fixes: c195651e56 ("bpf: add bpf_get_stack helper")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:12 +02:00
Quentin Monnet
3bd5a09b52 bpf: fix formatting for bpf_perf_event_read() helper doc
Some edits brought to the last iteration of BPF helper functions
documentation introduced an error with RST formatting. As a result, most
of one paragraph is rendered in bold text when only the name of a helper
should be. Fix it, and fix formatting of another function name in the
same paragraph.

Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:11 +02:00
Alexei Starovoitov
4d220ed0f8 bpf: remove tracepoints from bpf core
tracepoints to bpf core were added as a way to provide introspection
to bpf programs and maps, but after some time it became clear that
this approach is inadequate, so prog_id, map_id and corresponding
get_next_id, get_fd_by_id, get_info_by_fd, prog_query APIs were
introduced and fully adopted by bpftool and other applications.
The tracepoints in bpf core started to rot and causing syzbot warnings:
WARNING: CPU: 0 PID: 3008 at kernel/trace/trace_event_perf.c:274
Kernel panic - not syncing: panic_on_warn set ...
perf_trace_bpf_map_keyval+0x260/0xbd0 include/trace/events/bpf.h:228
trace_bpf_map_update_elem include/trace/events/bpf.h:274 [inline]
map_update_elem kernel/bpf/syscall.c:597 [inline]
SYSC_bpf kernel/bpf/syscall.c:1478 [inline]
Hence this patch deletes tracepoints in bpf core.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: syzbot <bot+a9dbb3c3e64b62536a4bc5ee7bbd4ca627566188@syzkaller.appspotmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 10:55:56 +02:00
Yonghong Song
34745aed51 samples/bpf: fix kprobe attachment issue on x64
Commit d5a00528b5 ("syscalls/core, syscalls/x86: Rename
struct pt_regs-based sys_*() to __x64_sys_*()") renamed a lot
of syscall function sys_*() to __x64_sys_*().
This caused several kprobe based samples/bpf tests failing.

This patch fixed the problem in bpf_load.c.
For x86_64 architecture, function name __x64_sys_*() will be
first used for kprobe event creation. If the creation is successful,
it will be used. Otherwise, function name sys_*() will be used
for kprobe event creation.

Fixes: d5a00528b5 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 20:36:53 -07:00
Florian Fainelli
3ac305c386 net: core: Assert the size of netdev_featres_t
We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:50:36 -04:00
Marcelo Ricardo Leitner
2cb5fb1454 MAINTAINERS: add myself as SCTP co-maintainer
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:49:45 -04:00
Colin Ian King
14b7dc18ee net: systemport: fix spelling mistake: "asymetric" -> "asymmetric"
Trivial fix to spelling mistake in netdev_warn warning message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:49:03 -04:00
David S. Miller
c1b28847f7 Merge branch 'net-cleanup-skb_tx_hash'
Alexander Duyck says:

====================
Clean up users of skb_tx_hash and __skb_tx_hash

I am in the process of doing some work to try and enable macvlan Tx queue
selection without using ndo_select_queue. As a part of that I will likely
need to make changes to skb_tx_hash. As such this is a clean up or refactor
of the two spots where he function has been used. In both cases it didn't
really seem like the function was being used correctly so I have updated
both code paths to not make use of the function.

My current development environment doesn't have an mlx4 or OPA vnic
available so the changes to those have been build tested only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:33 -04:00
Alexander Duyck
1b837d489e net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:33 -04:00
Alexander Duyck
b86629ebd5 mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
The code in the fallback path has supported XDP in conjunction with the Tx
traffic classification for TCs for over a year now. So instead of just
calling skb_tx_hash for every packet we are better off using the fallback
since that will record the Tx queue to the socket and then that can be used
instead of having to recompute the hash every time.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:32 -04:00
Alexander Duyck
d80d8d5573 opa_vnic: Just use skb_get_hash instead of skb_tx_hash
This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.

The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 14 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get a 8 bit value for
entropy.

Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the 32
bit value into a 8 bit value using a pair of shifts and XOR operations.

Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.

In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:32 -04:00
David S. Miller
f90652841f Merge branch 'lan78xx-fixed-phy'
Raghuram Chary J says:

====================
lan78xx updates along with Fixed phy Support

These series of patches handle few modifications in driver
and adds support for fixed phy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:41:01 -04:00
Raghuram Chary J
7670ed7a25 lan78xx: Modify error messages
Modify the error messages when phy registration fails.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:41:01 -04:00
Raghuram Chary J
e92258c761 lan78xx: Remove DRIVER_VERSION for lan78xx driver
Remove driver version info from the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:41:01 -04:00
Raghuram Chary J
89b36fb5e5 lan78xx: Lan7801 Support for Fixed PHY
Adding Fixed PHY support to the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:41:01 -04:00
David S. Miller
5d659b1d7d Merge branch 'tcp-mmap-rework-zerocopy-receive'
Eric Dumazet says:

====================
tcp: mmap: rework zerocopy receive

syzbot reported a lockdep issue caused by tcp mmap() support.

I implemented Andy Lutomirski nice suggestions to resolve the
issue and increase scalability as well.

First patch is adding a new getsockopt() operation and changes mmap()
behavior.

Second patch changes tcp_mmap reference program.

v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.

v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
    instead of setsockopt(), feedback from Ka-Cheon Poon

v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
    Properly clear zc->recv_skip_hint in case user request was completed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
Eric Dumazet
aacb0c2e52 selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE
After prior kernel change, mmap() on TCP socket only reserves VMA.

We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
to perform the transfert of pages from skbs in TCP receive queue into such VMA.

struct tcp_zerocopy_receive {
	__u64 address;		/* in: address of mapping */
	__u32 length;		/* in/out: number of bytes to map/mapped */
	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
};

After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
number of bytes that were mapped, and @recv_skip_hint contains number of bytes
that should be read using conventional read()/recv()/recvmsg() system calls,
to skip a sequence of bytes that can not be mapped, because not properly page
aligned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
Eric Dumazet
05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
Hangbin Liu
e8238fc2bd bridge: check iface upper dev when setting master via ioctl
When we set a bond slave's master to bridge via ioctl, we only check
the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
at netdev_master_upper_dev_link() later, it already does some settings
and allocates some resources. It would be better to return as early
as possible.

v1 -> v2:
use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
to check if we have a master, because not all upper devs are masters,
e.g. vlan device.

Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:08:02 -04:00
David S. Miller
589f84fb95 Merge branch 'dsa-mv88e6xxx-remove-Global-2-setup'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: remove Global 2 setup

Parts of the mv88e6xxx driver still write arbitrary registers of
different banks at setup time, which is misleading especially when
supporting multiple device models.

This patchset moves two features setup into the top lovel
mv88e6xxx_setup function and kills the old Global 2 register bank setup
function. It brings no functional changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 20:36:50 -04:00
Vivien Didelot
5d49d60307 net: dsa: mv88e6xxx: remove Global 2 setup
The remaining values written to the Switch Management Register in the
mv88e6xxx_g2_setup function are specific to 88E6352 and older, and are
the default values anyway.

Thus remove completely this function. The mv88e6xxx driver no more
contains setup code to access arbitrary Global 2 registers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 20:36:49 -04:00
Vivien Didelot
c7f047b6c7 net: dsa: mv88e6xxx: move device mapping setup
Move the Device Mapping setup out of the specific Global 2 code,
into the top level device setup function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 20:36:49 -04:00
Vivien Didelot
b28f872dc4 net: dsa: mv88e6xxx: move trunk setup
Move the trunking setup out of Global 2 specific setup into the top
level mv88e6xxx_setup function.

Note that the 88E6390 family calls this LAG instead of Trunk and
supports 32 possible ID routing vectors, with LAG ID bit 4 being placed
in Global 2 register 0x1D...

We don't need Trunk (or LAG) IDs for the moment, thus keep it simple.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 20:36:49 -04:00
Linus Torvalds
6da6c0db53 Linux v4.17-rc3 2018-04-29 14:17:42 -07:00
Linus Torvalds
c61a56abab Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Another set of x86 related updates:

   - Fix the long broken x32 version of the IPC user space headers which
     was noticed by Arnd Bergman in course of his ongoing y2038 work.
     GLIBC seems to have non broken private copies of these headers so
     this went unnoticed.

   - Two microcode fixlets which address some more fallout from the
     recent modifications in that area:

      - Unconditionally save the microcode patch, which was only saved
        when CPU_HOTPLUG was enabled causing failures in the late
        loading mechanism

      - Make the later loader synchronization finally work under all
        circumstances. It was exiting early and causing timeout failures
        due to a missing synchronization point.

   - Do not use mwait_play_dead() on AMD systems to prevent excessive
     power consumption as the CPU cannot go into deep power states from
     there.

   - Address an annoying sparse warning due to lost type qualifiers of
     the vmemmap and vmalloc base address constants.

   - Prevent reserving crash kernel region on Xen PV as this leads to
     the wrong perception that crash kernels actually work there which
     is not the case. Xen PV has its own crash mechanism handled by the
     hypervisor.

   - Add missing TLB cpuid values to the table to make the printout on
     certain machines correct.

   - Enumerate the new CLDEMOTE instruction

   - Fix an incorrect SPDX identifier

   - Remove stale macros"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ipc: Fix x32 version of shmid64_ds and msqid64_ds
  x86/setup: Do not reserve a crash kernel region if booted on Xen PV
  x86/cpu/intel: Add missing TLB cpuid values
  x86/smpboot: Don't use mwait_play_dead() on AMD systems
  x86/mm: Make vmemmap and vmalloc base address constants unsigned long
  x86/vector: Remove the unused macro FPU_IRQ
  x86/vector: Remove the macro VECTOR_OFFSET_START
  x86/cpufeatures: Enumerate cldemote instruction
  x86/microcode: Do not exit early from __reload_late()
  x86/microcode/intel: Save microcode patch unconditionally
  x86/jailhouse: Fix incorrect SPDX identifier
2018-04-29 10:06:05 -07:00
Linus Torvalds
65f4d6d0f8 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Thomas Gleixner:
 "A set of updates for the x86/pti related code:

   - Preserve r8-r11 in int $0x80. r8-r11 need to be preserved, but the
     int$80 entry code removed that quite some time ago. Make it correct
     again.

   - A set of fixes for the Global Bit work which went into 4.17 and
     caused a bunch of interesting regressions:

      - Triggering a BUG in the page attribute code due to a missing
        check for early boot stage

      - Warnings in the page attribute code about holes in the kernel
        text mapping which are caused by the freeing of the init code.
        Handle such holes gracefully.

      - Reduce the amount of kernel memory which is set global to the
        actual text and do not incidentally overlap with data.

      - Disable the global bit when RANDSTRUCT is enabled as it
        partially defeats the hardening.

      - Make the page protection setup correct for vma->page_prot
        population again. The adjustment of the protections fell through
        the crack during the Global bit rework and triggers warnings on
        machines which do not support certain features, e.g. NX"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64/compat: Preserve r8-r11 in int $0x80
  x86/pti: Filter at vma->vm_page_prot population
  x86/pti: Disallow global kernel text with RANDSTRUCT
  x86/pti: Reduce amount of kernel text allowed to be Global
  x86/pti: Fix boot warning from Global-bit setting
  x86/pti: Fix boot problems from Global-bit setting
2018-04-29 09:36:22 -07:00
Teng Qin
7ef3771205 bpf: Allow bpf_current_task_under_cgroup in interrupt
Currently, the bpf_current_task_under_cgroup helper has a check where if
the BPF program is running in_interrupt(), it will return -EINVAL. This
prevents the helper to be used in many useful scenarios, particularly
BPF programs attached to Perf Events.

This commit removes the check. Tested a few NMI (Perf Event) and some
softirq context, the helper returns the correct result.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 09:18:04 -07:00
Linus Torvalds
810fb07a9b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two fixes from the timer departement:

   - Fix a long standing issue in the NOHZ tick code which causes RB
     tree corruption, delayed timers and other malfunctions. The cause
     for this is code which modifies the expiry time of an enqueued
     hrtimer.

   - Revert the CLOCK_MONOTONIC/CLOCK_BOOTTIME unification due to
     regression reports. Seems userspace _is_ relying on the documented
     behaviour despite our hope that it wont"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME
  tick/sched: Do not mess with an enqueued hrtimer
2018-04-29 09:03:25 -07:00
Linus Torvalds
7d9e55feae Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "The perf update contains the following bits:

  x86:
   - Prevent setting freeze_on_smi on PerfMon V1 CPUs to avoid #GP

  perf stat:
   - Keep the '/' event modifier separator in fallback, for example when
     fallbacking from 'cpu/cpu-cycles/' to user level only, where it
     should become 'cpu/cpu-cycles/u' and not 'cpu/cpu-cycles/:u' (Jiri
     Olsa)

   - Fix PMU events parsing rule, improving error reporting for invalid
     events (Jiri Olsa)

   - Disable write_backward and other event attributes for !group events
     in a group, fixing, for instance this group: '{cycles,msr/aperf/}:S'
     that has leader sampling (:S) and where just the 'cycles', the
     leader event, should have the write_backward attribute set, in this
     case it all fails because the PMU where 'msr/aperf/' lives doesn't
     accepts write_backward style sampling (Jiri Olsa)

   - Only fall back group read for leader (Kan Liang)

   - Fix core PMU alias list for x86 platform (Kan Liang)

   - Print out hint for mixed PMU group error (Kan Liang)

   - Fix duplicate PMU name for interval print (Kan Liang)

  Core:
   - Set main kernel end address properly when reading kernel and module
     maps (Namhyung Kim)

  perf mem:
   - Fix incorrect entries and add missing man options (Sangwon Hong)

  s/390:
   - Remove s390 specific strcmp_cpuid_cmp function (Thomas Richter)

   - Adapt 'perf test' case record+probe_libc_inet_pton.sh for s390

   - Fix s390 undefined record__auxtrace_init() return value in 'perf
     record' (Thomas Richter)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1
  perf stat: Fix duplicate PMU name for interval print
  perf evsel: Only fall back group read for leader
  perf stat: Print out hint for mixed PMU group error
  perf pmu: Fix core PMU alias list for X86 platform
  perf record: Fix s390 undefined record__auxtrace_init() return value
  perf mem: Document incorrect and missing options
  perf evsel: Disable write_backward for leader sampling group events
  perf pmu: Fix pmu events parsing rule
  perf stat: Keep the / modifier separator in fallback
  perf test: Adapt test case record+probe_libc_inet_pton.sh for s390
  perf list: Remove s390 specific strcmp_cpuid_cmp function
  perf machine: Set main kernel end address properly
2018-04-29 08:58:50 -07:00
Alexei Starovoitov
fcf85729d8 Merge branch 'fix-bpf-helpers-doc'
Andrey Ignatov says:

====================
BPF helpers documentation in UAPI refers to kernel ctx structures when it
has to refer to user visible ones. Fix it.
====================

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:56:32 -07:00
Andrey Ignatov
96871b9f6f bpf: Sync bpf.h to tools/
The patch syncs bpf.h to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:56:31 -07:00
Andrey Ignatov
a3ef8e9a4d bpf: Fix helpers ctx struct types in uapi doc
Helpers may operate on two types of ctx structures: user visible ones
(e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones
(e.g. `struct bpf_sock_ops_kern`) in kernel implementation.

UAPI documentation must refer to only user visible structures.

The patch replaces references to `_kern` structures in BPF helpers
description by corresponding user visible structures.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:56:31 -07:00
Alexei Starovoitov
f60ad0a0c4 Merge branch 'bpf_get_stack'
Yonghong Song says:

====================
Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table regardless of
whether BPF_F_REUSE_STACKID is specified or not,
so some stack traces may be missing from user perspective.

This patch implements a new helper, bpf_get_stack, will
send stack traces directly to bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.

Patches #1 and #2 implemented the core kernel support.
Patch #3 removes two never-hit branches in verifier.
Patches #4 and #5 are two verifier improves to make
bpf programming easier. Patch #6 synced the new helper
to tools headers. Patch #7 moved perf_event polling code
and ksym lookup code from samples/bpf to
tools/testing/selftests/bpf. Patch #8 added a verifier
test in tools/bpf for new verifier change.
Patches #9 and #10 added tests for raw tracepoint prog
and tracepoint prog respectively.

Changelogs:
  v8 -> v9:
    . make function perf_event_mmap (in trace_helpers.c) extern
      to decouple perf_event_mmap and perf_event_poller.
    . add jit enabled handling for kernel stack verification
      in Patch #9. Since we did not have a good way to
      verify jit enabled kernel stack, just return true if
      the kernel stack is not empty.
    . In path #9, using raw_syscalls/sys_enter instead of
      sched/sched_switch, removed calling cmd
      "task 1 dd if=/dev/zero of=/dev/null" which is left
      with dangling process after the program exited.
  v7 -> v8:
    . rebase on top of latest bpf-next
    . simplify BPF_ARSH dst_reg->smin_val/smax_value tracking
    . rewrite the description of bpf_get_stack() in uapi bpf.h
      based on new format.
  v6 -> v7:
    . do perf callchain buffer allocation inside the
      verifier. so if the prog->has_callchain_buf is set,
      it is guaranteed that the buffer has been allocated.
    . change condition "trace_nr <= skip" to "trace_nr < skip"
      so that for zero size buffer, return 0 instead of -EFAULT
  v5 -> v6:
    . after refining return register smax_value and umax_value
      for helpers bpf_get_stack and bpf_probe_read_str,
      bounds and var_off of the return register are further refined.
    . added missing commit message for tools header sync commit.
    . removed one unnecessary empty line.
  v4 -> v5:
    . relied on dst_reg->var_off to refine umin_val/umax_val
      in verifier handling BPF_ARSH value range tracking,
      suggested by Edward.
  v3 -> v4:
    . fixed a bug when meta ptr is set to NULL in check_func_arg.
    . introduced tnum_arshift and added detailed comments for
      the underlying implementation
    . avoided using VLA in tools/bpf test_progs.
  v2 -> v3:
    . used meta to track helper memory size argument
    . implemented range checking for ARSH in verifier
    . moved perf event polling and ksym related functions
      from samples/bpf to tools/bpf
    . added test to compare build id's between bpf_get_stackid
      and bpf_get_stack
  v1 -> v2:
    . fixed compilation error when CONFIG_PERF_EVENTS is not enabled
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:45:55 -07:00
Yonghong Song
79b4535013 tools/bpf: add a test for bpf_get_stack with tracepoint prog
The test_stacktrace_map and test_stacktrace_build_id are
enhanced to call bpf_get_stack in the helper to get the
stack trace as well.  The stack traces from bpf_get_stack
and bpf_get_stackid are compared to ensure that for the
same stack as represented as the same hash, their ip addresses
or build id's must be the same.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:45:54 -07:00
Yonghong Song
173965fbfb tools/bpf: add a test for bpf_get_stack with raw tracepoint prog
The test attached a raw_tracepoint program to raw_syscalls/sys_enter.
It tested to get stack for user space, kernel space and user
space with build_id request. It also tested to get user
and kernel stack into the same buffer with back-to-back
bpf_get_stack helper calls.

If jit is not enabled, the user space application will check
to ensure that the kernel function for raw_tracepoint
___bpf_prog_run is part of the stack.

If jit is enabled, we did not have a reliable way to
verify the kernel stack, so just assume the kernel stack
is good when the kernel stack size is greater than 0.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:45:54 -07:00