Commit graph

511 commits

Author SHA1 Message Date
Peter Maydell
84a5a80148 * Log filtering from Alex and Peter
* Chardev fix from Marc-André
 * config.status tweak from David
 * Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
 * get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
 * Coverity fix from myself
 * PKE implementation from myself, based on rth's XSAVE support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW9ErPAAoJEL/70l94x66DJfEH/A/QkMpAhrgNdyVsahzsGrzE
 wx5gHFIc1nBYxyr62w4apUb5jPB7zaXu0LA7EAWDeAe0pyP8hZzLT9kJyOEDsuJu
 zwKN2QeLSNMtPbnbKN0I/YQ2za2xX1V5ruhSeOJoVslUI214hgnAURaGshhQNzuZ
 2CluDT9KgL5cQifAnKs5kJrwhIYShYNQB+1eDC/7wk28dd/EH+sPALIoF+rqrSmt
 Zu4Mdqd+9Ns+oKOjA6br9ULq/Hzg0aDfY82J+XLVVqfF3PXQe8rTDmuMf/7jTn+M
 Un7ZOcei9oZF2/9vfAfKQpDCcgD9HvOUSbgqV/ubmkPPmN/LNJzeKj0fBhrRN+Y=
 =K12D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support

# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (28 commits)
  target-i386: implement PKE for TCG
  config.status: Pass extra parameters
  char: translate from QIOChannel error to errno
  exec: fix error handling in file_ram_alloc
  cputlb: modernise the debug support
  qemu-log: support simple pid substitution for logs
  target-arm: dfilter support for in_asm
  qemu-log: dfilter-ise exec, out_asm, op and opt_op
  qemu-log: new option -dfilter to limit output
  qemu-log: Improve the "exec" TB execution logging
  qemu-log: Avoid function call for disabled qemu_log_mask logging
  qemu-log: correct help text for -d cpu
  tcg: pass down TranslationBlock to tcg_code_gen
  util: move declarations out of qemu-common.h
  Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
  hw: explicitly include qemu-common.h and cpu.h
  include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
  isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
  Move ParallelIOArg from qemu-common.h to sysemu/char.h
  Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	scripts/clean-includes
2016-03-24 21:42:40 +00:00
Thomas Huth
57c522f47b hw/net/spapr_llan: Enable the RX buffer pools by default for new machines
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking care of different buffer sizes. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though it seems at a first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.

To avoid problems with migration from/to older version of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from creeping 3MB/s up to 20MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
d6f39fdfcd hw/net/spapr_llan: Extract rx buffer code into separate functions
Refactor the code a little bit by extracting the code that reads
and writes the receive buffer list page into separate functions.
There should be no functional change in this patch, this is just
a preparation for the upcoming extensions that introduce receive
buffer pools.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Rutuja Shah
73bcb24d93 Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed.  This replacement improves the readability and
understandability of code.

For example,

    timer_mod(fdctrl->result_timer,
	      qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));

NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.

Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Paolo Bonzini
4771d756f4 hw: explicitly include qemu-common.h and cpu.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster
c80f6e9caa Clean up includes some more
Manually drop redundant includes that scripts/clean-includes misses,
e.g. because they're hidden in generator programs, or they use the
wrong kind of delimiter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Jean-Christophe Dubois
eccfa35e9f i.MX: Add missing descriptions in devices.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: f1f565eb9dffdeb582feb1b15ba9e8b0afcf5468.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jiri Pirko
9fe7101f1d rocker: allow user to specify rocker world by property
Add property to specify rocker world. All ports will be assigned to this
world.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
031143c8d5 rocker: add name field into WorldOps ale let world specify its name
Also use this in world_name getter function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
39e0c4f47d rocker: return -ENOMEM in case of some world alloc fails
Until now, 0 is returned in this error case. Fix it ro return -ENOMEM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
0ab9cd9a4b rocker: forbid to change world type
Port to world assignment should be permitted only by qemu user. Driver
should not be able to do it, so forbid that possibility.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Prasad J Pandit
415ab35a44 net: ne2000: check ring buffer control registers
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. Registers PSTART & PSTOP
define ring buffer size & location. Setting these registers
to invalid values could lead to infinite loop or OOB r/w
access issues. Add check to avoid it.

Reported-by: Yang Hongke <yanghongke@huawei.com>
Tested-by: Yang Hongke <yanghongke@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:09 +08:00
Peter Maydell
30456d5ba3 all: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-02-23 12:43:05 +00:00
Victor Kaplansky
5669655aaf vhost-user interrupt management fixes
Since guest_mask_notifier can not be used in vhost-user mode due
to buffering implied by unix control socket, force
use_mask_notifier on virtio devices of vhost-user interfaces, and
send correct callfd to the guest at vhost start.

Using guest_notifier_mask function in vhost-user case may
break interrupt mask paradigm, because mask/unmask is not
really done when returning from guest_notifier_mask call, instead
message is posted in a unix socket, and processed later.

Add an option boolean flag 'use_mask_notifier' to disable the use
of guest_notifier_mask in virtio pci.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-18 16:13:56 +02:00
Greg Kurz
3154d1e426 vhost-net: revert support of cross-endian vnet headers
Cross-endian is now handled by the core virtio-net code.

This patch reverts:

commit 5be7d9f1b1
	vhost-net: tell tap backend about the vnet endianness

and

commit cf0a628f6e81bfc9b7a944fa0b80c3594836df56
	net: set endianness on all backend devices

Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-16 12:05:17 +02:00
Greg Kurz
1bfa316ce7 virtio-net: use the backend cross-endian capabilities
When running a fully emulated device in cross-endian conditions, including
a virtio 1.0 device offered to a big endian guest, we need to fix the vnet
headers. This is currently handled by the virtio_net_hdr_swap() function
in the core virtio-net code but it should actually be handled by the net
backend.

With this patch, virtio-net now tries to configure the backend to do the
endian fixing when the device starts (i.e. drivers sets the CONFIG_OK bit).
If the backend cannot support the requested endiannes, we have to fallback
onto virtio_net_hdr_swap(): this is recorded in the needs_vnet_hdr_swap flag,
to be used in the TX and RX paths.

Note that we reset the backend to the default behaviour (guest native
endianness) when the device stops (i.e. device status had CONFIG_OK bit and
driver unsets it). This is needed, with the linux tap backend at least,
otherwise the guest may lose network connectivity if rebooted into a
different endianness.

The current vhost-net code also tries to configure net backends. This will
be no more needed and will be reverted in a subsequent patch.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-16 12:05:17 +02:00
Eric Blake
d7bce9999d qom: Swap 'name' next to visitor in ObjectPropertyAccessor
Similar to the previous patch, it's nice to have all functions
in the tree that involve a visitor and a name for conversion to
or from QAPI to consistently stick the 'name' parameter next
to the Visitor parameter.

Done by manually changing include/qom/object.h and qom/object.c,
then running this Coccinelle script and touching up the fallout
(Coccinelle insisted on adding some trailing whitespace).

    @ rule1 @
    identifier fn;
    typedef Object, Visitor, Error;
    identifier obj, v, opaque, name, errp;
    @@
     void fn
    - (Object *obj, Visitor *v, void *opaque, const char *name,
    + (Object *obj, Visitor *v, const char *name, void *opaque,
       Error **errp) { ... }

    @@
    identifier rule1.fn;
    expression obj, v, opaque, name, errp;
    @@
     fn(obj, v,
    -   opaque, name,
    +   name, opaque,
        errp)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Eric Blake
51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Peter Maydell
bdad0f3977 pc and misc cleanups and fixes, virtio optimizations
Included here:
 Refactoring and bugfix patches in PC/ACPI.
 New commands for ipmi.
 Virtio optimizations.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWtj8KAAoJECgfDbjSjVRpBIQIAJSB9xwTcBLXwD0+8z5lqjKC
 GTtuVbHU0+Y/eO8O3llN5l+SzaRtPHo18Ele20Oz7IQc0ompANY273K6TOlyILwB
 rOhrub71uqpOKbGlxXJflroEAXb78xVK02lohSUvOzCDpwV+6CS4ZaSer7yDCYkA
 MODZj7rrEuN0RmBWqxbs1R7Mj2CeQJzlgTUNTBGCLEstoZGFOJq8FjVdG5P1q8vI
 fnI9mGJ1JsDnmcUZe/bTFfB4VreqeQ7UuGyNAMMGnvIbr0D1a+CoaMdV7/HZ+KyT
 5TIs0siVdhZei60A/Cq2OtSVCbj5QdxPBLhZfwJCp6oU4lh2U5tSvva0mh7MwJ0=
 =D/cA
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc and misc cleanups and fixes, virtio optimizations

Included here:
Refactoring and bugfix patches in PC/ACPI.
New commands for ipmi.
Virtio optimizations.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Sat 06 Feb 2016 18:44:26 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream: (45 commits)
  net: set endianness on all backend devices
  fix MSI injection on Xen
  intel_iommu: large page support
  dimm: Correct type of MemoryHotplugState->base
  pc: set the OEM fields in the RSDT and the FADT from the SLIC
  acpi: add function to extract oem_id and oem_table_id from the user's SLIC
  acpi: expose oem_id and oem_table_id in build_rsdt()
  acpi: take oem_id in build_header(), optionally
  pc: Eliminate PcGuestInfo struct
  pc: Move APIC and NUMA data from PcGuestInfo to PCMachineState
  pc: Move PcGuestInfo.fw_cfg to PCMachineState
  pc: Remove PcGuestInfo.isapc_ram_fw field
  pc: Remove RAM size fields from PcGuestInfo
  pc: Remove compat fields from PcGuestInfo
  acpi: Don't save PcGuestInfo on AcpiBuildState
  acpi: Remove guest_info parameters from functions
  pc: Simplify xen_load_linux() signature
  pc: Simplify pc_memory_init() signature
  pc: Eliminate struct PcGuestInfoState
  pc: Move PcGuestInfo declaration to top of file
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-02-08 11:25:31 +00:00
Laurent Vivier
a407644079 net: set endianness on all backend devices
commit 5be7d9f1b1
       vhost-net: tell tap backend about the vnet endianness

makes vhost net to set the endianness of the device, but only for
the first device.

In case of multiqueue, we have multiple devices... This patch sets the
endianness for all the devices of the interface.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2016-02-06 20:44:10 +02:00
Paolo Bonzini
51b19ebe43 virtio: move allocation to virtqueue_pop/vring_pop
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0.  We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.

The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.

The patch is pretty large, but changes to each device are testable
more or less independently.  Splitting it would mostly add churn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-06 20:39:07 +02:00
Laszlo Ersek
dd793a7488 e1000: eliminate infinite loops on out-of-bounds transfer start
The start_xmit() and e1000_receive_iov() functions implement DMA transfers
iterating over a set of descriptors that the guest's e1000 driver
prepares:

- the TDLEN and RDLEN registers store the total size of the descriptor
  area,

- while the TDH and RDH registers store the offset (in whole tx / rx
  descriptors) into the area where the transfer is supposed to start.

Each time a descriptor is processed, the TDH and RDH register is bumped
(as appropriate for the transfer direction).

QEMU already contains logic to deal with bogus transfers submitted by the
guest:

- Normally, the transmit case wants to increase TDH from its initial value
  to TDT. (TDT is allowed to be numerically smaller than the initial TDH
  value; wrapping at or above TDLEN bytes to zero is normal.) The failsafe
  that QEMU currently has here is a check against reaching the original
  TDH value again -- a complete wraparound, which should never happen.

- In the receive case RDH is increased from its initial value until
  "total_size" bytes have been received; preferably in a single step, or
  in "s->rxbuf_size" byte steps, if the latter is smaller. However, null
  RX descriptors are skipped without receiving data, while RDH is
  incremented just the same. QEMU tries to prevent an infinite loop
  (processing only null RX descriptors) by detecting whether RDH assumes
  its original value during the loop. (Again, wrapping from RDLEN to 0 is
  normal.)

What both directions miss is that the guest could program TDLEN and RDLEN
so low, and the initial TDH and RDH so high, that these registers will
immediately be truncated to zero, and then never reassume their initial
values in the loop -- a full wraparound will never occur.

The condition that expresses this is:

  xdh_start >= s->mac_reg[XDLEN] / sizeof(desc)

i.e., TDH or RDH start out after the last whole rx or tx descriptor that
fits into the TDLEN or RDLEN sized area.

This condition could be checked before we enter the loops, but
pci_dma_read() / pci_dma_write() knows how to fill in buffers safely for
bogus DMA addresses, so we just extend the existing failsafes with the
above condition.

This is CVE-2016-1981.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Prasad Pandit <ppandit@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1296044
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-02-04 14:13:11 +08:00
Michael S. Tsirkin
d7f053652f cadence_gem: fix buffer overflow
gem_transmit copies a packet from guest into an tx_packet[2048]
array on stack, with size limited by descriptor length set by guest.  If
guest is malicious and specifies a descriptor length that is too large,
and should packet size exceed array size, this results in a buffer
overflow.

Reported-by: 刘令 <liuling-it@360.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-02-04 13:22:06 +08:00
Prasad J Pandit
244381ec19 net: cadence_gem: check packet size in gem_recieve
While receiving packets in 'gem_receive' routine, if Frame Check
Sequence(FCS) is enabled, it copies the packet into a local
buffer without checking its size. Add check to validate packet
length against the buffer size to avoid buffer overflow.

Reported-by: Ling Liu <liuling-it@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-02-04 13:22:06 +08:00
Peter Maydell
e8d4046559 hw/net: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-19-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
9b8bfe21be virtio: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-15-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
21cbfe5f37 xen: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-14-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
8ef94f0bc9 arm: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-13-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
0d75590d91 ppc: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:22 +00:00
Peter Maydell
ea99dde191 lm32: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-5-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:22 +00:00
Ian Campbell
c1345a8878 xen: Switch to libxengnttab interface for compat shims.
In Xen 4.7 we are refactoring parts libxenctrl into a number of
separate libraries which will provide backward and forward API and ABI
compatiblity.

One such library will be libxengnttab which provides access to grant
tables.

In preparation for this switch the compatibility layer in xen_common.h
(which support building with older versions of Xen) to use what will
be the new library API. This means that the gnttab shim will disappear
for versions of Xen which include libxengnttab.

To simplify things for the <= 4.0.0 support we wrap the int fd in a
malloc(sizeof int) such that the handle is always a pointer. This
leads to less typedef headaches and the need for
XC_HANDLER_INITIAL_VALUE etc for these interfaces.

Note that this patch does not add any support for actually using
libxengnttab, it just adjusts the existing shims.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2016-01-26 17:19:28 +00:00
Peter Maydell
d341d9f306 fpu: Replace uint8 typedef with uint8_t
Replace the uint8 softfloat-specific typedef with uint8_t.
This change was made with

find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint8\b/uint8_t/g'

together with manual removal of the typedef definition and
manual fixing of more erroneous uses found via test compilation.

It turns out that the only code using this type is an accidental
use where uint8_t was intended anyway...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-7-git-send-email-peter.maydell@linaro.org
2016-01-22 15:09:21 +00:00
Peter Maydell
3a87d00910 fpu: Replace uint32 typedef with uint32_t
Replace the uint32 softfloat-specific typedef with uint32_t.
This change was made with

find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint32\b/uint32_t/g'

together with manual removal of the typedef definition,
manual undoing of various mis-hits, and another couple of
fixes found via test compilation.

All the uses in hw/ were using the wrong type by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-5-git-send-email-peter.maydell@linaro.org
2016-01-22 15:09:21 +00:00
Markus Armbruster
5a8de107e3 etraxfs_eth: Don't use hw_error() in init() method
Device init() methods aren't supposed to call hw_error(), they should
report the error and fail cleanly.  Do that.

Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-Id: <1450370121-5768-5-git-send-email-armbru@redhat.com>
2016-01-13 11:58:58 +01:00
Dr. David Alan Gilbert
9c7ffe2664 ether/slirp: Avoid redefinition of the same constants
eth.h and slirp.h both define ETH_ALEN and ETH_P_IP
rtl8139.c and eth.h both define ETH_HLEN

Move the related constant (ETH_P_ARP) from slirp.h to eth.h, and
remove the duplicates; make slirp.h include eth.h

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Prasad J Pandit
aa7f9966df net: ne2000: fix bounds check in ioport operations
While doing ioport r/w operations, ne2000 device emulation suffers
from OOB r/w errors. Update respective array bounds check to avoid
OOB access.

Reported-by: Ling Liu <liuling-it@360.cn>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Prasad J Pandit
007cd223de net: rocker: fix an incorrect array bounds check
While processing transmit(tx) descriptors in 'tx_consume' routine
the switch emulator suffers from an off-by-one error, if a
descriptor was to have more than allowed(ROCKER_TX_FRAGS_MAX=16)
fragments. Fix an incorrect bounds check to avoid it.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Shmulik Ladkani
7d6d347d06 vmxnet3: Introduce 'x-disable-pcie' back-compat property
Following the previous patch which changed vmxnet3 to be a pci express
device, this patch introduces a boolean property 'x-disable-pcie' whose
default is false.

Setting 'x-disable-pcie' to 'on' preserves the old 'pci device' (non
express) behavior. This allows migration to older versions.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Shmulik Ladkani
3509866ab3 vmxnet3: Report the Device Serial Number capability
Report the DSN extended PCI capability at 0x100.
DSN value is a transformation of device MAC address, as calculated
by VMware virtual hardware.

DSN is reported only if device is pcie.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Shmulik Ladkani
f713d4d2f1 vmxnet3: The vmxnet3 device is a PCIE endpoint
Report the 'express endpoint' capability if on a PCIE bus.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:35 +08:00
Shmulik Ladkani
b79f17a9bc vmxnet3: coding: Introduce VMXNET3Class
Introduce a class type for vmxnet3, and the usual
DEVICE_CLASS/DEVICE_GET_CLASS macros.

No semantic change.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Shmulik Ladkani
b22e0aef46 vmxnet3: Introduce 'x-old-msi-offsets' back-compat property
Following the previous patches, where vmxnet3's pci's msi/msix
capability offsets and msix's PBA table offsets have been changed, this
patch introduces a boolean property 'x-old-msi-offsets' to vmxnet3,
whose default is false.

Setting 'x-old-msi-offsets' to 'on' preserves the old offsets behavior,
which allows migration to older versions.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Shmulik Ladkani
9c087a0504 vmxnet3: Change the offset of the MSIX PBA table
Place the PBA table at 0x1000, as placed by VMware virtual hardware.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Shmulik Ladkani
f9262dae13 vmxnet3: Change offsets of msi/msix pci capabilities
Place device reported PCI capabilities at the same offsets as placed by
the VMware virtual hardware: MSI at [84], MSI-X at [9c].

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Miao Yan
c12d82ef15 net/vmxnet3: rename VMXNET3_DEVICE_VERSION to VMXNET3_UPT_REVISION
VMXNET3_DEVICE_VERSION is used as return value for accessing
UPT Revision Report and Selection register. So rename it
to VMXNET3_UPT_REVISION.

Signed-off-by: Miao Yan <yanmiaoebest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Miao Yan
8856be1512 net/vmxnet3: return 0 on unknown command
Return 0 on unknown command, this is what esxi (5.x+) behaves.

Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Miao Yan
5ae3e91c35 net/vmxnet3: return correct value for VMXNET3_CMD_GET_DEV_EXTRA_INFO
VMXNET3_CMD_GET_DEV_EXTRA_INFO should return 0 for emulation
mode

This behavior can be observed by the following steps:

1) run a Linux distro on esxi server (5.x+)
2) modify vmxnet3 Linux driver to read the register:

  VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DEV_EXTRA_INFO);
  ret =  VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
  pr_info("vmxnet3 dev_info: 0x%x\n", ret);

The kernel log will have some like the following message:

  [ 7005.111170] vmxnet3 dev_info: 0x0

Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00
Miao Yan
c469669ef7 net/vmxnet3: return correct value for VMXNET3_CMD_GET_DID_* command
VMXNET3_CMD_GET_DID_LO should return PCI ID of the device
and VMXNET3_CMD_GET_DID_HI should return vmxnet3 revision ID.

This behavior can be observed by the following steps:

1) run a Linux distro on esxi server (5.x+)
2) modify vmxnet3 Linux driver to read DID_HI and DID_LO:

  VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DID_LO);
  lo =  VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);

  VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DID_HI);
  high =  VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
  pr_info("vmxnet3 DID lo: 0x%x, high: 0x%x\n", lo, high);

The kernel log will have something like the following message:

  [ 7005.111170] vmxnet3 DID lo: 0x7b0, high: 0x1

Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-01-11 11:01:34 +08:00