Commit graph

142 commits

Author SHA1 Message Date
Ben Widawsky 5ce097254e drm/i915: Missed dropped VMA conversion
This belonged in
commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

But it was somehow missed along the way.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:12:32 +01:00
Ben Widawsky 17601cbc93 drm/i915: Removed unused vm args
i915_gem_execbuffer_relocate became defunct in:
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

eb_create: never used?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: The lingering vm parameter to eb_create might have been back
from the days where we didn't yet keep both vmas and obj lists in the
eb struct. But I didn't check really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:11:21 +01:00
Ben Widawsky 28cf541543 drm/i915/bdw: unleash PPGTT
v2: Squash in fix from Ben: Set PPGTT batches as necessary

This fixes the regression in the last couple of days when we enabled
PPGTT.

v3: Squash in fixup to still use GTT for secure batches from Ville:

BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:48 +01:00
Ben Widawsky 3c94ceeee2 drm/i915/bdw: Support 64b relocations
We don't actually return any to userspace yet, however we can pretend
like we do now so userspace will support it when it happens.

This is just to please Chris as the code itself isn't ready for > 64b
relocations.

v2: Rebase on top of the refactored relocate_entry_gtt|cpu functions.

v3: Squash in fixup from Rafal Barbalho for 64 byte relocs using cpu
relocs and those crossing a page boundary.

v4: Squash in a fixup for the fixup from Rafael.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Barbalho, Rafael <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:41 +01:00
Ben Widawsky e2d05a8b1e drm/i915: Convert active API to VMA
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Daniel Vetter b205ca5721 drm/i915: Use unsigned for overflow checks in execbuf
There's actually no real risk since we already check for stricter
constraints earlier (using UINT_MAX / sizeof (struct
drm_i915_gem_exec_object2) as the limit). But in eb_create we use
signed integers, which steals a factor of 2. Luckily struct
drm_i915_gem_exec_object2 for this to not matter.

Still, be consistent and use unsigned integers.

Similar use unsinged integers when checking for overflows in the
relocation entry processing.

I've also added a new subtests to igt/gem_reloc_overflow to also
test for overflowing args->buffer_count values.

v2: Give the variables again tighter scope to make it clear that the
computation is purely local and doesn't leak out to the 2nd block.
Requested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:03 +02:00
Daniel Vetter a1e2265332 drm/i915: Use kcalloc more
No buffer overflows here, but better safe than sorry.

v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc alreads memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
  (Chris). I've opted to leave all other conversions as-is since they
  aren't in a fastpath and dealing with cleared memory instead of
  random garbage is just generally nicer.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:01 +02:00
Ben Widawsky 68c8c17f52 drm/i915: evict VM instead of everything
When reserving objects during execbuf, it is possible to come across an
object which will not fit given the current fragmentation of the address
space. We do not have any defragment in drm_mm, so the strategy is to
instead evict everything, and reallocate objects.

With the upcoming addition of multiple VMs, there is no point to evict
everything since doing so is overkill for the specific case mentioned
above.

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: One additional s/evict_everything/evict_vm/ to update a
comment in the code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 21:58:22 +02:00
Mika Kuoppala be62acb4cc drm/i915: ban badly behaving contexts
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become danling before hang is declared.

v3: Use return value for banned status instead of stashing state
    into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
    - add define for ban period
    - rename commit and improve commit msg

v5: - rely context banning instead of wedging the gpu
    - beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:55:50 +02:00
Chris Wilson 2cc86b8260 drm/i915: Always prefer CPU relocations with LLC
A follow-on to the update of the LLC coherency logic is that we can rely
on the LLC being coherent with the CS for rewriting batchbuffers
irrespective of their cache domain. (This should have no effect
currently as all the batch buffers are expected to be I915_CACHE_LLC and
so using the cpu relocation path anyway.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:43 +02:00
Daniel Vetter e656a6cba0 drm/i915: inline vma_create into lookup_or_create_vma
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Ben Widawsky 27173f1f95 drm/i915: Convert execbuf code to use vmas
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Daniel Vetter d4d36014ca drm/i915: fix up the relocate_entry refactoring
Somehow we've lost the error handling in the patch split-up between
the internal and external patch. This regression has been introduced
in

commit 5032d871f7
Author: Rafael Barbalho <rafael.barbalho@intel.com>
Date:   Wed Aug 21 17:10:51 2013 +0100

    drm/i915: Cleaning up the relocate entry function

This bug is exercised by igt/gem_reloc_vs_gpu/interruptible.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:18:01 +02:00
Rafael Barbalho 5032d871f7 drm/i915: Cleaning up the relocate entry function
As the relocate entry function was getting a bit too big I've moved
the code that used to use either the cpu or the gtt to for the
relocation into two separate functions.

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:31 +02:00
Chris Wilson 000433b67e drm/i915: Only do a chipset flush after a clflush
Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:34 +02:00
Chris Wilson 2c22569bba drm/i915: Update rules for writing through the LLC with the cpu
As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:20:49 +02:00
Ben Widawsky ca191b1313 drm/i915: mm_list is per VMA
formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMx .

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actual kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discused with Ben. Also drop a needles BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:06:58 +02:00
Ben Widawsky 9843877d10 drm/i915: turn bound_ggtt checks to bound_any
In some places, we want to know if an object is bound in any address
space, and not just the global GTT. This often applies when there is a
single global resource (object, pages, etc.)

function                             |      reason
--------------------------------------------------
i915_gem_object_is_inactive          | global object
i915_gem_object_put_pages            | object's pages
915_gem_object_unpin                 | global object
i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
pread/pwrite                         | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
call - but that once only worked if the object is bound. We really
should replace this with a plain wait_rendering call, which would have
the upside that in pread it would be clearer that we actually only
wait for oustanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky 07fe0b1280 drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:20 +02:00
Ben Widawsky 28d6a7bfa2 drm/i915: thread address space through execbuf
This represents the first half of hooking up VMs to execbuf. Here we
basically pass an address space all around to the different internal
functions. It should be much more readable, and have less risk than the
second half, which begins switching over to using VMAs instead of an
obj,vm.

The overall series echoes this style of, "add a VM, then make it smart
later"

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Switch a BUG_ON to WARN_ON.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:11 +02:00
Ben Widawsky c37e220461 drm/i915: Add VM to pin
To verbalize it, one can say, "pin an object into the given address
space." The semantics of pinning remain the same otherwise.

Certain objects will always have to be bound into the global GTT.
Therefore, global GTT is a special case, and keep a special interface
around for it (i915_gem_obj_ggtt_pin).

v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:09 +02:00
Chris Wilson de51f04f06 drm/i915: Replace open-coded offset_in_page()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-26 19:45:11 +02:00
Xiong Zhang 0b74b508f7 drm/i915: add prefault_disable module option
prefault is stll enabled by default which prevent most of pwrite/pread/reloc
from running slow path, in order to verify these slow pathes, prefault need
to be disabled.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Make checkpatch happy and bikeshed the module option help
text a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 09:29:26 +02:00
Chris Wilson e852096986 drm/i915: Replace open-coding of DEFAULT_CONTEXT_ID
The intent of the check is made more clear if we use the proper name for
0 here.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-16 10:40:40 +02:00
Daniel Vetter db1b76ca6a drm/i915: don't frob mm.suspended when not using ums
In kernel modeset driver mode we're in full control of the chip,
always. So there's no need at all to set mm.suspended in
i915_gem_idle. Hence move that out into the leavevt ioctl. Since
i915_gem_idle doesn't suspend gem any more we can also drop the
re-enabling for KMS in the thaw function.

Also clean up the handling of mm.suspend at driver load by coalescing
all the assignments.

Stumbled over while reading through our resume code for unrelated
reasons.

v2: Shovel mm.suspended into the (newly created) ums dungeon as
suggested by Chris Wilson. The plan is that once we've completely
stopped relying on the register save/restore code we could shovel even
that in there.

v3: Improve the locking for the entervt/leavevt ioctls a bit by moving
the dev->struct_mutex locking outside of i915_gem_idle. Also don't
clear dev_priv->ums.mm_suspended for the kms case, we allocate it with
kzalloc. Both suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 14:30:25 +02:00
Ben Widawsky f343c5f647 drm/i915: Getter/setter for object attributes
Soon we want to gut a lot of our existing assumptions how many address
spaces an object can live in, and in doing so, embed the drm_mm_node in
the object (and later the VMA).

It's possible in the future we'll want to add more getter/setter
methods, but for now this is enough to enable the VMAs.

v2: Reworked commit message (Ben)
Added comments to the main functions (Ben)
sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
(Daniel)

v3: Rebased on new reserve_node patch
Changed DRM_DEBUG_KMS to actually work (will need fixing later)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-08 22:04:34 +02:00
Mika Kuoppala 7d736f4f0b drm/i915: add batch bo to i915_add_request()
In order to track down a batch buffer and context which
caused the ring to hang, store reference to bo into the request struct.
Request can also cause gpu to hang after the batch in the flush section
in the ring. To detect this add start of the flush portion offset into the
request.

v2: Included comment about request vs batch_obj lifetimes (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:16 +02:00
Mika Kuoppala 0025c0772d drm/i915: change i915_add_request to macro
Only execbuffer needed all the parameters on i915_add_request().
By putting __i915_add_request behind macro, all current callsites
become cleaner. Following patch will introduce a new parameter
for __i915_add_request. With this patch, only the relevant callsite
will reflect the change making commit smaller and easier to understand.

v2: _i915_add_request as function name (Chris Wilson)

v3: change name __i915_add_request and fix ordering of params (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:15 +02:00
Chris Wilson c65355bbef drm/i915: Track when we dirty the scanout with render commands
This is required for tracking render damage for use with FBC and will be
used in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-07 17:56:45 +02:00
Xiang, Haihao 82f91b6e93 drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer()
A user can run batchbuffer via VEBOX ring.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:21 +02:00
Dave Airlie 399403c7ce Merge tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Highlights:
- Imre's for_each_sg_pages rework (now also with the stolen mem backed
  case fixed with a hack) plus the drm prime sg list coalescing patch from
  Rahul Sharma. I have some follow-up cleanups pending, already acked by
  Andrew Morton.
- Some prep-work for the crazy no-pch/display-less platform by Ben.
- Some vlv patches, by far not all (Jesse et al).
- Clean up the HDMI/SDVO #define confusion (Paulo)
- gen2-4 vblank fixes from Ville.
- Unclaimed register warning fixes for hsw (Paulo). More still to come ...
- Complete pageflips which have been stuck in a gpu hang, should prevent
  stuck gl compositors (Ville).
- pm patches for vt-switchless resume (Jesse). Note that the i915 enabling
  is not (yet) included, that took a bit longer to settle. PM patches are
  acked by Rafael Wysocki.
- Minor fixlets all over from various people.

* tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel: (79 commits)
  drm/i915: Implement WaSwitchSolVfFArbitrationPriority
  drm/i915: Set the VIC in AVI infoframe for SDVO
  drm/i915: Kill a strange comment about DPMS functions
  drm/i915: Correct sandybrige overclocking
  drm/i915: Introduce GEN7_FEATURES for device info
  drm/i915: Move num_pipes to intel info
  drm/i915: fixup pd vs pt confusion in gen6 ppgtt code
  style nit: Align function parameter continuation properly.
  drm/i915: VLV doesn't have HDMI on port C
  drm/i915: DSPFW and BLC regs are in the display offset range
  drm/i915: set conservative clock gating values on VLV v2
  drm/i915: fix WaDisablePSDDualDispatchEnable on VLV v2
  drm/i915: add more VLV IDs
  drm/i915: use VLV DIP routines on VLV v2
  drm/i915: add media well to VLV force wake routines v2
  drm/i915: don't use plane pipe select on VLV
  drm: modify pages_to_sg prime helper to create optimized SG table
  drm/i915: use for_each_sg_page for setting up the gtt ptes
  drm/i915: create compact dma scatter lists for gem objects
  drm/i915: handle walking compact dma scatter lists
  ...
2013-04-05 10:18:13 +10:00
Lauri Kasanen 27b7c63a7c drm/i915: Fix build failure
ERROR: "__build_bug_on_failed" [drivers/gpu/drm/i915/i915.ko] undefined!

Originally reported at http://www.gossamer-threads.com/lists/linux/kernel/1631803
FDO bug #62775

This needs to be backported to both 3.7 and 3.8 stable trees. Doesn't apply straight,
but it's a quick change.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62775
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-27 15:05:42 +01:00
Ben Widawsky 41fda59682 drm/i915: Remove unneeded dev argument
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-18 03:03:19 +01:00
Ben Widawsky cf144969d5 drm/i915: Remove unused file arg from execbuf
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-18 03:02:53 +01:00
Kees Cook 3118a4f652 drm/i915: bounds check execbuffer relocation count
It is possible to wrap the counter used to allocate the buffer for
relocation copies. This could lead to heap writing overflows.

CVE-2013-0913

v3: collapse test, improve comment
v2: move check into validate_exec_list

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Pinkie Pie
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-13 21:31:03 +01:00
Kees Cook 3058753583 drm/i915: clarify reasoning for the access_ok call
This clarifies the comment above the access_ok check so a missing
VERIFY_READ doesn't alarm anyone.

v2:
 - rewrote comment, thanks to Chris Wilson

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: add patch history log to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-13 21:17:28 +01:00
Ville Syrjälä 2bb4629add drm/i915: Add to_user_ptr()
to_user_ptr() simply casts a pointer passed as u64 from user space
to void __user * correctly. Using this lets us get rid of all the
tiresome casts.

The idea came from Chris Wilson <chris@chris-wilson.co.uk>.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-03 19:49:11 +01:00
Dave Airlie 6dc1c49da6 Merge branch 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux into drm-next
This pulls in most of Linus tree up to -rc6, this fixes the worst lockdep
reported issues and re-enables fbcon lockdep.

(not the fbcon maintainer)
* 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux: (529 commits)
  Revert "Revert "console: implement lockdep support for console_lock""
  fbcon: fix locking harder
  fb: Yet another band-aid for fixing lockdep mess
  fb: rework locking to fix lock ordering on takeover
2013-02-08 12:10:18 +10:00
Ben Widawsky 5d4545aef5 drm/i915: Create a gtt structure
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).

The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm

The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:33:56 +01:00
Chris Wilson eef90ccb8a drm/i915: Use the reloc.handle as an index into the execbuffer array
Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 623000.0/sec.
i3-330m: 748000.0/sec to 789000.0/sec.

(measured relative to a baseline with neither optimisations applied).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:23:47 +01:00
Daniel Vetter ed5982e6ce drm/i915: Allow userspace to hint that the relocations were known
Userspace is able to hint to the kernel that its command stream and
auxiliary state buffers already hold the correct presumed addresses and
so the relocation process may be skipped if the kernel does not need to
move any buffers in preparation for the execbuffer. Thus for the common
case where the allotment of buffers is static between batches, we can
avoid the overhead of individually checking the relocation entries.

Note that this requires userspace to supply the domain tracking and
requests for workarounds itself that would otherwise be computed based
upon the relocation entries.

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 632000.0/sec.
i3-330m: 748000.0/sec to 830000.0/sec.

(measured relative to a baseline with neither optimisations applied).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Fixup merge conflict in userspace header due to different
baseline trees.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:23:36 +01:00
Chris Wilson bcffc3faa6 drm/i915: Move the execbuffer objects list from the stack into the tracker
Instead of passing around the eb-objects hashtable and a separate object
list, we can include the object list into the eb-objects structure for
convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:02 +01:00
Chris Wilson 3b96eff447 drm/i915: Take the handle idr spinlock once for looking up the exec objects
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:01 +01:00
Chris Wilson 419fa72a19 drm/i915: Mark a temporary allocation for copy-from-user as such
The difference is that the kernel will then know that this memory will
be reclaimable in the near future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:00 +01:00
Dave Airlie b5cc6c0387 Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
  Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
  real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces

* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
  drm/i915: Return the real error code from intel_set_mode()
  drm/i915: Make GSM void
  drm/i915: Move GSM mapping into dev_priv
  drm/i915: Move even more gtt code to i915_gem_gtt
  drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
  drm/i915: Introduce i915_gem_set_seqno()
  drm/i915: Always clear semaphore mboxes on seqno wrap
  drm/i915: Initialize hardware semaphore state on ring init
  drm/i915: Introduce ring set_seqno
  drm/i915: Missed conversion to gtt_pte_t
  drm/i915: Bug on unsupported swizzled platforms
  drm/i915: BUG() if fences are used on unsupported platform
  drm/i915: fixup overlay stolen memory leak
  drm/i915: clean up PIPECONF bpc #defines
  drm/i915: add intel_dp_set_signal_levels
  drm/i915: remove leftover display.update_wm assignment
  drm/i915: check for the PCH when setting pch_transcoder
  drm/i915: Clear the stolen fb before enabling
  drm/i915: Access to snooped system memory through the GTT is incoherent
  drm/i915: Remove stale comment about intel_dp_detect()
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-01-17 20:34:08 +10:00
Chris Wilson 262b6d363f drm/i915: Invalidate the relocation presumed_offsets along the slow path
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.

Fixes regression from commit bcf50e2775
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 21 22:07:12 2010 +0000

    drm/i915: Handle pagefaults in execbuffer user relocations

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-16 10:53:38 +01:00
Daniel Vetter b45305fce5 drm/i915: Implement workaround for broken CS tlb on i830/845
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 17:27:02 +01:00
Chris Wilson c1f093e09c drm/i915: Remove check for conflicting relocation write-domains
Simply use the last write-domain set for the object in the batch,
trusting userspace to have correctly flushed the caches between usage as
a write target. This check dates back from the golden age of having only
a single operation per batch with the kernel repeating it for each
cliprect, and conflicts both with userspace trying to efficiently batch
multiple operations and with reducing the kernel overhead of relocation
processing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 20:13:16 +01:00
Ville Syrjälä ca9c46c5c7 drm/i915: Kill i915_gem_execbuffer_wait_for_flips()
As per Chris Wilson's suggestion make
i915_gem_execbuffer_wait_for_flips() go away.

This was used to stall the GPU ring while there are pending
page flips involving the relevant BO. Ie. while the BO is still
being scanned out by the display controller.

The recommended alternative is to use the page flip events to
wait for the page flips to fully complete before reusing the BO
of the old front buffer. Or use more buffers.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: don't remove obj->pending_flips, still required due to
reorder patches.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:58:15 +01:00
Chris Wilson 9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00