linux/arch/arm/mm
Doug Anderson 33298ef6d8 ARM: 8505/1: dma-mapping: Optimize allocation
The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers.  Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes.  The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.

The current function is very, very slow.

One problem is the way it keeps trying and trying to allocate big
chunks.  Imagine a very fragmented memory that has 4M free but no
contiguous pages at all.  Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
  - Try to allocate order 10 (no retry)
  - Try to allocate order 9 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- For page 2:
  - Try to allocate order 9 (no retry)
  - Try to allocate order 8 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- ...
- ...

Total number of calls to alloc() calls for this case is:
  sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
  => 9228

The above is obviously worse case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases.  I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).

A second problem is that allocating big chunks under memory pressure
when we don't need them is just not a great idea anyway unless we really
need them.  We can make due pretty well with smaller chunks so it's
probably wise to leave bigger chunks for other users once memory
pressure is on.

Let's adjust the allocation like this:

1. If a big chunk fails, stop trying to hard and bump down to lower
   order allocations.
2. Don't try useless orders.  The whole point of big chunks is to
   optimize the TLB and it can really only make use of 2M, 1M, 64K and
   4K sizes.

We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users.  A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks.  Presumably this would
be a sane strategy for some callers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:37 +00:00
..
abort-ev4.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev4t.S
abort-ev5t.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev5tj.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev6.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev7.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-lv4t.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-macro.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-nommu.S
alignment.c uaccess: reimplement probe_kernel_address() using probe_kernel_read() 2015-11-05 19:34:48 -08:00
cache-aurora-l2.h ARM: 7547/4: cache-l2x0: add support for Aurora L2 cache ctrl 2012-11-06 19:47:35 +00:00
cache-fa.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-feroceon-l2.c ARM: 8416/1: Feroceon: use of_iomap() to map register base 2015-08-18 14:00:30 +01:00
cache-l2x0.c ARM: 8482/1: l2x0: make it possible to disable outer sync from DT 2015-12-22 12:15:53 +00:00
cache-nop.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-tauros2.c ARM: convert printk(KERN_* to pr_* 2014-11-21 15:24:50 +00:00
cache-tauros3.h ARM: 7922/1: l2x0: add Marvell Tauros3 support 2013-12-29 12:32:47 +00:00
cache-uniphier.c ARM: 8462/1: cache-uniphier: use common API to find the next level cache 2015-12-03 00:03:09 +00:00
cache-v4.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v4wb.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v4wt.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v6.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v7.S ARM: cache-v7: optimise test for Cortex A9 r0pX devices 2015-04-14 22:26:52 +01:00
cache-xsc3l2.c
context.c ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers 2015-12-02 23:57:54 +00:00
copypage-fa.c
copypage-feroceon.c
copypage-v4mc.c
copypage-v4wb.c
copypage-v4wt.c
copypage-v6.c ARM: 8236/1: mm: fix discard_old_kernel_data 2014-12-03 16:00:04 +00:00
copypage-xsc3.c
copypage-xscale.c
dma-mapping.c ARM: 8505/1: dma-mapping: Optimize allocation 2016-02-11 15:33:37 +00:00
dma.h ARM: reduce visibility of dmac_* functions 2015-08-01 22:25:04 +01:00
dump.c ARM: 8249/1: mm: dump: don't skip regions 2015-01-07 20:33:33 +00:00
extable.c ARM: 7876/1: clear Thumb-2 IT state on exception handling 2013-11-07 00:15:49 +00:00
fault-armv.c ARM: convert printk(KERN_* to pr_* 2014-11-21 15:24:50 +00:00
fault.c ARM: 8447/1: catch pending imprecise abort on unmask 2015-10-19 17:08:33 +01:00
fault.h ARM: 8447/1: catch pending imprecise abort on unmask 2015-10-19 17:08:33 +01:00
flush.c mm: differentiate page_mapped() from page_mapcount() for compound pages 2016-01-15 17:56:32 -08:00
fsr-2level.c
fsr-3level.c ARM: mm: Transparent huge page support for LPAE systems. 2013-06-04 16:52:38 +01:00
highmem.c kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
hugetlbpage.c mm/hugetlb: reduce arch dependent code about huge_pmd_unshare 2015-06-24 17:49:41 -07:00
idmap.c ARM: make virt_to_idmap() return unsigned long 2016-02-08 15:47:28 +00:00
init.c ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA 2016-02-08 15:56:45 +00:00
iomap.c
ioremap.c ARM: add support for generic early_ioremap/early_memremap 2015-12-13 19:18:28 +01:00
Kconfig ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA 2016-02-08 15:56:45 +00:00
l2c-common.c ARM: outer cache: add WARN_ON() to outer_disable() 2014-05-30 00:47:23 +01:00
l2c-l2x0-resume.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
Makefile ARM: uniphier: add outer cache support 2015-10-27 09:20:50 +09:00
mm.h ARM: provide common method to clear bits in CPU control register 2014-06-02 09:20:11 +01:00
mmap.c arm: mm: support ARCH_MMAP_RND_BITS 2016-01-14 16:00:49 -08:00
mmu.c ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
nommu.c ARM: io: convert ioremap*() to functions 2015-07-03 17:06:56 +01:00
pabort-legacy.S
pabort-v6.S
pabort-v7.S
pageattr.c ARM: 8311/1: Don't use is_module_addr in setting page attributes 2015-03-18 10:13:46 +00:00
pgd.c ARM: domains: keep vectors in separate domain 2015-08-21 13:55:53 +01:00
proc-arm7tdmi.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm9tdmi.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm720.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm740.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm920.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm922.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm925.S ARM: 8349/1: arch/arm/mm/proc-arm925.S: remove dead #ifdef block 2015-05-03 23:22:27 +01:00
proc-arm926.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm940.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-arm946.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-arm1020.S ARM: 8348/1: remove comments on CPU_ARM1020_CPU_IDLE 2015-05-03 23:22:09 +01:00
proc-arm1020e.S ARM: 8348/1: remove comments on CPU_ARM1020_CPU_IDLE 2015-05-03 23:22:09 +01:00
proc-arm1022.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm1026.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-fa526.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-feroceon.S ARM: 8350/1: proc-feroceon: Fix feroceon_proc_info macro 2015-05-03 23:23:09 +01:00
proc-macros.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-mohawk.S ARM: mohawk: allow building with MMU disabled 2015-12-01 21:44:25 +01:00
proc-sa110.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-sa1100.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-syms.c ARM: modules: don't export cpu_set_pte_ext when !MMU 2013-03-26 09:55:34 +00:00
proc-v6.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-v7-2level.S Merge branches 'arnd-fixes', 'clk', 'misc', 'v7' and 'fixes' into for-next 2015-06-12 21:18:08 +01:00
proc-v7-3level.S ARM: redo TTBR setup code for LPAE 2015-06-01 23:48:19 +01:00
proc-v7.S Merge branches 'misc' and 'misc-rc6' into for-linus 2016-01-05 11:07:28 +00:00
proc-v7m.S ARM: 8451/1: v7-M: Set an early stack for __v7m_setup 2015-11-16 18:34:38 +00:00
proc-xsc3.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-xscale.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
pv-fixup-asm.S ARM: re-implement physical address space switching 2015-06-01 23:46:33 +01:00
tcm.h ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch() 2013-04-17 16:53:24 +01:00
tlb-fa.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4wb.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4wbi.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v6.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v7.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00