mirror of
https://github.com/torvalds/linux
synced 2024-11-05 18:23:50 +00:00
70f665fe77
Set-associative caches on all v7 implementations map the index bits to the physical address LSBs and the tag bits to the MSBs. As the last level of cache on current and upcoming ARM systems grows in size, this means that, under normal DRAM controller configurations, the current v7 cache flush routine using set/way operations triggers a DRAM memory controller precharge/activate for every cache line writeback, since the routine cleans lines by first fixing the index and then looping through the ways. Because index bits map to the lower physical address bits on all v7 cache implementations, with last level cache sizes in the order of MBytes, lines belonging to the same set but different ways map to different DRAM pages.

Given the random content of cache tags, swapping the order of the index and way loops does not prevent DRAM page precharge and activate cycles, but at least, on average, it improves the chances that either multiple lines hit the same page or multiple lines belong to different DRAM banks, improving throughput significantly.

This patch swaps the inner loops in the v7 cache flushing routine to carry out the clean operations first on all sets belonging to a given way (looping through sets) and then decrement the way.

Benchmarks showed that swapping the order in which sets and ways are decremented in the v7 cache flushing routine, which uses set/way operations, significantly reduces the time required to flush caches, owing to improved writeback throughput to the DRAM controller.

Benchmark results vary and depend heavily on the content of the last level cache tag RAM when the cache is cleaned and invalidated. They range from 2x throughput, when all tag RAM entries contain dirty lines mapping to sequential pages of RAM, to 1x (i.e. no improvement), when all tag RAM accesses trigger a DRAM precharge/activate cycle, as the current code implies on most DRAM controller configurations.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
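
The effect of the loop ordering described in the commit message can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the kernel's assembly routine in cache-v7.S: the cache geometry, the 8 KiB DRAM page size, the single open page (no banks), and every identifier below are hypothetical. The tag contents model the best case mentioned in the message, where each way holds dirty lines from one contiguous region of RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, chosen only for illustration. */
enum {
    NUM_SETS   = 2048, /* sets (indexes) in the modelled cache */
    NUM_WAYS   = 16,   /* ways per set */
    LINE_SHIFT = 6,    /* 64-byte cache lines */
    SET_SHIFT  = 11,   /* log2(NUM_SETS) */
    PAGE_SHIFT = 13    /* assumed 8 KiB DRAM page (row) */
};

/* Toy DRAM with a single open page; banks are deliberately ignored. */
struct dram {
    uint64_t open_page;
    unsigned activations;
};

/* Assumed tag contents: way w holds one contiguous region (best case). */
static uint64_t tag_of(unsigned set, unsigned way)
{
    (void)set;
    return way;
}

/* Index bits form the low address bits, tag bits the high ones. */
static uint64_t line_address(unsigned set, unsigned way)
{
    assert(set < NUM_SETS && way < NUM_WAYS);
    return (tag_of(set, way) << (SET_SHIFT + LINE_SHIFT)) |
           ((uint64_t)set << LINE_SHIFT);
}

/* Count an activation whenever a writeback leaves the open page. */
static void writeback(struct dram *d, unsigned set, unsigned way)
{
    uint64_t page = line_address(set, way) >> PAGE_SHIFT;

    if (page != d->open_page) {
        d->open_page = page;
        d->activations++;
    }
}

/* Old ordering: fix the set, loop through the ways. */
unsigned flush_set_then_way(void)
{
    struct dram d = { UINT64_MAX, 0 };

    for (unsigned set = NUM_SETS; set-- > 0;)
        for (unsigned way = NUM_WAYS; way-- > 0;)
            writeback(&d, set, way);
    return d.activations;
}

/* New ordering: fix the way, loop through the sets, then decrement the way. */
unsigned flush_way_then_set(void)
{
    struct dram d = { UINT64_MAX, 0 };

    for (unsigned way = NUM_WAYS; way-- > 0;)
        for (unsigned set = NUM_SETS; set-- > 0;)
            writeback(&d, set, way);
    return d.activations;
}
```

With these assumed values, `flush_set_then_way()` returns 32768 (one activation per line written back) and `flush_way_then_set()` returns 256 (one per DRAM page touched). The ratio is a property of this toy model only and is not a prediction of real throughput; replacing `tag_of()` with random tags moves the second figure back toward the first, which matches the 1x case in the message.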
||
---|---|---|
abort-ev4.S | ||
abort-ev4t.S | ||
abort-ev5t.S | ||
abort-ev5tj.S | ||
abort-ev6.S | ||
abort-ev7.S | ||
abort-lv4t.S | ||
abort-macro.S | ||
abort-nommu.S | ||
alignment.c | ||
cache-aurora-l2.h | ||
cache-fa.S | ||
cache-feroceon-l2.c | ||
cache-l2x0.c | ||
cache-nop.S | ||
cache-tauros2.c | ||
cache-v4.S | ||
cache-v4wb.S | ||
cache-v4wt.S | ||
cache-v6.S | ||
cache-v7.S | ||
cache-xsc3l2.c | ||
context.c | ||
copypage-fa.c | ||
copypage-feroceon.c | ||
copypage-v4mc.c | ||
copypage-v4wb.c | ||
copypage-v4wt.c | ||
copypage-v6.c | ||
copypage-xsc3.c | ||
copypage-xscale.c | ||
dma-mapping.c | ||
extable.c | ||
fault-armv.c | ||
fault.c | ||
fault.h | ||
flush.c | ||
fsr-2level.c | ||
fsr-3level.c | ||
highmem.c | ||
hugetlbpage.c | ||
idmap.c | ||
init.c | ||
iomap.c | ||
ioremap.c | ||
Kconfig | ||
Makefile | ||
mm.h | ||
mmap.c | ||
mmu.c | ||
nommu.c | ||
pabort-legacy.S | ||
pabort-v6.S | ||
pabort-v7.S | ||
pgd.c | ||
proc-arm7tdmi.S | ||
proc-arm9tdmi.S | ||
proc-arm720.S | ||
proc-arm740.S | ||
proc-arm920.S | ||
proc-arm922.S | ||
proc-arm925.S | ||
proc-arm926.S | ||
proc-arm940.S | ||
proc-arm946.S | ||
proc-arm1020.S | ||
proc-arm1020e.S | ||
proc-arm1022.S | ||
proc-arm1026.S | ||
proc-fa526.S | ||
proc-feroceon.S | ||
proc-macros.S | ||
proc-mohawk.S | ||
proc-sa110.S | ||
proc-sa1100.S | ||
proc-syms.c | ||
proc-v6.S | ||
proc-v7-2level.S | ||
proc-v7-3level.S | ||
proc-v7.S | ||
proc-v7m.S | ||
proc-xsc3.S | ||
proc-xscale.S | ||
tcm.h | ||
tlb-fa.S | ||
tlb-v4.S | ||
tlb-v4wb.S | ||
tlb-v4wbi.S | ||
tlb-v6.S | ||
tlb-v7.S |