linux/mm
Mel Gorman 56fd56b868 Bias the location of pages freed for min_free_kbytes in the same MAX_ORDER_NR_PAGES blocks
The standard buddy allocator always favours the smallest block of pages.
The effect of this is that the pages free to satisfy min_free_kbytes tends
to be preserved since boot time at the same location of memory ffor a very
long time and as a contiguous block.  When an administrator sets the
reserve at 16384 at boot time, it tends to be the same MAX_ORDER blocks
that remain free.  This allows the occasional high atomic allocation to
succeed up until the point the blocks are split.  In practice, it is
difficult to split these blocks but when they do split, the benefit of
having min_free_kbytes for contiguous blocks disappears.  Additionally,
increasing min_free_kbytes once the system has been running for some time
has no guarantee of creating contiguous blocks.

On the other hand, CONFIG_PAGE_GROUP_BY_MOBILITY favours splitting large
blocks when there are no free pages of the appropriate type available.  A
side-effect of this is that all blocks in memory tends to be used up and
the contiguous free blocks from boot time are not preserved like in the
vanilla allocator.  This can cause a problem if a new caller is unwilling
to reclaim or does not reclaim for long enough.

A failure scenario was found for a wireless network device allocating
order-1 atomic allocations but the allocations were not intense or frequent
enough for a whole block of pages to be preserved for MIGRATE_HIGHALLOC.
This was reproduced on a desktop by booting with mem=256mb, forcing the
driver to allocate at order-1, running a bittorrent client (downloading a
debian ISO) and building a kernel with -j2.

This patch addresses the problem on the desktop machine booted with
mem=256mb.  It works by setting aside a reserve of MAX_ORDER_NR_PAGES
blocks, the number of which depends on the value of min_free_kbytes.  These
blocks are only fallen back to when there is no other free pages.  Then the
smallest possible page is used just like the normal buddy allocator instead
of the largest possible page to preserve contiguous pages The pages in free
lists in the reserve blocks are never taken for another migrate type.  The
results is that even if min_free_kbytes is set to a low value, contiguous
blocks will be preserved in the MIGRATE_RESERVE blocks.

This works better than the vanilla allocator because if min_free_kbytes is
increased, a new reserve block will be chosen based on the location of
reclaimable pages and the block will free up as contiguous pages.  In the
vanilla allocator, no effort is made to target a block of pages to free as
contiguous pages and min_free_kbytes pages are scattered randomly.

This effect has been observed on the test machine.  min_free_kbytes was set
initially low but it was kept as a contiguous free block within
MIGRATE_RESERVE.  min_free_kbytes was then set to a higher value and over a
period of time, the free blocks were within the reserve and coalescing.
How long it takes to free up depends on how quickly LRU is rotating.
Amusingly, this means that more activity will free the blocks faster.

This mechanism potentially replaces MIGRATE_HIGHALLOC as it may be more
effective than grouping contiguous free pages together.  It all depends on
whether the number of active atomic high allocations exceeds
min_free_kbytes or not.  If the number of active allocations exceeds
min_free_kbytes, it's worth it but maybe in that situation, min_free_kbytes
should be set higher.  Once there are no more reports of allocation
failures, a patch will be submitted that backs out MIGRATE_HIGHALLOC and
see if the reports stay missing.

Credit to Mariusz Kozlowski for discovering the problem, describing the
failure scenario and testing patches and scenarios.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
..
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c remove mm/backing-dev.c:congestion_wait_interruptible() 2007-07-16 09:05:52 -07:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap.c fs: remove some AOP_TRUNCATED_PAGE 2007-10-16 09:42:58 -07:00
filemap_xip.c mm: write iovec cleanup 2007-10-16 09:42:54 -07:00
fremap.c fix VM_CAN_NONLINEAR check in sys_remap_file_pages 2007-10-08 12:58:14 -07:00
highmem.c Create the ZONE_MOVABLE zone 2007-07-17 10:22:59 -07:00
hugetlb.c flush icache before set_pte() on ia64: flush icache at set_pte 2007-10-16 09:42:59 -07:00
internal.h Make page->private usable in compound pages 2007-05-07 12:12:53 -07:00
Kconfig vmemmap: generify initialisation via helpers 2007-10-16 09:42:51 -07:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
Makefile Generic Virtual Memmap support for SPARSEMEM 2007-10-16 09:42:51 -07:00
memory.c flush icache before set_pte() on ia64: flush icache at set_pte 2007-10-16 09:42:59 -07:00
memory_hotplug.c Memoryless nodes: introduce mask of nodes with memory 2007-10-16 09:42:58 -07:00
mempolicy.c memoryless nodes: fixup uses of node_online_map in generic code 2007-10-16 09:42:59 -07:00
mempool.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
migrate.c flush icache before set_pte() on ia64: flush icache at set_pte 2007-10-16 09:42:59 -07:00
mincore.c [PATCH] mincore: vma crossing fix 2007-02-15 09:57:03 -08:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c fix NULL pointer dereference in __vm_enough_memory() 2007-08-22 19:52:45 -07:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c flush icache before set_pte() on ia64: flush icache at set_pte 2007-10-16 09:42:59 -07:00
mremap.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c fix NULL pointer dereference in __vm_enough_memory() 2007-08-22 19:52:45 -07:00
oom_kill.c Memoryless nodes: OOM: use N_HIGH_MEMORY map instead of constructing one on the fly 2007-10-16 09:42:58 -07:00
page-writeback.c memoryless nodes: fixup uses of node_online_map in generic code 2007-10-16 09:42:59 -07:00
page_alloc.c Bias the location of pages freed for min_free_kbytes in the same MAX_ORDER_NR_PAGES blocks 2007-10-16 09:43:00 -07:00
page_io.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
quicklist.c Quicklists for page table pages 2007-05-07 12:12:54 -07:00
readahead.c mm: buffered write cleanup 2007-10-16 09:42:54 -07:00
rmap.c flush icache before set_pte() on ia64: flush icache at set_pte 2007-10-16 09:42:59 -07:00
shmem.c Group short-lived and reclaimable kernel allocations 2007-10-16 09:43:00 -07:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c Group short-lived and reclaimable kernel allocations 2007-10-16 09:43:00 -07:00
slob.c Slab allocators: fail if ksize is called with a NULL parameter 2007-10-16 09:42:53 -07:00
slub.c Group short-lived and reclaimable kernel allocations 2007-10-16 09:43:00 -07:00
sparse-vmemmap.c vmemmap: generify initialisation via helpers 2007-10-16 09:42:51 -07:00
sparse.c Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2 2007-10-16 09:43:00 -07:00
swap.c mm: use pagevec to rotate reclaimable page 2007-10-16 09:42:54 -07:00
swap_state.c mm: clarify __add_to_swap_cache locking 2007-10-16 09:42:53 -07:00
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c [PATCH] mm/{,tiny-}shmem.c cleanups 2007-03-01 14:53:35 -08:00
truncate.c mm: merge populate and nopage into fault (fixes nonlinear) 2007-07-19 10:04:41 -07:00
util.c Slab allocators: fail if ksize is called with a NULL parameter 2007-10-16 09:42:53 -07:00
vmalloc.c Categorize GFP flags 2007-10-16 09:42:59 -07:00
vmscan.c make swappiness safer to use 2007-10-16 09:42:59 -07:00
vmstat.c Remove fs.h from mm.h 2007-07-29 17:09:29 -07:00