linux/Documentation
Kirill A. Shutemov c8d78c1823 mm: replace remap_file_pages() syscall with emulation
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.

Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.

Let's drop it and get rid of all these special-cased code.

The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.

I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on.  I've checked Debian code search and source of all
packages in ALT Linux.  No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.

There are few basic tests in LTP for the syscall.  They work just fine
with emulation.

To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.

The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.

Before:		23.3 ( +-  4.31% ) seconds
After:		43.9 ( +-  0.85% ) seconds
Slowdown:	1.88x

I believe we can live with that.

Test case:

        #define _GNU_SOURCE
        #include <assert.h>
        #include <stdlib.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define MB	(1024UL * 1024)
        #define SIZE	(4096 * MB)

        int main(int argc, char **argv)
        {
                unsigned long *p;
                long i, pass;

                for (pass = 0; pass < 10; pass++) {
                        p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                        if (p == MAP_FAILED) {
                                perror("mmap");
                                return -1;
                        }

                        for (i = 0; i < SIZE / 4096; i++)
                                p[i * 4096 / sizeof(*p)] = i;

                        for (i = 0; i < SIZE / 4096; i++) {
                                if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                        perror("remap_file_pages");
                                        return -1;
                                }
                        }

                        for (i = SIZE / 4096 - 1; i >= 0; i--)
                                assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                        munmap(p, SIZE);
                }

                return 0;
        }

[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:30 -08:00
..
ABI perf/core improvements and fixes: 2015-01-28 15:43:15 +01:00
accounting
acpi
aoe
arm Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-12-12 15:26:48 -08:00
arm64
auxdisplay
backlight
blackfin
block Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block 2014-12-13 14:14:23 -08:00
blockdev
bus-devices
cdrom
cgroups Update of Documentation/cgroups/00-INDEX 2015-01-05 09:19:32 -05:00
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper
devicetree Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-02-09 19:40:42 -08:00
dmaengine
DocBook media updates for v3.19-rc1 2014-12-18 20:14:49 -08:00
driver-model
dvb
early-userspace
EDID
extcon
fault-injection
fb
filesystems fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
firmware_class
fmc
frv
gpio This is the bulk of GPIO changes for the v3.19 series: 2014-12-14 14:05:05 -08:00
hid
hwmon hwmon: (ina2xx) Add ina231 compatible string 2015-01-25 21:23:59 -08:00
i2c
i2o
ia64
ide
infiniband
input Docs changes for the 3.19 merge window 2014-12-12 14:42:48 -08:00
ioctl
isdn
ja_JP
kbuild
kdump kernel: add panic_on_warn 2014-12-10 17:41:10 -08:00
ko_KR
laptops
leds
locking locking/Documentation: Update code path 2015-01-16 09:09:21 +01:00
m68k
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking Documentation: Update netlink_mmap.txt 2015-02-02 18:50:00 -08:00
nfc
nios2
parisc
PCI
pcmcia
phy
platform
power Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-12-12 15:26:48 -08:00
powerpc
pps
prctl
pti
ptp
rapidio
RCU Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD 2015-01-15 23:34:34 -08:00
s390
scheduler
scsi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-12-12 10:08:06 -08:00
security
serial
sh
sound
spi
sysctl ipc/msg: increase MSGMNI, remove scaling 2014-12-13 12:42:52 -08:00
target Documentation/target: Update fabric_ops to latest code 2015-01-06 13:46:49 -08:00
thermal Documentation: thermal: document of_cpufreq_cooling_register() 2015-01-06 14:39:17 -04:00
timers
tpm
trace Char/Misc driver patches for 3.19-rc1 2014-12-14 16:43:47 -08:00
usb USB patches for 3.19-rc1 2014-12-14 14:57:16 -08:00
vDSO
video4linux [media] vivid.txt: document new controls 2014-12-16 23:21:37 -02:00
virtual Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot 2014-12-15 13:06:40 +01:00
vm mm: replace remap_file_pages() syscall with emulation 2015-02-10 14:30:30 -08:00
w1
watchdog
wimax
x86 x86, entry: Switch stacks on a paranoid entry from userspace 2015-01-02 10:22:45 -08:00
xtensa
zh_CN
00-INDEX
applying-patches.txt
assoc_array.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
Changes
circular-buffers.txt
clk.txt
coccinelle.txt
CodingStyle
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
dma-buf-sharing.txt
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt
efi-stub.txt
eisa.txt
email-clients.txt
flexible-arrays.txt
futex-requeue-pi.txt doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants 2015-01-19 12:05:32 +01:00
gcov.txt
highuid.txt
HOWTO
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
Intel-IOMMU.txt
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt
IPMI.txt ipmi: Add SMBus interface driver (SSIF) 2014-12-11 15:04:11 -06:00
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-01-19 04:55:23 +12:00
kernel-per-CPU-kthreads.txt
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
kselftest.txt
ldm.txt
local_ops.txt percpu: update local_ops.txt to reflect this_cpu operations 2014-12-13 12:42:53 -08:00
lockup-watchdogs.txt
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt
Makefile
ManagementStyle
md.txt
media-framework.txt
memory-barriers.txt documentation: Fix smp typo in memory-barriers.txt 2015-01-07 08:59:32 -08:00
memory-hotplug.txt
module-signing.txt
mono.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
pwm.txt
ramoops.txt pstore-ram: Allow optional mapping with pgprot_noncached 2014-12-11 13:38:31 -08:00
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt