linux/drivers
Jason Wang 955abe0a1b vduse: avoid using __GFP_NOFAIL
Patch series "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL
and improve related doc and warn", v4.

__GFP_NOFAIL carries the semantics of never failing, so its callers do not
check the return value:

  %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
  cannot handle allocation failures. The allocation could block
  indefinitely but will never return with failure. Testing for
  failure is pointless.

However, __GFP_NOFAIL can sometimes fail if it exceeds size limits or is
used with GFP_ATOMIC/GFP_NOWAIT in a non-sleepable context.  This patchset
handles the illegal use of __GFP_NOFAIL together with GFP_ATOMIC, which
lacks __GFP_DIRECT_RECLAIM (without direct reclaim, nothing can be done to
reclaim memory to satisfy the nofail requirement), and improves the related
documentation and warnings.
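The rule described above can be sketched as a small userspace model. It is a minimal sketch that assumes made-up bit values. The way GFP_KERNEL, GFP_ATOMIC and GFP_NOWAIT are built from reclaim bits is meant to mirror the kernel's gfp_types.h, but the MODEL_* names and numbers are hypothetical. The point it shows is that a __GFP_NOFAIL request is only honourable when __GFP_DIRECT_RECLAIM is also set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real kernel
 * definitions live in include/linux/gfp_types.h and differ. */
typedef unsigned int gfp_t;
#define MODEL_GFP_DIRECT_RECLAIM 0x01u
#define MODEL_GFP_KSWAPD_RECLAIM 0x02u
#define MODEL_GFP_HIGH           0x04u
#define MODEL_GFP_NOFAIL         0x08u
#define MODEL_GFP_IO             0x10u
#define MODEL_GFP_FS             0x20u

/* GFP_KERNEL may sleep and reclaim directly; GFP_ATOMIC and GFP_NOWAIT
 * may only wake kswapd, never reclaim in the caller's context. */
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM | \
                          MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_NOWAIT (MODEL_GFP_KSWAPD_RECLAIM)

/* A nofail request can only be honoured if the allocator is allowed
 * to perform direct reclaim; otherwise it cannot make progress. */
static bool nofail_request_is_valid(gfp_t gfp)
{
    if (!(gfp & MODEL_GFP_NOFAIL))
        return true;
    return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}
```

With this model, GFP_KERNEL | __GFP_NOFAIL is accepted. GFP_ATOMIC | __GFP_NOFAIL and GFP_NOWAIT | __GFP_NOFAIL are the illegal combinations this series targets.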

The proper size limits for __GFP_NOFAIL will be handled separately after
more discussions.


This patch (of 3):

mm doesn't support non-blockable __GFP_NOFAIL allocation, because
persisting in providing __GFP_NOFAIL services for non-blockable users who
cannot perform direct memory reclaim may only result in an endless busy
loop.

Therefore, in such cases, the current mm-core may directly return a NULL
pointer:

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                                                struct alloc_context *ac)
{
        ...
        if (gfp_mask & __GFP_NOFAIL) {
                /*
                 * All existing users of the __GFP_NOFAIL are blockable, so warn
                 * of any new users that actually require GFP_NOWAIT
                 */
                if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
                        goto fail;
                ...
        }
        ...
fail:
        warn_alloc(gfp_mask, ac->nodemask,
                        "page allocation failure: order:%u", order);
got_pg:
        return page;
}
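The decision in the excerpt above can be modelled in userspace to show why failing is the only finite option. This is a hypothetical sketch, not kernel code. The zone is reduced to two counters, and "direct reclaim" just moves reclaimable pages to the free list. The retry cap exists only so the model terminates; per the documentation quoted earlier, the real allocator keeps retrying a blockable __GFP_NOFAIL request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model of the quoted check; not kernel code. */
struct model_zone {
    int free_pages;        /* pages available right now */
    int reclaimable_pages; /* pages direct reclaim could free */
};

static char model_page[4096]; /* stands in for a returned struct page */

/* Move every reclaimable page to the free list; return how many moved. */
static int model_direct_reclaim(struct model_zone *z)
{
    int freed = z->reclaimable_pages;

    z->free_pages += freed;
    z->reclaimable_pages = 0;
    return freed;
}

static void *model_alloc_slowpath(struct model_zone *z, bool nofail,
                                  bool can_direct_reclaim)
{
    /* Cap retries so the model terminates; the real allocator keeps
     * retrying a blockable __GFP_NOFAIL request indefinitely. */
    for (int attempt = 0; attempt < 16; attempt++) {
        if (z->free_pages > 0) {
            z->free_pages--;
            return model_page;
        }
        if (nofail && !can_direct_reclaim) {
            /* Nothing on this path can free memory, so retrying would
             * only busy-loop: warn and fail, as the excerpt does. */
            fprintf(stderr, "page allocation failure: nofail without direct reclaim\n");
            return NULL;
        }
        if (!can_direct_reclaim || model_direct_reclaim(z) == 0) {
            if (!nofail)
                return NULL; /* ordinary request: report failure */
        }
    }
    return NULL;
}
```

When the zone is empty but has reclaimable pages, a nofail request that may reclaim succeeds on its second pass. The same request without direct reclaim returns NULL immediately and leaves the reclaimable pages untouched.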

Unfortunately, vdpa does that nofail allocation under a non-sleepable lock.
A possible way to fix that is to move the page allocation out of the lock
into the caller, but having to allocate a huge number of pages and an
auxiliary page array seems to be problematic as well, per Tetsuo: "You
should implement proper error handling instead of using __GFP_NOFAIL if
count can become large."
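The advice quoted above refers to conventional error handling: allocate a page array plus its pages, and unwind on the first failure instead of assuming allocation cannot fail. Below is a minimal userspace sketch of that pattern. alloc_bounce_pages(), free_bounce_pages() and MODEL_PAGE_SIZE are hypothetical names, and malloc/calloc stand in for kernel allocators. This is not what the patch does; it only illustrates the alternative being weighed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096 /* hypothetical page size for the model */

/* Hypothetical helper: allocate `count` page-sized buffers plus the
 * auxiliary pointer array, unwinding everything on the first failure. */
static int alloc_bounce_pages(size_t count, void ***out)
{
    void **pages = calloc(count, sizeof(*pages));

    if (!pages)
        return -ENOMEM;

    for (size_t i = 0; i < count; i++) {
        pages[i] = malloc(MODEL_PAGE_SIZE);
        if (!pages[i]) {
            /* Free only the buffers allocated so far, then the array. */
            while (i--)
                free(pages[i]);
            free(pages);
            return -ENOMEM;
        }
    }
    *out = pages;
    return 0;
}

static void free_bounce_pages(void **pages, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(pages[i]);
    free(pages);
}
```

Every caller must then check for -ENOMEM. That is awkward on a release path, where failure is not expected, and this motivates the approach described next.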

So I chose another way, which does not release kernel bounce pages when
the user tries to register userspace bounce pages.  Then we can avoid
allocating in paths where failure is not expected (e.g. in the release
path).  The cost is higher memory usage, since the kernel bounce pages are
no longer released, but further optimizations could be done on top.
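The trade-off described above can be illustrated with a userspace model. This is a hypothetical sketch loosely based on the changelog. struct bounce_map, its fields and the helper functions are illustrative names, not the driver's actual code. The kernel bounce page stays allocated while a user page is registered, so deregistering only copies data back and cannot fail.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* hypothetical page size for the model */

/* Hypothetical per-page bounce mapping; field names are illustrative. */
struct bounce_map {
    char *kernel_page; /* allocated up front, kept while user pages are in use */
    char *user_page;   /* userspace-supplied bounce page, or NULL */
};

/* Register a user page: copy the current contents over and switch to it,
 * but keep kernel_page allocated (extra memory, no later allocation). */
static void register_user_page(struct bounce_map *m, char *user_page)
{
    memcpy(user_page, m->kernel_page, MODEL_PAGE_SIZE);
    m->user_page = user_page;
}

/* Deregister (e.g. on release): the kernel page still exists, so this
 * only copies data back; no allocation, hence no failure path. */
static void deregister_user_page(struct bounce_map *m)
{
    memcpy(m->kernel_page, m->user_page, MODEL_PAGE_SIZE);
    m->user_page = NULL;
}

/* The page that bounce I/O currently targets. */
static char *active_page(const struct bounce_map *m)
{
    return m->user_page ? m->user_page : m->kernel_page;
}
```

Returning void from deregister_user_page() reflects the design goal: the release path has no error to report. The price is one kernel page per mapping held for as long as user pages are registered.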

[v-songbaohua@oppo.com: Refine the changelog]
Link: https://lkml.kernel.org/r/20240830202823.21478-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240830202823.21478-2-21cnbao@gmail.com
Fixes: 6c77ed2288 ("vduse: Support using userspace pages as bounce buffer")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Tested-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:02 -07:00
accel
accessibility treewide: remove unnecessary <linux/version.h> inclusion 2024-08-12 18:36:44 +09:00
acpi mm: introduce numa_memblks 2024-09-03 21:15:30 -07:00
amba driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
android binder_alloc: Fix sleeping function called from invalid context 2024-07-31 13:48:25 +02:00
ata ata: pata_macio: Use WARN instead of BUG 2024-08-21 14:33:23 +09:00
atm atm: idt77252: prevent use after free in dequeue_rx() 2024-08-12 10:41:44 +01:00
auxdisplay auxdisplay updates for v6.11 2024-07-26 11:04:28 -07:00
base arch_numa: switch over to numa_memblks 2024-09-03 21:15:32 -07:00
bcma driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
block zsmalloc: use all available 24 bits of page_type 2024-09-03 21:15:43 -07:00
bluetooth Bluetooth: btnxpuart: Fix random crash seen while removing driver 2024-08-23 15:56:04 -04:00
bus Devicetree fixes for 6.11, part 1 2024-07-27 12:46:16 -07:00
cache cache: StarFive: Require a 64-bit system 2024-08-01 07:15:02 -07:00
cdrom sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
cdx driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
char tpm: ibmvtpm: Call tpm2_sessions_init() to initialize session support 2024-08-27 21:11:44 +03:00
clk clk: thead: fix dependency on clk_ignore_unused 2024-07-31 14:51:47 -07:00
clocksource of: remove internal arguments from of_property_for_each_u32() 2024-07-25 06:53:47 -05:00
comedi
connector
counter Char/Misc and other driver changes for 6.11-rc1 2024-07-19 15:55:08 -07:00
cpufreq cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore 2024-08-23 11:07:58 -05:00
cpuidle cpuidle: teo: Don't count non-existent intercepts 2024-07-01 18:58:55 +02:00
crypto ARM: 2024-07-20 12:41:03 -07:00
cxl mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
dax mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
dca Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
devfreq
dio dio: Have dio_bus_match() callback take a const * 2024-07-10 15:38:14 +02:00
dma dmaengine: dw-edma: Do not enable watermark interrupts for HDMA 2024-08-28 18:40:17 +05:30
dma-buf - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
dpll
edac minmax: make generic MIN() and MAX() macros available everywhere 2024-07-28 15:49:18 -07:00
eisa driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
extcon
firewire Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
firmware mm: rework accept memory helpers 2024-09-01 20:26:07 -07:00
fpga Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
fsi fsi: add missing MODULE_DESCRIPTION() macros 2024-07-31 13:40:00 +02:00
gnss
gpio gpio: mlxbf3: Support shutdown() function 2024-08-10 21:35:16 +02:00
gpu mm: kvmalloc: align kvrealloc() with krealloc() 2024-09-01 20:25:44 -07:00
greybus Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
hid hid-for-linus-2024081901 2024-08-19 11:02:13 -07:00
hsi Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
hte
hv Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
hwmon hwmon: (pt5161l) Fix invalid temperature reading 2024-08-26 20:58:05 -07:00
hwspinlock
hwtracing Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
i2c i2c: tegra: Do not mark ACPI devices as irq safe 2024-08-15 00:22:28 +02:00
i3c I3C for 6.11 2024-07-27 10:53:06 -07:00
idle
iio of: remove internal arguments from of_property_for_each_u32() 2024-07-25 06:53:47 -05:00
infiniband IOMMU Updates for Linux v6.11 2024-07-19 09:59:58 -07:00
input Input updates for v6.11-rc5 2024-08-31 15:32:38 +12:00
interconnect Char/Misc and other driver changes for 6.11-rc1 2024-07-19 15:55:08 -07:00
iommu IOMMU Fixes for Linux v6.11-rc5 2024-08-31 06:11:34 +12:00
ipack driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
irqchip irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration 2024-08-10 10:42:04 +02:00
isdn mISDN: Fix a use after free in hfcmulti_tx() 2024-07-25 08:05:05 -07:00
leds - Core Frameworks 2024-07-17 17:51:30 -07:00
macintosh sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-25 12:58:36 -07:00
mailbox mailbox: mtk-cmdq: Move devm_mbox_controller_register() after devm_pm_runtime_enable() 2024-07-19 21:25:23 -05:00
mcb Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
md block-6.11-20240824 2024-08-16 14:03:31 -07:00
media media fixes for v6.11-rc4 2024-08-15 10:23:19 -07:00
memory
memstick Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
message
mfd Devicetree fixes for 6.11, part 1 2024-07-27 12:46:16 -07:00
misc Char/Misc fixes for 6.11-rc4 2024-08-18 10:16:34 -07:00
mmc mmc: mmc_test: Fix NULL dereference on allocation failure 2024-08-20 13:47:36 +02:00
most Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
mtd This pull request contains updates (actually, just fixes) for UBI and UBIFS: 2024-07-28 11:51:51 -07:00
mux
net Regressions: 2024-08-28 16:54:45 -07:00
nfc nfc: pn533: Add poll mod list filling check 2024-08-29 12:08:44 +02:00
ntb Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
nubus
nvdimm nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases 2024-08-09 14:29:58 -05:00
nvme nvme: Remove unused field 2024-08-22 13:28:40 -07:00
nvmem Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
of of, numa: return -EINVAL when no numa-node-id is found 2024-09-03 21:15:32 -07:00
opp Merge branches 'pm-opp' and 'pm-tools' 2024-07-15 18:55:14 +02:00
parisc
parport sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-25 12:58:36 -07:00
pci pci-v6.11-fixes-2 2024-08-31 14:54:11 +12:00
pcmcia Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
peci Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
perf perf: riscv: Fix selecting counters in legacy mode 2024-08-01 07:15:13 -07:00
phy phy: xilinx: phy-zynqmp: Fix SGMII linkup failure on resume 2024-08-05 21:46:58 +05:30
pinctrl pinctrl: rockchip: correct RK3328 iomux width flag for GPIO2-B pins 2024-08-24 16:39:51 +02:00
platform platform-drivers-x86 for v6.11-5 2024-08-29 07:12:02 +12:00
pmdomain pmdomain: imx: wait SSAR when i.MX93 power domain on 2024-08-15 12:47:09 +02:00
pnp driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
power power sequencing fixes for v6.11-rc6 2024-09-01 09:07:44 +12:00
powercap
pps Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
ps3
ptp Networking changes for 6.11. Not much excitement - a handful of large 2024-07-16 19:28:34 -07:00
pwm of: remove internal arguments from of_property_for_each_u32() 2024-07-25 06:53:47 -05:00
rapidio driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
ras - The AMD memory controllers data fabric version 4.5 supports 2024-07-15 18:20:24 -07:00
regulator regulator: Fixes for v6.11 2024-07-27 12:27:52 -07:00
remoteproc rpmsg updates for v6.11 2024-07-23 13:41:59 -07:00
reset Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
rpmsg Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
rtc rtc: stm32: add new st,stm32mp25-rtc compatible and check RIF configuration 2024-07-10 17:15:33 +02:00
s390 s390 updates for 6.11-rc5 2024-08-25 12:05:23 +12:00
sbus sbus: add missing MODULE_DESCRIPTION() macros 2024-07-11 15:42:03 +02:00
scsi SCSI fixes on 20240831 2024-09-01 07:00:38 +12:00
sh driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
siox Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
slimbus Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
soc Qualcomm driver fixes for v6.11 2024-08-28 20:27:39 +00:00
soundwire soundwire: stream: fix programming slave ports for non-continous port maps 2024-08-17 22:55:05 +05:30
spi spi: pxa2xx: Move PM runtime handling to the glue drivers 2024-08-22 13:34:06 +01:00
spmi spmi: pmic-arb: add missing newline in dev_err format strings 2024-07-31 13:49:28 +02:00
ssb driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
staging Kbuild fixes for v6.11 (2nd) 2024-08-23 07:43:15 +08:00
target
tc driver core: have match() callback in struct bus_type take a const * 2024-07-03 15:16:54 +02:00
tee Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
thermal thermal: of: Fix OF node leak in of_thermal_zone_find() error paths 2024-08-22 20:58:49 +02:00
thunderbolt thunderbolt: Mark XDomain as unplugged when router is removed 2024-08-06 08:01:10 +03:00
tty Revert "serial: 8250_omap: Set the console genpd always on if no console suspend" 2024-08-15 07:22:10 +02:00
ufs scsi: ufs: qcom: Add UFSHCD_QUIRK_BROKEN_LSDBS_CAP for SM8550 SoC 2024-08-16 21:09:17 -04:00
uio
usb USB fixes for 6.11-rc6 2024-09-01 07:06:28 +12:00
vdpa vduse: avoid using __GFP_NOFAIL 2024-09-09 16:39:02 -07:00
vfio Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
vhost virtio: bugfix 2024-08-06 10:58:28 -07:00
video video/aperture: optionally match the device in sysfb_disable() 2024-08-26 19:14:48 -04:00
virt ARM: 2024-07-20 12:41:03 -07:00
virtio virtio: fixes 2024-07-29 12:53:37 -07:00
w1
watchdog linux-watchdog 6.11-rc1 tag 2024-07-25 10:18:35 -07:00
xen Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
zorro Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
Kconfig
Makefile