From f1868165d2d737a85dfce18e1bfada86d0ed4f92 Mon Sep 17 00:00:00 2001
From: Xiongwei Song
Date: Fri, 15 Dec 2023 11:41:47 +0800
Subject: [PATCH 01/17] Documentation: kernel-parameters: remove noaliencache

Since the SLAB allocator has already been removed, there are no users of
the noaliencache parameter, so let's remove it.

Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Kees Cook
Reviewed-by: Vlastimil Babka
Signed-off-by: Xiongwei Song
Signed-off-by: Vlastimil Babka
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..89c67c2403fc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3753,10 +3753,6 @@
 	no5lvl		[X86-64,RISCV] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	noaliencache	[MM, NUMA, SLAB] Disables the allocation of alien
-			caches in the slab allocator. Saves per-node memory,
-			but will impact performance.
-
 	noalign		[KNL,ARM]
 
 	noaltinstr	[S390] Disables alternative instructions patching

From 671776b32b26d0cb625bf834170e982fda712cab Mon Sep 17 00:00:00 2001
From: Xiongwei Song
Date: Fri, 15 Dec 2023 11:41:48 +0800
Subject: [PATCH 02/17] mm/slub: unify all sl[au]b parameters with "slab_$param"

Since the SLAB allocator has been removed, we can clean up the
sl[au]b_$params. With only one slab allocator left, it's better to use the
generic "slab" term instead of "slub", which is an implementation detail,
as pointed out by Vlastimil Babka. For more information please see [1].

Hence, we are going to use "slab_$param" as the primary prefix. This patch
changes the following slab parameters
- slub_max_order
- slub_min_order
- slub_min_objects
- slub_debug
to
- slab_max_order
- slab_min_order
- slab_min_objects
- slab_debug
as the primary slab parameters in all references to them in docs and
comments. This patch doesn't change variables and functions inside slub,
as a wider slub/slab rename will come later.

Meanwhile, the "slub_$params" can still be passed on the command line to
keep backward compatibility, and all "slub_$params" are marked as legacy.

Remove the separate descriptions for slub_[no]merge and append a legacy
note for them at the end of the descriptions of slab_[no]merge.

[1] https://lore.kernel.org/linux-mm/7512b350-4317-21a0-fab3-4101bc4d8f7a@suse.cz/

Signed-off-by: Xiongwei Song
Reviewed-by: Vlastimil Babka
Signed-off-by: Vlastimil Babka
---
 .../admin-guide/kernel-parameters.txt | 75 +++++++++----------
 drivers/misc/lkdtm/heap.c | 2 +-
 mm/Kconfig.debug | 6 +-
 mm/slab.h | 2 +-
 mm/slab_common.c | 4 +-
 mm/slub.c | 41 +++++-----
 6 files changed, 64 insertions(+), 66 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89c67c2403fc..aefd68cec2e9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5891,11 +5891,42 @@
 	simeth=		[IA-64]
 	simscsi=
 
-	slram=		[HW,MTD]
+	slab_debug[=options[,slabs][;[options[,slabs]]...]	[MM]
+			Enabling slab_debug allows one to determine the
+			culprit if slab objects become corrupted. Enabling
+			slab_debug can create guard zones around objects and
+			may poison objects when not in use. Also tracks the
+			last alloc / free. For more information see
+			Documentation/mm/slub.rst.
+ (slub_debug legacy name also accepted for now) + + slab_max_order= [MM] + Determines the maximum allowed order for slabs. + A high setting may cause OOMs due to memory + fragmentation. For more information see + Documentation/mm/slub.rst. + (slub_max_order legacy name also accepted for now) slab_merge [MM] Enable merging of slabs with similar size when the kernel is built without CONFIG_SLAB_MERGE_DEFAULT. + (slub_merge legacy name also accepted for now) + + slab_min_objects= [MM] + The minimum number of objects per slab. SLUB will + increase the slab order up to slab_max_order to + generate a sufficiently large slab able to contain + the number of objects indicated. The higher the number + of objects the smaller the overhead of tracking slabs + and the less frequently locks need to be acquired. + For more information see Documentation/mm/slub.rst. + (slub_min_objects legacy name also accepted for now) + + slab_min_order= [MM] + Determines the minimum page order for slabs. Must be + lower or equal to slab_max_order. For more information see + Documentation/mm/slub.rst. + (slub_min_order legacy name also accepted for now) slab_nomerge [MM] Disable merging of slabs with similar size. May be @@ -5909,47 +5940,9 @@ unchanged). Debug options disable merging on their own. For more information see Documentation/mm/slub.rst. + (slub_nomerge legacy name also accepted for now) - slab_max_order= [MM, SLAB] - Determines the maximum allowed order for slabs. - A high setting may cause OOMs due to memory - fragmentation. Defaults to 1 for systems with - more than 32MB of RAM, 0 otherwise. - - slub_debug[=options[,slabs][;[options[,slabs]]...] [MM, SLUB] - Enabling slub_debug allows one to determine the - culprit if slab objects become corrupted. Enabling - slub_debug can create guard zones around objects and - may poison objects when not in use. Also tracks the - last alloc / free. For more information see - Documentation/mm/slub.rst. - - slub_max_order= [MM, SLUB] - Determines the maximum allowed order for slabs. - A high setting may cause OOMs due to memory - fragmentation. For more information see - Documentation/mm/slub.rst. - - slub_min_objects= [MM, SLUB] - The minimum number of objects per slab. SLUB will - increase the slab order up to slub_max_order to - generate a sufficiently large slab able to contain - the number of objects indicated. The higher the number - of objects the smaller the overhead of tracking slabs - and the less frequently locks need to be acquired. - For more information see Documentation/mm/slub.rst. - - slub_min_order= [MM, SLUB] - Determines the minimum page order for slabs. Must be - lower than slub_max_order. - For more information see Documentation/mm/slub.rst. - - slub_merge [MM, SLUB] - Same with slab_merge. - - slub_nomerge [MM, SLUB] - Same with slab_nomerge. This is supported for legacy. - See slab_nomerge for more information. + slram= [HW,MTD] smart2= [HW] Format: [,[,...,]] diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c index 4f467d3972a6..b1b316f99703 100644 --- a/drivers/misc/lkdtm/heap.c +++ b/drivers/misc/lkdtm/heap.c @@ -48,7 +48,7 @@ static void lkdtm_VMALLOC_LINEAR_OVERFLOW(void) * correctly. * * This should get caught by either memory tagging, KASan, or by using - * CONFIG_SLUB_DEBUG=y and slub_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y). + * CONFIG_SLUB_DEBUG=y and slab_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y). 
*/ static void lkdtm_SLAB_LINEAR_OVERFLOW(void) { diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 321ab379994f..afc72fde0f03 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -64,11 +64,11 @@ config SLUB_DEBUG_ON help Boot with debugging on by default. SLUB boots by default with the runtime debug capabilities switched off. Enabling this is - equivalent to specifying the "slub_debug" parameter on boot. + equivalent to specifying the "slab_debug" parameter on boot. There is no support for more fine grained debug control like - possible with slub_debug=xxx. SLUB debugging may be switched + possible with slab_debug=xxx. SLUB debugging may be switched off in a kernel built with CONFIG_SLUB_DEBUG_ON by specifying - "slub_debug=-". + "slab_debug=-". config PAGE_OWNER bool "Track page owner" diff --git a/mm/slab.h b/mm/slab.h index 54deeb0428c6..f7df6d701c5b 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -528,7 +528,7 @@ static inline bool __slub_debug_enabled(void) #endif /* - * Returns true if any of the specified slub_debug flags is enabled for the + * Returns true if any of the specified slab_debug flags is enabled for the * cache. Use only for flags parsed by setup_slub_debug() as it also enables * the static key. */ diff --git a/mm/slab_common.c b/mm/slab_common.c index 238293b1dbe1..230ef7cc3467 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -282,7 +282,7 @@ kmem_cache_create_usercopy(const char *name, #ifdef CONFIG_SLUB_DEBUG /* - * If no slub_debug was enabled globally, the static key is not yet + * If no slab_debug was enabled globally, the static key is not yet * enabled by setup_slub_debug(). Enable it if the cache is being * created with any of the debugging flags passed explicitly. * It's also possible that this is the first cache created with @@ -766,7 +766,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup); } /* - * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. + * kmalloc_info[] is to make slab_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is * kmalloc-2M. */ diff --git a/mm/slub.c b/mm/slub.c index 2ef88bbf56a3..e66bc888d23b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -295,7 +295,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s) /* * Debugging flags that require metadata to be stored in the slab. These get - * disabled when slub_debug=O is used and a cache's min order increases with + * disabled when slab_debug=O is used and a cache's min order increases with * metadata. */ #define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) @@ -1616,7 +1616,7 @@ static inline int free_consistency_checks(struct kmem_cache *s, } /* - * Parse a block of slub_debug options. Blocks are delimited by ';' + * Parse a block of slab_debug options. Blocks are delimited by ';' * * @str: start of block * @flags: returns parsed flags, or DEBUG_DEFAULT_FLAGS if none specified @@ -1677,7 +1677,7 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) break; default: if (init) - pr_err("slub_debug option '%c' unknown. skipped\n", *str); + pr_err("slab_debug option '%c' unknown. 
skipped\n", *str); } } check_slabs: @@ -1736,7 +1736,7 @@ static int __init setup_slub_debug(char *str) /* * For backwards compatibility, a single list of flags with list of * slabs means debugging is only changed for those slabs, so the global - * slub_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending + * slab_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending * on CONFIG_SLUB_DEBUG_ON). We can extended that to multiple lists as * long as there is no option specifying flags without a slab list. */ @@ -1760,7 +1760,8 @@ static int __init setup_slub_debug(char *str) return 1; } -__setup("slub_debug", setup_slub_debug); +__setup("slab_debug", setup_slub_debug); +__setup_param("slub_debug", slub_debug, setup_slub_debug, 0); /* * kmem_cache_flags - apply debugging options to the cache @@ -1770,7 +1771,7 @@ __setup("slub_debug", setup_slub_debug); * * Debug option(s) are applied to @flags. In addition to the debug * option(s), if a slab name (or multiple) is specified i.e. - * slub_debug=,, ... + * slab_debug=,, ... * then only the select slabs will receive the debug option(s). */ slab_flags_t kmem_cache_flags(unsigned int object_size, @@ -3263,7 +3264,7 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid) oo_order(s->min)); if (oo_order(s->min) > get_order(s->object_size)) - pr_warn(" %s debugging increased min order, use slub_debug=O to disable.\n", + pr_warn(" %s debugging increased min order, use slab_debug=O to disable.\n", s->name); for_each_kmem_cache_node(s, node, n) { @@ -3792,11 +3793,11 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, zero_size = orig_size; /* - * When slub_debug is enabled, avoid memory initialization integrated + * When slab_debug is enabled, avoid memory initialization integrated * into KASAN and instead zero out the memory via the memset below with * the proper size. Otherwise, KASAN might overwrite SLUB redzones and * cause false-positive reports. This does not lead to a performance - * penalty on production builds, as slub_debug is not intended to be + * penalty on production builds, as slab_debug is not intended to be * enabled there. */ if (__slub_debug_enabled()) @@ -4702,8 +4703,8 @@ static unsigned int slub_min_objects; * activity on the partial lists which requires taking the list_lock. This is * less a concern for large slabs though which are rarely used. * - * slub_max_order specifies the order where we begin to stop considering the - * number of objects in a slab as critical. If we reach slub_max_order then + * slab_max_order specifies the order where we begin to stop considering the + * number of objects in a slab as critical. If we reach slab_max_order then * we try to keep the page order as low as possible. So we accept more waste * of space in favor of a small page order. * @@ -4770,14 +4771,14 @@ static inline int calculate_order(unsigned int size) * and backing off gradually. * * We start with accepting at most 1/16 waste and try to find the - * smallest order from min_objects-derived/slub_min_order up to - * slub_max_order that will satisfy the constraint. Note that increasing + * smallest order from min_objects-derived/slab_min_order up to + * slab_max_order that will satisfy the constraint. Note that increasing * the order can only result in same or less fractional waste, not more. * * If that fails, we increase the acceptable fraction of waste and try * again. 
The last iteration with fraction of 1/2 would effectively * accept any waste and give us the order determined by min_objects, as - * long as at least single object fits within slub_max_order. + * long as at least single object fits within slab_max_order. */ for (unsigned int fraction = 16; fraction > 1; fraction /= 2) { order = calc_slab_order(size, min_order, slub_max_order, @@ -4787,7 +4788,7 @@ static inline int calculate_order(unsigned int size) } /* - * Doh this slab cannot be placed using slub_max_order. + * Doh this slab cannot be placed using slab_max_order. */ order = get_order(size); if (order <= MAX_PAGE_ORDER) @@ -5313,7 +5314,9 @@ static int __init setup_slub_min_order(char *str) return 1; } -__setup("slub_min_order=", setup_slub_min_order); +__setup("slab_min_order=", setup_slub_min_order); +__setup_param("slub_min_order=", slub_min_order, setup_slub_min_order, 0); + static int __init setup_slub_max_order(char *str) { @@ -5326,7 +5329,8 @@ static int __init setup_slub_max_order(char *str) return 1; } -__setup("slub_max_order=", setup_slub_max_order); +__setup("slab_max_order=", setup_slub_max_order); +__setup_param("slub_max_order=", slub_max_order, setup_slub_max_order, 0); static int __init setup_slub_min_objects(char *str) { @@ -5335,7 +5339,8 @@ static int __init setup_slub_min_objects(char *str) return 1; } -__setup("slub_min_objects=", setup_slub_min_objects); +__setup("slab_min_objects=", setup_slub_min_objects); +__setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0); #ifdef CONFIG_HARDENED_USERCOPY /* From cb109a9d607056915d6a0dc8e1ebfc5854c72d92 Mon Sep 17 00:00:00 2001 From: Xiongwei Song Date: Fri, 15 Dec 2023 11:41:49 +0800 Subject: [PATCH 03/17] mm/slub: replace slub_$params with slab_$params in slub.rst We've unified slab parameters with "slab_$params", then we can use slab_$params in Documentation/mm/slub.rst. Signed-off-by: Xiongwei Song Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- Documentation/mm/slub.rst | 60 +++++++++++++++++++-------------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst index be75971532f5..6579a55b7852 100644 --- a/Documentation/mm/slub.rst +++ b/Documentation/mm/slub.rst @@ -9,7 +9,7 @@ SLUB can enable debugging only for selected slabs in order to avoid an impact on overall system performance which may make a bug more difficult to find. -In order to switch debugging on one can add an option ``slub_debug`` +In order to switch debugging on one can add an option ``slab_debug`` to the kernel command line. That will enable full debugging for all slabs. @@ -26,16 +26,16 @@ be enabled on the command line. F.e. no tracking information will be available without debugging on and validation can only partially be performed if debugging was not switched on. -Some more sophisticated uses of slub_debug: +Some more sophisticated uses of slab_debug: ------------------------------------------- -Parameters may be given to ``slub_debug``. If none is specified then full +Parameters may be given to ``slab_debug``. If none is specified then full debugging is enabled. Format: -slub_debug= +slab_debug= Enable options for all slabs -slub_debug=,,,... +slab_debug=,,,... Enable options only for select slabs (no spaces after a comma) @@ -60,23 +60,23 @@ Possible debug options are:: F.e. in order to boot just with sanity checks and red zoning one would specify:: - slub_debug=FZ + slab_debug=FZ Trying to find an issue in the dentry cache? 
Try:: - slub_debug=,dentry + slab_debug=,dentry to only enable debugging on the dentry cache. You may use an asterisk at the end of the slab name, in order to cover all slabs with the same prefix. For example, here's how you can poison the dentry cache as well as all kmalloc slabs:: - slub_debug=P,kmalloc-*,dentry + slab_debug=P,kmalloc-*,dentry Red zoning and tracking may realign the slab. We can just apply sanity checks to the dentry cache with:: - slub_debug=F,dentry + slab_debug=F,dentry Debugging options may require the minimum possible slab order to increase as a result of storing the metadata (for example, caches with PAGE_SIZE object @@ -84,20 +84,20 @@ sizes). This has a higher liklihood of resulting in slab allocation errors in low memory situations or if there's high fragmentation of memory. To switch off debugging for such caches by default, use:: - slub_debug=O + slab_debug=O You can apply different options to different list of slab names, using blocks of options. This will enable red zoning for dentry and user tracking for kmalloc. All other slabs will not get any debugging enabled:: - slub_debug=Z,dentry;U,kmalloc-* + slab_debug=Z,dentry;U,kmalloc-* You can also enable options (e.g. sanity checks and poisoning) for all caches except some that are deemed too performance critical and don't need to be debugged by specifying global debug options followed by a list of slab names with "-" as options:: - slub_debug=FZ;-,zs_handle,zspage + slab_debug=FZ;-,zs_handle,zspage The state of each debug option for a slab can be found in the respective files under:: @@ -105,7 +105,7 @@ under:: /sys/kernel/slab// If the file contains 1, the option is enabled, 0 means disabled. The debug -options from the ``slub_debug`` parameter translate to the following files:: +options from the ``slab_debug`` parameter translate to the following files:: F sanity_checks Z red_zone @@ -129,7 +129,7 @@ in order to reduce overhead and increase cache hotness of objects. Slab validation =============== -SLUB can validate all object if the kernel was booted with slub_debug. In +SLUB can validate all object if the kernel was booted with slab_debug. In order to do so you must have the ``slabinfo`` tool. Then you can do :: @@ -150,29 +150,29 @@ list_lock once in a while to deal with partial slabs. That overhead is governed by the order of the allocation for each slab. The allocations can be influenced by kernel parameters: -.. slub_min_objects=x (default 4) -.. slub_min_order=x (default 0) -.. slub_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER)) +.. slab_min_objects=x (default 4) +.. slab_min_order=x (default 0) +.. slab_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER)) -``slub_min_objects`` +``slab_min_objects`` allows to specify how many objects must at least fit into one slab in order for the allocation order to be acceptable. In general slub will be able to perform this number of allocations on a slab without consulting centralized resources (list_lock) where contention may occur. -``slub_min_order`` +``slab_min_order`` specifies a minimum order of slabs. A similar effect like - ``slub_min_objects``. + ``slab_min_objects``. -``slub_max_order`` - specified the order at which ``slub_min_objects`` should no +``slab_max_order`` + specified the order at which ``slab_min_objects`` should no longer be checked. 
This is useful to avoid SLUB trying to - generate super large order pages to fit ``slub_min_objects`` + generate super large order pages to fit ``slab_min_objects`` of a slab cache with large object sizes into one high order page. Setting command line parameter ``debug_guardpage_minorder=N`` (N > 0), forces setting - ``slub_max_order`` to 0, what cause minimum possible order of + ``slab_max_order`` to 0, what cause minimum possible order of slabs allocation. SLUB Debug output @@ -219,7 +219,7 @@ Here is a sample of slub debug output:: FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc If SLUB encounters a corrupted object (full detection requires the kernel -to be booted with slub_debug) then the following output will be dumped +to be booted with slab_debug) then the following output will be dumped into the syslog: 1. Description of the problem encountered @@ -239,7 +239,7 @@ into the syslog: pid= (Object allocation / free information is only available if SLAB_STORE_USER is - set for the slab. slub_debug sets that option) + set for the slab. slab_debug sets that option) 2. The object contents if an object was involved. @@ -262,7 +262,7 @@ into the syslog: the object boundary. (Redzone information is only available if SLAB_RED_ZONE is set. - slub_debug sets that option) + slab_debug sets that option) Padding
: Unused data to fill up the space in order to get the next object

@@ -296,7 +296,7 @@ Emergency operations
 Minimal debugging (sanity checks alone) can be enabled by booting with::
 
-	slub_debug=F
+	slab_debug=F
 
 This will be generally be enough to enable the resiliency features of slub
 which will keep the system running even if a bad kernel component will
@@ -311,13 +311,13 @@ and enabling debugging only for that cache
 I.e.::
 
-	slub_debug=F,dentry
+	slab_debug=F,dentry
 
 If the corruption occurs by writing after the end of the object then it
 may be advisable to enable a Redzone to avoid corrupting the beginning of
 other objects::
 
-	slub_debug=FZ,dentry
+	slab_debug=FZ,dentry
 
 Extended slabinfo mode and plotting
 ===================================

From 98d3b6d98f8013d4e96f40d8d4b22d4da0d3f699 Mon Sep 17 00:00:00 2001
From: Xiongwei Song
Date: Fri, 15 Dec 2023 11:41:50 +0800
Subject: [PATCH 04/17] mm/slub: make the description of slab_min_objects helpful in doc

There is no value assigned to slab_min_objects by default; it is always 0,
as initialized by the compiler, unless a value is passed on the command
line. min_objects is instead calculated based on the number of processors
in calculate_order(). For more details, see commit 9b2cd506e5f2 ("slub:
Calculate min_objects based on number of processors.")

Signed-off-by: Xiongwei Song
Reviewed-by: Vlastimil Babka
Signed-off-by: Vlastimil Babka
---
 Documentation/mm/slub.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 6579a55b7852..b517ee28a955 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -150,7 +150,7 @@ list_lock once in a while to deal with partial slabs. That overhead is
 governed by the order of the allocation for each slab. The allocations
 can be influenced by kernel parameters:
 
-.. slab_min_objects=x		(default 4)
+.. slab_min_objects=x		(default: automatically scaled by number of cpus)
 .. slab_min_order=x		(default 0)
 .. slab_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))

From 90b1e56641bbab801e22141c56aa79dc095a3764 Mon Sep 17 00:00:00 2001
From: Chengming Zhou
Date: Tue, 23 Jan 2024 09:33:29 +0000
Subject: [PATCH 05/17] mm/slub: directly load freelist from cpu partial slab in the likely case

The likely case is that we get a usable slab from the cpu partial list, so
we can directly load its freelist and return, instead of going the other
way, which needs more work, like re-enabling interrupts and rechecking.

But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist() in
order to reuse it, since a cpu partial slab is not frozen. That seems
acceptable since it's only there for debugging purposes.

And get_freelist() also assumes it can return NULL if the freelist is
empty, which is not possible for the cpu partial slab case, so we add
"VM_BUG_ON(!freelist)" after get_freelist() to make it explicit.
There is some small performance improvement too, which shows by: perf bench sched messaging -g 5 -t -l 100000 mm-stable slub-optimize Total time 7.473 7.209 Signed-off-by: Chengming Zhou Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- mm/slub.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 2ef88bbf56a3..fda402b2d649 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab) counters = slab->counters; new.counters = counters; - VM_BUG_ON(!new.frozen); new.inuse = slab->objects; new.frozen = freelist != NULL; @@ -3498,18 +3497,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, slab = slub_percpu_partial(c); slub_set_percpu_partial(c, slab); - local_unlock_irqrestore(&s->cpu_slab->lock, flags); - stat(s, CPU_PARTIAL_ALLOC); - if (unlikely(!node_match(slab, node) || - !pfmemalloc_match(slab, gfpflags))) { - slab->next = NULL; - __put_partials(s, slab); - continue; + if (likely(node_match(slab, node) && + pfmemalloc_match(slab, gfpflags))) { + c->slab = slab; + freelist = get_freelist(s, slab); + VM_BUG_ON(!freelist); + stat(s, CPU_PARTIAL_ALLOC); + goto load_freelist; } - freelist = freeze_slab(s, slab); - goto retry_load_slab; + local_unlock_irqrestore(&s->cpu_slab->lock, flags); + + slab->next = NULL; + __put_partials(s, slab); } #endif From a6def11b6dcde5d8f1fcc9e2c0ae71399432b62e Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Tue, 23 Jan 2024 09:33:30 +0000 Subject: [PATCH 06/17] mm/slub: remove full list manipulation for non-debug slab Since debug slab is processed by free_to_partial_list(), and only debug slab which has SLAB_STORE_USER flag would care about the full list, we can remove these unrelated full list manipulations from __slab_free(). Acked-by: Christoph Lameter (Ampere) Reviewed-by: Vlastimil Babka Signed-off-by: Chengming Zhou Signed-off-by: Vlastimil Babka --- mm/slub.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index fda402b2d649..5c6fbeef05a8 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4188,7 +4188,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, * then add it. */ if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) { - remove_full(s, n, slab); add_partial(n, slab, DEACTIVATE_TO_TAIL); stat(s, FREE_ADD_PARTIAL); } @@ -4202,9 +4201,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, */ remove_partial(n, slab); stat(s, FREE_REMOVE_PARTIAL); - } else { - /* Slab must be on the full list */ - remove_full(s, n, slab); } spin_unlock_irqrestore(&n->list_lock, flags); From c63349fc4a2d10f5d1b5ee805cb639ee88fd6e4a Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Tue, 23 Jan 2024 09:33:31 +0000 Subject: [PATCH 07/17] mm/slub: remove unused parameter in next_freelist_entry() The parameter "struct slab *slab" is unused in next_freelist_entry(), so just remove it. 
Acked-by: Christoph Lameter (Ampere)
Reviewed-by: Vlastimil Babka
Signed-off-by: Chengming Zhou
Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5c6fbeef05a8..7f235fa6592d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2243,7 +2243,7 @@ static void __init init_freelist_randomization(void)
 }
 
 /* Get the next entry on the pre-computed freelist randomized */
-static void *next_freelist_entry(struct kmem_cache *s, struct slab *slab,
+static void *next_freelist_entry(struct kmem_cache *s,
 				unsigned long *pos, void *start,
 				unsigned long page_limit,
 				unsigned long freelist_count)
@@ -2282,13 +2282,12 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
 	start = fixup_red_left(s, slab_address(slab));
 
 	/* First entry is used as the base of the freelist */
-	cur = next_freelist_entry(s, slab, &pos, start, page_limit,
-				  freelist_count);
+	cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
 	cur = setup_object(s, cur);
 	slab->freelist = cur;
 
 	for (idx = 1; idx < slab->objects; idx++) {
-		next = next_freelist_entry(s, slab, &pos, start, page_limit,
+		next = next_freelist_entry(s, &pos, start, page_limit,
 					   freelist_count);
 		next = setup_object(s, next);
 		set_freepointer(s, cur, next);

From 66b3dc1f04135f89384af39ec4e38546b82e1510 Mon Sep 17 00:00:00 2001
From: Zheng Yejian
Date: Tue, 30 Jan 2024 09:41:07 +0800
Subject: [PATCH 08/17] mm/slub: remove parameter 'flags' in create_kmalloc_caches()

After commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"),
parameter 'flags' is only passed as 0 in create_kmalloc_caches(), and then
it is only passed to new_kmalloc_cache().

So we can change parameter 'flags' to be a local variable with initial
value 0 in new_kmalloc_cache() and remove parameter 'flags' in
create_kmalloc_caches(). Also make new_kmalloc_cache() static since it is
only used in mm/slab_common.c.
Signed-off-by: Zheng Yejian Acked-by: David Rientjes Reviewed-by: Chengming Zhou Signed-off-by: Vlastimil Babka --- mm/slab.h | 4 +--- mm/slab_common.c | 13 +++++++------ mm/slub.c | 2 +- 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index f7df6d701c5b..9abec38be1d0 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -387,7 +387,7 @@ extern const struct kmalloc_info_struct { /* Kmalloc array related functions */ void setup_kmalloc_cache_index_table(void); -void create_kmalloc_caches(slab_flags_t); +void create_kmalloc_caches(void); extern u8 kmalloc_size_index[24]; @@ -422,8 +422,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags); int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags); void __init kmem_cache_init(void); -void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type, - slab_flags_t flags); extern void create_boot_cache(struct kmem_cache *, const char *name, unsigned int size, slab_flags_t flags, unsigned int useroffset, unsigned int usersize); diff --git a/mm/slab_common.c b/mm/slab_common.c index 230ef7cc3467..1910252d7e89 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -853,9 +853,10 @@ static unsigned int __kmalloc_minalign(void) return max(minalign, arch_slab_minalign()); } -void __init -new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) +static void __init +new_kmalloc_cache(int idx, enum kmalloc_cache_type type) { + slab_flags_t flags = 0; unsigned int minalign = __kmalloc_minalign(); unsigned int aligned_size = kmalloc_info[idx].size; int aligned_idx = idx; @@ -902,7 +903,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) * may already have been created because they were needed to * enable allocations for slab creation. */ -void __init create_kmalloc_caches(slab_flags_t flags) +void __init create_kmalloc_caches(void) { int i; enum kmalloc_cache_type type; @@ -913,7 +914,7 @@ void __init create_kmalloc_caches(slab_flags_t flags) for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { if (!kmalloc_caches[type][i]) - new_kmalloc_cache(i, type, flags); + new_kmalloc_cache(i, type); /* * Caches that are not of the two-to-the-power-of size. @@ -922,10 +923,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) */ if (KMALLOC_MIN_SIZE <= 32 && i == 6 && !kmalloc_caches[type][1]) - new_kmalloc_cache(1, type, flags); + new_kmalloc_cache(1, type); if (KMALLOC_MIN_SIZE <= 64 && i == 7 && !kmalloc_caches[type][2]) - new_kmalloc_cache(2, type, flags); + new_kmalloc_cache(2, type); } } #ifdef CONFIG_RANDOM_KMALLOC_CACHES diff --git a/mm/slub.c b/mm/slub.c index e66bc888d23b..4ebc0df410ff 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5668,7 +5668,7 @@ void __init kmem_cache_init(void) /* Now we can use the kmem_cache to allocate kmalloc slabs */ setup_kmalloc_cache_index_table(); - create_kmalloc_caches(0); + create_kmalloc_caches(); /* Setup random freelists for each cache */ init_freelist_randomization(); From 303cd69394bb01f7ce0ce7509b3bd02f34795c05 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Wed, 21 Feb 2024 12:12:53 +0000 Subject: [PATCH 09/17] mm, slab: remove unused object_size parameter in kmem_cache_flags() We don't use the object_size parameter in kmem_cache_flags(), so just remove it. 
Signed-off-by: Chengming Zhou Signed-off-by: Vlastimil Babka --- mm/slab.h | 3 +-- mm/slab_common.c | 2 +- mm/slub.c | 9 +++------ 3 files changed, 5 insertions(+), 9 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index 9abec38be1d0..6b879d8e8a7a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -433,8 +433,7 @@ struct kmem_cache * __kmem_cache_alias(const char *name, unsigned int size, unsigned int align, slab_flags_t flags, void (*ctor)(void *)); -slab_flags_t kmem_cache_flags(unsigned int object_size, - slab_flags_t flags, const char *name); +slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name); static inline bool is_kmalloc_cache(struct kmem_cache *s) { diff --git a/mm/slab_common.c b/mm/slab_common.c index 1910252d7e89..e19544043fdf 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -172,7 +172,7 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align, size = ALIGN(size, sizeof(void *)); align = calculate_alignment(flags, align, size); size = ALIGN(size, align); - flags = kmem_cache_flags(size, flags, name); + flags = kmem_cache_flags(flags, name); if (flags & SLAB_NEVER_MERGE) return NULL; diff --git a/mm/slub.c b/mm/slub.c index 4ebc0df410ff..d8d8dd8e9803 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1765,7 +1765,6 @@ __setup_param("slub_debug", slub_debug, setup_slub_debug, 0); /* * kmem_cache_flags - apply debugging options to the cache - * @object_size: the size of an object without meta data * @flags: flags to set * @name: name of the cache * @@ -1774,8 +1773,7 @@ __setup_param("slub_debug", slub_debug, setup_slub_debug, 0); * slab_debug=,, ... * then only the select slabs will receive the debug option(s). */ -slab_flags_t kmem_cache_flags(unsigned int object_size, - slab_flags_t flags, const char *name) +slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name) { char *iter; size_t len; @@ -1851,8 +1849,7 @@ static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n, struct slab *slab) {} static inline void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct slab *slab) {} -slab_flags_t kmem_cache_flags(unsigned int object_size, - slab_flags_t flags, const char *name) +slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name) { return flags; } @@ -5105,7 +5102,7 @@ static int calculate_sizes(struct kmem_cache *s) static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags) { - s->flags = kmem_cache_flags(s->size, flags, s->name); + s->flags = kmem_cache_flags(flags, s->name); #ifdef CONFIG_SLAB_FREELIST_HARDENED s->random = get_random_long(); #endif From c94d222445c168788a8849413e676fcc256e9479 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Wed, 21 Feb 2024 12:12:54 +0000 Subject: [PATCH 10/17] mm, slab: fix the comment of cpu partial list The partial slabs on cpu partial list are not frozen after the commit 8cd3fa428b56 ("slub: Delay freezing of partial slabs") merged. So fix the comment. 
Signed-off-by: Chengming Zhou Signed-off-by: Vlastimil Babka --- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index d8d8dd8e9803..e28929e6e252 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -391,7 +391,7 @@ struct kmem_cache_cpu { }; struct slab *slab; /* The slab from which we are allocating */ #ifdef CONFIG_SLUB_CPU_PARTIAL - struct slab *partial; /* Partially allocated frozen slabs */ + struct slab *partial; /* Partially allocated slabs */ #endif local_lock_t lock; /* Protects the fields above */ #ifdef CONFIG_SLUB_STATS From cdeeaaba174886aa6c1ff4c0c5449c5066dbe82f Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 23 Feb 2024 19:27:17 +0100 Subject: [PATCH 11/17] mm, slab: deprecate SLAB_MEM_SPREAD flag The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed. SLUB instead relies on the page allocator's NUMA policies. Change the flag's value to 0 to free up the value it had, and mark it for full removal once all users are gone. Reported-by: Steven Rostedt Closes: https://lore.kernel.org/all/20240131172027.10f64405@gandalf.local.home/ Reviewed-and-tested-by: Xiongwei Song Reviewed-by: Chengming Zhou Reviewed-by: Roman Gushchin Acked-by: David Rientjes Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 5 +++-- mm/slab.h | 1 - 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index b5f5ee8308d0..b1675ff6b904 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -96,8 +96,6 @@ */ /* Defer freeing slabs to RCU */ #define SLAB_TYPESAFE_BY_RCU ((slab_flags_t __force)0x00080000U) -/* Spread some memory over cpuset */ -#define SLAB_MEM_SPREAD ((slab_flags_t __force)0x00100000U) /* Trace allocations and frees */ #define SLAB_TRACE ((slab_flags_t __force)0x00200000U) @@ -164,6 +162,9 @@ #endif #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */ +/* Obsolete unused flag, to be removed */ +#define SLAB_MEM_SPREAD ((slab_flags_t __force)0U) + /* * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. * diff --git a/mm/slab.h b/mm/slab.h index 54deeb0428c6..f4534eefb35d 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -469,7 +469,6 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s) SLAB_STORE_USER | \ SLAB_TRACE | \ SLAB_CONSISTENCY_CHECKS | \ - SLAB_MEM_SPREAD | \ SLAB_NOLEAKTRACE | \ SLAB_RECLAIM_ACCOUNT | \ SLAB_TEMPORARY | \ From cc61eb851c9ae38546d7df6076fd883d3dbc322d Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 23 Feb 2024 19:27:18 +0100 Subject: [PATCH 12/17] mm, slab: use an enum to define SLAB_ cache creation flags The values of SLAB_ cache creation flags are defined by hand, which is tedious and error-prone. Use an enum to assign the bit number and a __SLAB_FLAG_BIT() macro to #define the final flags. This renumbers the flag values, which is OK as they are only used internally. Also define a __SLAB_FLAG_UNUSED macro to assign value to flags disabled by their respective config options in a unified and sparse-friendly way. 
Reviewed-and-tested-by: Xiongwei Song Reviewed-by: Chengming Zhou Reviewed-by: Roman Gushchin Acked-by: David Rientjes Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 94 +++++++++++++++++++++++++++++++------------- mm/slub.c | 6 +-- 2 files changed, 70 insertions(+), 30 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index b1675ff6b904..f6323763cd61 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -21,29 +21,69 @@ #include #include +enum _slab_flag_bits { + _SLAB_CONSISTENCY_CHECKS, + _SLAB_RED_ZONE, + _SLAB_POISON, + _SLAB_KMALLOC, + _SLAB_HWCACHE_ALIGN, + _SLAB_CACHE_DMA, + _SLAB_CACHE_DMA32, + _SLAB_STORE_USER, + _SLAB_PANIC, + _SLAB_TYPESAFE_BY_RCU, + _SLAB_TRACE, +#ifdef CONFIG_DEBUG_OBJECTS + _SLAB_DEBUG_OBJECTS, +#endif + _SLAB_NOLEAKTRACE, + _SLAB_NO_MERGE, +#ifdef CONFIG_FAILSLAB + _SLAB_FAILSLAB, +#endif +#ifdef CONFIG_MEMCG_KMEM + _SLAB_ACCOUNT, +#endif +#ifdef CONFIG_KASAN_GENERIC + _SLAB_KASAN, +#endif + _SLAB_NO_USER_FLAGS, +#ifdef CONFIG_KFENCE + _SLAB_SKIP_KFENCE, +#endif +#ifndef CONFIG_SLUB_TINY + _SLAB_RECLAIM_ACCOUNT, +#endif + _SLAB_OBJECT_POISON, + _SLAB_CMPXCHG_DOUBLE, + _SLAB_FLAGS_LAST_BIT +}; + +#define __SLAB_FLAG_BIT(nr) ((slab_flags_t __force)(1U << (nr))) +#define __SLAB_FLAG_UNUSED ((slab_flags_t __force)(0U)) /* * Flags to pass to kmem_cache_create(). * The ones marked DEBUG need CONFIG_SLUB_DEBUG enabled, otherwise are no-op */ /* DEBUG: Perform (expensive) checks on alloc/free */ -#define SLAB_CONSISTENCY_CHECKS ((slab_flags_t __force)0x00000100U) +#define SLAB_CONSISTENCY_CHECKS __SLAB_FLAG_BIT(_SLAB_CONSISTENCY_CHECKS) /* DEBUG: Red zone objs in a cache */ -#define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U) +#define SLAB_RED_ZONE __SLAB_FLAG_BIT(_SLAB_RED_ZONE) /* DEBUG: Poison objects */ -#define SLAB_POISON ((slab_flags_t __force)0x00000800U) +#define SLAB_POISON __SLAB_FLAG_BIT(_SLAB_POISON) /* Indicate a kmalloc slab */ -#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U) +#define SLAB_KMALLOC __SLAB_FLAG_BIT(_SLAB_KMALLOC) /* Align objs on cache lines */ -#define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U) +#define SLAB_HWCACHE_ALIGN __SLAB_FLAG_BIT(_SLAB_HWCACHE_ALIGN) /* Use GFP_DMA memory */ -#define SLAB_CACHE_DMA ((slab_flags_t __force)0x00004000U) +#define SLAB_CACHE_DMA __SLAB_FLAG_BIT(_SLAB_CACHE_DMA) /* Use GFP_DMA32 memory */ -#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x00008000U) +#define SLAB_CACHE_DMA32 __SLAB_FLAG_BIT(_SLAB_CACHE_DMA32) /* DEBUG: Store the last owner for bug hunting */ -#define SLAB_STORE_USER ((slab_flags_t __force)0x00010000U) +#define SLAB_STORE_USER __SLAB_FLAG_BIT(_SLAB_STORE_USER) /* Panic if kmem_cache_create() fails */ -#define SLAB_PANIC ((slab_flags_t __force)0x00040000U) +#define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC) /* * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! * @@ -95,19 +135,19 @@ * Note that SLAB_TYPESAFE_BY_RCU was originally named SLAB_DESTROY_BY_RCU. 
*/ /* Defer freeing slabs to RCU */ -#define SLAB_TYPESAFE_BY_RCU ((slab_flags_t __force)0x00080000U) +#define SLAB_TYPESAFE_BY_RCU __SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU) /* Trace allocations and frees */ -#define SLAB_TRACE ((slab_flags_t __force)0x00200000U) +#define SLAB_TRACE __SLAB_FLAG_BIT(_SLAB_TRACE) /* Flag to prevent checks on free */ #ifdef CONFIG_DEBUG_OBJECTS -# define SLAB_DEBUG_OBJECTS ((slab_flags_t __force)0x00400000U) +# define SLAB_DEBUG_OBJECTS __SLAB_FLAG_BIT(_SLAB_DEBUG_OBJECTS) #else -# define SLAB_DEBUG_OBJECTS 0 +# define SLAB_DEBUG_OBJECTS __SLAB_FLAG_UNUSED #endif /* Avoid kmemleak tracing */ -#define SLAB_NOLEAKTRACE ((slab_flags_t __force)0x00800000U) +#define SLAB_NOLEAKTRACE __SLAB_FLAG_BIT(_SLAB_NOLEAKTRACE) /* * Prevent merging with compatible kmem caches. This flag should be used @@ -119,25 +159,25 @@ * - performance critical caches, should be very rare and consulted with slab * maintainers, and not used together with CONFIG_SLUB_TINY */ -#define SLAB_NO_MERGE ((slab_flags_t __force)0x01000000U) +#define SLAB_NO_MERGE __SLAB_FLAG_BIT(_SLAB_NO_MERGE) /* Fault injection mark */ #ifdef CONFIG_FAILSLAB -# define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U) +# define SLAB_FAILSLAB __SLAB_FLAG_BIT(_SLAB_FAILSLAB) #else -# define SLAB_FAILSLAB 0 +# define SLAB_FAILSLAB __SLAB_FLAG_UNUSED #endif /* Account to memcg */ #ifdef CONFIG_MEMCG_KMEM -# define SLAB_ACCOUNT ((slab_flags_t __force)0x04000000U) +# define SLAB_ACCOUNT __SLAB_FLAG_BIT(_SLAB_ACCOUNT) #else -# define SLAB_ACCOUNT 0 +# define SLAB_ACCOUNT __SLAB_FLAG_UNUSED #endif #ifdef CONFIG_KASAN_GENERIC -#define SLAB_KASAN ((slab_flags_t __force)0x08000000U) +#define SLAB_KASAN __SLAB_FLAG_BIT(_SLAB_KASAN) #else -#define SLAB_KASAN 0 +#define SLAB_KASAN __SLAB_FLAG_UNUSED #endif /* @@ -145,25 +185,25 @@ * Intended for caches created for self-tests so they have only flags * specified in the code and other flags are ignored. */ -#define SLAB_NO_USER_FLAGS ((slab_flags_t __force)0x10000000U) +#define SLAB_NO_USER_FLAGS __SLAB_FLAG_BIT(_SLAB_NO_USER_FLAGS) #ifdef CONFIG_KFENCE -#define SLAB_SKIP_KFENCE ((slab_flags_t __force)0x20000000U) +#define SLAB_SKIP_KFENCE __SLAB_FLAG_BIT(_SLAB_SKIP_KFENCE) #else -#define SLAB_SKIP_KFENCE 0 +#define SLAB_SKIP_KFENCE __SLAB_FLAG_UNUSED #endif /* The following flags affect the page allocator grouping pages by mobility */ /* Objects are reclaimable */ #ifndef CONFIG_SLUB_TINY -#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U) +#define SLAB_RECLAIM_ACCOUNT __SLAB_FLAG_BIT(_SLAB_RECLAIM_ACCOUNT) #else -#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0) +#define SLAB_RECLAIM_ACCOUNT __SLAB_FLAG_UNUSED #endif #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */ /* Obsolete unused flag, to be removed */ -#define SLAB_MEM_SPREAD ((slab_flags_t __force)0U) +#define SLAB_MEM_SPREAD __SLAB_FLAG_UNUSED /* * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. 
diff --git a/mm/slub.c b/mm/slub.c index 2ef88bbf56a3..2934ef5f3cff 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -306,13 +306,13 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s) /* Internal SLUB flags */ /* Poison object */ -#define __OBJECT_POISON ((slab_flags_t __force)0x80000000U) +#define __OBJECT_POISON __SLAB_FLAG_BIT(_SLAB_OBJECT_POISON) /* Use cmpxchg_double */ #ifdef system_has_freelist_aba -#define __CMPXCHG_DOUBLE ((slab_flags_t __force)0x40000000U) +#define __CMPXCHG_DOUBLE __SLAB_FLAG_BIT(_SLAB_CMPXCHG_DOUBLE) #else -#define __CMPXCHG_DOUBLE ((slab_flags_t __force)0U) +#define __CMPXCHG_DOUBLE __SLAB_FLAG_UNUSED #endif /* From 96d8dbb6f65041b670a79e8ae76f67cc11dee203 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 23 Feb 2024 19:27:19 +0100 Subject: [PATCH 13/17] mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE The SLAB_KASAN flag prevents merging of caches in some configurations, which is handled in a rather complicated way via kasan_never_merge(). Since we now have a generic SLAB_NO_MERGE flag, we can instead use it for KASAN caches in addition to SLAB_KASAN in those configurations, and simplify the SLAB_NEVER_MERGE handling. Tested-by: Xiongwei Song Reviewed-by: Chengming Zhou Reviewed-by: Andrey Konovalov Tested-by: David Rientjes Signed-off-by: Vlastimil Babka --- include/linux/kasan.h | 6 ------ mm/kasan/generic.c | 22 ++++++---------------- mm/slab_common.c | 2 +- 3 files changed, 7 insertions(+), 23 deletions(-) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index dbb06d789e74..70d6a8f6e25d 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -429,7 +429,6 @@ struct kasan_cache { }; size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object); -slab_flags_t kasan_never_merge(void); void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, slab_flags_t *flags); @@ -446,11 +445,6 @@ static inline size_t kasan_metadata_size(struct kmem_cache *cache, { return 0; } -/* And thus nothing prevents cache merging. */ -static inline slab_flags_t kasan_never_merge(void) -{ - return 0; -} /* And no cache-related metadata initialization is required. */ static inline void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c index df6627f62402..27297dc4a55b 100644 --- a/mm/kasan/generic.c +++ b/mm/kasan/generic.c @@ -334,14 +334,6 @@ DEFINE_ASAN_SET_SHADOW(f3); DEFINE_ASAN_SET_SHADOW(f5); DEFINE_ASAN_SET_SHADOW(f8); -/* Only allow cache merging when no per-object metadata is present. */ -slab_flags_t kasan_never_merge(void) -{ - if (!kasan_requires_meta()) - return 0; - return SLAB_KASAN; -} - /* * Adaptive redzone policy taken from the userspace AddressSanitizer runtime. * For larger allocations larger redzones are used. @@ -370,15 +362,13 @@ void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, return; /* - * SLAB_KASAN is used to mark caches that are sanitized by KASAN - * and that thus have per-object metadata. - * Currently this flag is used in two places: - * 1. In slab_ksize() to account for per-object metadata when - * calculating the size of the accessible memory within the object. - * 2. In slab_common.c via kasan_never_merge() to prevent merging of - * caches with per-object metadata. + * SLAB_KASAN is used to mark caches that are sanitized by KASAN and + * that thus have per-object metadata. 
Currently, this flag is used in + * slab_ksize() to account for per-object metadata when calculating the + * size of the accessible memory within the object. Additionally, we use + * SLAB_NO_MERGE to prevent merging of caches with per-object metadata. */ - *flags |= SLAB_KASAN; + *flags |= SLAB_KASAN | SLAB_NO_MERGE; ok_size = *size; diff --git a/mm/slab_common.c b/mm/slab_common.c index 238293b1dbe1..7cfa2f1ce655 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -50,7 +50,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work, */ #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \ SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \ - SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge()) + SLAB_FAILSLAB | SLAB_NO_MERGE) #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \ SLAB_CACHE_DMA32 | SLAB_ACCOUNT) From 011568eb3117a1b0e1b2e980de37a4ec47952617 Mon Sep 17 00:00:00 2001 From: Xiaolei Wang Date: Wed, 28 Feb 2024 11:04:08 +0800 Subject: [PATCH 14/17] mm/slab: Fix a kmemleak in kmem_cache_destroy() For earlier kmem cache creation, slab_sysfs_init() has not been called. Consequently, kmem_cache_destroy() cannot utilize kobj_type::release to release the kmem_cache structure. Therefore, tweak kmem_cache_release() to use slab_kmem_cache_release() for releasing kmem_cache when slab_state isn't FULL. This will fixes the memory leaks like following: unreferenced object 0xffff0000c2d87080 (size 128): comm "swapper/0", pid 1, jiffies 4294893428 hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 6b 6b 6b 6b .....N......kkkk ff ff ff ff ff ff ff ff b8 ab 48 89 00 80 ff ff.....H..... backtrace (crc 8819d0f6): [] kmemleak_alloc+0xb0/0xc4 [] kmem_cache_alloc_node+0x288/0x3a8 [] __kmem_cache_create+0x1e4/0x64c [] kmem_cache_create_usercopy+0x1c4/0x2cc [] kmem_cache_create+0x1c/0x28 [] arm_v7s_alloc_pgtable+0x1c0/0x6d4 [] alloc_io_pgtable_ops+0xe8/0x2d0 [] arm_v7s_do_selftests+0xe0/0x73c [] do_one_initcall+0x11c/0x7ac [] kernel_init_freeable+0x53c/0xbb8 [] kernel_init+0x24/0x144 [] ret_from_fork+0x10/0x20 Signed-off-by: Xiaolei Wang Reviewed-by: Chengming Zhou Signed-off-by: Vlastimil Babka --- mm/slab_common.c | 8 ++++++-- mm/slub.c | 6 ++---- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index e19544043fdf..7d60cfc2b30f 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -404,8 +404,12 @@ EXPORT_SYMBOL(kmem_cache_create); */ static void kmem_cache_release(struct kmem_cache *s) { - sysfs_slab_unlink(s); - sysfs_slab_release(s); + if (slab_state >= FULL) { + sysfs_slab_unlink(s); + sysfs_slab_release(s); + } else { + slab_kmem_cache_release(s); + } } #else static void kmem_cache_release(struct kmem_cache *s) diff --git a/mm/slub.c b/mm/slub.c index e28929e6e252..0e02e072693b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -6794,14 +6794,12 @@ static int sysfs_slab_add(struct kmem_cache *s) void sysfs_slab_unlink(struct kmem_cache *s) { - if (slab_state >= FULL) - kobject_del(&s->kobj); + kobject_del(&s->kobj); } void sysfs_slab_release(struct kmem_cache *s) { - if (slab_state >= FULL) - kobject_put(&s->kobj); + kobject_put(&s->kobj); } /* From 3dd549a557f7dc326d59c5fa105e230ebf3d5458 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Thu, 22 Feb 2024 13:02:33 +0000 Subject: [PATCH 15/17] mm, slab: remove the corner case of inc_slabs_node() We already have the inc_slabs_node() after kmem_cache_node->node[node] initialized in early_kmem_cache_node_alloc(), this special case of inc_slabs_node() 
can be removed. Then we don't need to consider the existence of
kmem_cache_node in inc_slabs_node() anymore.

Signed-off-by: Chengming Zhou
Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0e02e072693b..12066e69688d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1498,16 +1498,8 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
 {
 	struct kmem_cache_node *n = get_node(s, node);
 
-	/*
-	 * May be called early in order to allocate a slab for the
-	 * kmem_cache_node structure. Solve the chicken-egg
-	 * dilemma by deferring the increment of the count during
-	 * bootstrap (see early_kmem_cache_node_alloc).
-	 */
-	if (likely(n)) {
-		atomic_long_inc(&n->nr_slabs);
-		atomic_long_add(objects, &n->total_objects);
-	}
+	atomic_long_inc(&n->nr_slabs);
+	atomic_long_add(objects, &n->total_objects);
 }
 
 static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
 {
@@ -4855,7 +4847,6 @@ static void early_kmem_cache_node_alloc(int node)
 	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
 
 	BUG_ON(!slab);
-	inc_slabs_node(kmem_cache_node, slab_nid(slab), slab->objects);
 	if (slab_nid(slab) != node) {
 		pr_err("SLUB: Unable to allocate memory from node %d\n", node);
 		pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");

From 17cce771c5fc85f43680143ac8b3b944fdad113f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Fri, 1 Mar 2024 17:08:09 +0100
Subject: [PATCH 16/17] mm, slab: remove memcg_from_slab_obj()

This empty wrapper exists only for !CONFIG_MEMCG_KMEM and it seems it was
never used. It is probably a leftover from the development of a series.

Reviewed-by: Chengming Zhou
Reviewed-by: Roman Gushchin
Acked-by: David Rientjes
Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 12066e69688d..186a8ec28228 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,11 +2028,6 @@ void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects,
 	obj_cgroup_uncharge(objcg, objects * obj_full_size(s));
 }
 #else /* CONFIG_MEMCG_KMEM */
-static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
-{
-	return NULL;
-}
-
 static inline void memcg_free_slab_cgroups(struct slab *slab)
 {
 }

From fae1b0129327343b30be8f254da246a010810959 Mon Sep 17 00:00:00 2001
From: Chengming Zhou
Date: Tue, 5 Mar 2024 08:39:13 +0000
Subject: [PATCH 17/17] slab: remove PARTIAL_NODE slab_state

The PARTIAL_NODE slab_state went away with the removal of SLAB, so just
remove it.

Signed-off-by: Chengming Zhou
Signed-off-by: Vlastimil Babka
---
 mm/slab.h | 1 -
 tools/include/linux/slab.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 6b879d8e8a7a..16c343942456 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -363,7 +363,6 @@ static inline int objs_per_slab(const struct kmem_cache *cache,
 enum slab_state {
 	DOWN,			/* No slab functionality yet */
 	PARTIAL,		/* SLUB: kmem_cache_node available */
-	PARTIAL_NODE,		/* SLAB: kmalloc size for node struct available */
 	UP,			/* Slab caches usable but not all extras yet */
 	FULL			/* Everything is working */
 };

diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h
index 311759ea25e9..51b25e9c4ec7 100644
--- a/tools/include/linux/slab.h
+++ b/tools/include/linux/slab.h
@@ -18,7 +18,6 @@ bool slab_is_available(void);
 enum slab_state {
 	DOWN,
 	PARTIAL,
-	PARTIAL_NODE,
 	UP,
 	FULL
 };