page_pool: refurbish version of page_pool code
Need a fast page recycle mechanism for the ndo_xdp_xmit API for returning
pages at DMA-TX completion time, one which has good cross-CPU
performance, given DMA-TX completion can happen on a remote CPU.
Refurbish my page_pool code, which was presented[1] at MM-summit 2016.
Adapted the page_pool code to not depend on the page allocator or on
integration into struct page. The DMA mapping feature is kept,
even though it will not be activated/used in this patchset.
[1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
V2: Adjustments requested by Tariq
- Changed page_pool_create return codes, don't return NULL, only
ERR_PTR, as this simplifies err handling in drivers.
V4: many small improvements and cleanups
- Add DOC comment section, that can be used by kernel-doc
- Improve fallback mode, to work better with refcnt based recycling
e.g. remove a WARN as pointed out by Tariq
e.g. quicker fallback if ptr_ring is empty.
V5: Fixed SPDX license as pointed out by Alexei
V6: Adjustments requested by Eric Dumazet
- Adjust ____cacheline_aligned_in_smp usage/placement
- Move rcu_head in struct page_pool
- Free pages quicker on destroy, minimize resources delayed by an RCU period
- Remove code for forward/backward compat ABI interface
V8: Issues found by kbuild test robot
- Address sparse should be static warnings
- Only compile+link when a driver uses/selects page_pool;
mlx5 selects CONFIG_PAGE_POOL, although it is first used in two patches
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 14:46:17 +00:00
/* SPDX-License-Identifier: GPL-2.0
 *
 *	page_pool/helpers.h
 *
 *	Author:	Jesper Dangaard Brouer <netoptimizer@brouer.com>
 *	Copyright (C) 2016 Red Hat, Inc.
 */

/**
 * DOC: page_pool allocator
 *
 * The page_pool allocator is optimized for recycling pages or page fragments
 * used by skb packets and xdp frames.
 *
 * Basic use involves replacing any alloc_pages() calls with page_pool_alloc(),
 * which allocates memory with or without page splitting depending on the
 * requested memory size.
 *
 * If the driver knows that it always requires full pages or its allocations are
 * always smaller than half a page, it can use one of the more specific API
 * calls:
 *
 * 1. page_pool_alloc_pages(): allocate memory without page splitting when the
 * driver knows that the memory it needs is always bigger than half of the page
 * allocated from the page pool. There is no cache line dirtying for 'struct
 * page' when a page is recycled back to the page pool.
 *
 * 2. page_pool_alloc_frag(): allocate memory with page splitting when the
 * driver knows that the memory it needs is always smaller than or equal to
 * half of the page allocated from the page pool. Page splitting enables memory
 * saving and thus avoids TLB/cache misses for data access, but there is also
 * some cost to implement page splitting, mainly some cache line
 * dirtying/bouncing for 'struct page' and atomic operations for
 * page->pp_ref_count.
 *
 * The API keeps track of in-flight pages, in order to let API users know when
 * it is safe to free a page_pool object. To release pages, the API users must
 * call page_pool_put_page() or page_pool_free_va(), or attach the pages to a
 * page_pool-aware object like skbs marked with skb_mark_for_recycle().
 *
 * page_pool_put_page() may be called multiple times on the same page if a page
 * is split into multiple fragments. For the last fragment, it will either
 * recycle the page, or in case of page->_refcount > 1, it will release the DMA
 * mapping and in-flight state accounting.
 *
 * dma_sync_single_range_for_device() is only called for the last fragment when
 * page_pool is created with the PP_FLAG_DMA_SYNC_DEV flag, so it depends on the
 * last freed fragment to do the sync_for_device operation for all fragments in
 * the same page when a page is split. The API user must set up pool->p.max_len
 * and pool->p.offset correctly and ensure that page_pool_put_page() is called
 * with dma_sync_size being -1 for the fragment API.
 */
#ifndef _NET_PAGE_POOL_HELPERS_H
#define _NET_PAGE_POOL_HELPERS_H

#include <linux/dma-mapping.h>

#include <net/page_pool/types.h>

#ifdef CONFIG_PAGE_POOL_STATS
net: page_pool: expose page pool stats via netlink
Dump the stats into netlink. More clever approaches
like dumping the stats per-CPU for each CPU individually
to see where the packets get consumed can be implemented
in the future.
A trimmed example from a real (but recently booted) system:
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-stats-get
[{'info': {'id': 19, 'ifindex': 2},
'alloc-empty': 48,
'alloc-fast': 3024,
'alloc-refill': 0,
'alloc-slow': 48,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 0,
'recycle-cached': 0,
'recycle-released-refcnt': 0,
'recycle-ring': 0,
'recycle-ring-full': 0},
{'info': {'id': 18, 'ifindex': 2},
'alloc-empty': 66,
'alloc-fast': 11811,
'alloc-refill': 35,
'alloc-slow': 66,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 1145,
'recycle-cached': 6541,
'recycle-released-refcnt': 0,
'recycle-ring': 1275,
'recycle-ring-full': 0},
{'info': {'id': 17, 'ifindex': 2},
'alloc-empty': 73,
'alloc-fast': 62099,
'alloc-refill': 413,
...
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-26 23:07:38 +00:00
/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats);

bool page_pool_get_stats(const struct page_pool *pool,
			 struct page_pool_stats *stats);
#else
static inline int page_pool_ethtool_stats_get_count(void)
{
	return 0;
}

static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data)
{
	return data;
}

static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
	return data;
}
#endif

/**
 * page_pool_dev_alloc_pages() - allocate a page.
 * @pool: pool from which to allocate
 *
 * Get a page from the page allocator or page_pool caches.
 */
static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
{
	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);

	return page_pool_alloc_pages(pool, gfp);
}

/**
 * page_pool_dev_alloc_frag() - allocate a page fragment.
 * @pool: pool from which to allocate
 * @offset: offset to the allocated page
 * @size: requested size
 *
 * Get a page fragment from the page allocator or page_pool caches.
 *
 * Return:
 * Return allocated page fragment, otherwise return NULL.
 */
static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
						    unsigned int *offset,
						    unsigned int size)
{
	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);

	return page_pool_alloc_frag(pool, offset, size, gfp);
}

static inline struct page *page_pool_alloc(struct page_pool *pool,
					   unsigned int *offset,
					   unsigned int *size, gfp_t gfp)
{
	unsigned int max_size = PAGE_SIZE << pool->p.order;
	struct page *page;

	if ((*size << 1) > max_size) {
		*size = max_size;
		*offset = 0;
		return page_pool_alloc_pages(pool, gfp);
	}

	page = page_pool_alloc_frag(pool, offset, *size, gfp);
	if (unlikely(!page))
		return NULL;

	/* There is very likely not enough space for another fragment, so append
	 * the remaining size to the current fragment to avoid truesize
	 * underestimate problem.
	 */
	if (pool->frag_offset + *size > max_size) {
		*size = max_size - *offset;
		pool->frag_offset = max_size;
	}

	return page;
}

/**
 * page_pool_dev_alloc() - allocate a page or a page fragment.
 * @pool: pool from which to allocate
 * @offset: offset to the allocated page
 * @size: in as the requested size, out as the allocated size
 *
 * Get a page or a page fragment from the page allocator or page_pool caches
 * depending on the requested size, in order to allocate memory with the least
 * memory utilization and performance penalty.
 *
 * Return:
 * Return allocated page or page fragment, otherwise return NULL.
 */
static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
					       unsigned int *offset,
					       unsigned int *size)
{
	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);

	return page_pool_alloc(pool, offset, size, gfp);
}

static inline void *page_pool_alloc_va(struct page_pool *pool,
				       unsigned int *size, gfp_t gfp)
{
	unsigned int offset;
	struct page *page;

	/* Mask off __GFP_HIGHMEM to ensure we can use page_address() */
	page = page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM);
	if (unlikely(!page))
		return NULL;

	return page_address(page) + offset;
}

/**
 * page_pool_dev_alloc_va() - allocate a page or a page fragment and return its
 *			      va.
 * @pool: pool from which to allocate
 * @size: in as the requested size, out as the allocated size
 *
 * This is just a thin wrapper around the page_pool_alloc() API, and
 * it returns the va of the allocated page or page fragment.
 *
 * Return:
 * Return the va for the allocated page or page fragment, otherwise return NULL.
 */
static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
					   unsigned int *size)
{
	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);

	return page_pool_alloc_va(pool, size, gfp);
}

/**
 * page_pool_get_dma_dir() - Retrieve the stored DMA direction.
 * @pool: pool from which page was allocated
 *
 * Get the stored DMA direction. A driver might decide to store this locally
 * and avoid the extra cache line from page_pool to determine the direction.
 */
static inline enum dma_data_direction
page_pool_get_dma_dir(const struct page_pool *pool)
{
	return pool->p.dma_dir;
}

/**
 * page_pool_fragment_page() - split a fresh page into fragments
 * @page: page to split
 * @nr: references to set
 *
 * pp_ref_count represents the number of outstanding references to the page,
 * which will be freed using page_pool APIs (rather than page allocator APIs
 * like put_page()). Such references are usually held by page_pool-aware
 * objects like skbs marked for page pool recycling.
 *
 * This helper allows the caller to take (set) multiple references to a
 * freshly allocated page. The page must be freshly allocated (have a
 * pp_ref_count of 1). This is commonly done by drivers and
 * "fragment allocators" to save atomic operations - either when they know
 * upfront how many references they will need; or to take MAX references and
 * return the unused ones with a single atomic dec(), instead of performing
 * multiple atomic inc() operations.
 */
static inline void page_pool_fragment_page(struct page *page, long nr)
{
	atomic_long_set(&page->pp_ref_count, nr);
}

static inline long page_pool_unref_page(struct page *page, long nr)
{
	long ret;

	/* If nr == pp_ref_count then we have cleared all remaining
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() needs to be called to set up
page->pp_frag_count immediately.
2. page_pool_defrag_page() often needs to be called to drain
the page->pp_frag_count when no more users will be
holding on to that page.
Those constraints exist in order to support a page being
split into multiple fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for the case when we need a
page to be split into more than one fragment, but there is
also the case where we want to avoid the above constraints and
their overhead when a page can't be split, as it can only
hold one fragment as requested by the user, depending on
different use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently the page pool only provides the page_pool_alloc_pages()
and page_pool_alloc_frag() APIs, enabling cases 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3; it is
not possible yet because of the per-page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating an unsplit page without the
overhead of a split page, while still allowing split pages,
we need to remove the per-page_pool flag in
page_pool_is_last_frag(). As best as I can think of, there
are two methods, as below:
1. Add a per-page flag/bit to indicate whether a page is split
or not, which means we might need to update that flag/bit
every time the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit pages by assuming all pages in the page pool are split
into one big fragment initially.
As the page pool already supports use case 1 without dirtying
the cache line of 'struct page' whenever a page is recyclable,
we need to support the above use case 3 with minimal overhead,
especially without adding any noticeable overhead for use case 1;
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, so this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
* references to the page:
|
|
|
|
* 1. 'n == 1': no need to actually overwrite it.
|
|
|
|
* 2. 'n != 1': overwrite it with one, which is the rare case
|
2023-12-12 04:46:11 +00:00
|
|
|
* for pp_ref_count draining.
|
2022-01-31 16:40:01 +00:00
|
|
|
*
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
* The main advantage to doing this is that not only we avoid a atomic
|
|
|
|
* update, as an atomic_read is generally a much cheaper operation than
|
|
|
|
* an atomic update, especially when dealing with a page that may be
|
2023-12-12 04:46:11 +00:00
|
|
|
* referenced by only 2 or 3 users; but also unify the pp_ref_count
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
* handling by ensuring all pages have partitioned into only 1 piece
|
|
|
|
* initially, and only overwrite it when the page is partitioned into
|
|
|
|
* more than one piece.
|
2022-01-31 16:40:01 +00:00
|
|
|
*/
|
2023-12-12 04:46:11 +00:00
|
|
|
if (atomic_long_read(&page->pp_ref_count) == nr) {
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
/* As we have ensured nr is always one for constant case using
|
|
|
|
* the BUILD_BUG_ON(), only need to handle the non-constant case
|
2023-12-12 04:46:11 +00:00
|
|
|
* here for pp_ref_count draining, which is a rare case.
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
|
|
|
|
if (!__builtin_constant_p(nr))
|
2023-12-12 04:46:11 +00:00
|
|
|
atomic_long_set(&page->pp_ref_count, 1);
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
|
2022-01-31 16:40:01 +00:00
|
|
|
return 0;
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
}
|
2022-01-31 16:40:01 +00:00
|
|
|
|
2023-12-12 04:46:11 +00:00
|
|
|
ret = atomic_long_sub_return(nr, &page->pp_ref_count);
|
2022-01-31 16:40:01 +00:00
|
|
|
WARN_ON(ret < 0);
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
|
2023-12-12 04:46:11 +00:00
|
|
|
/* We are the last user here too, reset pp_ref_count back to 1 to
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
* ensure all pages have been partitioned into 1 piece initially,
|
|
|
|
* this should be the rare case when the last two fragment users call
|
2023-12-12 04:46:11 +00:00
|
|
|
* page_pool_unref_page() currently.
|
page_pool: unify frag_count handling in page_pool_is_last_frag()
Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
the page->pp_frag_count when there is no more user will
be holding on to that page.
Those constraints exist in order to support a page to be
split into multi fragments.
And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.
Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
depending on the fragment size.
Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.
So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
not, which means we might need to update that flag/bit
everytime the page is recycled, dirtying the cache line
of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
unsplit page by assuming all pages in the page pool is split
into a big fragment initially.
As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.
There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].
1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 09:59:48 +00:00
|
|
|
*/
	if (unlikely(!ret))
		atomic_long_set(&page->pp_ref_count, 1);

	return ret;
}

static inline void page_pool_ref_page(struct page *page)
{
	atomic_long_inc(&page->pp_ref_count);
}

static inline bool page_pool_is_last_ref(struct page *page)
{
	/* If page_pool_unref_page() returns 0, we were the last user */
	return page_pool_unref_page(page, 1) == 0;
}

/**
 * page_pool_put_page() - release a reference to a page pool page
 * @pool: pool from which page was allocated
 * @page: page to release a reference on
 * @dma_sync_size: how much of the page may have been touched by the device
 * @allow_direct: released by the consumer, allow lockless caching
 *
 * The outcome of this depends on the page refcnt. If the driver bumps
 * the refcnt > 1 this will unmap the page. If the page refcnt is 1
 * the allocator owns the page and will try to recycle it in one of the pool
 * caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
 * using dma_sync_single_range_for_device().
 */
static inline void page_pool_put_page(struct page_pool *pool,
				      struct page *page,
				      unsigned int dma_sync_size,
				      bool allow_direct)
{
	/* When page_pool isn't compiled-in, net/core/xdp.c doesn't
	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
	 */
#ifdef CONFIG_PAGE_POOL
	if (!page_pool_is_last_ref(page))
		return;

	page_pool_put_unrefed_page(pool, page, dma_sync_size, allow_direct);
#endif
}

/**
 * page_pool_put_full_page() - release a reference on a page pool page
 * @pool: pool from which page was allocated
 * @page: page to release a reference on
 * @allow_direct: released by the consumer, allow lockless caching
 *
 * Similar to page_pool_put_page(), but will DMA sync the entire memory area
 * as configured in &page_pool_params.max_len.
 */
static inline void page_pool_put_full_page(struct page_pool *pool,
					   struct page *page, bool allow_direct)
{
	page_pool_put_page(pool, page, -1, allow_direct);
}

/**
 * page_pool_recycle_direct() - release a reference on a page pool page
 * @pool: pool from which page was allocated
 * @page: page to release a reference on
 *
 * Similar to page_pool_put_full_page() but caller must guarantee safe context
 * (e.g NAPI), since it will recycle the page directly into the pool fast cache.
 */
static inline void page_pool_recycle_direct(struct page_pool *pool,
					    struct page *page)
{
	page_pool_put_full_page(pool, page, true);
}

#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
		(sizeof(dma_addr_t) > sizeof(unsigned long))

/**
 * page_pool_free_va() - free a va into the page_pool
 * @pool: pool from which va was allocated
 * @va: va to be freed
 * @allow_direct: freed by the consumer, allow lockless caching
 *
 * Free a va allocated from page_pool_alloc_va().
 */
static inline void page_pool_free_va(struct page_pool *pool, void *va,
				     bool allow_direct)
{
	page_pool_put_page(pool, virt_to_head_page(va), -1, allow_direct);
}

/**
 * page_pool_get_dma_addr() - Retrieve the stored DMA address.
 * @page: page allocated from a page pool
 *
 * Fetch the DMA address of the page. The page pool to which the page belongs
 * must have been created with PP_FLAG_DMA_MAP.
 */
static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
{
	dma_addr_t ret = page->dma_addr;

	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
		ret <<= PAGE_SHIFT;

	return ret;
}

static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
{
	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
		page->dma_addr = addr >> PAGE_SHIFT;

		/* We assume page alignment to shave off bottom bits,
		 * if this "compression" doesn't work we need to drop.
		 */
		return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
	}

	page->dma_addr = addr;
	return false;
}

/**
 * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
 * @pool: &page_pool the @page belongs to
 * @page: page to sync
 * @offset: offset from page start to "hard" start if using PP frags
 * @dma_sync_size: size of the data written to the page
 *
 * Can be used as a shorthand to sync Rx pages before accessing them in the
 * driver. Caller must ensure the pool was created with ``PP_FLAG_DMA_MAP``.
 * Note that this version performs DMA sync unconditionally, even if the
 * associated PP doesn't perform sync-for-device.
 */
static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
					      const struct page *page,
					      u32 offset, u32 dma_sync_size)
{
	dma_sync_single_range_for_cpu(pool->p.dev,
				      page_pool_get_dma_addr(page),
				      offset + pool->p.offset, dma_sync_size,
				      page_pool_get_dma_dir(pool));
}

static inline bool page_pool_put(struct page_pool *pool)
{
	return refcount_dec_and_test(&pool->user_cnt);
}

static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
{
	if (unlikely(pool->p.nid != new_nid))
		page_pool_update_nid(pool, new_nid);
}
#endif /* _NET_PAGE_POOL_HELPERS_H */