  1. Aug 09, 2021
    • page_pool: add frag page recycling support in page pool · 53e0961d
      Yunsheng Lin authored
      
      Currently the page pool only supports page recycling when
      there is only one user of the page, and the split-page reuse
      implemented in most drivers cannot use the page pool, as the
      ping-pong way of reuse requires multi-user support in the
      page pool.
      
      This reuse and recycling has the following limitations:
      1. A page from the page pool can only be used by one user
         in order for the page recycling to happen.
      2. The ping-pong way of reuse in most drivers does not
         support multiple descriptors using different parts of
         the same page in order to save memory.
      
      So add multi-user support and frag page recycling in the
      page pool to overcome the above limitations.
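
      For context, a minimal sketch of how an RX driver might use the
      new frag API; the pool must be created with the PP_FLAG_PAGE_FRAG
      flag set, and rx_alloc_buffer()/struct rx_buf are illustrative
      names, not part of the patch:

        static int rx_alloc_buffer(struct page_pool *pool,
                                   struct rx_buf *buf)
        {
                unsigned int offset;
                struct page *page;

                /* Ask the pool for a 2K slice of a page; the pool
                 * hands out frags of the same page until it is
                 * exhausted, and recycles the page once every user
                 * has dropped its frag reference.
                 */
                page = page_pool_alloc_frag(pool, &offset, 2048,
                                            GFP_ATOMIC);
                if (!page)
                        return -ENOMEM;

                buf->page = page;
                buf->offset = offset;
                return 0;
        }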
      
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: add interface to manipulate frag count in page pool · 0e9d2a0a
      Yunsheng Lin authored
      
      For 32 bit systems with 64 bit dma, dma_addr[1] is used to
      store the upper 32 bits of the dma address; such systems
      should be rare these days.
      
      For normal systems, dma_addr[1] in 'struct page' is not
      used, so we can reuse it to store the frag count, i.e. how
      many frags this page might be split into.
      
      In order to simplify the page frag support in the page pool,
      the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate
      32 bit systems with 64 bit dma, and page frag support in the
      page pool is disabled for such systems.
      
      The newly added page_pool_set_frag_count() is called to
      reserve the maximum frag count before any page frag is
      passed to the user, and page_pool_atomic_sub_frag_count_return()
      is called when the user is done with the page frag.
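
      A rough sketch of the intended call pattern; max_frags and
      recycle_or_release() are illustrative stand-ins for the driver's
      bookkeeping, not part of the patch:

        /* Producer: reserve the maximum frag count up front, before
         * any frag of this page is handed to a user.
         */
        page_pool_set_frag_count(page, max_frags);

        /* Consumers: each user drops its share when done with its
         * frag; a return value of 0 means this was the last user and
         * the page may be recycled or released.
         */
        if (page_pool_atomic_sub_frag_count_return(page, 1) == 0)
                recycle_or_release(page);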
      
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: keep pp info as long as page pool owns the page · 57f05bc2
      Yunsheng Lin authored
      
      Currently, page->pp is cleared and set every time the page
      is recycled, which is unnecessary.
      
      So only set page->pp when the page is added to the page
      pool, and only clear it when the page is released from the
      page pool.
      
      This is also preparation for supporting allocation of frag
      pages in the page pool.
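
      The shape of the change, as a simplified sketch; the helper
      names and PP_SIGNATURE follow the patch, but details of the
      actual code are elided:

        static void page_pool_set_pp_info(struct page_pool *pool,
                                          struct page *page)
        {
                /* Done once, when the pool takes ownership. */
                page->pp = pool;
                page->pp_magic |= PP_SIGNATURE;
        }

        static void page_pool_clear_pp_info(struct page *page)
        {
                /* Done once, when the page leaves the pool. */
                page->pp_magic = 0;
                page->pp = NULL;
        }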
      
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Jun 07, 2021
    • page_pool: Allow drivers to hint on SKB recycling · 6a5bcd84
      Ilias Apalodimas authored
      
      Up to now several high speed NICs have custom mechanisms of recycling
      the allocated memory they use for their payloads.
      Our page_pool API already has recycling capabilities that are always
      used when we are running in 'XDP mode'. So let's tweak the API and the
      kernel network stack slightly and allow recycling to happen even
      during standard operation.
      The API doesn't take into account 'split page' policies used by those
      drivers currently, but can be extended once we have users for that.
      
      The idea is to be able to intercept the packet on skb_release_data().
      If it's a buffer coming from our page_pool API, recycle it back to
      the pool for further usage; otherwise just release the packet
      entirely.
      
      To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
      a field in struct page (page->pp) to store the page_pool pointer.
      Storing the information in page->pp allows us to recycle both SKBs and
      their fragments.
      We could have skipped the skb bit entirely, since identical
      information can be derived from struct page. However, in an
      effort to affect the free path as little as possible, reading
      a single bit in the skb, which is already in cache, is better
      than trying to derive the same information from the data
      stored in the page.
      
      The driver or page_pool has to take care of the sync
      operations on its own during buffer recycling, since the
      buffer is never unmapped after opting in to recycling.
      
      Since the gain depends on the architecture, recycling is not
      enabled by default when a driver uses the page_pool API.
      To enable recycling, the driver must call skb_mark_for_recycle()
      to store the information we need for recycling in page->pp
      and set the recycling bit, or page_pool_store_mem_info() for
      a fragment.
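
      A sketch of a driver RX path opting in, using the three-argument
      form of skb_mark_for_recycle() introduced here; the surrounding
      skb construction is illustrative:

        skb = build_skb(page_address(page), PAGE_SIZE);
        if (unlikely(!skb))
                return;

        /* Set skb->pp_recycle and record the pool in page->pp, so
         * skb_release_data() can recycle the buffer instead of
         * freeing it.
         */
        skb_mark_for_recycle(skb, page, rq->page_pool);
        napi_gro_receive(napi, skb);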
      
      Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Co-developed-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Apr 30, 2021
    • net: page_pool: use alloc_pages_bulk in refill code path · be5dba25
      Jesper Dangaard Brouer authored
      There are cases where the page_pool needs to refill with pages
      from the page allocator.  Some workloads cause the page_pool to
      release pages instead of recycling them.
      
      For these workloads it can improve performance to bulk-allocate
      pages from the page allocator to refill the alloc cache.
      
      An example is an XDP-redirect workload with the 100G mlx5 driver
      (which uses page_pool), redirecting xdp_frame packets into a
      veth that does XDP_PASS to create an SKB from the xdp_frame;
      that SKB cannot return the page to the page_pool.
      
      Performance results under GitHub xdp-project[1]:
       [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org
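
      A sketch of the refill idea: fill a small array of pages in one
      call instead of looping over alloc_pages(); the batch size and
      the pool_cache_push() helper are illustrative:

        struct page *pages[PP_ALLOC_CACHE_REFILL];
        unsigned long nr, i;

        /* One trip into the page allocator for a whole batch. */
        nr = alloc_pages_bulk_array(GFP_ATOMIC, ARRAY_SIZE(pages),
                                    pages);
        for (i = 0; i < nr; i++)
                pool_cache_push(pool, pages[i]);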
      
      Mel: The patch "net: page_pool: convert to use alloc_pages_bulk_array
      variant" was squashed into this patch. From the test page, the
      array variant was superior, with one of the test results as follows.
      
      	Kernel		XDP stats       CPU     pps           Delta
      	Baseline	XDP-RX CPU      total   3,771,0...
  4. Nov 20, 2019
    • net: page_pool: add the possibility to sync DMA memory for device · e68bc756
      Lorenzo Bianconi authored
      
      Introduce the following parameters in order to add the possibility to sync
      DMA memory for device before putting allocated pages in the page_pool
      caches:
      - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
        the driver gets from page_pool will be DMA-synced-for-device according
        to the length provided by the device driver. Please note that
        DMA-sync-for-CPU is still the device driver's responsibility
      - offset: DMA address offset where the DMA engine starts copying rx data
      - max_len: maximum DMA memory size page_pool is allowed to flush. This
        is currently used in the __page_pool_alloc_pages_slow routine when
        pages are allocated from the page allocator
      These parameters are supposed to be set by device drivers.
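
      A sketch of a driver setting these parameters at pool creation
      time; the values and the dev variable are illustrative:

        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order     = 0,
                .pool_size = 256,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,
                .dma_dir   = DMA_FROM_DEVICE,
                /* DMA engine writes rx data starting here */
                .offset    = XDP_PACKET_HEADROOM,
                /* largest area page_pool may sync for device */
                .max_len   = PAGE_SIZE - XDP_PACKET_HEADROOM,
        };
        struct page_pool *pool = page_pool_create(&pp_params);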
      
      This optimization reduces the length of the DMA-sync-for-device.
      The optimization is valid because pages are initially
      DMA-synced-for-device, as defined via max_len. At RX time, the
      driver will perform a DMA-sync-for-CPU on the memory for the
      packet length. What is important is the memory occupied by the
      packet payload, because this is the area the CPU is allowed to
      read and modify. As we don't track cache lines written into by
      the CPU, simply use the packet payload length as dma_sync_size
      at page_pool recycle time. This also takes into account any
      tail extension.
      
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add API to update numa node · bc836748
      Saeed Mahameed authored
      
      Add page_pool_update_nid() to be called by page pool consumers when they
      detect numa node changes.
      
      It will update the page pool nid value to start allocating from the new
      effective numa node.
      
      This mitigates the page pool allocating pages from the wrong
      numa node (the one where the pool was originally created) and
      holding on to pages that belong to a different numa node,
      which causes performance degradation.
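
      A sketch of the intended call site, e.g. from a driver's NAPI
      poll after an IRQ has been rebalanced to another node; checking
      against the executing CPU's node first is illustrative:

        int node = numa_mem_id();      /* node of the executing CPU */

        if (pool->p.nid != node)
                page_pool_update_nid(pool, node);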
      
      For pages that are already being consumed and could be returned
      to the pool by the consumer, the next patch will add a per-page
      check to avoid recycling them back to the pool and return them
      to the page allocator instead.
      
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Nov 18, 2019
    • page_pool: add destroy attempts counter and rename tracepoint · 7c9e6942
      Jesper Dangaard Brouer authored
      
      When Jonathan changed the page_pool to become responsible for
      its own shutdown via a deferred work queue, the disconnect_cnt
      counter was removed from the xdp memory model tracepoint.

      This patch changes the page_pool_inflight tracepoint name to
      page_pool_release, because that reflects the new responsibility
      better.  It also reintroduces a counter that reflects the number
      of times page_pool_release has been tried.
      
      The counter is also used by the code to only empty the alloc
      cache once.  With a stuck work queue running every second and
      the counter being 64-bit, it will overrun in approx 584 billion
      years.  For comparison, the Earth's life expectancy is 7.5
      billion years, before the Sun will engulf, and destroy, the
      Earth.
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. May 24, 2018
    • xdp: introduce xdp_return_frame_rx_napi · 389ab7f0
      Jesper Dangaard Brouer authored
      
      When sending an xdp_frame through an xdp_do_redirect call,
      error cases can happen where the xdp_frame needs to be dropped,
      and returning an -errno code isn't sufficient/possible any
      longer (e.g. in the cpumap case).  This is already fully
      supported, by simply calling xdp_return_frame.
      
      This patch is an optimization, which provides
      xdp_return_frame_rx_napi, a faster variant for these error
      cases.  It takes advantage of XDP RX running under NAPI
      protection.

      This change is mostly relevant for drivers using the page_pool
      allocator, as they can take advantage of this. (Tested with
      mlx5.)
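
      A sketch of the intended split; the condition is illustrative,
      since a real driver knows statically which of its paths run in
      RX NAPI context:

        if (in_rx_napi_path)
                /* Safe only under NAPI: frames can be returned
                 * directly without extra protection.
                 */
                xdp_return_frame_rx_napi(xdpf);
        else
                xdp_return_frame(xdpf);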
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  7. Apr 17, 2018
    • xdp: allow page_pool as an allocator type in xdp_return_frame · 57d0a1c1
      Jesper Dangaard Brouer authored
      
      New allocator type MEM_TYPE_PAGE_POOL for page_pool usage.
      
      The registered allocator page_pool pointer is not available directly
      from xdp_rxq_info, but it could be (if needed).  For now, the driver
      should keep separate track of the page_pool pointer, which it should
      use for RX-ring page allocation.
      
      As suggested by Saeed, to maintain a symmetric API it is the
      driver's responsibility to allocate/create and free/destroy the
      page_pool.  Thus, after the driver has called xdp_rxq_info_unreg(),
      it is the driver's responsibility to free the page_pool, but with
      an RCU free call.  This is done easily via the page_pool helper
      page_pool_destroy() (which avoids touching any driver code during
      the RCU callback, which could happen after the driver has been
      unloaded).
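
      A sketch of the symmetric lifecycle described above; the priv
      fields and pp_params are illustrative:

        /* setup */
        priv->page_pool = page_pool_create(&pp_params);
        err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq,
                                         MEM_TYPE_PAGE_POOL,
                                         priv->page_pool);

        /* teardown, driven by the same driver */
        xdp_rxq_info_unreg(&priv->xdp_rxq);
        page_pool_destroy(priv->page_pool);  /* RCU-deferred free */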
      
      V8: address issues found by kbuild test robot
       - Address sparse "should be static" warnings
       - Allow xdp.o to be compiled without page_pool.o
      
      V9: Remove inline from .c file, compiler knows best
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: refurbish version of page_pool code · ff7d6b27
      Jesper Dangaard Brouer authored
      We need a fast page recycle mechanism for the ndo_xdp_xmit API,
      for returning pages at DMA-TX completion time, with good cross-CPU
      performance, given that DMA-TX completion can happen on a remote
      CPU.

      Refurbish my page_pool code, that was presented[1] at MM-summit
      2016.  Adapted the page_pool code to not depend on the page
      allocator or on integration into struct page.  The DMA mapping
      feature is kept, even though it will not be activated/used in
      this patchset.
      
      [1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
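
      A sketch of basic pool creation following the ERR_PTR convention
      noted in V2 below; pp_params is illustrative:

        struct page_pool *pool;

        pool = page_pool_create(&pp_params);
        if (IS_ERR(pool))
                /* Never NULL: a valid pointer or an ERR_PTR. */
                return PTR_ERR(pool);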
      
      V2: Adjustments requested by Tariq
       - Changed page_pool_create return codes, don't return NULL, only
         ERR_PTR, as this simplifies err handling in drivers.
      
      V4: many small improvements and cleanups
      - Add DOC comment section, that can be used by kernel-doc
      - Improve fallback mode, to work better with refcnt-based recycling,
        e.g. remove a WARN as pointed out by Tariq,
        e.g. quicker fallback if ptr_ring is empty.
      
      V5: Fixed SPDX license as pointed out by Alexei
      
      V6: Adjustments requested by Eric Dumazet
       - Adjust ____cacheline_aligned_in_smp usage/placement
       - Move rcu_head in struct page_pool
       - Free pages quicker on destroy, minimize resources delayed for an RCU period
       - Remove code for forward/backward compat ABI interface
      
      V8: Issues found by kbuild test robot
       - Address sparse "should be static" warnings
       - Only compile+link when a driver uses/selects page_pool;
         mlx5 selects CONFIG_PAGE_POOL, although it's first used in
         two patches
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>