  1. Nov 13, 2017
    • afs: Overhaul cell database management · 989782dc
      David Howells authored
      
      Overhaul the way that the in-kernel AFS client keeps track of cells in the
      following manner:
      
       (1) Cells are now held in an rbtree to make walking them quicker and are
           RCU-managed (though this is probably overkill).
      
       (2) Cells now have a manager work item that:
      
           (A) Looks after fetching and refreshing the VL server list.
      
           (B) Manages cell record lifetime, including initialisation and
               destruction.
      
           (C) Manages cell record caching whereby records are kept around for a
               certain time after last use and then destroyed.
      
           (D) Manages the FS-Cache index cookie for a cell.  It is not permitted
               for a cookie to be in use twice, so we have to be careful to not
               allow a new cell record to exist at the same time as an old record
               of the same name.
      
       (3) Each AFS network namespace is given a manager work item that manages
           the cells within it, maintaining a single timer to prod cells into
           updating their DNS records.
      
           This uses the timer_reduce() facility to make the timer expire at the
           soonest timed event that needs happening.
      
       (4) When a module is being unloaded, cells and cell managers are now
           counted out using dec_after_work() to make sure the module text is
           pinned until after the data structures have been cleaned up.
      
       (5) Each cell's VL server list is now protected by a seqlock rather than a
           semaphore.
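
      The "expire at the soonest timed event" behaviour in (3) can be
      sketched in userspace as follows.  This is only an illustration under
      stated assumptions: toy_timer, toy_time_before() and toy_timer_reduce()
      are hypothetical names, not the kernel's timer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timer: an armed flag and an expiry. */
struct toy_timer {
	bool pending;
	unsigned long expires;
};

/* Wrap-safe "a is before b" comparison on a free-running tick counter. */
static bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Arm the timer if it is idle; otherwise only ever pull the expiry earlier.
 * Returns true if the expiry was changed.
 */
static bool toy_timer_reduce(struct toy_timer *t, unsigned long expires)
{
	assert(t != NULL);
	if (t->pending && !toy_time_before(expires, t->expires))
		return false;	/* already due at or before the requested time */
	t->expires = expires;
	t->pending = true;
	return true;
}
```

      Each cell can then ask for its own next event time, and the single
      timer ends up set to the earliest of them.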
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Overhaul permit caching · be080a6f
      David Howells authored
      
      Overhaul permit caching in AFS by making it per-vnode and sharing permit
      lists where possible.
      
      When most of the fileserver operations are called, they return a status
      structure indicating the (revised) details of the vnode or vnodes involved
      in the operation.  This includes the access mask derived from the ACL
      (named CallerAccess in the protocol definition file).  This is cacheable
      and if the ACL changes, the server will tell us that it is breaking the
      callback promise, at which point we can discard the currently cached
      permits.
      
      With this patch, the afs_permits structure has, at the end, an array of
      { key, CallerAccess } elements, sorted by key pointer.  This is then cached
      in a hash table so that it can be shared between vnodes with the same
      access permits.
      
      Permit lists can only be shared if they contain the exact same set of
      key->CallerAccess mappings.
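
      As a rough illustration of the layout and the sharing rule, the
      sketch below keeps { key, CallerAccess } pairs sorted by key pointer,
      looks a key up by binary search, and treats two lists as shareable
      only when every pair matches.  The toy_* types and functions are
      hypothetical and are not the afs_permits definitions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element: an opaque key pointer and its CallerAccess mask. */
struct toy_permit {
	const void *key;
	uint32_t access;
};

/* Hypothetical permit list; entries are sorted by key pointer value. */
struct toy_permits {
	size_t nr;
	const struct toy_permit *permits;
};

/* Binary-search the sorted array for a key; NULL if it is not present. */
static const struct toy_permit *toy_find_permit(const struct toy_permits *p,
						const void *key)
{
	size_t lo = 0, hi;

	assert(p != NULL);
	hi = p->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uintptr_t k = (uintptr_t)p->permits[mid].key;

		if (k == (uintptr_t)key)
			return &p->permits[mid];
		if (k < (uintptr_t)key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Lists may be shared only if they hold exactly the same mappings. */
static bool toy_permits_equal(const struct toy_permits *a,
			      const struct toy_permits *b)
{
	if (a->nr != b->nr)
		return false;
	for (size_t i = 0; i < a->nr; i++)
		if (a->permits[i].key != b->permits[i].key ||
		    a->permits[i].access != b->permits[i].access)
			return false;
	return true;
}
```

      Because both arrays are sorted the same way, the equality check is a
      single linear pass with no searching.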
      
      Note that the table is global rather than being per-net_ns.  If the keys
      in a permit list cross net_ns boundaries, there is no problem sharing the
      cached permits, since the permits are just integer masks.
      
      Since permit lists pin keys, the permit cache also makes it easier for a
      future patch to find all occurrences of a key and remove them by means of
      setting the afs_permits::invalidated flag and then clearing the appropriate
      key pointer.  In such an event, memory barriers will need adding.
      
      Lastly, the permit caching is skipped if the server has sent either a
      vnode-specific or an entire-server callback since the start of the
      operation.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Overhaul the callback handling · c435ee34
      David Howells authored
      
      Overhaul the AFS callback handling by the following means:
      
       (1) Don't give up callback promises on vnodes that we are no longer using,
           rather let them just expire on the server or let the server break
           them.  This is actually more efficient for the server as the callback
           lookup is expensive if there are lots of extant callbacks.
      
       (2) Only give up the callback promises we have from a server when the
           server record is destroyed.  Then we can just give up *all* the
           callback promises on it in one go.
      
       (3) Servers can end up being shared between cells if cells are aliased, so
           don't add all the vnodes being backed by a particular server into a
           big FID-indexed tree on that server as there may be duplicates.
      
           Instead have each volume instance (~= superblock) register an interest
           in a server as it starts to make use of it and use this to allow the
           processor for callbacks from the server to find the superblock and
           thence the inode corresponding to the FID being broken by means of
           ilookup_nowait().
      
       (4) Rather than iterating over the entire callback list when a mass-break
           comes in from the server, maintain a counter of mass-breaks in
           afs_server (cb_seq) and make afs_validate() check it against the copy
           in afs_vnode.
      
           It would be nice not to have to take a read_lock whilst doing this,
           but that's tricky without using RCU.
      
       (5) Save a ref on the fileserver we're using for a call in the afs_call
           struct so that we can access its cb_s_break during call decoding.
      
       (6) Write-lock around callback and status storage in a vnode and read-lock
           around getattr so that we don't see the status mid-update.
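
      The mass-break counter in (4) can be pictured with the sketch below:
      the server record carries a counter, each vnode snapshots it when a
      promise is recorded, and validation is a comparison rather than a
      walk over every callback.  The toy_* names and the mass_breaks field
      are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical server record: counts mass-breaks seen from the server. */
struct toy_server {
	unsigned int mass_breaks;
};

/* Hypothetical vnode: remembers the counter value its promise was made at. */
struct toy_vnode {
	const struct toy_server *server;
	unsigned int mass_breaks;	/* snapshot of server->mass_breaks */
	bool cb_promised;
};

/* Record a callback promise and snapshot the server's counter. */
static void toy_record_promise(struct toy_vnode *v)
{
	assert(v != NULL && v->server != NULL);
	v->mass_breaks = v->server->mass_breaks;
	v->cb_promised = true;
}

/* A mass-break just bumps the counter; no per-vnode work is done here. */
static void toy_mass_break(struct toy_server *s)
{
	s->mass_breaks++;
}

/* The promise holds only if no mass-break happened since the snapshot. */
static bool toy_promise_valid(const struct toy_vnode *v)
{
	return v->cb_promised && v->mass_breaks == v->server->mass_breaks;
}
```

      The cost of a mass-break is thus deferred to whoever next validates
      each vnode, which matches consequence (2) listed below.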
      
      This has the following consequences:
      
       (1) Data invalidation isn't seen until someone calls afs_validate() on a
           vnode.  Unfortunately, we need to use a key to query the server, but
           getting one from a background thread is tricky without caching loads
           of keys all over the place.
      
       (2) Mass invalidation isn't seen until someone calls afs_validate().
      
       (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
           Could this be replaced with rcu_read_lock(), since inodes are
           destroyed under RCU conditions?
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Rename struct afs_call server member to cm_server · d0676a16
      David Howells authored
      
      Rename the server member of struct afs_call to cm_server as we're only
      going to be using it for incoming calls for the Cache Manager service.
      This makes it easier to differentiate from the pointer to the target server
      for the client, which will point to a different structure to allow for
      callback handling.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix the afs_uuid struct to make the char-sized fields signed · 03dc2cfc
      David Howells authored
      
      In AFS's encoding of a UUID, the eight 'char' fields are all signed, so
      represent them with __s8 rather than __u8.  This makes the compiler
      sign-extend them correctly when XDR-encoding them.
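
      The difference is easy to see in a small userspace sketch, assuming
      each field is widened to a 32-bit big-endian word on the wire.  The
      toy_xdr_* helpers are hypothetical and only illustrate the widening
      step, not the AFS encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as one big-endian (network order) word. */
static void toy_xdr_put_u32(uint8_t out[4], uint32_t v)
{
	assert(out != NULL);
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Signed char-sized field: the compiler sign-extends it on widening. */
static void toy_xdr_put_s8(uint8_t out[4], int8_t v)
{
	toy_xdr_put_u32(out, (uint32_t)(int32_t)v);
}

/* Unsigned char-sized field: the same byte is zero-extended instead. */
static void toy_xdr_put_u8(uint8_t out[4], uint8_t v)
{
	toy_xdr_put_u32(out, (uint32_t)v);
}
```

      For the byte 0x80, the signed helper emits ff ff ff 80 whereas the
      unsigned one emits 00 00 00 80, so the two only agree for values
      below 0x80.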
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Connect up the CB.ProbeUuid · f4b3526d
      David Howells authored
      
      The handler for the CB.ProbeUuid operation in the cache manager is
      implemented, but isn't listed in the switch-statement of operation
      selection, so won't be used.  Fix this by adding it.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Potentially return call->reply[0] from afs_make_call() · 33cd7f2b
      David Howells authored
      
      If call->ret_reply0 is set, return call->reply[0] on success.  Change the
      return type of afs_make_call() to long so that this can be passed back
      without bit loss and then cast to a pointer if required.
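
      A minimal userspace sketch of the "pointer or negative error in a
      long" convention follows.  It assumes a platform where long is at
      least as wide as a pointer (as on Linux); toy_call, toy_make_call()
      and toy_is_err() are hypothetical, and the 4095 cut-off is just the
      usual range reserved for errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile-time check of the assumption that a long can hold a pointer. */
typedef char toy_long_holds_pointer[sizeof(long) >= sizeof(void *) ? 1 : -1];

/* Hypothetical call record with a reply array and a completion status. */
struct toy_call {
	bool ret_reply0;	/* hand reply[0] back to the caller on success */
	void *reply[4];
	int error;		/* 0 on success, negative errno on failure */
};

/* Return a negative errno, or reply[0] as a long if so requested, else 0. */
static long toy_make_call(struct toy_call *call)
{
	assert(call != NULL);
	if (call->error < 0)
		return call->error;
	if (call->ret_reply0)
		return (long)call->reply[0];
	return 0;
}

/* Values in the top 4095 of the range are treated as error codes. */
static bool toy_is_err(long ret)
{
	return (unsigned long)ret >= (unsigned long)-4095L;
}
```

      A caller would test the result with toy_is_err() first and only then
      cast it back to the pointer type it expects.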
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Condense afs_call's reply{,2,3,4} into an array · 97e3043a
      David Howells authored
      
      Condense struct afs_call's reply anchor members - reply{,2,3,4} - into an
      array.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Consolidate abort_to_error translators · f780c8ea
      David Howells authored
      
      The AFS abort code space is shared across all services, so there's no need
      for separate abort_to_error translators for each service.
      
      Consolidate them into a single function and remove the function pointers
      for them.
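
      The shape of a single shared translator might look like the sketch
      below.  The abort code values and the errno each maps to are
      assumptions recalled from the AFS fileserver error list, the fallback
      errno is an arbitrary choice, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for a few fileserver abort codes (illustrative subset). */
enum {
	TOY_VNOVNODE	= 102,
	TOY_VDISKFULL	= 108,
	TOY_VOVERQUOTA	= 109,
	TOY_VBUSY	= 110,
};

/* One translator for every service, since the code space is shared. */
static int toy_abort_to_error(unsigned int abort_code)
{
	switch (abort_code) {
	case TOY_VNOVNODE:	return -ENOENT;
	case TOY_VDISKFULL:	return -ENOSPC;
	case TOY_VOVERQUOTA:	return -EDQUOT;
	case TOY_VBUSY:		return -EBUSY;
	default:		return -EIO;	/* arbitrary fallback */
	}
}

/* Hypothetical per-call state; no per-service function pointer is needed. */
struct toy_call_state {
	bool aborted;
	unsigned int abort_code;
};

/* Any service's call completion can go through the same translator. */
static int toy_call_result(const struct toy_call_state *st)
{
	assert(st != NULL);
	return st->aborted ? toy_abort_to_error(st->abort_code) : 0;
}
```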
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Allow IPv6 address specification of VL servers · 3838d3ec
      David Howells authored
      
      Allow VL server specifications to be given IPv6 addresses as well as IPv4
      addresses, for example as:
      
      	echo add foo.org 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells
      
      Note that ':' is the expected separator between IPv4 addresses, but if a
      ',' is detected or no '.' is detected in the string, the delimiter is
      switched to ','.
      
      This also works with DNS AFSDB or SRV record strings fetched by upcall from
      userspace.
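
      The delimiter rule above can be sketched in userspace as follows.
      toy_pick_delim() and toy_count_addrs() are hypothetical helpers that
      apply the rule as stated and validate each entry with inet_pton();
      this is not the kernel's parser.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* ':' separates dotted IPv4 lists; otherwise fall back to ','. */
static char toy_pick_delim(const char *list)
{
	assert(list != NULL);
	if (strchr(list, ',') || !strchr(list, '.'))
		return ',';
	return ':';
}

/* Count the entries that parse as IPv4 or IPv6 address literals. */
static int toy_count_addrs(const char *list)
{
	char set[2] = { toy_pick_delim(list), '\0' };
	int n = 0;

	while (*list) {
		size_t len = strcspn(list, set);
		char buf[INET6_ADDRSTRLEN];
		unsigned char bin[sizeof(struct in6_addr)];

		if (len > 0 && len < sizeof(buf)) {
			memcpy(buf, list, len);
			buf[len] = '\0';
			if (inet_pton(AF_INET, buf, bin) == 1 ||
			    inet_pton(AF_INET6, buf, bin) == 1)
				n++;
		}
		list += len;
		if (*list)
			list++;		/* step over the delimiter */
	}
	return n;
}
```

      Under the stated rule, a list that mixes dotted IPv4 and IPv6
      addresses has to use ',' so that the colons inside the IPv6 address
      are not taken as separators.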
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr · 4d9df986
      David Howells authored
      
      Keep and pass sockaddr_rxrpc addresses around rather than keeping and
      passing in_addr addresses to allow for the use of IPv6 and non-standard
      port numbers in future.
      
      This also allows the port and service_id fields to be removed from the
      afs_call struct.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Update the cache index structure · ad6a942a
      David Howells authored
      
      Update the cache index structure in the following ways:
      
       (1) Don't use the volume name followed by the volume type as levels in the
           cache index.  Volumes can be renamed.  Use the volume ID instead.
      
       (2) Don't store the VLDB data for a volume in the tree.  If the volume
           database should be cached locally, then it should be done in a separate
           tree.
      
       (3) Expand the volume ID stored in the cache to 64 bits.
      
       (4) Expand the file/vnode ID stored in the cache to 96 bits.
      
       (5) Increment the cache structure version number to 1.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add some protocol defs · 91a90380
      David Howells authored
      
      Add some protocol definitions, including max field lengths, flag defs, an
      XDR-encoded UUID def, more VL operation IDs and more fileserver abort
      codes.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Push the net ns pointer to more places · 9ed900b1
      David Howells authored
      
      Push the network namespace pointer to more places in AFS, including the
      afs_server structure (which doesn't hold a ref on the netns).
      
      In particular, afs_put_cell() now requires a net ns parameter so that
      it can safely alter the netns after decrementing the cell usage count - the
      cell will be deallocated by a background thread after being cached for a
      period, which means that it's not safe to access it after reducing its
      usage count.
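
      The reason for passing the namespace separately can be shown with a
      small C11 sketch.  The toy_* structures and the manager_kicks counter
      are hypothetical; the point is only that nothing reads from the cell
      once its usage count has been dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-namespace state that outlives any one cell. */
struct toy_net {
	atomic_int manager_kicks;	/* times the cell manager was prodded */
};

/* Hypothetical cell record, freed later by a background thread. */
struct toy_cell {
	atomic_int usage;
	struct toy_net *net;
};

/*
 * Drop a reference.  The caller supplies the namespace so that it never
 * has to be fetched from the cell after the decrement.
 */
static void toy_put_cell(struct toy_net *net, struct toy_cell *cell)
{
	assert(net != NULL);
	if (!cell)
		return;
	atomic_fetch_sub(&cell->usage, 1);
	/* The cell may be freed from here on: do not touch cell->net. */
	atomic_fetch_add(&net->manager_kicks, 1);
}
```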
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Note the cell in the superblock info also · 49566f6f
      David Howells authored
      
      Keep a reference to the cell in the superblock info structure in addition
      to the volume and net pointers.  This will make it easier to clean up in a
      future patch in which afs_put_volume() will need the cell pointer.
      
      Whilst we're at it, make the cell and volume getting functions return a
      pointer to the object got to make the call sites look neater.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix server reaping · 59fa1c4a
      David Howells authored
      
      Fix server reaping and make sure it's all done before we start trying to
      purge cells, given that servers currently pin cells.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Close the rxrpc socket only after purging the servers · e3b2ffe0
      David Howells authored
      
      Close the rxrpc socket only after we've purged the server records (and also
      cell and volume records which might refer to servers) so that we can give
      up the callbacks on each server.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Lay the groundwork for supporting network namespaces · f044c884
      David Howells authored
      
      Lay the groundwork for supporting network namespaces (netns) in the AFS
      filesystem by moving various global features to a network-namespace struct
      (afs_net) and providing an instance of this as a temporary global variable
      that everything uses via accessor functions for the moment.
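
      The "temporary global behind accessor functions" arrangement might
      look roughly like the sketch below.  All names here (toy_afs_net,
      toy_super_info, toy_global_net, toy_sb_to_net) are hypothetical and
      stand in for whatever the patch actually defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace container for formerly global state. */
struct toy_afs_net {
	int live;
	const char *rootcell;
};

/* Hypothetical superblock info that will eventually carry the netns. */
struct toy_super_info {
	struct toy_afs_net *net;
};

/* The temporary single instance that everything shares for now. */
static struct toy_afs_net toy_global_net = { 1, NULL };

/*
 * Accessor: callers ask for "the net of this superblock".  Today that is
 * always the global instance; it could later return s->net without any
 * call site having to change.
 */
static struct toy_afs_net *toy_sb_to_net(const struct toy_super_info *s)
{
	assert(s != NULL);
	return &toy_global_net;
}
```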
      
      The following changes have been made:
      
       (1) Store the netns in the superblock info.  This will be obtained from
           the mounter's nsproxy on a manual mount and inherited from the parent
           superblock on an automount.
      
       (2) The cell list is made per-netns.  It can be viewed through
           /proc/net/afs/cells and also be modified by writing commands to that
           file.
      
       (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
           This is unset by default.
      
       (4) The 'rootcell' module parameter, which sets a cell and VL server list,
           modifies the init net namespace, thereby allowing an AFS root fs to be
           theoretically used.
      
       (5) The volume location lists and the file lock manager are made
           per-netns.
      
       (6) The AF_RXRPC socket and associated I/O bits are made per-ns.
      
      The various workqueues remain global for the moment.
      
      Changes still to be made:
      
       (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
           from the old name.
      
       (2) A per-netns subsys needs to be registered for AFS into which it can
           store its per-netns data.
      
       (3) Rather than the AF_RXRPC socket being opened on module init, it needs
           to be opened on the creation of a superblock in that netns.
      
       (4) The socket needs to be closed when the last superblock using it is
           destroyed and all outstanding client calls on it have been completed.
           This prevents a reference loop on the namespace.
      
       (5) It is possible that several namespaces will want to use AFS, in which
           case each one will need its own UDP port.  These can either be set
           through /proc/net/afs/cm_port or the kernel can pick one at random.
           The init_ns gets 7001 by default.
      
      Other issues that need resolving:
      
       (1) The DNS keyring needs net-namespacing.
      
       (2) Where do upcalls go (eg. DNS request-key upcall)?
      
       (3) Need something like open_socket_in_file_ns() syscall so that AFS
           command line tools attempting to operate on an AFS file/volume have
           their RPC calls go to the right place.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      David Howells authored
      
      Make wait_on_atomic_t() pass the TASK_* mode on to its action function as
      an extra argument and make it 'unsigned int' throughout.
      
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
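
      As a userspace illustration of why one default action can replace
      several near-identical ones, the sketch below forwards the mode to
      the action, which then decides whether a pending signal should abort
      the wait.  The TOY_TASK_* values, the toy_signal_pending flag and the
      function names are placeholders, and no real sleeping takes place.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder mode bits standing in for the kernel's TASK_* states. */
#define TOY_TASK_INTERRUPTIBLE		0x1u
#define TOY_TASK_UNINTERRUPTIBLE	0x2u

/* Stand-in for the current task's "signal pending" state. */
static bool toy_signal_pending;

typedef int (*toy_action_t)(const int *counter, unsigned int mode);

/* Default action: a real one would sleep here, then consult the mode. */
static int toy_default_action(const int *counter, unsigned int mode)
{
	(void)counter;
	if ((mode & TOY_TASK_INTERRUPTIBLE) && toy_signal_pending)
		return -EINTR;
	return 0;
}

/* One wait step: the mode is passed straight through to the action. */
static int toy_wait_on_counter(const int *counter, toy_action_t action,
			       unsigned int mode)
{
	assert(counter != NULL && action != NULL);
	if (*counter == 0)
		return 0;	/* nothing to wait for */
	return action(counter, mode);
}
```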
      
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      cc: Ingo Molnar <mingo@kernel.org>
    • Merge remote-tracking branch 'tip/timers/core' into afs-next · 81445e63
      David Howells authored
      
      These AFS patches need the timer_reduce() patch from timers/core.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. Nov 12, 2017