Commit graph

100 commits

Author SHA1 Message Date
Amir Goldstein a5e57b4d37 fsnotify: optimize the case of no permission event watchers
Commit e43de7f086 ("fsnotify: optimize the case of no marks of any type")
optimized the case where there are no fsnotify watchers on any of the
filesystem's objects.

It is quite common for a system to have a single local filesystem and
it is quite common for the system to have some inotify watches on some
config files or directories, so the optimization of no marks at all is
often not in effect.

Permission event watchers, which require high priority group are more
rare, so optimizing the case of no marks og high priority groups can
improve performance for more systems, especially for performance
sensitive io workloads.

Count per-sb watched objects by high priority groups and use that the
optimize out the call to __fsnotify_parent() and fsnotify() in fsnotify
permission hooks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-11-amir73il@gmail.com>
2024-04-04 16:24:16 +02:00
Amir Goldstein 07a3b8d0bf fsnotify: lazy attach fsnotify_sb_info state to sb
Define a container struct fsnotify_sb_info to hold per-sb state,
including the reference to sb marks connector.

Allocate the fsnotify_sb_info state before attaching connector to any
object on the sb and free it only when killing sb.

This state is going to be used for storing per priority watched objects
counters.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-8-amir73il@gmail.com>
2024-04-04 16:24:16 +02:00
Amir Goldstein c9d4603b05 fsnotify: create helper fsnotify_update_sb_watchers()
We would like to count watched objects by priority group, so we will need
to update the watched object counter after adding/removing marks.

Create a helper fsnotify_update_sb_watchers() and call it after
attaching/detaching a mark, instead of fsnotify_{get,put}_sb_watchers()
only after attaching/detaching a connector.

Soon, we will use this helper to count watched objects by the highest
watching priority group.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-7-amir73il@gmail.com>
2024-04-04 16:24:16 +02:00
Amir Goldstein 687c217c6a fsnotify: pass object pointer and type to fsnotify mark helpers
Instead of passing fsnotify_connp_t, pass the pointer to the marked
object.

Store the object pointer in the connector and move the definition of
fsnotify_connp_t to internal fsnotify subsystem API, so it is no longer
used by fsnotify backends.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-6-amir73il@gmail.com>
2024-04-04 16:24:16 +02:00
Amir Goldstein f115815d75 fsnotify: create helpers to get sb and connp from object
In preparation to passing an object pointer to add/remove/find mark
helpers, create helpers to get sb and connp by object type.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-3-amir73il@gmail.com>
2024-04-04 16:23:46 +02:00
Amir Goldstein d2f277e26f fsnotify: rename fsnotify_{get,put}_sb_connectors()
Instead of counting the number of connectors in an sb, we would like
to count the number of watched objects per priority group.

As a start, create an accessor fsnotify_sb_watched_objects() to
s_fsnotify_connectors and rename the fsnotify_{get,put}_sb_connectors()
helpers to fsnotify_{get,put}_sb_watchers() to better describes the
counter.

Increment the counter at the end of fsnotify_attach_connector_to_object()
if connector was attached instead of decrementing it on race to connect.

This is fine, because fsnotify_delete_sb() cannot be running in parallel
to fsnotify_attach_connector_to_object() which requires a reference to
a filesystem object.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240317184154.1200192-2-amir73il@gmail.com>
2024-04-04 16:21:12 +02:00
Amir Goldstein 7232522e6c fanotify: store fsid in mark instead of in connector
Some filesystems like fuse and nfs have zero or non-unique fsid.
We would like to avoid reporting ambiguous fsid in events, so we need
to avoid marking objects with same fsid and different sb.

To make this easier to enforce, store the fsid in the marks of the group
instead of in the shared conenctor.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20231130165619.3386452-2-amir73il@gmail.com>
2023-12-01 10:55:21 +01:00
Amir Goldstein c3638b5b13 fsnotify: allow adding an inode mark without pinning inode
fsnotify_add_mark() and variants implicitly take a reference on inode
when attaching a mark to an inode.

Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.

Instead of taking the inode reference when attaching connector to inode
and dropping the inode reference when detaching connector from inode,
take the inode reference on attach of the first mark that wants to hold
an inode reference and drop the inode reference on detach of the last
mark that wants to hold an inode reference.

Backends can "upgrade" an existing mark to take an inode reference, but
cannot "downgrade" a mark with inode reference to release the refernce.

This leaves the choice to the backend whether or not to pin the inode
when adding an inode mark.

This is intended to be used when adding a mark with ignored mask that is
used for optimization in cases where group can afford getting unneeded
events and reinstate the mark with ignored mask when inode is accessed
again after being evicted.

Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:42:45 +02:00
Amir Goldstein 43b245a788 fsnotify: create helpers for group mark_mutex lock
Create helpers to take and release the group mark_mutex lock.

Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines
if the mark_mutex lock is fs reclaim safe or not.  If not safe, the
lock helpers take the lock and disable direct fs reclaim.

In that case we annotate the mutex with a different lockdep class to
express to lockdep that an allocation of mark of an fs reclaim safe group
may take the group lock of another "NOFS" group to evict inodes.

For now, converted only the callers in common code and no backend
defines the NOFS flag.  It is intended to be set by fanotify for
evictable marks support.

Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:37:22 +02:00
Amir Goldstein f3010343d9 fsnotify: make allow_dups a property of the group
Instead of passing the allow_dups argument to fsnotify_add_mark()
as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express
the allow_dups behavior and set this behavior at group creation time
for all calls of fsnotify_add_mark().

Rename the allow_dups argument to generic add_flags argument for future
use.

Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:37:18 +02:00
Amir Goldstein 623af4f538 fsnotify: fix wrong lockdep annotations
Commit 6960b0d909 ("fsnotify: change locking order") changed some
of the mark_mutex locks in direct reclaim path to use:
  mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);

This change is explained:
 "...It uses nested locking to avoid deadlock in case we do the final
  iput() on an inode which still holds marks and thus would take the
  mutex again when calling fsnotify_inode_delete() in destroy_inode()."

The problem is that the mutex_lock_nested() is not a nested lock at
all. In fact, it has the opposite effect of preventing lockdep from
warning about a very possible deadlock.

Due to these wrong annotations, a deadlock that was introduced with
nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two
years until it was reported recently by Khazhismel Kumykov, only to
find out that the deadlock was already fixed in kernel v5.5.

Fix the wrong lockdep annotations.

Cc: Khazhismel Kumykov <khazhy@google.com>
Fixes: 6960b0d909 ("fsnotify: change locking order")
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Link: https://lore.kernel.org/r/20220422120327.3459282-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:37:08 +02:00
Amir Goldstein 4f0b903ded fsnotify: fix merge with parent's ignored mask
fsnotify_parent() does not consider the parent's mark at all unless
the parent inode shows interest in events on children and in the
specific event.

So unless parent added an event to both its mark mask and ignored mask,
the event will not be ignored.

Fix this by declaring the interest of an object in an event when the
event is in either a mark mask or ignored mask.

Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-02-24 14:04:51 +01:00
Amir Goldstein 1c9007d62b fsnotify: separate mark iterator type from object type enum
They are two different types that use the same enum, so this confusing.

Use the object type to indicate the type of object mark is attached to
and the iter type to indicate the type of watch.

A group can have two different watches of the same object type (parent
and child watches) that match the same event.

Link: https://lore.kernel.org/r/20211129201537.1932819-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-12-15 14:04:06 +01:00
Amir Goldstein ad69cd9972 fsnotify: clarify object type argument
In preparation for separating object type from iterator type, rename
some 'type' arguments in functions to 'obj_type' and remove the unused
interface to clear marks by object type mask.

Link: https://lore.kernel.org/r/20211129201537.1932819-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-12-15 14:04:03 +01:00
Amir Goldstein 4396a73115 fsnotify: fix sb_connectors leak
Fix a leak in s_fsnotify_connectors counter in case of a race between
concurrent add of new fsnotify mark to an object.

The task that lost the race fails to drop the counter before freeing
the unused connector.

Following umount() hangs in fsnotify_sb_delete()/wait_var_event(),
because s_fsnotify_connectors never drops to zero.

Fixes: ec44610fe2 ("fsnotify: count all objects with attached connectors")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux.usersys.redhat.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-10 09:46:48 -07:00
Amir Goldstein ec44610fe2 fsnotify: count all objects with attached connectors
Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all
objects with attached connectors, not only inodes with attached
connectors.

This will be used to optimize fsnotify() calls on sb without any
type of marks.

Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-08-11 13:50:48 +02:00
Amir Goldstein 11fa333b58 fsnotify: count s_fsnotify_inode_refs for attached connectors
Instead of incrementing s_fsnotify_inode_refs when detaching connector
from inode, increment it earlier when attaching connector to inode.
Next patch is going to use s_fsnotify_inode_refs to count all objects
with attached connectors.

Link: https://lore.kernel.org/r/20210810151220.285179-3-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-08-11 13:50:42 +02:00
Amir Goldstein 09ddbe69c9 fsnotify: replace igrab() with ihold() on attach connector
We must have a reference on inode, so ihold is cheaper.

Link: https://lore.kernel.org/r/20210810151220.285179-2-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-08-11 13:50:28 +02:00
Amir Goldstein 5b8fea65d1 fanotify: configurable limits via sysfs
fanotify has some hardcoded limits. The only APIs to escape those limits
are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.

Allow finer grained tuning of the system limits via sysfs tunables under
/proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify,
with some minor differences.

- max_queued_events - global system tunable for group queue size limit.
  Like the inotify tunable with the same name, it defaults to 16384 and
  applies on initialization of a new group.

- max_user_marks - user ns tunable for marks limit per user.
  Like the inotify tunable named max_user_watches, on a machine with
  sufficient RAM and it defaults to 1048576 in init userns and can be
  further limited per containing user ns.

- max_user_groups - user ns tunable for number of groups per user.
  Like the inotify tunable named max_user_instances, it defaults to 128
  in init userns and can be further limited per containing user ns.

The slightly different tunable names used for fanotify are derived from
the "group" and "mark" terminology used in the fanotify man pages and
throughout the code.

Considering the fact that the default value for max_user_instances was
increased in kernel v5.10 from 8192 to 1048576, leaving the legacy
fanotify limit of 8192 marks per group in addition to the max_user_marks
limit makes little sense, so the per group marks limit has been removed.

Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own
marks are not accounted in the per user marks account, so in effect the
limit of max_user_marks is only for the collection of groups that are
not initialized with FAN_UNLIMITED_MARKS.

Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-16 16:49:31 +01:00
Jules Irenge 00e0afb658 fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()
Sparse reports warnings at fsnotify_prepare_user_wait()
	and at fsnotify_finish_user_wait()

warning: context imbalance in fsnotify_finish_user_wait()
	- wrong count at exit
warning: context imbalance in fsnotify_prepare_user_wait()
	- unexpected unlock

The root cause is the missing annotation at fsnotify_finish_user_wait()
	and at fsnotify_prepare_user_wait()
fsnotify_prepare_user_wait() has an extra annotation __release()
 that only tell Sparse and not GCC to shutdown the warning

Add the missing  __acquires(&fsnotify_mark_srcu) annotation
Add the missing __releases(&fsnotify_mark_srcu) annotation
Add the __release(&fsnotify_mark_srcu) annotation.

Link: https://lore.kernel.org/r/20200413214240.15245-1-jbi.octave@gmail.com
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-04-15 11:44:43 +02:00
Trond Myklebust b72679ee89 notify: export symbols for use by the knfsd file cache
The knfsd file cache will need to detect when files are unlinked, so that
it can close the associated cached files. Export a minimal set of notifier
functions to allow it to do so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 11:00:39 -04:00
Amir Goldstein c285a2f01d fanotify: update connector fsid cache on add mark
When implementing connector fsid cache, we only initialized the cache
when the first mark added to object was added by FAN_REPORT_FID group.
We forgot to update conn->fsid when the second mark is added by
FAN_REPORT_FID group to an already attached connector without fsid
cache.

Reported-and-tested-by: syzbot+c277e8e2f46414645508@syzkaller.appspotmail.com
Fixes: 77115225ac ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-19 15:53:58 +02:00
Thomas Gleixner c82ee6d3be treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 or at your option any
  later version this program is distributed in the hope that it will
  be useful but without any warranty without even the implied warranty
  of merchantability or fitness for a particular purpose see the gnu
  general public license for more details you should have received a
  copy of the gnu general public license along with this program see
  the file copying if not write to the free software foundation 675
  mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 52 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 11:28:46 +02:00
Jan Kara 11a6f8e2db fsnotify: Clarify connector assignment in fsnotify_add_mark_list()
Add a comment explaining why WRITE_ONCE() is enough when setting
mark->connector which can get dereferenced by RCU protected readers.

Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-01 18:05:11 +02:00
Jan Kara b1da6a5187 fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
happen that it sees mark not fully initialized or mark that is already
detached from the object list. In these cases mark->connector
can be NULL leading to NULL ptr dereference. Fix the problem by
being careful when reading mark->connector and check it for being NULL.
Also use WRITE_ONCE when writing the mark just to prevent compiler from
doing something stupid.

Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
Fixes: 77115225ac ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Jan Kara <jack@suse.cz>
2019-04-28 22:14:50 +02:00
Amir Goldstein 77115225ac fanotify: cache fsid in fsnotify_mark_connector
For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on
every event. To avoid having to call vfs_statfs() on every event to get
fsid, we store the fsid in fsnotify_mark_connector on the first time we
add a mark and on handle event we use the cached fsid.

Subsequent calls to add mark on the same object are expected to pass the
same fsid, so the call will fail on cached fsid mismatch.

If an event is reported on several mark types (inode, mount, filesystem),
all connectors should already have the same fsid, so we use the cached
fsid from the first connector.

[JK: Simplify code flow around fanotify_get_fid()
     make fsid argument of fsnotify_add_mark_locked() unconditional]

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-02-07 16:38:35 +01:00
Jan Kara 721fb6fbfd fsnotify: Fix busy inodes during unmount
Detaching of mark connector from fsnotify_put_mark() can race with
unmounting of the filesystem like:

  CPU1				CPU2
fsnotify_put_mark()
  spin_lock(&conn->lock);
  ...
  inode = fsnotify_detach_connector_from_object(conn)
  spin_unlock(&conn->lock);
				generic_shutdown_super()
				  fsnotify_unmount_inodes()
				    sees connector detached for inode
				      -> nothing to do
				  evict_inode()
				    barfs on pending inode reference
  iput(inode);

Resulting in "Busy inodes after unmount" message and possible kernel
oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode
references from detached connectors.

Note that the accounting of outstanding inode references in the
superblock can cause some cacheline contention on the counter. OTOH it
happens only during deletion of the last notification mark from an inode
(or during unlinking of watched inode) and that is not too bad. I have
measured time to create & delete inotify watch 100000 times from 64
processes in parallel (each process having its own inotify group and its
own file on a shared superblock) on a 64 CPU machine. Average and
standard deviation of 15 runs look like:

	Avg		Stddev
Vanilla	9.817400	0.276165
Fixed	9.710467	0.228294

So there's no statistically significant difference.

Fixes: 6b3f05d24d ("fsnotify: Detach mark from object list when last reference is dropped")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-25 15:49:19 +02:00
Amir Goldstein 1e6cb72399 fsnotify: add super block object type
Add the infrastructure to attach a mark to a super_block struct
and detach all attached marks when super block is destroyed.

This is going to be used by fanotify backend to setup super block
marks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03 15:14:01 +02:00
Jan Kara d3bc0fa841 fsnotify: fix false positive warning on inode delete
When inode is getting deleted and someone else holds reference to a mark
attached to the inode, we just detach the connector from the inode. In
that case fsnotify_put_mark() called from fsnotify_destroy_marks() will
decide to recalculate mask for the inode and __fsnotify_recalc_mask()
will WARN about invalid connector type:

WARNING: CPU: 1 PID: 12015 at fs/notify/mark.c:139
__fsnotify_recalc_mask+0x2d7/0x350 fs/notify/mark.c:139

Actually there's no reason to warn about detached connector in
__fsnotify_recalc_mask() so just silently skip updating the mask in such
case.

Reported-by: syzbot+c34692a51b9a6ca93540@syzkaller.appspotmail.com
Fixes: 3ac70bfcde ("fsnotify: add helper to get mask from connector")
Signed-off-by: Jan Kara <jack@suse.cz>
2018-08-20 13:55:45 +02:00
Amir Goldstein 3ac70bfcde fsnotify: add helper to get mask from connector
Use a helper to get the mask from the object (i.e. i_fsnotify_mask)
to generalize code of add/remove inode/vfsmount mark.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:07 +02:00
Amir Goldstein 36f10f55ff fsnotify: let connector point to an abstract object
Make the code to attach/detach a connector to object more generic
by letting the fsnotify connector point to an abstract fsnotify_connp_t.
Code that needs to dereference an inode or mount object now uses the
helpers fsnotify_conn_{inode,mount}.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:05 +02:00
Amir Goldstein b812a9f589 fsnotify: pass connp and object type to fsnotify_add_mark()
Instead of passing inode and vfsmount arguments to fsnotify_add_mark()
and its _locked variant, pass an abstract object pointer and the object
type.

The helpers fsnotify_obj_{inode,mount} are added to get the concrete
object pointer from abstract object pointer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:03 +02:00
Amir Goldstein 9b6e543450 fsnotify: use typedef fsnotify_connp_t for brevity
The object marks manipulation functions fsnotify_destroy_marks()
fsnotify_find_mark() and their helpers take an argument of type
struct fsnotify_mark_connector __rcu ** to dereference the connector
pointer. use a typedef to describe this type for brevity.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:44:59 +02:00
Amir Goldstein 47d9c7cc45 fsnotify: generalize iteration of marks by object type
Make some code that handles marks of object types inode and vfsmount
generic, so it can handle other object types.

Introduce fsnotify_foreach_obj_type macro to iterate marks by object type
and fsnotify_iter_{should|set}_report_type macros to set/test report_mask.

This is going to be used for adding mark of another object type
(super block mark).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-05-18 14:58:22 +02:00
Amir Goldstein d6f7b98bc8 fsnotify: use type id to identify connector object type
An fsnotify_mark_connector is referencing a single type of object
(either inode or vfsmount). Instead of storing a type mask in
connector->flags, store a single type id in connector->type to
identify the type of object.

When a connector object is detached from the object, its type is set
to FSNOTIFY_OBJ_TYPE_DETACHED and this object is not going to be
reused.

The function fsnotify_clear_marks_by_group() is the only place where
type mask was used, so use type flags instead of type id to this
function.

This change is going to be more convenient when adding a new object
type (super block).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-05-18 14:58:22 +02:00
Elena Reshetova ab97f87325 fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable fsnotify_mark.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31 17:54:56 +01:00
Miklos Szeredi 9a31d7ad99 fsnotify: fix pinning group in fsnotify_prepare_user_wait()
Blind increment of group's user_waits is not enough, we could be far enough
in the group's destruction that it isn't taken into account (i.e. grabbing
the mark ref afterwards doesn't guarantee that it was the ref coming from
the _group_ that was grabbed).

Instead we need to check (under lock) that the mark is still attached to
the group after having obtained a ref to the mark.  If not, skip it.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9385a84d7e ("fsnotify: Pass fsnotify_iter_info into handle_event handler")
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31 17:54:56 +01:00
Miklos Szeredi 24c20305c7 fsnotify: clean up fsnotify_prepare/finish_user_wait()
This patch doesn't actually fix any bug, just paves the way for fixing mark
and group pinning.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31 17:54:56 +01:00
Jan Kara 9cf90cef36 fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly
When fsnotify_add_mark_locked() fails it cleans up the mark it was
adding. Since the mark is already visible in group's list, we should
protect update of mark->flags with mark->lock. I'm not aware of any real
issues this could cause (since we also hold group->mark_mutex) but
better be safe and obey locking rules properly.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31 17:54:56 +01:00
Dan Carpenter f4edce1afd fsnotify: remove a stray unlock
We recently shifted this code around, so we're no longer holding the
lock on this path.

Fixes: 755b5bc681 ("fsnotify: Remove indirection from mark list addition")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24 16:41:28 +02:00
Jan Kara 054c636e5c fsnotify: Move ->free_mark callback to fsnotify_ops
Pointer to ->free_mark callback unnecessarily occupies one long in each
fsnotify_mark although they are the same for all marks from one
notification group. Move the callback pointer to fsnotify_ops.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 7b12932340 fsnotify: Add group pointer in fsnotify_init_mark()
Currently we initialize mark->group only in fsnotify_add_mark_lock().
However we will need to access fsnotify_ops of corresponding group from
fsnotify_put_mark() so we need mark->group initialized earlier. Do that
in fsnotify_init_mark() which has a consequence that once
fsnotify_init_mark() is called on a mark, the mark has to be destroyed
by fsnotify_put_mark().

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 2e37c6ca8d fsnotify: Remove fsnotify_detach_group_marks()
The function is already mostly contained in what
fsnotify_clear_marks_by_group() does. Just update that function to not
select marks when all of them should be destroyed and remove
fsnotify_detach_group_marks().

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 18f2e0d3a4 fsnotify: Rename fsnotify_clear_marks_by_group_flags()
The _flags() suffix in the function name was more confusing than
explaining so just remove it. Also rename the argument from 'flags' to
'type' to better explain what the function expects.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 66d2b81bcb fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()
These helpers are now only a simple assignment and just obfuscate
what is going on. Remove them.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara abc77577a6 fsnotify: Provide framework for dropping SRCU lock in ->handle_event
fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reaquire fsnotify_mark_srcu.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara f09b04a03e fsnotify: Remove special handling of mark destruction on group shutdown
Currently we queue all marks for destruction on group shutdown and then
destroy them from fsnotify_destroy_group() instead from a worker thread
which is the usual path. However worker can already be processing some
list of marks to destroy so this does not make 100% all marks are really
destroyed by the time group is shut down. This isn't a big problem as
each mark holds group reference and thus group stays partially alive
until all marks are really freed but there's no point in complicating
our lives - just wait for the delayed work to be finished instead.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 6b3f05d24d fsnotify: Detach mark from object list when last reference is dropped
Instead of removing mark from object list from fsnotify_detach_mark(),
remove the mark when last reference to the mark is dropped. This will
allow fanotify to wait for userspace response to event without having to
hold onto fsnotify_mark_srcu.

To avoid pinning inodes by elevated refcount (and thus e.g. delaying
file deletion) while someone holds mark reference, we detach connector
from the object also from fsnotify_destroy_marks() and not only after
removing last mark from the list as it was now.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:36 +02:00
Jan Kara 11375145a7 fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
Currently we queue mark into a list of marks for destruction in
__fsnotify_free_mark() and keep the last mark reference dangling. After the
worker waits for SRCU period, it drops the last reference to the mark
which frees it. This scheme has the disadvantage that if we hold
reference to a mark and drop and reacquire SRCU lock, the mark can get
freed immediately which is slightly inconvenient and we will need to
avoid this in the future.

Move to a scheme where queueing of mark into a list of marks for
destruction happens when the last reference to the mark is dropped. Also
drop reference to the mark held by group list already when mark is
removed from that list instead of dropping it only from the destruction
worker.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:35 +02:00
Jan Kara 08991e83b7 fsnotify: Free fsnotify_mark_connector when there is no mark attached
Currently we free fsnotify_mark_connector structure only when inode /
vfsmount is getting freed. This can however impose noticeable memory
overhead when marks get attached to inodes only temporarily. So free the
connector structure once the last mark is detached from the object.
Since notification infrastructure can be working with the connector
under the protection of fsnotify_mark_srcu, we have to be careful and
free the fsnotify_mark_connector only after SRCU period passes.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-10 17:37:35 +02:00