Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Misc features, cleanups, and fixes for vfs and individual filesystems.

  Features:

   - Support idmapped mounts for hugetlbfs.

   - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
     where the passed offset is ignored if the file is O_APPEND. The new
     flag allows a caller to enforce that the offset is honored to
     conform to posix even if the file was opened in append mode.

   - Move i_mmap_rwsem in struct address_space to avoid false sharing
     between i_mmap and i_mmap_rwsem.

   - Convert efs, qnx4, and coda to use the new mount api.

   - Add a generic is_dot_dotdot() helper that's used by various
     filesystems and the VFS code instead of open-coding it multiple
     times.

   - Recently we've added stable offsets, which allow stable ordering
     when iterating directories exported through NFS on, e.g., tmpfs
     filesystems. Originally an xarray was used for the offset map but
     that caused slab fragmentation issues over time. This switches the
     offset map to the maple tree which has a dense mode that handles
     this scenario a lot better. Includes tests.

   - Finally merge the case-insensitive improvement series Gabriel has
     been working on for a long time. This cleanly propagates case
     insensitive operations through ->s_d_op which in turn allows us to
     remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
     It also improves performance by trying a case-sensitive comparison
     first and then falling back to a case-insensitive lookup if that
     fails. This also fixes a bug where overlayfs could be mounted over
     a case-insensitive directory, which would lead to all sorts of odd
     behaviors.

  Cleanups:

   - Make file_dentry() a simple accessor now that ->d_real() is
     simplified because of the backing file work we did the last two
     cycles.

   - Use the dedicated file_mnt_idmap helper in ntfs3.

   - Use smp_load_acquire/store_release() in the i_size_read/write
     helpers and thus remove the hack to handle i_size reads in the
     filemap code.

   - The SLAB_MEM_SPREAD flag is a no-op now. Remove it from various
     places in fs/.

   - It's no longer necessary to perform a second built-in initramfs
     unpack call because we retain the contents of the previous
     extraction. Remove it.

   - Now that we have removed various allocators, kfree_rcu() always
     works with kmem caches and kmalloc(). So simplify various places
     that only use an rcu callback in order to handle the kmem cache
     case.

   - Convert the pipe code to use a lockdep comparison function instead
     of open-coding the nesting, making lockdep validation easier.

   - Move code into fs-writeback.c that was located in a header but can
     be made static as it's only used in that one file.

   - Rewrite the alignment checking iterators for iovec and bvec to be
     easier to read, and also significantly more compact in terms of
     generated code. This saves 270 bytes of text on x86-64 (with
     clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
     saves a bit of time for the same workload.

   - Switch various places to use KMEM_CACHE instead of
     kmem_cache_create().

   - Use inode_set_ctime_to_ts() in inode_set_ctime_current()

   - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.

   - Various smaller cleanups for eventfds.

  Fixes:

   - Fix various comments and typos, and unneeded initializations.

   - Fix stack allocation hack for clang in the select code.

   - Improve dump_mapping() debug code on a best-effort basis.

   - Fix build errors in various selftests.

   - Avoid wrap-around instrumentation in various places.

   - Don't allow user namespaces without an idmapping to be used for
     idmapped mounts.

   - Fix sysv sb_read() call.

   - Fix fallback implementation of the get_name() export operation"
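
To illustrate the RWF_NOAPPEND item above, here is a minimal userspace sketch. The helper name pwrite_at_offset() is hypothetical, and the fallback definition of RWF_NOAPPEND assumes the value 0x00000020 from the 6.9 UAPI header in case the installed libc headers do not provide it yet. On kernels that predate the flag the call is expected to fail rather than silently append.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed value from include/uapi/linux/fs.h; older libc headers may lack it. */
#ifndef RWF_NOAPPEND
#define RWF_NOAPPEND 0x00000020
#endif

/*
 * Hypothetical helper: write buf at offset off even if fd was opened
 * with O_APPEND. Returns the number of bytes written, or -1 with errno
 * set (for example on a kernel that does not know the flag).
 */
static ssize_t pwrite_at_offset(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, off, RWF_NOAPPEND);
}
```

Without the flag, a pwritev2() call on an O_APPEND descriptor ignores the offset and appends, which is the behavior the commit message describes as a bug.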

* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
  hugetlbfs: support idmapped mounts
  qnx4: convert qnx4 to use the new mount api
  fs: use inode_set_ctime_to_ts to set inode ctime to current time
  libfs: Drop generic_set_encrypted_ci_d_ops
  ubifs: Configure dentry operations at dentry-creation time
  f2fs: Configure dentry operations at dentry-creation time
  ext4: Configure dentry operations at dentry-creation time
  libfs: Add helper to choose dentry operations at mount-time
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  fscrypt: Drop d_revalidate once the key is added
  fscrypt: Drop d_revalidate for valid dentries during lookup
  fscrypt: Factor out a helper to configure the lookup dentry
  ovl: Always reject mounting over case-insensitive directories
  libfs: Attempt exact-match comparison first during casefolded lookup
  efs: remove SLAB_MEM_SPREAD flag usage
  jfs: remove SLAB_MEM_SPREAD flag usage
  minix: remove SLAB_MEM_SPREAD flag usage
  openpromfs: remove SLAB_MEM_SPREAD flag usage
  proc: remove SLAB_MEM_SPREAD flag usage
  qnx6: remove SLAB_MEM_SPREAD flag usage
  ...
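
The "exact-match comparison first" idea from the casefold series can be sketched in userspace as follows. This is an illustration only, not the libfs code: ci_name_equal() is a hypothetical name, and strncasecmp() is an ASCII-only stand-in for the kernel's UTF-8 casefolding comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: returns true if the two names match.
 * A cheap byte-wise comparison is tried first; the slower
 * case-insensitive comparison only runs when that fails.
 */
static bool ci_name_equal(const char *a, size_t alen,
			  const char *b, size_t blen)
{
	/* ASCII folding never changes the length (UTF-8 folding can). */
	if (alen != blen)
		return false;

	/* Fast path: most lookups use the exact on-disk spelling. */
	if (memcmp(a, b, alen) == 0)
		return true;

	/* Slow path: stand-in for the casefolded comparison. */
	return strncasecmp(a, b, alen) == 0;
}
```

The fast path pays off because names are usually looked up with the same spelling they were created with, so the folding step is skipped in the common case.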
torvalds committed Mar 11, 2024
2 parents 97ec971 + 09406ad commit 7ea65c8
Showing 65 changed files with 816 additions and 503 deletions.
2 changes: 1 addition & 1 deletion Documentation/filesystems/files.rst
@@ -116,7 +116,7 @@ before and after the reference count increment. This pattern can be seen
 in get_file_rcu() and __files_get_rcu().

 In addition, it isn't possible to access or check fields in struct file
-without first aqcuiring a reference on it under rcu lookup. Not doing
+without first acquiring a reference on it under rcu lookup. Not doing
 that was always very dodgy and it was only usable for non-pointer data
 in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
 either first acquire a reference or they must hold the files_lock of the
2 changes: 1 addition & 1 deletion Documentation/filesystems/locking.rst
@@ -29,7 +29,7 @@ prototypes::
 char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
 struct vfsmount *(*d_automount)(struct path *path);
 int (*d_manage)(const struct path *, bool);
-struct dentry *(*d_real)(struct dentry *, const struct inode *);
+struct dentry *(*d_real)(struct dentry *, enum d_real_type type);

 locking rules:

16 changes: 7 additions & 9 deletions Documentation/filesystems/vfs.rst
@@ -1264,7 +1264,7 @@ defined:
 char *(*d_dname)(struct dentry *, char *, int);
 struct vfsmount *(*d_automount)(struct path *);
 int (*d_manage)(const struct path *, bool);
-struct dentry *(*d_real)(struct dentry *, const struct inode *);
+struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
 };
 ``d_revalidate``
@@ -1419,16 +1419,14 @@ defined:
 the dentry being transited from.

 ``d_real``
-overlay/union type filesystems implement this method to return
-one of the underlying dentries hidden by the overlay. It is
-used in two different modes:
+overlay/union type filesystems implement this method to return one
+of the underlying dentries of a regular file hidden by the overlay.

-Called from file_dentry() it returns the real dentry matching
-the inode argument. The real dentry may be from a lower layer
-already copied up, but still referenced from the file. This
-mode is selected with a non-NULL inode argument.
+The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA
+for returning the real underlying dentry that refers to the inode
+hosting the file's data or metadata respectively.

-With NULL inode the topmost real underlying dentry is returned.
+For non-regular files, the 'dentry' argument is returned.

 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries. Child dentries are basically like files in a
2 changes: 1 addition & 1 deletion fs/attr.c
@@ -352,7 +352,7 @@ int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
 EXPORT_SYMBOL(may_setattr);

 /**
-* notify_change - modify attributes of a filesytem object
+* notify_change - modify attributes of a filesystem object
 * @idmap: idmap of the mount the inode was found from
 * @dentry: object affected
 * @attr: new attributes
4 changes: 1 addition & 3 deletions fs/backing-file.c
@@ -325,9 +325,7 @@ EXPORT_SYMBOL_GPL(backing_file_mmap);

 static int __init backing_aio_init(void)
 {
-backing_aio_cachep = kmem_cache_create("backing_aio",
-sizeof(struct backing_aio),
-0, SLAB_HWCACHE_ALIGN, NULL);
+backing_aio_cachep = KMEM_CACHE(backing_aio, SLAB_HWCACHE_ALIGN);
 if (!backing_aio_cachep)
 return -ENOMEM;

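
The conversion above relies on KMEM_CACHE() deriving the cache name, object size and alignment from the struct type. The following userspace sketch shows the expansion pattern. Everything except the macro shape is a stand-in: struct kmem_cache, kmem_cache_create() and the flag value are hypothetical stubs that only record their arguments, and the fields of struct backing_aio are placeholders. The macro body follows the form I believe include/linux/slab.h used at the time; treat it as an assumption.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct kmem_cache. */
struct kmem_cache {
	const char *name;
	size_t size;
	size_t align;
	unsigned int flags;
};

#define SLAB_HWCACHE_ALIGN 0x2000u	/* illustrative value only */

/* Stub that records its arguments; the real function creates a slab cache. */
static struct kmem_cache *kmem_cache_create(const char *name, size_t size,
					    size_t align, unsigned int flags,
					    void (*ctor)(void *))
{
	static struct kmem_cache cache;

	(void)ctor;
	cache = (struct kmem_cache){ name, size, align, flags };
	return &cache;
}

/* Assumed shape of the kernel macro: name, size and alignment from the type. */
#define KMEM_CACHE(__struct, __flags)					\
	kmem_cache_create(#__struct, sizeof(struct __struct),		\
			  __alignof__(struct __struct), (__flags), NULL)

/* Placeholder layout; the real struct lives in fs/backing-file.c. */
struct backing_aio {
	long placeholder[4];
};
```

One visible difference from the open-coded call it replaces is the alignment argument: the macro passes the type's natural alignment where the old call passed 0.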
10 changes: 3 additions & 7 deletions fs/buffer.c
@@ -464,7 +464,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
 * a successful fsync(). For example, ext2 indirect blocks need to be
 * written back and waited upon before fsync() returns.
 *
-* The functions mark_buffer_inode_dirty(), fsync_inode_buffers(),
+* The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
 * inode_has_buffers() and invalidate_inode_buffers() are provided for the
 * management of a list of dependent buffers at ->i_mapping->i_private_list.
 *
@@ -3121,12 +3121,8 @@ void __init buffer_init(void)
 unsigned long nrpages;
 int ret;

-bh_cachep = kmem_cache_create("buffer_head",
-sizeof(struct buffer_head), 0,
-(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-SLAB_MEM_SPREAD),
-NULL);
-
+bh_cachep = KMEM_CACHE(buffer_head,
+SLAB_RECLAIM_ACCOUNT|SLAB_PANIC);
 /*
 * Limit the bh occupancy to 10% of ZONE_NORMAL
 */
143 changes: 98 additions & 45 deletions fs/coda/inode.c
@@ -24,6 +24,8 @@
 #include <linux/pid_namespace.h>
 #include <linux/uaccess.h>
 #include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
 #include <linux/vmalloc.h>

 #include <linux/coda.h>
@@ -87,10 +89,10 @@ void coda_destroy_inodecache(void)
 kmem_cache_destroy(coda_inode_cachep);
 }

-static int coda_remount(struct super_block *sb, int *flags, char *data)
+static int coda_reconfigure(struct fs_context *fc)
 {
-sync_filesystem(sb);
-*flags |= SB_NOATIME;
+sync_filesystem(fc->root->d_sb);
+fc->sb_flags |= SB_NOATIME;
 return 0;
 }

@@ -102,78 +104,102 @@ static const struct super_operations coda_super_operations =
 .evict_inode = coda_evict_inode,
 .put_super = coda_put_super,
 .statfs = coda_statfs,
-.remount_fs = coda_remount,
 };

-static int get_device_index(struct coda_mount_data *data)
+struct coda_fs_context {
+int idx;
+};
+
+enum {
+Opt_fd,
+};
+
+static const struct fs_parameter_spec coda_param_specs[] = {
+fsparam_fd ("fd", Opt_fd),
+{}
+};
+
+static int coda_parse_fd(struct fs_context *fc, int fd)
 {
+struct coda_fs_context *ctx = fc->fs_private;
 struct fd f;
 struct inode *inode;
 int idx;

-if (data == NULL) {
-pr_warn("%s: Bad mount data\n", __func__);
-return -1;
-}
-
-if (data->version != CODA_MOUNT_VERSION) {
-pr_warn("%s: Bad mount version\n", __func__);
-return -1;
-}
-
-f = fdget(data->fd);
+f = fdget(fd);
 if (!f.file)
-goto Ebadf;
+return -EBADF;
 inode = file_inode(f.file);
 if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
 fdput(f);
-goto Ebadf;
+return invalf(fc, "code: Not coda psdev");
 }

 idx = iminor(inode);
 fdput(f);

-if (idx < 0 || idx >= MAX_CODADEVS) {
-pr_warn("%s: Bad minor number\n", __func__);
-return -1;
+if (idx < 0 || idx >= MAX_CODADEVS)
+return invalf(fc, "coda: Bad minor number");
+ctx->idx = idx;
+return 0;
 }
+
+static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+struct fs_parse_result result;
+int opt;
+
+opt = fs_parse(fc, coda_param_specs, param, &result);
+if (opt < 0)
+return opt;
+
+switch (opt) {
+case Opt_fd:
+return coda_parse_fd(fc, result.uint_32);
+}

-return idx;
-Ebadf:
-pr_warn("%s: Bad file\n", __func__);
-return -1;
+return 0;
 }

+/*
+* Parse coda's binary mount data form. We ignore any errors and go with index
+* 0 if we get one for backward compatibility.
+*/
+static int coda_parse_monolithic(struct fs_context *fc, void *_data)
+{
+struct coda_mount_data *data = _data;
+
+if (!data)
+return invalf(fc, "coda: Bad mount data");
+
+if (data->version != CODA_MOUNT_VERSION)
+return invalf(fc, "coda: Bad mount version");
+
+coda_parse_fd(fc, data->fd);
+return 0;
+}
+
-static int coda_fill_super(struct super_block *sb, void *data, int silent)
+static int coda_fill_super(struct super_block *sb, struct fs_context *fc)
 {
+struct coda_fs_context *ctx = fc->fs_private;
 struct inode *root = NULL;
 struct venus_comm *vc;
 struct CodaFid fid;
 int error;
-int idx;

-if (task_active_pid_ns(current) != &init_pid_ns)
-return -EINVAL;
-
-idx = get_device_index((struct coda_mount_data *) data);
-
-/* Ignore errors in data, for backward compatibility */
-if(idx == -1)
-idx = 0;
-
-pr_info("%s: device index: %i\n", __func__, idx);
+infof(fc, "coda: device index: %i\n", ctx->idx);

-vc = &coda_comms[idx];
+vc = &coda_comms[ctx->idx];
 mutex_lock(&vc->vc_mutex);

 if (!vc->vc_inuse) {
-pr_warn("%s: No pseudo device\n", __func__);
+errorf(fc, "coda: No pseudo device");
 error = -EINVAL;
 goto unlock_out;
 }

 if (vc->vc_sb) {
-pr_warn("%s: Device already mounted\n", __func__);
+errorf(fc, "coda: Device already mounted");
 error = -EBUSY;
 goto unlock_out;
 }
@@ -313,18 +339,45 @@ static int coda_statfs(struct dentry *dentry, struct kstatfs *buf)
 return 0;
 }

-/* init_coda: used by filesystems.c to register coda */
+static int coda_get_tree(struct fs_context *fc)
+{
+if (task_active_pid_ns(current) != &init_pid_ns)
+return -EINVAL;

-static struct dentry *coda_mount(struct file_system_type *fs_type,
-int flags, const char *dev_name, void *data)
+return get_tree_nodev(fc, coda_fill_super);
+}
+
+static void coda_free_fc(struct fs_context *fc)
 {
-return mount_nodev(fs_type, flags, data, coda_fill_super);
+kfree(fc->fs_private);
 }

+static const struct fs_context_operations coda_context_ops = {
+.free = coda_free_fc,
+.parse_param = coda_parse_param,
+.parse_monolithic = coda_parse_monolithic,
+.get_tree = coda_get_tree,
+.reconfigure = coda_reconfigure,
+};
+
+static int coda_init_fs_context(struct fs_context *fc)
+{
+struct coda_fs_context *ctx;
+
+ctx = kzalloc(sizeof(struct coda_fs_context), GFP_KERNEL);
+if (!ctx)
+return -ENOMEM;
+
+fc->fs_private = ctx;
+fc->ops = &coda_context_ops;
+return 0;
+}
+
 struct file_system_type coda_fs_type = {
 .owner = THIS_MODULE,
 .name = "coda",
-.mount = coda_mount,
+.init_fs_context = coda_init_fs_context,
+.parameters = coda_param_specs,
 .kill_sb = kill_anon_super,
 .fs_flags = FS_BINARY_MOUNTDATA,
 };
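
With the conversion above, coda gains an "fd" parameter in coda_param_specs next to the legacy binary mount data. A hedged userspace sketch of how such a parameter could be supplied through the fsopen()/fsconfig() syscalls follows. The helper name coda_mount_via_fd() is hypothetical, the raw syscall() form is used to avoid assuming libc wrappers, and whether a real Coda client would mount this way is an assumption; the sketch only shows the generic new-mount-API sequence.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: mount coda on target, passing psdev_fd (an open
 * descriptor for a coda pseudo-device) as the "fd" parameter.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int coda_mount_via_fd(int psdev_fd, const char *target)
{
	int fsfd, mfd, saved, ret = -1;

	/* Create a filesystem context for the "coda" filesystem type. */
	fsfd = (int)syscall(SYS_fsopen, "coda", FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	/* FSCONFIG_SET_FD passes the descriptor in the aux argument. */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_FD, "fd", NULL, psdev_fd) < 0)
		goto out;

	/* Ask the kernel to create the superblock (reaches ->get_tree). */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		goto out;

	mfd = (int)syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);
	if (mfd < 0)
		goto out;

	/* Attach the detached mount at the target path. */
	ret = (int)syscall(SYS_move_mount, mfd, "", AT_FDCWD, target,
			   MOVE_MOUNT_F_EMPTY_PATH);
	saved = errno;
	close(mfd);
	errno = saved;
out:
	saved = errno;
	close(fsfd);
	errno = saved;
	return ret;
}
```

Callers still using mount(2) with struct coda_mount_data go through coda_parse_monolithic() instead, which is why FS_BINARY_MOUNTDATA stays set in coda_fs_type.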
8 changes: 1 addition & 7 deletions fs/crypto/fname.c
@@ -74,13 +74,7 @@ struct fscrypt_nokey_name {

 static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
 {
-if (str->len == 1 && str->name[0] == '.')
-return true;
-
-if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
-return true;
-
-return false;
+return is_dot_dotdot(str->name, str->len);
 }

 /**
15 changes: 5 additions & 10 deletions fs/crypto/hooks.c
@@ -102,11 +102,8 @@ int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry,
 if (err && err != -ENOENT)
 return err;

-if (fname->is_nokey_name) {
-spin_lock(&dentry->d_lock);
-dentry->d_flags |= DCACHE_NOKEY_NAME;
-spin_unlock(&dentry->d_lock);
-}
+fscrypt_prepare_dentry(dentry, fname->is_nokey_name);
+
 return err;
 }
 EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
@@ -131,12 +128,10 @@ EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
 int fscrypt_prepare_lookup_partial(struct inode *dir, struct dentry *dentry)
 {
 int err = fscrypt_get_encryption_info(dir, true);
+bool is_nokey_name = (!err && !fscrypt_has_encryption_key(dir));
+
+fscrypt_prepare_dentry(dentry, is_nokey_name);

-if (!err && !fscrypt_has_encryption_key(dir)) {
-spin_lock(&dentry->d_lock);
-dentry->d_flags |= DCACHE_NOKEY_NAME;
-spin_unlock(&dentry->d_lock);
-}
 return err;
 }
 EXPORT_SYMBOL_GPL(fscrypt_prepare_lookup_partial);
2 changes: 1 addition & 1 deletion fs/dcache.c
@@ -3139,7 +3139,7 @@ static void __init dcache_init(void)
 * of the dcache.
 */
 dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
 d_iname);

 /* Hash may have been set up in dcache_init_early */
10 changes: 0 additions & 10 deletions fs/ecryptfs/crypto.c
@@ -1949,16 +1949,6 @@ int ecryptfs_encrypt_and_encode_filename(
 return rc;
 }

-static bool is_dot_dotdot(const char *name, size_t name_size)
-{
-if (name_size == 1 && name[0] == '.')
-return true;
-else if (name_size == 2 && name[0] == '.' && name[1] == '.')
-return true;
-
-return false;
-}
-
 /**
 * ecryptfs_decode_and_decrypt_filename - converts the encoded cipher text name to decoded plaintext
 * @plaintext_name: The plaintext name
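
The local helper removed here and the open-coded check removed from fs/crypto/fname.c test the same thing, which is what the generic is_dot_dotdot() now centralizes. A minimal userspace copy of that check, written after the removed ecryptfs version rather than the actual generic helper (whose exact body is not shown in this diff), looks like this:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace copy of the check: true only for the directory entries
 * "." and "..". The name is not assumed to be NUL-terminated, so the
 * length is passed explicitly.
 */
static bool is_dot_dotdot(const char *name, size_t len)
{
	if (len == 1 && name[0] == '.')
		return true;
	if (len == 2 && name[0] == '.' && name[1] == '.')
		return true;
	return false;
}
```

Taking an explicit length is what lets callers such as fscrypt_is_dot_dotdot() pass str->name and str->len from a struct qstr directly.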