author | Christian Brauner <christian.brauner@ubuntu.com> | 2020-07-14 14:28:51 +0200 |
---|---|---|
committer | Christian Brauner <christian.brauner@ubuntu.com> | 2021-02-09 19:43:07 +0100 |
commit | 1d7b902e2875a1ff342e036a9f866a995640aea8 (patch) | |
tree | 9d09757509cb47edd291f26ab5617ffb16a039ac | |
parent | 28a4c58cc211900943f48d65fd42b313ce54e5a6 (diff) | |
download | man-pages-mount_setattr.tar.gz |
mount_setattr.2: New manual page documenting the mount_setattr() system call
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-rw-r--r-- | man2/mount_setattr.2 | 715 |
1 files changed, 715 insertions, 0 deletions
diff --git a/man2/mount_setattr.2 b/man2/mount_setattr.2 new file mode 100644 index 0000000000..6289e8deeb --- /dev/null +++ b/man2/mount_setattr.2 @@ -0,0 +1,715 @@ +.\" Copyright (c) 2021 by Christian Brauner <christian.brauner@ubuntu.com> +.\" +.\" %%%LICENSE_START(VERBATIM) +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of this +.\" manual under the conditions for verbatim copying, provided that the +.\" entire resulting derived work is distributed under the terms of a +.\" permission notice identical to this one. +.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume no +.\" responsibility for errors or omissions, or for damages resulting from +.\" the use of the information contained herein. The author(s) may not +.\" have taken the same level of care in the production of this manual, +.\" which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. +.\" %%%LICENSE_END +.\" +.TH MOUNT_SETATTR 2 2020-07-14 "Linux" "Linux Programmer's Manual" +.SH NAME +mount_setattr \- change mount options of a mount or mount tree +.SH SYNOPSIS +.nf +.BI "int mount_setattr(int " dfd ", const char *" path ", unsigned int " flags , +.BI " struct mount_attr *" attr ", size_t " size ); +.fi +.PP +.IR Note : +There is no glibc wrapper for this system call; see NOTES. +.SH DESCRIPTION +The +.BR mount_setattr () +system call changes the mount properties of a mount or whole mount tree. 
+If +.I path +is a relative pathname, then it is interpreted relative to the directory +referred to by the file descriptor +.I dfd +(or the current working directory of the calling process, if +.I dfd +is the special value +.BR AT_FDCWD ). +If +.BR AT_EMPTY_PATH +is specified in +.I flags +then the mount properties of the mount identified by +.I dfd +are changed. +.PP +The +.BR mount_setattr () +syscall uses an extensible structure (\fIstruct mount_attr\fP) to allow for +future extensions. Any future extensions to +.BR mount_setattr () +will be implemented as new fields appended to this structure, +with a zero value in a new field resulting in the kernel behaving +as though that extension field was not present. +Therefore, the caller +.I must +zero-fill this structure on +initialization. +(See the "Extensibility" section of the +.B NOTES +for more detail on why this is necessary.) +.PP +The +.I size +argument must be specified as +.IR "sizeof(struct mount_attr)" . +.\" +.PP +The +.I flags +argument can be used to alter the path resolution behavior. The supported +values are: +.TP +.in +4n +.B AT_EMPTY_PATH +.in +4n +The mount properties of the mount identified by +.I dfd +are changed. +.TP +.in +4n +.B AT_RECURSIVE +.in +4n +Change the mount properties of the whole mount tree. +.TP +.in +4n +.B AT_SYMLINK_NOFOLLOW +.in +4n +Don't follow trailing symlinks. +.TP +.in +4n +.B AT_NO_AUTOMOUNT +.in +4n +Don't trigger automounts. +.PP +The +.I attr +argument of +.BR mount_setattr () +is a structure of the following form: +.PP +.in +4n +.EX +struct mount_attr { + u64 attr_set; /* Mount properties to set. */ + u64 attr_clr; /* Mount properties to clear. */ + u64 propagation; /* Mount propagation type. */ + u64 userns_fd; /* User namespace file descriptor. */ +}; +.EE +.in +.PP +The +.I attr_set +and +.I attr_clr +members are used to specify the mount options that are supposed to be set or +cleared for a given mount or mount tree. 
The following mount attributes can be +specified in the +.I attr_set +and +.I attr_clr +fields: +.TP +.in +4n +.B MOUNT_ATTR_RDONLY +.in +4n +If set in +.I attr_set +makes the mount read-only and if set in +.I attr_clr +removes the read-only setting if set on the mount. +.TP +.in +4n +.B MOUNT_ATTR_NOSUID +.in +4n +If set in +.I attr_set +makes the mount not honor set-user-ID and set-group-ID bits or file capabilities +when executing programs +and if set in +.I attr_clr +removes the set-user-ID, set-group-ID and file capability restriction if set on +this mount. +.TP +.in +4n +.B MOUNT_ATTR_NODEV +.in +4n +If set in +.I attr_set +prevents access to devices on this mount +and if set in +.I attr_clr +removes the device access restriction if set on this mount. +.TP +.in +4n +.B MOUNT_ATTR_NOEXEC +.in +4n +If set in +.I attr_set +prevents executing programs on this mount +and if set in +.I attr_clr +removes the restriction on executing programs on this mount. +.TP +.in +4n +.B MOUNT_ATTR_NODIRATIME +.in +4n +If set in +.I attr_set +prevents updating access time for directories on this mount +and if set in +.I attr_clr +removes the access time restriction for directories. Note that +.I MOUNT_ATTR_NODIRATIME +can be combined with other access time settings and is implied +by the noatime setting. All other access time settings are mutually +exclusive. +.TP +.in +4n +.B MOUNT_ATTR__ATIME - Changing access time settings +.in +4n +In the new mount API the access time values are an enum starting from 0. +Even though they are an enum, in contrast to the other mount flags such as +.I MOUNT_ATTR_NOEXEC +they are nonetheless passed in +.I attr_set +and +.I attr_clr +to keep the UAPI consistent since +.BR fsmount () +has the same behavior. 
+.IP +.in +4n +Note, since access times are an enum, not a bitmap, users wanting to transition +to a different access time setting cannot simply specify the access time in +.I attr_set +but must also set +.I MOUNT_ATTR__ATIME +in the +.I attr_clr +field. The kernel will verify that +.I MOUNT_ATTR__ATIME +isn't partially set in +.I attr_clr +and that +.I attr_set +doesn't have any access time bits set if +.I MOUNT_ATTR__ATIME +isn't set in +.I attr_clr. +.TP +.in +8n +.B MOUNT_ATTR_RELATIME +.in +8n +When a file is accessed via this mount, update the file's last access time +(atime) only if the current value of atime is less than or equal to the file's +last modification time (mtime) or last status change time (ctime). +.IP +.in +8n +To enable this access time setting on a mount or mount tree +.I MOUNT_ATTR_RELATIME +must be set in +.I attr_set +and +.I MOUNT_ATTR__ATIME +must be set in the +.I attr_clr +field. +.TP +.in +8n +.B MOUNT_ATTR_NOATIME +.in +8n +Do not update access times for (all types of) files on this mount. +.IP +.in +8n +To enable this access time setting on a mount or mount tree +.I MOUNT_ATTR_NOATIME +must be set in +.I attr_set +and +.I MOUNT_ATTR__ATIME +must be set in the +.I attr_clr +field. +.TP +.in +8n +.B MOUNT_ATTR_STRICTATIME +.in +8n +Always update the last access time (atime) when files are +accessed on this mount. +.IP +.in +8n +To enable this access time setting on a mount or mount tree +.I MOUNT_ATTR_STRICTATIME +must be set in +.I attr_set +and +.I MOUNT_ATTR__ATIME +must be set in the +.I attr_clr +field. +.TP +.in +4n +.B MOUNT_ATTR_IDMAP +.in +4n +If set in +.I attr_set +creates an idmapped mount. The idmapping is taken from the user namespace +specified in +.I userns_fd +and attached to the mount. It is currently not supported to change the +idmapping of a mount after it has been idmapped. Therefore, it is invalid to +specify +.I MOUNT_ATTR_IDMAP +in +.I attr_clr. +More details can be found in subsequent paragraphs. 
+.IP +.in +4n +Creating an idmapped mount makes it possible to change the ownership of all files located +under a given mount. Other mounts that expose the same files will not be +affected, i.e. the ownership will not be changed. Consequently, a caller +accessing files through an idmapped mount will see files under an idmapped +mount owned by the uid and gid as specified in the idmapping attached to the +mount. +.IP +.in +4n +The idmapping is also applied to the following +.BR xattr (7) +namespaces: +.RS +.RS +.IP \(bu 2 +The +.I security. +namespace when interacting with filesystem capabilities through the +.I security.capability +key whenever filesystem +.BR capabilities (7) +are stored or returned in the +.I VFS_CAP_REVISION_3 +format which stores a rootid alongside the capabilities. +.IP \(bu 2 +The +.I system.posix_acl_access +and +.I system.posix_acl_default +keys whenever uids or gids are stored in +.I ACL_USER +and +.I ACL_GROUP +entries. +.RE +.RE +.IP +.in +4n +The following conditions must be met in order to create an idmapped mount: +.RS +.RS +.IP \(bu 2 +The caller must currently have the +.I CAP_SYS_ADMIN +capability in the user namespace the underlying filesystem has been mounted in. +.IP \(bu +The underlying filesystem must support idmapped mounts. Currently +.BR xfs (5), +.BR ext4 (5) +and +.BR fat +filesystems support idmapped mounts with more filesystems being actively worked +on. +.IP \(bu +The mount must not already be idmapped. This also implies that the idmapping of +a mount cannot be altered. +.IP \(bu +The mount must be a detached/anonymous mount, i.e. it must have been created by +calling +.BR open_tree () +with the +.I OPEN_TREE_CLONE +flag and it must not already have been visible in the filesystem. +.RE +.IP +.RE +.IP +.in +4n +In the common case the user namespace passed in +.I userns_fd +together with +.I MOUNT_ATTR_IDMAP +in +.I attr_set +to create an idmapped mount will be the user namespace of a container. 
In other +scenarios it will be a dedicated user namespace associated with a given user's +login session as is the case for portable home directories in +.BR systemd-homed.service (8). +Details on how to create user namespaces and how to set up idmappings can be +gathered from +.BR user_namespaces (7). +.IP +.in +4n +In essence, an idmapping associated with a user namespace is a 1-to-1 mapping +between source and target ids for a given range. Specifically, an idmapping +always has the abstract form +.I [type of id] [source id] [target id] [range]. +For example, uid 1000 1001 1 would mean that uid 1000 is mapped to uid 1001, +gid 1000 1001 2 would mean that gid 1000 will be mapped to gid 1001 and gid +1001 to gid 1002. If we were to attach the idmapping of uid 1000 1001 1 to a +mount it would cause all files owned by uid 1000 to be owned by uid 1001. It is +possible to specify up to 340 such idmappings, providing for a great deal of +flexibility. If any source ids are not mapped to a target id all files owned by +that unmapped source id will appear as being owned by the overflow uid or +overflow gid respectively (see +.BR user_namespaces (7) +and +.BR proc (5)). +.IP +.in +4n +Idmapped mounts can be useful in the following and a variety of other +scenarios: +.RS +.RS +.IP \(bu 2 +Idmapped mounts make it possible to easily share files between multiple users +or multiple machines especially in complex scenarios. For example, idmapped +mounts are used to implement portable home directories in +.BR systemd-homed.service (8) +where they allow users to move their home directory to an external storage +device and use it on multiple computers where they are assigned different uids +and gids. This effectively makes it possible to assign random uids and gids at +login time. 
+.IP \(bu +It is possible to share files from the host with unprivileged containers +without having to change ownership permanently through +.BR chown (2). +.IP \(bu +It is possible to idmap a container's rootfs without having to mangle every +file. For example, Chromebooks use it to share the user's Download folder with +their unprivileged containers used for development. +.IP \(bu +It is possible to share files between containers with non-overlapping +idmappings. +.IP \(bu +Filesystems that lack a proper concept of ownership such as fat can use idmapped +mounts to implement discretionary access control (DAC) permission checking. +.IP \(bu +They allow users to +efficiently change ownership on a per-mount basis without having to +(recursively) +.BR chown (2) +all files. In contrast to +.BR chown (2), +changing ownership of large sets of files +is instantaneous with idmapped mounts. This is especially useful when ownership +of a whole root filesystem of a virtual machine or container is to be changed. +With idmapped mounts a single +.BR mount_setattr () +syscall will be sufficient to change the ownership of all files. +.IP \(bu +Idmapped mounts always take the current ownership into account as +idmappings specify what a given uid or gid is supposed to be mapped to. This +contrasts with the +.BR chown (2) +syscall which cannot by itself take the current ownership of the files it +changes into account. It simply changes the ownership to the specified uid and +gid. +.IP \(bu +Idmapped mounts make it possible to change ownership locally, restricting it +to specific mounts, and temporarily as the ownership changes only apply as long +as the mount exists. In contrast, changing ownership via the +.BR chown (2) +syscall changes the ownership globally and permanently. +.RE +.RE +.IP +.in +4n +.PP +The +.I propagation +member is used to specify the propagation type of the mount or mount tree. 
+The supported mount propagation settings are: +.TP +.in +4n +.B MS_PRIVATE +.in +4n +Turn all mounts into private mounts. Mount and unmount events do not propagate +into or out of this mount point. +.TP +.in +4n +.B MS_SHARED +.in +4n +Turn all mounts into shared mounts. Mount points share events with members of a +peer group. Mount and unmount events immediately under this mount point +will propagate to the other mount points that are members of the peer group. +Propagation here means that the same mount or unmount will automatically occur +under all of the other mount points in the peer group. Conversely, mount and +unmount events that take place under peer mount points will propagate to this +mount point. +.TP +.in +4n +.B MS_SLAVE +.in +4n +Turn all mounts into dependent mounts. Mount and unmount events propagate into +this mount point from a shared peer group. Mount and unmount events under this +mount point do not propagate to any peer. +.TP +.in +4n +.B MS_UNBINDABLE +.in +4n +This is like a private mount, and in addition this mount can't be bind mounted. +Attempts to bind mount this mount will fail. +When a recursive bind mount is performed on a directory subtree, any unbindable +mounts within the subtree are automatically pruned (i.e., not replicated) when +replicating that subtree to produce the target subtree. +.PP +The +.I size +argument that is supplied to +.BR mount_setattr () +should be initialized to the size of this structure. +(The existence of the +.I size +argument permits future extensions to the +.IR mount_attr +structure.) +.SH RETURN VALUE +On success, +.BR mount_setattr () +returns zero. On error, \-1 is returned and +.I errno +is set to indicate the cause of the error. +.SH ERRORS +.TP +.B EBADF +.I dfd +is not a valid file descriptor. +.TP +.B ENOENT +A pathname was empty or had a nonexistent component. 
+.TP +.B EINVAL +Unsupported value in +.I flags +.TP +.B EINVAL +Unsupported value was specified in the +.I attr_set +field of +.IR mount_attr. +.TP +.B EINVAL +Unsupported value was specified in the +.I attr_clr +field of +.IR mount_attr. +.TP +.B EINVAL +Unsupported value was specified in the +.I propagation +field of +.IR mount_attr. +.TP +.B EINVAL +An access time setting was specified in the +.I attr_set +field without +.I MOUNT_ATTR__ATIME +being set in the +.I attr_clr +field. +.TP +.B EINVAL +.I MOUNT_ATTR_IDMAP +was specified in +.I attr_clr. +.TP +.B EINVAL +A file descriptor value was specified in +.I userns_fd +which exceeds +.I INT_MAX. +.TP +.B EBADF +An invalid file descriptor value was specified in +.I userns_fd. +.TP +.B EINVAL +A valid file descriptor value was specified in +.I userns_fd +but the file descriptor wasn't a namespace file descriptor or did not refer to +a user namespace. +.TP +.B EPERM +A valid file descriptor value was specified in +.I userns_fd +but the file descriptor refers to the initial user namespace. +.TP +.B EPERM +An already idmapped mount was supposed to be idmapped. +.TP +.B EINVAL +The underlying filesystem does not support idmapped mounts. +.TP +.B EPERM +The caller does not have +.I CAP_SYS_ADMIN +in the user namespace the underlying filesystem is mounted in. +.TP +.B EINVAL +The mount to idmap is not a detached mount, i.e. the mount is already visible +in the filesystem. +.TP +.B EINVAL +A partial access time setting was specified in +.I attr_clr +instead of +.I MOUNT_ATTR__ATIME +being set. +.TP +.B EINVAL +Caller tried to change the mount properties of a mount or mount tree +in another mount namespace. +.SH VERSIONS +.BR mount_setattr () +first appeared in Linux 5.12. +.\" commit ? +.SH CONFORMING TO +.BR mount_setattr () +is Linux specific. +.SH NOTES +Currently, there is no glibc wrapper for this system call; call it using +.BR syscall (2). 
+.\" +.SS Extensibility +In order to allow for future extensibility, +.BR mount_setattr (), +like +.BR openat2 (2) +and +.BR clone3 (2), +requires the user-space application to specify the size of the +.I mount_attr +structure that it is passing. +By providing this information, it is possible for +.BR mount_setattr () +to provide both forwards- and backwards-compatibility, with +.I size +acting as an implicit version number. +(Because new extension fields will always +be appended, the structure size will always increase.) +This extensibility design is very similar to other system calls such as +.BR sched_setattr (2), +.BR perf_event_open (2), +.BR clone3 (2) +and +.BR openat2 (2). +.PP +If we let +.I usize +be the size of the structure as specified by the user-space application, and +.I ksize +be the size of the structure which the kernel supports, then there are +three cases to consider: +.IP \(bu 2 +If +.IR ksize +equals +.IR usize , +then there is no version mismatch and +.I attr +can be used verbatim. +.IP \(bu +If +.IR ksize +is larger than +.IR usize , +then there are some extension fields that the kernel supports +which the user-space application +is unaware of. +Because a zero value in any added extension field signifies a no-op, +the kernel +treats all of the extension fields not provided by the user-space application +as having zero values. +This provides backwards-compatibility. +.IP \(bu +If +.IR ksize +is smaller than +.IR usize , +then there are some extension fields which the user-space application +is aware of but which the kernel does not support. +Because any extension field must have its zero values signify a no-op, +the kernel can +safely ignore the unsupported extension fields if they are all-zero. +If any unsupported extension fields are non-zero, then \-1 is returned and +.I errno +is set to +.BR E2BIG . +This provides forwards-compatibility. 
+.PP +Because the definition of +.I struct mount_attr +may change in the future (with new fields being added when system headers are +updated), user-space applications should zero-fill +.I struct mount_attr +to ensure that recompiling the program with new headers will not result in +spurious errors at runtime. +The simplest way is to use a designated +initializer: +.PP +.in +4n +.EX +struct mount_attr attr = { + .attr_set = MOUNT_ATTR_RDONLY, + .attr_clr = MOUNT_ATTR_NODEV +}; +.EE +.in +.PP +or explicitly using +.BR memset (3) +or similar: +.PP +.in +4n +.EX +struct mount_attr attr; +memset(&attr, 0, sizeof(attr)); +attr.attr_set = MOUNT_ATTR_RDONLY; +attr.attr_clr = MOUNT_ATTR_NODEV; +.EE +.in +.PP +A user-space application that wishes to determine which extensions +the running kernel supports can do so by conducting a binary search on +.IR size +with a structure which has every byte nonzero (to find the largest value +which doesn't produce an error of +.BR E2BIG ). +.SH SEE ALSO +.BR capabilities (7), +.BR mount (2), +.BR mount_namespaces (7), +.BR newuidmap (1), +.BR newgidmap (1), +.BR proc (5), +.BR user_namespaces (7) |
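The access time and extensibility rules described in the page can be modelled in user space. The following is only a rough sketch and not part of the patch: the `sketch_` names and the `SKETCH_ATTR_*` macros are hypothetical stand-ins (the values are assumed to match the `MOUNT_ATTR_*` constants in the kernel's `<linux/mount.h>`), and the two check functions merely imitate the EINVAL and E2BIG conditions documented above. They are not the kernel's implementation and never call `mount_setattr()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the MOUNT_ATTR_* values in <linux/mount.h>; redefined
 * under a SKETCH_ prefix so the example does not need recent headers. */
#define SKETCH_ATTR__ATIME      0x00000070 /* mask covering all atime modes */
#define SKETCH_ATTR_RELATIME    0x00000000
#define SKETCH_ATTR_NOATIME     0x00000010
#define SKETCH_ATTR_STRICTATIME 0x00000020

/* Hypothetical stand-in for struct mount_attr (same four 64-bit fields). */
struct sketch_mount_attr {
    uint64_t attr_set;
    uint64_t attr_clr;
    uint64_t propagation;
    uint64_t userns_fd;
};

/* Build a zero-filled request that switches to the given access time mode:
 * the whole atime mask goes into attr_clr, the new mode into attr_set. */
struct sketch_mount_attr sketch_atime_request(uint64_t atime_mode)
{
    struct sketch_mount_attr attr;

    /* Only bits inside the access time mask make sense here. */
    assert((atime_mode & ~(uint64_t)SKETCH_ATTR__ATIME) == 0);

    memset(&attr, 0, sizeof(attr));
    attr.attr_clr = SKETCH_ATTR__ATIME;
    attr.attr_set = atime_mode;
    return attr;
}

/* Model of the documented EINVAL rules for access time settings.
 * Returns 0 if the request looks acceptable, -EINVAL otherwise. */
int sketch_check_atime(const struct sketch_mount_attr *attr)
{
    uint64_t clr = attr->attr_clr & SKETCH_ATTR__ATIME;

    if (clr != 0 && clr != SKETCH_ATTR__ATIME)
        return -EINVAL; /* mask only partially set in attr_clr */
    if (clr == 0 && (attr->attr_set & SKETCH_ATTR__ATIME) != 0)
        return -EINVAL; /* atime bits in attr_set without clearing the mask */
    return 0;
}

/* Model of the usize/ksize cases: bytes beyond what the "kernel" knows
 * (offsets ksize..usize-1) must all be zero, otherwise -E2BIG. */
int sketch_check_size(const void *buf, size_t usize, size_t ksize)
{
    const unsigned char *p = buf;

    for (size_t i = ksize; i < usize; i++) {
        if (p[i] != 0)
            return -E2BIG;
    }
    return 0; /* usize <= ksize, or all trailing bytes are zero */
}
```

A caller would typically build the request with `sketch_atime_request(SKETCH_ATTR_NOATIME)` and copy the resulting `attr_set` and `attr_clr` values into a real `struct mount_attr` before invoking the system call through `syscall(2)`. Building the request from a zero-filled structure is what keeps the third extensibility case from failing with E2BIG after a rebuild against newer headers.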