diff --git a/content/handling_data/data_sharing.md b/content/handling_data/data_sharing.md
index 7708696eec2f2f49a8270a4f04048742daf04313..28f16d6860c2f4fb6c6106d1ea8ac11cfb6b0374 100644
--- a/content/handling_data/data_sharing.md
+++ b/content/handling_data/data_sharing.md
@@ -7,30 +7,30 @@ weight = "30"
 ---
 
 ## Standard Unix permissions
-Each file on the cluster can have read \(r\), write (w) and execute (x) permissions for different access groupings; these access groupings are known as the user (u), group (g) and other (o) permission modes of the file. The user permissions map to the UID (user identifier number) of the account that created the file. Similarly, the group permissions map to the GID (group identifier number) of the account that created the file; generally the GID is the primary group of the user account in most cases. The other (o) permissions map to all other users not matching the prior two groupings. To say that another way, your HCC account username maps to the UID, your HCC primary group maps to the GID (though this may depend on where the file is located/created with regards to supplementary group access), and the other is all the other users that are not part of your HCC user account or group(s). The (x) permission differs depending on file type; directory type with (x) will allow search operations for the grouping involved under that directory path - lacking the (x) will result in permission denied errors for the grouping being checked for path access. File type with (x) are known as executable files that the system will run (load a program image file instance into RAM memory and execute it on the CPU) while files without (x) tend to be data files of some sort for input or output.
+Each file on the cluster can have read \(r\), write (w) and execute (x) permissions for different access groupings; these access groupings are known as the user (u), group (g) and other (o) permission modes of the file. The user permissions map to the UID (user identifier number) of the account that created the file. Similarly, the group permissions map to the GID (group identifier number) of the account that created the file; generally the GID is the primary group of the user account in most cases. The other (o) permissions map to all other users not matching the prior two groupings. To say that another way, your HCC account username maps to the UID, your HCC primary group maps to the GID (although this may depend on where the file is located/created with regards to supplementary group access), and the other is all the other users that are not part of your HCC user account or group(s). The (x) permission differs depending on file type; directory type with (x) will allow search operations for the grouping involved under that directory path - lacking the (x) will result in permission denied errors for the grouping being checked for path access. File types with (x) are known as executable files that the system will run (load a program image file instance into RAM memory and execute it on the CPU), while files without (x) tend to be data files of some sort used for input or output.
 
-Directory files start with a "d" in the permission listing while files have a hyphen "-", the user (u), group (g) and other (o) permission modes follow, i.e. `tuuugggooo` where `t` is the type of dir/file, `uuu`, `ggg` and `ooo` are permission placeholders for the prior mentioned (u), (g) and (o) permission groupings.
+Directory files start with a "d" in the permission listing, while files have a hyphen "-". Next, the user (u), group (g) and other (o) permission modes follow, i.e., `tuuugggooo` where `t` is the type of directory/file, `uuu`, `ggg` and `ooo` are permission placeholders for the prior mentioned (u), (g) and (o) permission groupings.
 
-## Directory permission mode example
+## Directory permission mode example:
 ```
-drwxr-x--x # an example directory permission modes
+drwxr-x--x # an example directory permission modes
 ||||||||||
-tuuugggooo # permission mode template
-d # directory type "d"
- rwx # directory user (UID owner) has read (r) write (w) and access (x) permissions to what is contained in the directory
-    r-x # directory group (GID owner) has only read (r) and access (x) permissions to what is contained in the directory but cannot create new entries
-       --x # all other users can try to search/access (x) for content, if they know the path name under the directory already, but cannot read (r) to check/discover existing entries if they don't know of them, or write (w) to create new entries within the directory
+tuuugggooo # permission mode template
+d # directory type "d"
+ rwx # directory user (UID owner) has read (r) write (w) and access/execute (x) permissions to what is contained in the directory
+    r-x # directory group (GID owner) has only read (r) and access/execute (x) permissions to what is contained in the directory but cannot create new entries
+       --x # all other users can try to search/access/execute (x) for content, if they know the absolute path name under the directory already, but cannot read (r) to check/discover existing entries if they don't know of them, or write (w) to create new entries within the directory
 ```
 
-## File permission mode example
+## File permission mode example:
 ```
--rw-r----- # an example file permission modes
+-rw-r----- # an example file permission modes
 ||||||||||
-tuuugggooo # permission mode template
-- # file type "-"
- rw- # file user (UID owner) has read (r) write (w) but not execute (x) permissions to the file
-    r-- # file group (GID owner) has only read (r) permissions
-       --- # all other users on the system have no access
+tuuugggooo # permission mode template
+- # file type "-"
+ rw- # file user (UID owner) has read (r) write (w) but not execute (x) permissions to the file
+    r-- # file group (GID owner) has only read (r) permissions
+       --- # all other users on the system have no access
 ```
 
 File and directories permissions can be set using [chmod](https://en.wikipedia.org/wiki/Chmod).
@@ -42,66 +42,66 @@ If you want to share data between group members, we can create group-level read-
 Everyone that is part of the HCC group can read data stored in the shared directory and/or write data in the shared directory.
 If you are interested in having a group-level shared directory, please email hcc-support@unl.edu for the setup.
 When data is stored in the shared directory, occasionally some permission errors may occur.
-In this case, a `shared_fix.sh` script can be used to correct the permissions.
-This script should be run by the owner of the data in the shared directory where other group members are having difficulty with needed access. The script will ensure group (g) modes match the owning user's user (u) permissions so group members have the same level of access.
+In this case, the `shared_fix.sh` script, which HCC staff create when the group-level shared directory is set up, can be used to correct the permissions.
+This script should be run by the owner of the data in the shared directory where other group members are having difficulty with the needed access.
+The script will ensure that the group (g) modes match the owner's user (u) permissions on the shared files, so group members have the same level of access.
 
 **Pros:**
-
 - Data in the shared directory can be easily accessed on the cluster and used as part of SLURM jobs.
 - The access for the shared directory can be read-only or read-write. [Josh comment !!remove!!: this is easy for user's to break out of - so, not sure how to caveat word this - other than end-user attention is _always_ needed and not just assume it is only read-only vs read-write - accidents happen.]
 - When multiple HCC users need access to the same data and scripts, storing the data in group-level shared directory is recommended.
 
 **Cons:**
-
 - Users need HCC accounts to access the shared directory *from the cluster nodes*.
 - Users need to be part of the HCC group with the shared directory in order to access data.
 - The permissions are set as discussed above in the Standard Unix Permissions section.
 
+{{% notice warning %}}
+While the group-level shared directory can be created as read-only or read-write, please always make sure that the shared data has the correct permissions.
+{{% /notice %}}
+
 ## Using user-level world-readable directory
 
 If you want to create directory under your HCC account that is readable and accessible by everyone with HCC account, whether or not you are part of the same HCC group, the commands you can use are:
 ```
 cd ${WORK}
 mkdir public
-chmod go+x ${WORK} # ensure directory search is possible to your ${WORK} to group (g) and other (o)
+chmod go+x ${WORK} # ensure directory search is possible to your ${WORK} to group (g) and other (o)
 chmod u=rwx,go=rx public
 ```
 Here, read \(r\), write (w) and execute (x) permissions are given to the user (u), and read \(r\) and execute (x) permissions are given for the group (g) and others (o).
 After the world-readable directory is created, you can share the path to it with your collaborators that have HCC accounts.
 
 **Pros:**
-
 - Data in the shared directory can be easily accessed on the cluster and used as part of SLURM jobs.
 - Easy way to share a single file.
 
 **Cons:**
-
 - Users need HCC accounts to access the shared directory *from the cluster nodes*.
 - You should be careful when setting permissions this way - you can lock yourself from your HCC account if the permissions are insufficient and/or you can give public access to your files.
 
 {{% notice info %}}
-Please note that when sharing a file, all the directories in the path to the file need to have execute (x) bits set in order for its contents to be accessible and read \(r\) bits to show up in listing queries, e.g. the `ls -l` command. For example, if you want to share the directory `/work/group/username/shared/`, read \(r\) and execute (x) permissions should be given to `/work`, `/work/group`, `/work/group/username` and `/work/group/username/shared` to ensure both access to the files and the ability to list directory entries for the various path components.
+Please note that when sharing a file, all the directories in the path to the file need to have execute (x) bits set in order for its contents to be accessible and read \(r\) bits to show up in listing queries, e.g., the `ls -l` command. For example, if you want to share the directory `/work/group/username/shared/`, read \(r\) and execute (x) permissions should be given to `/work`, `/work/group`, `/work/group/username` and `/work/group/username/shared` to ensure both access to the files and the ability to list directory entries for the various path components.
 {{% /notice %}}
 
-
 ## Using POSIX Access Control Lists (ACL)
 
 With the standard Unix/[POSIX permissions](https://en.wikipedia.org/wiki/File-system_permissions#POSIX_permissions) the cluster uses, it is not possible to share data with only a single user as only the user, group and other permission model is in effect.
-However, with POSIX Access Control Lists model ([POSIX extended ACLs](https://man7.org/linux/man-pages/man5/acl.5.html)) which extend the standard POSIX model, it is possible - but more involved and only recommended for the advanced user that has the need and is already well experienced with the standard model. We refer such users to the tool docs: [getfacl](https://man7.org/linux/man-pages/man1/getfacl.1.html) and [setfacl](https://man7.org/linux/man-pages/man1/setfacl.1.html).
+However, this is possible with the POSIX Access Control Lists model ([POSIX extended ACLs](https://man7.org/linux/man-pages/man5/acl.5.html)) which extends the standard POSIX model. This is a more involved setup that is only recommended for advanced users who have the need and are already well experienced with the standard model. We refer such users to the tool docs: [getfacl](https://man7.org/linux/man-pages/man1/getfacl.1.html) and [setfacl](https://man7.org/linux/man-pages/man1/setfacl.1.html).
+{{% notice info %}}
 Please note that only the `${WORK}` filesystem on the cluster supports ACL, and sharing data in this way on `${COMMON}` is not possible.
+{{% /notice %}}
+
 One can use ACL on directories/files stored in `${WORK}` with the `getfacl` and `setfacl` command mentioned above.
-Similar to Unix/POSIX permissions, ACL provides read \(r\), write (w) and execute (x) permissions for the user (u), group (g) and other (o). The user is your HCC account, the group is your HCC group (or supplementary group for where the file is located), and the other is all the other users that are not part of your HCC group. An ACL can "extend" this prior mapping by allowing a per-user and/or per-group list of additional groupings that reside within the traditional/standard model's "group permission" grouping. To say that another way, the group rwx permissions mapping expand to multiple entries that only the prior mentioned tools can work with.
+Similar to Unix/POSIX permissions, ACL provides read \(r\), write (w) and execute (x) permissions for the user (u), group (g) and other (o). The user is your HCC account, the group is your HCC group (or supplementary group for where the file is located), and the other is all the other users that are not part of your HCC group. An ACL can "extend" this prior mapping by allowing a per-user and/or per-group list of additional groupings that reside within the traditional/standard model's "group permission" grouping. To say that another way, the group *rwx* permissions mapping expands to multiple entries that only the prior mentioned tools can work with.
 
 {{% notice info %}}
-
-KEEP IN MIND:
-
+**Things to remember:**
 - Presently only the ${WORK} filesystem supports POSIX extended ACLs, and sharing data in this way on ${COMMON} is not possible).
 - [Josh comment !!remove!!: we've upgraded ${COMMON} to version 7 and can _technically_ turn knobs to enable it there now - not sure if that's a good idea at this stage of its life though (and it's a Enterprise/must pay for feature which we lack now _I believe_.]
 - HCC staff cannot help with advanced permission modes as the end user is ultimately responsible for these settings if they choose to add and use them.
 {{% /notice %}}
-
-To view the ACL setting for the file file.txt, one can run:
+To view the ACL setting for the file `file.txt` on ${WORK}, one can run:
 ```
 getfacl file.txt
 ```
@@ -115,15 +115,18 @@ group::r--
 other::r--
 ```
 
-Running as user `demo01`, to share the file ${WORK}/shared/file.txt with a user `demo02` and grant them read \(r\) and write (w) permissions, the setup steps are:
+Running as user `demo01`, to share the file `${WORK}/shared/file.txt` with a user `demo02` and grant them read \(r\) and write (w) permissions, the setup steps are:
 ```
-$ cd ${WORK}/shared # the "shared" path must be setup by HCC staff, please see the section above on how to request access
+$ cd ${WORK}/shared # the "shared" path must be setup by HCC staff, please see the section above on how to request access
 
+# create an empty file
 $ touch file.txt
 
+# check the file permissions
 $ ls -l file.txt
 -rw-r--r-- 1 demo01 demo 0 Aug 22 16:25 file.txt
 
+# view the ACL settings for the file
 $ getfacl file.txt
 # file: file.txt
 # owner: demo01
@@ -132,13 +135,17 @@ user::rw-
 group::r--
 other::r--
 
+# set/update the ACL settings for the file
 $ setfacl -m user:demo02:rw file.txt
 
+# check the file permissions
 $ ls -l file.txt
 -rw-rw-r--+ 1 demo01 demo 0 Aug 22 16:25 file.txt
+```
+Note the "+" character being added at the end of the permission mode line (-rw-rw-r--+). This indicates a directory or file that has extended ACL rules added to it.
 
-# Note the "+" character being added at the end of the permission mode line, this indicates a directory or file that has extended ACL rules added to it
-
+```
+# view the ACL settings for the file
 $ getfacl file.txt
 # file: file.txt
 # owner: demo01
@@ -148,23 +155,28 @@ user:demo02:rw-
 group::r--
 mask::rw-
 other::r--
+```
+Note that a "user:demo02:rw-" mapping was added in the ACL listing. This means the listed user account can be granted the "rw" permission modes only when the "allow mask" line would allow for it, which in this case it does ("mask::rw-").
 
-# Note that a "user:demo02:rw-" mapping was added in the ACL listing, this means the listed user account can be granted the "rw" permission modes only when the "allow mask" line would allow for it, which in this case it does: "mask::rw-"
-
-# One more example, directores carry a default ACL entry to grant users/groups that have an entry to pass the same defaults to child directories and files that are created within/under it.
-
+Directories carry a default ACL entry to grant users/groups that have an entry to pass the same defaults to child directories and files that are created within/under it.
+```
 $ mkdir test_dir
 
+# check the directory permissions
 $ ls -ld test_dir
 drwxr-sr-x 2 demo01 demo 33280 Aug 22 16:36 test_dir
 
+# set/update the ACL settings for the directory
 # ensure all users can collaborate on newly created files, in this case demo01 and demo02 accounts are working together and expect to share files amongst themselves
 $ setfacl -m default:user:demo01:rwx -m default:user:demo02:rwx test_dir/
 
+# check the directory permissions
 $ ls -ld test_dir
 drwxr-sr-x+ 2 demo01 demo 33280 Aug 22 16:36 test_dir
 
-# Note (again) the "+" character being added at the end of the permission mode line, this indicates the directory has extended ACL rules added to it
+# Note the "+" character being added at the end of the permission mode line (drwxr-sr-x+). This indicates a directory or file that has extended ACL rules added to it.
+
+# view the ACL settings for the directory
 $ getfacl test_dir/
 # file: test_dir/
 # owner: demo01
@@ -180,11 +192,14 @@ default:group::r-x
 default:mask::rwx
 default:other::r-x
 
+# create an empty file
 $ touch test_dir/file.txt
 
+# check the file permissions
 $ ls -l test_dir/file.txt
 -rw-rw-r--+ 1 demo01 demo 0 Aug 22 16:37 test_dir/file.txt
 
+# view the ACL settings for the file
 $ getfacl test_dir/file.txt
 # file: test_dir/file.txt
 # owner: demo01
@@ -195,15 +210,18 @@ user:demo02:rwx #effective:rw-
 group::r-x #effective:r--
 mask::rw-
 other::r--
+```
+Note that the effective mode differs from the rule. This is because the touch command passed octal permissions of 666 to open() for the file: 4 for read (r) plus 2 for write (w), while 1 for execute (x) was missing.
 
-# Note the effective mode differs from the rule, this is because the touch command used open() octal permissions of 666 for the file, 4 for read (r), 2 for write (w) - 1 for execute (x) was missing.
-
+```
+# give the group execute (x) permission on the file
 $ chmod g+x test_dir/file.txt
 
+# check the file permissions
 $ ls -l test_dir/file.txt
 -rw-rwxr--+ 1 demo01 demo 0 Aug 22 16:37 test_dir/file.txt
 
-
+# view the ACL settings for the file
 $ getfacl test_dir/file.txt
 # file: test_dir/file.txt
 # owner: demo01
@@ -214,12 +232,12 @@ user:demo02:rwx
 group::r-x
 mask::rwx
 other::r--
-
-# changing the group permission on the file updated the "mask::rwx" extended ACL entry to "allow" the execute (x) permission that were prior missing. Note well, even though the group permissions in the ls listing show rwx for the group, actual GID group members would only have 'r-x' access as the allow mask property is what is actually listed.
 ```
-With the `setfacl` commands above, the listed "demo" accounts are given read \(r\) write (w) or execute (x) access to the file `file.txt` by the ACL (standard permission modes still apply - as in the demo group members note) and it is assumed that these "demo" accounts have sufficient directory search (x) permissions to reach the ${WORK}/shared path; such details may need to be given when HCC staff setup the shared path if the user accounts are not members of the group involved at the path ${WORK} expands to.
+Changing the group permission on the file updated the "mask::rwx" extended ACL entry to "allow" the execute (x) permission that was previously missing. Note well, even though the group permissions in the ls listing show rwx for the group, actual GID group members would only have 'r-x' access as the allow mask property is what is actually listed.
+
+With the `setfacl` commands above, the listed "demo" accounts are given read \(r\) write (w) or execute (x) access to the file `file.txt` by the ACL (standard permission modes still apply - as in the demo group members note) and it is assumed that these "demo" accounts have sufficient directory search (x) permissions to reach the `${WORK}/shared` path; such details may need to be given when HCC staff sets up the shared path if the user accounts are not members of the group involved at the path ${WORK} expands to.
 
-To remove all extended ACL entries for file.txt:
+To remove all extended ACL entries for `file.txt`:
 ```
 setfacl -b file.txt
 ```
@@ -227,41 +245,37 @@ setfacl -b file.txt
 More examples on ACLs can be found [here](https://www.geeksforgeeks.org/access-control-listsacl-linux/) and the author of the Linux POSIX ACL implementation has an excellent document on the topic [here](https://www.usenix.org/legacy/publications/library/proceedings/usenix03/tech/freenix03/full_papers/gruenbacher/gruenbacher_html/main.html).
 
 **Pros:**
-
 - ACL provides safer and more flexible way to manage access to share data than standard Unix/POSIX permissions.
 
 **Cons:**
-
-- Using ACLs is advanced and requires focus to get the details correct.
+- Using ACLs is an advanced approach and requires a detailed understanding to get right.
 - Users need HCC accounts to access the shared directory.
 
 {{% notice info %}}
-Please note that when sharing a file, all the directories in the path to the file need to have execute (x) bits set in order for its contents to be accessible and read \(r\) bits to show up in listing queries, e.g. the `ls -l` command. For example, if you want to share the directory `/work/group/shared/something.txt`, read \(r\) and execute (x) permissions should be given to `/work`, `/work/group`, `/work/group/shared` and `/work/group/ahred/something.txt` to ensure both access to the files and the ability to list directory entries for the various path components.
+Please note that when sharing a file, all the directories in the path to the file need to have execute (x) bits set in order for its contents to be accessible and read \(r\) bits to show up in listing queries, e.g., the `ls -l` command. For example, if you want to share the directory `/work/group/username/shared/`, read \(r\) and execute (x) permissions should be given to `/work`, `/work/group`, `/work/group/username` and `/work/group/username/shared` to ensure both access to the files and the ability to list directory entries for the various path components.
 {{% /notice %}}
 
 Please note that using ACLs is not straight-forward and please consider this approach only when the other suggested approaches here do not apply to you. If you have any questions about using ACLs, please email hcc-support@unl.edu.
 
-
 ## Using Globus shared collections
 
 If you want to share data with a single user, a specific custom group of users (independent of your HCC group), or with external collaborators (without HCC accounts), Linux user/group permissions do not provide that flexibility.
 In this case, [Globus shared collections]({{< relref "/handling_data/data_transfer/globus_connect/file_sharing.md" >}}) offer much more flexibility and control.
 Once a shared Collection is created (e.g., `/work/group/username/shared`), you can perform "Add Permissions - Share With" multiple times, and each time you can select different subdirectory from the created Collection, set different permissions and share it with different collaborators.
+
 Some things to note:
 - you can share subdirectories on the same level under the Collection with different permissions for different users (e.g., `/work/group/username/shared/shared1` can be Read/Write and `/work/group/username/shared/shared2` can be Read Only);
 - if you set the permissions of the subdirectory to Read/Write, all the directories within this subdirectory will have Read/Write permissions and you can not overwrite that (e.g., if `/work/group/username/shared/shared1` is Read/Write then `/work/group/username/shared/shared1/test` will be Read/Write too);
 - if you set the permissions of the subdirectory to Read Only, you can set Read/Write permissions to the directories within this subdirectory (e.g., if `/work/group/username/shared/shared1` is Read Only then `/work/group/username/shared/shared1/test` can be set to Read/Write if needed).
 
 **Pros:**
-
 - Users don't need HCC accounts to access shared data via Globus.
 - Users don't need to be part of the same HCC group to share data via Globus.
-- Anyone with institutional and InCommon credentials can access Globus.
+- Anyone with institutional and InCommon credentials can log in to Globus.
 - Globus shared endpoints offer much more flexiblity and control - different data can be shared with different users.
 - The access for the shared data can be read-only or read-write.
-- All file access via Globus is proxied as the user that sets up the share so all files are owned and accessed as this user.
+- All file access via Globus is proxied as the user that sets up the share, so all files are owned and accessed as this user.
 
 **Cons:**
-
-- Data shared with Globus can not be accessed directly on the cluster and the data will need to be transferred to the cluster if it is used as part of SLURM job; unless the data being shared is from a cluster file system, in which case the prior mentioned Unix permissions and/or ACLs may be needed to grant the local cluster accounts the needed permissions - thus complicating the share.
-- Globus provides web-based App and a CLI tool for the transfer.
+- Data shared with Globus cannot be accessed directly on the cluster and the data will need to be transferred to the cluster if it is used as part of a SLURM job; unless the data being shared is from a cluster filesystem, in which case the prior mentioned Unix permissions and/or ACLs may be needed to grant the HCC accounts the needed permissions - thus complicating the share.
+- Globus provides a web-based app and a CLI tool for the transfer.
\ No newline at end of file
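
The notices in this change state that every directory on the way to a shared file needs the execute (x) bit. That rule can be checked mechanically. The following is only a minimal sketch, assuming GNU coreutils (`stat -c`, `realpath`) as found on typical Linux nodes; `missing_search_bits` is a hypothetical helper name and not an HCC-provided tool. It looks only at the mode string for other (o), which is the case that matters for a world-readable directory, and ignores ACL entries.

```shell
# Sketch: print each directory on the way to a path that lacks the
# execute/search (x) bit for "other" (o). Assumes GNU coreutils.
# missing_search_bits is a hypothetical helper, not part of any HCC tooling.
missing_search_bits() (
    # The function body runs in a subshell, so variables stay local
    # and "exit" only leaves the function.
    path=$(realpath -- "$1") || exit 2
    # For a regular file, start checking at its parent directory.
    [ -d "$path" ] || path=$(dirname -- "$path")
    status=0
    while :; do
        mode=$(stat -c '%A' -- "$path") || exit 2
        # The mode string looks like "drwxr-x--x"; the 10th character is
        # the "other" execute slot ("t" means sticky bit plus execute).
        case "$mode" in
            ?????????[xt]) ;;
            *) echo "$path"; status=1 ;;
        esac
        [ "$path" = "/" ] && break
        path=$(dirname -- "$path")
    done
    exit "$status"
)
```

For example, `missing_search_bits "${WORK}/public"` prints nothing and returns 0 when every path component is searchable by other users, and otherwise lists the directories that still need `chmod o+x`. For group-level sharing, the group (g) execute slot (the 7th character of the mode string) would be the one to inspect instead.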