From b6879203561d1ab971d1a576a58f944d785194f7 Mon Sep 17 00:00:00 2001
From: Adam Caprez <acaprez2@unl.edu>
Date: Fri, 2 Sep 2022 19:21:50 -0500
Subject: [PATCH] Add large # of files section to good practices.

---
 content/good_hcc_practices/_index.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/content/good_hcc_practices/_index.md b/content/good_hcc_practices/_index.md
index ecc888fa..9ec3097a 100644
--- a/content/good_hcc_practices/_index.md
+++ b/content/good_hcc_practices/_index.md
@@ -36,6 +36,16 @@ all the necessary files need to be either moved to a permanent storage, or delet
 disk, in your program.** This approach stresses the file system and may cause general issues.
 Instead, consider reading and writing large blocks of data in memory over time, or utilizing more
 advanced parallel I/O libraries, such as *parallel hdf5* and *parallel netcdf*.
+#### Large numbers of files considerations
+ * **No POSIX file system performs well with an excessive number of files**, as each file operation
+requires opening and closing, which is relatively expensive.
+ * Moreover, network data transfer operations that involve frequent scanning (walking) of every
+file in a set for syncing operations (backups, automated copying) can become excessively taxing for
+network file systems, especially at scale.
+ * Large numbers of files can take an inordinate amount of time to transfer in or out of network
+file systems during data migration operations.
+ * **Computing workflows can be negatively impacted by unnecessarily large numbers of file operations**, including file transfers.
+
 ## Internal and External Networks
 
  * **Use archives to transfer large number of files.** If you are performing file transfer of
-- 
GitLab