diff --git a/content/applications/app_specific/bioinformatics_tools/biodata_module.md b/content/applications/app_specific/bioinformatics_tools/biodata_module.md
index 4aa14a1b023e7ea32b5cd71febd70eb2a403d273..df77da73e63fc329b68743340ca8606144aa40b5 100644
--- a/content/applications/app_specific/bioinformatics_tools/biodata_module.md
+++ b/content/applications/app_specific/bioinformatics_tools/biodata_module.md
@@ -11,6 +11,11 @@ HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files,
 In order to use these resources, the "**biodata**" module needs to be loaded first.
 For how to load module, please check [Module Commands]({{< relref "/applications/modules/_index.md" >}}).
 
+{{% notice info %}}
+The *biodata* module is maintained and updated by the Bioinformatics Core Research Facility (BCRF).
+Please email bcrf-support@unl.edu or hcc-support@unl.edu with any questions or issues regarding the module.
+{{% /notice %}}
+
 Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them.
 Environment variables can be used in your command or script by prefixing `$` to the name.
 The major environment variables are:
diff --git a/content/handling_data/_index.md b/content/handling_data/_index.md
index f8e00978f762015e67ec54583c403a2dd52b9714..5dee5c2ffdbfe64ce59d5892fbc85144665fa3c6 100644
--- a/content/handling_data/_index.md
+++ b/content/handling_data/_index.md
@@ -26,3 +26,30 @@ email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl
 
 ### Using */scratch* storage space to improve running jobs:
 [Using Scratch]({{<relref "using_scratch_space" >}})
+
+
+### Storing public software-specific and research datasets
+
+#### Software-specific datasets
+Many software packages available on Swan (e.g., AlphaFold, HUMAnN) require supporting datasets. Where possible, HCC has pre-downloaded these datasets and configured the corresponding modules to use them. Using these pre-downloaded copies avoids any [quota]({{<relref "data_storage">}}) and [purge policy]({{<relref "data_storage/#purge-policy" >}}) issues.
+
+If you are not sure whether a dataset required by the software is already available on Swan, **please check the module information or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download it yourself**.
+
+#### Research datasets
+Many public datasets are commonly used in jobs across various scientific fields. To avoid [per-user or per-group quota issues]({{<relref "data_storage">}}), HCC can host these datasets in a system-wide location on Swan that is excluded from the purge policy, so that the entire HCC community can benefit from a single shared copy.
+
+HCC currently hosts public datasets on Swan that can be accessed via the following data modules:
+- **biodata/1.0** - [Static data resources for bioinformatics/computational biology]({{<relref "biodata_module" >}})
+- **mldata/1.0** - Static data resources for machine learning/AI (e.g., ImageNet, TCGA, CAMELYON, TCIA)
+- **mridata/1.0** - Static data resources for MRI/neuroimaging (e.g., Penn Memory Center 3T ASHS 1.0 Atlas)
+- **geodata/1.0** - Static data resources for geoscience (e.g., NLDAS-2)
+- **chemdata/1.0** - Static data resources for computational chemistry (e.g., Tetramers, ZINC)
+
+If you are not sure whether a public dataset is already available on Swan, **please check the help text of the available data modules (e.g., `module help mldata/1.0`) or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download the dataset yourself**.
+
+To request a version update of the system-wide datasets, please email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.
+
+{{% notice note %}}
+If you have a licensed dataset you want to share with your research group, please email hcc-support@unl.edu for assistance.
+{{% /notice %}}
+