Commit 094da08b authored by Natasha Pavlovikj

Merge branch 'update_data' into 'master'

Add info on public datasets

See merge request !448
parents 85a2b566 adc21e78
......@@ -11,6 +11,11 @@ HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files,
In order to use these resources, the "**biodata**" module needs to be loaded first.
For instructions on loading a module, see [Module Commands]({{< relref "/applications/modules/_index.md" >}}).
{{% notice info %}}
The *biodata* module is maintained and updated by the Bioinformatics Core Research Facility (BCRF).
Please email bcrf-support@unl.edu or hcc-support@unl.edu with any questions or issues with the module.
{{% /notice %}}
Loading the "**biodata**" module pre-sets many environment variables, though you will most likely need only a subset of them. To use an environment variable in a command or script, prefix its name with `$`.
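
One way to see which variables the module adds is to compare the exported variable names before and after loading it. The following is a minimal sketch, assuming a bash shell on a system where the `module` command is available; the helper functions `exported_names` and `added_names` are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the environment variables that loading a module adds.
# exported_names and added_names are illustrative helpers, not part of any HCC tool.

# Print the names of all exported variables, one per line, sorted.
exported_names() {
    compgen -e | sort
}

# Print the names found in file "$2" (after) but not in file "$1" (before).
added_names() {
    comm -13 "$1" "$2"
}

before=$(mktemp)
after=$(mktemp)

exported_names > "$before"
# The module command only exists on systems with a module environment (e.g., Swan).
if type module >/dev/null 2>&1; then
    module load biodata
else
    echo "module command not available here; run this on the cluster" >&2
fi
exported_names > "$after"

# Names that appeared after loading the module.
added_names "$before" "$after"
rm -f "$before" "$after"
```

Running `module show biodata` should also print the settings the module applies, which may be the quicker check in practice.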
The major environment variables are:
......
......@@ -26,3 +26,30 @@ email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl
### Using */scratch* storage space to improve running jobs:
[Using Scratch]({{<relref "using_scratch_space" >}})
### Storing public software-specific and research datasets
#### Software-specific datasets
Many software packages available on Swan (e.g., AlphaFold, HUMAnN) require datasets. Where possible, HCC has pre-downloaded the datasets and configured the modules to use the datasets. This avoids any [quota]({{<relref "data_storage">}}) and [purge policy]({{<relref "data_storage/#purge-policy" >}}) issues.
If you are not sure whether a dataset that the software requires is already available on Swan, **please check the module info, or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download it yourself**.
#### Research datasets
Many public datasets are commonly used for running jobs across various scientific fields. To avoid any [per-user or per-group quota issues]({{<relref "data_storage">}}), HCC can host these datasets in a system-wide location on Swan that is excluded from the purge policy, so that the entire HCC community can benefit from a shared copy.
HCC currently hosts a few public datasets on Swan that can be accessed via data modules:
- **biodata/1.0** - [Static data resources for bioinformatics/computational biology]({{<relref "biodata_module" >}})
- **mldata/1.0** - Static data resources for machine-learning/AI (e.g., ImageNet, TCGA, CAMELYON, TCIA)
- **mridata/1.0** - Static data resources for MRI/NeuroImaging (e.g., Penn Memory Center 3T ASHS 1.0 Atlas)
- **geodata/1.0** - Static data resources for geo data (e.g., NLDAS-2)
- **chemdata/1.0** - Static data resources for computational chemistry (e.g., Tetramers, Zinc)
If you are not sure whether a public dataset is already available on Swan, **please check the info of the available data modules (e.g., `module help mldata/1.0`), or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download the dataset yourself**.
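
As an illustration, the help text of every data module listed above can be printed in one pass. This is a minimal sketch, assuming a bash shell on Swan with the `module` command available; the array name `data_modules` is introduced only for this example, and the module versions are the ones named in the list.

```shell
#!/usr/bin/env bash
# Data modules named in the list above.
data_modules=(biodata/1.0 mldata/1.0 mridata/1.0 geodata/1.0 chemdata/1.0)

# The module command only exists on systems with a module environment (e.g., Swan).
if type module >/dev/null 2>&1; then
    for m in "${data_modules[@]}"; do
        echo "=== $m ==="
        module help "$m"
    done
else
    echo "module command not available here; run this on Swan" >&2
fi
```

If the dataset you need does not appear in any of the help texts, email hcc-support@unl.edu before downloading it.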
To request a version update of the system-wide available datasets, please email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.
{{% notice note %}}
If you have a licensed dataset you want to share with your research group, please email hcc-support@unl.edu for assistance.
{{% /notice %}}