Commit adc21e78 authored by Natasha Pavlovikj

Add info on public datasets

parent 85a2b566
1 merge request: !448 Add info on public datasets
@@ -11,6 +11,11 @@ HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files,
In order to use these resources, the "**biodata**" module needs to be loaded first.
For how to load module, please check [Module Commands]({{< relref "/applications/modules/_index.md" >}}).
{{% notice info %}}
The *biodata* module is maintained and updated by the Bioinformatics Core Research Facility (BCRF).
Please email bcrf-support@unl.edu or hcc-support@unl.edu with any questions or issues with the module.
{{% /notice %}}
Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them. Environment variables can be used in your command or script by prefixing `$` to the name.
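A minimal sketch of using one of these variables with the `$` prefix follows. The variable name `BLAST` is an assumption made for illustration, and `check_dir` is a hypothetical helper; substitute whichever variable the module sets on your system.

```shell
# Hypothetical helper: fail early with a clear message when the given
# path is empty or does not point to an existing directory.
check_dir() {
  if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "not a usable directory: '$1'" >&2
    return 1
  fi
}

# Only run the module commands where a module system is available.
if command -v module >/dev/null 2>&1; then
  module load biodata
  # BLAST is an assumed variable name; "$BLAST" expands to its value.
  check_dir "${BLAST:-}" && ls "$BLAST"
fi
```

Checking the variable before using it gives a readable error if the module was not loaded or the name is misspelled, rather than an `ls` of an empty path.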
The major environment variables are:
......
@@ -26,3 +26,30 @@ email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl
### Using */scratch* storage space to improve running jobs:
[Using Scratch]({{<relref "using_scratch_space" >}})
### Storing public software-specific and research datasets
#### Software-specific datasets
Many software packages available on Swan (e.g., AlphaFold, HUMAnN) require datasets. Where possible, HCC has pre-downloaded the datasets and configured the modules to use the datasets. This avoids any [quota]({{<relref "data_storage">}}) and [purge policy]({{<relref "data_storage/#purge-policy" >}}) issues.
If you are not sure if a dataset the software requires is already available on Swan, **please check the module info, or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download it yourself**.
#### Research datasets
Many public datasets are commonly used for running jobs across various scientific fields. To avoid any [per-user or per-group quota issues]({{<relref "data_storage">}}), HCC can host these datasets on a system-wide location on Swan excluded from the purge policy, such that the entire HCC community can benefit from using a shared copy.
HCC currently hosts a few public datasets on Swan that can be accessed via data modules:
- **biodata/1.0** - [Static data resources for bioinformatics/computational biology]({{<relref "biodata_module" >}})
- **mldata/1.0** - Static data resources for machine-learning/AI (e.g., ImageNet, TCGA, CAMELYON, TCIA)
- **mridata/1.0** - Static data resources for MRI/NeuroImaging (e.g., Penn Memory Center 3T ASHS 1.0 Atlas)
- **geodata/1.0** - Static data resources for geo data (e.g., NLDAS-2)
- **chemdata/1.0** - Static data resources for computational chemistry (e.g., Tetramers, Zinc)
If you are not sure if a public dataset is already available on Swan, **please check the info of the available data modules (e.g., module help mldata/1.0), or email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> before you attempt to download the dataset yourself**.
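The check described above can be sketched as follows. This is an illustration under assumptions, not HCC's documented procedure: `env_names` and `new_names` are hypothetical helpers, and comparing the environment before and after `module load` is only one way to see which variables a data module sets.

```shell
# List the names of all exported environment variables, one per line.
# awk's ENVIRON array avoids mis-parsing values that span several lines.
env_names() {
  awk 'BEGIN { for (name in ENVIRON) print name }' | sort
}

# Print every name in the second list that is missing from the first.
# Both arguments are whitespace-separated lists of variable names.
new_names() {
  _known=" $(printf '%s ' $1)"
  for _name in $2; do
    case "$_known" in
      *" $_name "*) ;;
      *) printf '%s\n' "$_name" ;;
    esac
  done
}

# Only run the module commands where a module system is available.
if command -v module >/dev/null 2>&1; then
  module help mldata/1.0        # describes what the data module provides
  before=$(env_names)
  module load mldata/1.0
  after=$(env_names)
  new_names "$before" "$after"  # variables added by loading the module
fi
```

The names printed at the end can then be used with the `$` prefix in a command or job script.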
To request a version update of the system-wide available datasets, please email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.
{{% notice note %}}
If you have a licensed dataset you want to share with your research group, please email hcc-support@unl.edu for assistance.
{{% /notice %}}