using_anaconda_package_manager.md

  • title = "Using Anaconda Package Manager"
    description = "How to use the Anaconda Package Manager on HCC resources."
    weight=10

    Anaconda, from Anaconda, Inc., is a completely free, enterprise-ready distribution for large-scale data processing, predictive analytics, and scientific computing. It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis. It also lets individual users create isolated custom environments that mix and match different versions of Python and/or R and other packages. Anaconda includes the conda package and environment manager to make managing these environments straightforward.

    Using Anaconda

    While the standard methods of installing packages via pip and easy_install work with Anaconda, the preferred method is using the conda command.

    {{% notice info %}} Full documentation on using Conda is available at http://conda.pydata.org/docs/

    A cheatsheet is also provided. {{% /notice %}}

    A few examples of the basic commands are provided here. For a full explanation of all of Anaconda/Conda's capabilities, see the documentation linked above.

    Anaconda is provided through the anaconda module on HCC machines. To begin using it, load the Anaconda module.

    {{% panel theme="info" header="Load the Anaconda module to start using Conda" %}} {{< highlight bash >}} module load anaconda {{< /highlight >}} {{% /panel %}}

    To display general information about Conda/Anaconda, use the info subcommand.

    {{% panel theme="info" header="Display general information about Conda/Anaconda" %}} {{< highlight bash >}} conda info {{< /highlight >}} {{% /panel %}}

    Conda allows the easy creation of isolated, custom environments with packages and versions of your choosing. To show all currently available environments, and which is active, use the info subcommand with the -e option.

    {{% panel theme="info" header="List available environments" %}} {{< highlight bash >}} conda info -e {{< /highlight >}} {{% /panel %}}

    The active environment will be marked with an asterisk (*) character.

    The list command will show all packages installed in the currently active environment.

    {{% panel theme="info" header="List installed packages in current environment" %}} {{< highlight bash >}} conda list {{< /highlight >}} {{% /panel %}}

    Searching for Packages

    To find packages, use the search subcommand.

    {{% panel theme="info" header="Search for packages" %}} {{< highlight bash >}} conda search numpy {{< /highlight >}} {{% /panel %}}

    If the package is available, this will also display available package versions and compatible Python versions the package may be installed under.

    Creating Custom Anaconda Environments

    The create command is used to create a new environment. It requires at a minimum a name for the environment, and at least one package to install. For example, suppose we wish to create a new environment, and need version 1.17 of NumPy.

    {{% panel theme="info" header="Create a new environment by providing a name and package specification" %}} {{< highlight bash >}} conda create -n mynumpy numpy=1.17 {{< /highlight >}} {{% /panel %}}

    This will create a new environment called 'mynumpy' and install NumPy version 1.17, along with any required dependencies.

    To use the environment, we must first activate it.

    {{% panel theme="info" header="Activate environment" %}} {{< highlight bash >}} conda activate mynumpy {{< /highlight >}} {{% /panel %}}

    Our new environment is now active, and we can use it. The shell prompt will change to indicate this as well.
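Besides the changed prompt, activation can be confirmed from a script by checking the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated. The following is a minimal sketch only; the function name require_conda_env is hypothetical and not part of conda.

```shell
#!/bin/bash
# Hypothetical helper (not part of conda): succeed only if the expected
# environment is the active one.
require_conda_env() {
    local expected="$1"
    # conda sets CONDA_DEFAULT_ENV on activation; it is unset or empty
    # when no environment is active.
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        return 0
    fi
    echo "expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
}

# Example use after 'conda activate mynumpy':
#   require_conda_env mynumpy || exit 1
```

A check like this near the top of a submit script stops the job early instead of letting it run with the wrong Python.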

    Using /common for environments

    By default, conda environments are installed in the user's home directory at ~/.conda/envs. This is fine for smaller environments, but larger environments (especially ML/AI-based ones) can quickly exhaust the space in the home directory.

    For larger environments, we recommend using the $COMMON folder instead. To do so, use the -p option instead of -n for conda create. For example, the following creates the same environment as above but places it in the folder $COMMON/mynumpy.

    {{% panel theme="info" header="Create environment in /common" %}} {{< highlight bash >}} conda create -p $COMMON/mynumpy numpy=1.17 {{< /highlight >}} {{% /panel %}}

    To activate the environment, you must use the full path.

    {{% panel theme="info" header="Activate environment in /common" %}} {{< highlight bash >}} conda activate $COMMON/mynumpy {{< /highlight >}} {{% /panel %}}

    Please note that you'll need to add the #SBATCH --licenses=common directive to your submit scripts as described here in order to use environments in $COMMON.
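As an illustration of where the directive goes, the sketch below writes a small submit script for the $COMMON/mynumpy environment created above. The function name write_submit_script, the job name, and the time and memory values are placeholders chosen for this example, not HCC recommendations.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal Slurm submit script to the given path.
# The quoted 'EOF' keeps $COMMON unexpanded so it is resolved when the job runs.
write_submit_script() {
    local out="$1"
    cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH --job-name=mynumpy-test
#SBATCH --time=00:10:00
#SBATCH --mem=2G
#SBATCH --licenses=common

module load anaconda
conda activate "$COMMON/mynumpy"
python -c "import numpy; print(numpy.__version__)"
EOF
}

# Example: write_submit_script mynumpy.submit && sbatch mynumpy.submit
```

The --licenses=common line sits with the other #SBATCH directives, before any command that touches $COMMON.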

    Adding and Removing Packages from an Existing Environment

    To install additional packages in an environment, use the install subcommand. Suppose we want to install IPython in our 'mynumpy' environment. While the environment is active, use install with only the package name.

    {{% panel theme="info" header="Install a new package in the currently active environment" %}} {{< highlight bash >}} conda install ipython {{< /highlight >}} {{% /panel %}}

    If you aren't currently in the environment you wish to install the package in, add the -n option to specify the name.

    {{% panel theme="info" header="Install new packages in a specified environment" %}} {{< highlight bash >}} conda install -n mynumpy ipython {{< /highlight >}} {{% /panel %}}

    The remove subcommand, which uninstalls a package, works the same way.

    {{% panel theme="info" header="Remove package from currently active environment" %}} {{< highlight bash >}} conda remove ipython {{< /highlight >}} {{% /panel %}}

    {{% panel theme="info" header="Remove package from environment specified by name" %}} {{< highlight bash >}} conda remove -n mynumpy ipython {{< /highlight >}} {{% /panel %}}

    To exit an environment, we deactivate it.

    {{% panel theme="info" header="Exit current environment" %}} {{< highlight bash >}} conda deactivate {{< /highlight >}} {{% /panel %}}

    Finally, to completely remove an environment, add the --all option to remove.

    {{% panel theme="info" header="Completely remove an environment" %}} {{< highlight bash >}} conda remove -n mynumpy --all {{< /highlight >}} {{% /panel %}}

    Moving and Recreating Existing Environment

    Sometimes conda environments need to be moved (e.g., from $HOME to $COMMON in order to reduce used space in $HOME) or recreated (e.g., when shared with someone). This is done using an environment.yml file, as shown below. First, activate the environment you want to export.

    {{% panel theme="info" header="Activate the conda environment to export" %}} {{< highlight bash >}} conda activate mynumpy {{< /highlight >}} {{% /panel %}}

    Then export the active conda environment to file environment.yml.

    {{% panel theme="info" header="Export conda environment" %}} {{< highlight bash >}} conda env export > environment.yml {{< /highlight >}} {{% /panel %}}

    Next, deactivate the conda environment.

    {{% panel theme="info" header="Exit current environment" %}} {{< highlight bash >}} conda deactivate {{< /highlight >}} {{% /panel %}}

    The file environment.yml contains both pip and conda packages installed in the activated environment. This file can now be shared or used to recreate the conda environment elsewhere.
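A file written by conda env export also ends with a prefix: line recording where the environment used to live, which is typically a path under your home directory. That line is not needed to recreate the environment in a new location, so it can be dropped before sharing the file. A minimal sketch, assuming the file came from conda env export; the function name strip_env_prefix is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy an exported environment file, dropping the
# top-level "prefix:" line that records the old environment location.
strip_env_prefix() {
    local src="$1" dst="$2"
    # grep -v keeps every line that does NOT start with "prefix:".
    grep -v '^prefix:' "$src" > "$dst"
}

# Example: strip_env_prefix environment.yml environment-shared.yml
```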

    The exported environment can be recreated in $COMMON with:

    {{% panel theme="info" header="Recreate conda environment in $COMMON" %}} {{< highlight bash >}} conda env create -p $COMMON/mynumpy -f environment.yml {{< /highlight >}} {{% /panel %}}

    After the conda environment has been exported or recreated, if needed, the original conda environment can be removed.

    {{% panel theme="info" header="Remove conda environment" %}} {{< highlight bash >}} conda remove -n mynumpy --all {{< /highlight >}} {{% /panel %}}

    The migrated environment can then be activated with:

    {{% panel theme="info" header="Activate new environment" %}} {{< highlight bash >}} conda activate $COMMON/mynumpy {{< /highlight >}} {{% /panel %}}

    Please note that you'll need to add the #SBATCH --licenses=common directive to your submit scripts as described here in order to use environments in $COMMON.

    Remove Unused Anaconda Packages and Caches

    By default, conda environments are installed in the user's home directory at ~/.conda/envs. Conda caches and package tarballs are stored in ~/.conda/ as well. With large or numerous conda environments, the size of ~/.conda/ can easily reach the $HOME quota of 20 GB per user.

    In addition to Moving and Recreating Existing Environment, one can remove unused conda packages and caches.

    {{% panel theme="info" header="Remove unused conda packages and caches" %}} {{< highlight bash >}} conda clean --all {{< /highlight >}} {{% /panel %}}

    {{% notice info %}} Please note that this command only removes the index cache and unused cached packages and tarballs. It will not affect or break your existing conda environments. {{% /notice %}}
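To see which environments take up the most space before deciding what to clean or move, the per-environment sizes can be listed with du. This is a rough sketch; the function name conda_env_sizes is hypothetical, and the default path assumes environments live in ~/.conda/envs as described above.

```shell
#!/bin/bash
# Hypothetical helper: print the size in KiB of each environment directory
# under the given folder (default: ~/.conda/envs), smallest first.
conda_env_sizes() {
    local envs_dir="${1:-$HOME/.conda/envs}"
    # Nothing to report if the folder does not exist.
    [ -d "$envs_dir" ] || return 0
    # du -s: one total per directory; -k: KiB units so sort -n can order them.
    find "$envs_dir" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + | sort -n
}

# Example: conda_env_sizes
```

Running du -sh ~/.conda before and after conda clean --all shows how much space the cleanup recovered.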

    Creating Custom GPU Anaconda Environment

    We provide GPU versions of various frameworks, such as tensorflow, keras, and theano, via modules. However, sometimes you may need additional libraries or packages that are not available as part of these modules. In this case, you will need to create your own GPU Anaconda environment.

    To do this, you need to first clone one of our GPU modules to a new Anaconda environment, and then install the desired packages in this new environment.

    The reason for this is that the GPU modules we support are built using the specific CUDA drivers our GPU nodes have. If you create a custom GPU environment without cloning the module, your code will not utilize the GPUs correctly.

    For example, if you want to use tensorflow with additional packages, first do:

    {{% panel theme="info" header="Cloning GPU module to a new Anaconda environment" %}}
    {{< highlight bash >}}
    module load tensorflow-gpu/py311/2.15
    module load anaconda
    conda create -n tensorflow-gpu-2.15-custom --clone $CONDA_DEFAULT_ENV
    module purge
    {{< /highlight >}}
    {{% /panel %}}

    {{% notice note %}} While tensorflow-gpu/py311/2.15 is used here as an example module and version, please make sure you use the newest available version of the module you want to clone, or the version that is needed for your particular research needs. {{% /notice %}}

    This will create a new tensorflow-gpu-2.15-custom environment in your home directory that is a copy of the tensorflow-gpu module. Then, you can install the additional packages you need in this environment. Replace the placeholder in the command below with the names of those packages.

    {{% panel theme="info" header="Install new packages in the currently active environment" %}}
    {{< highlight bash >}}
    module load anaconda
    conda activate tensorflow-gpu-2.15-custom
    conda install <packages> --no-update-deps
    {{< /highlight >}}
    {{% /panel %}}

    {{% notice info %}} When installing packages in existing/cloned environment, please use --no-update-deps. This will ensure that already installed dependencies are not being updated or changed. {{% /notice %}}

    Next, whenever you want to use this custom GPU Anaconda environment, you need to add these two lines in your submit script:

    {{< highlight bash >}}
    module load anaconda
    conda activate tensorflow-gpu-2.15-custom
    {{< /highlight >}}

    {{% notice info %}} If you have a custom GPU Anaconda environment, please use only the two lines above and DO NOT load the module you cloned earlier. Using module load tensorflow-gpu/py311/2.15 and conda activate tensorflow-gpu-2.15-custom in the same script is wrong and may give you various errors and incorrect results. {{% /notice %}}
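A submit script can guard against this mistake by checking which modules are loaded before activating the cloned environment. The sketch below assumes the module system exports the colon-separated LOADEDMODULES variable, as Lmod and Environment Modules normally do; the function name module_is_loaded is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeed if a loaded module name starts with the
# given prefix.
module_is_loaded() {
    local prefix="$1"
    # LOADEDMODULES is a colon-separated list such as
    # "compiler/gcc/10:openmpi/4.1". Prepending ":" lets the pattern
    # match only at the start of an entry.
    case ":${LOADEDMODULES:-}" in
        *":$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use before 'conda activate tensorflow-gpu-2.15-custom':
#   if module_is_loaded tensorflow-gpu; then
#       echo "unload the tensorflow-gpu module first" >&2
#       exit 1
#   fi
```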

    Creating Custom MPI Anaconda Environment

    Some conda packages available on conda-forge and bioconda support MPI (via openmpi or mpich). However, just using the openmpi and mpich packages from conda-forge often does not work on HPC systems. More information about this can be found here.

    In order to be able to correctly use these MPI packages with the MPI libraries installed on our clusters, two steps need to be performed.

    First, at install time, besides the package, the "dummy" package openmpi=4.1.*=external_* or mpich=4.0.*=external_* needs to be installed for openmpi or mpich respectively. These "dummy" packages are empty, but allow the solver to create correct environments and use the system-wide modules when the environment is activated.

    Secondly, when activating the conda environment and using the package, the system-wide openmpi/4.1 or mpich/4.0 module needs to be loaded depending on the MPI library used. Currently only packages that were built using openmpi 4.1 and mpich 4.0 are supported on HCC clusters.

    For example, the steps for creating a conda environment with mpi4py that supports openmpi are:

    {{% panel theme="info" header="Creating Anaconda environment with openmpi" %}}
    {{< highlight bash >}}
    module purge
    module load anaconda
    conda create -n mpi4py-openmpi mpi4py "openmpi=4.1.*=external_*"
    {{< /highlight >}}
    {{% /panel %}}

    and the steps for using this environment are:

    {{% panel theme="info" header="Using Anaconda environment with openmpi" %}}
    {{< highlight bash >}}
    module purge
    module load compiler/gcc/10 openmpi/4.1 anaconda
    conda activate mpi4py-openmpi
    {{< /highlight >}}
    {{% /panel %}}

    The steps for creating a conda environment with mpi4py that supports mpich are:

    {{% panel theme="info" header="Creating Anaconda environment with mpich" %}}
    {{< highlight bash >}}
    module purge
    module load anaconda
    conda create -n mpi4py-mpich mpi4py "mpich=4.0.*=external_*"
    {{< /highlight >}}
    {{% /panel %}}

    and the steps for using this environment are:

    {{% panel theme="info" header="Using Anaconda environment with mpich" %}}
    {{< highlight bash >}}
    module purge
    module load compiler/gcc/10 mpich/4.0 anaconda
    conda activate mpi4py-mpich
    {{< /highlight >}}
    {{% /panel %}}
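Because the external "dummy" packages are empty, the MPI launcher found on PATH after activation should come from the system module rather than from the conda environment. One rough way to check this is to compare the launcher's path with CONDA_PREFIX, which conda sets on activation. This is a heuristic sketch only; the function name path_outside_prefix is hypothetical, and it assumes mpirun is on PATH.

```shell
#!/bin/bash
# Hypothetical check: succeed if the given executable path lies outside
# the given conda environment prefix.
path_outside_prefix() {
    local exe="$1" prefix="$2"
    case "$exe" in
        "$prefix"/*) return 1 ;;   # executable lives inside the environment
        *) return 0 ;;             # executable comes from somewhere else
    esac
}

# Example use after loading the modules and activating the environment:
#   path_outside_prefix "$(command -v mpirun)" "$CONDA_PREFIX" \
#       || echo "mpirun comes from the conda environment" >&2
```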

    Using an Anaconda Environment in a Jupyter Notebook

    It is not difficult to make an Anaconda environment available to a Jupyter Notebook. To do so, follow the steps below, replacing myenv with the name of the Python or R environment you wish to use:

    1. Stop any running Jupyter Notebooks and ensure you are logged out of the JupyterHub instance on the cluster you are using.

      1. If you are not logged out, please click the Control Panel button located in the top right corner.
      2. Click the "Stop My Server" Button to terminate the Jupyter server.
      3. Click the logout button in the top right corner.
    2. Using the command-line environment of the login node, load the target conda environment: {{< highlight bash >}}conda activate myenv{{< /highlight >}}

    3. Install the Jupyter kernel and add the environment:

      1. For a Python conda environment, install the IPykernel package, and then the kernel specification:

        {{< highlight bash >}}
        # Install ipykernel
        conda install ipykernel

        # Install the kernel specification
        python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)" --env PATH $PATH
        {{< /highlight >}}

        {{% notice note %}} If needed, other variables can be set via additional --env arguments, e.g., python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)" --env PATH $PATH --env VAR value, where VAR and value are the name and the value of the variable respectively. {{% /notice %}}

        {{% notice note %}} If the conda environment is located in $COMMON (e.g., `$COMMON/conda_env`), please use the name of the environment instead of $CONDA_DEFAULT_ENV, e.g., `python -m ipykernel install --user --name conda_env --display-name "Python (conda_env)" --env PATH $PATH`, where `conda_env` is replaced with the name of your conda environment. {{% /notice %}}

      2. For an R conda environment, install the jupyter_client and IRkernel packages, and then the kernel specification:

        {{< highlight bash >}}
        # Install PNG support for R, the R kernel for Jupyter, and the Jupyter client
        conda install r-png
        conda install r-irkernel jupyter_client

        # Install jupyter_client 5.2.3 from anaconda channel for bug workaround
        conda install -c anaconda jupyter_client

        # Install the kernel specification
        R -e "IRkernel::installspec(name = '$CONDA_DEFAULT_ENV', displayname = 'R ($CONDA_DEFAULT_ENV)', user = TRUE)"
        {{< /highlight >}}

    4. Once you have the environment set up, deactivate it: {{< highlight bash >}}conda deactivate{{< /highlight >}}

    5. Log in to JupyterHub and create a new notebook using the environment by selecting the correct entry in the New dropdown menu in the top right corner.
      {{< figure src="/images/24151931.png" height="400" class="img-border">}}
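If the new entry does not appear in the dropdown, it can help to check that the kernel specification was written. Kernels installed with --user are normally stored under ~/.local/share/jupyter/kernels on Linux, one folder per kernel. The sketch below lists them; the function name list_user_kernels is hypothetical, and the default path is an assumption that may not hold if the Jupyter data directory has been customized.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every folder under the kernels
# directory that contains a kernel.json file.
list_user_kernels() {
    local kernels_dir="${1:-$HOME/.local/share/jupyter/kernels}"
    local spec
    for spec in "$kernels_dir"/*/kernel.json; do
        [ -f "$spec" ] || continue      # skip when the glob matches nothing
        basename "$(dirname "$spec")"
    done
}

# Example: list_user_kernels
```

Where the jupyter command is available, jupyter kernelspec list reports the same information, and jupyter kernelspec remove followed by a kernel name deletes an entry that is no longer needed.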

    Using Mamba

    Mamba is an alternative to Conda that is in general faster and performs better at resolving dependencies in conda environments. Mamba is available as part of the anaconda modules on Swan.

    Mamba can be used by simply replacing conda with mamba in all conda commands provided here.

    {{% panel theme="info" header="Load the Anaconda module to start using Mamba" %}} {{< highlight bash >}} module load anaconda {{< /highlight >}} {{% /panel %}}

    To create a new environment called 'mynumpy' and install NumPy version 1.17, along with any required dependencies, the command is:

    {{% panel theme="info" header="Create a new environment by providing a name and package specification" %}} {{< highlight bash >}} mamba create -n mynumpy numpy=1.17 {{< /highlight >}} {{% /panel %}}
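For scripts that may run both where Mamba is available and where it is not, the choice of command can be made at run time. A minimal sketch; the function name pick_conda_frontend is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "mamba" if it is found on PATH, otherwise "conda".
pick_conda_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

# Example use after 'module load anaconda':
#   "$(pick_conda_frontend)" create -n mynumpy numpy=1.17
```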