Help Documents

Linux Clusters

  • Basic information about Linux clusters
  • A Linux cluster is a connected array of Linux computers or nodes that work together and can be viewed and managed as a single system. Nodes are usually connected by fast local area networks, with each node running its own instance of Linux. Nodes may be physical or virtual machines, and they may be separated geographically. Each node includes storage capacity, processing power and I/O (input/output) bandwidth. Multiple redundant nodes of Linux servers may be connected as a cluster for high availability (HA) in a network or data center, where each node is capable of failure detection and recovery.

    IT organizations use Linux clusters to reduce downtime and deliver high availability of IT services and mission-critical workloads. The redundancy of cluster components eliminates single points of failure. Linux clusters may be connected nodes of servers, storage devices or virtualized containers. A server cluster is a group of linked servers that work together to improve system performance, load balancing and service availability. If a server fails, other servers in the cluster can take over the functions and workloads of the failed server.

    Compared to a single computer, a Linux cluster can provide faster processing speed, larger storage capacity, better data integrity, greater reliability and wider availability of resources. Clusters are usually dedicated to specific functions, such as load balancing, high availability, high performance, storage or large-scale processing. Compared to a mainframe computer, the amount of power and processing speed produced by a Linux cluster is more cost effective. The networked nodes in a cluster also create an efficient, distributed infrastructure that prevents bottlenecks, thus improving performance.

  • Information about High-Performance Computing
  • High Performance Computing (HPC) is the IT practice of aggregating computing power to deliver more performance than a typical computer can provide. Originally used to solve complex scientific and engineering problems, HPC is now used by businesses of all sizes for data-intensive tasks. Companies that provide automotive engineering, pharmaceutical design, oil and gas exploration, renewable energy research, entertainment and media, financial analytics, and consumer product manufacturing rely on HPC for scalable business computing.

    An HPC system is typically a cluster of computers or nodes, with each node containing one to four processors and each processor containing two to four cores. A common cluster size in many businesses is between 16 and 64 nodes, with 64 to 256 cores. Linux is the dominant operating system for HPC installations. HPC applications usually require fast network and storage performance, large amounts of memory and very high compute capabilities. An HPC cluster running in the cloud can scale to larger numbers of parallel tasks than most on-premises environments. A cloud-based HPC cluster allows users to focus on their applications and research output instead of IT maintenance. HPC cloud systems only charge for the services clients actually use, so businesses can optimize costs without paying for idle compute capacity.

    HPC infrastructures are complex in both design and operation, involving a large set of interdependent hardware and software elements that must be precisely configured and seamlessly integrated across a growing number of compute nodes.

  • Technical Terms
  • Cluster - many connected nodes that coordinate among themselves to handle massive amounts of data at high speed with parallel performance.
    Node - a single computer or server. It can be a head node/login node or a regular compute node.
    Processor - a single unit that executes a single chain of instructions.
    Head node - also called a login node; the node that the user logs in to.
    Slurm job - a scheduled process submitted by the user that is allocated, managed, and monitored by the Slurm manager.
    SSH key pair - a private key and a public key; a pair of cryptographic keys used to authenticate the user and secure communication between user and server.
    CLI - Command Line Interface; an application that lets you interact with the machine and enter commands to accomplish a task.
    X11 - a windowing system that provides graphical user interfaces on Unix machines. It is sometimes required to run GUI software/modules on HPC clusters/servers.
    Module - a software package installed on a cluster.

Command Prompt

  • Basics of Command Prompt
  • Command Prompt is a command line interpreter application available in most types of operating systems. It's used to access a cluster node and execute entered commands. Most of those commands automate tasks via scripts and batch files, perform advanced administrative functions, and solve certain kinds of issues.
    Here are some basic and helpful Linux commands:-

    Use the pwd command to find out the path of the current working directory (folder) you’re in. The command will return an absolute (full) path, which is basically a path of all the directories that starts with a forward slash (/). An example of an absolute path is /home/username.

    To navigate through the Linux files and directories, use the cd command. It requires either the full path or the name of the directory, depending on the current working directory that you’re in.

    Let’s say you’re in /home/username/Documents and you want to go to Photos, a subdirectory of Documents. To do so, simply type the following command: cd Photos.
    Another scenario is if you want to switch to a completely new directory, for example, /home/username/Movies. In this case, you have to type cd followed by the directory’s absolute path: cd /home/username/Movies.

    There are some shortcuts to help you navigate quickly:

    cd .. (with two dots) to move one directory up
    cd to go straight to the home folder
    cd - (with a space and a hyphen) to move to your previous directory
    On a side note, Linux’s shell is case sensitive. So, you have to type the directory’s name exactly as it is.
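    The navigation commands above can be tried safely in a scratch directory. This is a minimal sketch; the directory names mirror the examples in the text, and demo_root is a throwaway location created only for the demonstration.

```shell
# Work inside a throwaway directory so nothing in your real home is touched.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Documents/Photos" "$demo_root/Movies"

cd "$demo_root/Documents"      # absolute path
cd Photos                      # relative path: a subdirectory of Documents
pwd                            # prints the full path, ending in Documents/Photos

cd ..                          # one level up, back to Documents
cd "$demo_root/Movies"         # jump to a different directory
cd - > /dev/null               # return to the previous directory (Documents)
current_dir=$(pwd)
echo "$current_dir"
```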

     The ls command is used to view the contents of a directory. By default, this command will display the contents of your current working directory.
    If you want to see the content of other directories, type ls and then the directory’s path. For example, enter ls /home/username/Documents to view the content of Documents.

    There are variations you can use with the ls command:

    ls -R will list all the files in the sub-directories as well
    ls -a will show the hidden files
    ls -al will list the files and directories with detailed information like the permissions, size, owner, etc.

    cat (short for concatenate) is one of the most frequently used commands in Linux. It is used to list the contents of a file on the standard output (stdout). To run this command, type cat followed by the file’s name and its extension. For instance: cat file.txt.

    Here are other ways to use the cat command:

    cat > filename creates a new file
    cat filename1 filename2>filename3 joins two files (1 and 2) and stores the output of them in a new file (3)
    to convert a file to upper or lower case use, cat filename | tr a-z A-Z >output.txt
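    A short sketch of the cat variations listed above, run in a temporary directory. The file names are made up for the example, and the POSIX character classes are used with tr instead of a-z A-Z only to avoid locale-dependent ranges.

```shell
work_dir=$(mktemp -d)
printf 'first line\n'  > "$work_dir/part1.txt"
printf 'second line\n' > "$work_dir/part2.txt"

# Join two files into a third one.
cat "$work_dir/part1.txt" "$work_dir/part2.txt" > "$work_dir/joined.txt"

# Convert the joined text to upper case and store it in a new file.
cat "$work_dir/joined.txt" | tr '[:lower:]' '[:upper:]' > "$work_dir/upper.txt"

cat "$work_dir/upper.txt"
```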

     Use the cp command to copy files from the current directory to a different directory. For instance, the command cp scenery.jpg /home/username/Pictures would create a copy of scenery.jpg (from your current directory) into the Pictures directory.

    The primary use of the mv command is to move files, although it can also be used to rename files.

    The arguments in mv are similar to the cp command. You need to type mv, the file’s name, and the destination’s directory. For example: mv file.txt /home/username/Documents. To rename files, the Linux command is mv oldname.ext newname.ext

    Use mkdir command to make a new directory — if you type mkdir Music it will create a directory called Music.

    There are extra mkdir commands as well:

    To generate a new directory inside another directory, use this Linux basic command mkdir Music/Newfile
    Use the -p (parents) option to create any missing parent directories along the way. For example, mkdir -p Music/2020/Newfile will also create the new “2020” directory if it does not exist yet.

    If you need to delete a directory, use the rmdir command. However, rmdir only allows you to delete empty directories.

    The rm command is used to delete directories and the contents within them. If you only want to delete the directory — as an alternative to rmdir — use rm -r. Note: Be very careful with this command and double-check which directory you are in. This will delete everything and there is no undo.

    The touch command allows you to create a blank new file through the Linux command line. As an example, enter touch /home/username/Documents/Web.html to create an HTML file entitled Web under the Documents directory.
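    The file-management commands described above (mkdir, touch, cp, mv, rmdir, rm) can be combined as follows. This is only an illustrative walk-through in a temporary directory; the names sandbox, notes.txt and renamed.txt are invented for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir -p Music/2020/Newfile        # -p also creates the missing parent "2020"
touch notes.txt                    # create an empty file
cp notes.txt Music/                # copy into another directory
mv notes.txt renamed.txt           # rename in place
mv renamed.txt Music/2020/         # move to another directory

rmdir Music/2020/Newfile           # works because the directory is empty
rm -r Music                        # removes Music and everything inside it
ls -a                              # only . and .. remain
```

    Because rm -r cannot be undone, trying it first in a scratch directory like this one is a cheap way to check what a command will remove.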

    You can use the locate command to find a file, just like the search command in Windows. What’s more, using the -i argument along with this command will make it case-insensitive, so you can search for a file even if you don’t remember its exact name.

    To search for a file that contains two or more words, use an asterisk (*). For example, locate -i school*note command will search for any file that contains the word “school” and “note”, whether it is uppercase or lowercase.

    Similar to the locate command, using find also searches for files and directories. The difference is, you use the find command to locate files within a given directory.

    As an example, find /home/ -name notes.txt command will search for a file called notes.txt within the home directory and its subdirectories.

    Other variations when using the find are:

    To find files in the current directory use, find . -name notes.txt
    To look for directories use, find / -type d -name notes.txt

    Another basic Linux command that is undoubtedly helpful for everyday use is grep. It lets you search through all the text in a given file.

    To illustrate, grep blue notepad.txt will search for the word blue in the notepad file. Lines that contain the searched word will be displayed fully.
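    As a sketch of find and grep working together, the snippet below builds a small directory tree and searches it. The paths and file contents are invented for the example; the -n and -c options of grep (line numbers and match count) are standard but are not mentioned in the text above.

```shell
search_root=$(mktemp -d)
mkdir -p "$search_root/projects/notes"
printf 'the sky is blue\ngrass is green\nblue whale\n' > "$search_root/projects/notes.txt"

# Files named notes.txt anywhere under search_root.
found_files=$(find "$search_root" -type f -name notes.txt)

# Directories named notes.
found_dirs=$(find "$search_root" -type d -name notes)

# Lines containing "blue"; -n adds line numbers, -c only counts matching lines.
grep -n blue "$search_root/projects/notes.txt"
blue_count=$(grep -c blue "$search_root/projects/notes.txt")
```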

    Use df command to get a report on the system’s disk space usage, shown in percentage and KBs. If you want to see the report in megabytes, type df -m.

    If you want to check how much space a file or a directory takes, the du (Disk Usage) command is the answer. However, the disk usage summary will show disk block numbers instead of the usual size format. If you want to see it in bytes, kilobytes, and megabytes, add the -h argument to the command line.

    The head command is used to view the first lines of any text file. By default, it will show the first ten lines, but you can change this number to your liking. For example, if you only want to show the first five lines, type head -n 5 filename.ext.

    The tail command has a similar function to the head command, but instead of showing the first lines, it displays the last ten lines of a text file by default. For example, tail -n 5 filename.ext shows the last five lines.
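    A quick way to see head and tail side by side is to run them on a file with numbered lines. The sketch below assumes the seq utility is available (it is on most Linux systems) and uses a temporary file.

```shell
log_file=$(mktemp)
seq 1 20 > "$log_file"                 # a 20-line file containing 1..20

first_three=$(head -n 3 "$log_file")   # lines 1, 2, 3
last_two=$(tail -n 2 "$log_file")      # lines 19, 20

echo "$first_three"
echo "$last_two"
```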

    Short for difference, the diff command compares the contents of two files line by line. After analyzing the files, it will output the lines that do not match. Programmers often use this command when they need to make program alterations instead of rewriting the entire source code.

    The simplest form of this command is diff file1.ext file2.ext
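    For illustration, here is diff applied to two small files that differ in one line. The file names are made up; the exit status convention (0 for identical files, 1 for differences) is standard diff behaviour.

```shell
diff_dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$diff_dir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$diff_dir/new.txt"

# diff exits with 0 when the files match and 1 when they differ.
diff_output=$(diff "$diff_dir/old.txt" "$diff_dir/new.txt")
diff_status=$?

# Lines starting with "<" come from the first file, ">" from the second.
echo "$diff_output"
```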

    The tar command is the most used command to archive multiple files into a tarball — a common Linux file format that is similar to zip format, with compression being optional.

    This command is quite complex with a long list of functions such as adding new files into an existing archive, listing the content of an archive, extracting the content from an archive, and many more. Check out some practical examples to know more about other functions.

    chmod is another Linux command, used to change the read, write, and execute permissions of files and directories. As this command is rather complicated, you can read the full tutorial in order to execute it properly.
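    As a small taste of chmod, the sketch below uses both a numeric mode (600: read and write for the owner only) and a symbolic mode (u+x: add execute permission for the owner). The file is a temporary one created for the example, and the permission string is read from the first ten characters of ls -l.

```shell
script_path=$(mktemp)
printf '#!/bin/sh\necho hello\n' > "$script_path"

chmod 600 "$script_path"                             # owner: read + write, nothing else
perm_before=$(ls -l "$script_path" | cut -c1-10)     # -rw-------

chmod u+x "$script_path"                             # add execute for the owner
perm_after=$(ls -l "$script_path" | cut -c1-10)      # -rwx------

echo "$perm_before -> $perm_after"
```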

    In Linux, all files are owned by a specific user. The chown command enables you to change or transfer the ownership of a file to the specified username. For instance, chown linuxuser2 file.ext will make linuxuser2 the owner of file.ext.

    jobs command will display all current jobs along with their statuses. A job is basically a process that is started by the shell.

    If you have an unresponsive program, you can terminate it manually by using the kill command. It will send a certain signal to the misbehaving app and instruct the app to terminate itself.

    There is a total of sixty-four signals that you can use, but people usually only use two signals:

    SIGTERM (15) — requests a program to stop running and gives it some time to save all of its progress. If you don’t specify the signal when entering the kill command, this signal will be used.
    SIGKILL (9) — forces programs to stop immediately. Unsaved progress will be lost.
    Besides knowing the signals, you also need to know the process identification number (PID) of the program you want to kill. If you don’t know the PID, simply run the command ps ux.

    After knowing what signal you want to use and the PID of the program, enter the following syntax:

    kill [signal option] PID.
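    The kill syntax above can be tried on a harmless background process. In this sketch a sleep command stands in for the unresponsive program; $! is the shell variable holding the PID of the most recently started background job.

```shell
sleep 300 &                 # start a long-running background process
target_pid=$!               # PID of that background process

kill -15 "$target_pid"      # SIGTERM; equivalent to plain: kill PID
wait "$target_pid"          # collect the exit status of the terminated process
exit_code=$?

# A process ended by a signal reports 128 + signal number, here 128 + 15.
echo "sleep exited with status $exit_code"
```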

    Use the ping command to check your connectivity status to a server. For example, entering ping followed by Google’s address checks whether you’re able to connect to Google and also measures the response time.

    The Linux command line is super useful — you can even download files from the internet with the help of the wget command. To do so, simply type wget followed by the download link.

    The uname command, short for Unix Name, will print detailed information about your Linux system like the machine name, operating system, kernel, and so on.

    As a terminal equivalent to Task Manager in Windows, the top command will display a list of running processes and how much CPU each process uses. It’s very useful to monitor system resource usage, especially knowing which process needs to be terminated because it consumes too many resources.

    When you’ve been using Linux for a certain period of time, you’ll quickly notice that you can run hundreds of commands every day. As such, running history command is particularly useful if you want to review the commands you’ve entered before.

    Confused about the function of certain Linux commands? Don’t worry, you can easily learn how to use them right from Linux’s shell by using the man command. For instance, entering man tail will show the manual instruction of the tail command.

    The echo command is used to write some text into a file. For example, if you want to add the text, “Hello, my name is John” into a file called name.txt, you would type echo Hello, my name is John >> name.txt

    zip, unzip 
    Use the zip command to compress your files into a zip archive, and use the unzip command to extract the zipped files from a zip archive.

    If you want to know the name of your host/network simply type hostname. Adding a -I to the end will display the IP address of your network.

    useradd, userdel 
    Since Linux is a multi-user system, more than one person can interact with the same system at the same time. useradd is used to create a new user, while passwd sets the password for that user’s account. To add a new person named John, type useradd John; then, to set his password, type passwd John and enter the new password when prompted.

    To remove a user is very similar to adding a new user. To delete the users account type, userdel UserName

    Bonus Tips and Tricks
    Use the clear command to clean out the terminal if it is getting cluttered with too many past commands.

    Try the TAB button to autofill what you are typing. For example, if you need to type Documents, begin to type a command (let’s go with cd Docu, then hit the TAB key) and the terminal will fill in the rest, showing you cd Documents.

    Ctrl+C and Ctrl+Z are used to stop any command that is currently working. Ctrl+C will stop and terminate the command, while Ctrl+Z will simply pause the command.

    If you accidentally freeze your terminal by using Ctrl+S, simply undo this with Ctrl+Q, which unfreezes it.

    Ctrl+A moves you to the beginning of the line while Ctrl+E moves you to the end.

    You can run multiple commands in one single command by using the “;” to separate them. For example Command1; Command2; Command3. Or use && if you only want the next command to run when the first one is successful.
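    The difference between ; and && is easiest to see with a command that fails. In this sketch, false is a standard command that always fails and true one that always succeeds; the variable names are only for the demonstration.

```shell
# ";" runs every command regardless of whether the previous one succeeded.
semicolon_result=$(false; echo "ran after failure")

# "&&" runs the next command only if the previous one succeeded.
and_result=$(false && echo "ran after failure")
and_ok_result=$(true && echo "ran after success")

echo "semicolon:        '$semicolon_result'"
echo "&& after failure: '$and_result'"
echo "&& after success: '$and_ok_result'"
```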

  • How do I extract tar or tar.gz file and directory in Linux?
  • Tar archives bundle files and directories into a single file, and tar.gz archives are additionally compressed with gzip. To extract (untar) a tar file in Linux, type one of the following commands in the command prompt:-

    tar xzf file.tar.gz - to extract a gzip-compressed tar file (.tgz or .tar.gz)
    tar xjf file.tar.bz2 - to extract a bzip2-compressed tar file (.tbz or .tar.bz2)
    tar xf file.tar - to extract an uncompressed tar file (.tar)
    tar xC /var/tmp -f file.tar - to extract a tar file (.tar) into another directory

    Here are the explanations for the flags that have been used:-

    x = extract; c = create a new archive
    v = verbose (optional): the files with relative locations will be displayed.
    z = gzip-ped; j = bzip2-zipped
    f = from/to file ... (what is next after the f is the archive file)
    C = directory. In c and r mode, this changes the directory before adding the following files. In x mode, it changes the directory after opening the archive but before extracting entries from the archive.

    The files will be extracted in the current folder (most of the times in a folder with the name 'file-1.0').
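    To see the flags in action end to end, the following sketch creates a small gzip-compressed archive, lists it, and extracts it into a separate directory. All paths are temporary and invented for the example; the t flag (list contents) is a standard tar option that is not covered in the list above.

```shell
tar_dir=$(mktemp -d)
mkdir -p "$tar_dir/project" "$tar_dir/unpacked"
echo "sample data" > "$tar_dir/project/data.txt"

# c = create, z = gzip, f = archive file; -C changes directory before adding files.
tar czf "$tar_dir/project.tar.gz" -C "$tar_dir" project

# t = list the archive contents without extracting anything.
tar tzf "$tar_dir/project.tar.gz"

# x = extract; -C chooses the destination directory.
tar xzf "$tar_dir/project.tar.gz" -C "$tar_dir/unpacked"
extracted_text=$(cat "$tar_dir/unpacked/project/data.txt")
echo "$extracted_text"
```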
  • Text Editors in Linux
  • There are different text editors available in Linux to leverage. The useful and popular ones are vi, nano, emacs 

    You can create, edit and manipulate files through these editors:

    Vi is the default text editor in Linux. The UNIX vi editor is a full screen editor and has two modes of operation:
    1 - Command mode, in which typed characters are commands that cause action to be taken on the file, and
    2 - Insert mode, in which entered text is inserted into the file.

    In the command mode, every character typed is a command that does something to the text file being edited; a character typed in the command mode may even cause the vi editor to enter the insert mode. In the insert mode, every character typed is added to the text in the file; pressing the <Esc> (Escape) key turns off the Insert mode.
    While there are a number of vi commands, just a handful of these is usually sufficient for beginning vi users. To assist such users, this Web page contains a sampling of basic vi commands. The most basic and useful commands are marked with an asterisk (* or star) in the tables below. With practice, these commands should become automatic.
    NOTE: Both UNIX and vi are case-sensitive. Be sure not to use a capital letter in place of a lowercase letter; the results will not be what you expect.

    Nano - GNU nano is a small and friendly text editor. Besides basic text editing, nano offers many extra features like an interactive search and replace, go to line and column number, auto-indentation, feature toggles, internationalization support, and filename tab completion. 

    Emacs - Emacs is one of the oldest and most versatile text editors. The GNU Emacs version was originally written in 1984 and is well known for its powerful and rich editing features. To run it, type in:
    $ emacs
    Emacs has a GUI, which makes it easy to open and edit files.
    Cat - can be used to display the content of a file, copy content from one file to another, concatenate the contents of multiple files, display the line number, display $ at the end of the line, etc.
    The cat command can be used for piping a file to any program that expects binary data or plain text on the input stream. The cat command doesn't damage non-text bytes when outputting and concatenating. As such, the two primary use cases of this command are certain format-compatible binary file types and text files.

    cat > [fileName]    To create a file.
    cat [oldfile] > [newfile]    To copy content from older to new file.
    cat [file1 file2 and so on] > [new file name]    To concatenate contents of multiple files into one.
    cat -n/cat -b [fileName]    To display line numbers.
    cat -e [fileName]    To display $ character at the end of each line.
    cat <<EOF > [fileName]    To write the lines typed up to the EOF end marker into a file.

    More - the more command is used to view text files in the command prompt, displaying one screen at a time in case the file is large (for example, log files). The more command also allows the user to scroll up and down through the page.

    more [-options] [-num] [+/pattern] [+linenum] [file_name]
    -d : Use this command in order to help the user to navigate. It displays “[Press space to continue, ‘q’ to quit.]” and displays “[Press ‘h’ for instructions.]” when wrong key is pressed.

    -f : This option does not wrap the long lines and displays them as such.
    -p : This option clears the screen and then displays the text.
    -c : This command is used to display the pages on the same area by overlapping the previously displayed text.
    -s : This option squeezes multiple blank lines into one single blank line.
    -u : This option omits the underlines. 
  • How do I find available software and modules on HPC clusters?
  • Most HPC clusters have many software packages available for a wide range of needs. Most installed packages are available as environment modules. You can find out about installed software/modules using the following command:-
    module avail 
    This command will list all the installed software and modules in your cluster.

    module load <module/version>        to load a module
    module unload <module/version>    when done.

    Generally, use as few modules as possible at a time–once you're done using a particular piece of software, unload the module before you load another one, to avoid incompatibilities.

    If you cannot find a piece of software on the cluster, you can request an installation for cluster-wide use. Read through the Software Installation Policy page, and send your software request through this form: 

Slurm Headers

  • New to Slurm? Here are useful examples
  • Slurm is a job scheduler and batch manager.
    Here are the main Slurm commands:
    sbatch - submit a job script.
    srun - run a command on allocated compute node(s).
    scancel - cancel a pending or running job.
    squeue - show state of jobs.
    sinfo - show state of nodes and partitions (queues).
    smap - show jobs, partitions and nodes in a graphical network topology.
    scontrol - modify jobs or show information about various aspects of the cluster
    Please read the following articles for some good examples of running a script and working with its output.
  • Here is an example of a script to run in Slurm
  • You can use any editor to write this script and run it using sbatch or srun commands (explained at the bottom of this article).
    # Your Sbatch job should start with the following line of code. Please note the -l flag!

    #!/bin/bash -l

    # Name of the job - You'll probably want to customize this.
    #SBATCH -J bench

    # Tasks per node based on --cpus-per-task
    #SBATCH --ntasks-per-node=1

    # Processors per task needed for use case (example):
    #SBATCH --cpus-per-task=5

    # Time to run the job, remember to pick the correct time to run your job and only admins can extend the job time limit if needed:

    #SBATCH --time=60:00:00

    #If you want the system to send email if the job end or fails

    #SBATCH --mail-user=

    #SBATCH --mail-type=END

    #SBATCH --mail-type=FAIL

    #If you want to exclude specific nodes to not pick the job:

    #SBATCH --exclude=node01,node02,node03,node04

    # Standard out and Standard Error output files with the job number in the name.
    #SBATCH -o bench-%j.output
    #SBATCH -e bench-%j.output

    # no -n here, the user is expected to provide that on the command line.

    # The useful part of your job goes below

    # run one thread for each one the user asks the queue for
    # hostname is just for debugging
    module load benchmarks

    # You can run your job using the commands sbatch or srun.
    # sbatch - runs the job in the background in non-blocking mode. For example, we wrap the above lines of code into a script called "bench" and run it like:
    sbatch -N4 bench

    # srun - runs the job in interactive, blocking mode. For example:

    srun --partition=med --time=60:00:00 --mem=10G --nodes=4 --pty /bin/bash -il

  • Here is another example of script run in Slurm:-
  • Open an editor and enter this code in the file:
    Note - The # sign is required on the start of Slurm options:

    #!/bin/bash -l

    #SBATCH -J Test-Job
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=5
    #SBATCH --time=02:00:00
    #SBATCH --mem=32GB
    #SBATCH --partition=med
    #SBATCH --mail-user=
    #SBATCH --mail-type=END
    #SBATCH --mail-type=FAIL
    #SBATCH -o Test-Job-%j.output
    #SBATCH -e Test-Job-%j.err

    module load <module/version>
    <code to process>

    After saving and closing that file, run it using srun or sbatch like this

    srun --partition=med --mem=32G --nodes=4 --time=02:00:00

    OR like this:


  • How do I request GPU resources in Slurm?
  • If your default Slurm association/account does not have GPU partition access and you want to request a GPU from another account, you can request it using these flags:

    #SBATCH --account=Alternate_Account_Name
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1

    Or when you submit your job using srun, add these flags:

    srun -A Alternate_Account_Name --gres=gpu:1 -t 01:00:00 --mem=20GB
  • Slurm Script Quick Reference
  • prefixed with #SBATCH

    :    A comment
    --job-name=myjob    Job Name
    --output=myjob.out    Output sent to this file
    --output=myjob.%j.%N.out    Output file named with job number and the node the job landed on
    --error=myjob.err    Errors written to this file
    --partition=med    Run in the med partition (known as a queue in SGE)
    --nodes=4    Request four nodes
    --ntasks-per-node=8    Request eight tasks per node. The number of tasks may not exceed the number of processor cores on the node
    --ntasks=10    Request 10 tasks for your job
    --time=2-12:00:00    The maximum amount of time SLURM will allow your job to run before it is killed. (2 days and 12 hours in the example)
    --mail-type=type    Set type to: BEGIN to notify you when your job starts, END for when it ends, FAIL for if it fails, or ALL for all of the above
    --mem-per-cpu=MB    Specify a memory limit for each process of your job
    --mem=MB    Specify a memory limit for each node of your job
    --exclusive    Specify that you need exclusive access to nodes for your job
    --share    Specify that your job may share nodes with other jobs
    --begin=2013-09-21T01:30:00    Start the job after this time
    --begin=now+1hour    Use a relative time to start the job
    --dependency=afterany:100:101    Wait for jobs 100 and 101 to complete before starting
    --dependency=afterok:100:101    Wait for jobs 100 and 101 to finish without error
    Show all available options

    $ sbatch --help
    Another useful command

     $ sbatch --usage

  • SBATCH job with parallel programs running:
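  • #SBATCH -p bigmem
    #SBATCH -n 16

    srun="srun -N1 -n1 -c2"    ## where -c{INT} is the number of threads you're providing the program
    for sample in $samp_ids
    do
        ## $prefix, $REF, $FW_READS, $OUT_RESULTS and $OUT_STATS are assumed to be set for each sample;
        ## echoerr is assumed to be a user-defined helper that prints to standard error
        ## NOTE parallelizing with SRUN requires the & at the end of the command in order for WAIT to work
        bwaalncmd1="$srun time bwa aln -t 2 -I $REF $FW_READS > $OUT_RESULTS/$prefix.1.sai 2> $OUT_STATS/$prefix.1.log &"
        echo "[]: running $bwaalncmd1"
        eval $bwaalncmd1
        echoerr $bwaalncmd1
    done
    wait    ## block until all background srun commands have finished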

    This script would run the same 'bwa' command on multiple samples in parallel. Then, wait for these srun commands to finish before continuing to the next step in the pipeline.
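    The background-and-wait pattern itself does not depend on Slurm. The sketch below shows the same idea with plain shell background jobs, using a one-second sleep as a stand-in for the real per-sample command; the sample IDs and results_dir are invented for the example.

```shell
results_dir=$(mktemp -d)
samp_ids="s1 s2 s3"

for sample in $samp_ids; do
    # The trailing & sends each task to the background so the loop does not block.
    ( sleep 1; echo "processed $sample" > "$results_dir/$sample.log" ) &
done

wait    # block until every background task started above has finished
finished_count=$(ls "$results_dir" | wc -l | tr -d ' ')
echo "finished tasks: $finished_count"
```

    Without the wait, the script would reach its end, and the job would finish, while the background tasks were still running.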

  • Job Dependencies
  • You may want to run a set of jobs sequentially, so that the second job runs only after the first one has completed. This can be accomplished using Slurm's job dependencies options. For example, if you have two jobs, Job1.bat and Job2.bat, you can utilize job dependencies as in the example below.

    [user@biowulf]$ sbatch Job1.bat

    [user@biowulf]$ sbatch --dependency=afterany:123213 Job2.bat
    The flag --dependency=afterany:123213 tells the batch system to start the second job only after completion of the first job. afterany indicates that Job2 will run regardless of the exit status of Job1, i.e. regardless of whether the batch system thinks Job1 completed successfully or unsuccessfully.

    Once job 123213 completes, job 123214 will be released by the batch system and then will run as the appropriate nodes become available. Exit status: The exit status of a job is the exit status of the last command that was run in the batch script. An exit status of '0' means that the batch system thinks the job completed successfully. It does not necessarily mean that all commands in the batch script completed successfully.
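    The point about exit status can be checked locally without Slurm. In this sketch, bash -c stands in for a batch script; set -e is a standard shell option (not mentioned above) that makes a script stop at the first failing command, which is one way to keep an early failure from being hidden by a successful last command.

```shell
# The last command succeeds, so the script exits with 0
# even though an earlier command failed.
bash -c 'false; echo "last step ok"' > /dev/null
status_masked=$?

# With "set -e" the script stops at the first failing command instead.
bash -c 'set -e; false; echo "last step ok"' > /dev/null
status_strict=$?

echo "without set -e: $status_masked, with set -e: $status_strict"
```

    This matters for afterok dependencies, which only look at the final exit status of the job.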

    There are several options for the '--dependency' flag that depend on the status of Job1. e.g.

    --dependency=afterany:Job1    Job2 will start after Job1 completes with any exit status
    --dependency=after:Job1    Job2 will start any time after Job1 starts
    --dependency=afterok:Job1    Job2 will run only if Job1 completed with an exit status of 0
    --dependency=afternotok:Job1    Job2 will run only if Job1 completed with a non-zero exit status
    Making several jobs depend on the completion of a single job is trivial. This is accomplished in the example below:

    [user@biowulf]$ sbatch Job1.bat

    [user@biowulf]$ sbatch --dependency=afterany:13205 Job2.bat

    [user@biowulf]$ sbatch --dependency=afterany:13205 Job3.bat

    [user@biowulf]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
    JOBID        NAME            ST   DEPENDENCY                    
    13205        Job1.bat        R                                  
    13206        Job2.bat        PD   afterany:13205                
    13207        Job3.bat        PD   afterany:13205                
    Making a job depend on the completion of several other jobs: example below.

    [user@biowulf]$ sbatch Job1.bat

    [user@biowulf]$ sbatch Job2.bat

    [user@biowulf]$ sbatch --dependency=afterany:13201,13202 Job3.bat

    [user@biowulf]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
    JOBID        NAME            ST   DEPENDENCY                    
    13201        Job1.bat        R                                  
    13202        Job2.bat        R                                  
    13203        Job3.bat        PD   afterany:13201,afterany:13202 
    Chaining jobs is most easily done by submitting the second dependent job from within the first job. Example batch script:


    cd /data/mydir
    sbatch --dependency=afterany:$SLURM_JOB_ID  my_second_job

  • Building Pipelines using Slurm dependencies
  • Job dependencies are used to defer the start of a job until the specified dependencies have been satisfied. They are specified with the --dependency option to sbatch or swarm in the format

    sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...
    Dependency types:

    after:jobid[:jobid...]    job can begin after the specified jobs have started
    afterany:jobid[:jobid...]    job can begin after the specified jobs have terminated
    afternotok:jobid[:jobid...]    job can begin after the specified jobs have failed
    afterok:jobid[:jobid...]    job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).
    singleton    jobs can begin execution after all previously launched jobs with the same name and user have ended. This is useful to collate results of a swarm or to send a notification at the end of a swarm.
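    As a minimal sketch of the format above, the Python helper below assembles a --dependency value from (type, job IDs) pairs. The name build_dependency and its argument layout are assumptions made for illustration; they are not part of Slurm or swarm.

```python
def build_dependency(*specs):
    """Build the value for the --dependency option.

    Each spec is either the string "singleton" or a (type, job_ids) pair,
    for example ("afterok", [101, 102]). Hypothetical helper.
    """
    allowed = {"after", "afterany", "afternotok", "afterok"}
    parts = []
    for spec in specs:
        if spec == "singleton":
            parts.append("singleton")
            continue
        dep_type, job_ids = spec
        if dep_type not in allowed:
            raise ValueError("unknown dependency type: %s" % dep_type)
        if not job_ids:
            raise ValueError("%s needs at least one job ID" % dep_type)
        # Job IDs of one type are joined with ':'
        parts.append(":".join([dep_type] + [str(j) for j in job_ids]))
    # Different type groups are joined with ','
    return ",".join(parts)


if __name__ == "__main__":
    # The job IDs here are made up for the example.
    value = build_dependency(("afterok", [101, 102]), ("afterany", [103]))
    print("--dependency=" + value)
```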


    To set up pipelines using job dependencies the most useful types are afterany, afterok and singleton. The simplest way is to use the afterok dependency for single consecutive jobs. For example:

    b2$ sbatch job1.sh
    11254323
    b2$ sbatch --dependency=afterok:11254323 job2.sh
    Here job1.sh and job2.sh are placeholder script names. When job1 ends with an exit code of zero, job2 becomes eligible for scheduling. However, if job1 fails (ends with a non-zero exit code), job2 will not be scheduled; it remains in the queue and must be canceled manually.

    As an alternative, the afterany dependency can be used and checking for successful execution of the prerequisites can be done in the jobscript itself.
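    One way to do such a check is to ask the accounting database for the state of each prerequisite at the start of the dependent job. The sketch below assumes that sacct is available and accounting is enabled on the cluster, and that the prerequisites are plain (non-array) jobs; the helper names are hypothetical, and how the job script learns the prerequisite IDs (for example, as command-line arguments) is left open.

```python
import subprocess


def states_from_sacct(sacct_text):
    """Parse 'JobID|State' lines into a {job_id: state} dict."""
    states = {}
    for line in sacct_text.splitlines():
        if "|" not in line:
            continue
        job_id, state = line.split("|", 1)
        words = state.split()
        # A cancelled job is reported as e.g. "CANCELLED by 1234"; keep the first word.
        states[job_id.strip()] = words[0] if words else ""
    return states


def prerequisites_ok(job_ids):
    """Return True only if sacct reports every listed job as COMPLETED."""
    cmd = ["sacct", "-X", "-n", "-P", "-o", "JobID,State",
           "-j", ",".join(str(j) for j in job_ids)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    states = states_from_sacct(out)
    return all(states.get(str(j)) == "COMPLETED" for j in job_ids)
```

    A job submitted with afterany could call prerequisites_ok first and exit early with a non-zero status when it returns False.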

    The sections below give more complicated examples of using job dependencies for pipelines in bash, perl, and python.

    The following bash script is a stylized example of some useful patterns for using job dependencies:

    #!/bin/bash
    # NOTE: script and swarm file names (job1.sh ... job9.sh, swarm.cmd) are
    # placeholders. The script assumes sbatch and swarm print only the job ID.

    # first job - no dependencies
    jid1=$(sbatch --mem=12g --cpus-per-task=4 job1.sh)

    # multiple jobs can depend on a single job
    jid2=$(sbatch --dependency=afterany:$jid1 --mem=20g job2.sh)
    jid3=$(sbatch --dependency=afterany:$jid1 --mem=20g job3.sh)

    # a single job can depend on multiple jobs
    jid4=$(sbatch --dependency=afterany:$jid2:$jid3 job4.sh)

    # swarm can use dependencies
    jid5=$(swarm --dependency=afterany:$jid4 -t 4 -g 4 -f swarm.cmd)

    # a single job can depend on an array job (a swarm is submitted as one)
    # it will start executing when all array jobs have finished
    jid6=$(sbatch --dependency=afterany:$jid5 job6.sh)

    # a single job can depend on all jobs by the same user with the same name
    jid7=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job7.sh)
    jid8=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job8.sh)
    sbatch --dependency=singleton --job-name=dtest job9.sh

    # show dependencies in squeue output:
    squeue -u $USER -o "%.8A %.4C %.10m %.20E"
    A more complete example of a mock chipseq pipeline can be found here.

    And here is a simple bash script that will submit a series of jobs for a benchmark test. This script submits the same job with 1 MPI process, 2 MPI processes, 4 MPI processes ... 128 MPI processes. The Slurm batch script 'jobscript' uses the environment variable $SLURM_NTASKS to specify the number of MPI processes that the program should start. The reason to use job dependencies here is that all the jobs write some temporary files with the same name, and would clobber each other if run at the same time.


    #!/bin/bash

    # first run: 1 MPI process, no dependency
    id=$(sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 --output=${PWD}/results/x2650-1.slurmout jobscript)
    echo "ntasks 1 jobid $id"

    # each later run waits for the previous one to end
    for n in 2 4 8 16 32 64 128; do
        id=$(sbatch --dependency=afterany:$id --job-name=factor9-$n --ntasks=$n --ntasks-per-core=1 --output=${PWD}/results/x2650-$n.slurmout jobscript)
        echo "ntasks $n jobid $id"
    done
    The batch script corresponding to this example:


    #!/bin/bash

    module load amber/14
    module list

    echo "Using $SLURM_NTASKS cores"

    cd /data/user/amber/factor_ix.amber10

    `which mpirun` -np $SLURM_NTASKS `which sander.MPI` -O -i mdin -c inpcrd -p prmtop
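    The submission loop for this benchmark can also be written in Python. The sketch below is one possible version under stated assumptions: submit_chain and sbatch_submit are hypothetical names, the --output option from the bash version is left out for brevity, and sbatch is called with --parsable so that it prints only the job ID. The submit function is a parameter so the chaining logic can be tried without a cluster.

```python
import subprocess


def sbatch_submit(argv):
    """Submit with sbatch and return the job ID as a string.

    With --parsable, sbatch prints the job ID, optionally followed by
    ';' and the cluster name.
    """
    out = subprocess.run(["sbatch", "--parsable"] + argv,
                         capture_output=True, text=True, check=True).stdout
    return out.strip().split(";")[0]


def submit_chain(task_counts, submit=sbatch_submit, script="jobscript"):
    """Submit one job per task count, each waiting for the previous one."""
    job_ids = []
    for n in task_counts:
        argv = ["--job-name=factor9-%d" % n,
                "--ntasks=%d" % n,
                "--ntasks-per-core=1"]
        if job_ids:
            # Serialize the runs so their temporary files do not clash.
            argv.append("--dependency=afterany:%s" % job_ids[-1])
        argv.append(script)
        job_ids.append(submit(argv))
    return job_ids
```

    On a cluster, submit_chain([1, 2, 4, 8, 16, 32, 64, 128]) would submit the eight runs in sequence.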
    A sample Perl script that submits 3 jobs, each one dependent on the completion (in any state) of the previous job.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $num = 8;

    # backticks capture the job ID printed by sbatch
    my $jobnum = `sbatch --cpus-per-task=$num myjobscript`;
    chomp $jobnum;
    print "Job number $jobnum submitted\n\n";

    $jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mysecondjobscript`;
    chomp $jobnum;
    print "Job number $jobnum submitted\n\n";

    $jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mythirdjobscript`;
    chomp $jobnum;
    print "Job number $jobnum submitted\n\n";

    The sample Python script below submits 3 jobs that are dependent on each other, and shows the status of those jobs.


    #!/usr/bin/env python3
    import subprocess
    import sys

    # submit the first job
    cmd = "sbatch Job1.bat"
    print("Submitting Job1 with command: %s" % cmd)
    status, jobnum = subprocess.getstatusoutput(cmd)
    if status == 0:
        print("Job1 is %s" % jobnum)
    else:
        sys.exit("Error submitting Job1")

    # submit the second job to be dependent on the first
    cmd = "sbatch --depend=afterany:%s Job2.bat" % jobnum
    print("Submitting Job2 with command: %s" % cmd)
    status, jobnum = subprocess.getstatusoutput(cmd)
    if status == 0:
        print("Job2 is %s" % jobnum)
    else:
        sys.exit("Error submitting Job2")

    # submit the third job (a swarm) to be dependent on the second
    cmd = "swarm -f swarm.cmd --module blast  --depend=afterany:%s" % jobnum
    print("Submitting swarm job  with command: %s" % cmd)
    status, jobnum = subprocess.getstatusoutput(cmd)
    if status == 0:
        print("Swarm job is %s" % jobnum)
    else:
        sys.exit("Error submitting swarm job")

    print("\nCurrent status:\n")
    # show the current status with 'sjobs'
    subprocess.run(["sjobs"])
    Running this script:

    [user@biowulf ~]$
    Submitting Job1 with command: sbatch Job1.bat
    Job1 is 25452702
    Submitting Job2 with command: sbatch --depend=afterany:25452702 Job2.bat
    Job2 is 25452703
    Submitting swarm job  with command: swarm -f swarm.cmd --module blast  --depend=afterany:25452703
    Swarm job is 25452706

    Current status:

    User    JobId            JobName   Part  St  Reason      Runtime  Walltime  Nodes  CPUs  Memory  Dependency      
    user    25452702         Job1.bat  norm  PD  ---            0:00   4:00:00      1   1   2GB/cpu
    user    25452703         Job2.bat  norm  PD  Dependency     0:00   4:00:00      1   1   2GB/cpu  afterany:25452702
    user    25452706_[0-11]  swarm     norm  PD  Dependency     0:00   4:00:00      1  12   1GB/node afterany:25452703
    cpus running = 0
    cpus queued = 14
    jobs running = 0
    jobs queued = 14 

  • Snakemake and Slurm: How to Run Snakemake on HPC
  • Why use Snakemake on HPC
    Snakemake is a handy workflow manager written in Python. It handles workflows based on predefined job dependencies. One of the great features of Snakemake is that it can manage a workflow on either a standalone computer or a clustered HPC system. HPC, or “cluster” as it’s often referred to, requires additional considerations.

    On HPC, all computing jobs should be submitted to “compute nodes” through a workload manager (for example, Slurm). Running jobs on “head nodes”, or log-in nodes, is a big no-no: doing so risks exhausting the head node’s very limited computing resources and slowing down everyone else on that same head node.

    The file system is another thing that requires consideration when computing on HPC. Many HPC systems use NFS (Network File System) to make storage accessible to all compute and head nodes. But some HPC systems also have dedicated local storage on each compute node. This helps offload I/O from the main storage drives when users run I/O-intensive jobs. It is important that users make use of these local storage spaces and clean up their temporary files afterwards.

    Below I will show how to achieve responsible computing on HPC with a few tweaks to your Snakemake workflow.

    How to run snakemake on HPC
    Snakemake natively supports managing job submission on HPC. Take a look at their documentation here.

    Let’s assume I have a Snakefile with the following minimal rules:

    To execute this workflow locally, we can simply invoke snakemake -s path/to/Snakefile (or snakemake if you’re in the same directory as Snakefile).

    Now if we want to run this on HPC on compute nodes, Snakemake allows us to do so by:

    snakemake -s Snakefile --cluster 'sbatch -t 60 --mem=2g -c 1' -j 10
    In the above command, we use --cluster to hand Snakemake a command that it can use to submit a job script. Snakemake creates a job script for each job to be run and uses this command to submit that job to HPC. It’s that simple! Notice that here we use sbatch because it’s the command you would use to submit a shell script on an HPC system managed by Slurm. Replace this with the corresponding qsub command if your HPC uses SGE. The -j 10 argument limits Snakemake to at most 10 jobs running at the same time.
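    The quoting in that command matters: the whole sbatch invocation must reach Snakemake as a single argument to --cluster. The short Python sketch below illustrates this by building the argument list and printing the equivalent shell command. The helper name snakemake_cluster_argv and its parameters are assumptions made for this example, and shlex.join requires Python 3.8 or later.

```python
import shlex


def snakemake_cluster_argv(snakefile, minutes, mem, cpus, max_jobs):
    """Return the argument list for a Snakemake run that submits via sbatch.

    Hypothetical helper; the sbatch options mirror the command above.
    """
    # The submit command is ONE argument, even though it contains spaces.
    submit_cmd = "sbatch -t %d --mem=%s -c %d" % (minutes, mem, cpus)
    return ["snakemake", "-s", snakefile,
            "--cluster", submit_cmd,
            "-j", str(max_jobs)]


if __name__ == "__main__":
    argv = snakemake_cluster_argv("Snakefile", 60, "2g", 1, 10)
    # shlex.join adds the quotes a shell would need around the submit command.
    print(shlex.join(argv))
```

    Passing the list to subprocess.run(argv) avoids shell quoting altogether.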

    Decorate your job submission
    Claim different resources based on rules
    Now suppose we have a slightly more complicated workflow:

  • More Helpful Links
  • You can find more helpful resources at:
     - RILab documentation
     - Information about Slurm Rosetta

Manage Data Via Git Repository

  • What is Git repository and how do we manage data via Git Repository?
  • A Git repository (or repo for short) contains all of the project files and the entire revision history. You'll take an ordinary folder of files (such as a website's root folder), and tell Git to make it a repository. This creates a .git subfolder, which contains all of the Git metadata for tracking changes.
    Link to HPC Core Facility's Git Repo:-