What is Slurm, and how do I write and submit a Slurm job?


When HPC users log into the clusters, they land on the head (login) nodes. All HPC users are encouraged to submit jobs to the compute nodes, not the head node. For this purpose, HPC provides the Slurm resource manager, which is responsible for scheduling and running jobs.

Slurm is an open-source resource manager (batch queue) and job scheduler designed for Linux clusters of all sizes.

The general idea with a batch queue is that you don't have to babysit your jobs. You submit a job, and it runs until it finishes or a problem occurs; you can configure Slurm to notify you via email when that happens. This allows very efficient use of the cluster. You can still babysit or debug your jobs if you wish by using an interactive session (e.g., qlogin).
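For example, you can open an interactive shell on a compute node with a command along these lines (the partition name and time limit are placeholders; a fuller example appears later in this article):

srun --partition=<partition> --time=01:00:00 --pty /bin/bash -il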

Our main concern is that all jobs go through the batch queuing system. Do not bypass the batch queue. We don't lock anything down, but that doesn't mean we can't or won't. If you need to retrieve files from a compute node, feel free to ssh directly to it and get them, but do not impact other jobs that have gone through the queue.
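For example, to log into a node or copy a file back to the login node (the node name and file path below are placeholders):

ssh <node-name>
scp <node-name>:/path/to/file .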

Below is an example batch script. You can write it in any editor and submit it using the sbatch or srun commands (explained at the bottom of this article).
Please note that Slurm directives start with a single # followed by SBATCH, while ordinary comments in the script below are marked with multiple # signs at the beginning.

## Your sbatch job script should start with the following line:
#!/bin/bash -l

## Name of the job - You'll probably want to customize this.
#SBATCH -J bench

## Number of tasks to run on each node
#SBATCH --ntasks-per-node=1

## Number of CPU cores per task (5 is just an example; adjust for your use case):
#SBATCH --cpus-per-task=5

## Wall-clock time limit for the job. Pick an appropriate limit for your job; only admins can extend a job's time limit after submission:
#SBATCH --time=60:00:00

## If you want the system to send email when the job ends or fails
#SBATCH --mail-user=userid@ucdavis.edu

#SBATCH --mail-type=END

#SBATCH --mail-type=FAIL

## Standard output and standard error files with the job number in the name (both are written to the same file here).
#SBATCH -o bench-%j.output
#SBATCH -e bench-%j.output

## Account name: the group association whose resources you are using (only needed if this is a secondary account or differs from your default group):
#SBATCH -A groupID

## hostname is just for debugging
hostname

## Match the number of OpenMP threads to the CPU cores requested per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

## Load whatever software your job needs (module/version is a placeholder)
module load module/version
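## Finally, run your application. The program name below is just a
## placeholder; replace it with the command your job should execute.
./my_program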

Then you can run your job using the sbatch or srun commands.
sbatch submits the script to the resources managed by Slurm; the batch job is executed without user intervention once the requested resources become available.

sbatch -N4 bench

srun runs jobs directly: the request is still handled by Slurm, but instead of queuing a batch script it executes the command on the allocated resources right away and blocks your prompt until it finishes.

srun --partition=med --time=60:00:00 --mem=10G --nodes=4 --pty /bin/bash -il

 

How do I submit a job to a GPU node via Slurm?

If your default Slurm association/account does not have access to a GPU partition and you want to request a GPU through another account, you can do so with the following flags in your batch script:

#SBATCH --account=Alternate_Account_Name
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

Or when you submit your job using srun, add these flags:

srun -A Alternate_Account_Name --gres=gpu:1 -t 01:00:00 --mem=20GB batch.sh
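For reference, a minimal GPU batch script might look something like the following sketch; the partition, module, and program names are placeholders, and the exact partition to use depends on your cluster and account:

#!/bin/bash -l
#SBATCH -J gpu-test
#SBATCH --account=Alternate_Account_Name
## Replace with the GPU partition your account has access to, if required
#SBATCH --partition=<gpu-partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --mem=20G

hostname
## Placeholder: load your software stack and run your GPU program
module load module/version
./my_gpu_program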

 

How do I track and see the status of my Slurm jobs?

Here are some useful Slurm commands with their purpose:
sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
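For example, to get a node-oriented, long-format listing, or to restrict the output to a single partition (the partition name is a placeholder):

sinfo -N -l
sinfo -p <partition-name>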

smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.

sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.

squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
You can check on your currently running jobs using the commands:


squeue --me
squeue -u <userid>

If you want to check the running jobs under an Account or from a certain group:
squeue -A <groupName>

If you want to see running jobs on a node and a partition:
squeue -p <PartitionID>
squeue -w <nodeID>
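squeue can also filter by job state. For example, to see only your pending or only your running jobs:

squeue --me --states=PENDING
squeue --me --states=RUNNING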
 

sacct is a utility for retrieving and displaying accounting data and job information such as job status, history, and resource consumption, with customizable output that is useful for efficiency analysis. For example, the following command can be used to extract information about your past jobs over a given time period:
 

sacct --starttime=2022-01-01 --endtime=2022-12-01 --format="user,account%15,jobid%15,nodelist%15,state%20,jobname%20,partition,start,elapsed,TotalCPU," | less

The command above lists all the jobs that you ran in 2022, from January 1st through December 1st.
If you want to add more fields to the --format option to extract further job details, use the following command to see the available sacct fields:

sacct -e
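For example, to look at the resource usage of a single finished job (the job ID below is a placeholder):

sacct -j <jobid> --format="JobID,JobName,State,Elapsed,MaxRSS,TotalCPU"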


srun is used to submit a job for execution or to initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or avoid, and specific node characteristics (amount of memory, disk space, required features, etc.).
A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
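For example, a single srun invocation that requests two nodes, four tasks, and a memory limit, then runs a simple command on the allocation (the resource values are arbitrary examples):

srun --nodes=2 --ntasks=4 --mem=4G --time=00:10:00 hostname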


scancel is used to stop a job early, for example when you have queued the wrong script or you know it is going to fail because you forgot something.
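For example (the job ID is a placeholder); the first form cancels a single job by its ID, and the second cancels all of your own jobs:

scancel <jobid>
scancel -u $USER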
See more in "Monitoring Jobs" in the Slurm Example Scripts article in the Help Documents.
More in-depth information at http://slurm.schedmd.com/documentation.html
 

How do I check my Slurm resources?

Users can see their association using commands like:
sacctmgr show association user=$USER

Users can see the partitions and memory limits assigned to their group/association:
sacctmgr show qos format=name%-40,priority,usagefactor,grptres%40 | egrep "GROUPID|UsageFactor"

 

How do I see Slurm partitions and nodes?

Users can see the available nodes on available partitions using the following command:
 
sinfo
 
This command lists all the available partitions, their state, and relevant nodes. If users want to see details of each partition or each node, they can use these commands:
 
scontrol show partition <partition-name>
 
scontrol show node <node-name>