We received a customer request to generate resource usage reports for individual jobs in our High Performance Computing (HPC) cluster environment. We have Zabbix installed in the environment; however, we wanted to see whether we could generate the report natively from the system itself.

OpenHPC uses Slurm to manage workloads, so we can dive into Slurm commands to approach this task. None of us have taken SchedMD training courses; our knowledge comes from the Slurm documentation[1] or plain Google searches. This is a good opportunity and use case to get ourselves more familiar with the HPC ecosystem.

Approach

We know “sacct” is a Slurm command that displays accounting data for all jobs and job steps in the Slurm job accounting log and/or database[2], so I started by going through its manual page.

sacct has options to specify start and end dates, and it can also switch between the current user’s and all users’ job history. This is good. So we tested the following command:

sacct -S 2022-10-01 -E 2022-10-14 -X -a -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node

where:

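  • -S (--starttime) is the start date of the query
  • -E (--endtime) is the end date of the query
  • -X (--allocations) shows statistics only for the job allocation itself, not for its individual job steps
  • -a (--allusers) includes jobs from all users
  • -o (--format) selects the fields to display

Note that the field list must not contain a space after any comma. Otherwise, the shell splits the list at the first space, and everything after it is treated as a separate argument.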
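The field list has to reach sacct as a single argument. One way to avoid stray spaces is to keep the field names in a Bash array and join them with commas. The following is a minimal sketch; `build_format` is a hypothetical helper name and is not part of Slurm:

```shell
#!/bin/bash
# Join sacct field names with commas and no spaces, so that -o/--format
# receives the whole list as one argument.
build_format() {
    local IFS=,     # "$*" joins the positional parameters with the first char of IFS
    echo "$*"
}

# Example: the same fields used in the sacct command above
format=$(build_format JobID Start End Elapsed ReqMem ReqCPUS User Node)
echo "$format"   # JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node

# The list is then passed quoted, e.g.:
# sacct -S 2022-10-01 -E 2022-10-14 -X -a -o "$format"
```

Quoting "$format" when passing it to sacct keeps the list together even if a field name is later added by mistake with surrounding whitespace.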

The output showed us exactly what we needed. Cool. Now we can capture the data displayed on screen in a file by redirecting standard output with “>”:

sacct -S 2022-10-01 -E 2022-10-14 -X -a -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node > filename

However, I wanted to save the data in CSV format, so that I could simply send the file to the customer as a report.

I did not see an option to turn the column separators into commas; however, there is an option, --parsable2, that delimits the fields with “|”. That is good enough, since we can always manipulate strings inside a Bash script.

Now the command becomes:

sacct -S 2022-10-01 -E 2022-10-14 -X -a -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node --parsable2 > filename
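Because the output is now “|”-delimited, the string manipulation can be done in plain Bash, using pattern-substitution parameter expansion and no external tool. The following is a minimal sketch; `pipes_to_commas` is a hypothetical function name, and the sample line is hand-written rather than real sacct output:

```shell
#!/bin/bash
# Replace every '|' with ',' using only Bash parameter expansion:
# ${line//|/,} substitutes all occurrences of '|' in $line.
pipes_to_commas() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the [[ -n $line ]] test also handles a final line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line//|/,}"
    done
}

# Example with a hand-written sample line:
printf '%s\n' '101|2022-10-03T09:15:00|alice' | pipes_to_commas
# 101,2022-10-03T09:15:00,alice
```

A Bash read loop handles one line at a time, so for large outputs a dedicated tool such as sed is usually faster. For a small report, either approach works.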

sed is a standard Linux/Unix stream editor that can transform text, for example by replacing one string with another. Here, we want to replace “|” with “,”.

sed 's/|/,/g' filename > new_filename.csv

This will do the trick.
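One caveat with a blanket s/|/,/g is that some fields can themselves contain commas. For example, the node list of a multi-node job is printed in Slurm's compressed hostlist form, such as node[01,03], and those commas would spill into extra CSV columns. The sacct man page also documents a --delimiter option for use with -p/-P, which could emit commas directly, but the result would have the same problem. The following is a minimal sketch that quotes every field with awk instead. `psv_to_csv` is a hypothetical name, and the sample lines are hand-written, not real sacct output:

```shell
#!/bin/bash
# Convert '|'-delimited sacct output (--parsable2) into CSV.
# Every field is wrapped in double quotes, and embedded double quotes are
# doubled, so values containing commas stay in a single CSV column.
psv_to_csv() {
    awk -F'|' -v OFS=',' '{
        for (i = 1; i <= NF; i++) {
            gsub(/"/, "\"\"", $i)   # escape embedded quotes per CSV convention
            $i = "\"" $i "\""       # wrap the field in quotes
        }
        print                       # $0 is rebuilt with OFS between fields
    }'
}

# Example with hand-written sample lines:
printf '%s\n' 'JobID|User|NodeList' '101|alice|node[01,03]' | psv_to_csv
# "JobID","User","NodeList"
# "101","alice","node[01,03]"
```

Spreadsheet tools generally accept quoted CSV fields, so the extra quotes do not get in the way when the customer opens the report.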

Implementation

We can put these two commands into a Bash script:

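#!/bin/bash
# Export job-level accounting data for all users, with fields separated by '|'
sacct -S 2022-10-01 -E 2022-10-14 -X -a -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node --parsable2 > filename
# Replace every '|' with ',' to produce the CSV report
sed 's/|/,/g' filename > new_filename.csv
# Remove the temporary file
rm filename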

Since filename is just a temporary file and we only need the CSV file, we delete it at the end. We could use sed --in-place to write the new output back into the same file instead. However, sed then either keeps a backup of the original (if given a suffix) or still creates its own temporary file behind the scenes. In practice, it makes little difference whether we edit in place or create a temporary file and delete it later.

Another reason for this approach is that we do not have much data, so writing two files to disk does not matter to us.
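If the intermediate file is not wanted at all, the two commands can also be joined with a pipe. sacct's output then goes straight into sed, and only the CSV file is ever written. The following is a minimal sketch; the `usage_report` function and its arguments are hypothetical:

```shell
#!/bin/bash
# Write a job-level usage report straight to CSV with no temporary file:
# sacct's '|'-delimited output is piped directly into sed.
set -euo pipefail

usage_report() {
    local start=$1 end=$2 outfile=$3
    sacct -S "$start" -E "$end" -X -a \
        -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node --parsable2 \
        | sed 's/|/,/g' > "$outfile"
}

# Example (same date range as above):
# usage_report 2022-10-01 2022-10-14 new_filename.csv
```

With pipefail set, the function returns a non-zero status when sacct itself fails, rather than reporting only sed's exit status.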

Afterthought

There are a few more things we could do to make this tiny script more useful:

  1. We can make the start and end dates dynamic so that the same script can be run again in the future.
  2. With dynamic start and end dates, we can create a cron job to run this script monthly.
  3. We can email the output CSV file to an email group, or copy it with SCP to a shared file server, for review.
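Items 1 and 2 could look roughly like the following sketch. It assumes GNU date (for the -d option, as on typical Linux systems). It also assumes that sacct interprets a bare YYYY-MM-DD as midnight, so that ending at the first day of the current month covers the whole previous month. `last_month_range`, the script path, and the crontab schedule are all hypothetical:

```shell
#!/bin/bash
# Compute the previous calendar month as a "start end" pair for sacct.
# Assumes GNU date for the -d option.
last_month_range() {
    local ref=${1:-$(date +%F)}                    # reference day, defaults to today
    local first_of_month start
    first_of_month=$(date -d "$ref" +%Y-%m-01)     # first day of the reference month
    start=$(date -d "$first_of_month -1 month" +%F) # first day of the previous month
    # Assumption: sacct treats a bare date as midnight, so ending at the first
    # day of the reference month covers the whole previous month.
    echo "$start $first_of_month"
}

read -r start end <<< "$(last_month_range)"
echo "Reporting from $start to $end"
# sacct -S "$start" -E "$end" -X -a -o JobID,Start,End,Elapsed,ReqMem,ReqCPUS,User,Node --parsable2 ...

# Hypothetical crontab entry to run the report at 06:00 on the 1st of each month:
# 0 6 1 * * /path/to/usage_report.sh
```

Passing an explicit reference date, for example `last_month_range 2022-11-15`, makes it easy to regenerate a report for an earlier month.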

However, our customer has not asked for automated reports yet, so this is it for now.


References

  1. https://slurm.schedmd.com/documentation.html
  2. https://slurm.schedmd.com/sacct.html