#slurm — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #slurm, aggregated by home.social.
-
From flare-gas mining to AI factories: the story of Crusoe
The AI industry has a serious problem: how do you deploy compute infrastructure earlier and faster (and cheaper, too) than your competitors? The main scarce resource right now is electricity, not chips or their components, as you might have assumed. Tech giants ponder where to place their racks and how to cool them, but above all where to get the energy to power the whole AI system. One Denver startup has an unconventional answer: portable, modular AI data centers that can be deployed in the most unusual settings. The company came to IT from the crypto world: it started out installing mining rigs powered by flare gas at oil wells. Today I will tell you about Crusoe, a company that turns energy into compute in a highly unconventional way. We will break down their business model and work out what vertically integrated AI infrastructure means.
https://habr.com/ru/companies/ru_mts/articles/1022116/
#Crusoe #AIинфраструктура #датацентры #GPUоблако #облачные_вычисления #inference #Kubernetes #Slurm #edge_computing #энергетика
-
RE: https://fediscience.org/@snakemake/116295568336688286
This is a big step forward: the SLURM plugin for Snakemake now supports so-called job arrays. These are cluster jobs with roughly equal resource requirements in terms of memory and compute.
The change in itself was big: the purpose of a workflow system is to make use of the vast resources of an HPC cluster, so jobs are submitted to run concurrently. For a job array, however, we have to "wait" until all eligible jobs are ready, and only then submit.
To preserve concurrent execution of other jobs that are already ready to run, a thread pool has been introduced. In itself, I do not see job arrays as such a big feature: the LSF system profited much more from arrays than the rather lean SLURM implementation does.
BUT: the new code base will ease further development towards pooling many shared-memory tasks (applications which do not support parallel execution at all, or are confined to one computer because they "only" support threading). Until then, there is more work to do.
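For anyone who has not used them outside a workflow engine: a plain SLURM job array is a single sbatch submission that fans out into many near-identical tasks. A minimal sketch (the script name and resource numbers are made up for illustration):
```
#!/bin/bash
#SBATCH --array=1-100            # one submission, 100 tasks sharing the same resource request
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
# each task selects its own input via the array index SLURM provides
./process_sample.sh "${SLURM_ARRAY_TASK_ID}"
```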
#HPC #SLURM #Snakemake #SnakemakeHackathon2026 #ReproducibleComputing #OpenScience
-
A few #slurm tidbits:
Total submitted jobs per user, sorted:
```
squeue | sed 's/ \+/\t/g' | cut -f5 \
| sort | uniq -c | sort -hr
```
Running jobs per user:
```
squeue | grep ' R ' | sed 's/ \+/\t/g' \
| cut -f5 | sort | uniq -c | sort -hr
```
Pending jobs per user:
```
squeue | grep ' PD ' | sed 's/ \+/\t/g' \
| cut -f5 | sort | uniq -c | sort -hr
```
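A possibly cleaner variant of the same counts, using squeue's own state filter and output-format flags instead of the sed/cut field surgery:
```
# all jobs per user
squeue -h -o '%u' | sort | uniq -c | sort -hr
# running jobs per user
squeue -h -t RUNNING -o '%u' | sort | uniq -c | sort -hr
# pending jobs per user
squeue -h -t PENDING -o '%u' | sort | uniq -c | sort -hr
```
-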
As for the little executor plugin for the #SLURM batch system (for which I promised a release with job array support) ... well, only a little bug fix release could be accomplished: https://github.com/snakemake/snakemake-executor-plugin-slurm/releases/tag/v2.5.4
Unfortunately, I wanted to use the common #Snakemake logo without the letters "#HPC" and missed one entry. So our announcement bot did not work.
Anyway, a faulty file system connection kept me from debugging the new feature. Stay tuned. It is almost ready.
-
Finally, some personal progress: thanks to @fbartusch, a bug in the #SLURM executor plugin for Snakemake was fixed (dealing with nested quoting). A release is upcoming.
And: I generated my first (still faulty) test #nanopub from Snakemake 🥳
-
This cannot be:
I am trying to compile a few stats for the #Snakemake executor plugin for #SLURM on #HPC systems. Preparing for a lightning talk at the #SnakemakeHackathon2026
PyPi: 20,000 downloads last month
BioConda: > 60,000 total (aggregated over all versions)
Impressive as it might be, this is contradictory: at that monthly rate, PyPI's all-time count would exceed BioConda's by a huge margin.
Does anyone know how to get all-time statistics from either platform? #BioConda or #PyPi?
-
The #Snakemake plugin for #SLURM on #HPC clusters will support JobArrays, soon:
```
1057691_1  2dcf44cc-+  rule_map_reads_wild+  32  COMPLETED  0:0
1057691_2  2dcf44cc-+                        32  RUNNING    0:0
1057691_3  2dcf44cc-+                        32  RUNNING    0:0
1057691_4  2dcf44cc-+                        32  RUNNING    0:0
1057691_5  2dcf44cc-+                        32  RUNNING    0:0
1057691_6  2dcf44cc-+                        32  RUNNING    0:0
```
Hope to do more during next week's #SnakemakeHackathon2026 / #SnakemakeHackathon
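For context: accounting output like the above comes from SLURM's sacct. The exact invocation behind this snippet isn't shown, but something along these lines yields the same columns:
```
sacct -j 1057691 --format=JobID,JobName%20,AllocCPUS,State,ExitCode
```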
-
On a similar note: there is another (draft) PR. The #SLURM executor plugin for #Snakemake is capable of respecting partition definitions since v. 2.
I had the notion that this is rather difficult to set up manually, so I wrote a little command-line helper. It queries the SLURM config and writes out a preliminary partition configuration template. This still requires manual adaptation, I'm afraid.
Only a small step forward, as it still requires both an understanding of Snakemake and of your local SLURM setup. The world is as it is: the imagination of admin teams is unlimited, and a one-size-fits-all solution is not on the horizon.
Still, if you want to try it out and provide feedback, this would be very much appreciated! All suggestions are welcome!
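To get a feel for the raw partition facts such a helper has to gather (not necessarily how the helper itself queries them), sinfo can already print them, e.g. partition, time limit, memory, CPUs and node count:
```
sinfo -o '%P %l %m %c %D'
```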
-
I want to reach out: I have this pending release for the SLURM executor (https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/412 ). It implements better error feedback (in case of hardware failures and otherwise). It would need some thorough checking, and I cannot provoke too many hardware failures. Is anyone working on an older cluster? 😉
Feedback (as issues) would be appreciated. It would also be nice if you let me know here that it is working.
-
PSA for my #HPC cluster operators out there. A new CVE was announced for #MUNGE, a popular authentication mechanism used in #Slurm
https://github.com/dun/munge/security/advisories/GHSA-r9cr-jf4v-75gh
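Before digging into the advisory, a quick way to see which MUNGE build your nodes actually carry is to ask the package manager (package names can vary per distro; these are just the stock queries):
```
# RPM-based distributions
rpm -q munge
# Debian/Ubuntu
dpkg -s munge | grep '^Version'
```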
-
What Does #Nvidia’s Acquisition of #SchedMD Mean for #Slurm?
Slurm was developed at LLNL in the early 2000s to replace commercial workload management software for #HPC clusters and #supercomputers.
Addison Snell, the CEO of Intersect360, says the acquisition of SchedMD makes sense considering the emerging focus on developing #AI models to accelerate scientific discovery and engineering, and the need to integrate traditional HPC workloads and new AI ones.
https://www.hpcwire.com/2026/01/06/what-does-nvidias-acquisition-of-schedmd-mean-for-slurm/
-
Does anyone here use the #Slurm `nss_slurm` extension?
I see in Slurm's documentation an example of how to enable the extension, but I can't find any examples of the referenced /etc/nss_slurm.conf file anywhere...
The source code of the extension seems to indicate that it is a very simple file, e.g.
```
NodeName=<nodename>
SlurmdSpoolDir=<dir>
```
but I just want an example to ensure that my assumptions are correct 😅
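Not from the official docs, so take this as an assumption to verify: besides /etc/nss_slurm.conf, the module presumably only takes effect once "slurm" is listed for the passwd and group databases in /etc/nsswitch.conf, e.g.
```
# /etc/nsswitch.conf (ordering is an assumption; adjust for your site)
passwd: slurm files systemd
group:  slurm files systemd
```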
-
Nvidia’s acquisition of SchedMD, the company behind Slurm, is a strategic move that goes far beyond GPUs.
Slurm (Simple Linux Utility for Resource Management) is the de facto open-source workload manager for large-scale GPU clusters, widely used in supercomputing centers, AI labs, hyperscalers, and cloud GPU operators. It plays a critical role in ...
#NVIDIA #AIInfrastructure #OpenSource #Slurm #HPC #GPUs #AITraining #CloudComputing #tech #DataCenters
-
#Nvidia pledges more openness as it slurps up #Slurm
The chip giant revealed yesterday that it had acquired #SchedMD, the key developer behind Slurm, which Nvidia described in a statement as "an #opensource workload management system for high-performance computing (#HPC) and #AI."
"Nvidia insisted it will also support "a diverse hardware and software ecosystem, so customers can run heterogeneous clusters with the latest Slurm innovations."
https://www.theregister.com/2025/12/16/nvidia_slurm_nemotron/Still makes me quite nervous!
-
RE: https://mstdn.social/@TechCrunch/115725841067428721
This is pretty awful. The reason for anti-monopoly regulations is to prevent one bully from taking over the schoolyard. The bully (#nvidia) buying #slurm is flatly bad. It will not improve code quality, and Slurm will be turned into a paid product.
Another situation where forking into a public project (gurm? good slurm?) may be the best bet. It's only a matter of time.
-
The sound you hear is coming from #HPC data centers as jaws drop and the research community gasps at the news that #NVIDIA acquired SchedMD - the #SLURM developer.
SLURM is very popular for job scheduling on high performance compute clusters. Let us hope this will at least keep #CoPilot from being bolted onto SLURM. After a recent experience with CoPilot on GitHub, I’m questioning some of my life choices.
-
RE: https://social.heise.de/@heiseonlineenglish/115725340534744039
In any kind of normal timeline this would have been stopped by regulators. Instead we let a single company become the sole gatekeeper for everything in #HPC .
SchedMD are the makers of #slurm , the (only) Open Source cluster scheduler, and used by a large majority of cluster computers. Nvidia bought Mellanox a few years ago, the maker of Infiniband, the main network technology for clusters.
-
Managing cluster jobs... from the terminal 💯
🌀 **turm** — A TUI for the Slurm Workload Manager.
🔥 Supports all squeue flags, auto-updates logs & more!
🦀 Written in Rust & built with @ratatui_rs
⭐ GitHub: https://github.com/kabouzeid/turm
#rustlang #ratatui #tui #hpc #slurm #observability #devops #terminal
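Assuming it is published on crates.io under the same name as the repo (an assumption, not verified here), getting it onto a login node is the usual cargo route:
```
cargo install turm   # assumes the crate name matches the repo name
turm                 # should launch the TUI; pick a job to follow its output
```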
-
RE: https://fediscience.org/@snakemake/115684609969502897
This was a yolo-release, I'm afraid: I am starting to dislike multicluster setups in #SLURM - the code base leaves something to be desired in terms of error messages and configuration flexibility. 😅
-
RE: https://fediscience.org/@snakemake/115611862667755622
Now, this is huge!
Thanks to a contribution from Cade Mirchandani (Santa Cruz, CA), whom I met at this year's #SnakemakeHackathon, users can now supply a partition profile. So, instead of wrangling #SLURM partition information into a workflow profile (indicated with --workflow-profile), we can now have a global file to contain this information.
I added a time conversion function, such that the SLURM time format is obeyed, too.
There are several other development needs before we continue in this direction (e.g. parsing SLURM partition information directly). But summing this up for non-users, e.g. administrators, is a task that is due, too.
In any case, I think this merits a new major version.
-
🚀 Wow, someone out there thought, "Hey, let's mix high-performance computing with... Docker Compose! Because why not?" 🤦♂️ Now you can enjoy the thrill of virtualized #SLURM clusters without leaving your couch. Just what the tech world needed: more complexity in a box! 📦✨
https://github.com/exactlab/vhpc #highperformancecomputing #DockerCompose #virtualization #techinnovation #complexityinabox #HackerNews #ngated
-
A #Slurm user just confirmed that "yay it works. Pretty sick!"
Thanks to excellent feedback from several users, it'll soon be even easier to distribute #rstats code via #HPC job schedulers using future.batchtools
-
Today we reinstalled the molecular dynamics cluster at CURE with Debian 12: 5 PCs with OpenMPI and a RAID-5 to store the calculations... the Debianite god must be proud of me and pleased with the work #cluster #cure #openmpi #slurm #debian #moleculardynamics #fisica #physics #dinamicamolecular
-
For @centos #Hyperscale users, we now have a fixed #pmix for CVE-2023-41915 and an upgraded #openmpi and rebuilt #slurm to go with it
https://nvd.nist.gov/vuln/detail/CVE-2023-41915
https://pagure.io/centos-sig-hyperscale/sig/issue/156
Instructions on enabling this if you are interested in trying it out
https://sigs.centos.org/hyperscale/content/repositories/main/
-
#SLURM presentations from #SC22 now available: https://slurm.schedmd.com/publications.html
There’s one I specifically had been hoping to be discussed for so long, and it finally happened: Slurm and/or/versus Kubernetes
It talks about potentially getting Slurm to work with K8s. That may not be strictly necessary, since engineers often run on-prem and k8s setups separately, but there are still good reasons to think about the possibility of integration, like managing one infrastructure vs. two.
-
I just created the #PIRA v0.5.0 release. It now comes with experimental #Slurm integration, so you can run individual measurements through the scheduler.
Go check it out at https://github.com/tudasc/PIRA/releases/tag/v0.5.0