home.social

#slurm — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #slurm, aggregated by home.social.

  1. Just did a major overhaul of my "top, but for #slurm" util! Might be useful to #hpc admins and users alike. Appreciate any bug reports, especially crashes or incompatibilities!

    github.com/buzh/slop

  2. From mining on flare gas to AI factories: the story of Crusoe

    The AI industry has a serious problem: how do you roll out compute infrastructure earlier and faster (and cheaper) than the competition? The scarcest resource right now is electricity, not chips or their components, as you might have assumed. The tech giants ponder where to place their racks and how to cool them, but above all where to get the energy to power the whole AI system. One Denver startup has an unconventional answer: portable, modular AI data centers that can be deployed under the most unusual conditions. The company came to IT from the crypto world: it started out installing mining machines that drew their power from flare gas at oil wells. Today I will tell you about Crusoe, a company that converts energy into compute in a highly unconventional way. We will break down their business model and work out what vertically integrated AI infrastructure means.

    habr.com/ru/companies/ru_mts/a

    #Crusoe #AIинфраструктура #датацентры #GPUоблако #облачные_вычисления #inference #Kubernetes #Slurm #edge_computing #энергетика

  3. RE: fediscience.org/@snakemake/116

    This is a big step forward: the SLURM plugin for Snakemake now supports so-called job arrays. These are cluster jobs with roughly equal resource requirements in terms of memory and compute.

    The change itself was big: the purpose of a workflow system is to make use of the vast resources of an HPC cluster, so jobs are submitted to run concurrently. For a job array, however, we have to "wait" until all eligible jobs are ready, and only then submit.

    To preserve concurrent execution of other jobs that are ready to run, a thread pool has been introduced. In itself, I do not see job arrays as such a big feature: the LSF system profited much more from arrays than the rather lean SLURM implementation does.

    BUT: the new code base will ease further development toward pooling many shared-memory tasks (applications which support no parallel execution, or are confined to one computer by "only" supporting threading). Until then, there is more work to do.

    #HPC #SLURM #Snakemake #SnakemakeHackathon2026 #ReproducibleComputing #OpenScience
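
    For readers unfamiliar with the feature: a job array bundles many near-identical jobs into a single submission. A minimal plain sbatch sketch (hypothetical resource values, not the plugin's generated code):

    ```
    #!/bin/bash
    #SBATCH --array=1-10          # ten tasks under a single job ID
    #SBATCH --cpus-per-task=2     # identical resources for every task
    #SBATCH --mem=4G

    # Each array task receives its own index via SLURM_ARRAY_TASK_ID:
    echo "processing chunk ${SLURM_ARRAY_TASK_ID}"
    ```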

  8. A few #slurm tidbits:

    Total submitted jobs per user, sorted:
    ```
    squeue | sed 's/ \+/\t/g' | cut -f5 \
    | sort | uniq -c | sort -hr
    ```

    Running jobs per user:
    ```
    squeue | grep ' R ' | sed 's/ \+/\t/g' \
    | cut -f5 | sort | uniq -c | sort -hr
    ```

    Pending jobs per user:
    ```
    squeue | grep ' PD ' | sed 's/ \+/\t/g' \
    | cut -f5 | sort | uniq -c | sort -hr
    ```
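
    The pipelines above lean on squeue's default column layout (and count the header row once). A less fragile variant, sketched assuming a standard squeue, uses its formatting flags directly: -h drops the header, -o "%u" prints only the user, and -t filters by job state:

    ```
    squeue -h -o "%u"        | sort | uniq -c | sort -hr   # all jobs
    squeue -h -t R  -o "%u"  | sort | uniq -c | sort -hr   # running
    squeue -h -t PD -o "%u"  | sort | uniq -c | sort -hr   # pending
    ```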

    #bash #hpc

  9. As for the little executor plugin for the #SLURM batch system (for which I promised a release with job array support)... well, only a little bug-fix release could be accomplished: github.com/snakemake/snakemake

    Unfortunately, I wanted to use the common #Snakemake logo, without the letters "#HPC", and missed one entry, so our announcement bot did not work.

    Anyway, a faulty file system connection kept me from debugging the new feature. Stay tuned; it is almost ready.

    #SnakemakeHackathon2026

  14. Finally, some personal progress: thanks to @fbartusch, a bug in the #SLURM executor plugin for Snakemake (dealing with nested quoting) was fixed. A release is upcoming.

    And: I generated my first (still faulty) test #nanopub from Snakemake 🥳

    #SnakemakeHackathon2026

  15. This cannot be:

    I am trying to compile a few stats for the #Snakemake executor plugin for #SLURM on #HPC systems, preparing for a lightning talk at the #SnakemakeHackathon2026.

    PyPI: 20,000 downloads last month
    BioConda: > 60,000 total (aggregated over all versions)

    Impressive as these numbers are, they are contradictory: at that monthly rate, PyPI's all-time total would exceed BioConda's by a huge margin.

    Does anyone know how to get all-time statistics from either platform? #BioConda or #PyPI?

  16. The #Snakemake plugin for #SLURM on #HPC clusters will support JobArrays, soon:

    ```
    1057691_1  2dcf44cc-+  rule_map_reads_wild+  32  COMPLETED  0:0
    1057691_2  2dcf44cc-+                        32  RUNNING    0:0
    1057691_3  2dcf44cc-+                        32  RUNNING    0:0
    1057691_4  2dcf44cc-+                        32  RUNNING    0:0
    1057691_5  2dcf44cc-+                        32  RUNNING    0:0
    1057691_6  2dcf44cc-+                        32  RUNNING    0:0
    ```

    Hope to do more during next week's #SnakemakeHackathon2026 / #SnakemakeHackathon

  17. On a similar note: there is another (draft) PR. The #SLURM executor plugin for #Snakemake is capable of respecting partition definitions since v. 2.

    I had the notion that setting this up manually is rather difficult, so I wrote a little command-line helper. It queries the SLURM config and writes out a preliminary partition configuration template. This still requires manual adaptation, I'm afraid.

    A small step forward, as it requires an understanding of both Snakemake and your local SLURM setup. The world is as it is: the fantasy of admin teams is unlimited, and a one-size-fits-all solution is not on the horizon.

    Still, if you want to try it out and provide feedback, this would be very much appreciated! All suggestions are welcome!

  18. I want to reach out: I have this pending release for the SLURM executor (github.com/snakemake/snakemake ). It implements better error feedback (in case of hardware failures and otherwise). It would need some thorough checking, and I cannot provoke too many hardware failures. Is anyone working on an older cluster? 😉

    Feedback (as issues) would be appreciated. It would also be nice if you told me here that it is working.

    #Snakemake #HPC #SLURM #ReproducibleComputing

  19. PSA for my #HPC cluster operators out there. A new CVE was announced for #MUNGE, a popular authentication mechanism used in #Slurm

    github.com/dun/munge/security/

  20. What Does #Nvidia’s Acquisition of #SchedMD Mean for #Slurm?
    Slurm was developed at LLNL in the early 2000s to replace commercial workload management software for #HPC clusters and #supercomputers.
    Addison Snell, the CEO of Intersect360, says the acquisition of SchedMD makes sense considering the emerging focus on developing #AI models to accelerate scientific discovery and engineering, and the need to integrate traditional HPC workloads and new AI ones.
    hpcwire.com/2026/01/06/what-do

  21. Does anyone here use the #Slurm `nss_slurm` extension?

    I see in Slurm's documentation an example of how to enable the extension, but I can't find any examples of the referenced /etc/nss_slurm.conf file anywhere...

    The source code of the extension seems to indicate that it is very simple file - e.g.

    ```
    NodeName=<nodename>
    SlurmdSpoolDir=<dir>
    ```

    but I just want an example to ensure that my assumptions are correct 😅
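
    For completeness, and from memory rather than authority: enabling the module also means listing "slurm" as a source in /etc/nsswitch.conf on the compute nodes, e.g.

    ```
    passwd: slurm files
    group:  slurm files
    ```

    Verify both fragments against the nss_slurm man page for your Slurm version.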

  22. reuters.com/business/nvidia-bu

    Nvidia’s acquisition of SchedMD, the company behind Slurm, is a strategic move that goes far beyond GPUs.

    Slurm (Simple Linux Utility for Resource Management) is the de facto open-source workload manager for large-scale GPU clusters, widely used in supercomputing centers, AI labs, hyperscalers, and cloud GPU operators. It plays a critical role in ...

    #NVIDIA #AIInfrastructure #OpenSource #Slurm #HPC #GPUs #AITraining #CloudComputing #tech #DataCenters

  23. #Nvidia pledges more openness as it slurps up #Slurm
    The chip giant revealed yesterday that it had acquired #SchedMD, the key developer behind Slurm, which Nvidia described in a statement as "an #opensource workload management system for high-performance computing (#HPC) and #AI."
    Nvidia insisted it will also support "a diverse hardware and software ecosystem, so customers can run heterogeneous clusters with the latest Slurm innovations."
    theregister.com/2025/12/16/nvi

    Still makes me quite nervous!

  24. @jannem
    > "of #slurm , the (only) Open Source cluster scheduler"
    The claim of uniqueness seems like an exaggeration to me.
    Off the top of my head I could name Mesos/Marathon and Ganglia, which are truly FOSS.

  25. RE: mstdn.social/@TechCrunch/11572

    This is pretty awful. The reason for anti-monopoly regulations is to prevent one bully from taking over the schoolyard. The bully (#nvidia) buying #slurm is flatly bad. It will not improve code quality, and it will be turned into a paid product.

    Another situation where forking into a public project (gurm? good slurm?) may be the best bet. It's only a matter of time.

  26. The sound you hear is coming from #HPC data centers as jaws drop and the research community gasps at the news that #NVIDIA acquired SchedMD - the #SLURM developer.

    SLURM is very popular for job scheduling on high performance compute clusters. Let us hope this will at least keep #CoPilot from being bolted onto SLURM. After a recent experience with CoPilot on GitHub, I’m questioning some of my life choices.

    blogs.nvidia.com/blog/nvidia-a

  27. #nvidia to purchase developer of #slurm

    Arguably the de-facto open-source #HPC scheduler / queue / job management tool

    blogs.nvidia.com/blog/nvidia-a

  28. RE: social.heise.de/@heiseonlineen

    In any kind of normal timeline this would have been stopped by regulators. Instead we let a single company become the sole gatekeeper for everything in #HPC.

    SchedMD are the makers of #Slurm, the (only) Open Source cluster scheduler, used by a large majority of cluster computers. Nvidia bought Mellanox a few years ago, the maker of InfiniBand, the main network technology for clusters.

  29. Managing cluster jobs... from the terminal 💯

    🌀 **turm** — A TUI for the Slurm Workload Manager.

    🔥 Supports all squeue flags, auto-updates logs & more!

    🦀 Written in Rust & built with @ratatui_rs

    ⭐ GitHub: github.com/kabouzeid/turm

  30. RE: fediscience.org/@snakemake/115

    This was a yolo release, I'm afraid: I am starting to dislike multi-cluster setups in #SLURM; the code base leaves something to be desired in terms of error messages and configuration flexibility. 😅

  31. RE: fediscience.org/@snakemake/115

    Now, this is huge!

    Thanks to a contribution from Cade Mirchandani (Santa Cruz, CA), whom I met at this year's #SnakemakeHackathon, users can now supply a partition profile. So, instead of wrangling #SLURM partition information into a workflow profile (indicated with --workflow-profile), we can now have a global file to contain this information.

    I added a time conversion function, so that the SLURM time format is obeyed, too.

    There are several other development needs before we continue in this direction (e.g. parsing SLURM partition information directly). A write-up for non-users, e.g. administrators, is due as well.

    In any case, I think this merits a new major version.

    #Snakemake #HPC #ReproducibleComputing

  32. 🚀 Wow, someone out there thought, "Hey, let's mix high-performance computing with... Docker Compose! Because why not?" 🤦‍♂️ Now you can enjoy the thrill of virtualized #SLURM clusters without leaving your couch. Just what the tech world needed: more complexity in a box! 📦✨
    github.com/exactlab/vhpc #highperformancecomputing #DockerCompose #virtualization #techinnovation #complexityinabox #HackerNews #ngated

  33. Our little cluster for molecular dynamics calculations at CURE. Setting up this cluster was a great experience that notably expanded the sysadmin knowledge I had first gained with Undernet. I show it off too little. #cluster #moleculardynamics #cure #udelar #diy #physics #debian #slurm #openmpi

  34. A #Slurm user just confirmed that "yay it works. Pretty sick!"

    Thanks to excellent feedback from several users, it'll soon be even easier to distribute #rstats code via #HPC job schedulers using future.batchtools

    future.batchtools.futureverse.

    #parallel #futureverse

  35. Finishing some runtime system work; decided to try a #deskpi #super6c cluster board.

    An ITX form factor beowulf cluster is amazing.

    #slurm & #nfs worked out of apt.

    #ucx, #openmpi, #openshmem, #openpmix, #gasnet, & #hpx needed custom compilation.

    @raspberrypi #arm #hpc #supercomputing

  36. Today we reinstalled the CURE molecular dynamics cluster with Debian 12: 5 PCs with OpenMPI and a RAID-5 to store the calculations... the Debian god must be proud of me and pleased with the work #cluster #cure #openmpi #slurm #debian #moleculardynamics #fisica #physics #dinamicamolecular

  37. #SLURM presentations from #SC22 are now available: slurm.schedmd.com/publications
    There's one I had specifically been hoping to see discussed for so long, and it finally happened: Slurm and/or/versus Kubernetes.
    It talks about potentially getting Slurm to work with K8s. That may not strictly be necessary, since engineers tend to run on-prem and K8s separately, but there are still good reasons to think about integration, like managing one infrastructure instead of two.

  40. I just created the #PIRA v0.5.0 release. It now comes with experimental #Slurm integration, so you can run individual measurements through the scheduler.

    Go check it out at github.com/tudasc/PIRA/release

    #HPC #MPI #Profiler #ScoreP #MetaCG

  41. Later today we'll release #PIRA v0.5.0

    Most changes are somewhat invisible to the user *except* the new #Slurm support. \o/
    You can now run PIRA on the login node and it will dispatch the actual measurements to your Slurm cluster. Be aware that this is still highly experimental. But it's there!

    #HPC