I am interested in ways to bring cloud execution features to our SLURM cluster, and potentially to other HPC and HPC-like environments. While we run many of our production workloads with Seqera Platform + AWS Batch + Wave / Fusion + S3, we also maintain AWS ParallelCluster (SLURM) deployments for users whose use cases or preferences fit that context. One of the primary challenges of running a SLURM cluster on AWS is allocating FSx storage, which is expensive and fills up all the time. Currently, users whose pipelines or dev work have large storage footprints have to forgo SLURM and use AWS Batch (with or without Tower) + S3 instead. AWS Batch also allows the use of Wave / Fusion, which is an added bonus.
Ultimately, developers and users are forced to choose between running their pipelines more easily and conveniently on SLURM and migrating everything to AWS Batch. It is also worth noting that many SLURM users may not have AWS Console access, so their visibility into AWS Batch execution and management is impaired, whereas on SLURM all users can easily see and manage their workloads.
If it were possible to use some of these cloud technologies more closely in the ParallelCluster environment, we could let users execute their pipelines locally using SLURM while still using S3 storage for their input files, work dir, and/or publish dir, and possibly also use Wave / Fusion to further reduce the storage footprint of the running pipeline.
There is already a very similar ticket about Fusion support on HPC: https://feedback.seqera.io/fusion-feature-requests/p/fusion-support-for-hpc-environments
Are some of these usages already possible? If not, would it be feasible to expand support for such cloud-based technologies when using SLURM as the Nextflow executor?
To summarize, this is what I have in mind:
  • Nextflow running with SLURM executor
  • input files, work dir, and publish dir on S3
  • bonus: Wave / Fusion, potentially running with Singularity? This might need some of the new Singularity OCI container modes, but I am not sure; see https://www.nextflow.io/docs/latest/config.html#scope-singularity
  • any other cloud tech that might be leveraged from inside of a SLURM job instead of through AWS Batch
  • features that currently work on SLURM but not on AWS Batch, such as Nextflow Secrets, would perhaps be usable this way
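As a rough sketch of the setup summarized above, the configuration I would hope to be able to write might look something like the following. The bucket name and paths are placeholders, and `params.outdir` is just a common pipeline convention, not something specific to our pipelines. Whether the SLURM executor actually accepts this combination is exactly what this request is asking, so please treat it as an illustration of intent and not as a tested config.

```groovy
// Hypothetical nextflow.config; 'my-bucket' and all paths are placeholders.

// Submit each task as a SLURM job
process.executor = 'slurm'

// Work dir on S3 instead of FSx (the main ask of this request)
workDir = 's3://my-bucket/work'

// Publish dir on S3; assumes the pipeline reads an 'outdir' parameter
params.outdir = 's3://my-bucket/results'

// Bonus: Wave + Fusion, with Singularity as the container runtime
wave.enabled        = true
fusion.enabled      = true
singularity.enabled = true
```

Input files would then simply be given as `s3://` paths on the command line or in a samplesheet, the same way they are for our AWS Batch runs.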
Sorry that the scope of this feature request is kind of big. The S3 usages are the most pertinent changes (if a change is needed), but it would be great if it could be expanded to include some of these other aspects.