Nextflow execution with S3-backed storage on SLURM / HPC, and other cloud features enabled
acknowledged
Flamingo pink Python
I am interested in ways that we could bring cloud execution features to our SLURM cluster, and potentially other HPC and HPC-like environments. Currently, while we run many of our production workloads with Seqera Platform + AWS Batch + Wave / Fusion + S3, we also maintain AWS ParallelCluster (SLURM) deployments for users whose use cases or preferences fit that context. One of the primary challenges with running a SLURM cluster on AWS is the allocation of FSx storage (expensive, and it fills up all the time). Currently, users whose pipelines or dev work have large storage footprints must forgo SLURM and instead use AWS Batch (with or without Tower) + S3. AWS Batch also allows the use of Wave / Fusion, which is an added bonus.
Ultimately, developers and users are forced to choose between running their pipelines more easily and conveniently on SLURM, or migrating everything to AWS Batch. It is also worth noting that many SLURM users may not have AWS Console access, so visibility into AWS Batch execution and management is impaired compared to SLURM, where all users can easily see and manage their workloads.
If it were possible to use some of these cloud technologies more directly in the ParallelCluster environment, we could allow users to execute their pipelines locally using SLURM while still using S3 storage for their input files, work dir, and/or publish dir, and possibly also Wave / Fusion to further reduce the storage footprint of a running pipeline.
There is already a ticket about Fusion support on HPC, which is very similar: https://feedback.seqera.io/fusion-feature-requests/p/fusion-support-for-hpc-environments.
Are some of these usages already possible? If not, would it be feasible to expand support for such cloud-based technologies when using SLURM as the Nextflow executor?
To summarize, this is what I have in mind:
- Nextflow running with SLURM executor
- input files, work dir, and publish dir on S3
- bonus: Wave / Fusion, potentially running with Singularity? Might need to use one of the new Singularity OCI container modes? Not sure: https://www.nextflow.io/docs/latest/config.html#scope-singularity
- any other cloud tech that might be leveraged from inside of a SLURM job instead of through AWS Batch
- features that currently work in SLURM but not on AWS Batch, such as Nextflow secrets, might become usable this way
Sorry the scope of this feature request is kinda big; the S3 usage is the most pertinent change (if a change is even needed), but it would be great if it could be expanded to include some of these other aspects.
Fellow Bee
Slurm + Fusion with AWS S3 is supported as of Nextflow version 23.10.x.
To get this working, the key points are:
- Enable Fusion "freeze" mode, so that an ad-hoc Singularity container is created with the Fusion driver
- Make sure the compute nodes have an IAM instance role that grants access to the required S3 bucket(s)
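As a sketch of the second point, a minimal instance-role policy granting read/write access to a single work-dir bucket might look like the following (the bucket name is a placeholder; tighten the resource scope to fit your own security requirements):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<your-bucket>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<your-bucket>/*"
    }
  ]
}
```

Attach this policy to the role used by the ParallelCluster compute fleet so every SLURM node can reach the bucket without per-user credentials.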
The Nextflow configuration would look like this:

```groovy
process.executor = 'slurm'
singularity.enabled = true
fusion.enabled = true
wave.enabled = true
wave.freeze = true
wave.build.repository = '<your OCI container repository to host Singularity images>'
```
Flamingo pink Python
Fellow Bee
Hey, I had a quick question about this part:
> wave.build.repository = '<your OCI container repository to host Singularity images>'
how do I know if my container registry is OCI-compliant, and whether it can host Singularity containers? We are using AWS ECR for Docker containers; I am not really sure what kind of registry would be needed for us to host Singularity containers too, or what would need to be different to use it with Wave / Fusion as described
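For what it's worth (not an authoritative answer): a registry that supports OCI artifacts can store Singularity SIF images via the ORAS protocol, and AWS ECR does support OCI artifacts. One quick way to check your own registry, assuming you have a local SIF image and push access (the registry URL and repo name below are placeholders):

```shell
# Push a local SIF image to the registry as an OCI artifact via ORAS.
# If this succeeds, the registry can host Singularity images.
singularity push my-image.sif oras://<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest

# Pull it back to confirm the round trip works.
singularity pull oras://<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest
```

Note that with ECR the repository typically has to exist before the first push, and you need to log in first (e.g. via `aws ecr get-login-password` piped to `singularity remote login`).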
Rob Newman
acknowledged
Flamingo pink Python
some updates: these already appear to work
- using S3 files as inputs and outputs (publishDir) with SLURM jobs
- there are docs about using an S3 bucket as a local work dir here: https://www.nextflow.io/docs/latest/fusion.html#local-execution-with-s3-bucket-as-work-directory
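In case it helps others: following those docs, pointing the work dir at a bucket is just a launch option, assuming Fusion / Wave are enabled in `nextflow.config` as shown above and the node's instance role can access the bucket (pipeline and bucket names are placeholders):

```shell
nextflow run <pipeline> -work-dir s3://<your-bucket>/work
```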