Continuous cache pruning | Seqera Feedback Forum

Continuous cache pruning

acknowledged

Medical Grouse

Our Nextflow pipelines generate very large intermediate files, sometimes 6-8x the size of the input data. Storing all this data can be a burden, especially on resource-constrained HPC systems; our pipelines sometimes fail simply because they run out of scratch space. This could be (partially) addressed if we could clean up the scratch directories _during pipeline execution_.
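One way to picture cleaning scratch space during execution is reference counting: each intermediate file tracks how many downstream tasks still need to read it, and it is deleted as soon as that count reaches zero. The following is a minimal, hypothetical Python sketch of that idea; the `EagerCleaner` class and its methods are illustrative names, not part of Nextflow or nf-boost, and a real implementation would also have to keep files that are published as final outputs.

```python
import os
import tempfile


class EagerCleaner:
    """Hypothetical tracker that deletes an intermediate file as soon as
    every downstream task that reads it has finished."""

    def __init__(self):
        # path -> number of consumer tasks that have not finished yet
        self.pending = {}

    def register_consumer(self, path):
        """Call once for each downstream task that will read `path`."""
        self.pending[path] = self.pending.get(path, 0) + 1

    def consumer_finished(self, path):
        """Call when a downstream task that read `path` completes.

        Returns True if this was the last consumer and the file was deleted.
        Raises KeyError if `path` was never registered.
        """
        self.pending[path] -= 1
        if self.pending[path] > 0:
            return False
        del self.pending[path]
        if os.path.exists(path):
            os.remove(path)  # reclaim scratch space immediately
        return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    bam = os.path.join(workdir, "aligned.bam")  # illustrative file name
    with open(bam, "w") as fh:
        fh.write("intermediate data")

    cleaner = EagerCleaner()
    cleaner.register_consumer(bam)  # e.g. a sort task
    cleaner.register_consumer(bam)  # e.g. a QC task

    cleaner.consumer_finished(bam)
    print(os.path.exists(bam))  # True: one consumer still needs it
    cleaner.consumer_finished(bam)
    print(os.path.exists(bam))  # False: deleted after the last consumer
```

Because the file is gone once its last consumer finishes, a later `-resume` cannot simply reuse it, which is the trade-off discussed in this thread.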

February 12, 2024

Ben Sherman

To your question about resumability: it is possible to resume a run whose intermediate files have been deleted, as long as you have enough metadata about the deleted tasks, such as the file checksums. It just requires a fair amount of refactoring "under the hood". I hope that the nf-boost plugin will reduce the urgency of this feature so that we can take our time with the resumability piece and make sure we get it right.
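A minimal sketch of the idea in this reply, assuming content checksums are the metadata that gets kept: before an intermediate file is deleted, its size and SHA-256 checksum are recorded in a manifest, and a downstream task's cache key is computed from that record when the file itself is gone. The function names and the manifest layout are hypothetical, and this is not how Nextflow actually computes its task hashes.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_with_record(path, manifest):
    """Record size and checksum of `path` in `manifest`, then delete the file."""
    manifest[path] = {"size": os.path.getsize(path), "sha256": file_checksum(path)}
    os.remove(path)


def input_fingerprint(path, manifest):
    """Fingerprint an input from the file if it exists, else from the manifest."""
    if os.path.exists(path):
        return file_checksum(path)
    return manifest[path]["sha256"]  # KeyError: deleted without a record


def task_cache_key(script, input_paths, manifest):
    """Hypothetical cache key: hash of the task script plus its input fingerprints."""
    digest = hashlib.sha256(script.encode())
    for path in sorted(input_paths):
        digest.update(input_fingerprint(path, manifest).encode())
    return digest.hexdigest()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    sorted_bam = os.path.join(workdir, "sorted.bam")  # illustrative file name
    with open(sorted_bam, "wb") as fh:
        fh.write(b"intermediate data")

    manifest = {}
    before = task_cache_key("index sorted.bam", [sorted_bam], manifest)
    delete_with_record(sorted_bam, manifest)
    after = task_cache_key("index sorted.bam", [sorted_bam], manifest)
    print(before == after)  # True: the key survives deletion of the input
```

In practice the manifest would have to be persisted with the run metadata so it survives between runs; the in-memory dict here is only for illustration.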

Ben Sherman

Hi Peter, the basic cleanup functionality has been implemented in the PR, but there are some other challenges around resumability that we still need to figure out. I hope to have it ready by 24.10 but can't make any guarantees at the moment. In the meantime, since I have heard from many users who don't need resumability, I have created a plugin called nf-boost (https://github.com/bentsherman/nf-boost) that provides basic cleanup as an experimental feature. You are welcome to try it out and share feedback on the plugin's GitHub repository, and if it works for you, use it for the time being. All of the improvements in nf-boost will be incorporated into the core feature when it is merged.

Medical Grouse

(sorry for slow reply)

This PR looks great; is there a timeline for getting it merged? Just to be clear, I think we would accept that a cleanup='eager' mode would sacrifice resumability. Surely it has to, since you've cleaned up the files you would use to resume?

Rob Newman

marked this post as acknowledged

Rob Newman

Medical Grouse: Does this Nextflow draft PR satisfy your request? https://github.com/nextflow-io/nextflow/pull/3849

If so, please let us know and we'll close this feature request here. Thanks!