Pipeline Data Lifecycle Management and Cleanup
acknowledged
Adam Talbot
This post consolidates multiple related requests around managing pipeline run data on the Seqera Platform, specifically in the areas of log persistence and cleanup of scratch/intermediate data.
- Persistent Logs After Run Completion
  - Support for archiving essential execution data (.log, .timeline, .sh scripts, and reports) to a persistent S3 bucket, either Tower-managed or user-defined; a sketch of this flow follows the list below.
  - Allows run logs and metadata to remain visible in the Tower UI after work directories are deleted.
  - Enables integration with custom lifecycle rules for scratch storage.
  - Preserves artifacts critical for auditability, troubleshooting, and provenance.
- Manual Clean-Up via Tower UI: a “Clean” action in the run details interface to:
  - Trigger cleanup of intermediate files and caches (e.g., nextflow clean -f <run_name>).
  - Reclaim storage post-execution while retaining archived logs.
  - Offer fine-grained control over data cleanup, one run at a time.
- Auto Clean-Up Option at Launch
  - A toggle in the pipeline launch form to enable automatic cleanup when a run completes.
  - Useful for test or debug workflows where data retention is not required.
  - Prevents accumulation of scratch data without manual intervention.
- Clean-Up on Run Deletion
  - An optional checkbox when deleting a run to also remove associated scratch/workdir data.
  - Ensures that run deletion can include cleanup of underlying execution storage.
  - Offers a single-step way to retire a workflow and its resource footprint.
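For the log-persistence request, here is a minimal sketch of the intended flow using today's tooling; it assumes a configured aws CLI on the launch host, and the bucket and run name are placeholders rather than an existing Platform feature:

```bash
# Minimal sketch: archive run artifacts to a persistent bucket so the
# scratch work directory can be purged later. Names are placeholders.
RUN_NAME=my_run
ARCHIVE=s3://my-org-pipeline-logs/$RUN_NAME

nextflow run nextflow-io/hello -name "$RUN_NAME" \
    -with-timeline timeline.html -with-report report.html

# Copy the artifacts the Tower UI renders to persistent storage.
aws s3 cp .nextflow.log "$ARCHIVE/nextflow.log"
aws s3 cp timeline.html "$ARCHIVE/timeline.html"
aws s3 cp report.html   "$ARCHIVE/report.html"
```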
Out of Scope
- Automatic deletion of archived logs when a run is deleted.
- Cleanup functionality for in-progress or running workflows.
- Advanced or rule-based retention policies; current scope is limited to manual actions and simple toggles.
Anton Tsyganov-Bodounov
Merged in a post:
Simpler Execution Log Persistence
Brass Wildcat
After a pipeline completes on Seqera Platform, it can be tricky to clean up execution work folders (to save space) while preserving useful log files.
This workflow could be improved by enabling Seqera Platform/Nextflow to send all logs to separate storage – outside of the pipeline run work folder, but still accessible by Seqera Platform.
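A partial workaround that exists today is Nextflow's top-level -log option, which writes the execution log outside the work folder from the start. A sketch, with all paths and the bucket name as placeholders:

```bash
# Sketch: send the Nextflow log to a shared log volume while task
# work files stay on scratch. Paths are placeholders.
nextflow -log /mnt/shared-logs/my_run/nextflow.log \
    run nextflow-io/hello -name my_run \
    -w s3://my-scratch-bucket/work
```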
Anton Tsyganov-Bodounov
Merged in a post:
Clear workdir data for deleted Runs in Tower
Brass Wildcat
Currently, users have large amounts of data on their scratch/workdir spaces, which are shared between compute nodes. It would be good to have a cleanup option for runs so that you can manually clean up the scratch space for that run specifically.
- When a user deletes a run from Tower, the associated workdir data are cleaned up from the client's system
- Suggestion: on delete, execute nextflow clean for the run (sketched below)
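A rough sketch of what such an on-delete hook could execute, assuming the host still has the run's Nextflow metadata; the run name is a placeholder:

```bash
# Hypothetical on-delete behaviour, approximated with today's CLI.
nextflow clean -n my_run       # dry run: preview what would be removed
nextflow clean -f -k my_run    # force-clean work files, keep log entries
```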
Yinmn blue Partridge
This would be very useful in some form.
Net Ox
This is crucial since otherwise there is little point in a scratch directory. All files required to render the Tower interface, such as log files and the files required to render reports, should be preserved in a place other than scratch.
Bronze Crow
How about a “Clean up after run” checkbox/toggle in the pipeline launch parameters? It is not critical, but in some scenarios it could be useful to automatically clean up after the pipeline has finished. I am thinking about testing and debugging new pipelines where you want to make sure they are working well on their own, for example.
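Nextflow already has a knob such a toggle could map onto: the cleanup config option, which deletes work files after a successful run (at the cost of disabling resume for it). A minimal sketch, with the pipeline name as a placeholder:

```bash
# Sketch: emulate a "clean up after run" toggle via Nextflow config.
cat > auto-clean.config <<'EOF'
// Delete files in the work directory on successful completion.
cleanup = true
EOF

nextflow run nextflow-io/hello -c auto-clean.config
```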
Rob Newman
acknowledged
Indigo Wildcat
This would be great. We apply lifecycle rules to clear out our working directory after a few weeks, but we would like to retain Nextflow logs elsewhere (we are not concerned about storage costs for them; they are likely negligible).
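For context, a lifecycle rule of the kind described might look like the following AWS CLI sketch; the bucket, prefix, and retention window are all placeholders:

```bash
# Sketch: expire scratch objects under the work/ prefix after 21 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-scratch-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-workdir",
      "Filter": { "Prefix": "work/" },
      "Status": "Enabled",
      "Expiration": { "Days": 21 }
    }]
  }'
```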
Brass Wildcat
Merged in a post:
Archive logs & files from jobs
Tangerine yellow Ladybug
We would like to see logs and files generated by Nextflow jobs (launched manually or from the platform) archived into a dedicated Tower S3 bucket. The reason is that working directories are purged after a certain amount of time, and so are the CloudWatch logs. If the run stays in Tower, the logs should too.
This should include logs, timeline, reports, etc. Anything shown in the platform should survive work directory purging.
Yellow sunshine Firefly
I strongly agree. Here is a design idea:
As part of the Tower Platform infrastructure, it creates and owns an S3 bucket. When a pipeline completes, it copies the log files from the work directory to the Tower-managed S3 bucket, then associates all log references for the run in Tower with the files in S3. This allows the work directory to be deleted while all logging information is retained. When the run is deleted from Tower, Tower would then delete the log files from the Tower S3 logs bucket.
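Condensed into commands, that lifecycle could look roughly like the sketch below; every bucket, prefix, and identifier is a placeholder, and the real implementation would live on the Platform side rather than being user-run:

```bash
# Hypothetical Tower-managed logs bucket lifecycle; names are placeholders.

# On run completion: copy log artifacts out of the scratch work dir.
aws s3 sync s3://scratch-bucket/work/SESSION_ID/ \
            s3://tower-managed-logs/RUN_ID/ \
            --exclude '*' --include '*.log' --include '*.html'

# On run deletion from Tower: cascade-delete the archived copies.
aws s3 rm --recursive s3://tower-managed-logs/RUN_ID/
```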
Mighty Bird
I strongly second this one.