Pipeline Data Lifecycle Management and Cleanup
acknowledged
Adam Talbot
This post consolidates multiple related requests around managing pipeline run data on the Seqera Platform, specifically in the areas of log persistence and cleanup of scratch/intermediate data.
- Persistent Logs After Run Completion
  - Support for archiving essential execution data (.log, .timeline, .sh scripts, and reports) to a persistent S3 bucket, either Tower-managed or user-defined; a sketch of this flow follows the list below.
  - Allows run logs and metadata to remain visible in the Tower UI after work directories are deleted.
  - Enables integration with custom lifecycle rules for scratch storage.
  - Preserves artifacts critical for auditability, troubleshooting, and provenance.
- Manual Clean-Up via Tower UI: a “Clean” action in the run details interface to:
  - Trigger cleanup of intermediate files and caches (e.g., nextflow clean -f <run_name>).
  - Reclaim storage post-execution while retaining archived logs.
  - Offer fine-grained control over data cleanup, one run at a time.
- Auto Clean-Up Option at Launch
  - A toggle in the pipeline launch form to enable automatic cleanup when a run completes.
  - Useful for test or debug workflows where data retention is not required.
  - Prevents accumulation of scratch data without manual intervention.
- Clean-Up on Run Deletion
  - An optional checkbox when deleting a run to also remove associated scratch/workdir data.
  - Ensures that run deletion can include cleanup of underlying execution storage.
  - Offers a single-step way to retire a workflow and its resource footprint.
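For the log-persistence request, here is a minimal sketch of the intended flow using today's tooling; it assumes a configured aws CLI on the launch host, and the bucket and run name are placeholders rather than an existing Platform feature:

```bash
# Minimal sketch: archive run artifacts to a persistent bucket so the
# scratch work directory can be purged later. Names are placeholders.
RUN_NAME=my_run
ARCHIVE=s3://my-org-pipeline-logs/$RUN_NAME

nextflow run nextflow-io/hello -name "$RUN_NAME" \
    -with-timeline timeline.html -with-report report.html

# Copy the artifacts the Tower UI renders to persistent storage.
aws s3 cp .nextflow.log "$ARCHIVE/nextflow.log"
aws s3 cp timeline.html "$ARCHIVE/timeline.html"
aws s3 cp report.html   "$ARCHIVE/report.html"
```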
Out of Scope
- Automatic deletion of archived logs when a run is deleted.
- Cleanup functionality for in-progress or running workflows.
- Advanced or rule-based retention policies; current scope is limited to manual actions and simple toggles.
Anton Tsyganov-Bodounov
Merged in a post:
Simpler Execution Log Persistence
Brass Wildcat
After a pipeline completes on Seqera Platform, it can be tricky to clean up execution work folders (to save space) while preserving useful log files.
This workflow could be improved by enabling Seqera Platform/Nextflow to send all logs to separate storage – outside of the pipeline run work folder, but still accessible by Seqera Platform.
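A partial workaround that exists today is Nextflow's top-level -log option, which writes the execution log outside the work folder from the start. A sketch, with all paths and the bucket name as placeholders:

```bash
# Sketch: send the Nextflow log to a shared log volume while task
# work files stay on scratch. Paths are placeholders.
nextflow -log /mnt/shared-logs/my_run/nextflow.log \
    run nextflow-io/hello -name my_run \
    -w s3://my-scratch-bucket/work
```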
Anton Tsyganov-Bodounov
Merged in a post:
Clear workdir data for deleted Runs in Tower
Brass Wildcat
Currently, users have large amounts of data on their scratch/workdir spaces, which are shared between compute nodes. It would be good to have a cleanup option for runs so that you can manually clean up the scratch space for that run specifically.
- When a user deletes a run from Tower, the associated workdir data are cleaned up from the client's system
- Suggestion: on delete, execute nextflow clean for the run (sketched below)
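A rough sketch of what such an on-delete hook could execute, assuming the host still has the run's Nextflow metadata; the run name is a placeholder:

```bash
# Hypothetical on-delete behaviour, approximated with today's CLI.
nextflow clean -n my_run       # dry run: preview what would be removed
nextflow clean -f -k my_run    # force-clean work files, keep log entries
```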
Yinmn blue Partridge
This would be very useful in some form.
Net Ox
This is crucial since otherwise there is little point in a scratch directory. All files required to render the Tower interface, such as log files and the files required to render reports, should be preserved in a place other than scratch.
Bronze Crow
How about a “Clean up after run” checkbox/toggle in the pipeline launch parameters? It is not critical, but in some scenarios it could be useful to automatically clean up after the pipeline has finished. I am thinking about testing and debugging new pipelines where you want to make sure they are working well on their own, for example.
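Nextflow already has a knob such a toggle could map onto: the cleanup config option, which deletes work files after a successful run (at the cost of disabling resume for it). A minimal sketch, with the pipeline name as a placeholder:

```bash
# Sketch: emulate a "clean up after run" toggle via Nextflow config.
cat > auto-clean.config <<'EOF'
// Delete files in the work directory on successful completion.
cleanup = true
EOF

nextflow run nextflow-io/hello -c auto-clean.config
```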
Rob Newman
acknowledged
Indigo Wildcat
This would be great. We apply lifecycle rules to clear out our working directory after a few weeks, but we would like to retain Nextflow logs elsewhere (we are not concerned about storage costs for them; they are likely negligible).
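For context, a lifecycle rule of the kind described might look like the following AWS CLI sketch; the bucket, prefix, and retention window are all placeholders:

```bash
# Sketch: expire scratch objects under the work/ prefix after 21 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-scratch-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-workdir",
      "Filter": { "Prefix": "work/" },
      "Status": "Enabled",
      "Expiration": { "Days": 21 }
    }]
  }'
```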
Brass Wildcat
Merged in a post:
Archive logs & files from jobs
Tangerine yellow Ladybug
We would like to see logs and files generated by Nextflow jobs (launched manually or from the platform) archived into a dedicated Tower S3 bucket. The reason is that working directories are purged after a certain amount of time, and so are the CloudWatch logs. If the run stays in Tower, the logs should too.
This should include logs, timeline, reports, etc. Anything shown in the platform should survive work directory purging.
Yellow sunshine Firefly
I strongly agree. Here is a design idea:
As part of the Tower Platform infrastructure, it creates and owns an S3 bucket. When a pipeline completes, it copies the log files from the work directory to the Tower-managed S3 bucket, then associates all log references for the run in Tower with the files in S3. This allows the work directory to be deleted while all logging information is retained. When the run is deleted from Tower, Tower would then delete the log files from the Tower S3 logs bucket.
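Condensed into commands, that lifecycle could look roughly like the sketch below; every bucket, prefix, and identifier is a placeholder, and the real implementation would live on the Platform side rather than being user-run:

```bash
# Hypothetical Tower-managed logs bucket lifecycle; names are placeholders.

# On run completion: copy log artifacts out of the scratch work dir.
aws s3 sync s3://scratch-bucket/work/SESSION_ID/ \
            s3://tower-managed-logs/RUN_ID/ \
            --exclude '*' --include '*.log' --include '*.html'

# On run deletion from Tower: cascade-delete the archived copies.
aws s3 rm --recursive s3://tower-managed-logs/RUN_ID/
```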
Mighty Bird
I strongly second this one.