Exclude selected files from being synced with S3 in Fusion
acknowledged
Amused Mollusk
We observed that some tools, e.g. sentieon driver ..., create many temporary files (>>1k) that are written and deleted during the run. Syncing these files with S3 takes a lot of computing time and often results in Fusion errors or process timeouts that require a process restart.
I suggest creating a mechanism that will allow us to define (using pattern matching) which files are not synchronised by Fusion. This can be done via an ignore list, similar to the .gitignore file.
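To illustrate the idea, here is a minimal sketch of how an ignore list with pattern matching could behave. It is not an existing Fusion feature or API: the pattern list, the function names and the example paths are all hypothetical, and Python's fnmatch is used as a simplified stand-in for full .gitignore semantics (for instance, `*` here also matches across `/`).

```python
from fnmatch import fnmatch

# Hypothetical ignore patterns, one per entry, as they might appear
# in a .gitignore-like file. These are illustrative assumptions only.
IGNORE_PATTERNS = ["*.tmp", "tmp/*", "*.swp"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the relative path matches any ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def files_to_sync(paths, patterns=IGNORE_PATTERNS):
    """Keep only the paths that no ignore pattern matches."""
    return [path for path in paths if not is_ignored(path, patterns)]


if __name__ == "__main__":
    # Hypothetical working-directory contents.
    work_dir = [
        "out.bam",
        "scratch/part-0001.tmp",
        "tmp/chunk.bin",
        "results/calls.vcf",
    ]
    for path in files_to_sync(work_dir):
        print(path)
```

With a mechanism along these lines, short-lived temporary files would be skipped before any upload is attempted, so only real outputs would be synchronised.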
Rob Newman
Merged in a post:
Fusion not handling 0-byte S3 "folder" objects
Fair Penguin
When sequencing data is synced from instruments to S3 using NAS software, it often creates zero-byte objects that serve as folder markers (e.g., s3://bucket/run123/). These objects end in / and are not real files or folders, but Fusion currently treats them as files. As a result, when specifying such a prefix in a pipeline, Fusion sees it as an empty file and fails to run (e.g., demultiplexing pipelines fail due to “missing” data).
This behaviour makes it difficult to work with standard sequencing data pipelines that rely on folder structures in S3. Several of us were surprised by the lack of support for this in Fusion, because it is not just a quirk of copying from a NAS to S3. It is a well-documented AWS feature: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html. The AWS CLI appears to handle folder objects in the expected way, since pipelines that do not use Fusion are able to work with our instrument data.
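As a rough sketch of the behaviour we would expect, the snippet below treats a zero-byte object whose key ends in / as a folder marker and skips it when resolving a prefix. This is not how Fusion is implemented; the function names and the example keys are hypothetical, and the listing is modelled as plain (key, size) pairs rather than a real S3 call.

```python
def is_folder_marker(key, size):
    """A zero-byte object whose key ends in '/' is a folder placeholder."""
    return key.endswith("/") and size == 0


def resolve_prefix(objects, prefix):
    """Return the keys of real files under a prefix.

    `objects` is an iterable of (key, size) pairs, standing in for
    the result of an S3 listing. Folder markers are skipped.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    return [
        key
        for key, size in objects
        if key.startswith(prefix) and not is_folder_marker(key, size)
    ]


if __name__ == "__main__":
    # Hypothetical bucket listing with marker objects mixed in.
    listing = [
        ("run123/", 0),
        ("run123/RunInfo.xml", 1024),
        ("run123/Data/", 0),
        ("run123/Data/s_1.bcl", 2048),
        ("run456/", 0),
    ]
    print(resolve_prefix(listing, "run123"))
```

Under this interpretation, a prefix such as run123/ resolves to the files beneath it instead of to an empty file.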
Drew DiPalma