Add datasets directly from s3 / data explorer to the platform
in progress
Zircon blue Sturgeon
I am testing the seqerakit command-line utility to add datasets to the platform, and realized that most of the datasets are stored in S3. So I tried:
tw datasets add s3://xxx/samplesheet.csv --name cool-data --header --workspace $SEQERA_ORGANIZATION_NAME/$SEQERA_WORKSPACE_NAME
> ERROR: File path 's3://xxx/samplesheet.csv' do not exists.
Would it be possible to copy the dataset directly from S3 to the platform without downloading it first?
On that note, an even better feature would be to use the "Data Explorer" interface to add Datasets. In "Data Explorer" it is already possible to browse S3 buckets, which in my case contain the data. An "Add as a dataset" option would be a nice addition for cases where a computer-savvy user creates the dataset CSV in S3 and someone more GUI-centric then adds it to the platform.
Rob Newman
in progress
Rob Newman
evaluating
Zircon blue Sturgeon
This is fantastic! I think the reason I missed it earlier when testing is that it takes a few seconds to list the buckets.
I would still keep my (non-essential) request for you to consider adding a tw datasets add feature that works directly with S3, for scripting. Thanks!
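In the meantime, a stopgap for scripting might look like the sketch below. It is only a workaround under some assumptions: the AWS CLI is installed and has credentials for the bucket, tw datasets add only accepts a local file path (as the error above suggests), and add_s3_dataset is a hypothetical helper name, not part of the tw CLI. It stages the samplesheet in a temporary directory, registers it with the same flags used in the original command, and cleans up afterwards.

```shell
#!/usr/bin/env bash
# Workaround sketch: stage an S3-hosted samplesheet locally, then register it with tw.
# add_s3_dataset is a hypothetical helper, not a tw subcommand.
add_s3_dataset() {
  local s3_uri="$1" name="$2" workspace="$3"
  local tmp_dir local_file status=0

  # Temporary staging area for the downloaded file.
  tmp_dir="$(mktemp -d)" || return 1
  local_file="${tmp_dir}/$(basename "$s3_uri")"

  # Download first (assumes tw datasets add needs a local path), then add.
  # If either step fails, keep its exit code so callers can react to it.
  aws s3 cp "$s3_uri" "$local_file" \
    && tw datasets add "$local_file" --name "$name" --header --workspace "$workspace" \
    || status=$?

  # Always remove the staged copy.
  rm -rf "$tmp_dir"
  return "$status"
}

# Example call, using the same placeholders as the original command:
# add_s3_dataset s3://xxx/samplesheet.csv cool-data "$SEQERA_ORGANIZATION_NAME/$SEQERA_WORKSPACE_NAME"
```

This still downloads the file, so it does not replace the requested feature; it only hides the extra step inside a script.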
Rob Newman
acknowledged
Proposed solution provided with existing Data Explorer functionality
Rob Newman
Thanks for your feedback, Zircon blue Sturgeon. You're correct that "Data Explorer" is the best way to select and use S3-hosted datasets in your Seqera Platform Nextflow pipelines (without having to upload them to the Datasets tool first). You don't need to copy the dataset to the platform: it is already available to your pipelines if you have Data Explorer enabled and valid AWS credentials for the S3 bucket that hosts your datasets.
In the pipeline launch form, when choosing your input data, simply select the Browse button. This opens a modal window where you can choose your input data using the Data Explorer interface. This functionality has been available since Enterprise release 23.3.