Seqera Feedback Forum

Powered by Canny

Seqera Platform Feature Requests

Feature requests for the Seqera Platform (https://cloud.seqera.io)

Datasets improvements (M1): User experience improvements

Datasets within Seqera Platform facilitate structured handling of the input sample sheets required for genomics pipelines such as RNA-seq. Currently, researchers often face friction in assembling these datasets because they need to prepare CSV files externally. This could be improved by the following:

1. Remove/increase Dataset limit: Users are currently limited to 1000 datasets per workspace. Action: Raise or remove the limit.
2. Improved Dataset listing: Currently, the dataset listing is relatively basic and low density. Around 6 datasets can be viewed in a typical desktop browser window. This low density, plus the lack of tooling, prevents users from navigating or identifying datasets quickly. Action: A metadata-rich table view, such as those used on the Runs page and elsewhere, would be preferable. Such a table could include the dataset name, number of rows, author, creation date, last used date, and potentially the start of the description. The table should be sortable.
3. Dataset creation from Pipeline Launch UI: Currently, creating datasets requires users to navigate away from the pipeline launch page, disrupting their workflow and causing friction. Action: Integrate dataset creation directly within the pipeline launch interface, so users can upload or enter sample data without leaving their primary task.
4. Enhanced Dataset details user interface: Currently, the dataset details view emphasizes metadata over actual dataset content, forcing users to scroll unnecessarily. Action: Show the dataset's actual content prominently at the top, so users can quickly verify the dataset's accuracy and completeness, reducing errors and improving productivity.
5. "Archive" or "deactivate" a Dataset: Datasets are often used once or twice and are then no longer actively needed. For GxP/clinical environments, the dataset should not be deleted/removed, but made inactive/disabled/"archived"/"deactivated". This allows inspection of the dataset, but it would be excluded from any pipeline launches. The table of datasets can be filtered to remove inactive/disabled/"archived"/"deactivated" entries. Action: Allow datasets to be tagged as "inactive/disabled/archived/deactivated" and allow filtering of datasets to show/hide archived entries.
6. Import or link a Dataset from a URL: Datasets are becoming increasingly available via URL links. Currently, users are forced to fetch them locally and then add them, which adds friction, wastes storage, and loses the reference to the original source. Action: Support direct URL import/linking.
7. Keep a record of Dataset usage in Runs: Currently, it is difficult or impossible to know whether a dataset has ever been used. This makes datasets of very limited use after a run. Action: Record Dataset usage within Run history. Datasets then become a powerful tool for the user: they act as a rich history of run inputs, agnostic to the specifics of pipeline design and file usage.
8. Improve how Dataset versioning works: A user should be able to choose any dataset and version as the source of a pipeline run, and that dataset and version should be displayed correctly in the "Datasets" tab of the pipeline Run details page.

Additional potential milestones: Integration with new Nextflow data lineage.
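To make the archive flag and the sortable listing more concrete, here is a minimal sketch of how they might behave. It is an illustration only and does not reflect the Seqera Platform data model or API: the `Dataset` class, its fields, and the `list_datasets` and `launchable` helpers are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    # Hypothetical record; field names are assumptions, not the Platform schema.
    name: str
    rows: int
    author: str
    created: date
    archived: bool = False


def list_datasets(datasets, include_archived=False, sort_by="name", descending=False):
    """Return datasets for a table view, hiding archived entries by default."""
    visible = [d for d in datasets if include_archived or not d.archived]
    return sorted(visible, key=lambda d: getattr(d, sort_by), reverse=descending)


def launchable(datasets):
    """Archived datasets stay inspectable but are never offered at launch."""
    return [d for d in datasets if not d.archived]
```

Under these assumptions, archiving is a flag rather than a deletion, so the record stays available for inspection while dropping out of the default listing and the launch picker.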

planned

LIMS 'lite' capabilities available via Seqera Platform

We do not have a full-blown LIMS and hope that Seqera Platform could provide some of the dataset management features we need. The Datasets file management seems like a step in that direction, but we would also need to be able to search by field, automate dataset creation/deletion, and map sequencing runs to samples. Specifically, we want to:

- Organize datasets by project (hierarchically or by label).
- Link from datasets to the analysis runs on that data.
- Associate metadata (a location or a reference ID to another database) with a dataset as a whole, rather than with each line inside the CSV file, and then manage that metadata through the CLI/API and use it in a Nextflow pipeline run.
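As a rough illustration of the request above, the sketch below groups datasets by a project label and searches them by dataset-level metadata. The record layout (`name`, `project`, `metadata`) and both helper functions are hypothetical; nothing here is an existing Seqera Platform CLI or API call.

```python
from collections import defaultdict


def group_by_project(datasets):
    """Group dataset names under their project label."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[ds.get("project", "unassigned")].append(ds["name"])
    return dict(groups)


def search_by_metadata(datasets, **criteria):
    """Return datasets whose dataset-level metadata matches every criterion."""
    return [
        ds
        for ds in datasets
        if all(ds.get("metadata", {}).get(key) == value for key, value in criteria.items())
    ]


# Hypothetical records: metadata is attached to the dataset as a whole,
# not to individual rows of the CSV file.
datasets = [
    {"name": "run_01", "project": "projA", "metadata": {"lims_id": "X-1"}},
    {"name": "run_02", "project": "projA", "metadata": {"lims_id": "X-2"}},
    {"name": "run_03", "metadata": {"lims_id": "X-3"}},
]
```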

acknowledged

Use text boxes to supply simple file content to pipelines

Having to upload a file to a Seqera Platform dataset just to run a workflow with simple file content adds unnecessary complexity to our usage patterns. This is particularly pronounced with the fetchngs workflow, for example, where a single identifier may be all that is required. We still have to write that identifier to a file and upload it to a dataset before it can be used in a workflow. It would be great if a text box in the UI could optionally be used to supply the file content, which would be written to a file (and dataset?) behind the scenes, reducing the number of steps needed to run workflows.
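The "behind the scenes" step could look something like the sketch below: text-box content is cleaned up and written to a temporary file that is then handed to the workflow. This is only an assumption about how such a feature might work; `text_to_input_file` is a hypothetical helper, and the identifier in the usage line is a placeholder.

```python
import tempfile
from pathlib import Path


def text_to_input_file(text, suffix=".csv"):
    """Write non-empty, stripped lines of text-box content to a temporary file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no content supplied")
    # delete=False keeps the file on disk after the handle is closed,
    # so its path can be passed on as a pipeline input.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write("\n".join(lines) + "\n")
    return Path(handle.name)


# Placeholder identifier typed into the hypothetical text box.
input_path = text_to_input_file("  ID_0001 \n\n")
```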

acknowledged

Datasets improvements (M2): Creation from pipeline launch and inline editing

Datasets within Seqera Platform facilitate structured handling of the input sample sheets required for genomics pipelines such as RNA-seq. Currently, researchers often face friction in assembling these datasets because they need to prepare CSV files externally. This could be improved by the following:

1. Dataset creation from Pipeline Launch: Currently, creating datasets requires users to navigate away from the pipeline launch page, disrupting their workflow and causing friction. Action: Integrate dataset creation directly within the pipeline launch interface, so users can upload or enter sample data without leaving their primary task.
2. Manual inline Dataset editing: Users currently must create or edit sample sheets externally and then upload these files to create or edit Datasets, which is inefficient. Action: An embedded spreadsheet editor within the platform would allow users to enter data and make edits directly.

evaluating

Datasets improvements (M3): Schema validation and data format expansion

1. Dataset schema validation at launch: Nextflow workflow schema files (e.g. RNASeq) can reference specific schemas for input files. Action: Only datasets that match the input file schema could be selected and/or launched.
2. Dataset schema validation on creation: Automatic schema validation for datasets can be achieved either by building our own validator or by using a third-party tool. Action: Users with elevated permissions can define one or more valid schemas, and each dataset added to the workspace is validated against the schema before being approved for upload to Seqera Platform.
3. Support for Parquet, YAML and JSON formatted data: Support for non-tabular datasets (e.g. arbitrary Parquet, YAML and JSON) is useful for users. Action: Allow upload and editing of Parquet, YAML and JSON datasets.

Additional potential milestones: Integration with new Nextflow data lineage.
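The validation step proposed above can be sketched as a check run before a dataset is accepted. The example below is a simplified stand-in that uses per-column regular expressions instead of a real JSON Schema validator; the column names, patterns, and the `validate_samplesheet` function are assumptions for illustration, not the schema of any particular pipeline.

```python
import csv
import io
import re

# Hypothetical schema: required columns and the pattern each value must match.
SCHEMA = {
    "sample": re.compile(r"\S+"),
    "fastq_1": re.compile(r"\S+\.f(ast)?q\.gz"),
}


def validate_samplesheet(csv_text, schema=SCHEMA):
    """Return a list of error messages; an empty list means the sheet is valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in schema if column not in header]
    if missing:
        return [f"missing column: {column}" for column in missing]
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, pattern in schema.items():
            value = row[column] or ""
            if not pattern.fullmatch(value):
                errors.append(f"line {line_no}: invalid {column!r} value {value!r}")
    return errors
```

A dataset would then be accepted for upload, or offered at launch, only when the returned list is empty.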

evaluating