Datasets improvements - user experience

complete

Rob Newman

Datasets within Seqera Platform facilitate structured handling of input sample sheets required for genomics pipelines such as RNA-seq. Currently, researchers often face friction in assembling these datasets, needing to prepare CSV files externally. This could be improved by the following:
1. Improved Dataset listing:
 Currently, the dataset listing is relatively basic and low-density: around 6 datasets are visible in a typical desktop browser window. This low density, plus the lack of tooling, makes it hard for users to navigate the list or identify datasets quickly. Action:
 A metadata-rich table view, such as those used on the Runs page and elsewhere, would be preferable. Such a table could include the dataset name, number of rows, author, creation date, last used date, and potentially the start of the description. The table would be sortable.
2. Enhanced Dataset details user interface:
 Currently, the dataset details view emphasizes metadata over the actual dataset content, which forces users to scroll unnecessarily. Action:
 Show the dataset's actual content prominently at the top, so users can quickly verify the dataset's accuracy and completeness. This reduces errors and improves productivity.
3. "Show" or "hide" a Dataset:
 Datasets are often used once or twice, and then no longer actively needed. For GxP/clinical environments, the dataset should not be deleted/removed, but made hidden/"deactivated". This allows inspection of the dataset, but it would be excluded from any pipeline launches. The table of datasets can be filtered to remove "hidden"/"deactivated" entries. Action:
 Allow datasets to be tagged as “hidden” and allow filtering of datasets to show/hide entries.
4. Keep a record of Dataset usage in Runs:
 Currently, it’s difficult to know if a dataset has ever been used. This makes their utility post-usage very limited. Action:
 With a record of Dataset usage within Run history, Datasets suddenly become a powerful tool for the user. They act as a rich history of run inputs, agnostic to the specifics of pipeline design and file usage.
7. Improve how Dataset versioning works:
 A user should be able to choose any dataset and version as the source of a pipeline run, and that dataset and version should be displayed correctly in the “Datasets” tab of the pipeline Run details page.
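
To make items 1 and 3 above concrete, here is a minimal Python sketch of a sortable listing that excludes hidden datasets by default. It is an illustration only, not Seqera Platform's data model or API: `DatasetRecord`, its fields, and `list_datasets` are all hypothetical names chosen to mirror the columns proposed in item 1.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    # Hypothetical fields mirroring the proposed table columns.
    name: str
    rows: int
    author: str
    created: date
    hidden: bool = False  # "hidden"/"deactivated" flag from item 3


def list_datasets(records, sort_key="name", include_hidden=False):
    """Return records sorted by one column, skipping hidden ones by default."""
    visible = [r for r in records if include_hidden or not r.hidden]
    return sorted(visible, key=lambda r: getattr(r, sort_key))


records = [
    DatasetRecord("rnaseq-batch-2", 48, "alice", date(2025, 3, 1)),
    DatasetRecord("rnaseq-batch-1", 24, "bob", date(2025, 1, 15), hidden=True),
    DatasetRecord("fetchngs-ids", 5, "alice", date(2025, 2, 10)),
]

# Launch form: hidden datasets are excluded, sorted by creation date.
launchable = [r.name for r in list_datasets(records, sort_key="created")]

# Audit view: hidden datasets remain inspectable, sorted by name.
everything = [r.name for r in list_datasets(records, include_hidden=True)]
```

The hidden flag only affects what is listed; the record itself is never deleted, which is the property the GxP/clinical use case depends on.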
Additional potential milestones
Integration with new Nextflow data lineage

May 29, 2025

Rob Newman

updated the status to

complete

Available in Cloud today, part of the Enterprise v25.3 in early December 2025.

Rob Newman

Merged in a post:

Add labels to datasets for improved searchability

Canny AI

I would like the ability to add tags to datasets to make them more searchable, similar to how pipeline labels work. This would help in easily finding datasets, especially when working with large amounts of data.

September 29, 2025

Rob Newman

updated the status to

in progress

Rob Newman

Follow on milestones:

Rob Newman

updated the status to

planned

Rob Newman

Merged in a post:

Improvements to Dataset Metadata, Naming Limits, and Search Functionality

Neon carrot Eagle

Customer is requesting a few minor features when working with Datasets:

Additional metadata: Date/time created, Author
Increase name length limit (previously requested): Remove hard limit on the name length
Improve search functionality: Search does not currently index ID or Description fields

July 7, 2025

Rob Newman

updated the status to

evaluating

Inquisitive Reindeer

Our Datasets needs are more directed towards automation/semi-automation. So for us, the things that are missing are:

If possible, an unlimited number of Datasets would be great for traceability of workflows.
Navigation of the Datasets could use improving. It would be nice to be able to sort by date (created/modified) or in alphabetical order. (If more Datasets are allowed, this will be critical.)
Pagination within a Dataset is also missing; it would be nice for quickly inspecting a samplesheet.

Rob Newman

Merged in a post:

Support for YAML in Datasets

Currently only CSV and TSV are supported in Datasets, but I have some data in YAML format that I would love to upload and use.

May 29, 2025

Rob Newman

Merged in a post:

Datasets editing enhancements

Esha Joshi

Current Limitation: The Datasets platform currently lacks basic editing capabilities for samplesheets (TSV/CSV), requiring users to modify files externally when working with SRA Explorer imports or other data sources.

Requested Features:

Column Removal:
Allow users to drop unwanted columns from datasets/samplesheets. For example, this is essential for cleaning up SRA Explorer imports and making them compatible with pipelines (e.g. nf-core/rnaseq).
Header Name Editing:
Enable editing of column header names. This ties into the previous point and will make it much easier to adapt samplesheets to be compatible with pipelines.
Manual Sample Entry:
Add ability to manually enter data via free text. This is useful for adding individual SRA samples or IDs without having to generate a text file externally then import it (e.g. "SAMN1000000" as a sample ID used in nf-core/fetchngs).
Provenance:
All edits should automatically create new versions in the dataset's history, maintaining a clear record of modifications and enabling users to revert changes if needed.

These features will significantly reduce the need for external file editing and streamline the workflow preparation process, especially when working with SRA Explorer data.
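
For reference, the external editing these features would replace can be sketched in a few lines of Python using only the standard `csv` module. This is a rough illustration of the requested operations (drop columns, rename headers, append manually entered rows), not how the Platform would implement them. The `edit_samplesheet` helper and the column names `Accession`, `Title`, `fastq_1`, and `sample` are hypothetical and are not taken from a real SRA Explorer export or pipeline schema.

```python
import csv
import io


def edit_samplesheet(text, drop=(), rename=None, extra_rows=(), delimiter=","):
    """Return a new samplesheet with columns dropped/renamed and rows appended."""
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Keep the original column order, minus the dropped columns.
    kept = [name for name in reader.fieldnames if name not in drop]
    header = [rename.get(name, name) for name in kept]

    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        writer.writerow([row[name] for name in kept])
    # Manually entered rows use the new header names; missing fields stay empty.
    for row in extra_rows:
        writer.writerow([row.get(name, "") for name in header])
    return out.getvalue()


# Hypothetical export with one unwanted column and a header to rename.
sra_export = "Accession,Title,fastq_1\nSRR0000001,example run,reads_1.fastq.gz\n"

edited = edit_samplesheet(
    sra_export,
    drop={"Title"},
    rename={"Accession": "sample"},
    extra_rows=[{"sample": "SAMN1000000"}],
)
print(edited)
```

The function returns a new string rather than modifying its input, which matches the provenance request: each edit yields a new version and the previous one stays intact. Pass `delimiter="\t"` for TSV files.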

May 29, 2025
