Datasets within Seqera Platform facilitate structured handling of the input sample sheets required by genomics pipelines such as RNA-seq. Currently, researchers often face friction in assembling these datasets because the CSV files must be prepared externally. The Datasets experience could be improved as follows:
1. Improved Dataset listing:
Currently, the dataset listing is relatively basic and low density: only around 6 datasets fit in a typical desktop browser window. This low density, combined with the lack of tooling, prevents users from navigating the list or identifying datasets quickly.
Action:
A metadata-rich table view, such as those used on the Runs page and elsewhere, would be preferable. Such a table could include the dataset name, number of rows, author, creation date, last used date, and potentially the start of the description. The table should be sortable.
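As an illustration only, the proposed table row and its sorting behaviour could be sketched as below. The class name, field names, and the rule that never-used datasets sort last are assumptions made for this sketch, not part of any existing Seqera Platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetRow:
    # Hypothetical fields mirroring the columns proposed above.
    name: str
    row_count: int
    author: str
    created: date
    last_used: Optional[date]  # None if the dataset has never been used
    description: str = ""

    def description_preview(self, limit: int = 40) -> str:
        """Return the start of the description, truncated for table display."""
        if len(self.description) <= limit:
            return self.description
        return self.description[:limit].rstrip() + "..."


def sort_rows(rows, column, descending=False):
    """Sort rows by a column; rows with no value for it are placed last."""
    present = [r for r in rows if getattr(r, column) is not None]
    missing = [r for r in rows if getattr(r, column) is None]
    present.sort(key=lambda r: getattr(r, column), reverse=descending)
    return present + missing


# Example data (invented for illustration).
rows = [
    DatasetRow("rnaseq_batch_2", 48, "alice", date(2024, 3, 1), date(2024, 3, 5)),
    DatasetRow("rnaseq_batch_1", 24, "bob", date(2024, 1, 10), None),
    DatasetRow("chipseq_pilot", 12, "alice", date(2024, 2, 2), date(2024, 2, 20)),
]
by_last_used = sort_rows(rows, "last_used", descending=True)
```

Keeping never-used datasets at the end regardless of sort direction is one possible design choice; it keeps recently used datasets at the top when sorting by last used date.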
2. Enhanced Dataset details user interface:
Currently, the dataset details view emphasizes metadata over the actual dataset content, which forces users to scroll unnecessarily and slows them down.
Action:
Place the dataset's actual content prominently at the top of the details view so that users can quickly verify its accuracy and completeness, reducing errors and improving productivity.
3. "Show" or "hide" a Dataset:
Datasets are often used once or twice and are then no longer actively needed. In GxP/clinical environments, such a dataset should not be deleted/removed, but made hidden/"deactivated". A hidden dataset would remain available for inspection but would be excluded from any pipeline launches. The table of datasets could then be filtered to remove "hidden"/"deactivated" entries.
Action:
Allow datasets to be tagged as “hidden” and allow filtering of datasets to show/hide entries.
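A minimal sketch of this behaviour, assuming a hypothetical boolean "hidden" flag on each dataset (the names below are illustrative, not an existing API): hidden datasets stay in storage and can still be listed on request, but they are never offered at launch.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    hidden: bool = False  # hypothetical flag; the dataset itself is never deleted


def visible_datasets(datasets, include_hidden=False):
    """Datasets to show in the table; hidden ones appear only when requested."""
    return [d for d in datasets if include_hidden or not d.hidden]


def launchable_datasets(datasets):
    """Datasets offered as pipeline launch inputs; hidden ones are always excluded."""
    return [d for d in datasets if not d.hidden]


# Example data (invented for illustration).
datasets = [
    Dataset("rnaseq_batch_1", hidden=True),
    Dataset("rnaseq_batch_2"),
]
```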
4. Keep a record of Dataset usage in Runs:
Currently, it’s difficult to know whether a dataset has ever been used. This makes datasets of very limited use once a run has completed.
Action:
Record Dataset usage within Run history. With such a record, Datasets become a powerful tool for the user: they act as a rich history of run inputs, agnostic to the specifics of pipeline design and file usage.
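One way such a usage record might be modelled is sketched below. The record fields and the in-memory log are assumptions for illustration; a real implementation would presumably persist this alongside Run history.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DatasetUsage:
    # Hypothetical record written each time a run is launched from a dataset.
    dataset_id: str
    version: int
    run_id: str
    launched_at: datetime


class UsageLog:
    """In-memory stand-in for a persisted dataset usage history."""

    def __init__(self):
        self._by_dataset = defaultdict(list)

    def record(self, usage):
        self._by_dataset[usage.dataset_id].append(usage)

    def runs_for(self, dataset_id):
        """Run IDs that used the dataset, in the order they were recorded."""
        return [u.run_id for u in self._by_dataset.get(dataset_id, [])]

    def last_used(self, dataset_id):
        """Most recent launch time, or None if the dataset was never used."""
        usages = self._by_dataset.get(dataset_id, [])
        return max((u.launched_at for u in usages), default=None)


# Example data (invented for illustration).
log = UsageLog()
log.record(DatasetUsage("ds-1", 1, "run-a", datetime(2024, 3, 1, 9, 0)))
log.record(DatasetUsage("ds-1", 2, "run-b", datetime(2024, 3, 4, 14, 30)))
```

The same record could also supply the "last used date" column proposed for the dataset listing.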
7. Improve how Dataset versioning works:
A user should be able to choose any dataset and version as the source of a pipeline run, and that dataset and version should be displayed correctly in the “Datasets” tab of the pipeline Run details page.
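As a rough sketch of the intended behaviour, a run could carry an explicit reference to the dataset and version it was launched from, validated at launch time and rendered as-is on the Run details page. All names below are hypothetical and the version catalogue is a plain dictionary for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    # Hypothetical pointer stored on a run: which dataset, and which version.
    dataset_id: str
    version: int


def resolve(ref, available_versions):
    """Validate a reference against a mapping of dataset ID to version numbers."""
    versions = available_versions.get(ref.dataset_id)
    if versions is None:
        raise KeyError(f"unknown dataset: {ref.dataset_id}")
    if ref.version not in versions:
        raise ValueError(f"dataset {ref.dataset_id} has no version {ref.version}")
    return ref


def run_details_label(ref):
    """Text for the Datasets tab: always the version actually used by the run."""
    return f"{ref.dataset_id} (version {ref.version})"


# Example data (invented for illustration): an older version is selected.
catalogue = {"rnaseq_batch_1": [1, 2, 3]}
chosen = resolve(DatasetRef("rnaseq_batch_1", 2), catalogue)
label = run_details_label(chosen)
```

Storing the version on the run itself, rather than looking up the latest version at display time, is what keeps the Run details page accurate after the dataset is updated.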
Additional potential milestones