Datasets within Seqera Platform facilitate structured handling of the input sample sheets required by genomics pipelines such as RNA-seq. Currently, researchers often face friction when assembling these datasets because they must prepare CSV files outside the Platform. The experience could be improved in the following ways:
1. Raise or remove the Dataset limit:
Users are currently limited to 1000 datasets per workspace.
Action:
Raise or remove the limit.
2. Dataset creation from the Pipeline Launch UI:
Currently, creating a dataset requires users to navigate away from the pipeline launch page, which disrupts their workflow.
Action:
Integrate dataset creation directly into the pipeline launch interface so that users can upload or enter sample data without leaving their primary task.
3. Improved Dataset listing:
The dataset listing is currently basic and low-density: only around 6 datasets fit in a typical desktop browser window. This low density, combined with the lack of tooling, prevents users from navigating the list or identifying datasets quickly.
Action:
A metadata-rich table view, like those used on the Runs page and elsewhere, would be preferable. Such a table could include the dataset name, number of rows, creation date, last-used date, and possibly the beginning of the description.
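As a rough illustration of the columns proposed above, the sketch below models one listing row and orders rows by last use, pushing never-used datasets to the bottom. The class and function names (`DatasetRow`, `sort_by_last_used`, `format_table`) are hypothetical and do not reflect any existing Seqera Platform API or data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DatasetRow:
    """One row of a hypothetical dataset listing table (illustrative fields only)."""
    name: str
    row_count: int
    created: datetime
    last_used: Optional[datetime]  # None means the dataset has never been used
    description: str = ""

    def description_preview(self, width: int = 40) -> str:
        """Return the start of the description, truncated with '...' if too long."""
        if len(self.description) <= width:
            return self.description
        return self.description[: width - 3].rstrip() + "..."


def sort_by_last_used(rows: list[DatasetRow]) -> list[DatasetRow]:
    """Most recently used datasets first; never-used datasets go last."""
    used = [r for r in rows if r.last_used is not None]
    never_used = [r for r in rows if r.last_used is None]
    used.sort(key=lambda r: r.last_used, reverse=True)
    return used + never_used


def format_table(rows: list[DatasetRow]) -> str:
    """Render rows as a dense, fixed-width text table (one line per dataset)."""
    lines = [f"{'Name':<20} {'Rows':>6} {'Created':<10} {'Last used':<10} Description"]
    for r in rows:
        last = r.last_used.strftime("%Y-%m-%d") if r.last_used else "never"
        lines.append(
            f"{r.name:<20} {r.row_count:>6} {r.created:%Y-%m-%d} {last:<10} {r.description_preview()}"
        )
    return "\n".join(lines)
```

One line per dataset is what makes this layout dense: dozens of rows fit in the space the current listing uses for about six.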
4. Keep a record of Dataset usage in Runs:
Currently, it is difficult or impossible to know whether a dataset has ever been used. As a result, datasets offer little value once they have been used.
Action:
Record Dataset usage in Run history. With such a record, Datasets become a powerful tool for users: they act as a rich history of run inputs that is agnostic to the specifics of pipeline design and file usage.
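A minimal in-memory sketch of such a record, assuming each run stores a reference to the dataset it consumed, is shown below. `RunRef` and `DatasetUsageLog` are hypothetical names for illustration, not Platform APIs. The same index answers "has this dataset ever been used?" and supplies the last-used date for a listing view.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RunRef:
    """Hypothetical, minimal reference to a run that consumed a dataset."""
    run_id: str
    pipeline: str
    launched: datetime


class DatasetUsageLog:
    """In-memory dataset -> runs index (a sketch, not a real Platform API)."""

    def __init__(self) -> None:
        self._runs_by_dataset: dict[str, list[RunRef]] = defaultdict(list)

    def record(self, dataset_id: str, run: RunRef) -> None:
        """Register that `run` was launched with `dataset_id` as input."""
        self._runs_by_dataset[dataset_id].append(run)

    def runs_for(self, dataset_id: str) -> list[RunRef]:
        """Runs that used the dataset, newest first.

        Uses .get so that querying an unknown ID does not create an empty entry.
        """
        runs = self._runs_by_dataset.get(dataset_id, [])
        return sorted(runs, key=lambda r: r.launched, reverse=True)

    def ever_used(self, dataset_id: str) -> bool:
        return bool(self._runs_by_dataset.get(dataset_id))

    def last_used(self, dataset_id: str) -> Optional[datetime]:
        runs = self.runs_for(dataset_id)
        return runs[0].launched if runs else None
```

Because the index stores only a reference from dataset to run, it stays independent of how any particular pipeline interprets the sample sheet.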
5. Enhanced Dataset details user interface:
The dataset details view currently emphasizes metadata over the dataset's actual content, forcing users to scroll unnecessarily.
Action:
Display the dataset's content prominently at the top of the page so that users can quickly verify its accuracy and completeness, reducing errors and improving productivity.
6. Manual inline Dataset editing:
Users currently must create or edit sample sheets externally and then upload the files to create or update Datasets, which is inefficient.
Action:
An embedded spreadsheet editor would let users enter and edit data quickly and intuitively, directly within the Platform.
7. Support for YAML and JSON formatted data:
Support for non-tabular datasets (e.g., arbitrary YAML and JSON) would be useful for users whose inputs do not fit naturally into a table.
Action:
Allow upload and editing of YAML and JSON datasets.
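As a sketch of the validation step such uploads would need, the function below parses a JSON or YAML dataset and rejects malformed input with a readable error. `parse_dataset` is a hypothetical name; JSON parsing uses the standard library, while YAML parsing assumes the third-party PyYAML package is installed.

```python
import json
from typing import Any


def parse_dataset(text: str, fmt: str) -> Any:
    """Parse an uploaded non-tabular dataset; raise ValueError if it is malformed."""
    fmt = fmt.lower()
    if fmt == "json":
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            # Report the position so users can fix the file in an inline editor.
            raise ValueError(
                f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
            ) from exc
    if fmt in ("yaml", "yml"):
        try:
            import yaml  # PyYAML: third-party, not part of the standard library
        except ImportError as exc:
            raise RuntimeError("YAML support requires PyYAML (pip install pyyaml)") from exc
        try:
            # safe_load avoids constructing arbitrary Python objects from tags.
            return yaml.safe_load(text)
        except yaml.YAMLError as exc:
            raise ValueError(f"invalid YAML: {exc}") from exc
    raise ValueError(f"unsupported dataset format: {fmt!r}")
```

For example, a JSON dataset can hold a variable-length list of FASTQ files per sample, such as `{"samples": [{"id": "S1", "fastq": ["a.fq.gz", "b.fq.gz"]}]}`, which has no natural fixed-column CSV equivalent.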