Datasets improvements (M3): Schema validation and data format expansion
evaluating
Rob Newman
1. Dataset schema validation at launch:
Nextflow workflow schema files (e.g. RNASeq) can reference specific schemas for input files. Action:
Only datasets that match the input file schema can be selected and/or launched.
2. Dataset schema validation on creation:
Automatic schema validation for datasets can be achieved either by building our own validator or by using a third-party tool. Action:
Users with elevated permissions can define one or more valid schemas, and each dataset added to the workspace is validated against those schemas before being approved for upload to Seqera Platform.
3. Support for Parquet, YAML and JSON formatted data:
Support for non-tabular datasets (e.g. arbitrary Parquet, YAML and JSON) is useful for users. Action:
Allow upload and editing of Parquet, YAML and JSON datasets.
Additional potential milestones
- Integration with new Nextflow data lineage
Rob Newman
Follow on from M2
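Illustrative sketch for items 1 and 2: the "build our own" option for dataset schema validation might look roughly like the Python below. Everything in it is an assumption made for illustration: the schema layout (a small dictionary loosely modeled on JSON Schema), the column names, and the `validate_dataset` helper are hypothetical and do not describe an existing Seqera Platform or Nextflow API.

```python
import csv
import io
import re

# Hypothetical schema, loosely modeled on JSON Schema.
# Column names and patterns are illustrative assumptions only.
SAMPLESHEET_SCHEMA = {
    "required": ["sample", "fastq_1"],
    "properties": {
        "sample": {"pattern": r"^\S+$"},
        "fastq_1": {"pattern": r"^\S+\.f(ast)?q\.gz$"},
        "fastq_2": {"pattern": r"^\S+\.f(ast)?q\.gz$"},
    },
}


def validate_dataset(csv_text, schema):
    """Return a list of error messages; an empty list means the dataset matches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Check the header first: a missing required column fails the whole dataset.
    errors = [
        f"missing required column: {name}"
        for name in schema["required"]
        if name not in header
    ]
    if errors:
        return errors

    # Check each data row; row 1 is the header, so data starts at row 2.
    for row_number, row in enumerate(reader, start=2):
        for column, rules in schema["properties"].items():
            value = row.get(column) or ""
            if not value:
                # Empty values are only an error for required columns.
                if column in schema["required"]:
                    errors.append(f"row {row_number}: {column} is empty")
                continue
            if not re.fullmatch(rules["pattern"], value):
                errors.append(
                    f"row {row_number}: {column} has invalid value {value!r}"
                )
    return errors
```

Under these assumptions, validation on creation would reject an upload when `validate_dataset` returns any errors, and validation at launch would offer a dataset for selection only when the list is empty for the workflow's input schema. A third-party JSON Schema validator could replace the hand-written checks without changing that flow.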