Nextflow needs to be able to access metadata from Seqera Platform
acknowledged
Flamingo pink Python
Similar & related:
- https://feedback.seqera.io/feature-requests/p/ability-to-pass-data-from-the-oidc-user-token-info-into-the-compute-pipeline-con
- https://feedback.seqera.io/feature-requests/p/template-resource-labels
Right now, Nextflow has the built-in implicit task, workflow, and nextflow variables, which provide important information inside the pipeline about the execution context (docs: https://www.nextflow.io/docs/latest/script.html#implicit-variables ). This should ideally be expanded, somehow, to give an easy and consistent interface to metadata about the Seqera Platform execution context. For example, there could be an implicit variable such as workflow.tower or tower that includes things like Run ID, User, Pipeline details, etc. As the first linked Feature Request details, this could also encompass things such as Okta auth user metadata provided by Seqera Platform. And as the other linked Feature Request details, this information would be critical for inclusion in things like resourceLabels tagging for cloud resource labels and cost tracking, along with whatever other purposes people typically use the workflow and nextflow variables for.
If there were an easy way to pass these critical metadata values into your Nextflow pipeline, you could also use them in Nextflow config files, at the config level, so you wouldn't have to hard-code them into your main pipeline script. These config files could be applied at the Compute Environment level ( https://feedback.seqera.io/feature-requests/p/allow-setting-nextflowconfig-settings-on-compute-environments-or-seqera-platform ) or based on other desired criteria.
I don't know what "Category" to apply for this Feature Request, so please update the category here if you are able to, thanks.
-----
Here is a tl;dr, with a real-world use case that would put all these things together:
- make metadata available inside the Nextflow pipeline with some magic implicit variable
- use those implicit variables in a config file to apply things like resourceLabels for AWS Batch cloud resource tagging
- stick those config files in the Compute Environment (in Tower) so that all pipelines executed on that CE automatically get their Batch jobs tagged
Rob Newman
Merged in a post:
Expose the Seqera Platform username to the Nextflow run
Rob Syme
We can use the resourceLabels directive to attach labels to runs, which is extremely useful for higher-resolution cost tracking. A common request is to be able to see the cost of runs per user. At the moment this is impossible because the Platform does not pass the username to Nextflow.
A quick win would be to set the SP username as an environment variable in the head job. This can be read by Nextflow and used as a resourceLabel tag (if the user provided the appropriate configuration).
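As a rough sketch of the "appropriate configuration" mentioned above, a nextflow.config could read such a variable and turn it into a label. The variable name PLATFORM_USERNAME and the label key platformUser are hypothetical; the Platform does not set such a variable today, which is what this request is about. This assumes the classic config syntax, where System.getenv can be called directly; a stricter config parser may need a different way to read the environment.

```groovy
// nextflow.config (sketch, classic config syntax)
// PLATFORM_USERNAME is a hypothetical variable name used for illustration only.
process {
    resourceLabels = [
        // Fall back to 'unknown' when the variable is not set,
        // e.g. when the pipeline is launched outside Seqera Platform.
        platformUser: System.getenv('PLATFORM_USERNAME') ?: 'unknown'
    ]
}
```

Because this lives in a config file rather than in the pipeline script, the same snippet could be attached to any pipeline without changing its code.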
Flamingo pink Python
similar to these?
https://feedback.seqera.io/feature-requests/p/nextflow-needs-to-be-able-to-access-metadata-from-the-tower
https://feedback.seqera.io/feature-requests/p/template-resource-labels
> A quick win would be to set the SP username as an environment variable in the head job. This can be read by Nextflow and used as a resourceLabel tag (if the user provided the appropriate configuration)
Does this feature already exist? Or is that part of the feature request as well?
I think if we are going to start introducing environment variables that can supply extra resourceLabels to append to the current set of resourceLabels in the pipeline, then it might be something similar to the demo here:
https://github.com/stevekm/nextflow-demos/blob/master/resourceLabels/nextflow.config
Instead of making a Nextflow env var only for the Seqera Platform username, we might as well use some syntax for embedding a collection of metadata in an env var for ingestion by Nextflow. In that linked demo, a JSON string is used. For this feature request, maybe it could look like this:
export NXF_RESOURCELABELS='{"PlatformUsername": "my-name", "PlatformRunID":"123456", ... }'
If it were implemented something like this, it would be usable both by Seqera Platform for the current feature request and for other use cases. The effort needed to implement such a feature in an open-ended, extensible manner might not be much greater.
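Until something like that exists natively, a config file could do the parsing itself. The sketch below is not an existing Nextflow feature: NXF_RESOURCELABELS is the hypothetical variable proposed above, the pipeline label is a made-up placeholder, and it assumes the classic config syntax, which allows arbitrary Groovy expressions (a stricter config parser may reject this).

```groovy
// nextflow.config (sketch, classic config syntax)
// NXF_RESOURCELABELS is hypothetical; Nextflow does not read it on its own.
process {
    // Start from the labels the pipeline already defines (placeholder value),
    // then append whatever the JSON object in the environment provides.
    // An unset variable is treated as an empty JSON object.
    resourceLabels = [pipeline: 'my-pipeline'] +
        new groovy.json.JsonSlurper().parseText(System.getenv('NXF_RESOURCELABELS') ?: '{}')
}
```

With the export line above, tasks would carry the PlatformUsername and PlatformRunID labels in addition to the pipeline label. If the same key appears on both sides, the value from the environment wins.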
Ben Sherman
I think we could do this by exposing a function in the nf-tower plugin which provides some metadata object. Then you could include the plugin function and use it anywhere, rather than relying on an implicit variable
Flamingo pink Python
Ben Sherman that is an interesting idea, I did not know there was already a plugin for nf-tower. However, I also wonder whether using a plugin would make it difficult to use this metadata in the nextflow.config?
Ben Sherman
Flamingo pink Python That's a good point. In that case I would add a property to the workflow variable, e.g. workflow.seqeraPlatform. We recently added similar properties for wave and fusion.
btw nf-tower is a core plugin in the Nextflow codebase, it contains the integration with Seqera Platform
Rob Newman
acknowledged