Dynamic resource labels
planned
Brass Wildcat
Users frequently request the ability to associate cloud service provider tags with workflow runs. This feature aims to annotate resources utilized by workflow runs with cloud-specific tags, enhancing cost allocation, monitoring and tracking capabilities from the cloud provider's administration console.
The current limitations include the inability to add tags to Seqera Platform elements in shared workspaces. Additionally, resource labels lack parametric/dynamic behavior (such as workflowId=${workflowId}), making it challenging to track users and individual workflows for cost allocation reporting.
The proposed change is two-fold:
- Add resource labels to items in shared workspaces.
- Introduce template resource labels, allowing fixed names with flexible values at launch (e.g. ${workflowId} and ${sessionId}). This enhancement significantly improves cost allocation tracking and the management experience.
Implementing these enhancements aligns the resource label system with user expectations, providing a more flexible and powerful solution for resource annotation and tracking.
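For comparison, the two values named in the proposal can already be approximated from a nextflow.config, which is the workaround commenters in this thread describe. The fragment below is a minimal sketch, assuming the run is launched by Seqera Platform so that the TOWER_WORKFLOW_ID environment variable is set in the head job. The 'unknown' fallback is an illustrative choice and not Platform behavior.

```groovy
// nextflow.config (sketch; label names mirror the proposed template values)
process.resourceLabels = {
    [
        // Platform run ID read from the head job environment; not set outside Platform
        workflowId: System.getenv('TOWER_WORKFLOW_ID') ?: 'unknown',
        // Nextflow session ID of the current run
        sessionId : workflow.sessionId.toString()
    ]
}
```

The label names stay fixed while the values are resolved at run time, which is the behavior template labels would offer without per-pipeline configuration.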
Rob Newman
Charcoal Mandrill This is currently planned, with the initial implementation making both ${workflowId} and the Nextflow ${sessionId} available as resource label values. Later milestones will include additional dynamic resource label values.
Charcoal Mandrill
Rob Newman thanks for the update. Here is the code from the nextflow.config that illustrates how we are currently implementing this for cost tracking, hoping that Platform could do it better.
// apply AWS Tags to the Batch Jobs
// NOTE: workspaceMap, localhostname and sanitizeAwsBatchTags are defined elsewhere
// in the config (not shown), and JsonSlurper requires `import groovy.json.JsonSlurper`
process.resourceLabels = {
    // get the workspace ID from env var that Seqera Platform embeds in head job
    // we have configured one workspace per tenant
    def towerWorkspaceId = System.getenv("TOWER_WORKSPACE_ID")

    // get the user provided tenant
    def pipelineTenant = params.tenant.toString()

    // check if it's in the known workspace map
    if (workspaceMap.containsKey(towerWorkspaceId)) {
        pipelineTenant = workspaceMap[towerWorkspaceId]["tenantName"].toString()
    }

    def default_labels = [
        // label from params value
        pipelineSubmitter: params.submitter.toString(),
        pipelineTenant: pipelineTenant,

        // labels from head job environment variables
        towerWorkflowId: System.getenv("TOWER_WORKFLOW_ID"),
        towerWorkspaceId: towerWorkspaceId,
        awsBatchJobId: System.getenv("AWS_BATCH_JOB_ID"),

        // dynamic from pipeline task
        pipelineProcess: task.process.toString(),
        pipelineTaskTag: task.tag.toString(),
        pipelineTaskCPUs: task.cpus.toString(),
        pipelineTaskMemory: task.memory.toString(),
        pipelineTaskAttempt: task.attempt.toString(),
        pipelineTaskContainer: task.container.toString(),

        // dynamic from workflow
        pipelineRunName: workflow.runName.toString(),
        pipelineSessionId: workflow.sessionId.toString(),
        pipelineResume: workflow.resume.toString(),
        pipelineRevision: workflow.revision.toString(),
        pipelineCommitId: workflow.commitId.toString(),
        pipelineRepository: workflow.repository.toString(),
        pipelineName: workflow.manifest.name.toString(),
        pipelineWorkDir: workflow.workDir.toString(),
        pipelineLaunchDir: workflow.launchDir.toString(),
        pipelineScriptFile: workflow.scriptFile.toString(),
        pipelineUsername: workflow.userName.toString(),
        pipelineHostname: localhostname.toString()
    ]

    // import and parse a JSON string passed in for more resource labels
    def jsonObj = new JsonSlurper().parseText(params.resourceLabels)
    def updated_labels = default_labels + jsonObj
    updated_labels = sanitizeAwsBatchTags(updated_labels)
    return updated_labels
}
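The sanitizeAwsBatchTags helper called above was not posted. The following is a hypothetical stand-in, written as plain Groovy and sketched under the assumption that it only needs to enforce the general AWS tag limits: keys up to 128 characters, values up to 256 characters, and a character set of letters, digits, spaces and `_ . : / = + - @`. Dropping labels whose value is null and replacing disallowed characters with an underscore are illustrative choices, not the original author's behavior.

```groovy
// Hypothetical stand-in for sanitizeAwsBatchTags; the original helper was not shared.
def sanitizeAwsBatchTags(Map labels) {
    // Matches any character outside letters, digits, spaces and _ . : / = + - @
    def disallowed = '[^\\p{L}\\p{N} _.:/=+\\-@]'

    // Drop labels that have no value (e.g. an env var that is not set)
    def present = labels.findAll { key, value -> value != null }

    return present.collectEntries { key, value ->
        // Replace disallowed characters, then truncate to the AWS length limits
        def cleanKey   = key.toString().replaceAll(disallowed, '_').take(128)
        def cleanValue = value.toString().replaceAll(disallowed, '_').take(256)
        [(cleanKey): cleanValue]
    }
}

// Example: '#', '(' and ')' are replaced, the null entry is dropped
assert sanitizeAwsBatchTags([pipelineTaskTag: 'sample#1 (rep)', awsBatchJobId: null]) ==
       [pipelineTaskTag: 'sample_1 _rep_']
```

Whether a plain method like this can live directly in nextflow.config depends on the Nextflow version and config parser in use, so treat the placement as an assumption as well.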
Rob Newman
planned
Rob Newman
evaluating
Charcoal Mandrill
Not sure if this was already covered in the described scope, but in addition to the other comments, an important consideration for my own purposes has been the ability to include dynamic values in the resource labels. This is a code snippet I have been playing with for this:
// in the nextflow.config file
process.resourceLabels = { [
    pipelineProcess: task.process.toString(),
    pipelineUser: workflow.userName.toString(),
    pipelineSessionId: workflow.sessionId.toString(),
    pipelineResume: workflow.resume.toString(),
    pipelineRevision: workflow.revision.toString(),
    pipelineCommitId: workflow.commitId.toString(),
    pipelineRepository: workflow.repository.toString(),
    pipelineRunName: workflow.runName.toString(),
    pipelineName: workflow.manifest.name.toString()
] }
However, I also want to expand this to include some of the following:
- metadata about the Tower execution, including the Tower username, Tower run ID, Tower pipeline used, etc.
- an arbitrary list of key:value pairs
The arbitrary list of resource labels would be provided outside of the Nextflow pipeline by the Tower user launching the pipeline from the web dashboard, via pipeline launch with the tw CLI tool, or even by launching Nextflow directly without the involvement of Tower. The intention of this "arbitrary list of resource labels" is to encompass things such as cost centers, as someone commented, along with any other business-specific labels that are needed.
Right now, I could hack it together by using some params field that accepts a list of mappings to append to process.resourceLabels. I can also hack it together by using a wrapper script to dynamically create a Nextflow config file and then passing that in with the pipeline execution. However, the code to implement these things gets pretty hairy. Some kind of consistent, built-in approach as described in this feature request sounds preferable.
This is related to the ongoing efforts to allow for applying configs at the compute environment level ( https://feedback.seqera.io/feature-requests/p/allow-setting-nextflowconfig-settings-on-compute-environments-or-seqera-platform ), since some of these resource labels would also be specific to each compute environment, while others would not be and would be applied based on different criteria.
Rob Newman
acknowledged
Rob Newman
Merged in a post:
Is it possible to introspect a tower user in a workflow for tagging in Resource Labels?
Aquamarine Gibbon
Love the new resourceLabels directive. Curiosity question: do you think it's possible, when launching from Tower, to have the user's "username" added as a resourceLabels directive?
Maybe something along the lines of:
process my_task {
    resourceLabels user: "$workflow.userName"

    '''
    <task script>
    '''
}
I guess I'm just not sure if/how one would access the user's name when things are launched via Seqera Platform/Tower.
Amber Marlin
We would like to add cost centers to workflows/resources. Moreover, we have a list of about 70 such cost centers and would like our users to be required to enter one of the values from that list when provisioning.
Charcoal Mandrill
Amber Marlin this specific use case you describe also makes me wonder if something like this could be implemented via the nextflow_schema.json (giving a list of params and requiring a choice). I think this would only be applicable from the Seqera Platform front-end, though, and might need extra coding to enforce the requirement inside the pipeline. But it would be great if these kinds of features could be part of the Platform directly.
Prospective Parrotfish
+1 on this request. This would be a very useful feature to have and make tracking more robust.