Add instance ID to Nextflow & Seqera Platform reports

acknowledged

Tangerine yellow Ladybug

We would like to have the Instance ID in the report details. I would assume that the Seqera Platform already gets the Instance ID when looking up the ECS Instances to get pricing.

Having the Instance ID in this view would allow us to troubleshoot issues. Currently, when we get an "Out Of Space" notification, we cannot correlate it with the logs of a terminated instance to look into the issue further. We have to relaunch and hope to catch the issue in the act so that we can identify the instance and pull its logs.

January 30, 2024

Charcoal Mandrill

I was revisiting this once again because this situation comes up so frequently: either I or other users have pipelines running in Seqera Platform with AWS Batch and need to find details about the EC2 instance that a Job is running on. As described here already, the AWS Batch Job ID is reported by both Nextflow and Seqera Platform as the native_id value. To find the EC2 instance ID that the Job is running on, I had to use this method:

    #!/usr/bin/env bash
    # usage: pass the AWS Batch Job ID (the native_id value) as the first argument
    JOB_ID="$1"
    # find the ECS Container Instance from the Batch Job
    ECS_CONTAINER_INSTANCE="$(aws batch describe-jobs --no-cli-pager --jobs "$JOB_ID" --query 'jobs[0].container.containerInstanceArn' --output text)"
    printf 'ECS_CONTAINER_INSTANCE:\n%s\n\n' "$ECS_CONTAINER_INSTANCE"
    # get the name of the ECS Cluster that the ECS Instance is in
    # (second "/"-separated field of the container instance ARN)
    ECS_CLUSTER="$(echo "$ECS_CONTAINER_INSTANCE" | cut -d '/' -f2)"
    printf 'ECS_CLUSTER:\n%s\n\n' "$ECS_CLUSTER"
    # look up the EC2 that the ECS instance is running in
    EC2_ID="$(aws ecs describe-container-instances --no-cli-pager --container-instances "$ECS_CONTAINER_INSTANCE" --cluster "$ECS_CLUSTER" --query "containerInstances[0].ec2InstanceId" --output text)"
    printf 'EC2_ID:\n%s\n\n' "$EC2_ID"
    # look up details about the EC2
    aws ec2 describe-instances --no-cli-pager --instance-ids "$EC2_ID" --query 'Reservations[].Instances[].[InstanceId,State,PublicDnsName,Tags]'
    # or, instead of --instance-ids, use this:
    #--filters "Name=instance-id,Values=$EC2_ID"

Save some version of this to a script, run it with the Batch Job ID as its argument, and you can get the EC2 ID of your running Batch Job in Nextflow & Seqera Platform.
Note that you might need several IAM permissions to allow this, depending on your setup.
It would be fantastic if Seqera Platform and/or Nextflow could somehow just do this itself and include this information in the Seqera Platform Task View details.
Maybe we can also get some version of this in one of those handy Seqera Blog Posts or other docs too :)
Rob Newman Phil Ewels Rob Syme
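
The cluster lookup in the script above relies on the container instance ARN containing the cluster name. As a minimal sketch, assuming the two ARN layouts shown in the comments, the function below (the name cluster_from_arn is made up for illustration) extracts the cluster name and fails loudly when it is absent, for example when the job has not been placed on an instance yet and the query returns "None":

```shell
#!/usr/bin/env bash
# Sketch: extract the ECS cluster name from a container instance ARN.
# Long-format ARN:  arn:aws:ecs:REGION:ACCOUNT:container-instance/CLUSTER/ID
# Short-format ARN: arn:aws:ecs:REGION:ACCOUNT:container-instance/ID  (no cluster name)
# cluster_from_arn is a hypothetical helper name, not part of any AWS tooling.
cluster_from_arn() {
    local arn="$1"
    # strip everything up to and including ":container-instance/"
    local resource="${arn#*:container-instance/}"
    if [ "$resource" = "$arn" ]; then
        echo "not a container instance ARN: $arn" >&2
        return 1
    fi
    case "$resource" in
        */*) echo "${resource%%/*}" ;;   # long format: text before the first slash
        *)   echo "no cluster name in ARN: $arn" >&2; return 1 ;;
    esac
}
```

In the script above this could replace the cut call, e.g. ECS_CLUSTER="$(cluster_from_arn "$ECS_CONTAINER_INSTANCE")" || exit 1, so that the script stops before calling aws ecs with an empty or wrong cluster name.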

Phil Ewels

Thanks for this Charcoal Mandrill, it's helpful. We're aware of this feature request and discussed it again recently. Hopefully it can find a place on the roadmap before too long.

Charcoal Mandrill

I did some investigation into this, since we have needed this feature for a long time, and it seems that a basic implementation is actually simpler than I expected.

This is a simple bash snippet that you can run inside a Nextflow job running in AWS Batch to retrieve almost all of the information that is being asked for in this feature request. It takes advantage of some default AWS Batch env vars and the AWS IMDS service available in standard EC2-based jobs.

    # NOTE: "\$" is escaped for embedding in a Nextflow script string;
    # remove the backslashes to run this as plain bash
    # check if we are running in AWS Batch
    if [ -n "\${AWS_BATCH_JOB_ID+1}" ]; then
        echo "Job ID:              \${AWS_BATCH_JOB_ID:-none}"
        echo "Queue:               \${AWS_BATCH_JOB_QUEUE:-\${AWS_BATCH_JQ_NAME:-none}}"
        echo "Compute Environment: \${AWS_BATCH_CE_NAME:-none}"
        echo "Definition:          \${AWS_BATCH_JOB_DEFINITION:-none}"
        echo "Attempt:             \${AWS_BATCH_JOB_ATTEMPT:-none}"
        aws batch describe-jobs --jobs "\$AWS_BATCH_JOB_ID" --output json > "batch.job.json"

        # wrap the JSON in HTML so Seqera Platform can load it as a Report
        # (json2html.sh is a helper script that is not shown here)
        json2html.sh "batch.job.json" "batch.job.html"
        # try to access the AWS EC2 IMDS ; does not work on Fargate instances
        EC2_ID="\$(curl -s --max-time 5 http://169.254.169.254/latest/meta-data/instance-id)"
        echo "EC2 Instance:        \${EC2_ID:-none}"
        # save the ECS task metadata (URL is quoted so an empty value cannot split the command)
        curl -s "\${AWS_CONTAINER_METADATA_URI_V4:-\${ECS_CONTAINER_METADATA_URI_V4}}" | jq . > ecs.meta.json
        json2html.sh ecs.meta.json ecs.meta.html
        aws sts get-caller-identity --query Account --output text > account_id.txt
    else
        echo "AWS_BATCH_JOB_ID is not set ; not running in AWS Batch"
    fi
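
The plain curl call to the IMDS above only works where IMDSv1 requests are allowed. On instances that require IMDSv2, a session token has to be requested first and sent with the metadata request. A minimal sketch of that flow follows, written as plain bash (no Nextflow escaping). The function name get_instance_id and the IMDS_BASE override are made up for illustration, and the timeouts are arbitrary:

```shell
#!/usr/bin/env bash
# Sketch: fetch the EC2 instance ID via IMDSv2, falling back to IMDSv1.
# IMDS_BASE is a hypothetical override (useful for testing); the default is
# the standard link-local IMDS address.
get_instance_id() {
    local base="${IMDS_BASE:-http://169.254.169.254}"
    local token id
    # IMDSv2: request a short-lived session token with a PUT
    token="$(curl -sf --noproxy '*' --max-time 2 -X PUT \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
        "$base/latest/api/token" 2>/dev/null)" || token=""
    if [ -n "$token" ]; then
        # send the token with the metadata request
        id="$(curl -sf --noproxy '*' --max-time 2 \
            -H "X-aws-ec2-metadata-token: $token" \
            "$base/latest/meta-data/instance-id" 2>/dev/null)" || id=""
    else
        # IMDSv1 fallback: no token available
        id="$(curl -sf --noproxy '*' --max-time 2 \
            "$base/latest/meta-data/instance-id" 2>/dev/null)" || id=""
    fi
    # print a placeholder when the IMDS cannot be reached (e.g. on Fargate)
    echo "${id:-unavailable}"
}
```

Calling echo "EC2 Instance: $(get_instance_id)" then prints either the instance ID or "unavailable", without hanging when no metadata service is reachable.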

It seems like this could just be embedded inside the Nextflow process.beforeScript section if you want this to run at the start of every task. Of course, if you are not using Fusion, you won't be able to see this information until the task completes. I'm not sure what the best way to handle that might be; it could be possible to hard-code an aws s3 cp upload of the data files into the work dir, so the information would be available while the task is running.
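
The aws s3 cp idea could be sketched roughly as follows. This is only an illustration under assumptions: the function name upload_task_metadata and the AWS_CLI override are hypothetical, and the S3 destination has to be supplied by the caller, since this thread does not establish how a task would discover its own work dir on S3.

```shell
#!/usr/bin/env bash
# Sketch: copy metadata files to an S3 prefix while the task is still running.
# AWS_CLI is a hypothetical override (defaults to "aws") so the function can be
# dry-run, e.g. with AWS_CLI=echo.
upload_task_metadata() {
    local dest="${1%/}"   # S3 prefix to upload into; trailing slash removed
    shift
    local f
    for f in "$@"; do
        # skip files that were never created (e.g. the IMDS was not reachable)
        [ -f "$f" ] || continue
        "${AWS_CLI:-aws}" s3 cp --only-show-errors "$f" "$dest/$(basename "$f")" \
            || echo "warning: could not upload $f" >&2
    done
}
```

A call such as upload_task_metadata "s3://my-bucket/work/ab/cdef" batch.job.json ecs.meta.json (the bucket and path are placeholders) would push whichever of the files exist. A failed upload only prints a warning, so it cannot fail the task itself.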

Considering this, it seems like it should have been relatively easy for Nextflow to include this itself in the .nextflow.run for collection by the parent pipeline process, and thus for inclusion in Seqera Platform. Or maybe it's better to bundle it directly into the Nextflow launch template.

Phil Ewels

I believe that Nextflow already reports the instance ID in trace reports - it's shown as native_id (docs: https://nextflow.io/docs/latest/tracing.html#trace-report).
You can see this in the tasks table in Seqera Platform (see screenshot).
Is this enough, Tangerine yellow Ladybug, or would you need it in the modal window as well? (Presumably not that difficult to add.)

Charcoal Mandrill

Phil Ewels, just to clarify: the ID shown here as the "native_id", in the format "174357da-4761-44e2-bfc8-0e43c5b3979c", is not the EC2 instance ID. It appears to be the AWS Batch Job ID. Unfortunately, while that ID is helpful to an extent, it's not enough to solve the problem described here, which is the need to access or investigate the EC2 instance itself for debugging etc. It's clear that Seqera Platform must have access to the EC2 instance in order to display the other metrics, such as the job's resource usage, but it's simply not showing the EC2 instance ID.

Considering the date when this Feature Request was filed, I am not clear whether this has been implemented in Platform yet. It's certainly been a headache when using Platform for quite some time now.

Rob Newman

marked this post as acknowledged

Advisory Spoonbill

It's not that simple. Pricing is for a class of instances, not a specific instance. The instance ID would need to be queried from the ECS service for every task run. Doing this programmatically for every task would result in API rate limiting on the account and possible job submission failures.

Fellow Bee

Advisory Spoonbill: Nextflow already does this

Rob Newman

Tangerine yellow Ladybug - This isn't currently available in Nextflow, but as a workaround have you tried running your pipeline with Fusion v2 to retrieve this information? Fusion allows using an S3 bucket as the pipeline scratch directory, and the .fusionfs.log file of each task folder contains extensive details including the EC2 instance ID, so you can identify which tasks run on each instance.

Yellow sunshine Firefly

Rob Newman: Thanks, Rob. We do not run Fusion as we do not use Wave. We use our own entry point for AWS Batch and now log the instance ID as a workaround until it’s in Tower.