Seqera Platform Feature Requests

Anonymous

Feature requests for the Seqera Platform (https://cloud.seqera.io)
TowerForge needs to be able to set jobStateLimitActions in AWS Batch Job Queue
We have issues where user-submitted pipelines running in AWS Batch end up with misconfigured resource requests. This is especially the case when Nextflow pipelines use dynamic directives to automatically increase their CPU or memory requirements on retried tasks. Users' Nextflow tasks eventually request more CPU or memory than any of the EC2 instance types in the AWS Batch Compute Environment can provide, which causes the jobs to get stuck in a RUNNABLE state. Worse, this causes the entire AWS Batch Job Queue to grind to a halt and stop running any jobs at all. This is not a mistake on AWS's part; it is the accepted and expected behavior for AWS Batch: a single misconfigured Batch job halts all job execution on the entire Job Queue. See the docs here: https://docs.aws.amazon.com/batch/latest/userguide/job_stuck_in_runnable.html

AWS's "solution" is to update your AWS Batch Job Queue with jobStateLimitActions.action so that jobs stuck in the RUNNABLE state due to MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE or MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT are automatically canceled after a time limit. I am not aware of any alternative method for handling this; it is the procedure recommended by AWS Support themselves.

This is a problem because Seqera Platform's TowerForge interface for creating AWS Batch Compute Environments does not appear to have any way to add these jobStateLimitActions. We cannot apply the jobStateLimitActions in a post-hoc manner ourselves, because the TowerForge CEs are actively managed by Platform via the Platform UI, tw, and seqerakit. We can't make custom changes to a CE outside the scope of what TowerForge has set up, since that breaks any notion of consistency in our "Infrastructure as Code" deployment.

We also cannot make these Job Queue changes ourselves, since the AWS environment is managed and such changes require escalation and support tickets to IT, which is infeasible when we need to re-deploy our CEs repeatedly via tw, seqerakit, etc. Ultimately these features need to be baked into TowerForge so that we can apply them in our scripted Platform configs.
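For reference, a minimal sketch of what TowerForge could apply to the Job Queues it creates. This assumes boto3's batch.update_job_queue call and its jobStateTimeLimitActions parameter (the API spelling of the jobStateLimitActions setting described in the AWS docs); it is not an existing TowerForge feature:

```python
# Sketch: cancel jobs stuck in RUNNABLE due to resource misconfiguration.
# Field names follow the AWS Batch UpdateJobQueue API (jobStateTimeLimitActions).

def runnable_timeout_actions(max_time_seconds: int = 3600) -> list:
    """Build jobStateTimeLimitActions entries for the two misconfiguration
    reasons that stall an entire Job Queue."""
    reasons = [
        "MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE",
        "MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT",
    ]
    return [
        {
            "reason": reason,
            "state": "RUNNABLE",
            "maxTimeSeconds": max_time_seconds,
            "action": "CANCEL",
        }
        for reason in reasons
    ]

def apply_to_queue(queue_name: str) -> None:
    """Post-hoc patch of an existing queue -- exactly what we cannot do by
    hand on a TowerForge-managed CE, hence this request."""
    import boto3  # imported here so the payload builder stays dependency-free
    batch = boto3.client("batch")
    batch.update_job_queue(
        jobQueue=queue_name,
        jobStateTimeLimitActions=runnable_timeout_actions(),
    )
```

If TowerForge exposed the time limit and action as CE options, the same payload could be emitted by our seqerakit configs instead of requiring manual queue edits.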
3 · acknowledged

Improving credentials validation in Seqera Platform/Tower
The inability to apply correct Git credentials is frequently reported by Seqera Platform users. It can cause pipeline executions to fail because Nextflow cannot pull the pipeline code from the Git repository. Additionally, Seqera Platform currently lacks real-time updates for compute environment (CE) statuses, so a CE can be marked as "Available" even when it is not, leading to failed jobs or jobs remaining in a submitted status indefinitely. We aim to improve this user experience by adding a credentials validation feature to the platform:

- Validation on credentials creation: when a user adds new Git credentials, the system must provide a test URL. The system will use this test URL to validate the credentials and either accept or reject them.
- A "Validate" action on the credentials list page: users will be able to validate their credentials and check whether they remain valid over time.
- Periodic system checks: the system will periodically validate the credentials to ensure they have not expired. When the system detects non-functional credentials, a warning alert will be displayed on the list page, and, optionally, an email notification can be sent.
- Pipeline Git credentials reporting: when launching a pipeline, the platform will report which Git credentials were applied to access the pipeline's Git URL, or display an error message if no credentials are found.
- A "refresh/check connectivity" button on the CE page to validate the actual availability of each compute environment. Periodic checks can likewise keep CE statuses accurate without placing the entire responsibility on administrators.
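A minimal sketch of the validation step proposed above, assuming an HTTPS test URL and git ls-remote as the connectivity probe. The helper names are hypothetical, not an existing Platform API:

```python
# Sketch of the proposed credential check: embed the token in the test URL
# and probe the repository with `git ls-remote`. Helper names are
# hypothetical illustrations, not Seqera Platform functions.

import subprocess
from urllib.parse import urlsplit, urlunsplit

def embed_credentials(test_url: str, username: str, token: str) -> str:
    """Return test_url with username:token inserted as HTTPS basic auth."""
    parts = urlsplit(test_url)
    netloc = f"{username}:{token}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

def credentials_are_valid(test_url: str, username: str, token: str) -> bool:
    """Accept or reject the credentials by listing the remote's refs."""
    result = subprocess.run(
        ["git", "ls-remote", embed_credentials(test_url, username, token)],
        capture_output=True,
        timeout=30,
    )
    return result.returncode == 0
```

The same probe could back all of the proposals above: run once on creation, on demand from a "Validate" button, and on a periodic schedule that flips a credential's status to a warning state when the probe fails.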
1 · acknowledged
