Improve cost estimation to include storage and data transfer costs in the Seqera Platform
acknowledged
Brass Wildcat
To provide more accurate cost estimations for users running workflows, there is a need to enhance the platform's ability to estimate data storage (e.g., EBS) and ingress/egress transfer costs.
However, due to potential limitations of cloud providers and the fact that instances can host tasks from multiple workflows, ensuring the accuracy of these estimates may be challenging. The feasibility of this solution will need to be carefully assessed.
Michael Tansini
Merging this into "Improve cost estimation to include storage and data transfer costs", which covers the broader cost estimation improvement. Your votes have been transferred.
Michael Tansini
Merged in a post:
Include worst-case scenario cost estimate
Edmund Miller
We have a compute environment (CE) that only has c5n type instances, because we commonly use just a 1:2 CPU to memory ratio for our processes. We ran nf-core/methylseq in this CE, and the bwamem_align step requests 8 CPUs and 64GB of RAM. We noticed it was running on a c5n.18x and figured out that was because we only let the compute environment run with c5s. It was only using 8 CPUs per instance instead of the 32. Just a sanity check: to avoid this mismatch of resource requests, should we allow the CE to use M4, C4, and R4 instance types? The other option would be to tune the requests per process from the nf-core pipelines.
We noticed an inconsistency with the Seqera Platform cost estimate. It estimated the cost of an 8 CPU/64GB job, even though the job ran on a 72 CPU/192GB box (across multiple jobs) and the majority of the huge instance sat there idle. So the reported cost was $39, but our AWS bill for that time period was much higher.
Could it make sense to also show a 'worst-case-scenario' cost (the price of the whole instance)?
Or has there been any thought to resolving the direct cost across all tasks and running instances?
Does this cost estimate also include any other AWS fees? For example, disk, data transfer, etc.?
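To make the gap between the two numbers concrete, here is a minimal sketch of a task-share estimate next to a 'worst-case' whole-instance estimate. It is not how Seqera Platform computes its figure; the share rule (the larger of the CPU and memory fractions), the class and function names, the 3-hour runtime, and the $4.00 hourly price are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    vcpus: int
    memory_gb: float
    hourly_price: float  # hypothetical placeholder, not a real AWS price


@dataclass(frozen=True)
class Task:
    cpus: int
    memory_gb: float
    hours: float


def proportional_cost(task: Task, instance: Instance) -> float:
    """Charge the task only for the share of the instance it requested.

    Assumption: the share is the larger of the CPU and memory fractions.
    """
    share = max(task.cpus / instance.vcpus, task.memory_gb / instance.memory_gb)
    return share * instance.hourly_price * task.hours


def worst_case_cost(task: Task, instance: Instance) -> float:
    """Charge the task for the whole instance for as long as the task ran."""
    return instance.hourly_price * task.hours


# Shapes taken from the post: an 8 CPU/64GB task on a 72 CPU/192GB box.
box = Instance(vcpus=72, memory_gb=192, hourly_price=4.00)
align = Task(cpus=8, memory_gb=64, hours=3.0)

print(f"task-share estimate: ${proportional_cost(align, box):.2f}")
print(f"worst-case estimate: ${worst_case_cost(align, box):.2f}")
```

With these made-up inputs the worst-case figure is three times the task-share figure, which is the kind of spread an idle, oversized instance would produce.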
Rob Newman
Merged in a post:
Roll up billing/cost per workgroup or per tag
Eggshell Anglerfish
I would like a way to easily roll up cost(s) of a given project:
- Show me total cost for workspace XYZ over date-range YYYY-MM-DD -- YYYY-MM-DD
- Show me total cost for all workflows with tag/label XYZ
Ideally even showing cost(s) broken down into various buckets:
- Compute
- Disk (EBS)
- Transfer
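The roll-up described above could look roughly like the sketch below. It assumes per-run cost records are already available with compute, disk, and transfer amounts split out; the `RunCost` record, the `roll_up` function, and the sample values are hypothetical and do not correspond to any existing Seqera Platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class RunCost:
    """Hypothetical per-run cost record with one amount per bucket."""
    workspace: str
    labels: frozenset
    day: date
    compute: float
    disk: float
    transfer: float


def roll_up(
    runs: Iterable[RunCost],
    *,
    workspace: Optional[str] = None,
    label: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> dict:
    """Sum costs per bucket for runs matching every filter that is set."""
    totals = {"compute": 0.0, "disk": 0.0, "transfer": 0.0}
    for run in runs:
        if workspace is not None and run.workspace != workspace:
            continue
        if label is not None and label not in run.labels:
            continue
        if start is not None and run.day < start:
            continue
        if end is not None and run.day > end:
            continue
        totals["compute"] += run.compute
        totals["disk"] += run.disk
        totals["transfer"] += run.transfer
    totals["total"] = sum(totals.values())
    return totals


# Made-up sample records for illustration only.
runs = [
    RunCost("XYZ", frozenset({"projA"}), date(2024, 1, 5), 10.0, 2.0, 0.5),
    RunCost("XYZ", frozenset({"projB"}), date(2024, 2, 1), 4.0, 1.0, 0.25),
    RunCost("other", frozenset({"projA"}), date(2024, 1, 10), 8.0, 0.5, 0.25),
]

print(roll_up(runs, workspace="XYZ", start=date(2024, 1, 1), end=date(2024, 1, 31)))
print(roll_up(runs, label="projA"))
```

The first call answers the workspace-over-date-range question and the second answers the per-label question; both return the same bucket breakdown.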
Rob Newman marked this post as acknowledged
Indigo Wildcat
I've run into similar issues too, where the reported cost of the job in Seqera Platform does not match what I see in actual AWS costs; the differences can be dramatic (2x or higher). Storage costs account for some of the discrepancy (it would be great to track those if possible), but I think most of it is due to inefficient packing of jobs into instances (the scenario described above, or a single, small, long-running job forcing a larger instance to stay active). Is there a way to track the total costs for all the instances used for the job?
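One way to account for idle capacity is to spread each instance's full billed cost over the workflows that ran tasks on it, so the per-workflow figures add back up to the bill. The sketch below weights the split by CPU-hours; that weighting, the function name, and the sample numbers are assumptions for illustration, not a description of how the platform or AWS attributes cost.

```python
def allocate_instance_cost(instance_cost: float, cpu_hours_by_workflow: dict) -> dict:
    """Split one instance's full billed cost across workflows by CPU-hours.

    Idle time is absorbed proportionally, so the shares sum to instance_cost.
    """
    total_cpu_hours = sum(cpu_hours_by_workflow.values())
    if total_cpu_hours == 0:
        # No recorded usage: nothing to weight by, so attribute nothing.
        return {workflow: 0.0 for workflow in cpu_hours_by_workflow}
    return {
        workflow: instance_cost * cpu_hours / total_cpu_hours
        for workflow, cpu_hours in cpu_hours_by_workflow.items()
    }


# Made-up example: an instance billed at $40 that hosted two workflows.
shares = allocate_instance_cost(40.0, {"workflow-a": 24.0, "workflow-b": 8.0})
print(shares)
```

Summing these shares over every instance a workflow touched would give a total that includes the cost of poorly packed or idle instances.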
Rob Newman
Currently the cost estimate only includes compute; it does not include storage or networking.