"Dead man's" switch for in-flight compute jobs when the head node dies
planned
Shamrock green Gopher
Currently, with the AWS Batch executor, if the head job dies (e.g. from an OOM error), the remaining in-flight compute jobs keep running and are not cancelled. Ideally, Tower would kill the child compute jobs once the parent has died (I believe it already tracks the native task IDs from Batch).
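To make the request concrete, here is a rough sketch of the kind of check I have in mind. It is not how Tower works internally (I don't know that); it only assumes that something holds the head job ID and the list of child Batch job IDs. The function name `reap_orphans` and its parameters are made up for illustration. The `describe_jobs` and `terminate_job` calls are the ones exposed by the boto3 Batch client.

```python
# Hypothetical sketch: cancel in-flight child jobs once the head job has failed.
# `batch` is expected to behave like boto3.client("batch"); all other names
# here are illustrative assumptions, not Tower or Nextflow APIs.

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def chunks(items, size=100):
    """Yield slices of at most `size` items (describe_jobs accepts up to 100 IDs)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def reap_orphans(batch, head_job_id, child_job_ids, reason="Head job died"):
    """Terminate children that are still active if the head job is FAILED.

    Returns the list of job IDs for which termination was requested.
    """
    head = batch.describe_jobs(jobs=[head_job_id])["jobs"]
    # Do nothing if the head job is unknown or has not failed.
    if not head or head[0]["status"] != "FAILED":
        return []

    killed = []
    for group in chunks(list(child_job_ids)):
        for job in batch.describe_jobs(jobs=group)["jobs"]:
            if job["status"] not in TERMINAL_STATES:
                # Queued jobs are cancelled; STARTING/RUNNING jobs are terminated.
                batch.terminate_job(jobId=job["jobId"], reason=reason)
                killed.append(job["jobId"])
    return killed
```

With real credentials this could be driven by `reap_orphans(boto3.client("batch"), head_id, child_ids)` on a timer or from a Batch job state change event. Whether a head job that ends in `SUCCEEDED` should also trigger cleanup is a design choice left open here.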