If, during a resumed run, a task is submitted for execution rather than picked up from the previous run's cache, it is difficult to tell why Nextflow's task hash calculation changed between runs.
Nextflow's cache mechanism works by creating a unique fingerprint for each task: a hash calculated by hashing each of the task's components (its inputs, the process name, the script block, etc.) and then hashing those hashes together.
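To illustrate the "hash the components, then hash the hashes" idea, here is a minimal sketch in Python. It uses SHA-256 and sorts components by name purely for illustration; it is not Nextflow's actual hashing implementation, and the component names (`process`, `script`) are hypothetical examples.

```python
import hashlib
from typing import Dict, Tuple


def hash_value(value: str) -> str:
    """Hash a single task component (SHA-256 used here as an illustrative stand-in)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def task_fingerprint(components: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hash each component, then hash the component hashes into one fingerprint.

    Returns the final fingerprint plus the per-component hashes, which are the
    values you would need to see in order to explain a cache miss.
    """
    component_hashes = {name: hash_value(value) for name, value in components.items()}
    combined = hashlib.sha256()
    # Iterate in a fixed (sorted) order so the same components always give the same fingerprint.
    for name in sorted(component_hashes):
        combined.update(name.encode("utf-8"))
        combined.update(component_hashes[name].encode("utf-8"))
    return combined.hexdigest(), component_hashes


if __name__ == "__main__":
    fp_a, _ = task_fingerprint({"process": "FASTQC", "script": "fastqc reads.fq"})
    fp_b, _ = task_fingerprint({"process": "FASTQC", "script": "fastqc -t 2 reads.fq"})
    # A change to any single component changes the final fingerprint.
    print(fp_a == fp_b)  # False
```

The final fingerprint alone only says *that* something changed; the per-component hashes are what reveal *which* component changed.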
Nextflow has the `dumpHashes = true` configuration option, which prints each final task hash along with the hash of each of its component inputs. By comparing these hashes across runs, users can identify which of the task inputs has changed.
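A minimal sketch of that comparison in Python, assuming the component hashes for one task have already been extracted from each run's dumped output into a dictionary keyed by component name. The parsing step is not shown, and the component names and hash values below are hypothetical.

```python
from typing import Dict, Optional, Tuple


def changed_components(
    previous: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every component whose hash differs, appeared, or disappeared between two runs.

    Each entry maps a component name to (hash in previous run, hash in current run);
    None means the component was absent in that run.
    """
    changes: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(previous.keys() | current.keys()):
        before = previous.get(name)
        after = current.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    run1 = {"process": "a1f3", "script": "9c2e", "input:reads": "77b0"}
    run2 = {"process": "a1f3", "script": "d410", "input:reads": "77b0"}
    print(changed_components(run1, run2))  # {'script': ('9c2e', 'd410')}
```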
Unfortunately, this option is off by default, so when a cache miss occurs, the user has to re-run the workflow two more times with `dumpHashes = true` enabled (at least four runs in total) just to get two runs whose hashes can be compared.
It would be a huge time saver if Nextflow reported the hashes to Platform on every run, and Platform included those hash details in the task details modal. This would make cache misses far more actionable and less opaque.