Error Handling


# Debugging

Errors are automatically reported on the Dashboard and categorized by workflow or task and by error type.

Drill down to see error messages, stacktrace and more
Pause, retry or kill the workflow or tasks

Failures are identified and reported on the Zenaton Dashboard. If a failure occurs in a running instance, we can quickly see the details by clicking on the failed task: the task detail appears with the stacktrace.

# Errors Tab

On the Zenaton Dashboard, the Errors Tab lists all failed tasks, whether they are standalone tasks or part of a workflow.

# Errors details

When we click on an error class, we see all the details about its failed tasks and can retry them all manually.

# Retry

With Zenaton, we can retry failed tasks manually on the dashboard or automatically by defining a retry strategy in the task's code.

# Manual Retry

Log in to the dashboard and retry failed tasks from the workflow tab or from the Errors Tab. On the Errors Tab, we can retry an individual occurrence of an error or 'retry all' occurrences.

# Automatic Retry

Automatic retry requires the Zenaton Agent version 0.8.0 and the Zenaton library version 0.6.3.

We can build in automatic retries for standalone tasks or tasks that are part of a workflow. Tasks can be retried automatically after a specified delay. To enable automatic retries for a task, we must implement an onErrorRetryDelay method in our task.

The onErrorRetryDelay method receives the error as its first parameter and returns a positive number representing the delay, in seconds, to wait before the next try. Returning false, as the examples in this section do, stops further automatic retries.

We can access the execution context of the task using the context property. It will allow us to implement any retry strategy we need.

Here is an example of a task which will automatically be retried at most 3 times, increasing the delay time between each try:

const { task } = require("zenaton");

module.exports = task("SimpleTask", {

    async handle() {
        // [...] task implementation
    },

    onErrorRetryDelay(exception) {
        // The retry index starts at 1 and increases by one for every retry.
        // This can be used to increase the time between each attempt.
        const n = this.context.retryIndex;
        if (n > 3) {
            return false;
        }

        return n * 60;
    }
});
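
To check a strategy like this before deploying it, the method can be called directly with a stand-in context. The sketch below is a minimal, hypothetical harness: linearRetryDelay copies the logic above into a plain function, and the mock object passed as this assumes only the context.retryIndex field described in the comment above. Neither linearRetryDelay nor previewDelays is part of the Zenaton library.

```javascript
// Same logic as onErrorRetryDelay above, extracted into a plain function
// so it can be called without the Zenaton library.
function linearRetryDelay(exception) {
    const n = this.context.retryIndex;
    if (n > 3) {
        return false;
    }
    return n * 60;
}

// Hypothetical helper: lists the delay returned for each retry index,
// stopping at the first false.
function previewDelays(strategy, maxIndex = 10) {
    const delays = [];
    for (let retryIndex = 1; retryIndex <= maxIndex; retryIndex++) {
        // Mock of the task object: only context.retryIndex is provided.
        const mockTask = { context: { retryIndex } };
        const delay = strategy.call(mockTask, new Error("simulated failure"));
        if (delay === false) {
            break;
        }
        delays.push(delay);
    }
    return delays;
}

console.log(previewDelays(linearRetryDelay)); // [ 60, 120, 180 ]
```

The three retries wait 60, 120, and 180 seconds; the fourth call returns false, so no further retry is scheduled.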

We can implement the onErrorRetryDelay method in any manner that suits our needs. Here is an example of an exponential-backoff strategy that is widely used:

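onErrorRetryDelay(exception) {
    const n = this.context.retryIndex;
    // Random integer between min and max, both inclusive.
    const rand = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

    // Wait 5 seconds times a random integer between 0 and 2^n; stop once n exceeds 12.
    return n <= 12 ? 5 * rand(0, 2 ** n) : false;
}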
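
To get a feel for how quickly these delays grow: the delay for retry index n is 5 times a random integer between 0 and 2^n, so it is at most 5 * 2^n seconds. The sketch below tabulates that upper bound; maxBackoffDelay is a hypothetical helper introduced only for illustration.

```javascript
// Upper bound, in seconds, of the delay the strategy above can return
// for retry index n, or false once retries stop (n > 12).
function maxBackoffDelay(n) {
    return n <= 12 ? 5 * 2 ** n : false;
}

for (const n of [1, 2, 3, 6, 12, 13]) {
    console.log(n, maxBackoffDelay(n));
}
// 1 10
// 2 20
// 3 40
// 6 320
// 12 20480
// 13 false
```

The last retry can therefore wait up to 20480 seconds, a little under six hours. Because the random factor can be as low as 0, the actual delay may be much shorter than this bound.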

When a task has automatic retry configured, a failure is still displayed as an error on the dashboard, and we still have the option to retry it manually there. This lets us retry a task right away rather than wait for the automatic retry.

Automatic retry is limited to 100 tries per task. When this limit is reached, the task is still displayed in the list of errors on the dashboard, and we have the option to retry it manually. When a task is manually retried, the automatic retry count restarts.
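
Because onErrorRetryDelay receives the error as its first parameter, the decision to retry can also depend on what went wrong. The following is a minimal sketch under stated assumptions: InvalidInputError and retryDelayFor are hypothetical names, and the 5-retry limit and 30-second step are arbitrary illustrative values. The logic is written as a plain function so it runs without the Zenaton library.

```javascript
// Hypothetical error class used only for this sketch.
class InvalidInputError extends Error {}

// Sketch of a strategy that inspects the error before deciding to retry.
// The retry index is passed explicitly so the function can be run on its own.
function retryDelayFor(error, retryIndex) {
    // Retrying cannot fix bad input, so give up immediately.
    if (error instanceof InvalidInputError) {
        return false;
    }
    // Otherwise retry up to 5 times, waiting 30 seconds longer each time.
    return retryIndex <= 5 ? retryIndex * 30 : false;
}

console.log(retryDelayFor(new InvalidInputError("missing id"), 1)); // false
console.log(retryDelayFor(new Error("connection reset"), 2));       // 60
console.log(retryDelayFor(new Error("connection reset"), 6));       // false
```

Inside a task, onErrorRetryDelay(exception) could delegate to such a function with return retryDelayFor(exception, this.context.retryIndex).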

# Alerting

We will receive alerts whenever an error occurs for a task or workflow so that we can log into our dashboard and investigate or retry the task or resume the workflow.

Depending on our alerting preferences, we can receive the following emails:

Immediate email for the first daily occurrence of a task or decision error (including timeouts)
A daily summary of all the errors from the day before, if any

# Timeouts

Timeouts can occur for various reasons, such as a lack of response from an API or launching workflows without their sources. If a timeout error occurs for one of our tasks or workflows, it appears in the list of errors, where we can see the details.

Note that a timeout occurs if a task lasts more than 5 minutes (max processing time) or if a decision lasts more than 30 seconds.
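
Since an unresponsive API is one cause of timeouts, a task can bound its own calls so that it fails with a clear error well before the 5-minute limit. The following is a minimal sketch using only standard JavaScript; withTimeout, callApi and the 60-second figure are illustrative assumptions, not part of Zenaton, and the sketch assumes that an error thrown from handle is reported like any other task failure.

```javascript
// Rejects if the given promise has not settled within ms milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; the timer is always cleared afterwards.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a task's handle(), where callApi() is a hypothetical
// function returning a promise:
//     const result = await withTimeout(callApi(), 60 * 1000);
```

Note that this only stops waiting for the call; it does not cancel the underlying request.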