Agent Event Threshold Settings

Overview

One of the main functions of the controller is to keep track of connected agents and report when an agent starts to experience performance problems, logs many errors in a short time, or unexpectedly goes offline. Agent events are held in the controller's configured database in the agent_events table. Within this table, you can find three types of event categories: agent, metric, and task. The agent event category is used for major connectivity events, such as loss of connectivity from an agent. The metric event category is used for events that report abnormal agent health statistics, such as abnormally high CPU usage. The task event category does not affect metrics, and is detailed in the Gateway Task Results section.

General Settings

Alarm Evaluation

This section applies to activity and metric events. You can configure alarms to trigger when an event is reported at the warning or error level, and you can set the alarm pipelines that will process the generated alarms.


Enable Activity Alarms: If true, alarms will be generated for agent activity events, such as when an agent stops responding.

Enable Metrics Alarms: If true, alarms will be generated for agent metric events.

Warning Priority: The priority assigned to all of the warning thresholds. Options are: Diagnostic, Low, Medium, High, Critical.

Error Priority: The priority assigned to all of the error thresholds. Options are: Diagnostic, Low, Medium, High, Critical.

Active Pipeline: The Pipeline assigned to all of the warning thresholds. Note that this Pipeline must be created before the alarm event happens, and that the name is case-sensitive.

Ack Pipeline: The Pipeline assigned to all of the error thresholds. Note that this Pipeline must be created before the alarm event happens, and that the name is case-sensitive.
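
As an illustration of how the settings above interact, the following is a minimal sketch, not the controller's actual implementation. The class name AlarmSettings, the function alarm_priority, and the event kind strings "activity" and "metric" are hypothetical names chosen for this example. The priority values used in the usage lines are arbitrary picks from the listed options, not documented defaults.

```python
from dataclasses import dataclass
from typing import Optional

# The five priority options listed for Warning Priority and Error Priority.
PRIORITIES = ("Diagnostic", "Low", "Medium", "High", "Critical")


@dataclass
class AlarmSettings:
    """Hypothetical container mirroring the Alarm Evaluation settings."""
    enable_activity_alarms: bool
    enable_metrics_alarms: bool
    warning_priority: str
    error_priority: str

    def __post_init__(self) -> None:
        # Reject any priority that is not one of the documented options.
        for priority in (self.warning_priority, self.error_priority):
            if priority not in PRIORITIES:
                raise ValueError(f"unknown priority: {priority!r}")


def alarm_priority(settings: AlarmSettings, event_kind: str, level: str) -> Optional[str]:
    """Return the priority of the alarm to raise, or None if no alarm is raised.

    event_kind is "activity" or "metric" (assumed labels); level is
    "warning" or "error".
    """
    enabled = {
        "activity": settings.enable_activity_alarms,
        "metric": settings.enable_metrics_alarms,
    }
    if not enabled.get(event_kind, False):
        return None  # alarms for this kind of event are switched off
    if level == "warning":
        return settings.warning_priority
    if level == "error":
        return settings.error_priority
    return None  # levels other than warning/error never raise an alarm


# Example: metric alarms disabled, activity alarms enabled.
settings = AlarmSettings(
    enable_activity_alarms=True,
    enable_metrics_alarms=False,
    warning_priority="Low",
    error_priority="High",
)
print(alarm_priority(settings, "activity", "error"))
print(alarm_priority(settings, "metric", "error"))
```

With these example settings, an activity event at the error level maps to the High priority, while a metric event produces no alarm because Enable Metrics Alarms is false.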

Activity Monitor

The Activity Monitor configures how agent inactivity is reported. When contact is lost with an agent, an inactivity warning or error event is fired if the configured time in minutes has elapsed since last contact.


Inactivity Warning (Minutes) : The number of minutes before a warning threshold alarm is activated. (default 5)

Inactivity Error (Minutes) : The number of minutes before an error threshold alarm is activated. (default 15)
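
The inactivity check described above can be sketched as follows. This is an illustration under stated assumptions rather than the controller's real code: the function name inactivity_level is hypothetical, and treating a silence of exactly the configured length as having reached the threshold is an assumption. The two default values come from the settings listed above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Defaults taken from the Activity Monitor settings.
INACTIVITY_WARNING_MINUTES = 5
INACTIVITY_ERROR_MINUTES = 15


def inactivity_level(
    last_contact: datetime,
    now: datetime,
    warning_minutes: int = INACTIVITY_WARNING_MINUTES,
    error_minutes: int = INACTIVITY_ERROR_MINUTES,
) -> Optional[str]:
    """Classify how long an agent has been silent.

    Returns "error", "warning", or None when the agent was heard from
    recently enough. The error threshold is checked first so that a long
    silence is not reported as a mere warning.
    """
    elapsed = now - last_contact
    if elapsed >= timedelta(minutes=error_minutes):
        return "error"
    if elapsed >= timedelta(minutes=warning_minutes):
        return "warning"
    return None


# Example: an agent last seen 7 minutes ago.
now = datetime(2024, 1, 1, 12, 0)
print(inactivity_level(now - timedelta(minutes=7), now))
```

With the defaults, an agent that has been silent for 7 minutes is at the warning level, and one that has been silent for 15 minutes or more is at the error level.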

System Metric Thresholds

In addition to inactivity alarms, alarms can be raised for any agent when one of its metrics, such as CPU usage, number of connected clients, or error rate, reaches a configured threshold. Each metric has both a warning level and an error level.


CPU Usage Warning (%) : The warning level of an Agent's CPU usage. (default 70)

CPU Usage Error (%) : The error level of an Agent's CPU usage. (default 90)

Memory Usage Warning (%) : The warning level of an Agent's memory (RAM) usage. (default 70)

Memory Usage Error (%) : The error level of an Agent's memory (RAM) usage. (default 90)

Errors Per Minute Warning : The warning level of an Agent's per-minute error rate. The contents of the Agent's errors can be checked in the Agent's console. (default 2)

Errors Per Minute Error : The error level of an Agent's per-minute error rate. The contents of the Agent's errors can be checked in the Agent's console. (default 5)

Errors Per Hour Warning : The warning level of an Agent's hourly error rate. The contents of the Agent's errors can be checked in the Agent's console. (default 20)

Errors Per Hour Error : The error level of an Agent's hourly error rate. The contents of the Agent's errors can be checked in the Agent's console. (default 60)

Connected Clients Warning : The number of clients connected to that Agent required to raise a warning alarm. (default 50)

Connected Clients Error : The number of clients connected to that Agent required to raise an error alarm. (default 100)

Queries Per Sec Warning : The number of SQL queries per second from that Agent required to raise a warning alarm. (default 5)

Queries Per Sec Error : The number of SQL queries per second from that Agent required to raise an error alarm. (default 10)

Query Duration Warning (MS) : The average duration of a SQL query on an Agent (in milliseconds) to raise a warning alarm. (default 10000ms or 10 seconds)

Query Duration Error (MS) : The average duration of a SQL query on an Agent (in milliseconds) to raise an error alarm. (default 30000ms or 30 seconds)

DB Connections Warning : The number of currently active database connections on an Agent to raise a warning alarm. (default 8)

DB Connections Error : The number of currently active database connections on an Agent to raise an error alarm. (default 12)

Active Queries Warning : The number of currently active SQL queries on an Agent to raise a warning alarm. (default 60)

Active Queries Error : The number of currently active SQL queries on an Agent to raise an error alarm. (default 100)
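
To make the warning and error levels concrete, the sketch below collects the defaults listed above into one table and classifies a sample of agent metrics against it. The dictionary keys and the function names metric_level and evaluate are hypothetical, chosen only for this example. Comparing with "greater than or equal to" is also an assumption about how a threshold is considered reached.

```python
from typing import Dict, Optional, Tuple

# (warning, error) defaults copied from the System Metric Thresholds settings.
# The key names are illustrative, not identifiers used by the controller.
DEFAULT_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_usage_percent": (70, 90),
    "memory_usage_percent": (70, 90),
    "errors_per_minute": (2, 5),
    "errors_per_hour": (20, 60),
    "connected_clients": (50, 100),
    "queries_per_sec": (5, 10),
    "query_duration_ms": (10000, 30000),
    "db_connections": (8, 12),
    "active_queries": (60, 100),
}


def metric_level(
    name: str,
    value: float,
    thresholds: Dict[str, Tuple[float, float]] = DEFAULT_THRESHOLDS,
) -> Optional[str]:
    """Return "error", "warning", or None for a single metric reading."""
    warning, error = thresholds[name]
    if value >= error:
        return "error"
    if value >= warning:
        return "warning"
    return None


def evaluate(sample: Dict[str, float]) -> Dict[str, str]:
    """Return only the metrics in the sample that reached a threshold."""
    results: Dict[str, str] = {}
    for name, value in sample.items():
        level = metric_level(name, value)
        if level is not None:
            results[name] = level
    return results


# Example: one agent's readings at a point in time.
sample = {"cpu_usage_percent": 75, "db_connections": 12, "active_queries": 10}
print(evaluate(sample))
```

In this example the CPU reading of 75 reaches the warning level, the 12 database connections reach the error level, and the 10 active queries stay below both thresholds and are omitted from the result.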