Documentation

User Documentation

A new alarm can be created in three simple steps: register it using the REST API, add the alarm-generating code to your program, and create a cron job that executes your program periodically.

Registering Alarm

Before registering a new kind of alarm, look at the classification tree of the current alarms and decide whether there is already a category and subcategory where the new alarm would naturally fit. Once you have decided on the category and subcategory, create a small JSON document in this form:

{
    "category": "Analytics",
    "subcategory": "Frontier",
    "event": "Failed queries",
    "description": "Code running every 1h at UC k8s cluster. Checks all servers for: Rejected queries - server is busy and doesn't respond to the query; DB disconnections - the query was processed by the Frontier server but the Oracle DB terminated the connection. The code can be found here: https://github.com/ATLAS-Analytics/AlarmAndAlertService/blob/master/frontier-failed-q.py",
    "template": "Servers with failed queries:\n%{servers}\n. Affected tasks: \n%{tasks}\n\tConsult the following link to get a table with the most relevant taskids https://atlas-kibana.mwt2.org:5601/s/frontier/goto/c72d263c3e2b86f394ab99211c99b613\n"
}
It is important to make the description field as detailed as possible, because it helps people who receive the alarm understand how it was generated. The template field defines the text that will appear in the alert email. It should be brief, but it should give the recipient enough information to judge how serious the issue is, decide whether to forward it, and either act on it or start digging further into the issue. Every instance of %{variable} is replaced with the value of the corresponding variable field in the alarm's "source" field. At the moment, only simple \n and \t are available for formatting. Once the JSON alarm description is ready, you can post it to the Alarm And Alert Service REST endpoint.
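The placeholder substitution described above can be pictured with a small Python sketch. This is not the service's own code, only an illustration of the behaviour the text describes: the function name `render_template` is hypothetical, and leaving unknown placeholders untouched is an assumption, since the text does not say what happens when a variable is missing from "source".

```python
import re

def render_template(template: str, source: dict) -> str:
    """Replace every %{name} in the template with the matching value from source."""
    def substitute(match):
        name = match.group(1)
        # Assumption: a placeholder with no matching key is left as it is.
        return str(source[name]) if name in source else match.group(0)
    return re.sub(r"%\{(\w+)\}", substitute, template)

# Illustrative values only; real alarms supply these in their "source" field.
template = "Servers with failed queries:\n%{servers}\nAffected tasks:\n%{tasks}\n"
source = {"servers": "server-a, server-b", "tasks": "12345, 67890"}
print(render_template(template, source))
```

Rendering a template locally like this may be a convenient way to preview an alert email before registering the alarm.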

Creating Alarm

Once an alarm has been registered, you can start creating alarms of that kind. Alarms are again simple JSON documents and follow this form:

{ 
    "category" : "network",
    "subcategory": "perfsonar",
    "event": "high packet loss",
    "tag":"ps.sl",
    "body":"ps_1 needs 10000ms",
    "source": {
        "src": "137.222.79.1", 
        "avg_value": 1.0, 
        "host_src": "dice-io-37-00.acrc.bris.ac.uk", 
        "site_src": "UKI-SOUTHGRID-BRIS-HEP"
    }
}
In addition to the alarm classification, it contains the following fields:
  • tag - a single keyword or a short array that a user can subscribe to. This helps limit the number of alerts that a user will receive.
  • body - a very brief description of the event.
  • source - this optional field is a short map whose attributes will be placed in the alert template.
The document should be sent to the alarm-creation REST endpoint (POST /alarm/).
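As a rough illustration of this step, the Python sketch below assembles an alarm document and wraps it in a POST request to /alarm/. The base URL `https://aaas.example.org` and the helper names `build_alarm` and `make_request` are placeholders, not part of the service. The sketch also assumes the endpoint accepts a JSON body sent with a `Content-Type: application/json` header.

```python
import json
import urllib.request

# Placeholder: replace with the address of your Alarm And Alert Service deployment.
BASE_URL = "https://aaas.example.org"

def build_alarm(category, subcategory, event, tag, body, source=None):
    """Assemble an alarm document; the 'source' map is optional."""
    alarm = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "tag": tag,
        "body": body,
    }
    if source:
        alarm["source"] = source
    return alarm

def make_request(alarm):
    """Wrap the alarm in a POST request to /alarm/ (nothing is sent yet)."""
    return urllib.request.Request(
        BASE_URL + "/alarm/",
        data=json.dumps(alarm).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

alarm = build_alarm(
    "network", "perfsonar", "high packet loss",
    tag="ps.sl",
    body="ps_1 needs 10000ms",
    source={"src": "137.222.79.1", "avg_value": 1.0},
)
request = make_request(alarm)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the sending makes it easy to inspect the payload before anything reaches the service.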

Periodic execution

REST API

The API has two main parts, one concerning users and one concerning alarms; heartbeat endpoints are listed in their own section. While not in its final form, it is already usable.

Alarm

List categories

GET /alarm/categories

Returns JSON-formatted information on all currently registered alarm categories.

Registering Alarm

POST /alarm/category

BODY raw(JSON)

{
    "category": "",
    "subcategory": "",
    "event": "",
    "description": "",
    "template": ""
}
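A registration document can be checked locally before it is posted. The Python sketch below is one possible way to do that; it treats all five fields as required, which is an assumption (the documentation lists them but does not say which are mandatory). The base URL and the function name `registration_request` are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # placeholder deployment address
# Assumption: every field shown in the request body must be non-empty.
REQUIRED_FIELDS = ("category", "subcategory", "event", "description", "template")

def registration_request(doc):
    """Validate the document, then build the POST /alarm/category request."""
    missing = [field for field in REQUIRED_FIELDS if not doc.get(field)]
    if missing:
        raise ValueError("missing or empty fields: " + ", ".join(missing))
    return urllib.request.Request(
        BASE_URL + "/alarm/category",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

doc = {
    "category": "Analytics",
    "subcategory": "Frontier",
    "event": "Failed queries",
    "description": "Checks all Frontier servers for failed queries.",
    "template": "Servers with failed queries:\n%{servers}\n",
}
request = registration_request(doc)
# To actually register: urllib.request.urlopen(request)
```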

Editing Alarm

PATCH /alarm/category

BODY raw(JSON)

{
    "category": "",
    "subcategory": "",
    "event": "",
    "description": "",
    "template": ""
}

Creating Alarm

POST /alarm/

BODY raw(JSON)

{
    "category" : "",
    "subcategory": "",
    "event": "",
    "tag":"",
    "body":"",
    "source": {"xxx": "", "yyy": "", ...}
}

List Alarms

POST /alarm/fetch

BODY raw(JSON)

{
    "category" : "",
    "subcategory" : "",
    "event" : "",
    "period" : 6
}              
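A query for existing alarms could be issued along these lines. This is a sketch only: the base URL and function names are placeholders, the unit of "period" is not stated in this document and is passed through unchanged, and decoding the reply as JSON is an assumption about the response format.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # placeholder deployment address

def fetch_request(category, subcategory, event, period):
    """Build the POST /alarm/fetch request for one alarm type."""
    query = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "period": period,  # unit not specified in the documentation
    }
    return urllib.request.Request(
        BASE_URL + "/alarm/fetch",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_alarms(category, subcategory, event, period):
    """Send the query and decode the reply, assuming the service answers with JSON."""
    request = fetch_request(category, subcategory, event, period)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

request = fetch_request("network", "perfsonar", "high packet loss", period=6)
print(request.get_method(), request.full_url)
```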

Delete Alarm Category

DELETE /alarm/

BODY raw(JSON)

{
    "category" : "",
    "subcategory" : "",
    "event" : ""
}    

Unregisters the alarm event. All alarms created for that event are deleted. Use with caution!

User

All Users Information

GET /user/

Returns JSON-formatted information on all users.

User Information

GET /user/:userId

Returns JSON-formatted information on a specific user, including preferences and subscriptions.

Full Unsubscribe

DELETE /user/:userId

Completely deletes a user from the system.

Update User preferences

POST /user/preferences/:userId

BODY raw(JSON)

{
    "vacation" : true
}

Updates user preferences. Only preferences supported by the deployed system are allowed.

Update subscriptions

POST /user/subscriptions/:userId

BODY raw(JSON)

[
    {
        "category": "",
        "subcategory": "",
        "event": "",
        "tags": ["xxx","yyy"]
    },
    ...
]

Updates user subscriptions. The body is a JSON array of all the alarms the user will be subscribed to. Only alarms currently configured in the system are allowed.
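The sketch below shows one way to build this request in Python. The base URL, the user id `jdoe`, and the function name `subscriptions_request` are placeholders. Since the body is described as the array of all alarms for the user, the sketch sends the complete list in one call rather than a single entry.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://aaas.example.org"  # placeholder deployment address

def subscriptions_request(user_id, subscriptions):
    """Build the POST /user/subscriptions/:userId request."""
    # Percent-encode the id so characters such as '@' or '/' stay inside one path segment.
    url = BASE_URL + "/user/subscriptions/" + quote(str(user_id), safe="")
    return urllib.request.Request(
        url,
        data=json.dumps(subscriptions).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The full list of alarms this (hypothetical) user wants to receive.
subscriptions = [
    {
        "category": "network",
        "subcategory": "perfsonar",
        "event": "high packet loss",
        "tags": ["ps.sl"],
    },
]
request = subscriptions_request("jdoe", subscriptions)
# To actually send it: urllib.request.urlopen(request)
```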

Heartbeat

List Categories

GET /heartbeat/categories

Returns JSON-formatted information on all currently registered heartbeat categories.

Registering heartbeat

POST /heartbeat/register

BODY raw(JSON)

{
    "category": "",
    "subcategory": "",
    "event": "",
    "description": "",
    "template": "",
    "interval": 60,
    "min_expected": 4
}

Editing heartbeat

PATCH /heartbeat/

BODY raw(JSON)

{
    "category": "",
    "subcategory": "",
    "event": "",
    "description": "",
    "template": "",
    "interval": 60,
    "min_expected": 4
}

Creating heartbeat

POST /heartbeat/

BODY raw(JSON)

{
    "category" : "",
    "subcategory": "",
    "event": "",
    "tag":["",""],
    "body":"",
    "source": {"xxx": "", "yyy": "", ...}
}
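A heartbeat could be built in Python as sketched below. The request body above shows "tag" as an array, so the sketch wraps a single keyword in a list; whether the service also accepts a bare string here is not stated, so this normalisation is an assumption. The base URL, the example category values, and the function name `heartbeat_request` are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # placeholder deployment address

def heartbeat_request(category, subcategory, event, tags, body, source=None):
    """Build the POST /heartbeat/ request for one heartbeat."""
    if isinstance(tags, str):
        # Assumption: send a single keyword as a one-element array.
        tags = [tags]
    heartbeat = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "tag": list(tags),
        "body": body,
    }
    if source:
        heartbeat["source"] = source
    return urllib.request.Request(
        BASE_URL + "/heartbeat/",
        data=json.dumps(heartbeat).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical heartbeat type; it must match one registered with the service.
request = heartbeat_request(
    "example-category", "example-subcategory", "example-event",
    tags="node-1", body="alive",
)
# To actually send it: urllib.request.urlopen(request)
```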

List heartbeats

POST /heartbeat/fetch

BODY raw(JSON)

{
    "category" : "",
    "subcategory" : "",
    "event" : "",
    "period" : 6
}              

Delete heartbeat Category

DELETE /heartbeat/

BODY raw(JSON)

{
    "category" : "",
    "subcategory" : "",
    "event" : ""
}    

Unregisters the heartbeat event. All heartbeats created for that event are deleted. Use with caution!