Documentation
User Documentation
A new alarm can be created in three simple steps: register it using the REST API, add the alarm-generating code to your program, and create a cron job that executes your program periodically.
Registering Alarm
Before registering a new kind of alarm, look at the classification tree of the current alarms and decide whether there is already a category and subcategory where the new alarm would naturally fit. Once you have decided on the category and subcategory, create a small JSON document in this form:
{
"category": "Analytics",
"subcategory": "Frontier",
"event": "Failed queries",
"description": "Code running every 1h at UC k8s cluster. Checks all servers for: Rejected queries - server is busy and doesn't respond to the query; DB disconnections - the query was processed by the Frontier server but the Oracle DB terminated the connection. The code can be found here: https://github.com/ATLAS-Analytics/AlarmAndAlertService/blob/master/frontier-failed-q.py",
"template": "Servers with failed queries:\n%{servers}\n. Affected tasks: \n%{tasks}\n\tConsult the following link to get a table with the most relevant taskids https://atlas-kibana.mwt2.org:5601/s/frontier/goto/c72d263c3e2b86f394ab99211c99b613\n"
}
It is really important to make the description field as detailed as possible, as that will help people receiving the alarm understand how it was generated.
The template field defines the text that will appear in the alert email.
It should therefore be brief, but give the recipient enough information to judge how serious the issue is, decide whether to forward it, and either act on it or start digging further into the issue.
All instances of %{variable} will be replaced with the value of the corresponding variable given in the alarm's "source" field. At the moment only simple \n and \t are available for formatting.
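The %{variable} substitution described above can be sketched in a few lines of Python. This is an illustration only, not the service's own code: the function name render_template, the sample template, and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

def render_template(template, source):
    """Fill %{name} placeholders in an alert template from an alarm's source map."""
    def substitute(match):
        name = match.group(1)
        # Assumption: a placeholder with no matching source key is left as-is.
        return str(source[name]) if name in source else match.group(0)
    return re.sub(r"%\{(\w+)\}", substitute, template)

# Hypothetical template; the source values are taken from the alarm example
# in this documentation.
template = "Host %{host_src} at site %{site_src} reports average loss %{avg_value}.\n"
source = {
    "host_src": "dice-io-37-00.acrc.bris.ac.uk",
    "site_src": "UKI-SOUTHGRID-BRIS-HEP",
    "avg_value": 1.0,
}
message = render_template(template, source)
print(message)
```

Writing a quick local check like this before registering a template helps catch placeholder names that do not match the keys your program puts in "source".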
Once you have the JSON alarm description ready, you can simply post it to the Alarm And Alert Service REST endpoint.
Creating Alarm
Once an alarm has been registered, you may start creating alarms of that kind. Alarms are again simple JSON documents and follow this form:
{
"category" : "network",
"subcategory": "perfsonar",
"event": "high packet loss",
"tag":"ps.sl",
"body":"ps_1 needs 10000ms",
"source": {
"src": "137.222.79.1",
"avg_value": 1.0,
"host_src": "dice-io-37-00.acrc.bris.ac.uk",
"site_src": "UKI-SOUTHGRID-BRIS-HEP"
}
}
In addition to the alarm classification, it contains the following fields:
- tag - a single keyword or a short array that a user can subscribe to. This helps limit the number of alerts that a user will receive.
- body - a very brief description of the event.
- source - this optional field is a short map whose attributes will be placed in the alert template.
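As a minimal sketch of the alarm-generating code, the helper below assembles an alarm document with the fields listed above. The function name make_alarm and its validation rules are assumptions made for illustration; the service itself may accept or reject documents differently.

```python
import json

def make_alarm(category, subcategory, event, tag, body, source=None):
    """Build an alarm document; 'source' is optional and omitted when not given."""
    # The tag may be a single keyword or a short array of keywords.
    if not isinstance(tag, (str, list)):
        raise TypeError("tag must be a string or a list of strings")
    alarm = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "tag": tag,
        "body": body,
    }
    if source is not None:
        alarm["source"] = dict(source)
    return alarm

alarm = make_alarm(
    "network",
    "perfsonar",
    "high packet loss",
    tag="ps.sl",
    body="ps_1 needs 10000ms",
    source={"src": "137.222.79.1", "avg_value": 1.0},
)
print(json.dumps(alarm, indent=2))
```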
Periodic execution
REST API
The API has three main parts: one concerning alarms, one concerning users, and one concerning heartbeats. While not in its final form, it is already usable.
Alarm
List categories
GET /alarm/categories
Returns JSON-formatted information on all currently registered alarm categories.
Registering Alarm
POST /alarm/category
BODY raw(JSON)
{
"category": "",
"subcategory": "",
"event": "",
"description": "",
"template": ""
}
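A registration call could be prepared from Python as sketched below, using only the standard library. BASE_URL is a hypothetical placeholder for your deployment's address, the helper name build_json_request is an assumption, and the description and template are shortened stand-ins for the Frontier example.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_json_request(method, path, payload):
    """Prepare (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

category_doc = {
    "category": "Analytics",
    "subcategory": "Frontier",
    "event": "Failed queries",
    "description": "Checks all servers for rejected queries and DB disconnections.",
    "template": "Servers with failed queries:\n%{servers}\n",
}
request = build_json_request("POST", "/alarm/category", category_doc)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The same helper would serve for editing a registered alarm by passing "PATCH" as the method.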
Editing Alarm
PATCH /alarm/category
BODY raw(JSON)
{
"category": "",
"subcategory": "",
"event": "",
"description": "",
"template": ""
}
Creating Alarm
POST /alarm/
BODY raw(JSON)
{
"category" : "",
"subcategory": "",
"event": "",
"tag":"",
"body":"",
"source": {"xxx": "", "yyy": "", ...}
}
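The following sketch shows how a program might prepare the POST /alarm/ call for one alarm. BASE_URL and the function name build_alarm_request are hypothetical; the payload reuses the perfSONAR example from the user documentation.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_alarm_request(alarm):
    """Prepare (but do not send) a POST /alarm/ request for one alarm document."""
    return urllib.request.Request(
        BASE_URL + "/alarm/",
        data=json.dumps(alarm).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

alarm = {
    "category": "network",
    "subcategory": "perfsonar",
    "event": "high packet loss",
    "tag": "ps.sl",
    "body": "ps_1 needs 10000ms",
    "source": {
        "src": "137.222.79.1",
        "avg_value": 1.0,
        "host_src": "dice-io-37-00.acrc.bris.ac.uk",
        "site_src": "UKI-SOUTHGRID-BRIS-HEP",
    },
}
request = build_alarm_request(alarm)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```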
List Alarms
POST /alarm/fetch
BODY raw(JSON)
{
"category" : "",
"subcategory" : "",
"event" : "",
"period" : 6
}
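Listing alarms uses a POST with a filter body rather than a GET, which the sketch below reflects. BASE_URL and build_fetch_request are hypothetical names, and the unit of "period" is not stated in this documentation, so the code passes the number through unchanged.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_fetch_request(category, subcategory, event, period):
    """Prepare (but do not send) a POST /alarm/fetch request."""
    query = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "period": period,  # unit not documented here; passed through as given
    }
    return urllib.request.Request(
        BASE_URL + "/alarm/fetch",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_fetch_request("network", "perfsonar", "high packet loss", 6)
print(request.get_method(), request.full_url)
# To send and decode the reply, assuming it is JSON:
#   with urllib.request.urlopen(request) as response:
#       alarms = json.load(response)
```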
Delete Alarm Category
DELETE /alarm/
BODY raw(JSON)
{
"category" : "",
"subcategory" : "",
"event" : ""
}
Unregisters the alarm event. All alarms created for it are deleted. Use with caution!
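Because this call is destructive, a client may want an explicit guard in front of it. The sketch below is one way to do that; BASE_URL, build_delete_request, and the confirm flag are all hypothetical and exist only on the client side.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_delete_request(category, subcategory, event, confirm=False):
    """Prepare (but do not send) a request that unregisters an alarm event."""
    if not confirm:
        # Client-side safety check only; the service knows nothing about it.
        raise ValueError("pass confirm=True to unregister an alarm event")
    target = {"category": category, "subcategory": subcategory, "event": event}
    return urllib.request.Request(
        BASE_URL + "/alarm/",
        data=json.dumps(target).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )

request = build_delete_request("network", "perfsonar", "high packet loss", confirm=True)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```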
User
All Users Information
GET /user/
Returns JSON-formatted information on all users.
User Information
GET /user/:userId
Returns JSON-formatted information on a specific user, including preferences and subscriptions.
Full Unsubscribe
DELETE /user/:userId
Completely deletes a user from the system.
Update User preferences
POST /user/preferences/:userId
BODY raw(JSON)
{
"vacation" : true
}
Updates user preferences. Only preferences supported by the deployed system are allowed.
Update subscriptions
POST /user/subscriptions/:userId
BODY raw(JSON)
[
{
"category": "",
"subcategory": "",
"event": "",
"tags": ["xxx","yyy"]
},
...
]
Updates user subscriptions. The body is a JSON array of all alarms the user will be subscribed to. Only alarms currently configured in the system are allowed.
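A subscription update might be prepared as in the sketch below. BASE_URL, build_subscription_request, and the user ID "jdoe" are hypothetical; the format of real user IDs is not described here, so the ID is URL-quoted defensively.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_subscription_request(user_id, subscriptions):
    """Prepare (but do not send) a POST /user/subscriptions/:userId request."""
    path = "/user/subscriptions/" + urllib.parse.quote(str(user_id), safe="")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(subscriptions).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The body lists every alarm the user should be subscribed to.
subscriptions = [
    {
        "category": "network",
        "subcategory": "perfsonar",
        "event": "high packet loss",
        "tags": ["ps.sl"],
    },
]
request = build_subscription_request("jdoe", subscriptions)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```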
Heartbeat
List Categories
GET /heartbeat/categories
Returns JSON-formatted information on all currently registered heartbeat categories.
Registering heartbeat
POST /heartbeat/register
BODY raw(JSON)
{
"category": "",
"subcategory": "",
"event": "",
"description": "",
"template": "",
"interval": 60,
"min_expected": 4
}
Editing heartbeat
PATCH /heartbeat/
BODY raw(JSON)
{
"category": "",
"subcategory": "",
"event": "",
"description": "",
"template": "",
"interval": 60,
"min_expected": 4
}
Creating heartbeat
POST /heartbeat/
BODY raw(JSON)
{
"category" : "",
"subcategory": "",
"event": "",
"tag":["",""],
"body":"",
"source": {"xxx": "", "yyy": "", ...}
}
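Sending a heartbeat follows the same pattern as creating an alarm, except that the body above shows "tag" as an array. The sketch below normalizes a single keyword into a list before building the request. BASE_URL, build_heartbeat_request, and all category, tag, and source values are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://aaas.example.org"  # hypothetical placeholder, not a real deployment

def build_heartbeat_request(category, subcategory, event, tags, body, source=None):
    """Prepare (but do not send) a POST /heartbeat/ request."""
    if isinstance(tags, str):
        tags = [tags]  # accept a single keyword for convenience
    heartbeat = {
        "category": category,
        "subcategory": subcategory,
        "event": event,
        "tag": list(tags),
        "body": body,
    }
    if source is not None:
        heartbeat["source"] = dict(source)
    return urllib.request.Request(
        BASE_URL + "/heartbeat/",
        data=json.dumps(heartbeat).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical heartbeat values, for illustration only.
request = build_heartbeat_request(
    "Analytics", "Frontier", "collector alive", "uc-k8s", "collector finished a cycle"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```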
List heartbeats
POST /heartbeat/fetch
BODY raw(JSON)
{
"category" : "",
"subcategory" : "",
"event" : "",
"period" : 6
}
Delete heartbeat Category
DELETE /heartbeat/
BODY raw(JSON)
{
"category" : "",
"subcategory" : "",
"event" : ""
}
Unregisters the heartbeat event. All heartbeats created for it are deleted. Use with caution!