Server and Service Monitoring with TICK
In this article, I would like to look at a tool that makes a system administrator's work easier and makes monitoring client servers more cost-effective and productive. The overview will be helpful for system administrators, DevOps specialists, developers, and customers who work with IT service providers.
We successfully use the TICK Stack. It is free, easy to use, and flexible to set up, with an aggregation mechanism and convenient features for selecting exactly the data you need. And since we only need simple functions to handle our routine, we don't require complex and expensive solutions.
TICK Stack Structure
Let's look at each component.
Telegraf is part of the TICK Stack and is a plugin-driven server agent for collecting and reporting metrics. Telegraf has integrations to source a variety of metrics, events, and logs directly from the containers and systems it’s running on, pull metrics from third-party APIs, or even listen for metrics via StatsD and Kafka consumer services. It also has output plugins to send metrics to a variety of other data stores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and many others.
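To make the data flow more concrete, here is a minimal Python sketch of the text format (InfluxDB line protocol) in which a metric travels from an agent such as Telegraf to InfluxDB. The function name to_line_protocol, the helper names, and the sample host and field names are hypothetical and chosen only for illustration; Telegraf itself does this internally, so this is not its API.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` found in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    """Render a field value: booleans, integers (with `i` suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended for write performance
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}" for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical sample: idle CPU percentage reported by a host called web01.
print(to_line_protocol("cpu", {"host": "web01"}, {"usage_idle": 97.5}, 1700000000000000000))
```

Tags (such as the host name) are indexed and used for filtering and grouping, while fields carry the measured values, which is why the two are kept separate in the format.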
InfluxDB is a time series database built from the ground up to handle high write and query loads. It is a custom high-performance datastore written specifically for timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics. You can conserve space on your machine by configuring InfluxDB to keep data for a defined length of time and to automatically expire and delete any unwanted data from the system. InfluxDB also offers a SQL-like query language for interacting with data.
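As an illustration of the retention feature, the sketch below builds the InfluxQL statement that defines how long data is kept. It assumes InfluxDB 1.x with InfluxQL; the database name telegraf, the policy name one_month, and the helper retention_policy_statement are hypothetical. The resulting string would be run through the influx shell or an HTTP client, which is not shown here.

```python
import re

# Accept whole hours, days, or weeks (for example 36h, 30d, 4w), or INF for "keep forever".
_DURATION = re.compile(r"^(\d+[hdw]|INF)$")


def retention_policy_statement(database, policy, duration, replication=1, default=True):
    """Return a CREATE RETENTION POLICY statement as a string."""
    if not _DURATION.match(duration):
        raise ValueError(f"unsupported duration: {duration!r}")
    statement = (
        f'CREATE RETENTION POLICY "{policy}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        # DEFAULT makes new writes land in this policy unless another one is named.
        statement += " DEFAULT"
    return statement


# Hypothetical example: keep monitoring data for 30 days.
print(retention_policy_statement("telegraf", "one_month", "30d"))
```

Data older than the policy's duration is dropped by InfluxDB itself, so no separate cleanup job is needed on the monitoring host.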
Chronograf is the administrative user interface and visualization engine of the platform. It makes the monitoring and alerting for your infrastructure easy to set up and maintain. It is simple to use and includes templates and libraries to allow you to rapidly build dashboards with real-time visualizations of your data and to easily create alerting and automation rules.
Kapacitor is a native data processing engine. It can process both stream and batch data from InfluxDB. Kapacitor lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds, match metrics for patterns, compute statistical anomalies and perform specific actions based on these alerts like dynamic load rebalancing. Kapacitor integrates with HipChat, OpsGenie, Alerta, Sensu, PagerDuty, Slack, and more.
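The idea of a dynamic threshold can be sketched in a few lines of Python. Kapacitor expresses such rules in its own scripting language, so the code below is not Kapacitor's API; it only illustrates the logic, assuming a simple "mean plus or minus k standard deviations" rule. The function name is_anomaly and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def is_anomaly(history, value, sigmas=3.0):
    """Flag `value` if it lies more than `sigmas` standard deviations from the recent mean.

    The threshold is "dynamic" because it is recomputed from the recent history
    instead of being a fixed number such as "CPU above 90%".
    """
    if len(history) < 2:
        return False  # not enough data to estimate normal behaviour
    mu = mean(history)
    sd = pstdev(history)
    if sd == 0:
        return value != mu  # perfectly flat history: any change is unusual
    return abs(value - mu) > sigmas * sd


# Hypothetical CPU usage samples (percent) from the last few minutes.
recent = [40, 42, 41, 39, 40, 41]
print(is_anomaly(recent, 95))  # a sudden spike
print(is_anomaly(recent, 41))  # a typical value
```

A rule like this adapts to each server's normal load, so a busy database host and an idle test machine do not need separate hand-tuned limits.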
How it works
Customizable Dashboard system
This is a home dashboard. It shows the status of notifications from the timeline, so the user can see when and what events occurred in the system.
With the tool, we can check CPU, RAM, HDD space, network traffic intensity, IIS connections, site requests, IIS web cache, MS SQL (reads and writes, per database), and RabbitMQ.
If you require extra functionality, you can extend the tool with additional plugins from the list of available ones.
Server system characteristics
This board shows the main server statuses, such as CPU usage, disk usage, load, memory used (in gigabytes), network traffic (in Mb), etc.
This is a Linux dashboard showing the connection status, the free RAM of three servers, and the number of active connections on the web server*.
*The last one is not a standard setting, but we need it to monitor how many people are visiting the site, and this is a convenient way to check it.
Chronograf allows you to write custom queries against the database to select exactly the data a specific dashboard needs.
We can build a unique dashboard that displays only the information that is a priority for the user. The intuitive interface lets you choose which data from a query to show.
The diagrams above show two Windows servers: processor load and free RAM. In this way, three graphs covering three parameters for each of the two servers (six parameters in total) fit on one screen.
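A per-host query of the kind a custom dashboard cell might use can be sketched as follows. This is an illustration under assumptions: it targets InfluxQL in InfluxDB 1.x, the helper dashboard_query is hypothetical, and the measurement, field, and host names (cpu, usage_idle, win01, win02) are examples only; the real names depend on which Telegraf input plugins are enabled.

```python
HOST_CLAUSE = "\"host\" = '{}'"


def dashboard_query(measurement, field, hosts, lookback="1h", interval="1m"):
    """Build an InfluxQL query returning one averaged series per selected host.

    Note: the names are inserted without escaping, so this sketch is only
    suitable for trusted, hard-coded input.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    host_filter = " OR ".join(HOST_CLAUSE.format(host) for host in hosts)
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {lookback} AND ({host_filter}) "
        f'GROUP BY time({interval}), "host"'
    )


# Hypothetical example: idle CPU for two servers over the last hour.
print(dashboard_query("cpu", "usage_idle", ["win01", "win02"]))
```

Grouping by the host tag is what puts several servers on one graph: each host becomes its own line, while the time interval controls how finely the values are averaged.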
Dependencies and Forecasting
The system also allows you to check the history of value changes for a selected period in order to identify dependencies and causes and, to some extent, predict how the situation will develop.
For example, you can analyze the graph for a chosen period to see what happened during a day, a weekend, or a month.
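Looking at a long period usually means aggregating raw samples into larger windows. The following Python sketch shows the idea with daily averages; in practice the database does this aggregation, so the function daily_means and the sample data are purely illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def daily_means(points):
    """Group (unix_timestamp, value) samples by UTC day and average each day."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        buckets[day].append(value)
    # Sort by day so the result reads like a timeline.
    return {day: mean(values) for day, values in sorted(buckets.items())}


# Hypothetical samples: two on the first day, one on the second.
samples = [(0, 10.0), (3600, 20.0), (86400, 30.0)]
print(daily_means(samples))
```

Comparing such daily or weekly averages side by side makes recurring patterns, such as weekend dips or month-end peaks, easier to spot than scanning raw values.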
Easy and customizable alerts system
TICK allows you to set up a notification system that uses convenient channels and fits the needs of a specific monitoring and support setup.
Alerts can be sent via Slack, PagerDuty, Telegram, etc.
Kapacitor allows you to build a flexible alert system in which you can selectively notify system administrators, developers, or customers according to the priority of the alert. It can also execute scripts on an alert, which can automate the repair of some breakdowns, such as buffer overflows.
This is a classification of alerts for notification, with a customisable system of message prioritisation (low and high priority).
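The low and high priority classification can be sketched in Python as a pair of thresholds per metric plus a routing table. This is not Kapacitor's configuration format; the Rule class, the threshold values, and the channel names are hypothetical and only show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Two thresholds for one metric: crossing the first is low priority, the second high."""
    metric: str
    low_above: float
    high_above: float


# Hypothetical routing: who is notified for each priority.
ROUTES = {
    "low": ["#monitoring"],
    "high": ["#monitoring", "on-call-admin"],
}


def classify(rule: Rule, value: float) -> str:
    """Return "high", "low", or "ok" for a measured value."""
    if value >= rule.high_above:
        return "high"
    if value >= rule.low_above:
        return "low"
    return "ok"


def recipients(priority: str) -> list:
    """Look up the notification targets; "ok" notifies nobody."""
    return ROUTES.get(priority, [])


disk_rule = Rule(metric="disk_used_percent", low_above=80.0, high_above=95.0)
for reading in (50.0, 85.0, 97.0):
    level = classify(disk_rule, reading)
    print(reading, level, recipients(level))
```

Keeping the thresholds and the routing table separate means the same rule can notify different people for different customers without changing the classification logic.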
An alert appears in a special Slack channel. It is quite handy because you don’t need to open an extra app or browser. Slack is widely used.
When checking the alert status, the user can create a custom message that includes information about where the problem is and what to fix. This is quite effective for teamwork.
As I’ve mentioned, TICK can prioritize alerts, so you can filter out non-urgent alerts during night hours. Outside working hours you can receive only highly important alerts and be sure that you will not miss a chance to save the infrastructure or the company's reputation. And you can react quickly if the Galaxy is in danger, even at 2 AM.
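The night-hours filter comes down to a small decision function. The Python sketch below assumes quiet hours from 22:00 to 07:00 and two priorities, "low" and "high"; these values and the name should_notify are assumptions for illustration, not settings taken from TICK.

```python
from datetime import time


def should_notify(priority, now, quiet_start=time(22, 0), quiet_end=time(7, 0)):
    """Decide whether an alert is delivered at the given time of day.

    High-priority alerts always go through. Low-priority alerts are held
    back during quiet hours, which may wrap around midnight.
    """
    if priority == "high":
        return True
    if quiet_start <= quiet_end:
        in_quiet_hours = quiet_start <= now < quiet_end
    else:
        # The quiet window crosses midnight, for example 22:00 to 07:00.
        in_quiet_hours = now >= quiet_start or now < quiet_end
    return not in_quiet_hours


print(should_notify("low", time(2, 0)))    # low priority at 2 AM
print(should_notify("high", time(2, 0)))   # high priority at 2 AM
print(should_notify("low", time(12, 0)))   # low priority at noon
```

The wrap-around branch matters because a window such as 22:00 to 07:00 starts on one day and ends on the next, so a plain "start <= now < end" comparison would never match.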
Who can use TICK?
TICK is a helpful tech stack for several groups of users.
It is a very useful tool for system administrators, as it aggregates information from multiple hosts in one place, with flexible selection and visualization of dependencies. Notifications save the administrator's time and make it possible to respond only to highly relevant events.
Customers can use the tool to identify inefficient parts of the infrastructure, predict loads, and optimize strategies for expanding the server fleet. The customer can see how efficiently resources are used and tailor them to actual needs.
The system allows you to monitor a large fleet of servers. Deployment is quite simple and can be automated with Ansible or another automation tool. The metrics selection feature allows you to create meaningful graphs, quickly analyse the timeline, and predict risks.