proxysql/doc/monitor.md

Monitor Module
==============

This section focus on Monitor v1.2.1, as it introduces multiple improved compared to v1.2.0


Variables removed as unused or deprecated:
* mysql-monitor_query_variables
* mysql-monitor_query_status
* mysql-monitor_timer_cached

Variables currently not in use:
* mysql-monitor_query_interval
* mysql-monitor_query_timeout

Overview
--------
Monitor Module is responsible for a series of check against the backends.
It currently supports 4 types of checks:
* **connect** : it connects to all the backends, and success/failure is logged in table `mysql_server_connect_log`;
* **ping** : it pings to all the backends, and success/failure is logged in table `mysql_server_ping_log` . In case of `mysql-monitor_ping_max_failures` missed heartbeat, sends a signal to MySQL_Hostgroups_Manager to kill all connections;
* **replication lag** : it checks `Seconds_Behind_Master` to all backends configured with `max_replication_lag` greater than 0, and check is logged in table `mysql_server_replication_lag_log`. If `Seconds_Behind_Master` > `max_replication_lag` the server is shunned until `Seconds_Behind_Master` < `max_replication_lag` ;
* **read only** : it checks `read_only` for all hosts in the hostgroups in table `mysql_replication_hostgroups`, and check is logged in table `mysql_server_read_only_log` . If `read_only=1` the host is copied/moved to the `reader_hostgroup`, while if `read_only=0` the host is copied/moved to the `writer_hostgroup` .


Variables
=========
General variables:
* *mysql-monitor_username*

  Specifies the username that the Monitor module will use to connect to the backend. The user needs only `USAGE` privileges to connect, ping and check `read_only`. The user needs also `REPLICATION CLIENT` if it needs to monitor replication lag.

* *mysql-monitor_password*

  Password for user *mysql-monitor_username*

* *mysql-monitor_enabled*

  It enables or disables MySQL Monitor. Since MySQL Monitor can interfere with changed applied directly on the Admin interface, this variable allows to temporary disable it.

Connect variables:
* *mysql-monitor_connect_interval*

  How frequently a connect check is performed, in milliseconds.

* *mysql-monitor_connect_timeout*

  Connection timeout in milliseconds. The current implementation rounds this value to an integer number of seconds less or equal to the original interval, with 1 second as minimum. This lazy rouding is done because SSL connections are blocking calls.

Ping variables:
* *mysql-monitor_ping_interval*

  How frequently a ping check is performed, in milliseconds.

* *mysql-monitor_ping_timeout*

  Ping timeout in milliseconds.

* *mysql-monitor_ping_max_failures*

  If a host misses *mysql-monitor_ping_max_failures* pings in a row, MySQL_Monitor informs MySQL_Hostgroup_Manager that the node is unreacheable and that should immediately kill all connections.
  It is important to note that in case a connection to the backend is not available, MySQL_Monitor will first try to connect in order to ping, therefore the time to detect a node down could be one of the two:
  * *mysql-monitor_ping_max_failures* * *mysql-monitor_connect_timeout*
  * *mysql-monitor_ping_max_failures* * *mysql-monitor_ping_timeout*

Read only variables:

* *mysql-monitor_read_only_interval*

  How frequently a read only check is performed, in milliseconds.

* *mysql-monitor_read_only_timeout*

  Read only check timeout in milliseconds.

* *mysql-monitor_writer_is_also_reader*

  When a node change its `read_only` value from 1 to 0, this variable determines if the node should be present in both hostgroups or not:
  * *false* : node will be moved in `writer_hostgroup` and removed from `reader_hostgroup`
  * *true* : node will be copied in `writer_hostgroup` and stay also in `reader_hostgroup`

Replication lag variables:

* *mysql-monitor_replication_lag_interval*

  How frequently a replication lag check is performed, in milliseconds.

* *mysql-monitor_replication_lag_timeout*

  Replication lag check timeout in milliseconds.

Other variables:

* *mysql-monitor_history*

  To prevent that log tables grow without limit, Monitor Module will automatically purge records older than *mysql-monitor_history* milliseconds. Since ping checks relies on history table to determine if a node is missing heartbeats, the value of *mysql-monitor_history* is automatically adjusted to the follows if less than it:
  * (*mysql-monitor_ping_max_failures* + 1 ) * *mysql-monitor_ping_timeout*


Main Threads
============
The Monitor Module has several internal threads. There are currently 5 main threads:
* Monitor: master thread, responsible to start and coordinate all the others
* monitor_connect_thread: main thread and scheduler for the connect checks
* monitor_ping_thread: main thread and scheduler for the ping checks
* monitor_read_only_thread: main thread and scheduler for the read only checks
* monitor_replication_lag_thread: main thread and scheduler for the replication lag checks
Up to version v1.2.0 the above threads but *Monitor* were also responsible to perform the checks

Thread Pool
===========
The implementation in v1.2.0 has a limitation with SSL implementation: with SSL, `connect()` is a blocking call, causing the threads to stall while performing the connect phase.
Version v1.2.1 tries to overcome this limitation with a new implementation. Now:
* *Monitor* initializes a Thread Pool of workers and creates a queue;
* *monitor_connect_thread*, *monitor_ping_thread*, *monitor_read_only_thread* and *monitor_replication_lag_thread* are producers that generate tasks and sent them to the workers using the queue;
* the workers process the tasks and perform the requires actions;
* if *Monitor* detects that the queue is growing too fast, it creates new temporary worker threads

Connection purging
==================
Monitor implements its own connection pool. Connections that are alive for more than 3 * `mysql_thread___monitor_ping_interval` milliseconds are automatically purged

wait_timeout
------------

To prevent that backends terminated connections, Monitor module automatically configures `wait_timeout` = `mysql_thread___monitor_ping_interval` * 10