Subject: Cyber Security II (KB II), Department of Telecommunications, Faculty of Electrical Engineering and Computer Science, VSB-TUO
Name: Bc. Kryštof Šara (SAR0130)
Date of presentation: May 10, 2024
Task syllabus:
- design and configuration of a test topology using the network monitoring tools Zabbix and Nagios, including their configuration
- description of the methods used to maintain high availability, with a practical example
introduction
Modern computing systems are commonly expected to remain operational no matter what happens. Such systems are typically found in healthcare centres, military facilities, energy infrastructure companies, and government operations.
high availability
High availability (HA) is a mode of operation in which the system behaves, under ideal conditions, as a single unit even though it is composed of redundant routes and backed-up networking and computing modules. Such systems are often called failover clusters. Their main responsibility is to ensure continuous operation even if one part of the chain fails, achieving so-called zero system downtime. At the application layer of the ISO/OSI model, these systems mainly act as load balancers (evenly distributing ingress traffic between two or more edge nodes) or as traffic redirectors. [1]
Common parts of HA systems include:
- redundancy
- replication
- failover
- fault tolerance
monitoring of systems and services
Systems monitoring is a vital element of any HA deployment. Each component of such a system should be properly configured, deployed, and, just as importantly, monitored. Monitoring should be centralized, since the monitoring operator has to manage and control the important system nodes and elements. Put simply, these systems consist of networking parts and application parts. [2]
It is common that monitoring centres (a.k.a. Network Operations Centres, NOCs) also maintain remote configuration of such primitives, backup statuses of important failover nodes, restore mechanisms, and disaster recovery testing. Using monitoring, one should be able to check the health and status of each critical part to evaluate the possible impact of an incident. When an incident happens, the monitoring centre is the very first to act and escalate the failure in the system chain. [2] [3]
Typical NOC operations include: [3]
- installing and updating software on interconnected systems
- data backing up and restoring
- firewall and network software monitoring
- software patching
- network health and performance analysis
- disaster recovery testing
- rapid incident handling
- downtime elimination
- network optimization
Fig. 1: European Space Agency Network Operations Centre, (Photo: ©ESA) [4]
Moreover, systems monitoring includes data (metrics) collection: actual system performance facts like CPU usage, RAM utilization, ingress and egress traffic, and more. The next step after metrics collection is data visualization using graphs, charts, and diagrams. [2]
In bigger facilities, NOCs often cooperate with a so-called SOC (Security Operations Centre) and with help-desks. The SOC is usually the next part of the escalation matrix, mitigating denial-of-service and other security attacks, as well as building protections and maintaining IPS/IDS deployments. Help-desk operators, in turn, act as a communication platform for clients, answering their complaints and operational issues. [3]
The most common tools for monitoring in business environments are:
- Zabbix
- Nagios
- Prometheus (+Grafana and Loki)
demonstration
In this chapter, simple load balancing at layer 4 of the ISO/OSI model is implemented to ensure high availability for three redundantly deployed services: a DNS server, a web server, and a mail server.
used hardware and tooling
- Raspberry Pi 4B 8 GB
- python3 3.11.2
- mininet 2.3.0
- Raspbian (bookworm) OS
- iperf3 to evaluate the upstream link*
- iptables
- MariaDB server
- nginx as web server
- bind9 as DNS server
- postfix as mail server
- Zabbix tooling (mainly server and agent)
*The iperf3 connection itself proved very hard to establish, therefore the tool is not mentioned any further in the project.
topology
A simplified topology diagram can be seen in Fig. 2. The whole network is connected to the Internet via a single link between the root host (Raspberry Pi, upstream) and router R1. The three services are distributed redundantly behind routers R2 and R3 (and switches S2 and S3 respectively).
Nodes H1 and H2 are to be used for monitoring (Zabbix and Nagios).
Three main network subnet groups are utilized:
- 10.1.1.0/30: the edge Internet connectivity link
- 10.1.2.0/30, 10.1.2.4/30: core routers' links
- 10.1.3.0/27, 10.1.3.32/27, 10.1.3.64/27: hosts' subnets, links to switches
Fig. 2: Network topology diagram. All main links are shown. Links are assigned IP addresses from the adjacent subnets. Each switch switches its whole subnet.
mininet installation
The mininet package is available in the common Raspbian/Debian repositories, so after updating the repositories, we can directly install the python3.11 and mininet packages.
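A sketch of the installation commands (assuming the stock apt tooling on Raspbian bookworm):

    sudo apt update
    sudo apt install python3.11 mininet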
python environment settings
To use the python lib for mininet, we need to set up a virtual environment (kb2_mininet) first. After the venv is successfully created, we can activate the environment and download the mininet lib into it. The command sequence is below.
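A sketch of a possible command sequence (paths are illustrative; the venv name kb2_mininet comes from the text above):

    python3.11 -m venv kb2_mininet
    source kb2_mininet/bin/activate
    pip install mininet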
mininet configuration using the python lib
Now the whole environment should be set up. It is time to compose the main script with the network topology and startup logic. Let this be named topo_kb2.py. The script is divided into code blocks/listings with further description.
libs import
At the top of the script, let us define the libraries to import. We are going to use the main() function as a whole (to be refactored into separate functions later).
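A sketch of how the top of topo_kb2.py could look; the exact import set is an assumption derived from the functions used in the following blocks:

    #!/usr/bin/env python3
    # topo_kb2.py (sketch)
    from mininet.net import Mininet
    from mininet.node import Node, OVSSwitch
    from mininet.cli import CLI
    from mininet.log import setLogLevel


    def main():
        # no controller: OVS flows are configured manually later
        net = Mininet(controller=None, switch=OVSSwitch)
        ...


    if __name__ == '__main__':
        setLogLevel('info')
        main()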
definitions of nodes
This section shows how each node is defined. Routers are added via the addHost() function, as the lib lacks a definition of a virtual router device. IP addresses are assigned only to the actual hosts; the router hosts (routers hereinafter) are configured separately and more explicitly.
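A sketch of the node definitions; the host numbering and addressing plan below are assumptions consistent with the topology described above:

    # routers are plain hosts (no IP yet, configured explicitly later)
    r1 = net.addHost('r1', ip=None)
    r2 = net.addHost('r2', ip=None)
    r3 = net.addHost('r3', ip=None)

    # switches (no OVS controller, flows are added manually later)
    s1 = net.addSwitch('s1')
    s2 = net.addSwitch('s2')
    s3 = net.addSwitch('s3')

    # monitoring hosts behind R1/S1 (10.1.3.0/27)
    h1 = net.addHost('h1', ip='10.1.3.2/27')
    h2 = net.addHost('h2', ip='10.1.3.3/27')

    # service hosts behind R2/S2 (10.1.3.32/27)
    h3 = net.addHost('h3', ip='10.1.3.34/27')
    h4 = net.addHost('h4', ip='10.1.3.35/27')
    h5 = net.addHost('h5', ip='10.1.3.36/27')

    # service hosts behind R3/S3 (10.1.3.64/27)
    h6 = net.addHost('h6', ip='10.1.3.66/27')
    h7 = net.addHost('h7', ip='10.1.3.67/27')
    h8 = net.addHost('h8', ip='10.1.3.68/27')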
links interconnecting nodes
The following section describes how links are added between defined pairs of nodes. The order matters, as the interfaces are named after the node itself with the order number attached (e.g. r1-eth3). The order starts at zero.
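A sketch of the link definitions; the exact order is an assumption, but it is what determines the interface names (r1-eth0, r1-eth1, ...):

    # router-to-switch and router-to-router links
    net.addLink(r1, s1)    # r1-eth0
    net.addLink(r1, r2)    # r1-eth1 <-> r2-eth0, subnet 10.1.2.0/30
    net.addLink(r1, r3)    # r1-eth2 <-> r3-eth0, subnet 10.1.2.4/30
    # r1-eth3 is added later as the upstream link towards the root node
    net.addLink(r2, s2)    # r2-eth1
    net.addLink(r3, s3)    # r3-eth1

    # hosts to their switches
    for host, switch in ((h1, s1), (h2, s1),
                         (h3, s2), (h4, s2), (h5, s2),
                         (h6, s3), (h7, s3), (h8, s3)):
        net.addLink(host, switch)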
upstream link
Here, we fetch the root node (the Raspberry Pi device). Then a link is added between the root node and the R1 router. The upstream end of the link is assigned the 10.1.1.1/30 address, and the R1 router receives 10.1.1.2/30 to communicate with the Internet.
The tail of the block builds and starts the network.
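A sketch of this block; using mininet's Node with inNamespace=False for the root node is an assumption:

    # the Raspberry Pi itself, living outside any network namespace
    root = Node('root', inNamespace=False)

    link = net.addLink(root, r1)                 # root-eth0 <-> r1-eth3
    link.intf1.setIP('10.1.1.1', prefixLen=30)   # upstream side of the link
    # 10.1.1.2/30 is assigned to r1-eth3 explicitly in the router block below

    net.build()
    net.start()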
core routers configuration
By now the network (nodes) is started, so we can configure the nodes directly.
The following block is quite verbose, as we are configuring the core routers. First, let us assign MAC addresses to the interfaces facing the switches; the switches use those addresses as the default destination for unknown frames.
Then, IP packet routing is enabled and each interface's settings are flushed. Each IP address is then assigned to its interface explicitly.
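A sketch of the router configuration (addresses follow the subnet plan above; the MAC values are purely illustrative):

    # enable IP routing on all routers
    for r in (r1, r2, r3):
        r.cmd('sysctl -w net.ipv4.ip_forward=1')

    # illustrative MACs on the switch-facing interfaces
    r1.setMAC('00:00:00:00:01:01', intf='r1-eth0')
    r2.setMAC('00:00:00:00:02:01', intf='r2-eth1')
    r3.setMAC('00:00:00:00:03:01', intf='r3-eth1')

    # flush every interface, then assign addresses explicitly
    for r in (r1, r2, r3):
        for intf in r.intfList():
            r.cmd(f'ip addr flush dev {intf.name}')

    r1.cmd('ip addr add 10.1.3.1/27 dev r1-eth0')   # hosts behind S1
    r1.cmd('ip addr add 10.1.2.1/30 dev r1-eth1')   # link to R2
    r1.cmd('ip addr add 10.1.2.5/30 dev r1-eth2')   # link to R3
    r1.cmd('ip addr add 10.1.1.2/30 dev r1-eth3')   # upstream link
    r1.cmd('ip route add default via 10.1.1.1')
    r1.cmd('ip route add 10.1.3.32/27 via 10.1.2.2')
    r1.cmd('ip route add 10.1.3.64/27 via 10.1.2.6')

    r2.cmd('ip addr add 10.1.2.2/30 dev r2-eth0')
    r2.cmd('ip addr add 10.1.3.33/27 dev r2-eth1')
    r2.cmd('ip route add default via 10.1.2.1')

    r3.cmd('ip addr add 10.1.2.6/30 dev r3-eth0')
    r3.cmd('ip addr add 10.1.3.65/27 dev r3-eth1')
    r3.cmd('ip route add default via 10.1.2.5')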
TCP load balancing
In this block, the TCP load balancing is implemented. For the balancing itself, the round-robin nth mode is used. This mode takes two important flags in the oneliner, every and packet. In the listing below, the flags are set to 2 and 0 respectively; this setting makes every second packet match the rule, counting from packet number zero. [5]
Note that these rules have to be appended (using the -A flag) to the ruleset, because each packet is compared with the rules line by line until one matches. [5]
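A sketch of such rules on R1 for the web service; the backend addresses and the balanced port are assumptions following the addressing plan above (analogous rules would exist for DNS and SMTP):

    # every 2nd new connection (packet 0 of 2) goes to the web server behind R2...
    r1.cmd('iptables -t nat -A PREROUTING -i r1-eth3 -p tcp --dport 80 '
           '-m statistic --mode nth --every 2 --packet 0 '
           '-j DNAT --to-destination 10.1.3.34:80')
    # ...and everything that did not match the first rule goes behind R3
    r1.cmd('iptables -t nat -A PREROUTING -i r1-eth3 -p tcp --dport 80 '
           '-j DNAT --to-destination 10.1.3.66:80')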
manual configuration of switches
The OVS controller for the switches won't be used; therefore, all flows need to be defined explicitly.
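A sketch of one possible flow setup; the single NORMAL flow (standalone learning-switch behaviour) is an assumption, and the project's actual flows may have been more specific:

    # without a controller the OVS switches forward nothing by default,
    # so install at least one flow per switch
    for s in (s1, s2, s3):
        s.cmd(f'ovs-ofctl add-flow {s.name} "priority=0,actions=NORMAL"')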
additional hosts configuration
Let us not forget about the default route to be set for each host, reflecting the host's subnet gateway. This routes all the traffic towards the core router(s), and possibly to the upstream.
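A sketch of the default routes; the gateway addresses are the routers' switch-facing addresses from the plan above:

    for host in (h1, h2):         # behind R1/S1
        host.cmd('ip route add default via 10.1.3.1')
    for host in (h3, h4, h5):     # behind R2/S2
        host.cmd('ip route add default via 10.1.3.33')
    for host in (h6, h7, h8):     # behind R3/S3
        host.cmd('ip route add default via 10.1.3.65')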
run internal services as jobs
Now, let us focus on the services to be run on the virtual nodes. The virtualization layer is very thin, which means resources are mostly shared; we cannot use systemd, because it would interfere with other nodes trying to start the same service.
To work around this, I propose running the services in the foreground (fg) and then switching them to the background (bg) as jobs. This setup causes various output to be printed when the node is accessed (the foreground output is printed, then the service is put into the background). [6]
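A sketch of how the services could be started; the exact host-to-service mapping is an assumption (apart from the SMTP server on h8 mentioned later), and the foreground flags are the usual ones for these daemons:

    # web servers (nginx kept in the foreground, backgrounded as a job)
    for host in (h3, h6):
        host.cmd('nginx -g "daemon off;" &')
    # DNS servers (bind9 in foreground mode)
    for host in (h4, h7):
        host.cmd('named -g &')
    # mail servers (postfix in foreground mode)
    for host in (h5, h8):
        host.cmd('postfix start-fg &')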
mininet’s prompt source
The very last (but not the least) block just ensures the command-line interface (CLI) is started and running. When the CLI is exited, the network is torn down, including the node links.
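A sketch of these closing lines:

    # interactive mininet prompt; leaving it tears the network down
    CLI(net)
    net.stop()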
deployment
When the script is ready, let us execute it as superuser (mainly to be able to manipulate networking and interfaces):
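For example (the venv path comes from the earlier step; sudo does not inherit the activated venv, hence the explicit interpreter path):

    sudo ./kb2_mininet/bin/python3 topo_kb2.py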
Ideally, that command finishes with the mininet prompt on the last line.
link and routing test
As soon as the prompt is ready, one can examine the deployed network topology. For example, let us check the bidirectional interface link status: the links command will print all defined links and their actual state.
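For example, at the mininet prompt:

    mininet> links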
ping and traceroute
To test the interconnectivity between nodes, the ping and traceroute tools are very useful. The first listing shows the ICMP ping-pong results; host h8 is therefore reachable from host h1.
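The command itself could look like this (output omitted):

    mininet> h1 ping -c 4 h8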
The second listing uses the traceroute utility, mainly to show the so-called hops between two defined nodes. Here, host h2 from the 10.1.3.0/27 network probes host h5, which sits in the 10.1.3.32/27 network. We can see that the packet hops to router R1 (10.1.3.1), then to router R2 (10.1.2.2), and finally to the remote counterpart, host h5.
To show the core routing in action, let us probe hosts in the subnets behind the other routers (R2, R3). In the third listing, we can observe one more hop between the hosts, as the packets flow like this:
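Such a probe could look like this, assuming h5 sits behind R2 and h8 behind R3:

    mininet> h5 traceroute -n h8    # h5 -> R2 -> R1 -> R3 -> h8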
zabbix
To install the Zabbix tooling, it is necessary to install the Zabbix repository first (the exact package depends on the system configuration chosen on the download site). [9]
When the repository is installed and loaded, it is possible to install the whole Zabbix tooling (again, this depends on the chosen system configuration). [9]
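A sketch following the official Zabbix documentation for a Debian-based system with nginx and MariaDB; the release package URL is illustrative only and depends on the Zabbix version and OS selected on the download page:

    # add the Zabbix repository (illustrative package name/URL)
    wget https://repo.zabbix.com/zabbix/6.4/debian/pool/main/z/zabbix-release/zabbix-release_6.4-1+debian12_all.deb
    sudo dpkg -i zabbix-release_6.4-1+debian12_all.deb
    sudo apt update
    # server with MySQL/MariaDB support, nginx frontend config, agent
    sudo apt install zabbix-server-mysql zabbix-frontend-php zabbix-nginx-conf zabbix-sql-scripts zabbix-agent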
Next, the database has to be created via the mysql command: [9]
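A sketch of the SQL, following the Zabbix documentation (the password is a placeholder):

    mysql -uroot -p
    mysql> create database zabbix character set utf8mb4 collate utf8mb4_bin;
    mysql> create user zabbix@localhost identified by 'password';
    mysql> grant all privileges on zabbix.* to zabbix@localhost;
    mysql> set global log_bin_trust_function_creators = 1;
    mysql> quit;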
Then the initial database schema needs to be imported: [9]
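For example (the schema path is the one shipped by the zabbix-sql-scripts package):

    zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | mysql --default-character-set=utf8mb4 -uzabbix -p zabbix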
Again, enter the MariaDB mysql prompt and disable the log_bin_trust_function_creators option. [9]
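For example:

    mysql -uroot -p
    mysql> set global log_bin_trust_function_creators = 0;
    mysql> quit;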
Now go to /etc/zabbix/zabbix_server.conf and enter the database password you assigned to the newly created database account (zabbix). [9]
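The relevant line (the password is a placeholder):

    # /etc/zabbix/zabbix_server.conf
    DBPassword=password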
Furthermore, edit the nginx server configuration file (/etc/zabbix/nginx.conf). [9]
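Typically, the listen port and server name are uncommented; port 8080 matches the forwarding used later, while the server name below is illustrative:

    # /etc/zabbix/nginx.conf
    listen 8080;
    server_name example.com;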
agent and mininet integration
Zabbix Agent configuration file (trimmed, R1 example):
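A sketch of what such a trimmed file could contain; the server address assumes the Zabbix server runs on monitoring host h1 (10.1.3.2):

    # /etc/zabbix/zabbix_agentd.conf (R1, trimmed)
    Server=10.1.3.2
    ServerActive=10.1.3.2
    Hostname=r1
    ListenPort=10050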
To run the Zabbix server, agent, nginx proxy, and PHP-FPM processor all in the foreground as jobs, we need to add these lines to the mininet configuration (topo_kb2.py): [7]
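A sketch of such lines; the binary paths and the PHP-FPM version are assumptions for Raspbian bookworm:

    # Zabbix server stack on monitoring host h1, each process in the foreground,
    # pushed to the background as a job
    h1.cmd('zabbix_server -f -c /etc/zabbix/zabbix_server.conf &')
    h1.cmd('php-fpm8.2 -F &')
    h1.cmd('nginx -g "daemon off;" &')    # main config includes /etc/zabbix/nginx.conf

    # Zabbix agent on the monitored nodes, e.g. the edge router
    r1.cmd('zabbix_agentd -f -c /etc/zabbix/zabbix_agentd.conf &')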
frontend and dashboards
To be able to access the Zabbix frontend from the "Internet" (upstream) side, we can set up another DNAT rule on the root host to forward incoming traffic on port TCP/8090 to mininet's edge router port TCP/8080, which is then forwarded to host h1.
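A sketch of the rule on the root host; the edge router address follows the upstream link configured earlier (an analogous rule on R1 then forwards TCP/8080 to h1):

    # on the Raspberry Pi (root host): forward TCP/8090 to the edge router's TCP/8080
    sudo iptables -t nat -A PREROUTING -p tcp --dport 8090 -j DNAT --to-destination 10.1.1.2:8080
    sudo sysctl -w net.ipv4.ip_forward=1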
Now, we can access the frontend (via the host itself):
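For example, a quick check from the host running the frontend:

    mininet> h1 curl -sI http://127.0.0.1:8080/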
Fig. 3: Welcome dashboard after the login. Zabbix.
Fig. 4: New item dialog. Zabbix.
Fig. 5: Sample alert indicating that the SMTP server on host H8 is not reachable (is down). Zabbix.
Fig. 6: List of all hosts from the project's network topology, with the Zabbix agent reachability indicator shown on the right side. Zabbix.
visualization of load balancing
To visualize the TCP load balancing described earlier, a simple bash script using the curl tool is executed against the core R1 router from the upstream side. Router R1 should then evenly balance the ingress traffic between the two subnets. The script prints the iterator i whenever i modulo 1000 is zero, as a progress note. [8]
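A sketch of such a script; the target address and port (the web service balanced by R1's nth rules above) are assumptions:

    #!/bin/bash
    # flood the edge router with HTTP requests so R1 can round-robin them to R2/R3
    for ((i = 0; i < 100000; i++)); do
        curl -s -o /dev/null http://10.1.1.2/
        # progress note every 1000 iterations
        if ((i % 1000 == 0)); then
            echo "$i"
        fi
    done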
Fig. 7: Demonstration of the TCP load balancing in practice. Zabbix dashboard.
In the figure above, we can see a significant increase in traffic on R1's r1-eth3 interface (the unit being bytes per second). The ingress of this interface saturates at circa 954.4 kBps. This traffic is then distributed almost perfectly evenly between routers R2 and R3 (485.96 kBps and 486.49 kBps respectively). [10]
In the end, the Nagios tooling was not installed or used in the demonstration.
conclusion
To sum it up, deploying and configuring the mininet tool (using the python mininet lib) was easy and straightforward. Although the OVS switch/controller tooling wasn't fully utilized, the routing capabilities of the Linux kernel paved the way for a smooth network deployment and setup.
In my opinion, I would rather use the Prometheus stack (with Grafana, Loki, and more) together with dedicated exporters, as it is a very lightweight solution for complex systems and networks.