Unix Domain Socket example with Fail2ban

In Linux, stream sockets (SOCK_STREAM) are mainly used in two domains. The domain specifies the protocol family and the addressing scheme.

Internet Domain (AF_INET/AF_INET6): Uses the Transmission Control Protocol (TCP) to communicate over a network, with IP addresses and port numbers for addressing.
Unix Domain (AF_UNIX): Used for Inter-Process Communication (IPC) between programs on the same machine. In this context, a stream socket behaves like a bidirectional pipe. These sockets are addressed via filesystem paths (e.g., /tmp/file.socket) and are faster than network sockets because data does not go through the TCP/IP network stack.
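To make the "bidirectional pipe" comparison concrete, here is a minimal Python sketch. It uses socket.socketpair(), which returns two already-connected AF_UNIX stream sockets without creating any file on disk. A real server such as fail2ban-server instead binds its socket to a filesystem path.

```python
import socket

# socketpair() returns two connected AF_UNIX stream sockets (no filesystem path involved).
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with a, b:
    a.sendall(b"ping")          # data flows from a to b...
    request = b.recv(1024)
    b.sendall(b"pong")          # ...and from b back to a over the same connection
    reply = a.recv(1024)

print(request, reply)  # b'ping' b'pong'
```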

In this post I’ll walk through an example of Unix Domain Socket communication using the Fail2ban utility.

Fail2Ban is an open-source intrusion prevention software that protects servers from automated attacks, like brute-force login attempts, by monitoring log files for suspicious activity and automatically banning offending IP addresses using firewall rules (e.g., iptables/nftables).

Here is my Fail2ban configuration. I defined a new filter whose failregex matches HAProxy log lines with HTTP status code 429 (Too Many Requests), and a new jail called ‘haproxy-ratelimit’ that uses it. Once a remote IP produces 3 matching log lines (maxretry) within the jail’s findtime window (not set here, so the Fail2ban default applies), the jail blocks that IP’s HTTP/HTTPS traffic on the firewall for 1 hour (bantime = 3600 seconds).

root@vps-2153e875:~# cat /etc/fail2ban/filter.d/haproxy-ratelimit.conf
[Definition]
failregex = ^.*haproxy\[.*\]: <HOST>:.*NOSRV.* 429
ignoreregex =

root@vps-2153e875:~# cat /etc/fail2ban/jail.local
[haproxy-ratelimit]
enabled = true
port    = http,https
filter  = haproxy-ratelimit
logpath = /var/log/haproxy.log
maxretry = 3
bantime  = 3600
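The filter and the jail work together roughly as in the sketch below: a log line counts as a failure if it matches the failregex, and an IP is banned once it has maxretry failures inside the findtime window. This is a simplified Python illustration, not Fail2ban’s code. Assumptions: the <HOST> tag is replaced by a plain IPv4 capture group (Fail2ban’s real substitution also handles IPv6 and hostnames), FINDTIME = 600 assumes the default 10-minute window because jail.local does not set one, and the log lines produced by the hypothetical line() helper use documentation IPs.

```python
import re
from collections import defaultdict

# Simplified stand-in for Fail2ban's <HOST> tag (assumption: IPv4 only).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
FAILREGEX = re.compile(
    r"^.*haproxy\[.*\]: <HOST>:.*NOSRV.* 429".replace("<HOST>", HOST))

MAXRETRY = 3    # from jail.local
FINDTIME = 600  # assumed default window in seconds; jail.local does not set it


def hosts_to_ban(events):
    """events: (timestamp_in_seconds, log_line) pairs in time order.
    Returns the IPs that reached MAXRETRY failures within FINDTIME seconds."""
    failures = defaultdict(list)
    banned = set()
    for ts, log_line in events:
        m = FAILREGEX.match(log_line)
        if m is None:
            continue  # not a failure for this jail
        ip = m.group("host")
        # Keep only the failures still inside the window, then add this one.
        failures[ip] = [t for t in failures[ip] if ts - t < FINDTIME] + [ts]
        if len(failures[ip]) >= MAXRETRY:
            banned.add(ip)
    return banned


def line(ip, status, server="fe_main/<NOSRV>"):
    # Hypothetical HAProxy log line, loosely modelled on its HTTP log format.
    return (f"Jan  1 18:00:00 vps haproxy[1234]: {ip}:51234 "
            f"[01/Jan/2025:18:00:00.000] fe_main {server} 0/-1/-1/-1/0 {status} 217 - - PR-- ")


events = [
    (0, line("203.0.113.7", 429)),
    (5, line("198.51.100.9", 429)),
    (10, line("203.0.113.7", 429)),
    (15, line("198.51.100.9", 200, "be_web/web1")),  # served normally, not a failure
    (20, line("203.0.113.7", 429)),
]
print(hosts_to_ban(events))  # {'203.0.113.7'}
```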

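Below is an excerpt from the HAProxy frontend configuration that implements the ‘429’ rate limit. It tracks every client IP except the whitelisted one in a stick table, and denies with HTTP status ‘429’ (logging the request) any client that has made more than 60 requests within the last minute. Because the request is denied in the frontend before a backend server is selected, HAProxy logs it with the server name ‘<NOSRV>’, which is what the failregex above matches on.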

root@vps-2153e875:~# view /etc/haproxy/haproxy.cfg
...
    stick-table type ip size 100k expire 300s store http_req_rate(60s)
    acl is_whitelisted src 1.2.3.4/32
    http-request track-sc0 src unless is_whitelisted
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 60 }
...
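Conceptually, the three http-request lines implement a per-IP request counter over a sliding window. The Python sketch below models that behaviour with an exact sliding window over the last 60 seconds. It is not HAProxy’s algorithm: HAProxy uses its own frequency-counter implementation rather than storing every timestamp, and the stick table’s size and expire limits are not modelled. WINDOW, LIMIT and WHITELIST mirror the values in the excerpt; RateLimiter is a hypothetical name.

```python
from collections import defaultdict, deque

WINDOW = 60.0            # seconds, as in http_req_rate(60s)
LIMIT = 60               # deny when the rate is greater than 60 ("gt 60")
WHITELIST = {"1.2.3.4"}  # acl is_whitelisted src 1.2.3.4/32


class RateLimiter:
    """Exact sliding-window model of the HAProxy excerpt (not HAProxy's algorithm)."""

    def __init__(self):
        self.seen = defaultdict(deque)  # client IP -> timestamps of its recent requests

    def status_for(self, ip, now):
        if ip in WHITELIST:
            return 200                  # "unless is_whitelisted": never tracked
        q = self.seen[ip]
        q.append(now)                   # tracking happens before the deny rule, so denied requests count too
        while now - q[0] >= WINDOW:     # forget requests older than the window
            q.popleft()
        return 429 if len(q) > LIMIT else 200


rl = RateLimiter()
codes = [rl.status_for("203.0.113.7", i * 0.5) for i in range(61)]  # 61 requests in 30 s
print(codes.count(200), codes[-1])  # 60 429
```

In this model a client that keeps sending requests while blocked keeps its counter above the limit, so it stays blocked until it slows down; those repeated 429 log lines are what feed the Fail2ban jail.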

Fail2ban’s core is a daemon process called ‘fail2ban-server‘, which monitors the server’s log files for patterns of malicious or abusive network activity. When it detects too many failures from a single IP address within a set timeframe, it automatically triggers actions, most commonly updating firewall rules to block that IP temporarily or permanently and prevent further unauthorized access.

root@vps-2153e875:~# ps aux | grep -i fail2ba[n]
root         823  0.1  1.7 1440576 142060 ?      Ssl  Jan01  13:47 /usr/bin/python3 /usr/bin/fail2ban-server -xf start

root@vps-2153e875:~# lsof -nP -p 823 | grep '\/var' | grep -v journal
fail2ban- 823 root    3w      REG                8,1    22460   4195 /var/log/fail2ban.log
fail2ban- 823 root    4u     unix 0xffff8e2d48102000      0t0   8841 /var/run/fail2ban/fail2ban.sock type=STREAM (LISTEN)
fail2ban- 823 root    6u      REG                8,1   126976 811607 /var/lib/fail2ban/fail2ban.sqlite3
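lsof reports descriptor 4 as type ‘unix’ because /var/run/fail2ban/fail2ban.sock is a socket file, which appears on the filesystem when a server calls bind() on an AF_UNIX socket. A minimal Python sketch of that, using a throwaway path in a temporary directory rather than the real fail2ban socket; is_unix_socket() is a hypothetical helper name:

```python
import os
import socket
import stat
import tempfile


def is_unix_socket(path):
    """Return True if path exists and is a socket file."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.sock")
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)      # bind() creates the socket file
    srv.listen(1)       # now listening, like fd 4 of fail2ban-server
    created = is_unix_socket(path)
    srv.close()

print(created)  # True
```

On the VPS above, is_unix_socket("/var/run/fail2ban/fail2ban.sock") should return True while the Fail2ban service is running.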

Fail2ban also ships with a CLI utility called ‘fail2ban-client‘, which you can use to list the currently configured jails, list the banned IPs in each jail, and ban/unban IPs manually.

root@vps-2153e875:~# fail2ban-client status
Status
|- Number of jail:      2
`- Jail list:   haproxy-ratelimit, sshd

root@vps-2153e875:~# fail2ban-client status haproxy-ratelimit
Status for the jail: haproxy-ratelimit
|- Filter
|  |- Currently failed: 1
|  |- Total failed:     4
|  `- Journal matches:
`- Actions
   |- Currently banned: 1
   |- Total banned:     1
   `- Banned IP list:   79.116.218.196

‘fail2ban-client’ communicates with ‘fail2ban-server’ over a Unix Domain Socket. To understand how this communication works, let’s use the strace utility to trace the system calls made by both the ‘fail2ban-client’ command and the ‘fail2ban-server’ daemon process while manually unbanning the IP ‘79.116.218.196‘ from all existing jails with the command “fail2ban-client unban 79.116.218.196“.

Below are the strace commands I used. The first command attaches strace to the running fail2ban-server process (-p 823) in the background and writes the trace to the file ‘/tmp/strace_fail2banserver.txt‘. The second command runs ‘fail2ban-client unban 79.116.218.196’ under strace, capturing every system call it makes, and saves them to the file ‘/tmp/strace_fai2banclient.txt‘. In both commands, -f also follows child processes and threads (fail2ban-server is multi-threaded, as the ‘l’ in its ‘Ssl’ state shows), -s 1024 prints up to 1024 bytes of each string argument, and -tt prefixes each line with a timestamp in microseconds. The last command stops the first strace, which I ran in the background.

strace -f -s 1024 -tt -p 823 -o /tmp/strace_fail2banserver.txt &

strace -f -s 1024 -tt -o /tmp/strace_fai2banclient.txt fail2ban-client unban 79.116.218.196

pkill strace

Because strace records every single interaction between a program and the Linux kernel, these files contain hundreds or thousands of lines. Below I’ll only post small extracts showing the Unix Domain Socket communication between the ‘fail2ban-client’ and ‘fail2ban-server’ processes.

Let’s start with the trace of the ‘fail2ban-client’ command. The ‘socket‘ syscall creates an ‘AF_UNIX‘ (Unix Domain) stream socket, returned as file descriptor ‘3‘. The process then calls ‘connect‘ on that descriptor to connect to the server socket ‘/var/run/fail2ban/fail2ban.sock‘, a socket file created by ‘fail2ban-server’ when the Fail2ban service is started. Next, it uses ‘sendto‘ to send the command ‘unban 79.116.218.196’, which travels as a Python pickle of the list ['unban', '79.116.218.196'] (hence the binary bytes around the strings), followed by a second ‘sendto‘ carrying the ‘<F2B_END_COMMAND>’ marker. It then waits for and reads the reply with ‘recvfrom‘, prints ‘1’ to standard output (file descriptor 1), sends ‘<F2B_CLOSE_COMMAND>’ to end the session, and finally shuts down and closes the socket with the ‘shutdown‘ and ‘close‘ syscalls.

# strace_fai2banclient.txt:
353347 18:18:13.524454 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
353347 18:18:13.524630 connect(3, {sa_family=AF_UNIX, sun_path="/var/run/fail2ban/fail2ban.sock"}, 34) = 0
353347 18:18:13.525040 sendto(3, "\200\5\225\36\0\0\0\0\0\0\0]\224(\214\5unban\224\214\01679.116.218.196\224e.", 41, 0, NULL, 0) = 41
353347 18:18:13.525214 sendto(3, "<F2B_END_COMMAND>", 17, 0, NULL, 0) = 17
353347 18:18:13.525526 recvfrom(3, "\200\5\225\7\0\0\0\0\0\0\0K\0K\1\206\224.<F2B_END_COMMAND>", 1024, 0, NULL, NULL) = 35
353347 18:18:13.611999 write(1, "1\n", 2) = 2
353347 18:18:13.612189 sendto(3, "<F2B_CLOSE_COMMAND><F2B_END_COMMAND>", 36, 0, NULL, 0) = 36
353347 18:18:13.612415 shutdown(3, SHUT_RDWR) = 0
353347 18:18:13.612633 close(3)
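The binary bytes around the strings are a Python pickle: \200\5 (0x80 0x05) is the header of pickle protocol 5, and \225 introduces an 8-byte frame length. strace prints non-printable bytes as octal escapes, which are also valid in a Python bytes literal, so the captured payloads can be pasted into Python and decoded. A short sketch (pickle.loads must only ever be used on trusted data):

```python
import pickle

# Payloads copied verbatim from the strace output above.
request = b"\200\5\225\36\0\0\0\0\0\0\0]\224(\214\5unban\224\214\01679.116.218.196\224e."
reply = b"\200\5\225\7\0\0\0\0\0\0\0K\0K\1\206\224.<F2B_END_COMMAND>"

END = b"<F2B_END_COMMAND>"
command = pickle.loads(request)            # the 41 bytes of the first sendto()
result = pickle.loads(reply[: -len(END)])  # the reply minus its 17-byte end marker

print(command)  # ['unban', '79.116.218.196']
print(result)   # (0, 1)
```

The reply decodes to (0, 1). This looks like a (status, result) pair, with 0 meaning success and 1 being the number of unbanned IPs that the client printed with write(1, "1\n", 2); that reading is inferred from the trace rather than from Fail2ban’s documentation.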

Now let’s see what happens on the ‘fail2ban-server‘ side. The lsof output earlier shows that ‘fail2ban-server’ listens on the socket file ‘/var/run/fail2ban/fail2ban.sock‘ as file descriptor 4. The ‘accept4‘ syscall accepts the pending connection on that listening socket and returns a new socket, file descriptor 50, dedicated to this client. The server then marks it close-on-exec and non-blocking (the ‘fcntl‘ and ‘ioctl(FIONBIO)‘ calls), waits with ‘pselect6‘ (2-second timeout) until it is readable, and reads it with ‘recvfrom‘. That single ‘recvfrom‘ returns 58 bytes: the pickled ‘unban 79.116.218.196‘ command (41 bytes) followed by “<F2B_END_COMMAND>“ (17 bytes), even though the client sent them with two separate ‘sendto‘ calls. A stream socket does not preserve message boundaries, which is why the Fail2ban protocol needs this end-of-command marker.

# strace_fail2banserver.txt:
823   18:18:13.525029 accept4(4, {sa_family=AF_UNIX}, [110 => 2], SOCK_CLOEXEC) = 50
823   18:18:13.525440 getsockname(50, {sa_family=AF_UNIX, sun_path="/var/run/fail2ban/fail2ban.sock"}, [128 => 34]) = 0
823   18:18:13.525649 fcntl(50, F_GETFD) = 0x1 (flags FD_CLOEXEC)
823   18:18:13.525763 fcntl(50, F_SETFD, FD_CLOEXEC) = 0
823   18:18:13.525878 ioctl(50, FIONBIO, [1]) = 0
823   18:18:13.525984 getpeername(50, {sa_family=AF_UNIX}, [110 => 2]) = 0
823   18:18:13.526143 pselect6(51, [4 50], NULL, [4 50], {tv_sec=2, tv_nsec=0}, NULL) = 1 (in [50], left {tv_sec=1, tv_nsec=999997449})
823   18:18:13.526286 recvfrom(50, "\200\5\225\36\0\0\0\0\0\0\0]\224(\214\5unban\224\214\01679.116.218.196\224e.<F2B_END_COMMAND>", 65536, 0, NULL, NULL) = 58
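Putting both traces together, the sketch below reproduces the same pattern with Python’s socket module: a server that binds, listens and accepts on a socket file, and a client that connects, sends a pickled command followed by <F2B_END_COMMAND>, reads a pickled reply, then sends <F2B_CLOSE_COMMAND> and shuts down. It is an illustration written from the strace output, not Fail2ban’s implementation. Assumptions: the socket path is a temporary demo path, fake_unban() is a hypothetical handler, and the (0, result) / (1, error) reply convention is inferred from the decoded reply. The key design point is recv_message(): a stream socket can deliver several sends in one recv (as the server’s 58-byte recvfrom shows) or split one send across several recvs, so the reader buffers bytes until it sees the end marker.

```python
import os
import pickle
import socket
import tempfile
import threading

END = b"<F2B_END_COMMAND>"      # end-of-command marker seen in both traces
CLOSE = b"<F2B_CLOSE_COMMAND>"  # sent by the client before it shuts the socket down


def recv_message(conn, buf):
    """Read until END is in the buffer; return (message, leftover bytes)."""
    while END not in buf:
        chunk = conn.recv(65536)
        if not chunk:
            raise ConnectionError("peer closed the connection before the end marker")
        buf += chunk
    message, _, rest = buf.partition(END)
    return message, rest


def fake_unban(command):
    """Hypothetical handler: pretends to unban and returns the number of IPs."""
    action, *ips = command
    if action != "unban":
        raise ValueError(f"unsupported command: {action!r}")
    return len(ips)


def serve_one_client(path, handler, ready):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)             # creates the socket file (like fail2ban.sock)
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()     # accept4() in the server trace
        with conn:
            buf = b""
            while True:
                message, buf = recv_message(conn, buf)
                if message == CLOSE:
                    break
                try:
                    # Only unpickle data from trusted local peers.
                    ack = (0, handler(pickle.loads(message)))
                except Exception as exc:
                    ack = (1, str(exc))
                conn.sendall(pickle.dumps(ack, protocol=5) + END)


def send_command(path, command):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(pickle.dumps(command, protocol=5))  # first sendto() in the client trace
        s.sendall(END)                                # second sendto()
        reply, _ = recv_message(s, b"")               # recvfrom()
        s.sendall(CLOSE + END)
        s.shutdown(socket.SHUT_RDWR)
    return pickle.loads(reply)


with tempfile.TemporaryDirectory() as tmp:
    sock_path = os.path.join(tmp, "demo.sock")  # demo path, not the real fail2ban socket
    ready = threading.Event()
    server = threading.Thread(target=serve_one_client,
                              args=(sock_path, fake_unban, ready), daemon=True)
    server.start()
    ready.wait(timeout=5)
    result = send_command(sock_path, ["unban", "203.0.113.7"])
    server.join(timeout=5)

print(result)  # (0, 1)
```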

As the lsof output earlier showed, ‘fail2ban-server’ also keeps the file ‘/var/lib/fail2ban/fail2ban.sqlite3‘ open as file descriptor 6. This is an SQLite database in which Fail2ban stores information about its jails and bans. After receiving the IP to unban, ‘fail2ban-server’ updates this database, presumably to remove the ban for that IP: the ‘fcntl(F_SETLK)‘ calls below are SQLite taking write locks before modifying the file, and ‘pwrite64‘ writes data beginning with the ‘SQLite format 3’ header, i.e. the first page of the database.

# strace_fail2banserver.txt:
823   18:18:13.529161 fstat(6, {st_mode=S_IFREG|0600, st_size=98304, ...}) = 0
823   18:18:13.529262 fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=1073741825, l_len=1}) = 0
823   18:18:13.529588 fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
823   18:18:13.529679 fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0
823   18:18:13.529756 pwrite64(6, "SQLite format 3\0\20\0\1\1\0@
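The fcntl offsets are not arbitrary. SQLite implements its locks with POSIX advisory locks on a few bytes at the 1 GiB offset (a page SQLite never uses for data), defined in its source as PENDING_BYTE = 0x40000000, RESERVED_BYTE = PENDING_BYTE + 1, SHARED_FIRST = PENDING_BYTE + 2 and SHARED_SIZE = 510. The Python sketch below maps the three F_WRLCK calls from the trace to SQLite lock levels, assuming the default constants; LOCK_MEANING is an illustrative name and only covers write locks.

```python
# Default lock offsets from SQLite's source; they sit at the 1 GiB mark.
PENDING_BYTE = 0x40000000
RESERVED_BYTE = PENDING_BYTE + 1
SHARED_FIRST = PENDING_BYTE + 2
SHARED_SIZE = 510

# (l_start, l_len) of the three F_WRLCK fcntl() calls in the fail2ban-server trace.
traced_locks = [(1073741825, 1), (1073741824, 1), (1073741826, 510)]

# What a write lock on each range means in SQLite's unix locking scheme.
LOCK_MEANING = {
    (RESERVED_BYTE, 1): "RESERVED lock",
    (PENDING_BYTE, 1): "PENDING lock",
    (SHARED_FIRST, SHARED_SIZE): "EXCLUSIVE lock",
}

decoded = [LOCK_MEANING.get(lock, "unknown") for lock in traced_locks]
print(decoded)  # ['RESERVED lock', 'PENDING lock', 'EXCLUSIVE lock']
```

Read in order, this is SQLite escalating from RESERVED through PENDING to EXCLUSIVE before writing, which is consistent with the server modifying the database rather than only reading it.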

Finally, the ‘execve‘ syscall runs the system command ‘nft delete element inet f2b-table addr-set-haproxy-ratelimit { 79.116.218.196 }‘, which removes the IP from the nftables set used by the jail and so lifts the ban on the firewall. There are two execve syscalls: the first starts the shell that is spawned to run the nft command, and the second is the execution of the nft command itself.
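That nft command line is built from the banned IP, which ultimately comes from a log file, so code that assembles such a command should validate the IP before handing it to a shell. A hypothetical Python sketch of that idea: nft_unban_command() and run_via_shell() are illustrative names, not Fail2ban functions, the table and set names are the ones from the command above, and only IPv4 is handled.

```python
import ipaddress
import subprocess


def nft_unban_command(ip, table="inet f2b-table", set_name="addr-set-haproxy-ratelimit"):
    """Hypothetical helper: build the nft command from the trace for one IPv4 address.
    Validating first means a crafted 'IP' cannot inject extra shell syntax."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError (AddressValueError) if not IPv4
    return f"nft delete element {table} {set_name} {{ {addr} }}"


def run_via_shell(cmd):
    """Run cmd through /bin/sh -c: one execve() for the shell, then the shell starts the command."""
    return subprocess.run(["/bin/sh", "-c", cmd], check=True)


cmd = nft_unban_command("79.116.218.196")
print(cmd)  # nft delete element inet f2b-table addr-set-haproxy-ratelimit { 79.116.218.196 }

try:
    nft_unban_command("1.2.3.4 }; reboot; {")
except ValueError as exc:
    rejected = exc  # the malformed input never reaches a shell
```

Only the string is built here; actually running it with run_via_shell(cmd) requires root and the nft binary, and would produce the same shell-then-nft execve pattern seen in the trace.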

That’s the end of this article: we reviewed the Fail2ban utility, saw how ‘fail2ban-client’ and ‘fail2ban-server’ communicate over a Unix Domain Socket, and checked which Linux syscalls are involved in the process.

Adib Ahmed Akhtar