Author: Adib Ahmed Akhtar

  • Random timeout failures on Java SFTP Pipeline and SO_KEEPALIVE

    This post is about a customer case I worked on some time ago. The customer was using our Java-based ETL application with a Pipeline that read files from a local server directory, parsed some information from those files, and finally sent the files to a remote SFTP server. It was a streaming Pipeline (not a batch Pipeline), meaning it runs continuously, checking the origin directory every few seconds for new files and processing/sending them to the destination as soon as they arrive. The Pipeline does not open a new SFTP connection every time it sends a file. Instead, it keeps an SFTP connection permanently open against the SFTP server, which is more efficient because it avoids the overhead of establishing a new connection for each transfer.

    The customer stated that the Pipeline was working fine, but sometimes it would randomly start to fail: new files in the origin directory were not being sent to the destination, taking 10 minutes or more to be copied. In the Pipeline application log we found several traces with the error message ‘Connection timed out (Read failed)’ related to the SFTP connection, and we also noticed several retries from the Pipeline when trying to list the remote SFTP directory. The customer also mentioned that if they restarted the Pipeline when these errors started, it would continue processing the pending files without issues.

    java.net.SocketException: Connection timed out (Read failed)
            at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_261]
            at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_261]
            at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_261]
            at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_261]
            at net.schmizz.sshj.transport.Reader.run(Reader.java:50) [sshj-0.38.0.jar:?]
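
    The SO_KEEPALIVE option from the title lets the kernel probe idle TCP connections so that a half-open connection (e.g. one silently dropped by an intermediate firewall) is detected instead of hanging until a long read timeout. Here is a minimal sketch in Python for brevity (in Java the equivalent is Socket.setKeepAlive(true)); the tuning values are illustrative, not the customer's settings:

    ```python
    import socket

    def make_keepalive_socket(idle=60, interval=10, count=5):
        """Create a TCP socket with SO_KEEPALIVE enabled.

        On Linux, the TCP_KEEP* options tune the probing: start probing
        after `idle` seconds of silence, probe every `interval` seconds,
        and declare the peer dead after `count` failed probes.
        """
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-specific knobs
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
        return s
    ```

    With these values, a connection dropped by a firewall is detected after roughly idle + interval * count seconds instead of waiting for the OS default read timeout.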
    (more…)
  • How Kubernetes CoreDNS works

    CoreDNS is the default, flexible, and extensible DNS server in Kubernetes, providing service discovery and name resolution by mapping service and pod names to IP addresses within the cluster. It allows pods to find services using human-readable domain names instead of IPs. CoreDNS runs as pods (a Kubernetes deployment with 2 replicas) in the kube-system namespace.

    root@vps-2153e875:~# kubectl get pods -A -o wide
    NAMESPACE        NAME                                   READY   STATUS    RESTARTS      AGE    IP               NODE        
    kube-flannel     kube-flannel-ds-xcdcr                  1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      coredns-674b8bbfcf-8qdxv               1/1     Running   2 (21d ago)   105d   10.244.0.34      vps-2153e875
    kube-system      coredns-674b8bbfcf-btksw               1/1     Running   2 (21d ago)   105d   10.244.0.33      vps-2153e875
    kube-system      etcd-vps-2153e875                      1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      kube-apiserver-vps-2153e875            1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      kube-controller-manager-vps-2153e875   1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      kube-proxy-sr72s                       1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      kube-scheduler-vps-2153e875            1/1     Running   2 (21d ago)   105d   135.125.174.14   vps-2153e875
    kube-system      metrics-server-797759896b-j7wnz        1/1     Running   1 (21d ago)   21d    135.125.174.14   vps-2153e875
    ns-adibexpress   frontend-q8llk                         1/1     Running   1 (21d ago)   21d    10.244.0.36      vps-2153e875
    ns-adibexpress   mysql                                  1/1     Running   1 (21d ago)   21d    10.244.0.35      vps-2153e875

    10.96.0.10 is the default internal IP address of the Kubernetes DNS service (CoreDNS) in a default cluster, acting as the nameserver pods use to resolve service names. It is a ClusterIP, i.e. a virtual IP for a service that load-balances traffic to the underlying DNS pods, allowing other pods to find services like ‘kubernetes.default.svc.cluster.local’ using DNS.
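
    To sketch what resolution against 10.96.0.10 looks like from a pod's point of view, the Python snippet below expands a short service name using the search domains kubelet writes into the pod's /etc/resolv.conf (the sample content is illustrative, for a pod in the ‘default’ namespace):

    ```python
    # Sample of what kubelet writes into a pod's /etc/resolv.conf.
    SAMPLE_RESOLV_CONF = """\
    nameserver 10.96.0.10
    search default.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5
    """

    def candidate_fqdns(name, resolv_conf):
        """Expand a short service name with the search domains, the way the
        resolver does before querying the nameserver (CoreDNS at 10.96.0.10).
        A name with fewer dots than ndots is tried with each suffix first."""
        search = []
        for line in resolv_conf.splitlines():
            parts = line.split()
            if parts and parts[0] == "search":
                search = parts[1:]
        return [f"{name}.{d}" for d in search] + [name]

    # 'kubernetes' expands to kubernetes.default.svc.cluster.local first.
    print(candidate_fqdns("kubernetes", SAMPLE_RESOLV_CONF))
    ```

    This is why a pod can reach a service in its own namespace by its bare name, while cross-namespace lookups need at least ‘name.namespace’.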

    (more…)
  • Understanding Kubernetes node network interfaces

    The goal of this article is to review and understand the node network interfaces, and how network communication works in a Kubernetes cluster.

    My server has a kubeadm installation with Flannel as the CNI plugin. By default, Flannel uses the IP range ‘10.244.0.0/16’ for the Pods network. Each Kubernetes node gets a ‘/24’ subnet out of that range for the Pods running on it.
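
    As a quick illustration of that /16-to-/24 split, this Python sketch (standard library only) recovers the per-node subnet a given pod IP belongs to:

    ```python
    import ipaddress

    # Flannel's default cluster-wide pod CIDR.
    POD_CIDR = ipaddress.ip_network("10.244.0.0/16")

    def node_subnet(pod_ip):
        """Return the /24 node subnet that a pod IP lives in."""
        ip = ipaddress.ip_address(pod_ip)
        assert ip in POD_CIDR, "not a pod IP"
        # Keep the first 24 bits: that's the subnet Flannel gave the node.
        return ipaddress.ip_network(f"{pod_ip}/24", strict=False)

    # Pods 10.244.0.x all share the first node's subnet 10.244.0.0/24.
    print(node_subnet("10.244.0.34"))
    ```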

    Let’s start by listing my Kubernetes Control Plane node network interfaces which have an assigned IP address.

    root@vps-2153e875:~# ifconfig | grep -B1 ' inet '
    cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
    --
    ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 135.125.174.14  netmask 255.255.255.255  broadcast 0.0.0.0
    --
    flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
    --
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0

    From the previous capture, we can see that ‘ens3’ (135.125.174.14) is the main network interface of the server, with a public IP address assigned. The network interfaces ‘cni0’ (10.244.0.1) and ‘flannel.1’ (10.244.0.0) were created when I installed Kubernetes on this server.

    1. cni0 (Bridge): Handles intra-node traffic. It connects all pods on the same node so they can talk to each other directly at Layer 2.
    2. flannel.1 (VXLAN Device): Handles inter-node traffic. It is a VXLAN Tunnel Endpoint (VTEP) that encapsulates the original pod packet into a UDP packet to “tunnel” it across the physical network to the destination node.
    (more…)
  • Auditd rules for Kubernetes nodes

    Auditd (Linux Audit Daemon) is the userspace component of the Linux Auditing System. It is responsible for collecting, filtering, and writing system event logs to disk based on pre-defined rules. It hooks into the Linux kernel to intercept system calls (syscalls) such as file access, network activity, and process execution, and records crucial metadata including timestamps, the User ID (UID), the Audit ID (AUID), which persists even after privilege escalation (e.g., via sudo), and the success or failure of the event.

    On my servers, I usually increase the default value of ‘max_log_file’ in ‘/etc/audit/auditd.conf’ from 8 MB to 100 MB and keep ‘num_logs’ at 5 log files, in order to keep enough history of auditd events. You can increase these values further, but keep in mind that the ausearch utility will consume more server resources and take longer when you query the auditd logs.

    # /etc/audit/auditd.conf
    ...
    max_log_file = 100
    num_logs = 5
    ...
    
    root@vps-2153e875:~# du -ks /var/log/audit/*
    4528    /var/log/audit/audit.log
    102408  /var/log/audit/audit.log.1
    102408  /var/log/audit/audit.log.2
    102408  /var/log/audit/audit.log.3
    102408  /var/log/audit/audit.log.4

    You can search the Internet for the auditd rules you need to comply with PCI DSS. In addition to those rules, I have the following rules to monitor file changes (creation, deletion, modification) in the indicated directories (configuration directories, binary/library directories, temporary directories, logs, and users' home directories), so that I can trace changes made by users, daemons, and applications, and investigate any security breach.
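
    As an illustrative sketch of the ‘-w’ watch syntax such rules use (the paths and keys below are examples, not necessarily the post's exact rule set):

    ```
    # Syntax: -w <path> -p <permissions: w=write, a=attribute change> -k <search key>
    -w /etc/ -p wa -k cfg_changes
    -w /usr/bin/ -p wa -k bin_changes
    -w /usr/sbin/ -p wa -k bin_changes
    -w /usr/lib/ -p wa -k lib_changes
    -w /tmp/ -p wa -k tmp_changes
    -w /var/log/ -p wa -k log_changes
    -w /home/ -p wa -k home_changes
    ```

    The ‘-k’ key is what you later pass to ausearch (e.g. ‘ausearch -k cfg_changes’) to filter the matching events.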

    (more…)
  • Linux process in uninterruptible sleep state

    In Linux, a process in an uninterruptible sleep state (D state) is typically waiting for I/O operations (like disk or network) and cannot be killed until the system call completes. 

    root@vps-2153e875:~# ps -eo pid,state,comm | awk '$2 == "D"'
    2662710 D find
    
    root@vps-2153e875:~# cat /proc/2662710/stat
    2662710 (find) D 2660486 2662710 2660486 34817 2662710 4456448 162 0 0 0 5 40 0 0 20 0 1 0 470174814 5050368 576 18446744073709551615 94339750293504 94339750436697 140720879474736 0 0 0 0 0 0 1 0 0 17 0 0 0 0 0 0 94339750490160 94339750499624 94340756328448 140720879479322 140720879479338 140720879479338 140720879480810 0

    Unlike standard “idle” processes, those in state D contribute to the system load average, often causing high load numbers even if CPU usage is low.

    Common causes of a process in D state are NFS issues (a lost connection to an NFS server is the most frequent cause) or failing hardware (malfunctions that prevent I/O completion).

    If the resource becomes available (e.g., the NFS server comes back online), the process will resume automatically. For storage issues, attempting a “lazy” unmount (umount -l) or resetting the specific hardware device may help. If the process is stuck due to a kernel bug or permanently lost hardware, a system reboot is often the only way to clear it.
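
    The D in the ‘/proc/<pid>/stat’ output above is the state field. A small Python sketch to extract it; note that the parenthesised comm field (field 2) can itself contain spaces or ‘)’, so naive whitespace splitting is unsafe:

    ```python
    def proc_state(stat_line: str) -> str:
        """Extract the process state from a /proc/<pid>/stat line.

        The comm field is wrapped in parentheses and may contain spaces
        or ')', so split on the *last* ')' before taking field 3.
        """
        after_comm = stat_line.rsplit(")", 1)[1]
        return after_comm.split()[0]  # 'R', 'S', 'D', 'Z', 'T', ...

    # The stat line captured above starts "2662710 (find) D ..." -> 'D'
    print(proc_state("2662710 (find) D 2660486 2662710 2660486 34817"))
    ```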

    (more…)
  • SSL certificates with Certbot on HAProxy

    As you may already know, Certbot is a free, open-source software tool designed to automatically obtain and install SSL/TLS certificates for websites. It is developed by the Electronic Frontier Foundation (EFF) and serves as the primary client for Let’s Encrypt, a free certificate authority.

    You can install certbot using your OS package manager (apt/yum). After that, if you already have an Apache web server running on http://<yourdomain>, you can simply run ‘sudo certbot --apache’, which uses Certbot's Apache plugin to detect the configuration and domain, generates/installs the SSL key/certificate, and automatically modifies the web server configuration to use the new certificate. The first time you run it, you will be asked for an email address and prompted to agree to the terms of service; it will also prompt you to confirm the domain for which you are generating the certificate, and finally ask whether you want to automatically redirect all HTTP traffic to HTTPS in your web server configuration. The certificate is valid for 90 days, and to renew it you run ‘sudo certbot renew’.

    Below are more details about what happens internally when you run certbot.
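
    One central piece of that flow is the HTTP-01 challenge: the Let's Encrypt CA fetches http://<yourdomain>/.well-known/acme-challenge/<token> and expects back the token joined with the account key's thumbprint (RFC 8555 / RFC 7638). A simplified Python sketch of that key authorization string follows; this is not certbot's actual code, and a real thumbprint canonicalizes only the key's required JWK members:

    ```python
    import base64
    import hashlib
    import json

    def jwk_thumbprint(jwk: dict) -> str:
        """RFC 7638-style thumbprint (simplified): SHA-256 over the
        canonical JSON of the key, base64url-encoded without padding."""
        canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode()).digest()
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

    def key_authorization(token: str, jwk: dict) -> str:
        # This string is served at /.well-known/acme-challenge/<token>.
        return f"{token}.{jwk_thumbprint(jwk)}"
    ```

    The CA validating that response is what proves you control the domain before the certificate is issued.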

    (more…)
  • Remote Code Execution vulnerability exploitation

    Disclaimer note: The information, examples and screenshots provided in this article are for general informational and learning purposes only. The author assumes no responsibility or liability for any results obtained from the use of this information.

    A Remote Code Execution (RCE) vulnerability is a critical security flaw that allows an attacker to run arbitrary code or commands on a target machine from a remote location. Because RCE does not require physical access to the device or prior authentication, it is considered one of the most dangerous types of cybersecurity vulnerabilities.

    RCE typically occurs when an application or server processes user-supplied data insecurely, allowing an attacker to “trick” the system into executing malicious instructions.

    Successful exploitation of an RCE vulnerability often results in full system compromise. Attackers can steal sensitive data, deploy malware, attack other systems within a corporate network, or use the compromised server resources to run cryptomining software.
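
    A classic (and deliberately harmless) illustration of that "trick the system" step is command injection. The snippet below only builds the command, it never executes anything; the ping example is hypothetical:

    ```python
    def ping_command_unsafe(host: str) -> str:
        # VULNERABLE: if this string were handed to a shell, a host value
        # like "localhost; id" would also execute the `id` command.
        return f"ping -c 1 {host}"

    def ping_command_safe(host: str) -> list:
        # Safe pattern: an argument vector with no shell involved, so `;`
        # stays a literal character inside the host argument.
        return ["ping", "-c", "1", host]

    print(ping_command_unsafe("localhost; id"))
    print(ping_command_safe("localhost; id"))
    ```

    The same principle applies server-side: any place where user input is concatenated into a command, template, or deserialized object is a potential RCE entry point.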

    (more…)
  • Java thread blocked on a native method

    When a Java application is experiencing performance issues such as slowness or unresponsiveness, we usually check the following things in order to troubleshoot:

    • Threads consuming high CPU
    • Threads marked as BLOCKED
    • GC pause times
    • Connection pool
    • Application log

    If our Java application makes HTTP(S) requests to an external site, or if it depends on a database, then issues and delays in these remote resources directly affect our application's response time. In these scenarios, although the thread is effectively “blocked” waiting for the network response, the JVM reports it as RUNNABLE because it cannot track the internal state of a native method. In Java, native threads are execution units managed directly by the underlying operating system (OS) kernel.
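
    A practical way to spot this pattern is to scan a jstack thread dump for RUNNABLE threads that are actually sitting inside a native socket read. A Python sketch, using a made-up dump excerpt:

    ```python
    # Illustrative jstack-style excerpt (thread names/frames are made up).
    SAMPLE_DUMP = '''\
    "http-worker-12" #42 daemon prio=5 nid=0x2f03 runnable
       java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:171)

    "pool-1-thread-1" #43 prio=5 nid=0x2f04 waiting on condition
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
    '''

    def native_read_threads(dump):
        """Return thread names whose stack is inside socketRead0."""
        hits, current = [], None
        for line in dump.splitlines():
            if line.startswith('"'):
                current = line.split('"')[1]   # thread name from the header
            elif "socketRead0(Native Method)" in line and current:
                hits.append(current)
        return hits

    print(native_read_threads(SAMPLE_DUMP))  # ['http-worker-12']
    ```

    Threads reported this way are "runnable" to the JVM but are really waiting on the remote endpoint, which is exactly the case this post covers.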

    (more…)
  • Unix Domain Socket example with Fail2ban

    In Linux, Stream Sockets are primarily implemented in two domains. The domain specifies the protocol family and addressing scheme used.

    • Internet Domain (AF_INET/AF_INET6): Uses the Transmission Control Protocol (TCP) to communicate over a network. They use IP addresses and port numbers for addressing. 
    • Unix Domain (AF_UNIX): Used for Inter-Process Communication (IPC) between programs on the same machine. In this context, they behave like a bidirectional pipe. They are addressed via filesystem paths (e.g., /tmp/file.socket) and are faster than network sockets as they avoid network protocol overhead.

    In this post I’ll cover an example of a Unix Domain Socket by using the fail2ban utility.

    Fail2Ban is an open-source intrusion prevention software that protects servers from automated attacks, like brute-force login attempts, by monitoring log files for suspicious activity and automatically banning offending IP addresses using firewall rules (e.g., iptables/nftables).
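
    Before looking at fail2ban's own socket (typically /var/run/fail2ban/fail2ban.sock, which fail2ban-client talks to using its own protocol), here is a self-contained AF_UNIX round trip in Python; socketpair() is used so the demo needs no filesystem path:

    ```python
    import socket

    def unix_echo_roundtrip(payload: bytes) -> bytes:
        """Send bytes over a connected pair of AF_UNIX stream sockets.

        socketpair() returns two already-connected Unix domain sockets in
        one process, behaving like the bidirectional pipe described above.
        """
        a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            a.sendall(payload)
            return b.recv(len(payload))
        finally:
            a.close()
            b.close()

    print(unix_echo_roundtrip(b"ping"))  # b'ping'
    ```

    A server process would instead bind() an AF_UNIX socket to a path and accept() clients, which is the model fail2ban follows with its .sock file.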

    (more…)
  • How a SYN flood attack looks

    Some weeks ago, on December 22, 2025, after logging into one of my existing VPS servers (IP 51.79.160.8), which has an Apache process listening on ports 80 and 443, I noticed in the output of the command ‘netstat -ptuan’ that there were around 100 connections (TCP sockets) to port 443 in the ‘SYN_RECV’ state.

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    ...
    tcp        0      0 51.79.160.8:443         177.8.135.235:47130     SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.134.2:50381       SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.132.49:48273      SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.135.108:59527     SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.132.62:42140      SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.134.237:32949     SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.135.149:27714     SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.132.77:28729      SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.132.234:37897     SYN_RECV    -
    tcp        0      0 51.79.160.8:443         177.8.132.120:53457     SYN_RECV    -
    ...
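
    Counting half-open connections per source network is a quick way to quantify a flood like this. A Python sketch over netstat-style output (the sample lines are abridged; real netstat spacing also works, since split() collapses whitespace):

    ```python
    from collections import Counter

    SAMPLE = """\
    tcp 0 0 51.79.160.8:443 177.8.135.235:47130 SYN_RECV -
    tcp 0 0 51.79.160.8:443 177.8.134.2:50381 SYN_RECV -
    tcp 0 0 51.79.160.8:443 177.8.132.49:48273 SYN_RECV -
    tcp 0 0 51.79.160.8:443 10.0.0.5:40000 ESTABLISHED -
    """

    def syn_recv_by_net(netstat_output):
        """Count SYN_RECV sockets, bucketed by a crude source /24."""
        counts = Counter()
        for line in netstat_output.splitlines():
            cols = line.split()
            if len(cols) >= 6 and cols[5] == "SYN_RECV":
                src_ip = cols[4].rsplit(":", 1)[0]
                net = ".".join(src_ip.split(".")[:3]) + ".0/24"
                counts[net] += 1
        return counts

    print(syn_recv_by_net(SAMPLE))
    ```

    Many SYN_RECV entries concentrated in a few adjacent /24s, as in the capture above, is a strong hint of a coordinated SYN flood rather than organic traffic.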
    (more…)