This post is about a customer case I worked on some time ago. The customer was using our Java-based ETL application with a Pipeline that read files from a local server directory, parsed some information from those files, and finally sent the files to a remote SFTP server. It was a streaming Pipeline (not a batch Pipeline), which means it runs continuously, checking the origin directory every few seconds for new files and processing and sending them to the destination as soon as they arrive. The Pipeline does not open a new SFTP connection every time it sends a file. Instead, it keeps a single SFTP connection to the SFTP server permanently open, which is more efficient because it avoids the overhead of establishing a new connection for each file.
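To make the difference between "one connection per file" and "one long-lived connection" concrete, here is a minimal sketch in plain Java. It is not the ETL application's code or any SFTP library's API: `RemoteConnection`, `PersistentConnectionHolder`, `CountingConnectionFactory` and `ConnectionReuseDemo` are hypothetical names, and the counter simply stands in for the cost of each new session (TCP connect, SSH key exchange and authentication).

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an SFTP session (not a real library API).
interface RemoteConnection {
    void upload(String fileName);
    boolean isOpen();
    void close();
}

// Keeps one long-lived connection and hands it out for every upload,
// opening a new one only when none exists or the previous one was closed.
final class PersistentConnectionHolder {
    private final Supplier<RemoteConnection> factory;
    private RemoteConnection current;

    PersistentConnectionHolder(Supplier<RemoteConnection> factory) {
        this.factory = factory;
    }

    synchronized RemoteConnection get() {
        if (current == null || !current.isOpen()) {
            current = factory.get();
        }
        return current;
    }

    synchronized void close() {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}

// Counts how many connections were opened; each one represents a full
// connection setup that a real SFTP client would have to pay for.
final class CountingConnectionFactory implements Supplier<RemoteConnection> {
    int opened = 0;

    @Override
    public RemoteConnection get() {
        opened++;
        return new RemoteConnection() {
            private boolean open = true;

            @Override
            public void upload(String fileName) {
                if (!open) {
                    throw new IllegalStateException("upload on a closed connection: " + fileName);
                }
            }

            @Override
            public boolean isOpen() {
                return open;
            }

            @Override
            public void close() {
                open = false;
            }
        };
    }
}

final class ConnectionReuseDemo {
    // Opens a fresh connection for every file: one setup per file.
    static int uploadPerFile(String... files) {
        CountingConnectionFactory factory = new CountingConnectionFactory();
        for (String file : files) {
            RemoteConnection connection = factory.get();
            connection.upload(file);
            connection.close();
        }
        return factory.opened;
    }

    // Reuses a single long-lived connection for every file.
    static int uploadPersistent(String... files) {
        CountingConnectionFactory factory = new CountingConnectionFactory();
        PersistentConnectionHolder holder = new PersistentConnectionHolder(factory);
        for (String file : files) {
            holder.get().upload(file);
        }
        holder.close();
        return factory.opened;
    }
}
```

For three files, `uploadPerFile` opens three connections while `uploadPersistent` opens only one. Note that in this sketch `isOpen()` only reflects whether the client itself closed the connection; it knows nothing about the state of the network path to the server.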
The customer stated that the Pipeline had been working fine, but at random times it started to fail: new files in the origin directory were not sent to the destination right away and took 10 minutes or more to be copied there. In the Pipeline application log we found several stack traces with the error message ‘Connection timed out (Read failed)’ related to the SFTP connection, and we also noticed several retries by the Pipeline trying to list the remote SFTP directory. The customer also mentioned that if they restarted the Pipeline once these errors began, it went on to process the pending files without issues.
java.net.SocketException: Connection timed out (Read failed)
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_261]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_261]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_261]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_261]
at net.schmizz.sshj.transport.Reader.run(Reader.java:50) [sshj-0.38.0.jar:?]
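The trace shows sshj's reader thread (`net.schmizz.sshj.transport.Reader`) blocked in `SocketInputStream.socketRead0` until the read failed with `java.net.SocketException`. That exception type is worth noticing. When a read timeout is configured on the socket itself with `Socket.setSoTimeout`, Java reports its expiry as `java.net.SocketTimeoutException` ("Read timed out"), which extends `InterruptedIOException` rather than `SocketException`. "Connection timed out" is the operating system's text for `ETIMEDOUT`, which suggests the read was ended by the OS network stack rather than by a Java-level timeout. The sketch below (class and method names are hypothetical) reproduces the Java-level case against a local peer that accepts the connection but never writes anything:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ReadTimeoutDemo {
    // Connects to a local server that accepts the connection but never writes,
    // then blocks in read() with SO_TIMEOUT set (0 would mean "block forever").
    // Returns the exception that ended the read.
    static IOException readFromSilentPeer(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) { // server side: accepted, never written to
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read();
                return null; // not expected: the peer never sends data or closes first
            } catch (IOException readFailure) {
                return readFailure;
            }
        } catch (IOException setupFailure) {
            throw new UncheckedIOException(setupFailure);
        }
    }
}
```

With a 200 ms timeout, the returned exception is a `SocketTimeoutException`, not a plain `SocketException` like the one in the customer's log.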