Hi,
I use this library to transfer files sized up to some gigabytes. Sometimes, quite rare, such a transfer doesn't complete and runs forever without any progress. Whenever i do a thread dump in this situation, the output is always as follows:
"MessagePump_Thread-1388" #1399 prio=5 os_prio=0 tid=0x00007ff3f0bf1800 nid=0x2f6b runnable [0x00007ff3d623d000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.DataInputStream.read(DataInputStream.java:149)
at com.sshtools.ssh2.TransportProtocol.readWithTimeout(TransportProtocol.java:594)
at com.sshtools.ssh2.TransportProtocol.readMessage(TransportProtocol.java:747)
- locked <0x00000005626628d8> (a java.io.DataInputStream)
at com.sshtools.ssh2.TransportProtocol.nextMessage(TransportProtocol.java:571)
- locked <0x00000005626628d8> (a java.io.DataInputStream)
at com.sshtools.ssh.message.SshMessageRouter.blockForMessage(SshMessageRouter.java:297)
at com.sshtools.ssh.message.SshMessageRouter.access$000(SshMessageRouter.java:44)
at com.sshtools.ssh.message.SshMessageRouter$MessagePump.run(SshMessageRouter.java:370)
Locked ownable synchronizers:
- None
...
"main" #1 prio=5 os_prio=0 tid=0x00007ff3f0008800 nid=0x2c70 in Object.wait() [0x00007ff3f4e26000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at com.sshtools.ssh.message.ThreadSynchronizer.requestBlock(ThreadSynchronizer.java:73)
- locked <0x00000005626b8750> (a com.sshtools.ssh.message.ThreadSynchronizer)
at com.sshtools.ssh.message.SshMessageRouter.nextMessage(SshMessageRouter.java:258)
at com.sshtools.ssh.message.SshMessageStore.nextMessage(SshMessageStore.java:71)
at com.sshtools.ssh2.Ssh2Channel.processMessages(Ssh2Channel.java:507)
at com.sshtools.ssh2.Ssh2Channel$ChannelInputStream.read(Ssh2Channel.java:1132)
- locked <0x00000005626b83b8> (a com.sshtools.ssh2.Ssh2Channel$ChannelInputStream)
at com.sshtools.ssh2.Ssh2Channel$ChannelInputStream.read(Ssh2Channel.java:1097)
at com.sshtools.scp.ScpClientIO$ScpEngineIO.waitForResponse(ScpClientIO.java:398)
at com.sshtools.scp.ScpClient$ScpEngine.writeFileToRemote(ScpClient.java:508)
at com.sshtools.scp.ScpClient$ScpEngine.access$000(ScpClient.java:401)
at com.sshtools.scp.ScpClient.putFile(ScpClient.java:184)
...
Locked ownable synchronizers:
- None
A comparisson of the source and target file in this situation shows:
Somewhere in the middle of the target file, a block sized by a multiple of 4k is missing. I observed the following blocksizes: 4k, 8k, 32k. After that block, the content again matches. The target file is always a bit shorter, but not in the same way as the missing block. What i often observe: if a 32k block is missing in the middle of the file, the file is around 55k smaller. The final content of the target file does not match the final content of the source. Maybe this is due to some buffered data not yet written by the sshd?
My test file is about 260MB: $ seq -w 30000000> testfile
Using testfiles of size ~3MB or ~30MB i was not able to reproduce this problem. I need to transfer the file repeatedly for several hours to run into the problem.
Currently i was able to reproduce this problem with the following target configurations:
cygwin, OpenSSH_7.2p2, OpenSSL 1.0.2g 1 Mar 2016
cygwin, OpenSSH_7.6p1, OpenSSL 1.0.2n 7 Dec 2017
cygwin, OpenSSH_6.6.1p1, OpenSSL 1.0.1i 6 Aug 2014
On the other hand, i was not able to reproduce it on
linux, OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8, OpenSSL 1.0.1f 6 Jan 2014
linux, OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
While it currently might looking like a cygwin problem, couldn't it be a timing/concurrency issue which only occurs due to the limited performance of cygwin?
Please let me know if (and how) i can provide more information.
This is the code i use to perform the scp put:
public void scp(String local, String remote, String hostname, int port, String username, String keyfile, String passphrase) throws IOException {
try {
SshConnector con = SshConnector.createInstance();
con.getContext().setHostKeyVerification(
new DisabledKnownHostsVerification());
con.getContext().setPreferredPublicKey(Ssh2Context.PUBLIC_KEY_SSHRSA);
try (SocketTransport t = new SocketTransport(hostname, port)) {
t.setTcpNoDelay(true);
Ssh2Client ssh2 = (Ssh2Client) con.connect(t, username, true);
try {
PublicKeyAuthentication pk = new PublicKeyAuthentication();
try (FileInputStream fis = new FileInputStream(new File(System.getProperty("user.home") + File.separatorChar + ".ssh" + File.separatorChar + keyfile))) {
SshPrivateKeyFile pkfile = SshPrivateKeyFileFactory
.parse(fis);
SshKeyPair pair;
try {
if (pkfile.isPassphraseProtected()) {
if (passphrase == null) {
throw new RuntimeException("private key is passphrase protected, but passphrase is missing");
}
pair = pkfile.toKeyPair(passphrase);
} else {
pair = pkfile.toKeyPair(null);
}
} catch (InvalidPassphraseException e) {
throw new RuntimeException("The causing exception may be due to a wrong passphrase", e);
}
pk.setPrivateKey(pair.getPrivateKey());
pk.setPublicKey(pair.getPublicKey());
}
if (ssh2.isConnected()) {
ssh2.authenticate(pk);
if (ssh2.isAuthenticated()) {
new ScpClient(ssh2).putFile(local, remote, false, null, false);
} else {
throw new RuntimeException("Authentication failed!");
}
} else {
throw new RuntimeException("Connection failed!");
}
} finally {
ssh2.disconnect();
}
}
} catch (SshException | ChannelOpenException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new RuntimeException(e.getMessage(), e);
}
}
Hi,
I use this library to transfer files sized up to some gigabytes. Sometimes, quite rare, such a transfer doesn't complete and runs forever without any progress. Whenever i do a thread dump in this situation, the output is always as follows:
A comparisson of the source and target file in this situation shows:
Somewhere in the middle of the target file, a block sized by a multiple of 4k is missing. I observed the following blocksizes: 4k, 8k, 32k. After that block, the content again matches. The target file is always a bit shorter, but not in the same way as the missing block. What i often observe: if a 32k block is missing in the middle of the file, the file is around 55k smaller. The final content of the target file does not match the final content of the source. Maybe this is due to some buffered data not yet written by the sshd?
My test file is about 260MB:
$ seq -w 30000000> testfileUsing testfiles of size ~3MB or ~30MB i was not able to reproduce this problem. I need to transfer the file repeatedly for several hours to run into the problem.
Currently i was able to reproduce this problem with the following target configurations:
cygwin, OpenSSH_7.2p2, OpenSSL 1.0.2g 1 Mar 2016
cygwin, OpenSSH_7.6p1, OpenSSL 1.0.2n 7 Dec 2017
cygwin, OpenSSH_6.6.1p1, OpenSSL 1.0.1i 6 Aug 2014
On the other hand, i was not able to reproduce it on
linux, OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8, OpenSSL 1.0.1f 6 Jan 2014
linux, OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
While it currently might looking like a cygwin problem, couldn't it be a timing/concurrency issue which only occurs due to the limited performance of cygwin?
Please let me know if (and how) i can provide more information.
This is the code i use to perform the scp put: