
Commit f199ecd

Pan committed
Updated documentation and docstrings.
Updated readme. Added changelog.
1 parent 6569221 commit f199ecd

6 files changed (+188, -102 lines)

Changelog.rst

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+Change Log
+============
+
+1.2.0
+++++++
+
+Changes
+---------
+
+* New ``ssh2-python`` (``libssh2``) based clients
+
+Fixes
+--------
+
+* Remote path for SFTP operations was created incorrectly on Windows - #88 - thanks @moscoquera
+* Parallel client key error when an OpenSSH config with a host name override was used - #93
+
+1.1.1
+++++++
+
+Changes
+---------
+
+* Accept Paramiko version ``2`` but < ``2.2`` (it's buggy).
+
+1.1.0
++++++++
+
+Changes
+---------
+
+* Allow passing additional keyword arguments to the underlying SSH library via ``run_command`` - #85
+
+1.0.0
++++++++
+
+Changes from `0.9x` series API
+--------------------------------
+
+- `ParallelSSHClient.join` no longer consumes output buffers
+- Command output is now a dictionary of host name -> `host output object <http://parallel-ssh.readthedocs.io/en/latest/output.html>`_ with `stdout` et al. attributes. Host output supports dictionary-like item lookup for backwards compatibility. No code changes are needed for existing output use, though documentation will from now on refer to the new attribute-style output. Dictionary-like item access is deprecated and will be removed in a future major release, such as `2.x`.
+- Made output encoding configurable via a keyword argument on `run_command` and `get_output`
+- `pssh.output.HostOutput` class added to hold host output
+- Added `copy_remote_file` function for copying remote files to local ones in parallel
+- `ParallelSSHClient` API endpoints deprecated since `0.70.0` have been removed
+- Removed setuptools >= 28.0.0 dependency for better compatibility with existing installations. The pip version dependency remains for Python 2.6 compatibility with gevent, as documented in the project's readme
+- Documented `use_pty` parameter of `run_command`
+- `SSHClient` `read_output_buffer` is now a public function and has gained callback capability
+- If using the single `SSHClient` directly, `read_output_buffer` should now be used to read output buffers - this is not needed for `ParallelSSHClient`
+- `run_command` now uses named positional and keyword arguments
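
The 1.0.0 entries above describe command output as a dictionary of host name to a host output object with attribute access such as ``stdout``, while dictionary-like item lookup is kept for backwards compatibility but deprecated. A minimal sketch of that compatibility pattern follows. ``HostOutputSketch`` is a hypothetical class written for illustration only; it is not the implementation of ``pssh.output.HostOutput``, whose exact attributes may differ.

```python
import warnings
from typing import Any, Iterable, Optional


class HostOutputSketch:
    """Hypothetical attribute-style host output (not pssh's HostOutput).

    Attribute access such as ``host_output.stdout`` is the primary interface.
    Item access such as ``host_output['stdout']`` still works for backwards
    compatibility but emits a DeprecationWarning.
    """

    def __init__(self, host: str, stdout: Iterable[str],
                 stderr: Iterable[str], exit_code: Optional[int] = None):
        self.host = host
        self.stdout = stdout
        self.stderr = stderr
        self.exit_code = exit_code

    def __getitem__(self, key: str) -> Any:
        warnings.warn(
            "Dictionary-like access to host output is deprecated, "
            "use attribute access instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Map item lookup onto attribute lookup; unknown keys raise KeyError
        # as a real dictionary would.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


# Output is keyed by host name; each value is a host output object.
output = {"localhost": HostOutputSketch("localhost", iter(["Linux"]), iter([]), 0)}
for host, host_output in output.items():
    for line in host_output.stdout:  # new attribute style
        print(host, line)
```

Existing code using ``host_output['stdout']`` keeps working in this sketch, but surfaces a warning that points users towards the attribute style.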

README.rst

Lines changed: 36 additions & 76 deletions
@@ -65,15 +65,15 @@ Run ``uname`` on two remote hosts in parallel with ``sudo``.
 Native code client
 *******************

-As of version ``1.2.0``, a new client is supported in ``ParallelSSH`` which offers much greater performance and reduced overhead than the current default client (paramiko). Binary wheel packages with ``libssh2`` included are provided for Linux, OSX and Windows platforms and all supported Python versions.
+As of version ``1.2.0``, a new client is supported in ``ParallelSSH`` which offers much greater performance and reduced overhead than the current default client library. Binary wheel packages with ``libssh2`` included are provided for Linux, OSX and Windows platforms and all supported Python versions.

-The new client is based on ``libssh2`` via the ``ssh2-python`` extension library and supports non-blocking mode natively. In addition, SFTP push/pull operations in the new client have also been implemented in native code without Python's GIL, allowing for much greater performance and significantly reduced overhead.
+The new client is based on ``libssh2`` via the ``ssh2-python`` extension library and supports non-blocking mode natively. In addition, SFTP push/pull operations in the new client have also been implemented in native code, allowing for much greater performance and significantly reduced overhead.

-See < here > for a performance comparison of the two clients.
+See `this post <https://parallel-ssh.org/post/pssh>`_ for a performance comparison of the available clients.

-To make use of this new client, ``ParallelSSHClient`` can be imported from ``pssh.pssh2_client`` instead of ``pssh.pssh_client``. The respective APIs are almost identical, though some things have either not yet been implemented or are not supported in ``libssh2``.
+To make use of this new client, ``ParallelSSHClient`` can be imported from ``pssh.pssh2_client`` instead. The respective APIs are almost identical, though some features have either not yet been implemented or are not supported by ``libssh2``.

-Note that the new client will become the default and will replace the current ``pssh.pssh_client`` in a new major version of the library - ``2.x.x`` - once remaining features have been implemented.
+Note that the new client will become the default and will replace the current ``pssh.pssh_client`` in a new major version of the library - ``2.x.x`` - once remaining features have been implemented. The current client will remain available as an option under a new name.

 For example:

@@ -91,20 +91,18 @@ For example:
 print(line)


-Compared to the current default, the native client currently lacks proxying/tunnelling implementation, as well as SSH agent forwarding. The latter is not currently supported by ``libssh2``.
-
-See documentation for more information on how the two clients compare feature
+See `documentation <http://parallel-ssh.readthedocs.io/en/latest/ssh2.html>`_ for a feature comparison of the two clients.


 ****************************
 Native Code Client Features
 ****************************

-* Highest performance and least overhead of any currently available Python SSH libraries
-* Native non-blocking client based on ``libssh2`` via the ``ssh2-python`` wrapper
-* Thread safe - utilises both native threads for blocking calls like authentication and non-blocking network requests
-* Natively non-blocking - **no monkey patching of the Python standard library**
-* Native binary-like SFTP speeds thanks to SFTP and local file read/write operations being implemented in native code
+* Highest performance and least overhead of any Python SSH library
+* Thread safe - utilises both native threads for blocking calls like authentication and non-blocking I/O
+* Natively non-blocking utilising ``libssh2`` via ``ssh2-python`` - **no monkey patching of the Python standard library**
+* Native binary-like SFTP speeds thanks to SFTP and local file read/write native code implementations
+* Significantly reduced overhead in CPU and memory usage


 ***********
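
The "Thread safe" feature above describes blocking calls such as authentication running in native threads while network I/O stays non-blocking. As a rough illustration of that general pattern only, the sketch below uses the standard library's ``asyncio`` with a thread pool executor in place of ``gevent`` and ``libssh2``. ``blocking_auth`` is a hypothetical stand-in for a blocking call, and this is not how ``pssh.pssh2_client`` is implemented.

```python
import asyncio
import time
from typing import List, Tuple


def blocking_auth(host: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a blocking call such as SSH authentication."""
    time.sleep(delay)  # blocks the worker thread, not the event loop
    return f"{host}: authenticated"


async def heartbeat(ticks: List[int], n: int = 5) -> None:
    """Keeps running on the event loop while blocking work happens elsewhere."""
    for i in range(n):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def main() -> Tuple[List[str], List[int]]:
    loop = asyncio.get_running_loop()
    ticks: List[int] = []
    # Blocking calls are handed to the default thread pool executor (None);
    # the heartbeat co-routine keeps using the event loop meanwhile.
    results = await asyncio.gather(
        loop.run_in_executor(None, blocking_auth, "host1"),
        loop.run_in_executor(None, blocking_auth, "host2"),
        heartbeat(ticks),
    )
    return list(results[:2]), ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The design point is that only the blocking step leaves the event loop; everything else stays co-operative, so one slow authentication does not stall other hosts.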
@@ -153,13 +151,10 @@ Similarly, output and exit codes are available after ``client.join`` is called:
 0
 <..stdout..>

-.. note::
-
-In versions prior to ``1.0.0`` only, ``client.join`` would consume standard output.

-There is also a built in host logger that can be enabled to log output from remote hosts. The helper function ``pssh.utils.enable_host_logger`` will enable host logging to stdout.
+There is also a built-in host logger that can be enabled to log output from remote hosts. The helper function ``pssh.utils.enable_host_logger`` will enable host logging to stdout.

-To log output without having to iterate over standard output generators, the ``consume_output`` flag can be enabled, for example:
+To log output without having to iterate over output generators, the ``consume_output`` flag can be enabled - for example:

 .. code-block:: python

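
The host logger and ``consume_output`` paragraphs describe logging each host's remote output without iterating over the output generators in user code. A minimal sketch of that idea using the standard ``logging`` module follows. The logger name, the ``consume_and_log`` helper and the ``[host] line`` format (modelled on the ``[localhost] Linux`` sample output in the README) are assumptions, not the configuration used by ``pssh.utils.enable_host_logger``.

```python
import logging
from typing import Dict, Iterable

# Hypothetical logger name chosen for this sketch.
host_logger = logging.getLogger("host_logger_sketch")


def consume_and_log(output: Dict[str, Iterable[str]]) -> int:
    """Iterate over every host's stdout, logging each line as '[host] line'.

    Returns the number of lines logged.
    """
    count = 0
    for host, stdout in output.items():
        for line in stdout:
            host_logger.info("[%s] %s", host, line)
            count += 1
    return count


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    consume_and_log({"localhost": iter(["Linux"])})  # logs: [localhost] Linux
```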
@@ -172,38 +167,7 @@ To log output without having to iterate over standard output generators, the ``c

 [localhost] Linux

-*****************
-Design And Goals
-*****************
-
-``ParallelSSH``'s design goals and motivation are to provide a *library* for running *non-blocking* asynchronous SSH commands in parallel with little to no load induced on the system by doing so with the intended usage being completely programmatic and non-interactive.
-
-To meet these goals, API driven solutions are preferred first and foremost. This frees up developers to drive the library via any method desired, be that environment variables, CI driven tasks, command line tools, existing OpenSSH or new configuration files, from within an application et al.
-
-********
-Scaling
-********
-
-Some guide lines on scaling ``ParallelSSH`` client and pool size numbers.
-
-In general, long lived commands with little or no output *gathering* will scale better. Pool sizes in the multiple thousands have been used successfully with little CPU overhead in the single process running them in these use cases.
-
-Conversely, many short lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regards to CPU overhead in the event loop. Multiple python processes, each with its own event loop, may be used to scale this use case further as CPU overhead allows.
-
-Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered either over multiple still running commands, or while more commands are being triggered, is overhead increased.
-
-Technical Details
-******************
-
-To understand why this is, consider that in co-operative multi tasking, which is being used in this project via the ``gevent`` library, a co-routine (greenlet) needs to ``yield`` the event loop to allow others to execute - *co-operation*. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes overhead with other co-routines that also want to use the event loop.
-
-This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regards to scaling improvements from increasing pool size.
-
-On the other end of the spectrum, long lived remote commands that generate *no* output only need the event loop at the start, when they are establishing connections, and at the end, when they are finished and need to gather exit codes, which results in practically zero CPU overhead at any time other than start or end of command execution.

-Output *generation* is done remotely and has no effect on the event loop until output is gathered - output buffers are iterated on. Only at that point does the event loop need to be held.
-
-********
 SFTP/SCP
 ********

@@ -235,47 +199,43 @@ Directory recursion is supported in both cases via the ``recurse`` parameter - d

 See `SFTP documentation <http://parallel-ssh.readthedocs.io/en/latest/advanced.html#sftp>`_ for more examples.

-**************************
-Frequently asked questions
-**************************

-:Q:
-Why should I use this library and not, for example, `fabric <https://github.com/fabric/fabric>`_?
+*****************
+Design And Goals
+*****************

-:A:
-In short, the tools are intended for different use cases.
+``ParallelSSH``'s design goals and motivation are to provide a *library* for running *non-blocking* asynchronous SSH commands in parallel with little to no load induced on the system, with the intended usage being completely programmatic and non-interactive.

-``ParallelSSH`` is a parallel SSH client library that scales well over hundreds to hundreds of thousands of hosts - per `Design And Goals`_ - a use case that is very common on cloud platforms and virtual machine automation. It would be best used where it is a good fit for the use case at hand.
+To meet these goals, API driven solutions are preferred first and foremost. This frees up developers to drive the library via any method desired, be that environment variables, CI driven tasks, command line tools, existing OpenSSH or new configuration files, from within an application et al.

-Fabric and tools like it on the other hand are not well suited to such use cases, for many reasons, performance and differing design goals in particular. The similarity is only that these tools also make use of SSH to run commands.
+********
+Scaling
+********

-``ParallelSSH`` is in other words well suited to be the SSH client tools like Fabric and Ansible and others use to run their commands rather than a direct replacement for.
+Some guidelines on scaling ``ParallelSSH`` client and pool size numbers.

-By focusing on providing a well defined, lightweight - actual code is a few hundred lines - library, ``ParallelSSH`` is far better suited for *run this command on these hosts* tasks for which frameworks like Fabric, Capistrano and others are overkill and unsuprisignly, as it is not what they are for, ill-suited to and do not perform particularly well with.
+In general, long-lived commands with little or no output *gathering* will scale better. Pool sizes in the multiple thousands have been used successfully with little CPU overhead in the single process running them in these use cases.

-Fabric and tools like it are high level deployment frameworks - as opposed to general purpose libraries - for building deployment tasks to perform on hosts matching a role with task chaining, a DSL like syntax and are primarily intended for command line use - very far removed from an SSH client *library*.
+Conversely, many short-lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regard to CPU overhead in the event loop. Multiple Python processes, each with its own event loop, may be used to scale this use case further as CPU overhead allows.

-Fabric in particular is a port of `Capistrano <https://github.com/capistrano/capistrano>`_ from Ruby to Python. Its design goals are to provide a faithful port of Capistrano with its `tasks` and `roles` framework to python with interactive command line being the intended usage.
+Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered either over multiple still running commands, or while more commands are being triggered, is overhead increased.

-Furthermore, Fabric's use as a library is non-standard and in `many <https://github.com/fabric/fabric/issues/521>`_ `cases <https://github.com/fabric/fabric/pull/674>`_ `just <https://github.com/fabric/fabric/pull/1215>`_ `plain <https://github.com/fabric/fabric/issues/762>`_ `broken <https://github.com/fabric/fabric/issues/1068>`_ and currently stands at over 7,000 lines of code most of which is lacking code testing.
+Technical Details
+******************

-In addition, Fabric's parallel command implementation uses a combination of both threads and processes with extremely high CPU usage and system load while running with as little as hosts in the single digits.
+To understand why this is, consider that in co-operative multitasking, which is being used in this project via the ``gevent`` library, a co-routine (greenlet) needs to ``yield`` the event loop to allow others to execute - *co-operation*. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes overhead with other co-routines that also want to use the event loop.

-:Q:
-Is Windows supported?
+This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regard to scaling improvements from increasing pool size.
+
+On the other end of the spectrum, long-lived remote commands that generate *no* output only need the event loop at the start, when they are establishing connections, and at the end, when they are finished and need to gather exit codes, which results in practically zero CPU overhead at any time other than start or end of command execution.

-:A:
-The library installs and works on Windows though not formally supported as unit tests are currently Posix system based.
-
-Pip versions >= 8.0 are required for binary package installation of ``gevent`` on Windows, a dependency of ``ParallelSSH``.
-
-Though ``ParallelSSH`` is pure python code and will run on any platform that has a working Python interpreter, its ``gevent`` dependency and certain dependencies of ``paramiko`` contain native code which either needs a binary package to be provided for the platform or to be built from source. Binary packages for ``gevent`` are provided for OSX, Linux and Windows platforms as of this time of writing.
+Output *generation* is done remotely and has no effect on the event loop until output is gathered - output buffers are iterated on. Only at that point does the event loop need to be held.

-:Q:
-Is there a user's group for feedback and discussion about ParallelSSH?
+*************
+User's group
+*************

-:A:
-There is a public `ParallelSSH Google group <https://groups.google.com/forum/#!forum/parallelssh>`_ setup for this purpose - both posting and viewing are open to the public.
+There is a public `ParallelSSH Google group <https://groups.google.com/forum/#!forum/parallelssh>`_ set up for feedback and discussion - both posting and viewing are open to the public.

 .. image:: https://ga-beacon.appspot.com/UA-9132694-7/parallel-ssh/README.rst?pixel
 :target: https://github.com/igrigorik/ga-beacon
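
The Scaling and Technical Details sections describe pool size as a limit on concurrently running commands, and co-routines that must yield the event loop so that others can run. A minimal sketch of both ideas follows, using the standard library's ``asyncio`` as a swapped-in stand-in for ``gevent``: ``asyncio.Semaphore`` plays the role of the pool size, and each ``await`` is a point where a co-routine yields. ``simulated_command`` and ``run_all`` are hypothetical names and are unrelated to ``ParallelSSHClient.run_command``.

```python
import asyncio
from typing import Dict, List, Tuple


async def simulated_command(host: str, pool: asyncio.Semaphore,
                            stats: Dict[str, int]) -> str:
    """Hypothetical stand-in for a remote command occupying one pool slot."""
    async with pool:  # at most `pool_size` of these bodies run concurrently
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        # Awaiting yields the event loop so other co-routines can make
        # progress; this stands in for waiting on a remote command.
        await asyncio.sleep(0.01)
        stats["active"] -= 1
        return f"{host}: done"


async def run_all(hosts: List[str], pool_size: int) -> Tuple[List[str], int]:
    """Run one simulated command per host; return results and peak concurrency."""
    pool = asyncio.Semaphore(pool_size)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(simulated_command(host, pool, stats) for host in hosts))
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all([f"host{i}" for i in range(10)], 3))
    print(len(results), "commands, peak concurrency", peak)
```

A co-routine that never reached an ``await`` would hold the loop until it finished, which is the overhead the Technical Details section attributes to constant output gathering.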

doc/advanced.rst

Lines changed: 1 addition & 1 deletion
@@ -305,7 +305,7 @@ No output from ``stderr``.
 SFTP
 *****

-SFTP - `SCP version 2` - is supported by ``Parallel-SSH`` and two functions are provided by the client for copying files with SFTP.
+SFTP - `SCP version 2` - is supported by ``parallel-ssh`` and two functions are provided by the client for copying files with SFTP.

 SFTP does not have a shell interface and no output is provided for any SFTP commands.
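
The 1.2.0 changelog lists a fix (#88) for remote SFTP paths being created incorrectly on Windows. The commit does not show that fix. As a general illustration of the pitfall, paths on a POSIX SFTP server can be joined with ``posixpath`` rather than the local platform's ``os.path``. ``remote_path`` is a hypothetical helper and is not part of ``parallel-ssh``.

```python
import ntpath
import posixpath


def remote_path(remote_dir: str, filename: str) -> str:
    """Hypothetical helper: build a remote SFTP path with '/' separators.

    posixpath is used explicitly so the result is the same on every local OS;
    os.path.join would use a backslash separator when run on Windows.
    """
    return posixpath.join(remote_dir, filename)


if __name__ == "__main__":
    print(remote_path("/tmp/upload", "file.txt"))   # /tmp/upload/file.txt
    # For comparison, Windows path rules insert a backslash instead:
    print(ntpath.join("/tmp/upload", "file.txt"))   # /tmp/upload\file.txt
```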

doc/ssh2.rst

Lines changed: 14 additions & 6 deletions
@@ -3,16 +3,24 @@ Clients Feature Comparison

 For the ``ssh2-python`` (``libssh2``) based clients, not all features supported by the paramiko based clients are currently supported by the underlying library or implemented in ``parallel-ssh``.

-Below is a comparison of the differing feature support for the two client types.
+Below is a comparison of feature support for the two client types.

-============================== ========= ======================
-Feature                        Paramiko  ssh2-python (libssh2)
-============================== ========= ======================
+=============================== ========= ======================
+Feature                         paramiko  ssh2-python (libssh2)
+=============================== ========= ======================
 Agent forwarding                Yes       Not supported
 Proxying/tunnelling             Yes       Not yet implemented
 Kerberos (GSS) authentication   Yes       Not supported
 Per-channel timeout setting     Yes       Not supported
 Public key from memory          Yes       Not yet implemented
-============================== ========= ======================
+SFTP copy to/from hosts         Yes       Yes
+Agent authentication            Yes       Yes
+Private key file authentication Yes       Yes
+Password authentication         Yes       Yes
+Session timeout setting         Yes       Yes
+Per-channel timeout setting     Yes       Not supported
+Programmatic SSH agent          Yes       Not supported
+OpenSSH config parsing          Yes       Not yet implemented
+=============================== ========= ======================

-If any of these features are required for a use case, then the paramiko based clients should be used instead. In all other cases the ``ssh2-python`` clients are preferred.
+If any of the missing features are required for a use case, then the paramiko based clients should be used instead. In all other cases the ``ssh2-python`` based clients offer significantly greater performance at less overhead and are preferred.
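
The closing paragraph gives a simple selection rule: use the paramiko based clients if any missing feature is needed, and prefer the ``ssh2-python`` based clients otherwise. A minimal sketch encoding the comparison table as data follows. ``SSH2_SUPPORT`` and ``choose_client_module`` are hypothetical names; the returned module paths are the ones named in the README of this commit.

```python
from typing import Dict, Iterable

# Feature support transcribed from the comparison table: True only where the
# ssh2-python (libssh2) column says "Yes".
SSH2_SUPPORT: Dict[str, bool] = {
    "Agent forwarding": False,
    "Proxying/tunnelling": False,
    "Kerberos (GSS) authentication": False,
    "Per-channel timeout setting": False,
    "Public key from memory": False,
    "SFTP copy to/from hosts": True,
    "Agent authentication": True,
    "Private key file authentication": True,
    "Password authentication": True,
    "Session timeout setting": True,
    "Programmatic SSH agent": False,
    "OpenSSH config parsing": False,
}


def choose_client_module(required: Iterable[str]) -> str:
    """Return the module name to import ParallelSSHClient from.

    Falls back to the paramiko based client as soon as any required feature
    is unsupported by the ssh2-python based client.
    """
    for feature in required:
        if feature not in SSH2_SUPPORT:
            raise ValueError(f"Unknown feature: {feature!r}")
        if not SSH2_SUPPORT[feature]:
            return "pssh.pssh_client"
    return "pssh.pssh2_client"
```

For example, ``choose_client_module(['Agent forwarding'])`` returns ``'pssh.pssh_client'``, and the result could be passed to ``importlib.import_module`` to obtain ``ParallelSSHClient``. Module paths may differ in other versions of the library.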
