I observed some strange behavior with Erlang's SNMP in which SNMP response packets would arrive at a server's network interface fairly promptly, but there was a big delay between the packet's arrival and the time the Erlang function returned the data. This is a small program that collects profiling data about SNMP performance. Run it from various locations, pointed at various targets, to compare performance. We hope this will help narrow down the performance problem.
This is meant to run in an Erlang Docker container for easy portability. Start one from the project directory with:
docker run --rm -it -v $(pwd):/app -w /app -p 21312:21312 \
erlang:18.3.4.11 bash
Three shell scripts give you basic functionality:
- `compile.sh` compiles the project
- `run.sh` is the main runner for both dev and real usage; compile first
- `clean.sh` cleans compiled artifacts; could also just `git clean -fdx`
The project is built and archived to a tar.gz file that's available on the GitHub Releases page. In its current state, you need both docker and docker-compose to run it. Then:
- Download and extract the archive.
- Change to the extracted directory, like `cd snmp-profiler-X.X.X`.
- Run `docker-compose up`.
- Open a new terminal, and change to the extracted directory again.
- Run `docker ps`, and find the name or id of the running Erlang container.
- Run `docker exec -it -w /app $CONTAINER_ID bash`.
- Run the project with `./run.sh [OPTIONS]`.
- View the output in Grafana, which is running on port 3000. For security, it's bound only to the local interface, so you'll need to make some kind of proxy/tunnel. I recommend sshuttle, such as `sshuttle -r $ADDRESS $ADDRESS` where `$ADDRESS` is the server running the tool. Then browse to `http://$ADDRESS:3000`. Otherwise, you can use plain ssh to establish a tunnel, like `ssh -L 1234:localhost:3000 $ADDRESS` and then browse to `http://localhost:1234`.
- Build in docker-compose so the user doesn't have to go get it themselves.
- Make running the project simpler overall. There should only be three steps: download, extract, and execute.
- Figure out what other metrics make sense to track. Should metric names include the switch name? Like instead of `snmp_profiler.sync_get_next`, it could be `snmp_profiler.ord1.a3-1-1.sync_get_next`.
- Verify that all emitted metrics are making it to Graphite/Grafana. I've seen some discrepancies that make me think they're not. Try checking a count of emitted metrics against some of the counts in Grafana.
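For the metric naming question above, here is a minimal shell sketch of how a per-switch metric name could be composed. The function names `sanitize_segment` and `metric_name` are hypothetical, and reading `ord1` as a site code and `a3-1-1` as the switch name is an assumption. Graphite treats dots as path separators, so a switch name that contains dots would need to be sanitized before it goes into the path.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the project.

# Replace any character other than letters, digits, "_" and "-" with "_",
# so a value cannot introduce extra dots into a Graphite path.
sanitize_segment() {
    printf '%s' "$1" | sed 's/[^A-Za-z0-9_-]/_/g'
}

# Build "snmp_profiler.<site>.<switch>.<operation>".
# $1 = site code (assumed), $2 = switch name, $3 = operation name
metric_name() {
    printf 'snmp_profiler.%s.%s.%s\n' \
        "$(sanitize_segment "$1")" \
        "$(sanitize_segment "$2")" \
        "$(sanitize_segment "$3")"
}
```

For example, `metric_name ord1 a3-1-1 sync_get_next` yields `snmp_profiler.ord1.a3-1-1.sync_get_next`, matching the name proposed above.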
The current approach of mounting directories into containers can cause issues on hardened servers:
- docker-compose seems to use /tmp for some intermediate script(s), and if /tmp is mounted with `noexec`, it'll fail with an error like this: `./docker-compose: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted`. To solve this, you can create a "temporary temporary" directory in the project directory, like `mkdir tmp` and then `export TMPDIR=tmp` to tell docker-compose to use that instead.
- The Grafana mounted files at `grafana/mounts` have to be readable by the `grafana` user in the Grafana container. If the file permissions are too restrictive when you extract the archive, Grafana will fail to start with an error like `CRIT[10-08|14:50:43] failed to parse "/etc/grafana/grafana.ini": open /etc/grafana/grafana.ini: permission denied`. To solve this, you can run `chmod a+rX grafana`.
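The `noexec` workaround above could be scripted roughly as follows. This is a sketch, not something the project ships: the function names are hypothetical, `findmnt` (from util-linux) is assumed to be available on a Linux host, and the sketch uses an absolute path for `TMPDIR` so that it still resolves if the working directory changes.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the project.

# Succeeds if the comma-separated mount options in $1 include "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the mount options of the filesystem that holds /tmp.
# Assumes Linux with findmnt installed; prints nothing otherwise.
tmp_mount_options() {
    findmnt -no OPTIONS --target /tmp 2>/dev/null
}

# Creates ./tmp in the current directory and points TMPDIR at it.
use_project_tmpdir() {
    mkdir -p tmp
    TMPDIR="$PWD/tmp"
    export TMPDIR
}
```

From the extracted project directory, one way to use it is `has_noexec "$(tmp_mount_options)" && use_project_tmpdir` before running `docker-compose up`.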
The goal is to test Erlang `snmpm:sync_get_next()` calls. In troubleshooting, I've observed that this function sends a request packet, and the response packet is received in very short order according to a tcpdump of the interface; however, there's a long delay before the Erlang function returns the data. This project will focus on observing the behavior of that function.
It allows you to test the performance of a single device or a group of them defined by an input file. For each test run, it produces a report with useful data for comparing against other runs. It will be run by non-developers, so its interface needs to be relatively clean and easy to use.
The `run.sh` script does initial input arg parsing and passes all args
through to the Erlang program. A dedicated config module further
interprets, validates, and stores all the input arguments so they're
sanitized and available to the rest of the application. For sanity,
the arg names are the same at the shell script level as inside the
Erlang program, except that in some places, hyphens have to become
underscores because of language constraints.
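The hyphen-to-underscore mapping described above can be illustrated with a small shell sketch. The function name `flag_to_key` and the option `--target-file` are hypothetical and only serve the example; the project's real option names may differ.

```shell
#!/bin/sh
# Hypothetical helper; not part of the project.

# Convert a long option such as "--target-file" into the underscore form
# "target_file", which is usable as an unquoted Erlang atom.
flag_to_key() {
    printf '%s\n' "$1" | sed -e 's/^--//' -e 's/-/_/g'
}
```

With this, `flag_to_key --target-file` prints `target_file`, so the same name can be used on both sides with only the separator changed.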
A primitive logging mechanism lets users choose between three levels of verbosity via command line flags. All logs are written to stdout.
The program should abort with a good error message and non-zero exit
status for any recognizable error. There's a die() function for
this purpose. Be sure to tag internal failures as such. An example of
an internal failure would be asking the config module for a config
item that doesn't exist.
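The same abort convention could also apply at the `run.sh` level. The sketch below is a shell analog, not the project's Erlang `die()` function: the names `die` and `die_internal`, the message prefixes, and the specific exit codes are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical shell analog of the abort convention; not the project's code.

# Abort on a user-facing error: print the message to stderr, exit non-zero.
die() {
    printf 'ERROR: %s\n' "$1" >&2
    exit 1
}

# Abort on an internal failure, tagged as such so that bugs in the tool
# can be told apart from bad input. Exit code 2 is an arbitrary choice.
die_internal() {
    printf 'INTERNAL ERROR: %s\n' "$1" >&2
    exit 2
}
```

A distinct tag and exit status make it easier for a non-developer to tell a bad argument from a defect in the tool when reporting a problem.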