LinkedPipes ETL is an RDF-based, lightweight ETL tool.
- REST API based set of components for easy integration
- Library of components to get you started faster
- Sharing of configuration among individual pipelines using templates
- RDF configuration of transformation pipelines
- Linux, Windows, macOS
- Java 11, 10 or Java 8 update 101 or newer
- Git
- Maven, 3.2.5 or newer
- Node.js & npm
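Before building, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of LP-ETL: the helper name missing_tools is hypothetical, and it assumes the tools are installed under their usual command names (java, git, mvn, node, npm). It checks only for presence, not for the required versions.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed from the requirements list; "mvn" is the Maven launcher.
missing=$(missing_tools java git mvn node npm)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on one line.
    echo "Missing tools:" $missing >&2
else
    echo "All required tools were found on PATH."
fi
```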
So far, you need to compile LP-ETL on your own:
$ git clone https://github.com/linkedpipes/etl.git
$ cd etl
$ mvn install
On Windows, we recommend using Bash on Ubuntu on Windows or Cygwin and proceeding as on Linux. Nevertheless, it is possible to build and use LP-ETL with pure Windows-based versions of the tools.
To run LP-ETL, you need to run the four components it consists of. For debugging purposes, it is useful to store the console logs.
$ cd deploy
$ ./executor.sh >> executor.log &
$ ./executor-monitor.sh >> executor-monitor.log &
$ ./storage.sh >> storage.log &
$ ./frontend.sh >> frontend.log &
On Windows, we recommend using Bash on Ubuntu on Windows or Cygwin and proceeding as on Linux.
Otherwise, in the deploy folder, run
- executor.bat
- executor-monitor.bat
- storage.bat
- frontend.bat
Unless configured otherwise, LinkedPipes ETL should now run on http://localhost:8080.
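The four start commands can also be wrapped in a small helper. The sketch below is one possible way to do it, assuming the four scripts live in a single deploy directory; the function names start_component and start_all and the example paths are hypothetical. Unlike the plain >> redirection, it also appends stderr to each log, which is often where Java stack traces end up.

```shell
#!/bin/sh
# Start one component script in the background, appending stdout and stderr to a log.
# $1 = directory holding the script, $2 = component name, $3 = log directory
start_component() {
    mkdir -p "$3"
    # The redirection is attached to the subshell, so a relative log
    # directory is resolved before the cd into the deploy directory.
    ( cd "$1" && exec ./"$2".sh ) >> "$3/$2.log" 2>&1 &
}

# Start all four components. $1 = deploy directory, $2 = log directory
start_all() {
    for name in executor executor-monitor storage frontend; do
        start_component "$1" "$name" "$2"
    done
}

# Example call (placeholder paths): start_all ./deploy ./logs
```

Each component writes to its own file, for example logs/executor.log, so a failing component can be inspected without untangling interleaved output.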
There are components in the jars directory. A detailed description of how to create your own is coming soon.
The configuration file in the deploy directory can be edited, mainly to change the paths to the working, storage, log and library directories.
Since we are still in the rapid development phase, we update our instance often. This is the update script that we use; you can reuse it if you wish. The script sets the path to Java 8 and kills the running components (yes, it is dirty). It assumes that the repository is cloned in /opt/lp/etl, and it stores the console logs in /data/lp/etl.
#!/bin/bash
echo Killing Executor
kill `ps ax | grep /executor.jar | grep -v grep | awk '{print $1}'`
echo Killing Executor-monitor
kill `ps ax | grep /executor-monitor.jar | grep -v grep | awk '{print $1}'`
echo Killing Frontend
kill `ps ax | grep node | grep -v grep | awk '{print $1}'`
echo Killing Storage
kill `ps ax | grep /storage.jar | grep -v grep | awk '{print $1}'`
cd /opt/lp/etl
echo Git Pull
git pull
echo Mvn install
mvn clean install
cd deploy
echo Running executor
./executor.sh >> /data/lp/etl/executor.log &
echo Running executor-monitor
./executor-monitor.sh >> /data/lp/etl/executor-monitor.log &
echo Running storage
./storage.sh >> /data/lp/etl/storage.log &
echo Running frontend
./frontend.sh >> /data/lp/etl/frontend.log &
echo Disowning
disown
- On some Linux systems, Node.js may be run by nodejs instead of node. In that case, you need to rewrite this in the deploy/frontend.sh script.
- If you are using Oracle Java 8, accessing HTTPS-based URLs and getting SSLHandshakeException : Received fatal alert: handshake_failure when the same URL works from, e.g., Chrome, try installing the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for JDK/JRE 8 or for Java 9.
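For the node versus nodejs issue, a quick way to see which command name a system provides is sketched below. This is an illustration only: the helper first_available is hypothetical and is not part of deploy/frontend.sh, which you would still need to edit by hand.

```shell
#!/bin/sh
# Print the first of the given command names that exists on PATH; fail if none does.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer "node" and fall back to "nodejs".
if NODE_BIN=$(first_available node nodejs); then
    echo "Node.js binary: $NODE_BIN"
else
    echo "Neither node nor nodejs was found on PATH" >&2
fi
```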
Update note 3: When upgrading from develop prior to 2017-02-14, you need to delete {deploy}/jars and {deploy}/osgi.
Update note 2: When upgrading from master prior to 2016-11-04, you need to move your pipelines folder from, e.g., /data/lp/etl/pipelines to /data/lp/etl/storage/pipelines, update the configuration.properties file and possibly the update/restart scripts, as there is a new component, storage.
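The folder move from update note 2 could be scripted roughly as follows. This is a sketch under assumptions: the function name migrate_pipelines is hypothetical, the base directory /data/lp/etl is only the example path from the note, and the components should be stopped before running it. It does not touch the configuration file or the update/restart scripts, which still need to be adjusted by hand.

```shell
#!/bin/sh
# Move <base>/pipelines to <base>/storage/pipelines if the old folder exists
# and the new one does not. $1 = base data directory (e.g. /data/lp/etl).
migrate_pipelines() {
    old="$1/pipelines"
    new="$1/storage/pipelines"
    if [ ! -d "$old" ]; then
        echo "Nothing to move: $old does not exist" >&2
        return 0
    fi
    if [ -e "$new" ]; then
        echo "Refusing to overwrite existing $new" >&2
        return 1
    fi
    mkdir -p "$1/storage" && mv "$old" "$new"
}

# Example call (path from the note above): migrate_pipelines /data/lp/etl
```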
Update note: When upgrading from master prior to 2016-04-07, you need to delete your old execution data (e.g., in /data/lp/etl/working/data)