Nashorn Java to JavaScript interoperability issues
This page documents, and discusses possible solutions for, shortcomings of the Java/JavaScript interoperability in Java 8.

Extending a Spark lambda function: Nashorn does not support extending a Java type that extends java.io.Serializable

Spark processing is accomplished by providing lambda functions to Spark classes, for example RDD:
```java
JavaRDD<String> complete_ratings_data = complete_ratings_raw_data.filter(new Function<String, Boolean>() {
    public Boolean call(String line) {
        if (line.equals(complete_ratings_raw_data_header)) {
            return false;
        } else {
            return true;
        }
    }
});
```
In the example above we are implementing the org.apache.spark.api.java.function.Function interface. The Nashorn documentation states that we can implement/extend a Java class in either of two ways:
```js
// This syntax is primarily used to support anonymous class-like syntax for
// Java interface implementation as shown below.
var r = new java.lang.Runnable() {
    run: function() { print("run"); }
}
```
or
```js
var ArrayList = Java.type("java.util.ArrayList")
var ArrayListExtender = Java.extend(ArrayList)
var printSizeInvokedArrayList = new ArrayListExtender() {
    size: function() { print("size invoked!"); }
}
```
So for org.apache.spark.api.java.function.Function we would have code like:
```js
var jsFunc = new org.apache.spark.api.java.function.Function() {
    call: function(line) {
        return line != "userId,movieId,rating,timestamp"; // complete_ratings_raw_data_header
    }
}
var xx = complete_ratings_raw_data_JavaObj.filter(jsFunc);
```
or
```js
var sparkFunction = Java.type("org.apache.spark.api.java.function.Function")
var sparkFunctionExtender = Java.extend(sparkFunction)
var boolFunctionExtender = new sparkFunctionExtender() {
    call: function(line) {
        return line != "userId,movieId,rating,timestamp"; // complete_ratings_raw_data_header
    }
}
var xx = complete_ratings_raw_data_JavaObj.filter(boolFunctionExtender);
```
Either approach will throw the exception:

```
Exception in thread "main" java.lang.RuntimeException: org.apache.spark.SparkException: Task not serializable
```
Nashorn parseInt returns a java.lang.Double
```js
var x = parseInt("3");
print(x.getClass()); // prints java.lang.Double
```
This causes ClassCastExceptions when running many MLlib classes. One way to fix this would be to add an RDD.map() at every place we run into a ClassCastException and use java.lang.Integer.parseInt() to ensure we have the correct integer types. But a better solution is to just "monkey patch" Nashorn's implementation of parseInt:
```js
/**
 * We need to replace Nashorn's implementation of parseInt because it returns
 * a java.lang.Double. Why, you ask? That is a good question!
 * Anyway, this really messes up Spark, as we need parseInt to return a
 * java.lang.Integer, so we will replace it globally with an implementation
 * that works for Spark.
 * @param string
 * @param radix
 * @returns {Number}
 * @private
 */
parseInt = function(string, radix) {
    var val = NaN;
    try {
        if (radix) {
            val = java.lang.Integer.parseInt(string, radix);
        } else {
            val = java.lang.Integer.parseInt(string);
        }
    } catch (e) {
        // bad parseInt value
    }
    return val;
};
```
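The same patch can be written so that the snippet can also be exercised outside Nashorn (for example when testing a script under Node.js, where the java package object does not exist). The guarded variant below is our own sketch, and the name patchedParseInt is hypothetical, not part of the original patch:

```javascript
// Guarded variant of the parseInt patch (hypothetical name patchedParseInt).
// Under Nashorn it delegates to java.lang.Integer.parseInt, so the result is
// a java.lang.Integer; on any other engine it falls back to native parseInt.
var patchedParseInt = function(string, radix) {
    if (typeof java !== "undefined" && java.lang && java.lang.Integer) {
        try {
            return radix ? java.lang.Integer.parseInt(string, radix)
                         : java.lang.Integer.parseInt(string);
        } catch (e) {
            return NaN; // bad parseInt value
        }
    }
    // Fallback for non-Nashorn engines: native parseInt returns a JS number.
    return radix ? parseInt(string, radix) : parseInt(string);
};
```

Assigning this function to the global parseInt, as the patch above does, replaces the built-in everywhere in the script engine.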
Converting arrays from JavaScript to Java is not automatic. If we have a JavaScript array that should become an int[] we must do

```js
ret = Java.to(l, "int[]");
```

for a double[]

```js
ret = Java.to(l, "double[]");
```

and for an Object[]

```js
ret = Java.to(l);
```

And to go the other way, from a Java array to a JavaScript array, we need to call

```js
var keys = Java.from(javaObj.keySet().toArray());
```
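These conversions can be collected in one place. The helpers below are our own sketch (toJava and fromJava are not EclairJS or Nashorn names): under Nashorn they call Java.to and Java.from, and on an engine without the Java object (such as Node.js, used here only to illustrate the call shape) they simply pass the array through:

```javascript
// Hypothetical convenience wrappers around Nashorn's Java.to / Java.from.
// typeName is a Java array type string such as "int[]" or "double[]";
// omit it to convert to Object[].
function toJava(jsArray, typeName) {
    if (typeof Java !== "undefined" && Java.to) {
        return typeName ? Java.to(jsArray, typeName) : Java.to(jsArray);
    }
    return jsArray; // no Java interop available; pass through unchanged
}

function fromJava(javaArray) {
    if (typeof Java !== "undefined" && Java.from) {
        return Java.from(javaArray);
    }
    return javaArray; // already a JS array outside Nashorn
}
```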
We have noticed that looping through an array with an indexed for loop is faster than passing a callback function to process the array:
```js
a.forEach(function(x) {
    args.push(x);
});
```
takes longer than
```js
for (var i = 1; i < arguments.length; i++) {
    args.push(Serialize.javaToJs(arguments[i]));
}
```
The above code snippet is from Utils_invoke, so it runs every time we set up to call a user's lambda function. I suspect the issue is the cost of setting up the stack frame for the anonymous callback; this would not be significant if it happened only once, but with large datasets like MovieLens (100M) the time adds up.
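A rough way to check the callback-versus-loop claim is a standalone micro-benchmark (a sketch of ours, not the EclairJS Utils_invoke code; absolute timings vary by engine and data size, so only the relative numbers are interesting):

```javascript
// Compare forEach-with-callback against an indexed for loop on the same data.
// log falls back to console.log when Nashorn's print() is not available.
var log = (typeof print === "function") ? print : console.log;

var data = [];
for (var i = 0; i < 1000000; i++) {
    data.push(i);
}

var t0 = Date.now();
var viaForEach = [];
data.forEach(function(x) {
    viaForEach.push(x);
});
var t1 = Date.now();

var viaLoop = [];
for (var j = 0; j < data.length; j++) {
    viaLoop.push(data[j]);
}
var t2 = Date.now();

log("forEach: " + (t1 - t0) + " ms, for loop: " + (t2 - t1) + " ms");
```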
Running the lambda functions in Nashorn has a cost. Consider code that just loads a large dataset and filters it, removing a single header string:
```js
var obj = complete_ratings_raw_data.getJavaObject();
var start = new Date().getTime();
var complete_ratings_data = obj.filter(new org.eclairjs.nashorn.JSFunctionTest());
print("There are recommendations in the complete dataset: " + complete_ratings_data.count());
var end = new Date().getTime();
var time = end - start;
print('Execution time: ' + time + " milliseconds");
```
Running the filter in Java with
```java
package org.eclairjs.nashorn;

import org.apache.spark.api.java.function.Function;

public class JSFunctionTest implements Function {

    public JSFunctionTest() {
    }

    @SuppressWarnings({ "null", "unchecked" })
    @Override
    public Object call(Object l) {
        String line = (String) l;
        if (line.equals("userId,movieId,rating,timestamp")) {
            return false;
        } else {
            return true;
        }
    }
}
```
gives us a time of

```
There are recommendations in the complete dataset: 22884377
Execution time: 2379 milliseconds
```
Changing the lambda to run in Nashorn using
```java
package org.eclairjs.nashorn;

import org.apache.spark.api.java.function.Function;

import javax.script.Invocable;
import javax.script.ScriptEngine;

public class JSFunctionTest2 implements Function {

    private Object fn = null;

    public JSFunctionTest2() {
    }

    @SuppressWarnings({ "null", "unchecked" })
    @Override
    public Object call(Object l) throws Exception {
        ScriptEngine e = NashornEngineSingleton.getEngine();
        if (this.fn == null) {
            String func = "function myTestFunc(line) { return line != \"userId,movieId,rating,timestamp\";}";
            this.fn = e.eval(func);
        }
        Invocable invocable = (Invocable) e;
        return invocable.invokeFunction("myTestFunc", l);
    }
}
```
gives us a time of:

```
There are recommendations in the complete dataset: 22884378
Execution time: 8372 milliseconds
```
So just using Nashorn to run the equivalent JavaScript code, without our serialization, cost us 60 milliseconds.