
Nashorn Java to JavaScript interoperability issues

Bill Reed edited this page Jun 2, 2016 · 7 revisions

This page documents, discusses, and proposes possible solutions for shortcomings of the Java/JavaScript interoperability in Java 8.

Extending a Spark lambda function: Nashorn does not support extending a Java class that extends java.io.Serializable

Spark processing is accomplished by providing lambda functions to Spark classes, for example RDD:

    JavaRDD<String> complete_ratings_data = complete_ratings_raw_data.filter(new Function<String, Boolean>() {
        public Boolean call(String line) {
            // keep every line except the CSV header
            return !line.equals(complete_ratings_raw_data_header);
        }
    });

In the example above we are implementing the org.apache.spark.api.java.function.Function interface. The Nashorn documentation states that we can implement/extend a Java class either with:

// This syntax is primarily used to support anonymous class-like syntax for
// Java interface implementation as shown below.
 
var r = new java.lang.Runnable() {
    run: function() { print("run"); }
}

or


var ArrayList = Java.type("java.util.ArrayList")
var ArrayListExtender = Java.extend(ArrayList)
var printSizeInvokedArrayList = new ArrayListExtender() {
    size: function() { print("size invoked!"); }
}

So for org.apache.spark.api.java.function.Function we would have code like:

    var jsFunc = new org.apache.spark.api.java.function.Function() {
        call: function(line) {
            return line != "userId,movieId,rating,timestamp"; // complete_ratings_raw_data_header
        }
    };
    var xx = complete_ratings_raw_data_JavaObj.filter(jsFunc);

or

    var sparkFunction = Java.type("org.apache.spark.api.java.function.Function");
    var sparkFunctionExtender = Java.extend(sparkFunction);

    var boolFunctionExtender = new sparkFunctionExtender() {
        call: function(line) {
            return line != "userId,movieId,rating,timestamp"; // complete_ratings_raw_data_header
        }
    };
    var xx = complete_ratings_raw_data_JavaObj.filter(boolFunctionExtender);

Either approach throws the exception:

    Exception in thread "main" java.lang.RuntimeException: org.apache.spark.SparkException: Task not serializable

Spark must serialize every function it ships to worker tasks, and the adapter objects Nashorn creates for script-defined functions are not serializable.
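Since the serialization failure comes from the script-defined function itself, one workaround is to write the function in Java as an ordinary named class and hand an instance to the script. Below is a minimal, hypothetical sketch: `SerializableFunction` and `HeaderFilter` are stand-ins invented for illustration (Spark is not on the classpath here, but Spark's `Function` likewise extends `Serializable`), and the round-trip mimics what Spark does when shipping a task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for org.apache.spark.api.java.function.Function,
// which also extends Serializable.
interface SerializableFunction<T, R> extends Serializable {
    R call(T v) throws Exception;
}

// A named Java class serializes in the ordinary way, so defining the filter
// in Java and passing the instance to the Nashorn script sidesteps the
// "Task not serializable" failure of Nashorn-generated adapters.
class HeaderFilter implements SerializableFunction<String, Boolean> {
    private static final long serialVersionUID = 1L;
    private final String header;

    HeaderFilter(String header) { this.header = header; }

    public Boolean call(String line) { return !line.equals(header); }
}

public class Main {
    public static void main(String[] args) throws Exception {
        HeaderFilter f = new HeaderFilter("userId,movieId,rating,timestamp");

        // Round-trip through Java serialization, as Spark does when it
        // ships a task closure to an executor.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(f);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        HeaderFilter copy = (HeaderFilter) in.readObject();

        System.out.println(copy.call("userId,movieId,rating,timestamp")); // false
        System.out.println(copy.call("1,31,2.5,1260759144"));             // true
    }
}
```

The script would then call `filter(javaSideFilterInstance)` instead of constructing the function in JavaScript.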

Nashorn parseInt returns a java.lang.Double

var x = parseInt("3");
print(x.getClass()); // prints java.lang.Double

This causes ClassCastExceptions when running many MLlib classes. One way to fix this would be to add an RDD.map() at every place we run into a ClassCastException and ensure the correct types by using java.lang.Integer.parseInt(). A better solution is to simply "monkey patch" Nashorn's implementation of parseInt:

/**
 * Replace Nashorn's implementation of parseInt because it returns a
 * java.lang.Double. This really messes up Spark, which needs parseInt to
 * produce a java.lang.Integer, so we replace it globally with an
 * implementation that works for Spark.
 * @param string
 * @param radix
 * @returns {Number}
 * @private
 */
parseInt = function(string, radix) {

    var val = NaN;
    try {
        if (radix) {
            val = java.lang.Integer.parseInt(string, radix);
        } else {
            val = java.lang.Integer.parseInt(string);
        }
    } catch (e) {
        // bad parseInt value, fall through and return NaN
    }

    return val;
};
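The class cast failure the patch avoids can be reproduced in plain Java (a sketch of the failure mode, not Spark itself): a boxed Double handed to code that casts to Integer, as parts of MLlib effectively do, throws a ClassCastException, while a boxed Integer casts cleanly.

```java
public class Main {
    public static void main(String[] args) {
        Object fromNashorn = Double.valueOf(3); // what the stock parseInt("3") hands to Java
        Object patched = Integer.valueOf(3);    // what the replacement produces

        try {
            Integer i = (Integer) fromNashorn;  // the cast library code performs
            System.out.println("cast succeeded: " + i);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException for the Double");
        }

        Integer ok = (Integer) patched;         // the patched value casts cleanly
        System.out.println("cast succeeded: " + ok);
    }
}
```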
