
Use ThreadLocal caching for ExtendedRandom PRNG contexts #1255

Open
taoliult wants to merge 1 commit into IBM:main from taoliult:main_securerandom_native_2

Conversation

@taoliult
Collaborator

@taoliult taoliult commented Mar 9, 2026

Add ThreadLocal caching for native PRNG contexts used by ExtendedRandom. Each thread creates and reuses a PRNG context for supported DRBG algorithms.

This avoids repeated EXTRAND_create calls when instances are created frequently.

Benefits:

  • Reduce native allocation overhead
  • Reuse PRNG contexts per thread
  • Improve performance in RNG-heavy workloads
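
The per-thread caching described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `PerThreadContextCache`, `FakeContext`, and `getOnNewThread` are hypothetical names, and the constructor counter only stands in for the cost of a native create call.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadContextCache {
    /** Hypothetical stand-in for a native PRNG context (not an OpenJCEPlus class). */
    static final class FakeContext {
        static final AtomicInteger CREATED = new AtomicInteger();
        final String algName;

        FakeContext(String algName) {
            this.algName = algName;
            CREATED.incrementAndGet(); // stands in for an expensive native create call
        }
    }

    // One slot per algorithm; each thread lazily creates its own context on first use.
    private static final ThreadLocal<FakeContext> SHA256 =
            ThreadLocal.withInitial(() -> new FakeContext("SHA256"));
    private static final ThreadLocal<FakeContext> SHA512 =
            ThreadLocal.withInitial(() -> new FakeContext("SHA512"));

    /** Returns the calling thread's cached context for the given algorithm. */
    static FakeContext get(String algName) {
        switch (algName) {
            case "SHA256":
                return SHA256.get();
            case "SHA512":
                return SHA512.get();
            default:
                throw new IllegalArgumentException("Unsupported HASHDRBG algorithm: " + algName);
        }
    }

    /** Helper for the demo: resolves the context on a fresh thread and returns it. */
    static FakeContext getOnNewThread(String algName) {
        FakeContext[] result = new FakeContext[1];
        Thread worker = new Thread(() -> result[0] = get(algName));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        FakeContext first = get("SHA256");
        System.out.println("reused on same thread: " + (first == get("SHA256")));
        System.out.println("separate on other thread: " + (first != getOnNewThread("SHA256")));
    }
}
```

Repeated lookups on one thread return the same object, while a second thread gets its own, which is the reuse pattern the description aims for.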

Comment thread src/main/java/com/ibm/crypto/plus/provider/base/ExtendedRandom.java Outdated
Comment thread src/main/java/com/ibm/crypto/plus/provider/base/ExtendedRandom.java Outdated
}

public synchronized void nextBytes(byte[] bytes) throws OCKException {
public void nextBytes(byte[] bytes) throws OCKException {
Member

Did we find a way to check that ICC doesn't have an issue with concurrent calls, even with separate contexts?

Collaborator Author

I added synchronized back because the existing RandomBenchmark test crashes if synchronized is removed and the benchmark is run with:

-Djmh.threads=16

This is due to the benchmark setup:

@Setup
public void setup() throws Exception {
    insertProvider("OpenJCEPlus");
    randomOpenJCEPlusSHA256DRBG = SecureRandom.getInstance("SHA256DRBG", "OpenJCEPlus");
    randomOpenJCEPlusSHA512DRBG = SecureRandom.getInstance("SHA512DRBG", "OpenJCEPlus");
    randomSUNSHA1PRNG = SecureRandom.getInstance("SHA1PRNG", "SUN");
    randomSUNDRBG = SecureRandom.getInstance("DRBG", "SUN");
    payload = new byte[payloadSize];
    random.nextBytes(payload);
}

Each SecureRandom instance is initialized in the @Setup method, and the benchmark state is configured as:

@State(Scope.Benchmark)

This means only one instance of the benchmark state is created, and the @Setup method runs only once. As a result, all benchmark threads share the same SecureRandom instances.

During the benchmark execution, multiple threads call:

@Benchmark
public byte[] runOpenJCEPlusSHA256DRBG() {
    randomOpenJCEPlusSHA256DRBG.nextBytes(payload);
    return payload;
}

So, the same instance is accessed concurrently by all threads.
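
As a plain-Java analogy (not JMH itself), the difference between one state object shared by all threads and one state object per thread can be sketched like this. `StateScopeDemo` and its methods are hypothetical names used only for illustration; the real behavior is governed by the JMH `@State` scope.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class StateScopeDemo {
    /** Runs `threads` workers and counts how many distinct state objects they saw. */
    static int distinctInstances(Supplier<Object> stateForThread, int threads) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(stateForThread.get()));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    /** Analogy for Scope.Benchmark: every thread receives the same object. */
    static int sharedScope(int threads) {
        Object shared = new Object();
        return distinctInstances(() -> shared, threads);
    }

    /** Analogy for Scope.Thread: every thread receives a fresh object. */
    static int threadScope(int threads) {
        return distinctInstances(Object::new, threads);
    }

    public static void main(String[] args) {
        System.out.println("shared scope instances: " + sharedScope(16));
        System.out.println("thread scope instances: " + threadScope(16));
    }
}
```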

From what I observed, even though ThreadLocal is used in ExtendedRandom.java, the benchmark configuration with @State(Scope.Benchmark) still causes the SecureRandom instance itself to be shared across threads.

Changing the benchmark state to the following works, since it gives each thread its own instance and avoids this issue. However, most benchmarks typically use @State(Scope.Benchmark) by default.

@State(Scope.Thread)

In addition, based on the results from both RandomBenchmark and RandomNewInstanceBenchmark, keeping synchronized does not show any performance difference compared to removing it.

So, I suggest keeping synchronized and leaving the existing RandomBenchmark unchanged.

@jasonkatonica FYI.

Member

So, if we keep synchronized, do we need the thread-local contexts, or is just one enough?

Collaborator Author

I tried removing the ThreadLocal and changing the code as follows:

    private static PRNGContextPointer prngContextBufferSha256;
    private static PRNGContextPointer prngContextBufferSha512;

    private static synchronized long getPRNGContext(OCKContext ockContext,
            String algName, OpenJCEPlusProvider provider) throws OCKException {

        PRNGContextPointer prngCtx;

        switch (algName) {
            case "SHA256":
                if (prngContextBufferSha256 == null) {
                    prngContextBufferSha256 =
                            new PRNGContextPointer(ockContext.getId(), algName, provider);
                }
                prngCtx = prngContextBufferSha256;
                break;

            case "SHA512":
                if (prngContextBufferSha512 == null) {
                    prngContextBufferSha512 =
                            new PRNGContextPointer(ockContext.getId(), algName, provider);
                }
                prngCtx = prngContextBufferSha512;
                break;
            default:
                throw new IllegalArgumentException(
                        "Unsupported HASHDRBG algorithm: " + algName);
        }

        return prngCtx.getCtx();
    }

Running RandomBenchmark works fine, but there is no noticeable performance difference compared to using ThreadLocal. However, RandomNewInstanceBenchmark, which gets a new instance each time before calling nextBytes(), crashes.

I think that after removing ThreadLocal, the code changes from using a per-thread native PRNG context to using a single shared native PRNG context across all threads. The synchronization in nextBytes() and setSeed() is only on the ExtendedRandom instance, not on the shared native PRNG context itself. If the native ICC/OCK PRNG context is not thread-safe for concurrent use, this may cause the crash.
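
The locking point above can be illustrated with a small sketch. It only shows which monitor a thread holds in each case; it does not reproduce the crash, and `LockScopeDemo` and `SHARED_CONTEXT` are hypothetical stand-ins for the ExtendedRandom instance and a single shared native context.

```java
final class LockScopeDemo {
    /** Hypothetical shared resource standing in for one native context used by all instances. */
    static final Object SHARED_CONTEXT = new Object();

    /**
     * Instance-level synchronized: the caller holds the monitor of this object only.
     * Returns {holdsInstanceLock, holdsSharedContextLock}.
     */
    synchronized boolean[] instanceLocked() {
        return new boolean[] {Thread.holdsLock(this), Thread.holdsLock(SHARED_CONTEXT)};
    }

    /**
     * Locking the shared resource itself: every instance serializes on the same monitor.
     * Returns {holdsInstanceLock, holdsSharedContextLock}.
     */
    boolean[] contextLocked() {
        synchronized (SHARED_CONTEXT) {
            return new boolean[] {Thread.holdsLock(this), Thread.holdsLock(SHARED_CONTEXT)};
        }
    }

    public static void main(String[] args) {
        LockScopeDemo demo = new LockScopeDemo();
        boolean[] a = demo.instanceLocked();
        boolean[] b = demo.contextLocked();
        System.out.println("instance lock: this=" + a[0] + ", shared=" + a[1]);
        System.out.println("context lock:  this=" + b[0] + ", shared=" + b[1]);
    }
}
```

With instance-level locking, two different instances can be inside their synchronized methods at the same time, so a resource they both reach is not protected by either lock.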

Comment thread src/test/java/ibm/jceplus/jmh/RandomNewInstanceBenchmark.java Outdated
@taoliult taoliult force-pushed the main_securerandom_native_2 branch 2 times, most recently from 75c4eb8 to 71f77f3 Compare March 11, 2026 21:24
OCKContext ockContext;
final long ockPRNGContextId;

private static final ThreadLocal<PRNGContextPointer> prngContextBufferSha256 = new ThreadLocal<PRNGContextPointer>();
Member

I think we still have an outstanding question here about whether we can share contexts within the same thread. In the scenario we are worried about, there are two instances of SecureRandom, and the state associated with one instance is influenced through the shared context by the other instance. I'm not sure whether this is a problem in OCKC while generating random data. It might be fine to share state between two random streams like this, but I'm not sure.
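
To make the scenario concrete, here is a toy sketch of two wrappers sharing one context on a single thread. `ToyContext` is a plain counter, not a DRBG, so it says nothing about whether sharing is safe in OCKC; it only shows that each wrapper's output depends on calls made through the other. All names are hypothetical.

```java
final class SharedStreamDemo {
    /** Toy deterministic "context": a counter, NOT a real DRBG. */
    static final class ToyContext {
        private int counter;

        int next() {
            return ++counter;
        }
    }

    /** Toy wrapper standing in for one SecureRandom instance bound to a context. */
    static final class ToyRandom {
        private final ToyContext ctx;

        ToyRandom(ToyContext ctx) {
            this.ctx = ctx;
        }

        synchronized int next() {
            return ctx.next();
        }
    }

    /** Two wrappers over one context: calls on b advance the state that a sees. */
    static int[] sharedSequence() {
        ToyContext shared = new ToyContext();
        ToyRandom a = new ToyRandom(shared);
        ToyRandom b = new ToyRandom(shared);
        return new int[] {a.next(), b.next(), a.next()};
    }

    /** Two wrappers over separate contexts: each stream advances independently. */
    static int[] separateSequence() {
        ToyRandom a = new ToyRandom(new ToyContext());
        ToyRandom b = new ToyRandom(new ToyContext());
        return new int[] {a.next(), b.next(), a.next()};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sharedSequence()));
        System.out.println(java.util.Arrays.toString(separateSequence()));
    }
}
```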

Collaborator Author

From this code change, ThreadLocal prevents sharing across threads, but yes, it still allows sharing within the same thread between different SecureRandom instances that use the same algorithm. And, we keep synchronized on nextBytes(byte[] bytes) and setSeed(byte[] seed), so access to the native context is serialized.

From NativeInterface_EXTRAND_create, the native context is created based on the algorithm and then returned. I do not see any instance-specific state passed in during creation, so from this code, it looks possible that the same context could be reused within the same thread.

The GSKit user guide states that RNG_CTXs are not intrinsically thread-safe, but it does not specify whether reusing the same context sequentially within a single thread across multiple instances is supported/allowed or not.

The benchmark tests cover two cases: reusing one SecureRandom instance repeatedly, and creating a new instance each time before calling nextBytes(). I ran the JMH tests with both 1 thread and 16 threads and did not see any failures or issues.

So for the question of whether contexts can be shared within the same thread by multiple instances, I don't think we have documentation that clearly proves it. Just based on the tests, I did not see any issue.

One additional point: in the existing benchmark, we already reuse the same context repeatedly through a single SecureRandom instance on the same thread. That is not exactly the same scenario as here, where multiple SecureRandom instances with the same algorithm may share one context on the same thread. However, since I don't see any instance-specific state in context creation, and the benchmark tests did not show any issue, I don't see evidence that sharing among multiple instances on the same thread causes a problem.

Member

We probably need to test this on all the platforms. Threading differs across platforms, and there may be a problem on some that does not happen on others.

Collaborator Author

So far, since Fyre VM does not have many platform options, I have run it on both x and z Linux and did not find any threading issues. Is there any particular platform you would like me to check?

Member

AIX and Windows

Collaborator Author

Yes, I ran the functional tests on these platforms and they look good. Right now, I’m running the JMH performance tests on these platforms and will share the results once the runs are complete.

Collaborator Author

Sorry for the late update. I ran all the benchmark tests in the SecurityPerformancePipeline on both ppc64_aix and x86-64_windows, and the results look good. I did not see any issues.

return prngCtx.getCtx();
}

public synchronized void nextBytes(byte[] bytes) throws OCKException {
Collaborator

I understand that we have to add synchronized back because the test crashes.

However, I feel there is a contradiction between using synchronized and using ThreadLocal. The primary advantage of ThreadLocal is to eliminate contention by giving each thread its own isolated resource. With synchronized at the instance level, any multi-threaded workload sharing a SecureRandom instance is forced to execute serially anyway.

Collaborator Author

For the synchronized issue we discussed last Friday, the failure does not occur in the newly added test RandomNewInstanceBenchmark. The crash happens in the existing JMH test RandomBenchmark.

I moved the payload initialization into the benchmark method, as shown below, but it still crashes.

  @Benchmark
  public byte[] runOpenJCEPlusSHA256DRBG() {
    byte[] payload = new byte[payloadSize];
    random.nextBytes(payload);

    randomOpenJCEPlusSHA256DRBG.nextBytes(payload);
    return payload;
  }

I think it is not only the shared payload, but also the shared SecureRandom instance created in @Setup.

@Setup
  public void setup() throws Exception {
    insertProvider("OpenJCEPlus");
    randomOpenJCEPlusSHA256DRBG = SecureRandom.getInstance("SHA256DRBG", "OpenJCEPlus");
    randomOpenJCEPlusSHA512DRBG = SecureRandom.getInstance("SHA512DRBG", "OpenJCEPlus");
    randomSUNSHA1PRNG = SecureRandom.getInstance("SHA1PRNG", "SUN");
    randomSUNDRBG = SecureRandom.getInstance("DRBG", "SUN");
    payload = new byte[payloadSize];
    random.nextBytes(payload);
  }

With @State(Scope.Benchmark), JMH creates one shared state object for the whole benchmark. That means the fields initialized in @Setup can be accessed by all threads. So, even though ExtendedRandom.java uses ThreadLocal for the native PRNG context, the SecureRandom instance itself is still shared across threads because it is created once in @Setup and stored in the shared benchmark state.

The ThreadLocal value is resolved during ExtendedRandom initialization, and the native PRNG context ID is then stored inside the object. After that, all threads use the same shared object, so they end up using the same stored native context ID, even though that context was originally initialized from one thread’s ThreadLocal.
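
The capture effect described here can be sketched in plain Java. In this hypothetical example, `Wrapper` resolves a ThreadLocal once in its constructor, the way the comment says the context ID is stored in the object; the class and thread names are illustrative only.

```java
final class CapturedThreadLocalDemo {
    // Each thread's value records the name of the thread that first resolved it.
    private static final ThreadLocal<String> CONTEXT =
            ThreadLocal.withInitial(() -> "ctx-of-" + Thread.currentThread().getName());

    /** Hypothetical wrapper that resolves the ThreadLocal once, at construction time. */
    static final class Wrapper {
        private final String capturedContext = CONTEXT.get();

        String contextInUse() {
            return capturedContext;
        }
    }

    /**
     * Builds a Wrapper on thread "creator", then uses it on thread "user".
     * Returns {context the wrapper uses, the user thread's own ThreadLocal value}.
     */
    static String[] run() {
        Wrapper[] holder = new Wrapper[1];
        String[] seen = new String[2];
        try {
            Thread creator = new Thread(() -> holder[0] = new Wrapper(), "creator");
            creator.start();
            creator.join();

            Thread user = new Thread(() -> {
                seen[0] = holder[0].contextInUse();
                seen[1] = CONTEXT.get();
            }, "user");
            user.start();
            user.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("wrapper uses: " + seen[0]);
        System.out.println("user thread's own value: " + seen[1]);
    }
}
```

The wrapper keeps using the creating thread's value no matter which thread calls it, so the ThreadLocal gives no isolation once the object is shared.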

So either we remove synchronized and change the benchmark test from @State(Scope.Benchmark) to @State(Scope.Thread), or we keep synchronized, since removing it does not show any noticeable performance difference.

@jasonkatonica @KostasTsiounis FYI.

this.ockContext = ockContext;
this.provider = provider;

this.provider.registerCleanable(this, cleanOCKResources(prngCtx, ockContext));
Collaborator

@JinhangZhang JinhangZhang Mar 26, 2026

I have one concern about the cleaner methods in a specific scenario: thread pools.

In this scenario, each thread maintains an internal ThreadLocalMap. In this map, the key is the ThreadLocal object itself, and the value is an instance of the internal class PRNGContextPointer in the ExtendedRandom class. Because the ThreadLocal variables are defined as private static, the keys stay strongly reachable for as long as the class is loaded, so the entries in a long-lived pool thread's ThreadLocalMap never become stale.

I'm afraid that the map entries are never naturally cleared. Because the values (PRNGContextPointer instances) remain strongly reachable for as long as the pool thread lives, they never become eligible for garbage collection, which prevents the Cleaner from ever being triggered. This would result in a native memory leak.
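
The retention behavior behind this concern can be shown with a small sketch using a single-thread pool. `PooledThreadLocalDemo` is a hypothetical example with a plain `Object` in place of PRNGContextPointer; it shows only that the value survives between tasks on a pooled thread, not the Cleaner behavior itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledThreadLocalDemo {
    // Static ThreadLocal, as in the reviewed code; the value is a plain placeholder object.
    private static final ThreadLocal<Object> CONTEXT = ThreadLocal.withInitial(Object::new);

    /** Runs two separate tasks on one pooled thread and compares the values they see. */
    static boolean sameValueAcrossTasks() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Object> readContext = CONTEXT::get;
            Object first = pool.submit(readContext).get();
            Object second = pool.submit(readContext).get();
            // The pooled thread outlives each task, so its ThreadLocal value persists.
            return first == second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("value survives between tasks: " + sameValueAcrossTasks());
    }
}
```

As long as the pool keeps the thread alive, the value stays reachable through that thread, which is why a Cleaner registered on it would not run during that time.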

Collaborator Author

We discussed this in last Friday’s meeting. If there are any further concerns, just let me know.

@taoliult taoliult force-pushed the main_securerandom_native_2 branch 2 times, most recently from 4a97876 to fb77737 Compare April 1, 2026 15:18
@taoliult taoliult force-pushed the main_securerandom_native_2 branch 2 times, most recently from c8e276f to ff9268f Compare April 20, 2026 18:30
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 10, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 30, timeUnit = TimeUnit.SECONDS)
public class RandomNewInstanceBenchmark extends JMHBase {
Member

Since this is a new test, it needs to be added to the JenkinsfilePerformance file so that it is available for users to select.

Collaborator Author

Yes, added to JenkinsfilePerformance.

@taoliult
Collaborator Author

@johnpeck-us-ibm Sorry, I accidentally clicked the refresh button. I noticed that you had already approved this PR, but I’m not sure if you need to click the Approve button again because of that.

Add ThreadLocal caching for native PRNG contexts used by
ExtendedRandom. Each thread creates and reuses a PRNG
context for supported DRBG algorithms.

This avoids repeated EXTRAND_create calls when instances
are created frequently.

Benefits:
- Reduce native allocation overhead
- Reuse PRNG contexts per thread
- Improve performance in RNG-heavy workloads

Signed-off-by: Tao Liu <tao.liu@ibm.com>
@taoliult taoliult force-pushed the main_securerandom_native_2 branch from 4d16a3f to fe5383a Compare April 28, 2026 16:17
@taoliult taoliult requested a review from JinhangZhang April 28, 2026 16:18

5 participants