Use ThreadLocal caching for ExtendedRandom PRNG contexts#1255
Use ThreadLocal caching for ExtendedRandom PRNG contexts#1255
Conversation
| } | ||
|
|
||
| public synchronized void nextBytes(byte[] bytes) throws OCKException { | ||
| public void nextBytes(byte[] bytes) throws OCKException { |
There was a problem hiding this comment.
Did we find a way to check that ICC doesn't have an issue with concurrent calls, even with separate contexts?
There was a problem hiding this comment.
I added the synchronized back since the existing RandomBenchmark test will crash if synchronized is removed and the benchmark is run with:
-Djmh.threads=16
This is due to the benchmark setup:
@Setup
public void setup() throws Exception {
insertProvider("OpenJCEPlus");
randomOpenJCEPlusSHA256DRBG = SecureRandom.getInstance("SHA256DRBG", "OpenJCEPlus");
randomOpenJCEPlusSHA512DRBG = SecureRandom.getInstance("SHA512DRBG", "OpenJCEPlus");
randomSUNSHA1PRNG = SecureRandom.getInstance("SHA1PRNG", "SUN");
randomSUNDRBG = SecureRandom.getInstance("DRBG", "SUN");
payload = new byte[payloadSize];
random.nextBytes(payload);
}
Each SecureRandom instance is initialized in the @Setup method. And due to the benchmark state is configured as:
@State(Scope.Benchmark)
Means only one instance of the benchmark state is created, and the @Setup method runs only once. As a result, all benchmark threads share the same SecureRandom instances.
During the benchmark execution, multiple threads call:
@Benchmark
public byte[] runOpenJCEPlusSHA256DRBG() {
randomOpenJCEPlusSHA256DRBG.nextBytes(payload);
return payload;
}
So, the same instance is accessed concurrently by all threads.
From what I observed, even though ThreadLocal is used in ExtendedRandom.java, the benchmark configuration with @State(Scope.Benchmark) still causes the SecureRandom instance itself to be shared across threads.
Changing the benchmark state to below works, since it would give each thread its own instance and avoid this issue. However, most benchmarks typically use @State(Scope.Benchmark) by default.
@State(Scope.Thread)
In addition, based on the results from both RandomBenchmark and RandomNewInstanceBenchmark, keeping synchronized does not show any performance difference compared to removing it.
So, I suggest keeping the synchronized and leaving the existing RandomBenchmark unchanged.
@jasonkatonica FYI.
There was a problem hiding this comment.
So, if we keep the synchronized, do we need the thread local contexts or is just one enough?
There was a problem hiding this comment.
I tried to remove the thread local and change the code as below:
private static PRNGContextPointer prngContextBufferSha256;
private static PRNGContextPointer prngContextBufferSha512;
private static synchronized long getPRNGContext(OCKContext ockContext,
String algName, OpenJCEPlusProvider provider) throws OCKException {
PRNGContextPointer prngCtx;
switch (algName) {
case "SHA256":
if (prngContextBufferSha256 == null) {
prngContextBufferSha256 =
new PRNGContextPointer(ockContext.getId(), algName, provider);
}
prngCtx = prngContextBufferSha256;
break;
case "SHA512":
if (prngContextBufferSha512 == null) {
prngContextBufferSha512 =
new PRNGContextPointer(ockContext.getId(), algName, provider);
}
prngCtx = prngContextBufferSha512;
break;
default:
throw new IllegalArgumentException(
"Unsupported HASHDRBG algorithm: " + algName);
}
return prngCtx.getCtx();
}
Running RandomBenchmark works fine, but there is no noticeable performance difference compared to use ThreadLocal. However, when running RandomNewInstanceBenchmark, which gets a new instance each time before calling nextBytes(), the test crashes.
I think that after removing ThreadLocal, the code changes from using a per-thread native PRNG context to using a single shared native PRNG context across all threads. The synchronization in nextBytes() and setSeed() is only on the ExtendedRandom instance, not on the shared native PRNG context itself. If the native ICC/OCK PRNG context is not thread-safe for concurrent use, this may cause the crash.
75c4eb8 to
71f77f3
Compare
| OCKContext ockContext; | ||
| final long ockPRNGContextId; | ||
|
|
||
| private static final ThreadLocal<PRNGContextPointer> prngContextBufferSha256 = new ThreadLocal<PRNGContextPointer>(); |
There was a problem hiding this comment.
I think we still have outstanding questions here if we can share contexts in the same thread. In the scenario we are worried about we would have two instances of SecureRandom and the state data associated with one instance will be influenced by the state data in the context by the other instance of SecureRandom. Im not sure if this is a problem or not in OCKC while generating randoms. It might be fine to share state between two streams of random like this but not sure.
There was a problem hiding this comment.
From this code change, ThreadLocal prevents sharing across threads, but yes, it still allows sharing within the same thread between different SecureRandom instances that use the same algorithm. And, we keep synchronized on nextBytes(byte[] bytes) and setSeed(byte[] seed), so access to the native context is serialized.
From NativeInterface_EXTRAND_create, the native context is created based on the algorithm and then returned. I do not see any instance-specific state passed in during creation, so from this code, it looks possible that the same context could be reused within the same thread.
The GSKit user guide states that RNG_CTXs are not intrinsically thread-safe, but it does not specify whether reusing the same context sequentially within a single thread across multiple instances is supported/allowed or not.
From the benchmark tests, we have tests: reusing one SecureRandom instance repeatedly, and creating a new instance each time before calling nextBytes(). I ran the JMH tests with both 1 thread and 16 threads, and I did not see any failures or issues.
So for the question of whether contexts can be shared within the same thread by multiple instances, I don't think we have documentation that clearly proves it. Just based on the tests, I did not see any issue.
And one additional is, in the existing benchmark, we already reuse the same context repeatedly through a single SecureRandom instance on the same thread. That is not exactly the same scenario as here, where multiple SecureRandom instances with the same algorithm may sharing one context on the same thread. However, since I don't see any instance-specific state in context creation, and the benchmark tests did not show any issue, I don't see that same-thread multiple instances sharing is causing a problem.
There was a problem hiding this comment.
We probably need to test this on all the platforms. Threading is different on the different platforms and there maybe a probably on some that do not happen on others.
There was a problem hiding this comment.
So far, since Fyre VM does not have many platform options, I have run it on both x and z Linux and did not find any threading issues. Is there any particular platform you would like me to check?
There was a problem hiding this comment.
Yes, I ran the functional tests on these platforms and they look good. Right now, I’m running the JMH performance tests on these platforms and will share the results once the runs are complete.
There was a problem hiding this comment.
Sorry for the late update. I ran all the benchmark tests in the SecurityPerformancePipeline on both ppc64_aix and x86-64_windows, and the results look good. I did not see any issues.
| return prngCtx.getCtx(); | ||
| } | ||
|
|
||
| public synchronized void nextBytes(byte[] bytes) throws OCKException { |
There was a problem hiding this comment.
I can understand that since the test is crashed so we have to add the synchronized back.
However, i feel there is a contradiction between using the synchronized and the threadlocal, isn't it? I mean the primary advantage of using ThreadLocal is to eliminate contention by giving each thread its own isolated resource. Using synchronized on the instance level, any multi-threaded workload sharing a SecureRandom instance will be forced to execute serially anyway.
There was a problem hiding this comment.
For the synchronized issue we discussed last Friday, the failure does not occur in the newly added test RandomNewInstanceBenchmark. The crash happens in the existing JMH test RandomBenchmark.
I moved the payload initialization into the benchmark method, as shown below, but it still crashes.
@Benchmark
public byte[] runOpenJCEPlusSHA256DRBG() {
byte[] payload = new byte[payloadSize];
random.nextBytes(payload);
randomOpenJCEPlusSHA256DRBG.nextBytes(payload);
return payload;
}
I think it is not only the shared payload, but also the shared SecureRandom instance created in @Setup.
@Setup
public void setup() throws Exception {
insertProvider("OpenJCEPlus");
randomOpenJCEPlusSHA256DRBG = SecureRandom.getInstance("SHA256DRBG", "OpenJCEPlus");
randomOpenJCEPlusSHA512DRBG = SecureRandom.getInstance("SHA512DRBG", "OpenJCEPlus");
randomSUNSHA1PRNG = SecureRandom.getInstance("SHA1PRNG", "SUN");
randomSUNDRBG = SecureRandom.getInstance("DRBG", "SUN");
payload = new byte[payloadSize];
random.nextBytes(payload);
}
With @State(Scope.Benchmark), JMH creates one shared state object for the whole benchmark. That means the fields initialized in @Setup can be accessed by all threads. So, even ExtendedRandom.java uses ThreadLocal for the native PRNG context, the SecureRandom instance itself is still shared across threads because it is created once in @Setup and stored in the shared benchmark state.
ThreadLocal value is resolved during ExtendedRandom initialization, and the native PRNG context ID is then stored inside the object. After that, all threads use the same shared object, so they end up using the same stored native context ID to get the native context, even though that context was originally initialized from one thread’s ThreadLocal.
So either we remove synchronized and change the benchmark test from @State(Scope.Benchmark) to @State(Scope.Thread), or we keep synchronized, since removing it does not show any noticeable performance difference.
| this.ockContext = ockContext; | ||
| this.provider = provider; | ||
|
|
||
| this.provider.registerCleanable(this, cleanOCKResources(prngCtx, ockContext)); |
There was a problem hiding this comment.
I have one concern for the cleaner methods considering a specific scenario which is Thread Pools.
Under this scenario, a thread maintains an internal ThreadLocalMap. In this ThreadLocalMap, the key is the ThreadLocal object itself, and the value is the instance of the internal class PRNGContextPointer in ExtendedRandom class. Because the ThreadLocal variables are defined as private static, so the keys inside the host thread's ThreadLocalMap are held by strong references.
Im afraid that the map entries are never naturally cleared. Because the values (PRNGContextPointer instances) remain strongly reachable indefinitely, they will never be eligible for garbage collection, completely preventing the Cleaner from ever being triggered. This will inevitably result in a native memory leak.
There was a problem hiding this comment.
We discussed this in last Friday’s meeting. If there are any further concerns, just let me know.
4a97876 to
fb77737
Compare
c8e276f to
ff9268f
Compare
| @State(Scope.Benchmark) | ||
| @Warmup(iterations = 3, time = 10, timeUnit = TimeUnit.SECONDS) | ||
| @Measurement(iterations = 4, time = 30, timeUnit = TimeUnit.SECONDS) | ||
| public class RandomNewInstanceBenchmark extends JMHBase { |
There was a problem hiding this comment.
Since this is a new test it would need to be added to the JenkinsfilePerformance file such that is available for users to select.
There was a problem hiding this comment.
Yes, added into JenkinsfilePerformance.
|
@johnpeck-us-ibm Sorry, I accidentally clicked the refresh button. I noticed that you had already approved this PR, but I’m not sure if you need to click Approve button again because of that. |
Add ThreadLocal caching for native PRNG contexts used by ExtendedRandom. Each thread creates and reuses a PRNG context for supported DRBG algorithms. This avoids repeated EXTRAND_create calls when instances are created frequently. Benefits: - Reduce native allocation overhead - Reuse PRNG contexts per thread - Improve performance in RNG-heavy workloads Signed-off-by: Tao Liu <tao.liu@ibm.com>
4d16a3f to
fe5383a
Compare
Add ThreadLocal caching for native PRNG contexts used by ExtendedRandom. Each thread creates and reuses a PRNG context for supported DRBG algorithms.
This avoids repeated EXTRAND_create calls when instances are created frequently.
Benefits: