@IcoreE
Contributor

@IcoreE IcoreE commented Dec 14, 2025

1. Background
The CharRange class (package: org.apache.commons.lang3) is a core component of the CharSet utility, and its hashCode() method is invoked whenever instances are stored in a HashMap or HashSet. The current hashCode() implementation is a linear combination of the fields. It collides easily and is less efficient than it could be, which can affect the performance of upper-layer applications.

// Current hashCode implementation
@Override
public int hashCode() {
    return 83 + start + 7 * end + (negated ? 1 : 0);
}

2. Problem Description
The current hashCode() implementation has two critical issues (tracked as LANG-1802):

  • High collision rate: In the linear combination (83 + start + 7 * end + flag), a difference in one field is easily cancelled out by a difference in another. For example:
  1. CharRange.isNotIn((char) 1, (char) 2) (start=1, end=2, negated=true) → hash = 83+1+14+1=99
  2. CharRange.isIn((char) 2, (char) 2) (start=2, end=2, negated=false) → hash = 83+2+14=99

These two logically distinct instances (equals() returns false) have the same hash code, so they always land in the same HashMap bucket.

  • Low calculation efficiency: The formula relies on one multiplication and three additions, which can be slower than bitwise operations, and it uses the 32-bit int space poorly: the result never exceeds 524,364, so the upper 12 bits are always zero.
// Hash collision cases
@Test
void testHashCodeCollision() {
    // Case A: hash = 99
    CharRange a1 = CharRange.isNotIn((char) 1, (char) 2); // 1, 2, true  → 83 + 1 + 14 + 1 = 99
    CharRange a2 = CharRange.isIn((char) 2, (char) 2);    // 2, 2, false → 83 + 2 + 14 + 0 = 99
    assertEquals(a1.hashCode(), a2.hashCode()); // Collision
    // Case B: hash = 123
    CharRange b1 = CharRange.isIn((char) 5, (char) 5);    // 5, 5, false → 83 + 5 + 35 + 0 = 123
    CharRange b2 = CharRange.isNotIn((char) 4, (char) 5); // 4, 5, true  → 83 + 4 + 35 + 1 = 123
    assertEquals(b1.hashCode(), b2.hashCode()); // Collision
}
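The arithmetic in the two cases above can be checked without CharRange itself. The following is a minimal sketch that copies the quoted formula into a standalone helper; the class name LinearHashDemo and the methods linearHash and maxLinearHash are hypothetical names used only for illustration, not part of Commons Lang.

```java
/** Standalone copy of the linear formula quoted above; this is not the CharRange class. */
final class LinearHashDemo {

    // Hypothetical helper mirroring: 83 + start + 7 * end + (negated ? 1 : 0)
    static int linearHash(char start, char end, boolean negated) {
        return 83 + start + 7 * end + (negated ? 1 : 0);
    }

    // Largest value the formula can produce for any pair of char values.
    static int maxLinearHash() {
        return linearHash(Character.MAX_VALUE, Character.MAX_VALUE, true);
    }

    public static void main(String[] args) {
        // Case A: both print 99
        System.out.println(linearHash((char) 1, (char) 2, true));
        System.out.println(linearHash((char) 2, (char) 2, false));
        // Case B: both print 123
        System.out.println(linearHash((char) 5, (char) 5, false));
        System.out.println(linearHash((char) 4, (char) 5, true));
        // Upper bound of the formula: prints 524364
        System.out.println(maxLinearHash());
    }
}
```

Both cases follow the same pattern: setting negated adds 1, and so does increasing start by 1, so (start, end, true) and (start + 1, end, false) always share a hash value under this formula.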

3. Proposed Solution
Replace the current linear combination with a bit-packing scheme: start goes into the high 16 bits, end into the low 16 bits, and the negated flag is folded in with XOR. This uses the full 32-bit int space, reduces the collision rate, and needs only a shift, a mask, an OR, and an XOR:

@Override
public int hashCode() {
    // Splice 16-bit start (high) and 16-bit end (low) into 32-bit int (no overflow)
    final int result = (start << 16) | (end & 0xFFFF);
    // Integrate negated flag via XOR to further disperse hash values
    return result ^ (negated ? 0x80000000 : 0);
}
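As a quick check of the proposal, here is a minimal sketch that copies the formula into a standalone helper (BitwiseHashDemo and bitwiseHash are hypothetical names, not part of Commons Lang). It suggests that the two colliding pairs from the test above now get distinct values. It also illustrates one caveat, stated here as an observation rather than a verdict: start, end, and negated carry 33 bits of state, so some collisions must remain in a 32-bit result. With this formula they appear when start is 0x8000 or above, because the flag shares the sign bit with the top bit of start.

```java
/** Standalone copy of the proposed formula; this is not the CharRange class. */
final class BitwiseHashDemo {

    // Hypothetical helper mirroring the proposed hashCode() body.
    static int bitwiseHash(char start, char end, boolean negated) {
        final int result = (start << 16) | (end & 0xFFFF);
        return result ^ (negated ? 0x80000000 : 0);
    }

    public static void main(String[] args) {
        // The pairs that collide under the linear formula are now distinct.
        System.out.println(Integer.toHexString(bitwiseHash((char) 1, (char) 2, true)));  // 80010002
        System.out.println(Integer.toHexString(bitwiseHash((char) 2, (char) 2, false))); // 20002

        // Remaining collision: the flag bit overlaps the top bit of start.
        int x = bitwiseHash((char) 0x8000, (char) 0x8000, false);
        int y = bitwiseHash((char) 0, (char) 0x8000, true);
        System.out.println(x == y); // true
    }
}
```

Whether such collisions matter in practice depends on how often ranges starting at or above U+8000 are mixed with negated ranges in the same map.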

@garydgregory
Member

Hello @IcoreE
Thank you for the pull request.
This PR does NOT include a unit test!
This is a good find, but the JRE already provides a solution. Please see git master.
This PR can be closed unless you can provide a test with a different collision.

…cts.hash version and bitwise operation version
@IcoreE IcoreE closed this Dec 14, 2025
@IcoreE
Contributor Author

IcoreE commented Dec 14, 2025

Hello @garydgregory, the PR does include the unit test code: CharRangeHashCodeTest.java
I've conducted a detailed performance comparison of hashCode implementations for the CharRange class (package: org.apache.commons.lang3) and wanted to share the results:
1. Test Overview
I benchmarked two hashCode implementations for CharRange (org.apache.commons.lang3.CharRange class):
Baseline: hashCodeObjects() (using Objects.hash(end, negated, start), the standard general-purpose implementation)
Optimized: hashCodeBitwise() (bitwise splicing of start, end, and negated)
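For reference, the baseline can be sketched as a standalone helper, assuming it is exactly Objects.hash(end, negated, start) as described above (ObjectsHashDemo and objectsHash are hypothetical names). Objects.hash combines the boxed values with a multiplier of 31, so the two pairs from the original collision test get distinct values, while other pairs can still collide. The example pair below follows from that arithmetic and is offered as an illustration, not as a benchmark result.

```java
import java.util.Objects;

/** Standalone sketch of the Objects.hash baseline; this is not the CharRange class. */
final class ObjectsHashDemo {

    // Hypothetical helper; assumes the baseline is Objects.hash(end, negated, start).
    static int objectsHash(char start, char end, boolean negated) {
        return Objects.hash(end, negated, start);
    }

    public static void main(String[] args) {
        // The pairs that collide under the linear formula are distinct here.
        System.out.println(objectsHash((char) 1, (char) 2, true) != objectsHash((char) 2, (char) 2, false));
        System.out.println(objectsHash((char) 5, (char) 5, false) != objectsHash((char) 4, (char) 5, true));

        // A different collision: end contributes 961 * end (31 * 31) and start contributes
        // start, so raising end by 1 while lowering start by 961 gives the same result.
        System.out.println(objectsHash((char) 961, (char) 1000, false)
                == objectsHash((char) 0, (char) 1001, false));
    }
}
```

All three lines should print true. Neither scheme can be collision-free, so the comparison comes down to how the remaining collisions are distributed over the ranges that applications actually use.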

# Conflicts:
#	src/main/java/org/apache/commons/lang3/CharRange.java
@IcoreE IcoreE reopened this Dec 21, 2025
@IcoreE IcoreE changed the title from "[Improvement] Optimize CharRange.hashCode() to reduce collision rate and improve calculation efficiency" to "[LANG-1802] Optimize CharRange.hashCode() to reduce collision rate and improve calculation efficiency" Dec 21, 2025