Skip to content

Fix leading space in surnames after capitalize() with empty middle name#164

Open
patchwright wants to merge 1 commit into
derek73:masterfrom
patchwright:fix/capitalize-leading-space
Open

Fix leading space in surnames after capitalize() with empty middle name#164
patchwright wants to merge 1 commit into
derek73:masterfrom
patchwright:fix/capitalize-leading-space

Conversation

@patchwright

Copy link
Copy Markdown

Summary

After capitalize() is called on a name that has no middle name (or any other empty attribute), surnames carries a spurious leading space and the *_list attributes become [''] instead of [].

Reproduction

from nameparser import HumanName

n = HumanName('john doe')
n.capitalize()

print(repr(n.surnames))     # ' Doe'   <- leading space
print(n.middle_list)        # ['']    <- should be []
print(n.title_list)         # ['']    <- should be []

The leading space is a real observable artifact, not just an empty-list detail: downstream code that joins or compares surnames silently breaks.

Cause

cap_piece(...).split(' ') uses str.split(' ') with an explicit separator. On an empty string, split(' ') returns [''] (it preserves the empty token), whereas str.split() with no argument collapses runs of whitespace and returns [] for an empty/whitespace-only string — which is the correct result for a list of name tokens.

Fix

Drop the explicit separator for the whitespace-delimited attributes:

-        self.title_list = self.cap_piece(self.title, 'title').split(' ')
-        self.first_list = self.cap_piece(self.first, 'first').split(' ')
-        self.middle_list = self.cap_piece(self.middle, 'middle').split(' ')
-        self.last_list = self.cap_piece(self.last, 'last').split(' ')
+        self.title_list = self.cap_piece(self.title, 'title').split()
+        self.first_list = self.cap_piece(self.first, 'first').split()
+        self.middle_list = self.cap_piece(self.middle, 'middle').split()
+        self.last_list = self.cap_piece(self.last, 'last').split()
         self.suffix_list = self.cap_piece(self.suffix, 'suffix').split(', ')

The suffix line is intentionally left as .split(', '), because commas are a deliberate delimiter there.

Tests

Adds test_capitalize_empty_name_part_has_no_leading_space_in_surnames, asserting surnames == 'Doe', middle_list == [], and surnames_list == ['Doe'] after capitalize() (covers the force=True path too). The case fails on the current code and passes with the fix. Full suite: 345 tests OK (skipped=2, expected failures=10).

I couldn't find an existing issue or PR covering this — happy to fold it into one if I missed it.

capitalize() split each attribute with str.split(' '), which returns ['']
(not []) for an empty string. cap_piece() returns '' for an empty part, so an
empty middle name produced middle_list = [''], which leaked into surnames_list
(middle_list + last_list) and yielded a leading space in the surnames property:

    >>> hn = HumanName('john doe'); hn.capitalize(); hn.surnames
    ' Doe'   # leading space (should be 'Doe')

The same spurious '' element also appeared in title_list/first_list/last_list
for empty attributes. Using str.split() instead returns [] for empty strings
and is equivalent for the already-whitespace-collapsed pieces cap_piece()
returns. The suffix split (', ') is intentionally left unchanged.

Added a regression test in HumanNameCapitalizationTestCase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant