A flexible and extensible Go library for text normalization. textn8r provides a comprehensive set of string normalizers that can be used individually or chained together to clean, transform, and standardize text data.
- Case Conversion: Convert text to uppercase or lowercase
- Space Handling: Trim, remove extra spaces, or remove all spaces
- Character Removal: Remove special characters, digits, punctuation, or non-alphanumeric characters
- Accent & Diacritic Handling: Remove or replace accented characters and diacritics
- Whitespace Normalization: Handle tabs, newlines, and carriage returns
- Flexible Replacement: Replace specific character types with custom strings
- Chainable Operations: Combine multiple normalizers for complex transformations
- Custom Normalizers: Create your own normalizers easily
Install the library with `go get`:

```sh
go get github.com/slashdevops/textn8r
```

A minimal program showing basic usage and chaining:

```go
package main

import (
	"fmt"

	"github.com/slashdevops/textn8r"
)

func main() {
	// Basic usage
	text := " Hello, World! "
	result := textn8r.UpperCaseNormalizer(text)
	fmt.Println(result) // " HELLO, WORLD! "

	// Chain multiple normalizers
	normalizers := textn8r.Normalizers{
		textn8r.TrimSpaceNormalizer,
		textn8r.UpperCaseNormalizer,
		textn8r.RemoveSpecialCharactersNormalizer,
	}
	result = normalizers.Apply(" hello, world! ")
	fmt.Println(result) // "HELLO WORLD "
}
```

Case conversion:

- `UpperCaseNormalizer`: Converts text to uppercase
- `LowerCaseNormalizer`: Converts text to lowercase

Space handling:

- `TrimSpaceNormalizer`: Removes leading and trailing whitespace
- `RemoveExtraSpaceNormalizer`: Collapses multiple spaces into single spaces
- `RemoveAllSpaceNormalizer`: Removes all spaces

Character removal:

- `RemoveSpecialCharactersNormalizer`: Removes special characters (keeps alphanumeric and spaces)
- `RemoveDigitsNormalizer`: Removes all numeric digits
- `RemovePunctuationNormalizer`: Removes punctuation marks
- `RemoveNonAlphanumericNormalizer`: Removes all non-alphanumeric characters

Whitespace normalization:

- `RemoveTabNormalizer`: Removes tab characters
- `RemoveNewLineNormalizer`: Removes newline characters
- `RemoveCarriageReturnNormalizer`: Removes carriage return characters

Accent and diacritic handling:

- `ReplaceAccentsNormalizer`: Replaces accented characters with base characters (café → cafe)
- `ReplaceTildesNormalizer`: Replaces tilde characters (ñ → n)
- `RemoveDiacriticsNormalizer`: Removes all diacritical marks
- `RemoveTildesNormalizer`: Removes tilde characters
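
As an illustration of what accent replacement involves, the sketch below maps a handful of accented letters to their base letters using only the standard library. The table `accentToBase` and the helper `replaceAccents` are hypothetical names invented for this example; the table is deliberately tiny and says nothing about how textn8r implements `ReplaceAccentsNormalizer` or which characters it covers.

```go
package main

import (
	"fmt"
	"strings"
)

// accentToBase is a small illustrative table, not the library's mapping.
// Keys are written as escapes so they are unambiguous precomposed runes.
var accentToBase = map[rune]rune{
	'\u00e1': 'a', // á
	'\u00e9': 'e', // é
	'\u00ed': 'i', // í
	'\u00f3': 'o', // ó
	'\u00fa': 'u', // ú
	'\u00ef': 'i', // ï
	'\u00f1': 'n', // ñ
	'\u00c1': 'A', // Á
	'\u00c9': 'E', // É
	'\u00cd': 'I', // Í
	'\u00d3': 'O', // Ó
	'\u00da': 'U', // Ú
	'\u00d1': 'N', // Ñ
}

// replaceAccents is a hypothetical helper: it swaps each rune found in the
// table for its base letter and leaves every other rune unchanged.
func replaceAccents(input string) string {
	return strings.Map(func(r rune) rune {
		if base, ok := accentToBase[r]; ok {
			return base
		}
		return r
	}, input)
}

func main() {
	fmt.Println(replaceAccents("Caf\u00e9, r\u00e9sum\u00e9, na\u00efve, jalape\u00f1o"))
	// prints: Cafe, resume, naive, jalapeno
}
```

A lookup table only handles the characters it lists. A more general approach is to decompose the text with Unicode normalization (for example with the `golang.org/x/text/unicode/norm` package) and drop the combining marks, at the cost of an extra dependency.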
These normalizers replace characters with custom strings:
- `ReplaceTabNormalizer(replacement)`: Replaces tabs with specified string
- `ReplaceNewLineNormalizer(replacement)`: Replaces newlines with specified string
- `ReplaceCarriageReturnNormalizer(replacement)`: Replaces carriage returns
- `ReplaceSpaceNormalizer(replacement)`: Replaces spaces
- `ReplaceDigitsNormalizer(replacement)`: Replaces digits
- `ReplacePunctuationNormalizer(replacement)`: Replaces punctuation
- `ReplaceNonAlphanumericNormalizer(replacement)`: Replaces non-alphanumeric characters
- `ReplaceSpecialCharactersNormalizer(input, replacement)`: Replaces special characters
- `ReplaceDiacriticsNormalizer(replacement)`: Replaces diacritics
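
One common way to build parameterized normalizers like these is a factory function that captures the replacement string in a closure. The sketch below shows that pattern with standard library calls only; `normalizer`, `apply`, and `replaceTab` are hypothetical names for illustration and are not taken from the textn8r source.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer is a hypothetical stand-in for a string-to-string transform.
type normalizer func(string) string

// apply runs the normalizer on the input.
func (n normalizer) apply(input string) string { return n(input) }

// replaceTab is a hypothetical factory: it captures the replacement string
// in a closure and returns a normalizer that substitutes it for every tab.
func replaceTab(replacement string) normalizer {
	return func(input string) string {
		return strings.ReplaceAll(input, "\t", replacement)
	}
}

func main() {
	dash := replaceTab("-")
	fmt.Printf("%q\n", dash.apply("Hello\tWorld\nNew Line"))
	// prints: "Hello-World\nNew Line"
}
```

Since the factory returns an ordinary function value, the result can sit in a chain next to normalizers that take no parameters, and passing an empty replacement turns the same factory into a remover.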

Case conversion and space handling:

```go
text := " Hello, World! "

// Case conversion
fmt.Println(textn8r.UpperCaseNormalizer(text)) // " HELLO, WORLD! "
fmt.Println(textn8r.LowerCaseNormalizer(text)) // " hello, world! "

// Space handling
fmt.Println(textn8r.TrimSpaceNormalizer(text)) // "Hello, World!"
```

Character removal:

```go
text := "Hello@#$%World123!?*"

fmt.Println(textn8r.RemoveSpecialCharactersNormalizer(text)) // "Hello World123 "
fmt.Println(textn8r.RemoveDigitsNormalizer(text))            // "Hello@#$%World!?*"
fmt.Println(textn8r.RemovePunctuationNormalizer(text))       // "Hello@#$%World123"
fmt.Println(textn8r.RemoveNonAlphanumericNormalizer(text))   // "HelloWorld123"
```

Accent handling:

```go
text := "Café, résumé, naïve, jalapeño"

fmt.Println(textn8r.ReplaceAccentsNormalizer(text))  // "Cafe, resume, naive, jalapeno"
fmt.Println(textn8r.ReplaceTildesNormalizer("niño")) // "nino"
```

Replacement normalizers:

```go
text := "Hello\tWorld\nNew Line"

tabReplacer := textn8r.ReplaceTabNormalizer("-")
fmt.Println(tabReplacer.Apply(text)) // "Hello-World\nNew Line"

newlineReplacer := textn8r.ReplaceNewLineNormalizer(" | ")
fmt.Println(newlineReplacer.Apply(text)) // "Hello\tWorld | New Line"
```

Chaining normalizers:

```go
text := " CAFÉ, résumé! @2023 "

normalizers := textn8r.Normalizers{
	textn8r.TrimSpaceNormalizer,
	textn8r.LowerCaseNormalizer,
	textn8r.ReplaceAccentsNormalizer,
	textn8r.RemoveSpecialCharactersNormalizer,
	textn8r.RemoveExtraSpaceNormalizer,
}

result := normalizers.Apply(text)
fmt.Println(result) // "cafe resume 2023"
```

Custom normalizers:

```go
// Create a custom normalizer (this snippet also needs the standard "strings" import)
customNormalizer := textn8r.Normalizer(func(input string) string {
	return strings.ReplaceAll(input, "foo", "bar")
})

result := customNormalizer.Apply("foo is everywhere")
fmt.Println(result) // "bar is everywhere"
```

Creating URL slugs:

```go
title := "How to Create Amazing Web Apps in 2023!"

slugNormalizers := textn8r.Normalizers{
	textn8r.TrimSpaceNormalizer,
	textn8r.LowerCaseNormalizer,
	textn8r.RemovePunctuationNormalizer,
	textn8r.RemoveExtraSpaceNormalizer,
	textn8r.ReplaceSpaceNormalizer("-"),
}

slug := slugNormalizers.Apply(title)
fmt.Println(slug) // "how-to-create-amazing-web-apps-in-2023"
```

Cleaning user input:

```go
userInput := " JoHn.DoE@123! \t\n"

cleaningNormalizers := textn8r.Normalizers{
	textn8r.TrimSpaceNormalizer,
	textn8r.RemoveTabNormalizer,
	textn8r.RemoveNewLineNormalizer,
	textn8r.LowerCaseNormalizer,
	textn8r.RemoveSpecialCharactersNormalizer,
	textn8r.RemoveExtraSpaceNormalizer,
}

cleanedInput := cleaningNormalizers.Apply(userInput)
fmt.Println(cleanedInput) // "john doe 123"
```

Standardizing data:

```go
// Standardize company names
companyNames := []string{
	" ACME Corp. ",
	"acme corporation",
	"A.C.M.E. Inc",
}

standardizer := textn8r.Normalizers{
	textn8r.TrimSpaceNormalizer,
	textn8r.LowerCaseNormalizer,
	textn8r.RemovePunctuationNormalizer,
	textn8r.RemoveExtraSpaceNormalizer,
}

for _, name := range companyNames {
	standardized := standardizer.Apply(name)
	fmt.Printf("'%s' -> '%s'\n", name, standardized)
}

// Output:
// ' ACME Corp. ' -> 'acme corp'
// 'acme corporation' -> 'acme corporation'
// 'A.C.M.E. Inc' -> 'acme inc'
```

You can run the comprehensive examples included in this repository:

```sh
cd examples
go run main.go
```

Run the test suite:

```sh
go test -v
```

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.