textn8r

A flexible and extensible Go library for text normalization. textn8r provides a comprehensive set of string normalizers that can be used individually or chained together to clean, transform, and standardize text data.

Features

Case Conversion: Convert text to uppercase or lowercase
Space Handling: Trim, remove extra spaces, or remove all spaces
Character Removal: Remove special characters, digits, punctuation, or non-alphanumeric characters
Accent & Diacritic Handling: Remove or replace accented characters and diacritics
Whitespace Normalization: Handle tabs, newlines, and carriage returns
Flexible Replacement: Replace specific character types with custom strings
Chainable Operations: Combine multiple normalizers for complex transformations
Custom Normalizers: Create your own normalizers easily

Installation

go get github.com/slashdevops/textn8r

Quick Start

package main

import (
    "fmt"
    "github.com/slashdevops/textn8r"
)

func main() {
    // Basic usage
    text := "  Hello, World!  "
    result := textn8r.UpperCaseNormalizer(text)
    fmt.Println(result) // "  HELLO, WORLD!  "

    // Chain multiple normalizers
    normalizers := textn8r.Normalizers{
        textn8r.TrimSpaceNormalizer,
        textn8r.UpperCaseNormalizer,
        textn8r.RemoveSpecialCharactersNormalizer,
    }
    result = normalizers.Apply("  hello, world!  ")
    fmt.Println(result) // "HELLO  WORLD "
}

Available Normalizers

Case Conversion

UpperCaseNormalizer: Converts text to uppercase
LowerCaseNormalizer: Converts text to lowercase

Space Handling

TrimSpaceNormalizer: Removes leading and trailing whitespace
RemoveExtraSpaceNormalizer: Collapses multiple spaces into single spaces
RemoveAllSpaceNormalizer: Removes all spaces
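
To make "trim" and "collapse" concrete, here is a minimal standard-library sketch. It is not textn8r's implementation: `collapseSpaces` is a hypothetical helper, and because it relies on `strings.Fields` it treats tabs and newlines as whitespace too, which may differ from what the normalizers above do.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseSpaces is a hypothetical helper, not part of textn8r.
// strings.Fields splits on runs of Unicode whitespace and discards
// leading and trailing whitespace, so joining the fields with one
// space both trims the input and collapses repeated whitespace.
func collapseSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%q\n", collapseSpaces("  hello    world  ")) // "hello world"
}
```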

Character Removal

RemoveSpecialCharactersNormalizer: Removes special characters (keeps alphanumeric and spaces)
RemoveDigitsNormalizer: Removes all numeric digits
RemovePunctuationNormalizer: Removes punctuation marks
RemoveNonAlphanumericNormalizer: Removes all non-alphanumeric characters
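
One way to think about character removal is as a per-rune filter. The sketch below uses only the standard library and assumes "alphanumeric" means Unicode letters and digits; `keepAlphanumeric` is a hypothetical name, and the library's own rules for what counts as special or punctuation may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepAlphanumeric is a hypothetical helper, not part of textn8r.
// strings.Map drops a rune whenever the mapping function returns a
// negative value, so only letters and digits survive.
func keepAlphanumeric(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return -1
	}, s)
}

func main() {
	fmt.Println(keepAlphanumeric("Hello@#$%World123!?*")) // HelloWorld123
}
```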

Whitespace Control

RemoveTabNormalizer: Removes tab characters
RemoveNewLineNormalizer: Removes newline characters
RemoveCarriageReturnNormalizer: Removes carriage return characters

Accent & Diacritic Handling

ReplaceAccentsNormalizer: Replaces accented characters with base characters (café → cafe)
ReplaceTildesNormalizer: Replaces characters that carry a tilde with their base characters (ñ → n)
RemoveDiacriticsNormalizer: Removes all diacritical marks
RemoveTildesNormalizer: Removes tilde characters
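
The idea of replacing accented characters with base characters can be sketched with a small lookup table. This is only an illustration under stated assumptions: `replaceAccents` and its table are hypothetical, cover just a handful of precomposed lowercase characters, and say nothing about how textn8r itself performs the mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// accentTable is a deliberately tiny, hypothetical mapping from
// precomposed accented characters to their base letters.
// strings.NewReplacer takes old/new pairs.
var accentTable = strings.NewReplacer(
	"á", "a", "é", "e", "í", "i", "ó", "o", "ú", "u",
	"ï", "i", "ñ", "n",
)

// replaceAccents is a hypothetical helper, not part of textn8r.
// Characters missing from the table pass through unchanged.
func replaceAccents(s string) string {
	return accentTable.Replace(s)
}

func main() {
	fmt.Println(replaceAccents("Café, résumé, naïve, jalapeño"))
	// Cafe, resume, naive, jalapeno
}
```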

Replacement Normalizers

These normalizers replace characters with custom strings:

ReplaceTabNormalizer(replacement): Replaces tabs with the specified string
ReplaceNewLineNormalizer(replacement): Replaces newlines with the specified string
ReplaceCarriageReturnNormalizer(replacement): Replaces carriage returns with the specified string
ReplaceSpaceNormalizer(replacement): Replaces spaces with the specified string
ReplaceDigitsNormalizer(replacement): Replaces digits with the specified string
ReplacePunctuationNormalizer(replacement): Replaces punctuation with the specified string
ReplaceNonAlphanumericNormalizer(replacement): Replaces non-alphanumeric characters with the specified string
ReplaceSpecialCharactersNormalizer(input, replacement): Replaces special characters in input with the specified string
ReplaceDiacriticsNormalizer(replacement): Replaces diacritics with the specified string
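
A function that takes a replacement string and hands back a normalizer is a closure factory. The sketch below shows that pattern with the standard library only; the `normalizer` type and `replaceTab` function are hypothetical stand-ins, not the library's definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer is a hypothetical stand-in for a string-to-string
// normalizer type; it is not textn8r's own definition.
type normalizer func(string) string

// replaceTab returns a normalizer that captures replacement in a
// closure and substitutes it for every tab in the input.
func replaceTab(replacement string) normalizer {
	return func(input string) string {
		return strings.ReplaceAll(input, "\t", replacement)
	}
}

func main() {
	dash := replaceTab("-")
	fmt.Printf("%q\n", dash("Hello\tWorld\nNew Line")) // "Hello-World\nNew Line"
}
```

Because the replacement is fixed when the normalizer is created, the resulting value has the same shape as any other normalizer and can sit in a chain next to the parameterless ones.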

Usage Examples

Basic Normalizers

text := "  Hello, World!  "

// Case conversion
fmt.Println(textn8r.UpperCaseNormalizer(text))     // "  HELLO, WORLD!  "
fmt.Println(textn8r.LowerCaseNormalizer(text))     // "  hello, world!  "

// Space handling
fmt.Println(textn8r.TrimSpaceNormalizer(text))     // "Hello, World!"

Basic Character Removal Examples

text := "Hello@#$%World123!?*"

fmt.Println(textn8r.RemoveSpecialCharactersNormalizer(text))    // "Hello World123 "
fmt.Println(textn8r.RemoveDigitsNormalizer(text))              // "Hello@#$%World!?*"
fmt.Println(textn8r.RemovePunctuationNormalizer(text))         // "Hello@#$%World123"
fmt.Println(textn8r.RemoveNonAlphanumericNormalizer(text))     // "HelloWorld123"

Accent Handling

text := "Café, résumé, naïve, jalapeño"

fmt.Println(textn8r.ReplaceAccentsNormalizer(text))  // "Cafe, resume, naive, jalapeno"
fmt.Println(textn8r.ReplaceTildesNormalizer("niño")) // "nino"

Using Replacement Normalizers

text := "Hello\tWorld\nNew Line"

tabReplacer := textn8r.ReplaceTabNormalizer("-")
fmt.Println(tabReplacer.Apply(text))  // "Hello-World\nNew Line"

newlineReplacer := textn8r.ReplaceNewLineNormalizer(" | ")
fmt.Println(newlineReplacer.Apply(text))  // "Hello\tWorld | New Line"

Chaining Normalizers

text := "  CAFÉ, résumé! @2023  "

normalizers := textn8r.Normalizers{
    textn8r.TrimSpaceNormalizer,
    textn8r.LowerCaseNormalizer,
    textn8r.ReplaceAccentsNormalizer,
    textn8r.RemoveSpecialCharactersNormalizer,
    textn8r.RemoveExtraSpaceNormalizer,
}

result := normalizers.Apply(text)
fmt.Println(result)  // "cafe resume 2023"

Custom Normalizers

// Create a custom normalizer (this snippet needs the "strings" package imported)
customNormalizer := textn8r.Normalizer(func(input string) string {
    return strings.ReplaceAll(input, "foo", "bar")
})

result := customNormalizer.Apply("foo is everywhere")
fmt.Println(result)  // "bar is everywhere"

Real-World Use Cases

URL Slug Generation

title := "How to Create Amazing Web Apps in 2023!"

slugNormalizers := textn8r.Normalizers{
    textn8r.TrimSpaceNormalizer,
    textn8r.LowerCaseNormalizer,
    textn8r.RemovePunctuationNormalizer,
    textn8r.RemoveExtraSpaceNormalizer,
    textn8r.ReplaceSpaceNormalizer("-"),
}

slug := slugNormalizers.Apply(title)
fmt.Println(slug)  // "how-to-create-amazing-web-apps-in-2023"

User Input Sanitization

userInput := "  JoHn.DoE@123!  \t\n"

cleaningNormalizers := textn8r.Normalizers{
    textn8r.TrimSpaceNormalizer,
    textn8r.RemoveTabNormalizer,
    textn8r.RemoveNewLineNormalizer,
    textn8r.LowerCaseNormalizer,
    textn8r.RemoveSpecialCharactersNormalizer,
    textn8r.RemoveExtraSpaceNormalizer,
}

cleanedInput := cleaningNormalizers.Apply(userInput)
fmt.Println(cleanedInput)  // "john doe 123"

Data Standardization

// Standardize company names
companyNames := []string{
    "  ACME Corp.  ",
    "acme corporation",
    "A.C.M.E. Inc",
}

standardizer := textn8r.Normalizers{
    textn8r.TrimSpaceNormalizer,
    textn8r.LowerCaseNormalizer,
    textn8r.RemovePunctuationNormalizer,
    textn8r.RemoveExtraSpaceNormalizer,
}

for _, name := range companyNames {
    standardized := standardizer.Apply(name)
    fmt.Printf("'%s' -> '%s'\n", name, standardized)
}
// Output:
// '  ACME Corp.  ' -> 'acme corp'
// 'acme corporation' -> 'acme corporation'
// 'A.C.M.E. Inc' -> 'acme inc'

Running the Examples

You can run the comprehensive examples included in this repository:

cd examples
go run main.go

Testing

Run the test suite:

go test -v

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
