Skip to content

vermaysha/windows-1252

Repository files navigation

@vermaysha/windows-1252

npm version License: MIT

The version I created for my personal use. If you want to try it and find that it suits you, congratulations! You can use this library.

Technically, this can be used throughout the JavaScript/Typescript/EcmaScript ecosystem or whatever it is.

Features

  1. Encode from blabla string to windows-1252 encoding
  2. Decode from windows-1252 encoding to blablabla

It's as simple as encoding and decoding, just that simple! Don't expect any other features.

Installation

# npm
npm install @vermaysha/windows-1252

# yarn
yarn add @vermaysha/windows-1252

# pnpm
pnpm add @vermaysha/windows-1252

# bun
bun add @vermaysha/windows-1252

CDN Usage (esm.sh)

You can also use this package directly from a CDN without installation:

// Import everything
import { encode, decode } from "https://esm.sh/@vermaysha/windows-1252";

// Import only encode function (tree-shakeable)
import { encode } from "https://esm.sh/@vermaysha/windows-1252/encode";

// Import only decode function (tree-shakeable)
import { decode } from "https://esm.sh/@vermaysha/windows-1252/decode";

This package exports modular functions, so you can import only what you need to reduce bundle size. Each export path (/encode and /decode) contains only the specific functionality without importing the other.

Usage

Basic Encoding

import { encode } from "@vermaysha/windows-1252";

// Encode a string to Windows-1252 bytes
const bytes = encode("Hello, world!");
console.log(bytes); // Uint8Array [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33]

// Encode with special characters
const euroBytes = encode("Price: €100");
console.log(euroBytes); // Uint8Array with € encoded as 0x80

Basic Decoding

import { decode } from "@vermaysha/windows-1252";

// Decode Windows-1252 bytes to a string
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
console.log(text); // "Hello"

// Decode with Windows-1252 specific characters
const euroBytes = new Uint8Array([0x80]); // € symbol
const euroText = decode(euroBytes);
console.log(euroText); // "€"

Encoding Options

The encode function accepts an optional options parameter:

import { encode } from "@vermaysha/windows-1252";

// Fatal mode (default) - throws error for unencodable characters
try {
  encode("Hello 世界"); // Contains Chinese characters not in Windows-1252
} catch (error) {
  console.error(error.message); // "Character '世' (U+4E16) cannot be encoded in Windows-1252"
}

// Replacement mode - replaces unencodable characters with '?'
const bytes = encode("Hello 世界", { mode: "replacement" });
console.log(decode(bytes)); // "Hello ??"

Decoding Options

The decode function also accepts an optional options parameter:

import { decode } from "@vermaysha/windows-1252";

// Replacement mode (default) - replaces invalid bytes with �
const invalidBytes = new Uint8Array([0x48, 0x81, 0x65]); // 0x81 is undefined in Windows-1252
const text = decode(invalidBytes);
console.log(text); // "H�e"

// Fatal mode - throws error for invalid bytes
try {
  decode(invalidBytes, { mode: "fatal" });
} catch (error) {
  console.error(error.message); // "Invalid Windows-1252 byte: 0x81"
}

Complete Example

import { encode, decode } from "@vermaysha/windows-1252";

// Round-trip encoding/decoding
const original = "Café — €100";
const encoded = encode(original);
const decoded = decode(encoded);

console.log(original === decoded); // true

// Working with binary data
const buffer = encode("Hello");
// Send buffer over network, save to file, etc.
const restored = decode(buffer);
console.log(restored); // "Hello"

API Reference

encode(str: string, options?: EncodeOptions): Uint8Array

Encodes a UTF-16 string to Windows-1252 byte array.

Parameters:

  • str - The string to encode
  • options (optional):
    • mode: 'fatal' (default) | 'replacement'
      • 'fatal': Throws an error if a character cannot be encoded
      • 'replacement': Replaces unencodable characters with ?

Returns: Uint8Array of bytes in Windows-1252 encoding

Throws: Error if the string contains characters not representable in Windows-1252 and mode is 'fatal'

decode(bytes: Uint8Array | number[], options?: DecodeOptions): string

Decodes a Windows-1252 encoded byte array to a UTF-16 string.

Parameters:

  • bytes - Array of bytes in Windows-1252 encoding
  • options (optional):
    • mode: 'replacement' (default) | 'fatal'
      • 'replacement': Replaces invalid bytes with � (U+FFFD)
      • 'fatal': Throws an error if an invalid byte is encountered

Returns: Decoded string

Throws: Error if bytes contain invalid values and mode is 'fatal'

Windows-1252 Character Set

Windows-1252 is a single-byte character encoding that includes:

  • 0x00-0x7F: ASCII characters (direct mapping)
  • 0x80-0x9F: Windows-1252 specific characters (€, ', ", —, etc.)
  • 0xA0-0xFF: Latin-1 supplement (direct mapping)

Note: Bytes 0x81, 0x8D, 0x8F, 0x90, and 0x9D are undefined in Windows-1252 and will be replaced with � (U+FFFD) in replacement mode or throw an error in fatal mode.

Special Characters

The library correctly handles Windows-1252 specific characters in the 0x80-0x9F range:

Byte Character Unicode Description
0x80 U+20AC Euro sign
0x82 U+201A Single low-9 quotation mark
0x83 ƒ U+0192 Latin small letter f with hook
0x84 U+201E Double low-9 quotation mark
0x85 U+2026 Horizontal ellipsis
0x86 U+2020 Dagger
0x87 U+2021 Double dagger
0x88 ˆ U+02C6 Modifier letter circumflex accent
0x89 U+2030 Per mille sign
0x8A Š U+0160 Latin capital letter S with caron
0x8B U+2039 Single left-pointing angle quotation mark
0x8C Œ U+0152 Latin capital ligature OE
0x8E Ž U+017D Latin capital letter Z with caron
0x91 ' U+2018 Left single quotation mark
0x92 ' U+2019 Right single quotation mark
0x93 " U+201C Left double quotation mark
0x94 " U+201D Right double quotation mark
0x95 U+2022 Bullet
0x96 U+2013 En dash
0x97 U+2014 Em dash
0x98 ˜ U+02DC Small tilde
0x99 U+2122 Trade mark sign
0x9A š U+0161 Latin small letter s with caron
0x9B U+203A Single right-pointing angle quotation mark
0x9C œ U+0153 Latin small ligature oe
0x9E ž U+017E Latin small letter z with caron
0x9F Ÿ U+0178 Latin capital letter Y with diaeresis

Browser Support

Works in all modern browsers and environments that support:

  • Uint8Array
  • String.fromCodePoint()
  • String.prototype.codePointAt()

Development

Install Dependencies

bun install

Run Tests

bun test

Build

bun run build.ts

License

MIT © Ashary Vermaysha

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Links

Related

About

a Fast way to encode/decode windows-1252 encoding on Javascript/Typescript work on NodeJS/Bun/Browser

Topics

Resources

License

Stars

Watchers

Forks

Contributors