csv-go

This package is a flexible, performant, single-threaded, UTF-8-friendly CSV stream reader and writer. It favors strictness: nearly all options are off by default. Reader and Writer constructors use functional options to maximize flexibility while validating configuration at initialization rather than at runtime. This keeps exported types behavior-oriented (methods over public fields), avoids leaking rigid internal implementation details, and improves coupling and cohesion. Once created, parsing and writing strategies are immutable, which lets maintainers evolve the implementation substantially over time while keeping interface contracts stable. It has been battle-tested in production for both correctness and speed, so feel free to use it in any way you like.
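
For readers unfamiliar with the functional options pattern, the following is a minimal, generic sketch of options that are applied and validated once at construction time. Every name in it (config, option, fieldSeparator, quote, newConfig) is hypothetical and chosen for illustration; it is not csv-go's implementation, and the validation rule shown is only an assumed example of the kind of check a constructor might perform.

```go
package main

import (
	"errors"
	"fmt"
)

// config and option are hypothetical types for illustration only;
// they are not part of csv-go.
type config struct {
	fieldSep rune
	quote    rune
	quoteSet bool
}

// option mutates a config that is still under construction.
type option func(*config)

// fieldSeparator returns an option that overrides the field separator.
func fieldSeparator(r rune) option {
	return func(c *config) { c.fieldSep = r }
}

// quote returns an option that enables quoting with the given rune.
func quote(r rune) option {
	return func(c *config) { c.quote = r; c.quoteSet = true }
}

// newConfig applies every option and then validates the result once,
// so an invalid combination is rejected before any data is processed.
func newConfig(opts ...option) (config, error) {
	c := config{fieldSep: ','}
	for _, o := range opts {
		o(&c)
	}
	if c.quoteSet && c.quote == c.fieldSep {
		return config{}, errors.New("quote and field separator must differ")
	}
	return c, nil
}

func main() {
	// An invalid combination fails at initialization.
	if _, err := newConfig(fieldSeparator(';'), quote(';')); err != nil {
		fmt.Println("rejected:", err)
	}

	// A valid combination yields a value that is not mutated afterwards.
	c, err := newConfig(quote('"'))
	if err != nil {
		panic(err)
	}
	fmt.Printf("separator=%q quote=%q\n", c.fieldSep, c.quote)
}
```

Because the config is returned by value and never exposed through public fields, callers cannot change it after construction, which is the property the paragraph above describes as immutable strategies.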

Both the reader and writer are more performant than the standard Go csv package when the two are compared in an apples-to-apples configuration. The writer also has several optimizations for serializing non-string types via the fluent API returned by csv.Writer.NewRecord() and via FieldWriters(). I expect mileage here to vary over time. My primary goal with this lib was to solve my own edge-case problems, such as suspect encodings and loose rules, and to offer something back that is more aligned with others who think like me about reducing allocations and GC pauses and increasing efficiency.

package main

// this is a toy example that reads a csv file and writes to another

import (
	"os"

	"github.com/josephcopenhaver/csv-go/v3"
)

func main() {
	r, err := os.Open("input.csv")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	cr, err := csv.NewReader(
		csv.ReaderOpts().Reader(r),
		// by default quotes have no meaning
		// so must be specified to match RFC 4180
		// csv.ReaderOpts().Quote('"'),
	)
	if err != nil {
		panic(err)
	}
	defer cr.Close()

	w, err := os.Create("output.csv")
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := w.Close(); err != nil {
			panic(err)
		}
	}()

	cw, err := csv.NewWriter(
		csv.WriterOpts().Writer(w),
	)
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := cw.Close(); err != nil {
			panic(err)
		}
	}()

	for row := range cr.IntoIter() {
		if _, err := cw.WriteRow(row...); err != nil {
			panic(err)
		}
	}
	if err := cr.Err(); err != nil {
		panic(err)
	}
}

See the Reader and Writer examples for more in-depth usages.
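
If you want a baseline for your own apples-to-apples comparison against the standard library, the same toy copy could be sketched with encoding/csv roughly as follows. The helper names copyCSV and copyString are hypothetical, and in-memory strings stand in for the files so the sketch is self-contained. Note that encoding/csv treats double quotes as meaningful by default, so a fair comparison would need csv-go configured with a matching Quote option.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// copyCSV is a hypothetical helper that streams every record from src
// to dst using only the standard library's encoding/csv package.
func copyCSV(dst io.Writer, src io.Reader) error {
	cr := csv.NewReader(src)
	// ReuseRecord lets Read return a slice that shares its backing array
	// with the previous call, which reduces per-record allocations.
	cr.ReuseRecord = true

	cw := csv.NewWriter(dst)

	for {
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if err := cw.Write(rec); err != nil {
			return err
		}
	}

	// Flush pushes buffered output to dst; Error reports any error
	// that occurred during a previous Write or Flush.
	cw.Flush()
	return cw.Error()
}

// copyString is a hypothetical convenience wrapper for in-memory input.
func copyString(in string) (string, error) {
	var sb strings.Builder
	err := copyCSV(&sb, strings.NewReader(in))
	return sb.String(), err
}

func main() {
	out, err := copyString("a,b\n1,2\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```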

Reader Features

Name	option(s)
Zero allocations during processing	BorrowRow + BorrowFields + InitialRecordBuffer + InitialRecordBufferSize + NumFields
Format Specification	Comment + CommentsAllowedAfterStartOfRecords + Escape + FieldSeparator + Quote + RecordSeparator + NumFields
Format Discovery	DiscoverRecordSeparator
Data Loss Prevention	ClearFreedDataMemory
Byte Order Marker Support	RemoveByteOrderMarker + ErrorOnNoByteOrderMarker
Headers Support	ExpectHeaders + RemoveHeaderRow + TrimHeaders
Reader Buffer tuning	ReaderBuffer + ReaderBufferSize
Format Validation	ErrorOnNoRows + ErrorOnNewlineInUnquotedField + ErrorOnQuotesInUnquotedField
Security Limits	MaxFields + MaxRecordBytes + MaxRecords + MaxComments + MaxCommentBytes

Writer Features

Name	option(s)
Zero allocations	InitialRecordBufferSize + InitialRecordBuffer
Header and Comment Specification	CommentRune + CommentLines + IncludeByteOrderMarker + Headers + TrimHeaders
Format Specification	CommentRune + Escape + FieldSeparator + Quote + RecordSeparator + NumFields
Data Loss Prevention	ClearFreedDataMemory
Encoding Validation	ErrorOnNonUTF8
Security Limits	planned

Note that the writer also has WriteFieldRow*() functions (WriteFieldRow, WriteFieldRowBorrowed) to reduce allocations when converting non‑string types to human‑readable CSV field values via the FieldWriter generating functions under csv.FieldWriters().

Note that beyond a certain number of columns, the WriteFieldRow*() calls become less efficient: their arguments can escape to the heap, and the cost of staging the non-serialized values in a slice of wide structs can add up quickly. To address this case, csv.Writer offers a fluent API: calling .NewRecord() returns a RecordWriter instance for writing a single record. In a single-threaded fashion, it locks the writer until Write() or Rollback() is called. Each field can be buffered for writing via the "FieldTypeName()" functions on the RecordWriter instance. Only one RecordWriter instance can be alive at a time for any given Writer.

Use performance testing to choose which writing method is ideal for your case. In general, choose the method most sympathetic to your hardware and data formats. For most cases, csv.Writer.NewRecord() should strike a nice balance that scales well in both utility and efficiency.
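
To illustrate the general idea behind serializing non-string values without first converting each one to a string, here is a stdlib-only sketch that appends values directly into a reusable byte buffer with the strconv.Append* functions. It is not csv-go's implementation or API: appendRecord is a hypothetical helper, the three-column layout is an assumption, and it deliberately performs no quoting or escaping.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendRecord is a hypothetical helper (not part of csv-go). It appends
// one comma-separated line made of id, name, and score to buf and returns
// the extended slice. It performs no quoting or escaping.
func appendRecord(buf []byte, id int64, name string, score float64) []byte {
	buf = strconv.AppendInt(buf, id, 10) // writes digits straight into buf
	buf = append(buf, ',')
	buf = append(buf, name...)
	buf = append(buf, ',')
	buf = strconv.AppendFloat(buf, score, 'f', -1, 64)
	buf = append(buf, '\n')
	return buf
}

func main() {
	// One buffer is reused for every record; buf[:0] keeps the capacity,
	// so appends that fit in it do not need a new backing array.
	buf := make([]byte, 0, 64)

	buf = appendRecord(buf[:0], 1, "alice", 9.5)
	fmt.Print(string(buf))

	buf = appendRecord(buf[:0], 2, "bob", 7)
	fmt.Print(string(buf))
}
```

The alternative of calling strconv.Itoa or fmt.Sprint per field produces an intermediate string for each value, which is the kind of per-field cost that buffer-oriented writing tries to avoid.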

CHANGELOG

Here's the same example as above adjusted to optimize throughput via additional configurations.

package main

// this is a toy example that reads a csv file and writes to another without making allocations while processing

import (
	"bufio"
	"os"

	"github.com/josephcopenhaver/csv-go/v3"
)

func main() {
	r, err := os.Open("input.csv")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// using a buffered reader to avoid hot io pipes / reading less than the system storage device block size or ideal network protocol packet payload size
	// could instead use something async powered to get concurrent behaviors
	br := bufio.NewReader(r)

	var cr csv.Reader
	{
		op := csv.ReaderOpts()
		cr, err = csv.NewReader(
			op.Reader(br),
			op.RecordSeparator("\n"), // simplifies the execution plan ever so slightly and ensures consistent parsing rather than depending on automatic discovery
			op.InitialRecordBufferSize(4*1024*1024), // seeds the reading record buffer to a particular initial capacity
			op.BorrowRow(true),                      // evades allocations BUT makes it unsafe to store/use the resulting slice past the next call to Scan
			op.BorrowFields(true),                   // evades allocations BUT makes it unsafe to store/use the resulting character content of each slice element result anywhere past the next call to Scan
			op.NumFields(2),                         // simplifies the execution plan ever so slightly
			// by default quotes have no meaning
			// so must be specified to match RFC 4180
			// op.Quote('"'),
		)
		if err != nil {
			panic(err)
		}
		defer func() {
			if err := cr.Close(); err != nil {
				panic(err)
			}
		}()
	}

	w, err := os.Create("output.csv")
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := w.Close(); err != nil {
			panic(err)
		}
	}()

	// using a buffered writer to avoid hot io pipes / writing less than the system storage device block size or ideal network protocol packet payload size
	// could instead use something async powered to get concurrent behaviors
	bw := bufio.NewWriterSize(w, 4*1024*1024)
	defer func() {
		if err := bw.Flush(); err != nil {
			panic(err)
		}
	}()

	var cw *csv.Writer
	{
		op := csv.WriterOpts()
		cw, err = csv.NewWriter(
			op.Writer(bw),
			op.InitialRecordBufferSize(4*1024*1024), // seeds the writing record buffer to a particular initial capacity
		)
		if err != nil {
			panic(err)
		}
		defer func() {
			if err := cw.Close(); err != nil {
				panic(err)
			}
		}()
	}

	// using Scan instead of the iterator sugar to avoid allocation of the iterator closures
	for cr.Scan() {
		// if BorrowRow=true or BorrowFields=true then the code reading rows from the Reader MUST NOT keep the rows or their field contents alive beyond the next call to cr.Scan()

		// I could also use cw.WriteRow here in this example since
		// the input is a slice of strings, but for most contexts
		// persons will have varying input data types in which
		// case NewRecord offers the most utility for a small
		// overhead cost. If you always have strings already on the
		// heap or you know they do not escape, then use WriteRow
		// instead.
		rw, err := cw.NewRecord()
		if err != nil {
			// note if you are just going to panic or are certain
			// the Writer state never errors unexpectedly / becomes
			// hard-locked, consider MustNewRecord() instead of
			// NewRecord()
			panic(err)
		}
		for _, s := range cr.Row() {
			rw.String(s)
		}
		if _, err := rw.Write(); err != nil {
			panic(err)
		}
	}
	if err := cr.Err(); err != nil {
		panic(err)
	}
}
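
When BorrowRow or BorrowFields is enabled and a particular row does need to outlive the next call to Scan, one option is to deep-copy it before retaining it. The sketch below uses only the standard library; cloneRow is a hypothetical helper, not something csv-go provides, and the slice literal merely stands in for a borrowed row.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneRow is a hypothetical helper (not part of csv-go). It returns a
// deep copy of row: a new slice whose strings each own freshly allocated
// memory, so the copy stays valid after the source is reused.
func cloneRow(row []string) []string {
	out := make([]string, len(row))
	for i, s := range row {
		out[i] = strings.Clone(s)
	}
	return out
}

func main() {
	borrowed := []string{"id", "name"} // stands in for a borrowed row

	kept := cloneRow(borrowed)

	// Simulate the reader reusing the borrowed slice for the next record.
	borrowed[0] = "overwritten"

	fmt.Println(kept[0], kept[1]) // prints "id name"
}
```

Copying reintroduces allocations, so it is best reserved for the rows you actually need to keep.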

