Update method signature and example in `CSV.read()` docstring by abhro · Pull Request #1173 · JuliaData/CSV.jl

abhro · 2025-11-09T19:44:31Z

No description provided.

Improve CSV.Rows performance

* Added some code comments to help clarify things
* Update out-dated variable name usage (e.g. tapes)
* Cleaned up dependencies (WeakRefStrings is test-only now)
* Added lots of tests to increase coverage
* Made multithreaded chunk identification more robust by checking we have correct # of columns for 5 consecutive rows instead of just 1
* Made sure we sync Int64 sentinels in multithreaded parsing
* Removed some unused functions
* Made sure we're testing type promoting when multithreaded parsing
* Add a `tasks::Integer` keyword argument to allow controlling how many tasks will be spawned for multithreaded parsing
* Clean up keyword arg docs

Lots of cleanup

Add CSV.Chunks for iterating over chunks of large files

Fixes JuliaData#464 (or at least improves it quite a bit). A new precompile.jl file is a script I ran to get some precompile statements for CSV.jl, Parsers.jl, and SentinelArrays.jl (which seem to be the biggest targets and ones that live in JuliaData). The Parsers.jl output ended up being insignificant for now, so that wasn't committed, but SentinelArrays.jl was. I've added two new files, "precompile.csv" and "precompile_small.csv", that are used for snooping; they include a column of each type, which should hopefully cover a good chunk of the codepaths we're compiling. We can adjust more later if there are certain paths that could use it and are causing people problems. All in all, this cuts TTFP (time-to-first-parse) in half on my machine.

Add some precompiles

Fixes JuliaData#668. The issue here is that when column names were passed manually, the code path that "skipped" to the datarow passed in the starting position as 1 instead of the `pos` variable. This used to not be an issue because `pos` was almost always 1 anyway. With `IOBuffer`, we now start `pos` at `io.ptr`, so we'll have more cases where it's critical to start reading at the right position.

When column names passed manually, ensure we respect starting position
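The starting-position issue described above can be illustrated with a small sketch. It is not CSV.jl's internal code: the variable names are made up for illustration, and `position(io) + 1` is used here as a public-API stand-in for the 1-based index of the next unread byte, which is an assumption about what `io.ptr` represents.

```julia
# Illustrative sketch only; names are hypothetical and this is not CSV.jl's code.
data = Vector{UInt8}("skip me\na,b\n1,2\n")
io = IOBuffer(data)

# Consume the first line, as a caller might do before handing `io` to a parser.
readline(io)

# 1-based index of the next unread byte (assumed stand-in for `io.ptr`).
pos = position(io) + 1

# Starting at 1 re-reads bytes that were already consumed;
# starting at `pos` picks up where the buffer actually is.
from_one = String(data[1:end])
from_pos = String(data[pos:end])
```

Here `from_one` still contains the already-consumed first line, while `from_pos` begins at the header row, which is the behavior the fix aims for.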

I've wanted to do this for a while; previously we were only using the estimate from the first 10 rows. This hooks into the "chunking" code, which looks at `tasks` # of chunks of a file to find the start of rows for each; we now keep track of the # of bytes we saw when doing those row checks and use those totals plus the original 10 rows to form a better estimate of the total # of rows.

Improve accuracy of estimated rows for multithreaded parsing
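A minimal sketch of the estimation idea described above, assuming the estimate is simply the file size divided by the average bytes per sampled row. The function name `estimate_rows`, the sample layout, and all numbers are hypothetical; the actual CSV.jl implementation may combine the totals differently.

```julia
# Hypothetical sketch: each sample is (bytes_seen, rows_seen).
function estimate_rows(file_bytes::Integer, samples::Vector{Tuple{Int,Int}})
    total_bytes = sum(first, samples)
    total_rows = sum(last, samples)
    total_rows == 0 && return 0
    avg_bytes_per_row = total_bytes / total_rows
    return ceil(Int, file_bytes / avg_bytes_per_row)
end

# Made-up numbers: the first 10 rows, plus rows checked at three chunk boundaries.
initial = (250, 10)
chunks = [(130, 5), (120, 5), (140, 5)]

est_initial = estimate_rows(10_000, [initial])
est_combined = estimate_rows(10_000, [initial, chunks...])
```

With these made-up samples the two estimates differ, because rows sampled later in the file change the average row width used for the estimate.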

Make the automatic pooled=>string column promotion more efficient

added documentation for the dateformats option

Fixes JuliaData#679; alternative fix to JuliaData#681. When a column is dropped, we essentially turn it into a `Missing` column type and ignore it when parsing. There was a check later in file parsing, however, that said if no missing values were found in a column, to ensure its type is `Vector{T}` instead of `Vector{Union{Missing, T}}`. The core problem in issue JuliaData#679 was that these dropped columns, while completely `missing`, didn't get "flagged" as having `missing` values.
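The type-narrowing check described above can be sketched roughly as follows. This is an assumption-laden illustration, not the CSV.jl code: `finalize_column` and the `anymissing` flag are hypothetical names standing in for whatever bookkeeping the parser keeps per column.

```julia
# Hypothetical sketch: narrow a column's element type only when no missing
# values were flagged for it during parsing.
function finalize_column(col::AbstractVector, anymissing::Bool)
    anymissing && return col                 # keep the Missing-capable type
    T = nonmissingtype(eltype(col))
    return convert(Vector{T}, col)           # e.g. Vector{Union{Missing,Int}} -> Vector{Int}
end

# A regular column with no missing values is narrowed.
ints = finalize_column(Union{Missing,Int}[1, 2, 3], false)

# A dropped column is entirely `missing`, so it needs to be flagged as such
# in order to be left alone by the narrowing step.
dropped = finalize_column(fill(missing, 3), true)
```

In this sketch, the fix corresponds to making sure dropped columns reach the check with the flag set, so that no narrowing is attempted on them.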

Fixes JuliaData#680. Before custom types, the `typemap` keyword argument was really only about mapping between the standard, supported types. With custom types, we still only support certain type mappings (Int to Float, Any type to String), but we also want to support type mappings like `Int64 => Int32`. This PR readjusts how typemap works when detecting column types to account for the possibility of custom Integer or AbstractFloat type mappings for Int64 & Float64, and moving directly to String if that's specified.

Ensure dropped columns are ignored in later file processing

* support for IOBuffer containing memory
* fix errors caught in tests
* use Base.wrap if available

---------

Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>

* fix breakage caused by JuliaLang/julia/pull/53896
* make __wrap compatible with 1.11 RC

* Update to 0.10.14
* Update julia setup action
* Add dependabot

* finalize memory if 1.11
* don't download busybox in test
* test windows on lts

* fix decchar handling in writecell() for AbstractFloat
* test for JuliaData#1109 fix decchar handling in writecell() for AbstractFloat
* fix newline format

---------

Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>

* fix INT128_MIN write
* add write INT128_MIN test
* use Base.uabs

Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

* add BigInt test

---------

Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

* Bump codecov/codecov-action from 4 to 5

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update .github/workflows/ci.yml

Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>

codecov · 2025-11-09T19:51:04Z

Codecov Report

❌ Patch coverage is 90.54545% with 156 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.29%. Comparing base (b222c16) to head (e4646f9).
⚠️ Report is 773 commits behind head on main.

Files with missing lines	Patch %	Lines
src/context.jl	88.68%	37 Missing ⚠️
src/utils.jl	89.11%	37 Missing ⚠️
src/file.jl	94.10%	29 Missing ⚠️
src/rows.jl	82.20%	29 Missing ⚠️
src/detection.jl	94.68%	11 Missing ⚠️
src/write.jl	89.53%	9 Missing ⚠️
src/CSV.jl	71.42%	2 Missing ⚠️
src/chunks.jl	91.30%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1173       +/-   ##
===========================================
+ Coverage   79.83%   90.29%   +10.46%     
===========================================
  Files           8        9        +1     
  Lines        1671     2319      +648     
===========================================
+ Hits         1334     2094      +760     
+ Misses        337      225      -112

abhro · 2026-01-15T16:21:46Z

Given that the entirety of the actual changes in this PR could be shown in the patch,

From e4646f9b5970ce628af621da7c22c6825fe6b44a Mon Sep 17 00:00:00 2001
From: abhro <5664668+abhro@users.noreply.github.com>
Date: Sun, 9 Nov 2025 14:44:24 -0500
Subject: [PATCH] Update method signature and example in `CSV.read()` docstring

---
 src/CSV.jl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/CSV.jl b/src/CSV.jl
index 7a6789a9..8edfe2f1 100644
--- a/src/CSV.jl
+++ b/src/CSV.jl
@@ -79,13 +79,13 @@ include("rows.jl")
 include("write.jl")
 
 """
-`CSV.read(source, sink::T; kwargs...)` => T
+    CSV.read(source, sink::T; kwargs...) => T
 
 Read and parses a delimited file or files, materializing directly using the `sink` function. Allows avoiding excessive copies
 of columns for certain sinks like `DataFrame`.
 
 # Example
-```
+```julia-repl
 julia> using CSV, DataFrames
 
 julia> path = tempname();

may I request reconsideration on closing the PR?

abhro · 2026-04-23T14:16:05Z

The singular commit has been cherry-picked on top of main, to align with the newer commit history.

quinnj · 2026-04-23T16:47:20Z

do you have a new PR or branch w/ the commit?

abhro · 2026-04-23T16:49:51Z

It should be this same current branch, I've rebased (-ish) and force pushed to this remote (abhro:patch-1)

quinnj · 2026-04-23T16:53:13Z

got it; I guess I expected it to update the status on this page, but this must be some snapshot of the old state. I've opened a new PR w/ just the new clean commit: #1183

abhro · 2026-04-23T16:57:24Z

Thank you!

quinnj and others added 30 commits June 26, 2020 14:02

Fix docs

2924126

Merge pull request JuliaData#663 from JuliaData/jq/rowperf

3d2ad1f

Improve CSV.Rows performance

Merge pull request JuliaData#664 from JuliaData/jq/cleanup

0075065

Lots of cleanup

Quick fix for testing file

d74c2fc

Add CSV.Chunks for iterating over chunks of large files

c0426a8

fix 32-bit

27a4af3

fix windows

f5023fd

fix travis

ed3dab6

Merge pull request JuliaData#665 from JuliaData/jq/chunks

bd4c8f3

Add CSV.Chunks for iterating over chunks of large files

Merge pull request JuliaData#666 from JuliaData/jq/precompile

138e323

Add some precompiles

Add SentinelArrays compat

4671e57

Merge pull request JuliaData#671 from JuliaData/jq/668

da0a5dc

When column names passed manually, ensure we respect starting position

Bump version

e9651d9

Fix tests

5b06308

Merge pull request JuliaData#673 from JuliaData/jq/estrows

ed48ad2

Improve accuracy of estimated rows for multithreaded parsing

Make the automatic pooled=>string column promotion more efficient

b9cb5d4

remove debug

2933624

fix nightly test

88cf907

Merge pull request JuliaData#676 from JuliaData/jq/pool

e21beb3

Make the automatic pooled=>string column promotion more efficient

Fix JuliaData#678 by ensuring pooled columns get missing value set for invalid rows

478670f

Bump version

131f233

added documentation for the dateformats option

b96868a

Merge pull request JuliaData#682 from kragol/document_dateformats

d4ce5b6

added documentation for the dateformats option

Merge pull request JuliaData#683 from JuliaData/jq/679

c53b274

Ensure dropped columns are ignored in later file processing

stephen-huan and others added 20 commits June 26, 2023 12:23

doc(examples.md): fix extraneous ``` (JuliaData#1100)

058fa68

docs: fixing Example.md render with @ref => @id (JuliaData#1106)

c81a1af

Add zenodo badge to README; fixes JuliaData#1112

cb1b411

typos (JuliaData#1119)

c6efb45

Update Project.toml

849f17f

Update keyworddocs.jl for limit to remove use of deprecated "threaded" (JuliaData#1123)

00f5510

Add compat to Documenter.jl, use warnonly = Documenter.except() (JuliaData#1126)

66a3a65

support for IOBuffer containing Memory (JuliaData#1125)

141e2e4

Update Project.toml

ba1f4d2

Update ci.yml: Add mac aarch64 CI, codecov v4 (JuliaData#1127)

acd36a6

Fix breakage caused by JuliaLang/julia/pull/53896 (JuliaData#1133)

67424ce

Update Project.toml to 0.10.14 (JuliaData#1134)

3d61294

Fix CI badge in README

57eca79

Fix reading gzipped file in Julia 1.11 on Windows (JuliaData#1144)

80936af

Bump version to 0.10.15

41a6875

fix INT128_MIN write (JuliaData#1152)

8207959

Update examples.md to use ZipArchives (JuliaData#1158)

04ec1cf

Update method signature and example in CSV.read() docstring

e4646f9

t-bltg mentioned this pull request Jan 10, 2026

compress test seems broken #1177

Closed

quinnj force-pushed the main branch from 04ec1cf to 4f8c505 Compare January 12, 2026 15:48

quinnj closed this Jan 15, 2026

abhro wants to merge 780 commits into JuliaData:main from abhro:patch-1

20 participants