Update method signature and example in CSV.read() docstring #1173
abhro wants to merge 780 commits into
Improve CSV.Rows performance
* Added some code comments to help clarify things
* Update outdated variable name usage (e.g. tapes)
* Cleaned up dependencies (WeakRefStrings is test-only now)
* Added lots of tests to increase coverage
* Made multithreaded chunk identification more robust by checking we have the correct # of columns for 5 consecutive rows instead of just 1
* Made sure we sync Int64 sentinels in multithreaded parsing
* Removed some unused functions
* Made sure we're testing type promotion when multithreaded parsing
* Add a `tasks::Integer` keyword argument to allow controlling how many tasks will be spawned for multithreaded parsing
* Clean up keyword arg docs
Lots of cleanup
Add CSV.Chunks for iterating over chunks of large files
Fixes JuliaData#464 (or at least improves it quite a bit). A new precompile.jl file is a script I ran to get some precompile statements for CSV.jl, Parsers.jl, and SentinelArrays.jl (which seem to be the biggest targets and ones that live in JuliaData). The Parsers.jl output ended up being insignificant for now, so that wasn't committed, but SentinelArrays.jl was. I've added two new files, "precompile.csv" and "precompile_small.csv", that are used for snooping; they include a column of each type, which should hopefully cover a good chunk of the codepaths we're compiling. We can adjust more later if there are certain paths that could use it and are causing people problems. All in all, this cuts TTFP (time-to-first-parse) in half on my machine.
Add some precompiles
Fixes JuliaData#668. The issue here is that when column names were passed manually, the code path that "skipped" to the datarow passed in the starting position as 1 instead of the `pos` variable. This used to not be an issue because `pos` was almost always 1 anyway. With `IOBuffer`, we now start `pos` at `io.ptr`, so we'll have more cases where it's critical to start reading at the right position.
When column names passed manually, ensure we respect starting position
I've wanted to do this for a while; previously we were only using the estimate from the first 10 rows. This hooks into the "chunking" code, which looks at `tasks` # of chunks of a file to find the start of rows for each; we now keep track of the # of bytes we saw when doing those row checks and use those totals plus the original 10 rows to form a better estimate of the total # of rows.
Improve accuracy of estimated rows for multithreaded parsing
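The estimate described in this commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not CSV.jl's actual code: `estimate_rows` and its argument names are hypothetical, and each sample is assumed to be a `(bytes_seen, rows_seen)` pair, one for the initial 10 rows and one per chunk checked.

```julia
# Hypothetical helper (not part of CSV.jl): estimate the total number of rows
# in a file from several (bytes_seen, rows_seen) samples.
function estimate_rows(total_bytes::Integer, samples)
    sampled_bytes = 0
    sampled_rows = 0
    for (nbytes, nrows) in samples
        sampled_bytes += nbytes   # bytes covered by this sample
        sampled_rows += nrows     # rows covered by this sample
    end
    # Without any sampled rows there is nothing to extrapolate from.
    sampled_rows == 0 && return 0
    bytes_per_row = sampled_bytes / sampled_rows
    return ceil(Int, total_bytes / bytes_per_row)
end

# First 10 rows plus two chunk samples of 5 rows each (made-up numbers).
samples = [(400, 10), (250, 5), (230, 5)]
estimated = estimate_rows(1_000_000, samples)
```

Pooling the byte counts from every sample, rather than only the first 10 rows, makes the average row width less sensitive to an unrepresentative file header region.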
Make the automatic pooled=>string column promotion more efficient
…r invalid rows
added documentation for the dateformats option
Fixes JuliaData#679; alternative fix to JuliaData#681. When a column is dropped, we essentially turn it into a `Missing` column type and ignore it when parsing. There was a check later in file parsing, however, that said if no missing values were found in a column, to ensure its type is `Vector{T}` instead of `Vector{Union{Missing, T}}`. The core problem in issue JuliaData#679 was that these dropped columns, while completely `missing`, didn't get "flagged" as having `missing` values.
Fixes JuliaData#680. Before custom types, the `typemap` keyword argument was really only about mapping between the standard, supported types. With custom types, we still only support certain type mappings (Int to Float, Any type to String), but we also want to support type mappings like `Int64 => Int32`. This PR readjusts how typemap works when detecting column types to account for the possibility of custom Integer or AbstractFloat type mappings for Int64 & Float64, and for moving directly to String if that's specified.
Ensure dropped columns are ignored in later file processing
* support for IOBuffer containing memory
* fix errors caught in tests
* use Base.wrap if available
---------
Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>

* fix breakage caused by JuliaLang/julia/pull/53896
* make __wrap compatible with 1.11 RC

* Update to 0.10.14
* Update julia setup action
* Add dependabot

* finalize memory if 1.11
* don't download busybox in test
* test windows on lts

* fix decchar handling in writecell() for AbstractFloat
* test for JuliaData#1109 fix decchar handling in writecell() for AbstractFloat
* fix newline format
---------
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>

* fix INT128_MIN write
* add write INT128_MIN test
* use Base.uabs
  Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
* add BigInt test
---------
Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
* Bump codecov/codecov-action from 4 to 5
  Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
  - [Release notes](https://github.com/codecov/codecov-action/releases)
  - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
  - [Commits](codecov/codecov-action@v4...v5)
  ---
  updated-dependencies:
  - dependency-name: codecov/codecov-action
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Update .github/workflows/ci.yml
  Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1173       +/-   ##
===========================================
+ Coverage   79.83%   90.29%   +10.46%
===========================================
  Files           8        9        +1
  Lines        1671     2319      +648
===========================================
+ Hits         1334     2094      +760
+ Misses        337      225      -112
Given that the entirety of the actual changes in this PR could be shown in the patch,

From e4646f9b5970ce628af621da7c22c6825fe6b44a Mon Sep 17 00:00:00 2001
From: abhro <5664668+abhro@users.noreply.github.com>
Date: Sun, 9 Nov 2025 14:44:24 -0500
Subject: [PATCH] Update method signature and example in `CSV.read()` docstring
---
src/CSV.jl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/CSV.jl b/src/CSV.jl
index 7a6789a9..8edfe2f1 100644
--- a/src/CSV.jl
+++ b/src/CSV.jl
@@ -79,13 +79,13 @@ include("rows.jl")
 include("write.jl")

 """
-`CSV.read(source, sink::T; kwargs...)` => T
+    CSV.read(source, sink::T; kwargs...) => T

 Read and parses a delimited file or files, materializing directly using the `sink` function. Allows avoiding excessive copies
 of columns for certain sinks like `DataFrame`.

 # Example
-```
+```julia-repl
 julia> using CSV, DataFrames
 julia> path = tempname();

may I request reconsideration on closing the PR?

The singular commit has been cherry-picked on top of main, to align with the newer commit history.

do you have a new PR or branch w/ the commit?

It should be this same current branch, I've rebased (-ish) and force pushed to this remote (abhro:patch-1)

got it; I guess I expected it to update the status on this page, but this must be some snapshot of the old state. I've opened a new PR w/ just the new clean commit: #1183

Thank you!
No description provided.