m0nirul/Field-Observer

Field Observer

A Python utility to infer and report data types for columns in tabular files like CSVs and TSVs.

Features

  • Parses CSV and TSV files automatically.
  • Infers common data types (integer, float, string, boolean, date) for each column.
  • Identifies columns with mixed data types, reporting the types found.
  • Reports unique value counts for categorical fields.
  • Generates a summarized schema report to console, JSON, or CSV output.
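The inference described in the list above could work roughly as follows. This is a minimal sketch, not Field Observer's actual implementation: the function names `infer_value_type` and `summarize_column`, the accepted boolean tokens, the single `YYYY-MM-DD` date format, and the rule for resolving mixed columns are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Assumption: only these tokens count as booleans (case-insensitive).
BOOLEAN_TOKENS = {"true", "false"}


def infer_value_type(value: str) -> str:
    """Classify one raw cell, trying the strictest types first."""
    text = value.strip()
    if text == "":
        return "None"
    if text.lower() in BOOLEAN_TOKENS:
        return "Boolean"
    try:
        int(text)
        return "Integer"
    except ValueError:
        pass
    try:
        float(text)
        return "Float"
    except ValueError:
        pass
    try:
        # Assumption: ISO-style dates only.
        datetime.strptime(text, "%Y-%m-%d")
        return "Date"
    except ValueError:
        pass
    return "String"


def summarize_column(values: list[str]) -> dict:
    """Summarize one column: overall type, mixed types, unique count."""
    counts = Counter(infer_value_type(v) for v in values)
    present = set(counts) - {"None"}  # ignore empty cells when picking a type
    if not present:
        inferred = "None"
    elif len(present) == 1:
        inferred = next(iter(present))
    elif present == {"Integer", "Float"}:
        inferred = "Float"  # assumption: integers widen to floats
    else:
        inferred = "String"  # assumption: anything else falls back to string
    return {
        "inferred_type": inferred,
        "mixed_types": sorted(counts) if len(counts) > 1 else [],
        "unique_count": len(set(values)),
    }
```

Trying integer before float, and both before date and string, keeps the narrowest matching type. In this sketch a column with some empty cells is reported as mixed with `None`.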

Installation

  1. Clone the repository:

    git clone https://github.com/m0nirul/Field-Observer.git
    cd Field-Observer
  2. Install dependencies:

    pip install -r requirements.txt
  3. Install as an editable package (optional, for development):

    pip install -e .

Usage

Field Observer provides a command-line interface (CLI) to analyze your data files.

Basic Usage

To analyze a CSV or TSV file and print the report to the console:

field-observer analyze <path/to/your/file.csv>

Example:

field-observer analyze data/sample.csv

Specifying Delimiter

By default, the tool attempts to infer the delimiter (comma for CSV, tab for TSV). To set it explicitly, use the -d or --delimiter option:

field-observer analyze data/custom_delimiter.txt --delimiter ';'
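For readers curious how delimiter inference can be done, one common approach uses `csv.Sniffer` from the Python standard library. The sketch below is an assumption about how such a step might look, not a description of what Field Observer does internally. The helper name `detect_delimiter`, the candidate set, and the comma fallback are hypothetical.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the delimiter from a text sample, falling back to a comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; assume a plain CSV.
        return ","


# Usage: pass the first few lines of the file as the sample.
print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
```

Sniffing is a heuristic, which is why an explicit `--delimiter` option is useful for unusual files.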

Output Formats

You can specify the output format using the -o or --output option. Supported formats are console (default), json, and csv.

JSON Output

To save the schema report as a JSON file:

field-observer analyze data/sample.csv --output json > schema_report.json

Or write directly to a file with the --file option:

field-observer analyze data/sample.csv --output json --file schema_report.json

CSV Output

To save the schema report as a CSV file:

field-observer analyze data/sample.csv --output csv > schema_report.csv

Or write directly to a file with the --file option:

field-observer analyze data/sample.csv --output csv --file schema_report.csv
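To illustrate what the JSON and CSV reports might contain, here is a small sketch that serializes one row per analyzed column using only the standard library. The field names (`column`, `inferred_type`, `mixed_types`, `unique_count`) and the helpers `report_to_json` and `report_to_csv` are assumptions for illustration. The tool's real output schema may differ.

```python
import csv
import io
import json

# Assumption: hypothetical field names, one report row per analyzed column.
FIELDS = ["column", "inferred_type", "mixed_types", "unique_count"]


def report_to_json(rows: list[dict]) -> str:
    """Serialize the report rows as pretty-printed JSON."""
    return json.dumps(rows, indent=2)


def report_to_csv(rows: list[dict]) -> str:
    """Serialize the report rows as CSV, joining mixed types into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "mixed_types": ", ".join(row["mixed_types"])})
    return buffer.getvalue()


rows = [
    {"column": "notes", "inferred_type": "String",
     "mixed_types": ["None", "String"], "unique_count": 70},
]
print(report_to_json(rows))
print(report_to_csv(rows))
```

JSON keeps the mixed types as a list, while the CSV form flattens them into a single quoted cell so each column still occupies one row.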

Example Output (Console)

Analyzing file: data/sample.csv
--------------------------------------------------------------------------------
Column Name       | Inferred Type      | Mixed Types               | Unique Count
--------------------------------------------------------------------------------
id                | Integer            |                           | 100
name              | String             |                           | 98
email             | String             |                           | 100
age               | Integer            |                           | 50
is_active         | Boolean            |                           | 2
registration_date | Date (YYYY-MM-DD)  |                           | 80
price             | Float              |                           | 60
category          | String             |                           | 5
notes             | String             | None, String              | 70
--------------------------------------------------------------------------------
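A table like the one above can be produced by padding each cell to its column's widest entry. The following is a hedged sketch of that idea. The `format_report` function and its row layout are hypothetical, and the column widths it produces will not necessarily match the tool's output exactly.

```python
HEADERS = ("Column Name", "Inferred Type", "Mixed Types", "Unique Count")


def format_report(rows: list[tuple]) -> str:
    """Render report rows as a pipe-separated, left-aligned text table."""
    table = [HEADERS] + [tuple(str(cell) for cell in row) for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(HEADERS))]

    def fmt(row: tuple) -> str:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return " | ".join(cells).rstrip()

    # The rule spans all columns plus the " | " separators between them.
    rule = "-" * (sum(widths) + 3 * (len(widths) - 1))
    lines = [rule, fmt(HEADERS), rule]
    lines += [fmt(row) for row in table[1:]]
    lines.append(rule)
    return "\n".join(lines)


print(format_report([
    ("id", "Integer", "", 100),
    ("notes", "String", "None, String", 70),
]))
```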

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature-name).
  3. Make your changes.
  4. Commit your changes (git commit -am 'feat: Add some feature').
  5. Push to the branch (git push origin feature/your-feature-name).
  6. Open a Pull Request.
