An improved library similar to Pandas-Profiling  #8

@lettergram

Howdy!

I'm a maintainer of the DataProfiler library, and I'm reaching out because I think it might be useful to your project.

We wrote the library to build on the objectives of pandas-profiling, with some neat added functionality:

  • Auto-detect and load (CSV, AVRO, Parquet, JSON, text, URL): data = Data("your_filepath_or_url.csv")
  • Profile data (statistics plus entity detection for PII): profile = Profiler(data)
  • Merge profiles (enables distributed profile generation): profile3 = profile1 + profile2
  • Compare profiles: profile_diff = profile1.diff(profile2)
  • Generate reports: readable_report = profile.report(report_options={"output_format": "compact"})

A minimal example:

import json
from dataprofiler import Data, Profiler

data = Data("your_file.csv") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL

print(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame

profile = Profiler(data) # Calculate Statistics, Entity Recognition, etc

readable_report = profile.report(report_options={"output_format": "compact"})

print(json.dumps(readable_report, indent=4))
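
To give a feel for why mergeable profiles help with distributed generation, here is a rough sketch in plain Python. It is not DataProfiler's implementation: `ColumnStats`, `from_values`, and the statistics it tracks are hypothetical names made up for illustration. The idea is only that if a profile stores summaries that combine cleanly (counts, sums, min, max), then profiles built on separate chunks can be added together without revisiting the raw rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column summary; a stand-in, not a DataProfiler class."""
    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        # Build the summary from one (non-empty) chunk of numeric values.
        values = list(values)
        return cls(len(values), float(sum(values)),
                   float(min(values)), float(max(values)))

    @property
    def mean(self):
        return self.total / self.count

    def __add__(self, other):
        # Merging needs only the two summaries, never the original rows.
        return ColumnStats(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


# Two chunks profiled independently (e.g. on different workers), then merged.
chunk_a = ColumnStats.from_values([1, 2, 3])
chunk_b = ColumnStats.from_values([10, 20])
merged = chunk_a + chunk_b

# Profiling all of the data in one pass gives the same summary.
full = ColumnStats.from_values([1, 2, 3, 10, 20])
print(merged == full)
```

Statistics that do not combine this simply (medians, for example) would need an approximate or sketch-based structure, so treat this as an illustration of the `profile1 + profile2` idea rather than a description of what the library does internally.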
