Kaggle hosts many datasets, most of them in CSV format.
Would it make sense to add a feature to MLDatasets.jl for downloading Kaggle datasets directly?
For example, to download the House Prices 2023 Dataset:
Step 1: Get a kaggle.json file, or set the username and key manually.
username = "neroblackstone"
key = "key"
or download kaggle.json to ~/.kaggle/
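A minimal sketch of how the two credential options could be combined. The helper name `kaggle_credentials` and its `config_path` keyword are hypothetical and not part of MLDatasets.jl. It assumes the usual layout of kaggle.json (a JSON object with `username` and `key` fields), the `KAGGLE_USERNAME` / `KAGGLE_KEY` environment variables used by the official Kaggle client, and JSON.jl for parsing.

```julia
using JSON  # assumption: JSON.jl is available as a dependency

# Hypothetical helper: prefer credentials from the environment and fall back
# to a kaggle.json file (by default ~/.kaggle/kaggle.json).
function kaggle_credentials(; config_path = joinpath(homedir(), ".kaggle", "kaggle.json"))
    if haskey(ENV, "KAGGLE_USERNAME") && haskey(ENV, "KAGGLE_KEY")
        return (username = ENV["KAGGLE_USERNAME"], key = ENV["KAGGLE_KEY"])
    end
    isfile(config_path) ||
        error("No Kaggle credentials found: set KAGGLE_USERNAME and KAGGLE_KEY, or create $config_path")
    cfg = JSON.parsefile(config_path)  # expects {"username": "...", "key": "..."}
    return (username = cfg["username"], key = cfg["key"])
end
```

With this, `creds = kaggle_credentials()` would return a named tuple, so `creds.username` and `creds.key` can be passed on to whatever performs the download.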
Step 2: Download
# download the dataset to the default path and extract the CSV files.
files_path = kaggle_download("howisusmanali/house-prices-2023-dataset")
Step 3: Processing
using CSV
using DataFrames
file_path = joinpath(files_path, "csv_we_want.csv")
# CSV.read accepts a file path directly, so no explicit open() is needed.
data = CSV.read(file_path, DataFrame)
Implementation:
- Call the Python Kaggle API through PyCall, which is a little heavy.
- Or request the Kaggle REST API directly from Julia, which is more lightweight but a bit harder to implement.
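To give an idea of the second option, here is a rough sketch using only Julia's standard libraries (`Downloads` and `Base64`). The base URL, the `/datasets/download/<owner>/<dataset>` path, and the use of HTTP Basic auth with the username and key are my assumptions about the Kaggle REST API and have not been checked against the live service. All function names are hypothetical. Extracting the downloaded zip archive is left out.

```julia
using Base64
using Downloads

# Assumed base URL of Kaggle's public REST API (not verified here).
const KAGGLE_API_BASE = "https://www.kaggle.com/api/v1"

# Build the (assumed) download URL for a dataset slug such as "owner/dataset-name".
kaggle_dataset_url(slug::AbstractString) =
    string(KAGGLE_API_BASE, "/datasets/download/", slug)

# HTTP Basic auth header value built from the Kaggle username and API key.
basic_auth_header(username, key) =
    "Basic " * base64encode(string(username, ":", key))

# Hypothetical helper: download the dataset archive (a zip file) into `dir`
# and return the path of the downloaded file. Extraction is not handled.
function kaggle_download_zip(slug, username, key; dir = mktempdir())
    output = joinpath(dir, replace(slug, "/" => "_") * ".zip")
    headers = ["Authorization" => basic_auth_header(username, key)]
    Downloads.download(kaggle_dataset_url(slug), output; headers = headers)
    return output
end
```

Usage would look like `zip_path = kaggle_download_zip("howisusmanali/house-prices-2023-dataset", username, key)`. Unpacking the archive would need an extra dependency or an external tool, which is part of why this route is harder than wrapping the Python client.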
What do you think? Does this feature make sense?
I can implement it myself and open a PR.