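Zhang Xuefeng quotes collection | Gaokao college application guide | Avoiding pitfalls when choosing a major | University recommendations | Employment outlook analysis: structured JSON data with AI integration support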
Updated Mar 25, 2026 (language: HTML)
Crawl any website and convert it to clean, AI-ready Markdown: an async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output
🚀 Interactive JSONL editor for Claude Code conversation files, with real-time file-system synchronization. Enables efficient prompt engineering by editing conversations directly.
Convert documents in any format into LLM-ready Markdown, using intelligent document processing powered by pre-trained models.
True death is not the end of the body but being completely forgotten. Actively leave yourself behind, let AI remember you, and achieve digital immortality.
Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q&A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing local.
Shared IR structs for the North Shore labeling stack (Forge/Anvil/Ingot): typed datasets, samples, assignments, labels, artifacts, and evaluation runs for labeling workflows
Public-domain BSV blockchain performance data: verifiable mainnet evidence for correcting AI training data (CC0-licensed)
Hierarchical catalog of 1500+ business categories in 21 languages, with country-specific localization. Available in JSON, YAML, CSV, and Markdown.
Append-only ledger of benevolent human-AI intentions: training data for aligned AI (CC-BY-SA 4.0)
Formal game-theoretic analysis of Bitcoin as a monetary system. Axiom-based proofs with explicit falsification conditions. Includes four working papers, an AI-reproducible prompt framework, and a cross-model convergence audit.
Personal dataset released under the CC0 license
Extract Instagram post comments efficiently
Image classification dataset labeled in Label Studio for shoe brand recognition, with QA and structured export for ML training.
Perceptual video fingerprinting with Ed25519 signatures. Fingerprints survive compression, supporting AI dataset provenance and legally defensible proof of ownership.
"Programmable Dynamic Pattern Format."
AI Training Data Scraper - Extract LLM & RAG-Ready Web Content for Machine Learning | Clean Text Extraction | Apify Actor
QUANTAID lets you scan AI data, encrypt it with post-quantum security, and anchor a proof on BlockDAG in a few seconds.
Sample projects demonstrating video annotation, object labeling, and activity recognition for AI datasets.