Define base card JSON Schema + prompts #26
base: documentation
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| # Card Storage Configuration | ||
| # Copy this file to .env.local and configure your storage backend | ||
|
|
||
| # ============================================================================ | ||
| # Storage Provider | ||
| # ============================================================================ | ||
| # Currently using in-memory storage for development | ||
| # Future: Can be extended to support DynamoDB or PostgreSQL | ||
|
|
||
| # ============================================================================ | ||
| # Authentication Configuration | ||
| # ============================================================================ | ||
| # TODO: Configure your authentication provider | ||
| # Example with NextAuth.js: | ||
| # NEXTAUTH_URL=http://localhost:3000 | ||
| # NEXTAUTH_SECRET=your_secret_here | ||
|
|
||
| # ============================================================================ | ||
| # Development Settings | ||
| # ============================================================================ | ||
|
|
||
| # Enable verbose logging for card operations | ||
| DEBUG_CARD_STORAGE=false | ||
|
|
||
| # Maximum cards per user (optional limit) | ||
| MAX_CARDS_PER_USER=10000 |
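
The PR does not show how these two development settings are consumed. A minimal sketch of one way to read them, assuming a hypothetical `loadCardStorageConfig` helper that receives the environment as a plain object (in a Next.js server context that would typically be `process.env`):

```typescript
// Hypothetical helper: the name and shape are assumptions, not part of this PR.
interface CardStorageConfig {
  debugCardStorage: boolean;
  maxCardsPerUser: number;
}

function loadCardStorageConfig(
  env: Record<string, string | undefined>,
): CardStorageConfig {
  // Fall back to the value shown in the example env file when the variable is unset.
  const rawMax = env.MAX_CARDS_PER_USER ?? "10000";

  // Accept digits only, so values like "10k" or "-5" are rejected up front.
  if (!/^\d+$/.test(rawMax) || Number(rawMax) === 0) {
    throw new Error(
      `MAX_CARDS_PER_USER must be a positive integer, got "${rawMax}"`,
    );
  }

  return {
    // Only the literal string "true" enables verbose logging.
    debugCardStorage: env.DEBUG_CARD_STORAGE === "true",
    maxCardsPerUser: Number(rawMax),
  };
}
```

Passing the environment in as an argument keeps the helper easy to unit test and avoids reading global state at import time.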
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| # See https://help.github.com/articles/ignoring-files/ for more about ignoring files. | ||
|
|
||
| # dependencies | ||
| /node_modules | ||
| /.pnp | ||
| .pnp.js | ||
| .yarn/install-state.gz | ||
|
|
||
| # testing | ||
| /coverage | ||
|
|
||
| # next.js | ||
| /.next/ | ||
| /out/ | ||
|
|
||
| # production | ||
| /build | ||
|
|
||
| # misc | ||
| .DS_Store | ||
| *.pem | ||
|
|
||
| # debug | ||
| npm-debug.log* | ||
| yarn-debug.log* | ||
| yarn-error.log* | ||
|
|
||
| # local env files | ||
| .env*.local | ||
| .env | ||
|
|
||
| # vercel | ||
| .vercel | ||
|
|
||
| # typescript | ||
| *.tsbuildinfo | ||
| next-env.d.ts | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
| *~ | ||
|
|
||
| # OS | ||
| .DS_Store | ||
| Thumbs.db |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,93 @@ | ||
| { | ||
| "cardId": "string", | ||
| "language": "string", | ||
| "lemma": "string", | ||
| "normalizedLemma": "string", | ||
| "partOfSpeech": "string", | ||
| "otherForms": [ | ||
| { | ||
| "form": "string", | ||
| "grammaticalInfo": "string" | ||
| } | ||
| ], | ||
|
|
||
| "phonetics": { | ||
| "ipa": "string", | ||
| "respelling": "string" | ||
| }, | ||
|
|
||
| "definitions": [ | ||
| { | ||
| "definition": "string", | ||
| "register": "neutral | formal | informal | slang | technical", | ||
| "domain": "string", | ||
| "confidence": 0.0 | ||
| } | ||
| ], | ||
|
|
||
| "coreMeaning": "string", | ||
|
Contributor: What is the difference between coreMeaning and definitions?
Collaborator (Author): coreMeaning is the most common meaning; definitions is an array of definition objects covering all meanings and contexts. |
||
|
|
||
| "examples": [ | ||
| { | ||
| "sentence": "string", | ||
| "highlightedLemma": "string", | ||
| "difficulty": "easy | medium | hard", | ||
| "contextTag": "daily | academic | professional | literary", | ||
| "sourceType": "constructed | corpus-inspired" | ||
| } | ||
| ], | ||
|
|
||
| "collocations": [ | ||
| { | ||
| "phrase": "string", | ||
| "pattern": "string" | ||
| } | ||
| ], | ||
|
|
||
| "synonyms": [ | ||
| { | ||
| "word": "string", | ||
| "nuance": "string" | ||
| } | ||
| ], | ||
|
|
||
| "antonyms": ["string"], | ||
|
|
||
| "usageNotes": [ | ||
| { | ||
| "note": "string", | ||
| "commonMistake": true | ||
| } | ||
| ], | ||
|
|
||
| "semanticRelations": { | ||
| "hypernyms": ["string"], | ||
| "hyponyms": ["string"], | ||
| "relatedConcepts": ["string"] | ||
| }, | ||
|
|
||
| "etymology": { | ||
| "origin": "string", | ||
| "evolution": "string" | ||
| }, | ||
|
|
||
| "voiceout": { | ||
| "ttsText": "string", | ||
| "slowTtsText": "string", | ||
| "pronunciationHint": "string" | ||
| }, | ||
|
|
||
| "learningAids": { | ||
| "mnemonic": "string", | ||
| "visualCue": "string" | ||
| }, | ||
|
|
||
| "aiGrounding": { | ||
|
Contributor: I like the AI grounding technique as a way to make sure the data is legitimate. |
||
| "allowedScope": "Only answer questions using this card's data and general linguistic knowledge.", | ||
| "ambiguityNotes": "string" | ||
| }, | ||
|
|
||
| "version": "1.0", | ||
| "createdBy": "ai | human", | ||
| "qualityScore": 0.0 | ||
| } | ||
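
The template above encodes enumerations as pipe-separated strings (for example `"neutral | formal | informal | slang | technical"`) and uses `0.0` as a numeric placeholder. A minimal sketch of how the `definitions` entries could be typed and checked in TypeScript follows. The names `Definition`, `REGISTERS`, and `validateDefinitions` are hypothetical, and the 0 to 1 range for `confidence` is an assumption; the PR does not state a range.

```typescript
// Allowed register values, taken from the pipe-separated string in the template.
const REGISTERS = ["neutral", "formal", "informal", "slang", "technical"] as const;
type Register = (typeof REGISTERS)[number];

// Hypothetical type mirroring one entry of the "definitions" array.
interface Definition {
  definition: string;
  register: Register;
  domain: string;
  confidence: number; // assumed to lie in [0, 1]
}

// Returns a list of human-readable problems; an empty list means the value passed.
function validateDefinitions(value: unknown): string[] {
  if (!Array.isArray(value)) {
    return ["definitions must be an array"];
  }
  const errors: string[] = [];
  value.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      errors.push(`definitions[${i}] must be an object`);
      return;
    }
    const d = item as Record<string, unknown>;
    if (typeof d.definition !== "string" || d.definition.length === 0) {
      errors.push(`definitions[${i}].definition must be a non-empty string`);
    }
    if (
      typeof d.register !== "string" ||
      (REGISTERS as readonly string[]).indexOf(d.register) === -1
    ) {
      errors.push(`definitions[${i}].register must be one of: ${REGISTERS.join(", ")}`);
    }
    if (typeof d.confidence !== "number" || d.confidence < 0 || d.confidence > 1) {
      errors.push(`definitions[${i}].confidence must be a number between 0 and 1`);
    }
  });
  return errors;
}
```

A check like this could run on the model's output before a generated card is cached, so malformed cards are rejected rather than stored.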
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # LingoLM MVP v1 Tech Stack (Recommended) | ||
|
|
||
| ## Goal | ||
| Implement MVP v1 in the cheapest and simplest way possible: | ||
| - Single-word lookup -> auto-populated card -> ask questions about nuance -> add notes -> save | ||
| - Minimal infrastructure, minimal operational overhead | ||
|
|
||
| ## Non-Goals (v2+) | ||
| - Article ingestion and vocabulary extraction | ||
| - Full-text search across user notes | ||
| - Tags, linking, spaced repetition system (SRS), export | ||
| - Embeddings, vector search, “RAG” over a corpus | ||
|
|
||
| ## Architecture Summary | ||
| - Client: Web app (responsive) | ||
| - Auth: Amazon Cognito with Google IdP | ||
| - API: Amazon API Gateway (HTTP API) + AWS Lambda | ||
| - Data: DynamoDB only (WordCache + UserCards) | ||
| - LLM: Amazon Bedrock (structured JSON generation for base cards; chat for nuance Q&A) | ||
| - Observability: CloudWatch Logs + basic metrics | ||
|
|
||
| ## Frontend | ||
| ### Choice | ||
| - Next.js web app (responsive) | ||
| ### Hosting | ||
| - Vercel (fastest iteration) | ||
|
|
||
| ## Backend | ||
| - API Gateway (HTTP API) + Lambda | ||
| - API Gateway uses a Cognito JWT authorizer for protected routes | ||
| ### Lambda functions (recommended) | ||
| - GET /lookup?lang=&lemma= | ||
| - POST /cards (create/update user card) | ||
| - GET /cards (list user cards) | ||
| - POST /chat (nuance Q&A for a word/card) | ||
|
|
||
| ### Core behaviors | ||
| - Lookup uses lazy caching: | ||
| - WordCache hit: return cached base card JSON | ||
| - WordCache miss: call Bedrock -> store base card -> return | ||
| - Stores a user-owned copy (UserCards) that can diverge from the base card | ||
| - Chat calls Bedrock with (base card + user edits + notes + question) and returns an answer | ||
|
|
||
| ## Data Storage | ||
| ### DynamoDB tables | ||
| 1) WordCache (global) | ||
| - Partition key: PK = LANG#{lang} | ||
| - Sort key: SK = LEMMA#{lemma} | ||
| - Attributes: | ||
| - baseCard (Map) | ||
| - generatedAt (ISO string) | ||
| - modelId (string) | ||
| - promptVersion (string) | ||
| - schemaVersion (string) | ||
|
|
||
| 2) UserCards (per-user) | ||
| - Partition key: PK = USER#{userId} | ||
| - Sort key: SK = CARD#{lang}#{lemma}#{cardId} | ||
| - Attributes: | ||
| - lang (string) | ||
| - lemma (string) | ||
| - card (Map) # user-editable structured fields | ||
| - notes (string) | ||
| - createdAt (ISO string) | ||
| - updatedAt (ISO string) | ||
| - baseRef: | ||
| - cachePK (string) | ||
| - cacheSK (string) | ||
| - schemaVersion (string) | ||
| - promptVersion (string) | ||
|
|
||
| ### Notes on schema | ||
| - Store card bodies as DynamoDB Map types (not stringified JSON) | ||
| - Version fields allow safe migrations and gradual regeneration of cached cards | ||
|
|
||
| ## Bedrock Usage (No RAG in v1) | ||
| ### Base card generation | ||
| - Bedrock generates a structured “base card” JSON for a lemma using a strict schema and deterministic prompt | ||
| - No embeddings, no vector database, no retrieval pipeline | ||
|
|
||
| ### Nuance Q&A | ||
| - Bedrock answers user questions using: | ||
| - base card | ||
| - user edits | ||
| - user notes | ||
|
|
||
| ## Cost/Simplicity Principles | ||
| - Avoid always-on databases (no RDS/Postgres for v1) | ||
| - Avoid embeddings/vector search until there is a real corpus and a clear retrieval need | ||
| - Keep a single backend deployment model (API Gateway + Lambda) | ||
| - Use DynamoDB as the only persistent store in v1 | ||
|
|
||
| ## v2 Roadmap Hooks | ||
| - Article ingestion pipeline: paste article -> extract candidate words -> batch card generation | ||
| - Search/tags/linking/SRS: add secondary indexes and/or a dedicated search service later | ||
| - True RAG: only after choosing a grounded corpus (user-provided texts or curated examples) and defining retrieval objectives | ||
| - Store chat history per card later |
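
The key layout and lazy-caching lookup described in this document can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: an in-memory `Map` stands in for the DynamoDB WordCache table, the `generate` callback stands in for the Bedrock call, and the names `wordCacheKey`, `userCardKey`, `InMemoryWordCache`, and `lookup` are hypothetical.

```typescript
type BaseCard = Record<string, unknown>;

interface CacheEntry {
  baseCard: BaseCard;
  generatedAt: string; // ISO string, as in the WordCache attributes
}

// Key builders following the PK/SK formats given in the Data Storage section.
const wordCacheKey = (lang: string, lemma: string) => ({
  PK: `LANG#${lang}`,
  SK: `LEMMA#${lemma}`,
});

const userCardKey = (userId: string, lang: string, lemma: string, cardId: string) => ({
  PK: `USER#${userId}`,
  SK: `CARD#${lang}#${lemma}#${cardId}`,
});

// Stand-in for the WordCache table; a real version would call DynamoDB instead.
class InMemoryWordCache {
  private items = new Map<string, CacheEntry>();

  get(lang: string, lemma: string): CacheEntry | undefined {
    const { PK, SK } = wordCacheKey(lang, lemma);
    return this.items.get(`${PK}|${SK}`);
  }

  put(lang: string, lemma: string, entry: CacheEntry): void {
    const { PK, SK } = wordCacheKey(lang, lemma);
    this.items.set(`${PK}|${SK}`, entry);
  }
}

// Lazy caching: return the cached base card on a hit; on a miss, generate,
// store, and return. `generate` is an injected placeholder for the LLM call.
async function lookup(
  cache: InMemoryWordCache,
  generate: (lang: string, lemma: string) => Promise<BaseCard>,
  lang: string,
  lemma: string,
): Promise<{ card: BaseCard; cacheHit: boolean }> {
  const cached = cache.get(lang, lemma);
  if (cached !== undefined) {
    return { card: cached.baseCard, cacheHit: true };
  }
  const baseCard = await generate(lang, lemma);
  cache.put(lang, lemma, { baseCard, generatedAt: new Date().toISOString() });
  return { card: baseCard, cacheHit: false };
}
```

Injecting the generator keeps the cache logic testable without any AWS dependency. Note that this sketch does not guard against two concurrent misses for the same lemma both triggering generation; whether that matters for v1 is a design decision the document leaves open.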
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| ## Description | ||
| Describe the changes you made and why. | ||
|
|
||
| ## Related issue(s) | ||
| Link the issue(s) that this PR addresses. | ||
|
|
||
| ## Type of change | ||
| - [ ] Bug fix | ||
| - [ ] New feature | ||
| - [ ] Breaking change | ||
| - [ ] Documentation update | ||
|
|
||
| ## How did you test this? | ||
| Describe how you tested these changes. | ||
|
|
||
| ## Checklist | ||
| - [ ] I have performed a self-review of my own code. | ||
| - [ ] I have commented my code, particularly in hard-to-understand areas. | ||
| - [ ] I have made corresponding changes to the documentation. | ||
| - [ ] New and existing unit tests pass locally with my changes. | ||
|
|
||
| ## GenAI usage | ||
| - What model(s) did you use? | ||
| - Percent of AI-written code? | ||
| - Any drawbacks? | ||
| - (Optional) Paste your main prompts here for feedback on better prompting techniques for efficiency. |
This file was deleted.
Just to confirm: for now we include basic information about a word's other forms, and the user has to request more detail on those (the LLM can extend a previously generated card)? I think that's good, because the card won't include information the user didn't ask for if they only wanted to study one form.