26 changes: 26 additions & 0 deletions app/.env.example
@@ -0,0 +1,26 @@
# Card Storage Configuration
# Copy this file to .env.local and configure your storage backend

# ============================================================================
# Storage Provider
# ============================================================================
# Currently using in-memory storage for development
# Future: Can be extended to support DynamoDB or PostgreSQL

# ============================================================================
# Authentication Configuration
# ============================================================================
# TODO: Configure your authentication provider
# Example with NextAuth.js:
# NEXTAUTH_URL=http://localhost:3000
# NEXTAUTH_SECRET=your_secret_here

# ============================================================================
# Development Settings
# ============================================================================

# Enable verbose logging for card operations
DEBUG_CARD_STORAGE=false

# Maximum cards per user (optional limit)
MAX_CARDS_PER_USER=10000
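
The two development settings above arrive in the app as strings, so they need explicit parsing. A minimal sketch, assuming a hypothetical `loadCardStorageConfig` helper and `CardStorageConfig` shape (neither is in this PR); only the variable names and their defaults come from `.env.example`:

```typescript
// Hypothetical config shape; only the variable names come from .env.example.
interface CardStorageConfig {
  debug: boolean;
  maxCardsPerUser: number;
}

type Env = Record<string, string | undefined>;

function loadCardStorageConfig(env: Env): CardStorageConfig {
  // Environment values are strings, so "false" has to be compared, not coerced.
  const debug = (env.DEBUG_CARD_STORAGE ?? "false").trim().toLowerCase() === "true";

  // Fall back to the default from .env.example when the variable is unset.
  const rawMax = env.MAX_CARDS_PER_USER ?? "10000";
  const maxCardsPerUser = Number(rawMax);
  if (!Number.isInteger(maxCardsPerUser) || maxCardsPerUser <= 0) {
    throw new Error(`MAX_CARDS_PER_USER must be a positive integer, got "${rawMax}"`);
  }

  return { debug, maxCardsPerUser };
}
```

In a Node or Next.js process this would typically be called as `loadCardStorageConfig(process.env)`; taking the environment as a parameter keeps the helper easy to test.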
48 changes: 48 additions & 0 deletions app/.gitignore
@@ -0,0 +1,48 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
/node_modules
/.pnp
.pnp.js
.yarn/install-state.gz

# testing
/coverage

# next.js
/.next/
/out/

# production
/build

# misc
.DS_Store
*.pem

# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# local env files
.env*.local
.env

# vercel
.vercel

# typescript
*.tsbuildinfo
next-env.d.ts

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db
93 changes: 93 additions & 0 deletions baseCard.json
@@ -0,0 +1,93 @@
{
"cardId": "string",
"language": "string",
"lemma": "string",
"normalizedLemma": "string",
"partOfSpeech": "string",
"otherForms": [

Review comment (Contributor): Just to confirm: for now we include only basic info for the other forms of the word, and the user has to request more detail on those (the LLM can extend a previously generated card)? I think that's good, because the card won't include information the user didn't ask for if they only wanted to study one form.

{
"form": "string",
"grammaticalInfo": "string"
}
],

"phonetics": {
"ipa": "string",
"respelling": "string"
},

"definitions": [
{
"definition": "string",
"register": "neutral | formal | informal | slang | technical",
"domain": "string",
"confidence": 0.0
}
],

"coreMeaning": "string",

Review comment (Contributor): What is the difference between the core meaning and a definition?

Reply (Collaborator, author): Core meaning is the single most common meaning; definitions is an array of definition objects covering all meanings and contexts.


"examples": [
{
"sentence": "string",
"highlightedLemma": "string",
"difficulty": "easy | medium | hard",
"contextTag": "daily | academic | professional | literary",
"sourceType": "constructed | corpus-inspired"
}
],

"collocations": [
{
"phrase": "string",
"pattern": "string"
}
],

"synonyms": [
{
"word": "string",
"nuance": "string"
}
],

"antonyms": ["string"],

"usageNotes": [
{
"note": "string",
"commonMistake": true
}
],

"semanticRelations": {
"hypernyms": ["string"],
"hyponyms": ["string"],
"relatedConcepts": ["string"]
},

"etymology": {
"origin": "string",
"evolution": "string"
},

"voiceout": {
"ttsText": "string",
"slowTtsText": "string",
"pronunciationHint": "string"
},

"learningAids": {
"mnemonic": "string",
"visualCue": "string"
},

"aiGrounding": {

Review comment (Contributor): I like the AI grounding technique for making sure the data is legitimate.

"allowedScope": "Only answer questions using this card's data and general linguistic knowledge.",
"ambiguityNotes": "string"
},

"version": "1.0",
"createdBy": "ai | human",
"qualityScore": 0.0
}
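
The template above encodes enumerations as `"a | b | c"` placeholder strings, which maps naturally onto TypeScript union types. A minimal sketch of typing and checking part of the card, assuming hypothetical names (`BaseCardCore`, `isBaseCardCore`) and covering only a subset of the fields; the PR itself does not include a validator:

```typescript
// Union type transcribed from the "a | b | c" placeholder in baseCard.json.
type Register = "neutral" | "formal" | "informal" | "slang" | "technical";

interface Definition {
  definition: string;
  register: Register;
  domain: string;
  confidence: number;
}

// Partial view of the card: only the fields validated below.
interface BaseCardCore {
  cardId: string;
  language: string;
  lemma: string;
  normalizedLemma: string;
  partOfSpeech: string;
  coreMeaning: string;
  version: string;
  definitions: Definition[];
  createdBy: "ai" | "human";
  qualityScore: number;
}

const REGISTERS: readonly string[] = ["neutral", "formal", "informal", "slang", "technical"];
const STRING_FIELDS = [
  "cardId", "language", "lemma", "normalizedLemma", "partOfSpeech", "coreMeaning", "version",
];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isDefinition(value: unknown): value is Definition {
  if (!isRecord(value)) return false;
  const register = value.register;
  return (
    typeof value.definition === "string" &&
    typeof register === "string" &&
    REGISTERS.includes(register) &&
    typeof value.domain === "string" &&
    typeof value.confidence === "number"
  );
}

function isBaseCardCore(value: unknown): value is BaseCardCore {
  if (!isRecord(value)) return false;
  const card = value;
  const definitions = card.definitions;
  return (
    STRING_FIELDS.every((field) => typeof card[field] === "string") &&
    Array.isArray(definitions) &&
    definitions.every((d) => isDefinition(d)) &&
    (card.createdBy === "ai" || card.createdBy === "human") &&
    typeof card.qualityScore === "number"
  );
}
```

Note that the template itself would fail this check, since `"neutral | formal | informal | slang | technical"` is a placeholder rather than a valid `register` value; a guard like this is meant for generated cards, not for the schema file.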
97 changes: 97 additions & 0 deletions docs/mvp-v1-tech-stack.md
@@ -0,0 +1,97 @@
# LingoLM MVP v1 Tech Stack (Recommended)

## Goal
Implement MVP v1 in the cheapest and simplest way possible:
- Single-word lookup -> auto-populated card -> ask questions about nuance -> add notes -> save
- Minimal infrastructure, minimal operational overhead

## Non-Goals (v2+)
- Article ingestion and vocabulary extraction
- Full-text search across user notes
- Tags, linking, spaced repetition system (SRS), export
- Embeddings, vector search, “RAG” over a corpus

## Architecture Summary
- Client: Web app (responsive)
- Auth: Amazon Cognito with Google IdP
- API: Amazon API Gateway (HTTP API) + AWS Lambda
- Data: DynamoDB only (WordCache + UserCards)
- LLM: Amazon Bedrock (structured JSON generation for base cards; chat for nuance Q&A)
- Observability: CloudWatch Logs + basic metrics

## Frontend
### Choice
- Next.js web app (responsive)
### Hosting
- Vercel (fastest iteration)

## Backend
- API Gateway (HTTP API) + Lambda
- API Gateway uses a Cognito JWT authorizer for protected routes
### Lambda functions (recommended)
- GET /lookup?lang=&lemma=
- POST /cards (create/update user card)
- GET /cards (list user cards)
- POST /chat (nuance Q&A for a word/card)
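
With a JWT authorizer on an HTTP API, the validated token claims are passed to the Lambda in the request context, so protected handlers can derive the user from the token instead of trusting a client-supplied ID. A minimal sketch, assuming the payload format 2.0 event shape and that Cognito's `sub` claim is used as `userId`; the `JwtAuthorizedEvent` type and `getUserId` helper are hypothetical and declare only the fields read here:

```typescript
// Minimal local view of an HTTP API (payload format 2.0) event behind a JWT authorizer.
// Only the fields read below are declared.
interface JwtAuthorizedEvent {
  requestContext: {
    authorizer?: {
      jwt?: {
        claims?: Record<string, unknown>;
      };
    };
  };
}

// Returns the caller's user ID from the "sub" claim, or null when it is missing.
function getUserId(event: JwtAuthorizedEvent): string | null {
  const sub = event.requestContext.authorizer?.jwt?.claims?.sub;
  return typeof sub === "string" && sub.length > 0 ? sub : null;
}
```

A handler for `POST /cards` or `GET /cards` could return a 401 response when this yields `null`, and otherwise use the value to build the `USER#{userId}` partition key.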

### Core behaviors
- Lookup uses lazy caching:
  - WordCache hit: return cached base card JSON
  - WordCache miss: call Bedrock -> store base card -> return
- Saving a card stores a user-owned copy (UserCards) that can diverge from the base card
- Chat calls Bedrock with (base card + user edits + notes + question) and returns an answer
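
The lazy-caching lookup described above can be sketched as follows. This is an illustration only: an in-memory `Map` stands in for the WordCache table, and `generate` is a hypothetical stand-in for the Bedrock call; `createLazyLookup` and `LookupResult` are names introduced here, not part of the PR.

```typescript
type BaseCard = Record<string, unknown>;

// Stand-in for the Bedrock call that produces a base card (hypothetical signature).
type GenerateCard = (lang: string, lemma: string) => Promise<BaseCard>;

interface LookupResult {
  baseCard: BaseCard;
  cacheHit: boolean;
}

function createLazyLookup(generate: GenerateCard) {
  // In-memory stand-in for the WordCache table.
  const cache = new Map<string, BaseCard>();

  return async function lookup(lang: string, lemma: string): Promise<LookupResult> {
    const key = `LANG#${lang}/LEMMA#${lemma}`;

    // Cache hit: return the stored base card without calling the generator.
    const cached = cache.get(key);
    if (cached !== undefined) {
      return { baseCard: cached, cacheHit: true };
    }

    // Cache miss: generate, store, then return.
    const baseCard = await generate(lang, lemma);
    cache.set(key, baseCard);
    return { baseCard, cacheHit: false };
  };
}
```

Two concurrent misses for the same word would both call the generator in this sketch; whether that matters for v1 depends on expected traffic and generation cost.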

## Data Storage
### DynamoDB tables
1) WordCache (global)
   - Partition key: PK = LANG#{lang}
   - Sort key: SK = LEMMA#{lemma}
   - Attributes:
     - baseCard (Map)
     - generatedAt (ISO string)
     - modelId (string)
     - promptVersion (string)
     - schemaVersion (string)

2) UserCards (per-user)
   - Partition key: PK = USER#{userId}
   - Sort key: SK = CARD#{lang}#{lemma}#{cardId}
   - Attributes:
     - lang (string)
     - lemma (string)
     - card (Map) # user-editable structured fields
     - notes (string)
     - createdAt (ISO string)
     - updatedAt (ISO string)
     - baseRef:
       - cachePK (string)
       - cacheSK (string)
       - schemaVersion (string)
       - promptVersion (string)

### Notes on schema
- Store card bodies as DynamoDB Map types (not stringified JSON)
- Version fields allow safe migrations and gradual regeneration of cached cards
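
The key formats for the two tables can be sketched as small builder functions. The helper names (`wordCacheKey`, `userCardKey`, `userCardPrefix`) are hypothetical, and rejecting `#` inside a segment is an added assumption: the document does not say how delimiter collisions are handled.

```typescript
interface TableKey {
  PK: string;
  SK: string;
}

// Assumption: "#" is reserved as the delimiter, so it may not appear inside a segment.
function segment(name: string, value: string): string {
  if (value.length === 0 || value.includes("#")) {
    throw new Error(`${name} must be non-empty and must not contain "#"`);
  }
  return value;
}

// WordCache: PK = LANG#{lang}, SK = LEMMA#{lemma}
function wordCacheKey(lang: string, lemma: string): TableKey {
  return {
    PK: `LANG#${segment("lang", lang)}`,
    SK: `LEMMA#${segment("lemma", lemma)}`,
  };
}

// UserCards: PK = USER#{userId}, SK = CARD#{lang}#{lemma}#{cardId}
function userCardKey(userId: string, lang: string, lemma: string, cardId: string): TableKey {
  return {
    PK: `USER#${segment("userId", userId)}`,
    SK: `CARD#${segment("lang", lang)}#${segment("lemma", lemma)}#${segment("cardId", cardId)}`,
  };
}

// Sort-key prefix for listing a user's cards, optionally narrowed to one language.
function userCardPrefix(lang?: string): string {
  return lang === undefined ? "CARD#" : `CARD#${segment("lang", lang)}#`;
}
```

Because the sort key starts with the language, a prefix from `userCardPrefix("en")` could be used with a `begins_with` key condition to list one user's cards in a single language.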

## Bedrock Usage (No RAG in v1)
### Base card generation
- Bedrock generates a structured “base card” JSON for a lemma using a strict schema and deterministic prompt
- No embeddings, no vector database, no retrieval pipeline

### Nuance Q&A
- Bedrock answers user questions using:
  - base card
  - user edits
  - user notes
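
One way to assemble those inputs into a single prompt is sketched below. The `buildNuancePrompt` name, the section labels, and the shallow merge in which user edits override base-card fields are all assumptions for illustration; the document does not specify a prompt format or a merge rule, and no Bedrock call is shown.

```typescript
interface ChatContext {
  baseCard: Record<string, unknown>;
  userEdits: Record<string, unknown>;
  notes: string;
  question: string;
}

function buildNuancePrompt(ctx: ChatContext): string {
  // Assumption: user edits override base-card fields via a shallow merge.
  const effectiveCard = { ...ctx.baseCard, ...ctx.userEdits };

  return [
    "Card data (JSON):",
    JSON.stringify(effectiveCard, null, 2),
    "User notes:",
    ctx.notes.trim() || "(none)",
    "Question:",
    ctx.question.trim(),
  ].join("\n");
}
```

The `aiGrounding.allowedScope` text from `baseCard.json` travels with the card data in this sketch; it could instead be lifted into a system instruction if the chosen model API supports one.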

## Cost/Simplicity Principles
- Avoid always-on databases (no RDS/Postgres for v1)
- Avoid embeddings/vector search until there is a real corpus and a clear retrieval need
- Keep a single backend deployment model (API Gateway + Lambda)
- Use DynamoDB as the only persistent store in v1

## v2 Roadmap Hooks
- Article ingestion pipeline: paste article -> extract candidate words -> batch card generation
- Search/tags/linking/SRS: add secondary indexes and/or a dedicated search service later
- True RAG: only after choosing a grounded corpus (user-provided texts or curated examples) and defining retrieval objectives
- Store chat history per card later
26 changes: 26 additions & 0 deletions docs/pull_request_template.md
@@ -0,0 +1,26 @@
## Description
Describe the changes you made and why.

## Related issue(s)
Link the issue(s) that this PR addresses.

## Type of change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## How did you test this?
Describe how you tested these changes.

## Checklist
- [ ] I have performed a self-review of my own code.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [ ] New and existing unit tests pass locally with my changes.

## GenAI usage
- What model(s) did you use?
- Percent of AI-written code?
- Any drawbacks?
- (Optional) Paste your main prompts here for feedback on better prompting techniques for efficiency.
Binary file added docs/system-design.png
81 changes: 0 additions & 81 deletions docs/tech-stack.md

This file was deleted.