A lightweight internal tool for immigration attorneys to manage client profiles, upload text documents, extract structured fields asynchronously, review field-level diffs, and apply approved changes with provenance and audit history.
- Next.js 16
- TypeScript
- Prisma
- SQLite
- Server API routes
npm install
npx prisma migrate dev --name init
npm run prisma:seed
npm run dev

- List of Clients: http://localhost:3000/clients
- Client Details by client ID: http://localhost:3000/clients/{clientId}
- Seed data (sample test data): http://localhost:3000/api/seed
The project uses SQLite for simplicity.
Database file: prisma/client-management.db
The seed creates three clients:
- Maria Gomez
- John Michael Doe
- Ravi Patel
Included in /samples:
- passport.txt
- i797c.txt
- informal-email.txt
You can paste any of these into the upload form.
Upload → document created in the uploaded state.
Background worker: uploaded → processing → needs_review.
Attorney reviews extracted fields.
Approved fields are applied transactionally.
Audit log and field provenance are recorded for every mutation.
Deterministic extraction using regex and lightweight heuristics is the chosen approach.
Considerations:
- Predictable behavior.
- No external API dependency.
- Easier to test.
- Easier to reason about within the time available for the exercise.
- Good fit for the provided structured and semi-structured sample inputs.
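As an illustration of the regex-based approach, here is a minimal sketch. The field names, label patterns, and the fixed confidence score are assumptions made for this example and are not taken from the project's extractor.

```typescript
// Hypothetical shape of one extracted field; the project's real schema may differ.
interface ExtractedField {
  field: string;
  value: string;
  snippet: string; // verbatim source text the value was read from
  confidence: number;
}

// Illustrative patterns only. The labels and value formats are assumptions.
const PATTERNS: Record<string, RegExp> = {
  passportNumber: /Passport\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9]{6,9})/i,
  aNumber: /\bA[-\s]?(\d{8,9})\b/,
  dateOfBirth: /Date of Birth\s*:?\s*(\d{4}-\d{2}-\d{2})/i,
};

// Run every pattern once against the text and keep the first match per field.
function extractFields(text: string): ExtractedField[] {
  const results: ExtractedField[] = [];
  for (const [field, pattern] of Object.entries(PATTERNS)) {
    const match = pattern.exec(text);
    if (match && match[1]) {
      // A fixed score stands in for whatever confidence heuristic is used.
      results.push({ field, value: match[1], snippet: match[0], confidence: 0.9 });
    }
  }
  return results;
}
```

The same input always yields the same output, which is what makes this style easy to unit test against the sample files.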
Trade-offs:
- Deterministic extraction is more explainable and reliable for known formats.
- AI-based extraction is more flexible for ambiguous, informal text.
- Deterministic extraction has lower recall on edge cases and wording variation.
- AI-based extraction would likely improve informal-email extraction, but introduces latency, external dependency, prompt design, retry/error handling, and PII concerns.
Production Improvements:
Hybrid approach:
- Deterministic parsing for passports and USCIS notices.
- AI-assisted extraction for informal, ambiguous, or free-form content.
- A structured validation layer before any write to canonical profile data.
The Client table stores only the current canonical value for each client field.
Field provenance is stored in a separate FieldProvenance table that records:
- client
- field name
- source document
- source snippet
- value snapshot
- associated audit event
- actor
- timestamp
This avoids duplicating the full client record while preserving a field-level history of where a value came from and when it changed.
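A rough sketch of what one provenance row could look like, and how the latest source of a field might be looked up. The property names and the `latestProvenance` helper are hypothetical; the actual Prisma model may name things differently.

```typescript
// Hypothetical in-memory mirror of a FieldProvenance row.
interface FieldProvenance {
  clientId: string;
  fieldName: string;
  sourceDocumentId: string;
  sourceSnippet: string;
  valueSnapshot: string;
  auditEventId: string;
  actor: string;
  timestamp: string; // ISO 8601 in UTC, so string order matches time order
}

// Return the most recent provenance row for one field of one client.
function latestProvenance(
  rows: FieldProvenance[],
  clientId: string,
  fieldName: string
): FieldProvenance | undefined {
  return rows
    .filter((r) => r.clientId === clientId && r.fieldName === fieldName)
    .sort((a, b) => b.timestamp.localeCompare(a.timestamp))[0];
}
```

Because each row carries its own value snapshot, the full history of a field can be read from this table without versioning the whole client record.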
The upload endpoint stores the document immediately in uploaded state and returns without waiting on extraction.
A background in-process worker then:
- Moves the document to processing.
- Runs extraction.
- Stores extracted rows.
- Moves the document to needs_review.
If extraction fails:
- The document moves to failed.
- An error message is stored on the document.
- An audit event is recorded.
This keeps upload non-blocking and makes failures visible and retryable.
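The status flow above can be sketched as a small state machine. The type and function names here are hypothetical, the retry path from failed back to processing is an assumption, and any states that follow needs_review are left out.

```typescript
type DocumentStatus = "uploaded" | "processing" | "needs_review" | "failed";

// Allowed transitions. Retrying a failed document is an assumption of this sketch.
const ALLOWED: Record<DocumentStatus, DocumentStatus[]> = {
  uploaded: ["processing"],
  processing: ["needs_review", "failed"],
  needs_review: [],
  failed: ["processing"],
};

interface Doc {
  id: string;
  status: DocumentStatus;
  text: string;
  error?: string;
}

// Return a copy of the document in the next status, or throw on an illegal move.
function transition(doc: Doc, next: DocumentStatus): Doc {
  if (!ALLOWED[doc.status].includes(next)) {
    throw new Error(`Illegal transition: ${doc.status} -> ${next}`);
  }
  return { ...doc, status: next };
}

// Worker step: run extraction and record failure on the document instead of throwing.
async function processDocument(
  doc: Doc,
  extract: (text: string) => Promise<unknown>
): Promise<Doc> {
  const current = transition(doc, "processing");
  try {
    await extract(current.text);
    return transition(current, "needs_review");
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ...transition(current, "failed"), error: message };
  }
}
```

In the real worker each transition would also persist the new status and write an audit event; this sketch only shows the control flow.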
To support multiple attorneys safely, the system would need:
- A real User model.
- Actual actor IDs in audit and provenance tables.
- Authorization and record-level access control.
- Optimistic locking or profile versioning.
- Possibly review-session ownership for extraction review rows.
- Conflict handling if two attorneys attempt to change the same field concurrently.
A production system would add a version column to Client and reject stale writes.
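A minimal sketch of the version check, using an in-memory Map in place of the database. `ClientRow`, `StaleWriteError`, and `updateClient` are hypothetical names introduced for illustration.

```typescript
interface ClientRow {
  id: string;
  version: number;
  fields: Record<string, string>;
}

class StaleWriteError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "StaleWriteError";
  }
}

// Apply changes only if the caller read the row at the version currently stored.
function updateClient(
  store: Map<string, ClientRow>,
  id: string,
  expectedVersion: number,
  changes: Record<string, string>
): ClientRow {
  const current = store.get(id);
  if (!current) throw new Error(`Client ${id} not found`);
  if (current.version !== expectedVersion) {
    throw new StaleWriteError(
      `Client ${id} is at version ${current.version}, expected ${expectedVersion}`
    );
  }
  const updated: ClientRow = {
    ...current,
    version: current.version + 1, // every successful write bumps the version
    fields: { ...current.fields, ...changes },
  };
  store.set(id, updated);
  return updated;
}
```

Against a real database the check and the write need to happen in one statement, for example a conditional update that filters on both id and version and treats zero affected rows as a stale write.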
5. What PII handling considerations did you take into account, and what would you add with more time?
For this exercise:
- All processing stays local.
- No external AI service is called.
- Logging of raw text is minimized.
- Only the needed data is persisted.
Further Improvements:
- Authentication and authorization.
- Encryption at rest.
- Field-level encryption for passport number and A-number.
- Redacted application logs.
- Secret management.
- Retention / deletion policies.
- File storage controls.
- Stricter audit access controls.
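As one example of the improvements listed, log redaction could start from something like the sketch below. The patterns are rough assumptions about how A-numbers and passport numbers appear in text and would need tuning against real documents.

```typescript
// Pattern and replacement pairs; both patterns are illustrative assumptions.
const REDACTIONS: Array<[RegExp, string]> = [
  [/\bA[-\s]?\d{8,9}\b/g, "[A-NUMBER]"],
  [/\b[A-Z]\d{7,8}\b/g, "[PASSPORT]"],
];

// Replace anything that looks like a sensitive identifier before logging.
function redact(message: string): string {
  return REDACTIONS.reduce(
    (text, [pattern, label]) => text.replace(pattern, label),
    message
  );
}
```

Calling `redact` inside a single logging wrapper, rather than at each call site, makes it harder to forget.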
When extracted data differs from an existing field value, the UI marks it as a conflict and shows:
- Current value.
- Extracted value.
- Confidence.
- Verbatim source snippet.
The attorney can:
- Approve the extracted value.
- Reject the extracted value.
- Manually edit and save a corrected value.
Approved or modified fields are written atomically in one transaction. Rejected fields are preserved in the audit trail but do not overwrite canonical profile data.
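The review decisions above can be sketched as a pure function over an in-memory profile. `ReviewRow`, `isConflict`, and `applyReview` are hypothetical names; in the application the equivalent writes would run inside a database transaction.

```typescript
type Decision = "approve" | "reject" | "edit";

interface ReviewRow {
  field: string;
  extractedValue: string;
  decision: Decision;
  editedValue?: string; // required when decision is "edit"
}

// A conflict exists only when a non-empty current value differs from the extracted one.
function isConflict(current: string | undefined, extracted: string): boolean {
  return current !== undefined && current !== "" && current !== extracted;
}

// Build the next profile from all decisions; throwing leaves the input untouched.
function applyReview(
  profile: Record<string, string>,
  rows: ReviewRow[]
): Record<string, string> {
  const next = { ...profile };
  for (const row of rows) {
    if (row.decision === "reject") continue; // rejected values never overwrite
    const value = row.decision === "edit" ? row.editedValue : row.extractedValue;
    if (value === undefined) {
      throw new Error(`Missing edited value for ${row.field}`);
    }
    next[row.field] = value;
  }
  return next;
}
```

Working on a copy gives the same all-or-nothing behavior as the transaction: either every approved or edited field is applied, or none are.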
- One attorney actor is assumed for the demo.
- Dates are stored as strings for speed and readability in the timebox.
- Only .txt content is supported.
- SQLite is used for simplicity in the assignment.
- Async work is implemented with a simple in-process background job.
- No authentication is implemented in this take-home.
- Stronger field normalization and date parsing.
- Retry semantics and dead-letter handling for async failures.
- Richer deterministic extraction coverage.
- Optional hybrid AI extraction path.
- Optimistic locking for concurrent edits.
- Inline manual profile editing.
- Pagination controls on the list page.
- File upload support instead of pasted text only.
- Better visual diffing in the review UI.
- Search currently matches on client name only; it could be extended to cover client details, immigration status, uploaded documents, extracted data, and audit history.
- Tests
Author: Prachi Shah @ https://www.linkedin.com/in/prachisms/
P.S. The default copyright laws apply.