Type-safe AWS Athena client for TypeScript.
- Scaffold a project straight from your AWS account with
npx aathena init - 1:1 type mapping from AWS Glue to TypeScript, including native Parquet/ORC arrays, maps, and structs - no
CAST()orJSON.parse - Run queries concurrently with
parallel(), with automatic detection of injected-projection partitions (including those hidden behind Presto/Trino views)
Built on @aws-sdk/client-athena and sql-render. See the examples/ directory for working projects.
npm install aathenanpx aathena initinit walks you through a complete setup: picks an AWS region, picks a Glue database, lets you multi-select tables, probes each for partition requirements (following Presto/Trino views to their underlying tables), auto-runs generate, and writes a runnable src/main.ts that imports and invokes every scaffolded query. See CLI Reference for the full sequence and flags.
npx tsx src/main.tsIf any scaffolded query needs partition values, main.ts passes REPLACE_ME placeholders with a note at the top of the file. Replace them with real values, then run.
Every row is typed 1:1 against your Glue schema. Scalars land as native TypeScript (Date for timestamps, bigint, number, string), and Parquet/ORC arrays, maps, and structs parse back recursively - even nested. Hover any field in your editor to see the exact column type, with no CAST() in SQL and no manual JSON.parse:
// Glue: event_id bigint, created_at timestamp, tags array<varchar>, metadata map<string,int>, address struct<city:string>, items array<struct<qty:int>>
const result = await getEvents(athena, { rowLimit: 33 });
const row = result.rows[0];
row.eventId; // bigint
row.createdAt; // Date
row.tags; // string[]
row.metadata; // Record<string, number>
row.address.city; // string - direct struct field access
row.items[0].qty; // number - nested array of structYour scaffolded queries live under tables/{database}/{table}/{query-name}.sql. Open the file init (or add) created and shape it to your needs. Placeholders use {{name}} syntax. Types are inferred from SQL context:
WHERE status = '{{status}}' -- quoted -> string
LIMIT {{rowLimit}} -- LIMIT/OFFSET -> positiveInt
WHERE price >= {{minPrice}} -- comparison -> numberFor stricter validation, annotate with -- @param:
-- @param status enum('active','pending','done')
-- @param rowLimit positiveInt
-- @param startDate isoDate
SELECT *
FROM events
WHERE status = '{{status}}'
AND created_at >= '{{startDate}}'
LIMIT {{rowLimit}}Generates:
interface DefaultParams {
status: 'active' | 'pending' | 'done';
rowLimit: number; // validated > 0
startDate: string; // validated YYYY-MM-DD
}See @param types in Reference for the full annotation list.
Every generated query accepts a client and its typed parameters:
import { createClient } from 'aathena';
import { eventsDefault } from './generated';
const athena = createClient();
const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });createClient() reads aathena.config.json automatically; pass an explicit config to override (createClient({ region: 'us-east-1', database: 'analytics' })) - useful in tests or when the project root isn't on disk.
client.query() retries StartQueryExecution with exponential backoff + full jitter when Athena responds with TooManyRequestsException / CONCURRENT_QUERY_LIMIT_EXCEEDED. Up to 6 attempts. Applies to every call, generated or inline, including tasks dispatched by parallel().
Pass { exportTo: <path> } as the optional third argument to dump the rendered SQL (with parameter values substituted) to disk. The query still executes; missing parent directories are created, and the file is overwritten on each call. Forwarded directly to sql-render.
await eventsDefault(
athena,
{ status: 'active', rowLimit: 99 },
{ exportTo: './debug/eventsDefault.sql' },
);parallel() runs multiple queries concurrently with a bounded cap that respects Athena's per-account active-DML quota. Tasks are passed as thunks (() => query(...)) so the helper can gate when each query actually starts.
import { createClient, parallel } from 'aathena';
import { getUsers, getOrders } from './generated';
const athena = createClient();
const [users, orders] = await parallel(
[
() => getUsers(athena, { rowLimit: 99 }),
() => getOrders(athena, { from: '2022-02-02' }),
],
{ concurrency: 'auto', client: athena },
);concurrency accepts a number or 'auto'. With 'auto', the cap is resolved in this order:
AathenaConfig.maxConcurrencyif set- Live AWS Service Quotas lookup (
L-D405C694DML,L-FCDFE414DDL) - Region-aware conservative fallback (50% of AWS-documented default, clamped to
[5, 25])
The live lookup uses @aws-sdk/client-service-quotas, an optional dependency loaded via dynamic import. It requires the servicequotas:GetServiceQuota IAM permission; without it the lookup fails silently and the fallback is used.
| Option | Default | Description |
|---|---|---|
concurrency |
5 |
number or 'auto' |
client |
- | Required when concurrency: 'auto' and maxConcurrency is unset |
kind |
'dml' |
'dml' or 'ddl', selects which quota to probe |
reserveHeadroom |
1 |
Subtracted from the resolved quota |
mode |
'all' |
'all' rejects on first failure; 'allSettled' returns per-task settlements |
import {
AathenaError,
QueryTimeoutError,
QueryFailedError,
QueryCancelledError,
ColumnParseError,
} from 'aathena';
try {
const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });
} catch (err) {
if (err instanceof QueryTimeoutError) {
console.log(`Timed out after ${err.timeoutMs}ms: ${err.queryExecutionId}`);
} else if (err instanceof QueryFailedError) {
console.log(`Athena error: ${err.athenaErrorMessage}`);
} else if (err instanceof AathenaError) {
// QueryCancelledError, ColumnParseError, or anything else aathena threw
console.log(`aathena error (${err.name}): ${err.message}`);
} else {
throw err;
}
}All four specific classes extend AathenaError, so catching the base is enough when you don't need to discriminate.
Athena tables with partition projection type=injected require a static WHERE predicate on the partition column. Without it, queries fail at runtime with CONSTRAINT_VIOLATION.
init and add inspect each selected table's Glue Parameters for projection settings and scaffold the SQL with:
-- @param <col> stringannotationsWHERE <col> = '{{<col>}}'predicatesREPLACE_MEplaceholder values insrc/main.tsplus a note at the top
When the target is a Presto/Trino view (detected via TableType === 'VIRTUAL_VIEW' or the presto_view / trino_view parameter flag), aathena decodes the view's ViewOriginalText, extracts FROM / JOIN table references, and probes those underlying tables recursively. Max traversal depth is 3 levels with a visited set to defuse cycles. If a view's references cannot be parsed or the probe hits the limit, a note is written into the scaffolded SQL so you know to add the predicates by hand.
Example output for a view over a partitioned table:
-- Starter aathena query. Edit or delete as needed.
-- See README for placeholder and parameter syntax.
-- View 'analytics.events' traced to: analytics.events_raw
-- @param tenant_id string
-- @param rowLimit positiveInt
SELECT *
FROM events
WHERE tenant_id = '{{tenant_id}}'
LIMIT {{rowLimit}}By default every query runs against config.database. When a SQL file lives in tables/{other-db}/... (different from config.database), generate automatically emits an explicit per-query binding so it routes to that database at runtime:
// generated/queries/sales/events/default.ts (config.database is 'marketing')
export const default_ = createQuery<Events, DefaultParams>(
'tables/sales/events/default.sql',
schemaDef,
{ database: 'sales' },
);You don't write this yourself - add <db>.<table> and generate handle it. For ad-hoc inline queries use client.query(sql, { database: 'sales' }).
Every QueryResult includes a statistics block from Athena (cost tracking, queueing visibility, cache hits). Pass { includeRuntimeStats: true } for input/output row counts via an extra GetQueryRuntimeStatistics API call.
const result = await athena.query<Row>(sql, { includeRuntimeStats: true });
result.statistics.totalExecutionTimeInMillis; // 8128
result.statistics.resultReused; // true
result.statistics.runtime?.outputRows; // 99| Field | Notes |
|---|---|
engineExecutionTimeInMillis |
Engine execution time |
totalExecutionTimeInMillis |
Wall time Athena took |
queryQueueTimeInMillis |
Time spent waiting in the queue |
queryPlanningTimeInMillis |
Planning + partition retrieval |
servicePreProcessingTimeInMillis |
Preprocessing before engine |
serviceProcessingTimeInMillis |
Result publication |
dataScannedInBytes |
Drives query cost |
dpuCount? |
Capacity-reservation workgroups only |
resultReused? |
True if served from result cache |
runtime?.inputRows / inputBytes |
Opt-in via includeRuntimeStats |
runtime?.outputRows / outputBytes |
Opt-in via includeRuntimeStats |
| Command | Purpose |
|---|---|
aathena init |
Interactive project scaffold. Fills config from AWS, picks tables, probes partitions, runs generate, writes src/main.ts. |
aathena add <table> |
Scaffold a new query under tables/{database}/{table}/<name>.sql. Accepts db.table for cross-database tables and prompts to resolve mismatches. |
aathena generate |
Re-run codegen (fetch Glue schemas, produce typed query functions). Runs automatically after init and add unless --no-generate. |
aathena help |
Show all flags. |
Runs interactively by default, then:
- Resolves the AWS region from
--region, theAWS_REGION/AWS_DEFAULT_REGIONenv vars, or a select prompt (common commercial regions plus a Custom fallback for typing one in); lists Glue databases and Athena workgroups; inherits the workgroup's default output location when available - Writes
aathena.config.jsonand addsnode_modules/to.gitignore(thegenerated/directory is committed by default; delete it or add it to.gitignoreyourself if you prefer to regenerate on every build) - Lets you multi-select which tables to scaffold starter SQL for
- Probes each selected table (and any Presto/Trino view it points at) for injected-projection partitions that require WHERE predicates
- Writes
tables/{db}/{table}/default.sqlwithLIMIT {{rowLimit}}, plus the right-- @param/WHERElines - Runs
generateto produce typed query functions undergenerated/ - Writes a runnable
src/main.tsthat imports and calls every scaffolded query (single call for 1 table,parallel()demo for 2+)
Flags:
--region,--database,--workgroup,--output-location,--tables a,b,c- non-interactive overrides--tables-dir <path>- override the SQL scaffold root (defaulttables); persisted toconfig.tablesDir--out-dir <path>- override the codegen output directory (defaultgenerated); persisted toconfig.outDir--example-path <path>- override the runnable starter file (defaultsrc/main.ts); thefrom '../generated'import is rewritten to point at--out-dirfrom the chosen location--no-sample- skip SQL scaffolding--no-generate- skip the auto-generate step--no-example- skip writing the example file--force- overwriteaathena.config.jsonand regenerate the example file to reflect the current selection. SQL files are always preserved because you may have edited them.
npx aathena add events # tables/{config.db}/events/default.sql
npx aathena add sales.events # cross-db; interactive prompt resolves mismatch
npx aathena add events --from-schema # embed Glue column list as a comment block
npx aathena add events --name daily # scaffold daily.sql instead of default.sqladd always probes partitions (even without --from-schema) and auto-runs generate unless --no-generate.
Flags:
--name <query-name>- query filename (default:default)--from-schema- include Glue column list as a comment block--force- overwrite an existing SQL file--no-generate- skip the auto-generate step
Re-runs codegen: fetches Glue schemas for every SQL file under tables/ in parallel and produces generated/types/{database}/{table}.ts, generated/queries/{database}/{table}/{query}.ts, and a barrel generated/index.ts. Run it after editing SQL files or when upstream schemas change. Surfaces a one-line info when a query directory's database differs from config.database and a per-query binding is emitted.
aathena.config.json lives at your project root. init writes it for you; edit as needed afterwards. Both the CLI and runtime walk up the directory tree to find it.
{
"region": "eu-west-1",
"database": "sampledb",
"workgroup": "primary",
"outputLocation": "s3://my-bucket/athena-results/",
"tablesDir": "./tables",
"outDir": "./generated",
"codegen": {
"indent": 2
},
"query": {
"timeout": 300000,
"pollingInterval": 500,
"maxPollingInterval": 5000
}
}| Field | Default | Description |
|---|---|---|
region |
AWS default | AWS region |
database |
required | Primary Athena database (used when a query's directory doesn't specify otherwise) |
workgroup |
- | Athena workgroup |
outputLocation |
- | S3 path for query results (optional if workgroup has a default) |
tablesDir |
./tables |
SQL files directory |
outDir |
./generated |
Codegen output directory |
codegen.indent |
2 |
Number of spaces used to indent generated TypeScript files (matches Prettier / Google / Airbnb / Angular / NestJS defaults). Clamped to [1, 8] |
query.timeout |
300000 |
Query timeout in ms (5 min) |
query.pollingInterval |
500 |
Initial poll interval in ms |
query.maxPollingInterval |
5000 |
Max poll interval in ms |
maxConcurrency |
- | Override for parallel({ concurrency: 'auto' }) when Service Quotas is unreachable |
Glue column types map 1:1 to TypeScript:
| Athena | TypeScript |
|---|---|
varchar, string, char |
string |
integer, int, smallint, tinyint |
number |
bigint |
bigint |
double, float, real |
number |
decimal |
string (preserves precision) |
boolean |
boolean |
date |
string (YYYY-MM-DD) |
timestamp |
Date |
json |
unknown |
binary, varbinary |
string (Base64 encoded) |
array<T> |
T[] |
map<K, V> |
Record<K, V> |
struct<a:T, b:U> |
{ a: T; b: U } |
Regular columns are nullable (T | null); partition keys are always non-null. See Quick Start step 3 for a typed-access demo that covers scalars and nested complex types.
| Annotation | TypeScript | Validation |
|---|---|---|
string |
string |
SQL injection check |
number |
number |
Finite number |
boolean |
boolean |
true / false |
positiveInt |
number |
Integer > 0 |
isoDate |
string |
YYYY-MM-DD |
isoTimestamp |
string |
ISO 8601 |
identifier |
string |
SQL identifier |
uuid |
string |
RFC 4122 UUID |
s3Path |
string |
s3://bucket/path |
enum('a','b','c') |
'a' | 'b' | 'c' |
Whitelist |
The barrel at generated/index.ts re-exports every query under a JS-safe identifier:
latest.sql->latestdefault.sql(JS reserved word) -> aliased as<table>Default, e.g.eventsDefault- Two queries with the same filename across tables -> aliased as
<table>{Query}on both sides
So a scaffolded tables/sampledb/events/default.sql shows up as:
import { eventsDefault } from './generated';
const result = await eventsDefault(athena, { rowLimit: 33 });project/
├── aathena.config.json # project root marker, written by 'init'
├── tables/ # SQL files you edit
│ └── sampledb/ # database
│ └── events/ # table
│ ├── default.sql
│ └── daily.sql
├── generated/ # codegen output, committed by default
│ ├── index.ts # barrel re-exporting every query under a JS-safe name
│ ├── types/ # one file per table, mirroring the Glue schema
│ │ └── sampledb/
│ │ └── events.ts
│ └── queries/ # one file per SQL, binding it to its types
│ └── sampledb/
│ └── events/
│ ├── default.ts
│ └── daily.ts
└── src/
└── main.ts # runnable example, written by 'init'
Nested grouping under a table (e.g. events/cart/add.sql) works too - codegen walks the whole tree under tables/.
- Node.js >= 20
- AWS credentials resolvable via the default provider chain (env, shared config, SSO, IAM role, etc.)
- IAM permissions:
athena:StartQueryExecution,athena:GetQueryExecution,athena:GetQueryResults(runtime)athena:ListWorkGroups,athena:GetWorkGroup(init)glue:GetDatabases,glue:GetTables,glue:GetTable(init, add, generate)servicequotas:GetServiceQuota(optional, enablesparallel({ concurrency: 'auto' }))- Your query's S3 read + output-bucket write permissions
- sql-render - type-safe SQL templating with injection protection (used internally)
Two machine-readable summaries are maintained for AI consumption, following the llms.txt convention:
- llms.txt - curated API reference and usage guide
- llms-full.txt - full packed source, auto-generated on each push