From bdc7e95d14368602ade2e6193c5a6668b4bce554 Mon Sep 17 00:00:00 2001
From: Qing Tomlinson
Date: Wed, 26 Nov 2025 14:35:32 -0800
Subject: [PATCH 1/2] Add documentation for BATCH_RATE_LIMIT_MAX and BATCH_RATE_LIMIT_WINDOW

---
 service_config/service.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/service_config/service.md b/service_config/service.md
index 0872ce5..9a6fa19 100644
--- a/service_config/service.md
+++ b/service_config/service.md
@@ -7,6 +7,8 @@
  - [AUTH\_CURATION\_TEAM](#auth_curation_team)
  - [AUTH\_GITHUB\_CLIENT](#auth_github_client)
  - [AUTH\_HARVEST\_TEAM](#auth_harvest_team)
+ - [BATCH_RATE_LIMIT_MAX](#batch_rate_limit_max)
+ - [BATCH_RATE_LIMIT_WINDOW](#batch_rate_limit_window)
  - [CACHING\_PROVIDER](#caching_provider)
  - [CACHING\_REDIS\_SERVICE](#caching_redis_service)
  - [CRAWLER\_API\_AUTH\_TOKEN\*\*](#crawler_api_auth_token)
@@ -97,6 +99,8 @@ The environmental variables for the clearlydefined-api-dev App Service include:
 * NODE_ENV
 * RATE_LIMIT_MAX
 * RATE_LIMIT_WINDOW
+* BATCH_RATE_LIMIT_MAX
+* BATCH_RATE_LIMIT_WINDOW
 * SEARCH_AZURE_API_KEY
 * SEARCH_AZURE_SERVICE
 * SEARCH_PROVIDER
@@ -334,6 +338,14 @@ When we [use this value in the code](https://github.com/clearlydefined/service/b
 
 So, one IP address can only call the ClearlyDefined API 500 times every 300 seconds.
 
+### BATCH_RATE_LIMIT_MAX
+
+Defines the maximum number of requests allowed from a single IP address to the batch endpoints within the batch rate limit window.
+
+### BATCH_RATE_LIMIT_WINDOW
+
+Defines the time window (in seconds) over which `BATCH_RATE_LIMIT_MAX` is applied for batch endpoints. This value is multiplied by 1000 internally to convert it to milliseconds (the same as `RATE_LIMIT_WINDOW`).
+
 ### SEARCH_PROVIDER
 
 We use [Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) to power ClearlyDefined's Search functionality, in this case this is indicated with the string "azure".
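Reviewer note: the seconds-to-milliseconds conversion described for `BATCH_RATE_LIMIT_WINDOW` can be illustrated with a minimal sketch. The helper name `toRateLimitOptions`, the option names `windowMs` and `max`, the fallback defaults, and the sample values `60` and `100` are all assumptions made for illustration; none of them are taken from the ClearlyDefined service code.

```javascript
// Hypothetical helper, not from the ClearlyDefined service.
// Reads <prefix>RATE_LIMIT_WINDOW (seconds) and <prefix>RATE_LIMIT_MAX from an
// env-like object and returns limiter options with the window in milliseconds.
function toRateLimitOptions(env, prefix, defaults) {
  const windowSeconds = Number(env[`${prefix}RATE_LIMIT_WINDOW`] ?? defaults.windowSeconds)
  const max = Number(env[`${prefix}RATE_LIMIT_MAX`] ?? defaults.max)
  // The documented behavior: the window is multiplied by 1000 to get milliseconds.
  return { windowMs: windowSeconds * 1000, max }
}

// Illustrative values only; real deployments set their own.
const batch = toRateLimitOptions(
  { BATCH_RATE_LIMIT_WINDOW: '60', BATCH_RATE_LIMIT_MAX: '100' },
  'BATCH_',
  { windowSeconds: 1, max: 0 } // assumed fallbacks, not the service defaults
)
console.log(batch)
```

With these sample values, a single IP address would be limited to 100 batch requests per 60-second window.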
From 696e4d3cd0e60912b5f7ec1863045f6159ee31f7 Mon Sep 17 00:00:00 2001
From: Qing Tomlinson
Date: Wed, 26 Nov 2025 14:45:58 -0800
Subject: [PATCH 2/2] Document environment variables for logging node heap stats

---
 service_config/service.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/service_config/service.md b/service_config/service.md
index 9a6fa19..f12785f 100644
--- a/service_config/service.md
+++ b/service_config/service.md
@@ -34,6 +34,8 @@
  - [HARVEST\_QUEUE\_PROVIDER](#harvest_queue_provider)
  - [HARVEST\_QUEUE\_PREFIX\*\*](#harvest_queue_prefix)
  - [HARVESTER\_PROVIDER](#harvester_provider)
+ - [LOG\_NODE\_HEAPSTATS](#log_node_heapstats)
+ - [LOG\_NODE\_HEAPSTATS\_INTERVAL\_MS](#log_node_heapstats_interval_ms)
  - [MULTIVERSION\_CURATION\_FF](#multiversion_curation_ff)
  - [NODE\_ENV](#node_env)
  - [RATE\_LIMIT\_MAX](#rate_limit_max)
@@ -96,6 +98,8 @@ The environmental variables for the clearlydefined-api-dev App Service include:
 * HARVEST_QUEUE_PREFIX
 * HARVEST_QUEUE_PROVIDER
 * HARVESTER_PROVIDER
+* LOG_NODE_HEAPSTATS
+* LOG_NODE_HEAPSTATS_INTERVAL_MS
 * NODE_ENV
 * RATE_LIMIT_MAX
 * RATE_LIMIT_WINDOW
@@ -316,6 +320,28 @@ Important to ensure that any other instances of production crawlers that use the
 
 This indicates what type of service we use for harvesting, in this case it's **crawlerQueue**, which corresponds with the [crawlerQueue harvest provider](https://github.com/clearlydefined/service/blob/master/providers/harvest/crawlerQueue.js)
 
+### LOG_NODE_HEAPSTATS
+
+This is an optional flag that enables logging of memory usage data from Node's `v8` module, using the `getHeapSpaceStatistics()` and `getHeapStatistics()` functions.
+
+Value is either `true` or `false`.
+> Note: if this environment variable is not set, it defaults to `false`.
+> Example:
+> `LOG_NODE_HEAPSTATS` = `true`
+
+- [Node.js v8 engine docs - getHeapSpaceStatistics()](https://nodejs.org/docs/v22.12.0/api/v8.html#v8getheapspacestatistics)
+
+- [Node.js v8 engine docs - getHeapStatistics()](https://nodejs.org/docs/v22.12.0/api/v8.html#v8getheapstatistics)
+
+### LOG_NODE_HEAPSTATS_INTERVAL_MS
+
+This is an optional environment variable that sets the interval at which heap statistics are logged (when enabled).
+
+Value is a number in `ms` (`milliseconds`).
+> Note: the default value is `30000` ms (`30` seconds).
+> Example:
+> `LOG_NODE_HEAPSTATS_INTERVAL_MS` = `10000`
+
 ### MULTIVERSION_CURATION_FF
 
 This is a feature flag that indicates whether the [Multi-version curation feature](https://github.com/clearlydefined/service/pull/810) is active.
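Reviewer note: for readers unfamiliar with the `v8` calls referenced under `LOG_NODE_HEAPSTATS`, the following is a minimal sketch of how the two variables could drive periodic logging. The function names `heapStatsConfig` and `startHeapStatsLogging`, the strict `'true'` string comparison, and the shape of the logged object are assumptions for illustration and are not taken from the service implementation. Only `v8.getHeapStatistics()`, `v8.getHeapSpaceStatistics()`, the `false` default, and the `30000` ms default come from the documentation above.

```javascript
const v8 = require('node:v8')

// Hypothetical helper: derives logging settings from an env-like object.
function heapStatsConfig(env) {
  // Assumption: only the exact string 'true' enables logging; unset means false.
  const enabled = env.LOG_NODE_HEAPSTATS === 'true'
  const parsed = Number.parseInt(env.LOG_NODE_HEAPSTATS_INTERVAL_MS ?? '', 10)
  // Fall back to the documented default of 30000 ms when unset or invalid.
  const intervalMs = Number.isFinite(parsed) && parsed > 0 ? parsed : 30000
  return { enabled, intervalMs }
}

// Hypothetical helper: starts a timer that logs heap data, or returns null
// when logging is disabled.
function startHeapStatsLogging(env, log = console.log) {
  const { enabled, intervalMs } = heapStatsConfig(env)
  if (!enabled) return null
  const timer = setInterval(() => {
    log({
      heap: v8.getHeapStatistics(),
      heapSpaces: v8.getHeapSpaceStatistics()
    })
  }, intervalMs)
  // Do not keep the process alive just for this timer.
  timer.unref()
  return timer
}

startHeapStatsLogging(process.env)
```

Keeping the parsing in a separate function makes the defaults easy to check without starting a timer.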