Merged
38 changes: 38 additions & 0 deletions service_config/service.md
@@ -7,6 +7,8 @@
- [AUTH\_CURATION\_TEAM](#auth_curation_team)
- [AUTH\_GITHUB\_CLIENT](#auth_github_client)
- [AUTH\_HARVEST\_TEAM](#auth_harvest_team)
- [BATCH\_RATE\_LIMIT\_MAX](#batch_rate_limit_max)
- [BATCH\_RATE\_LIMIT\_WINDOW](#batch_rate_limit_window)
- [CACHING\_PROVIDER](#caching_provider)
- [CACHING\_REDIS\_SERVICE](#caching_redis_service)
- [CRAWLER\_API\_AUTH\_TOKEN\*\*](#crawler_api_auth_token)
@@ -32,6 +34,8 @@
- [HARVEST\_QUEUE\_PROVIDER](#harvest_queue_provider)
- [HARVEST\_QUEUE\_PREFIX\*\*](#harvest_queue_prefix)
- [HARVESTER\_PROVIDER](#harvester_provider)
- [LOG\_NODE\_HEAPSTATS](#log_node_heapstats)
- [LOG\_NODE\_HEAPSTATS\_INTERVAL\_MS](#log_node_heapstats_interval_ms)
- [MULTIVERSION\_CURATION\_FF](#multiversion_curation_ff)
- [NODE\_ENV](#node_env)
- [RATE\_LIMIT\_MAX](#rate_limit_max)
@@ -94,9 +98,13 @@ The environmental variables for the clearlydefined-api-dev App Service include:
* HARVEST_QUEUE_PREFIX
* HARVEST_QUEUE_PROVIDER
* HARVESTER_PROVIDER
* LOG_NODE_HEAPSTATS
* LOG_NODE_HEAPSTATS_INTERVAL_MS
* NODE_ENV
* RATE_LIMIT_MAX
* RATE_LIMIT_WINDOW
* BATCH_RATE_LIMIT_MAX
* BATCH_RATE_LIMIT_WINDOW
* SEARCH_AZURE_API_KEY
* SEARCH_AZURE_SERVICE
* SEARCH_PROVIDER
@@ -312,6 +320,28 @@ Important to ensure that any other instances of production crawlers that use the

This indicates which type of service we use for harvesting. In this case it is **crawlerQueue**, which corresponds to the [crawlerQueue harvest provider](https://github.com/clearlydefined/service/blob/master/providers/harvest/crawlerQueue.js).

### LOG_NODE_HEAPSTATS

This is an optional flag that enables logging of Node.js memory usage data, collected with the `v8` module's `getHeapSpaceStatistics()` and `getHeapStatistics()` functions.

The value is either `true` or `false`.
> Note: if this environment variable is not present, it is treated as `false`.
>
> Example:
> `LOG_NODE_HEAPSTATS` = `true`

- [Node.js v8 engine docs - getHeapSpaceStatistics()](https://nodejs.org/docs/v22.12.0/api/v8.html#v8getheapspacestatistics)

- [Node.js v8 engine docs - getHeapStatistics()](https://nodejs.org/docs/v22.12.0/api/v8.html#v8getheapstatistics)
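
As a rough illustration of what this flag controls, the sketch below reads the variable and gathers the data from the two `v8` functions. The helper names `heapStatsEnabled` and `collectHeapStats` are hypothetical and are not taken from the service code. The strict comparison against the string `true` is also an assumption about how the flag is parsed.

```javascript
const v8 = require('node:v8')

// Hypothetical helper (not from the service code): treats the flag as enabled
// only when it is exactly the string "true". A missing variable yields false.
function heapStatsEnabled(env = process.env) {
  return env.LOG_NODE_HEAPSTATS === 'true'
}

// Hypothetical helper: gathers the output of both v8 functions named above.
function collectHeapStats() {
  return {
    heap: v8.getHeapStatistics(), // one object describing the whole heap
    heapSpaces: v8.getHeapSpaceStatistics() // one entry per heap space
  }
}

if (heapStatsEnabled()) {
  console.log(JSON.stringify(collectHeapStats()))
}
```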

### LOG_NODE_HEAPSTATS_INTERVAL_MS

This is an optional environment variable that sets the interval at which heap statistics are logged (when heap statistics logging is enabled).

The value is a number of milliseconds (`ms`).
> Note: the default value is `30000` ms (`30` seconds).
>
> Example:
> `LOG_NODE_HEAPSTATS_INTERVAL_MS` = `10000`
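
A minimal sketch of how such an interval could be applied, assuming a plain `setInterval` timer. The helper names `heapStatsIntervalMs` and `startHeapStatsLogging` are hypothetical. Falling back to the default for missing or invalid values is an assumption, not confirmed behaviour of the service.

```javascript
const v8 = require('node:v8')

// Default taken from the note above: 30000 ms (30 seconds).
const DEFAULT_INTERVAL_MS = 30000

// Hypothetical helper: parses the variable and falls back to the default
// when it is missing, non-numeric, or not positive (an assumption).
function heapStatsIntervalMs(env = process.env) {
  const parsed = Number.parseInt(env.LOG_NODE_HEAPSTATS_INTERVAL_MS, 10)
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_INTERVAL_MS
}

// Hypothetical helper: passes heap statistics to `log` on the configured interval.
function startHeapStatsLogging(log, env = process.env) {
  const timer = setInterval(() => log(v8.getHeapStatistics()), heapStatsIntervalMs(env))
  timer.unref() // do not keep the process alive just for this timer
  return timer
}
```

Calling `startHeapStatsLogging(console.log)` would start the timer, and passing the returned timer to `clearInterval` stops it.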

### MULTIVERSION_CURATION_FF

This is a feature flag that indicates whether the [Multi-version curation feature](https://github.com/clearlydefined/service/pull/810) is active.
@@ -334,6 +364,14 @@ When we [use this value in the code](https://github.com/clearlydefined/service/b

So, one IP address can only call the ClearlyDefined API 500 times every 300 seconds.

### BATCH_RATE_LIMIT_MAX

Defines the maximum number of requests that a single IP address may make to the batch endpoints within the window set by `BATCH_RATE_LIMIT_WINDOW`.

### BATCH_RATE_LIMIT_WINDOW

Defines the time window (in seconds) over which `BATCH_RATE_LIMIT_MAX` is applied to the batch endpoints. This value is multiplied by 1000 internally to convert it to milliseconds (the same as `RATE_LIMIT_WINDOW`).
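
The seconds-to-milliseconds conversion can be sketched as follows. The helper name `batchRateLimitOptions` and the `windowMs` / `max` option names are hypothetical, and the values `300` and `250` in the usage line are purely illustrative, not the service's actual settings.

```javascript
// Hypothetical helper (not from the service code): builds limiter options
// from the two batch variables, converting the window from seconds to ms.
function batchRateLimitOptions(env = process.env) {
  const windowSeconds = Number(env.BATCH_RATE_LIMIT_WINDOW)
  const max = Number(env.BATCH_RATE_LIMIT_MAX)
  return {
    windowMs: windowSeconds * 1000, // seconds -> milliseconds
    max
  }
}

// Illustrative values only: a 300 second window allowing 250 requests per IP.
const exampleOptions = batchRateLimitOptions({
  BATCH_RATE_LIMIT_WINDOW: '300',
  BATCH_RATE_LIMIT_MAX: '250'
})
console.log(exampleOptions)
```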

### SEARCH_PROVIDER

We use [Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) to power ClearlyDefined's search functionality. In this case, this is indicated with the string "azure".