Merged
17 changes: 17 additions & 0 deletions conf/input.elasticsearch/elasticsearch.toml
@@ -100,3 +100,20 @@ cluster_info_interval = "5m"
# tls_key = "/etc/categraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = true

## Sets the number of most recent indices to return for indices configured with a date-stamped suffix.
## Each 'indices_include' entry ending with a wildcard (*) or glob pattern groups together all indices that match it and
## sorts them by the date or number after the wildcard. Metrics are then gathered only for the 'num_most_recent_indices'
## most recent indices in each group; 0 gathers all of them.
num_most_recent_indices = 0



## must num_most_recent_indices Coordinated use

Copilot AI Jan 20, 2026


The comment "must num_most_recent_indices Coordinated use" is grammatically incorrect and unclear. It should be reworded to something like "Must be used in coordination with num_most_recent_indices" or similar to improve clarity.

Suggested change
-## must num_most_recent_indices Coordinated use
+## Must be used in coordination with num_most_recent_indices.

## default ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
## match //YYYY.MM.DD or YYYY-MM-DD or YYYYMMDD or YYYY-MM-DD-HH
## //YYYY.MM or YYYY-MM or YYYYMM or YYYYMMDDHH
## //YY.MM.DD or YY-MM-DD or YYMMDD or YY.MM.DD.HH
## //v1_001 v1_002 -->v1* v0.1 v0.2 -->v0* v5.2.3 v5.2.4 -->v5*
## or self expansion
dynamic_index_matcher_regexp = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
Comment on lines +113 to +119

Copilot AI Jan 20, 2026


The default regex patterns in the configuration file have the same issue as in the code - excessive backslash escaping. These patterns should use \d instead of \\d to properly match digits in regex patterns.

Suggested change
-## default ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
-## match //YYYY.MM.DD or YYYY-MM-DD or YYYYMMDD or YYYY-MM-DD-HH
-## //YYYY.MM or YYYY-MM or YYYYMM or YYYYMMDDHH
-## //YY.MM.DD or YY-MM-DD or YYMMDD or YYYY.MM.DD.HH
-## //v1_001 v1_002 -->v1* v0.1 v0.2 -->v0* v5.2.3 v5.2.4 -->v5*
-## or self expansion
-dynamic_index_matcher_regexp = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
+## default ["(?P<date>(?:\d{4}|\d{2})[.-]?(?:\d{2})[.-]?(?:\d{2})?[.-]?(?:\d{2})?)$","[\\.-._]\d+(\.\d+){0,2}$"]
+## match //YYYY.MM.DD or YYYY-MM-DD or YYYYMMDD or YYYY-MM-DD-HH
+## //YYYY.MM or YYYY-MM or YYYYMM or YYYYMMDDHH
+## //YY.MM.DD or YY-MM-DD or YYMMDD or YYYY.MM.DD.HH
+## //v1_001 v1_002 -->v1* v0.1 v0.2 -->v0* v5.2.3 v5.2.4 -->v5*
+## or self expansion
+dynamic_index_matcher_regexp = ["(?P<date>(?:\d{4}|\d{2})[.-]?(?:\d{2})[.-]?(?:\d{2})?[.-]?(?:\d{2})?)$","[\\.-._]\d+(\.\d+){0,2}$"]

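TOML basic (double-quoted) strings treat `\\` as the escape for a single backslash. `\d` is not a listed TOML escape, and the TOML spec expects parsers to reject unlisted escapes. So the doubled form used in `dynamic_index_matcher_regexp` decodes to the regex token `\d`.

The Go sketch below writes the two default patterns as raw strings, i.e. what the TOML values should decode to under a spec-compliant loader. It then shows what the date pattern captures for a few index names. The helper `dateSuffix` and the sample index names are illustrative only, not part of categraf.

```go
package main

import (
	"fmt"
	"regexp"
)

// The default dynamic_index_matcher_regexp entries as Go raw strings.
// The TOML value "\\d" decodes to \d, so these are the decoded patterns.
var (
	dateRe    = regexp.MustCompile(`(?P<date>(?:\d{4}|\d{2})[.-]?(?:\d{2})[.-]?(?:\d{2})?[.-]?(?:\d{2})?)$`)
	versionRe = regexp.MustCompile(`[\.-._]\d+(\.\d+){0,2}$`)
)

// dateSuffix (hypothetical helper) returns the text captured by the named
// "date" group at the end of an index name, or "" if the pattern does not match.
func dateSuffix(name string) string {
	m := dateRe.FindStringSubmatch(name)
	if m == nil {
		return ""
	}
	return m[dateRe.SubexpIndex("date")]
}

func main() {
	for _, name := range []string{"logs-2024.01.15", "logs-2024-01-15", "logs-20240115", "logs-2024.01"} {
		fmt.Printf("%s -> %q\n", name, dateSuffix(name))
	}
	// The second pattern matches a version-like suffix instead of a date.
	fmt.Println(versionRe.FindString("app-v5.2.3"))
}
```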
17 changes: 17 additions & 0 deletions inputs/elasticsearch/README.md
@@ -483,3 +483,20 @@ ES 7.x supports role-based access control (RBACs). The `elasticsearch` plugin requires
| elasticsearch_slm_stats_snapshots_deleted_total | counter | Snapshots deleted by policy |
| elasticsearch_slm_stats_snapshot_deletion_failures_total | counter | Snapshot deletion failures by policy |
| elasticsearch_slm_stats_operation_mode | gauge | SLM operation mode (running, stopping, stopped) |


#### `num_most_recent_indices = 0`


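| For date-stamped dynamic indices, collect metrics only for the "num_most_recent_indices" most recent indices
| This can greatly reduce the metric volume produced by historical dynamic indices
| Can be used together with the "indices_include" setting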

#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
| Used together with num_most_recent_indices to specify how dynamic indices are matched. Default value:
| ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
| Matches //YYYY.MM.DD or YYYY-MM-DD or YYYYMMDD or YYYY-MM-DD-HH
| //YYYY.MM or YYYY-MM or YYYYMM or YYYYMMDDHH
| //YY.MM.DD or YY-MM-DD or YYMMDD or YYYY.MM.DD.HH

Copilot AI Jan 22, 2026


There's a typo in this comment - "YYY.MM.DD.HH" should be "YY.MM.DD.HH" (with two Y's not three) to be consistent with the other date format patterns mentioned.

Suggested change
-| //YY.MM.DD or YY-MM-DD or YYMMDD or YYYY.MM.DD.HH
+| //YY.MM.DD or YY-MM-DD or YYMMDD or YY.MM.DD.HH

| //v1_001 v1_002 -->v1* v0.1 v0.2 -->v0* v5.2.3 v5.2.4 -->v5*
| You can also extend this list with your own patterns
16 changes: 16 additions & 0 deletions inputs/elasticsearch/README_en.md
@@ -477,3 +477,19 @@ ES 7.x supports RBACs. The following security privileges are required for the `e
| elasticsearch_slm_stats_snapshots_deleted_total | counter | Snapshots deleted by policy |
| elasticsearch_slm_stats_snapshot_deletion_failures_total | counter | Snapshot deletion failures by policy |
| elasticsearch_slm_stats_operation_mode | gauge | SLM operation mode (Running, stopping, stopped) |


#### `num_most_recent_indices = 0`
|Set the indicator data for the latest index of the first "num_mast_decent_indice" in the date class dynamic index
|It can greatly reduce the large scale of indicators caused by historical dynamic indexing
|Can be used together with the 'indices_inclub' configuration


#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
|Used in conjunction with num_mast_decent_indice to specify the matching logic for dynamic indexes, default value:
Comment on lines +483 to +489

Copilot AI Jan 22, 2026


There are multiple spelling errors in the documentation:

  1. "num_mast_decent_indice" should be "num_most_recent_indices"
  2. "indices_inclub" should be "indices_include"
Suggested change
-|Set the indicator data for the latest index of the first "num_mast_decent_indice" in the date class dynamic index
-|It can greatly reduce the large scale of indicators caused by historical dynamic indexing
-|Can be used together with the 'indices_inclub' configuration
-#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
-|Used in conjunction with num_mast_decent_indice to specify the matching logic for dynamic indexes, default value:
+|Set the indicator data for the latest index of the first "num_most_recent_indices" in the date class dynamic index
+|It can greatly reduce the large scale of indicators caused by historical dynamic indexing
+|Can be used together with the 'indices_include' configuration
+#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
+|Used in conjunction with num_most_recent_indices to specify the matching logic for dynamic indexes, default value:

Comment on lines +483 to +489

Copilot AI Jan 22, 2026


The spelling "num_mast_decent_indice" should be "num_most_recent_indices" for consistency with the actual configuration parameter name.

Suggested change
-|Set the indicator data for the latest index of the first "num_mast_decent_indice" in the date class dynamic index
-|It can greatly reduce the large scale of indicators caused by historical dynamic indexing
-|Can be used together with the 'indices_inclub' configuration
-#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
-|Used in conjunction with num_mast_decent_indice to specify the matching logic for dynamic indexes, default value:
+|Set the indicator data for the latest index of the first "num_most_recent_indices" in the date class dynamic index
+|It can greatly reduce the large scale of indicators caused by historical dynamic indexing
+|Can be used together with the 'indices_inclub' configuration
+#### `dynamic_index_matcher_regexp` = ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
+|Used in conjunction with num_most_recent_indices to specify the matching logic for dynamic indexes, default value:

| ["(?P<date>(?:\\d{4}|\\d{2})[.-]?(?:\\d{2})[.-]?(?:\\d{2})?[.-]?(?:\\d{2})?)$","[\\.-._]\\d+(\\.\\d+){0,2}$"]
|Supports matching //YYYY.MM.DD or YYYY-MM-DD or YYYYMMDD or YYYY-MM-DD-HH
|//YYYY.MM or YYYY-MM or YYYYMM or YYYYMMDDHH
|//YY.MM.DD or YY-MM-DD or YYMMDD or YYY.MM.DD.HH

Copilot AI Jan 22, 2026


There's a typo in this comment - "YYY.MM.DD.HH" should be "YY.MM.DD.HH" (with two Y's not three) to be consistent with the other date format patterns mentioned.

Suggested change
-|//YY.MM.DD or YY-MM-DD or YYMMDD or YYY.MM.DD.HH
+|//YY.MM.DD or YY-MM-DD or YYMMDD or YY.MM.DD.HH

| //v1_001 v1_002 -->v1* v0.1 v0.2 -->v0* v5.2.3 v5.2.4 -->v5*
|You can also extend this list with your own patterns
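The `-->v1*` notation above suggests that each index is bucketed under whatever remains of its name once the matched suffix is removed. The Go sketch below shows one way to derive such group keys from the two default patterns, written as raw strings (i.e. after TOML unescaping).

Two details are assumptions: the patterns are tried in order with the first match winning, and the helper name `groupKey` is made up. This is an illustration, not the plugin's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// Default dynamic_index_matcher_regexp patterns; trying them in this order
// and stopping at the first match is an assumption of this sketch.
var defaultMatchers = []*regexp.Regexp{
	regexp.MustCompile(`(?P<date>(?:\d{4}|\d{2})[.-]?(?:\d{2})[.-]?(?:\d{2})?[.-]?(?:\d{2})?)$`),
	regexp.MustCompile(`[\.-._]\d+(\.\d+){0,2}$`),
}

// groupKey (hypothetical helper) cuts the name at the start of the first
// matching suffix and appends "*", e.g. "v1_001" -> "v1*". Names that no
// pattern matches are returned unchanged and form a group of their own.
func groupKey(name string, matchers []*regexp.Regexp) string {
	for _, re := range matchers {
		if loc := re.FindStringIndex(name); loc != nil {
			return name[:loc[0]] + "*"
		}
	}
	return name
}

func main() {
	groups := make(map[string][]string)
	for _, name := range []string{"v1_001", "v1_002", "v0.1", "v0.2", "v5.2.3", "v5.2.4", "kibana"} {
		key := groupKey(name, defaultMatchers)
		groups[key] = append(groups[key], name)
	}
	fmt.Println(groups) // fmt prints map keys in sorted order
}
```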
103 changes: 29 additions & 74 deletions inputs/elasticsearch/collector/ilm_indices.go
@@ -15,16 +15,17 @@ package collector

 import (
 	"encoding/json"
-	"flashcat.cloud/categraf/pkg/filter"
 	"fmt"
 	"io"
 	"log"
 	"net/http"
 	"net/url"
 	"path"
-	"sort"
+	"slices"
 	"strings"
 
+	"flashcat.cloud/categraf/pkg/filter"
+
 	"github.com/prometheus/client_golang/prometheus"
 )

@@ -37,12 +38,13 @@ type ilmMetric struct {

 // Index Lifecycle Management information object
 type IlmIndiciesCollector struct {
-	client               *http.Client
-	url                  *url.URL
-	indicesIncluded      []string
-	numMostRecentIndices int
-	indexMatchers        map[string]filter.Filter
-	ilmMetric            ilmMetric
+	client                 *http.Client
+	url                    *url.URL
+	indicesIncluded        []string
+	numMostRecentIndices   int
+	maxIndicesIncludeCount int
+	indexMatchers          map[string]filter.Filter

Copilot AI Jan 22, 2026


The field indexMatchers is no longer being initialized in the constructor but is still declared in the struct. Since the old filtering logic that used this field has been removed, this unused field should be removed from the struct definition to keep the code clean.

Suggested change
-	indexMatchers          map[string]filter.Filter

+	ilmMetric              ilmMetric
 }
 
 type IlmResponse struct {
@@ -63,16 +65,14 @@ var (
 )
 
 // NewIlmIndicies defines Index Lifecycle Management Prometheus metrics
-func NewIlmIndicies(client *http.Client, url *url.URL, indicesIncluded []string, numMostRecentIndices int, indexMatchers map[string]filter.Filter) *IlmIndiciesCollector {
+func NewIlmIndicies(client *http.Client, url *url.URL, indicesIncluded []string, maxIndicesIncludeCount int) *IlmIndiciesCollector {
 	subsystem := "ilm_index"
 
 	return &IlmIndiciesCollector{
-		client:               client,
-		url:                  url,
-		indicesIncluded:      indicesIncluded,
-		numMostRecentIndices: numMostRecentIndices,
-		indexMatchers:        indexMatchers,
-
+		client:                 client,
+		url:                    url,
+		indicesIncluded:        indicesIncluded,
+		maxIndicesIncludeCount: maxIndicesIncludeCount,
 		ilmMetric: ilmMetric{
 			Type: prometheus.GaugeValue,
 			Desc: prometheus.NewDesc(
@@ -95,10 +95,11 @@ func (i *IlmIndiciesCollector) fetchAndDecodeIlm() (IlmResponse, error) {
 	var ir IlmResponse
 
 	u := *i.url
+
 	//add indices filter
-	if len(i.indicesIncluded) == 0 {
+	if len(i.indicesIncluded) == 0 || len(i.indicesIncluded) > i.maxIndicesIncludeCount {
 		u.Path = path.Join(u.Path, "/_all/_ilm/explain")
-	} else {
+	} else if len(i.indicesIncluded) <= i.maxIndicesIncludeCount {

Copilot AI Jan 22, 2026


The conditional logic is redundant. The first condition checks len(i.indicesIncluded) == 0 || len(i.indicesIncluded) > i.maxIndicesIncludeCount, and the else-if checks len(i.indicesIncluded) <= i.maxIndicesIncludeCount. The else-if is unnecessary because it's the only remaining case. This should be simplified to use just an if-else structure.

Suggested change
-	} else if len(i.indicesIncluded) <= i.maxIndicesIncludeCount {
+	} else {

 		u.Path = path.Join(u.Path, "/"+strings.Join(i.indicesIncluded, ",")+"/_ilm/explain")
 	}

@@ -128,6 +129,11 @@ func (i *IlmIndiciesCollector) fetchAndDecodeIlm() (IlmResponse, error) {
 		return ir, err
 	}
 
+	//filter
+	if len(i.indicesIncluded) > i.maxIndicesIncludeCount {
+		ir.Indices = i.filterMapByKeys(ir.Indices, i.indicesIncluded)
+	}
+
 	return ir, nil
 }

@@ -147,9 +153,6 @@ func (i *IlmIndiciesCollector) Collect(ch chan<- prometheus.Metric) {
 		return
 	}
 
-	//add config i.numMostRecentIndices process code
-	ilmResp = i.gatherIndividualIndicesStats(ilmResp)
-
 	for indexName, indexIlm := range ilmResp.Indices {
 		ch <- prometheus.MustNewConstMetric(
 			i.ilmMetric.Desc,
@@ -160,61 +163,13 @@ func (i *IlmIndiciesCollector) Collect(ch chan<- prometheus.Metric) {
 	}
 }
 
-func (i *IlmIndiciesCollector) gatherIndividualIndicesStats(resp IlmResponse) IlmResponse {
-	newIndicesMappings := make(map[string]IlmIndexResponse)
-
-	// Sort indices into buckets based on their configured prefix, if any matches.
-	categorizedIndexNames := i.categorizeIndices(resp)
-	for _, matchingIndices := range categorizedIndexNames {
-		// Establish the number of each category of indices to use. User can configure to use only the latest 'X' amount.
-		indicesCount := len(matchingIndices)
-		indicesToTrackCount := indicesCount
-
-		// Sort the indices if configured to do so.
-		if i.numMostRecentIndices > 0 {
-			if i.numMostRecentIndices < indicesToTrackCount {
-				indicesToTrackCount = i.numMostRecentIndices
-			}
-			sort.Strings(matchingIndices)
-		}
+func (i *IlmIndiciesCollector) filterMapByKeys(originalMap map[string]IlmIndexResponse, allowedKeys []string) map[string]IlmIndexResponse {
 
-		// Gather only the number of indexes that have been configured, in descending order (most recent, if date-stamped).
-		for i := indicesCount - 1; i >= indicesCount-indicesToTrackCount; i-- {
-			indexName := matchingIndices[i]
-			newIndicesMappings[indexName] = resp.Indices[indexName]
+	resultMap := make(map[string]IlmIndexResponse)
+	for key, value := range originalMap {
+		if slices.Contains(allowedKeys, key) {
+			resultMap[key] = value
 		}
 	}
Comment on lines +166 to 173

Copilot AI Jan 20, 2026


The filterMapByKeys function uses slices.Contains inside a loop which results in O(n*m) time complexity where n is the size of the response map and m is the size of the included slice. For better performance when filtering large datasets, consider converting the allowedKeys slice to a map for O(1) lookup time, reducing overall complexity to O(n).

Comment on lines +166 to 173

Copilot AI Jan 22, 2026


Using slices.Contains in a loop over potentially large maps creates O(n*m) complexity, where n is the number of indices in the response and m is the number of included indices. For large datasets, this could be inefficient. Consider converting the allowedKeys slice to a map for O(1) lookup performance.

-	//return new IlmResponse
-	var iml IlmResponse
-	iml.Indices = newIndicesMappings
-	return iml
-}
-
-func (i *IlmIndiciesCollector) categorizeIndices(resp IlmResponse) map[string][]string {
-	categorizedIndexNames := make(map[string][]string, len(resp.Indices))
-	// If all indices are configured to be gathered, bucket them all together.
-	if len(i.indicesIncluded) == 0 || i.indicesIncluded[0] == "_all" {
-		for indexName := range resp.Indices {
-			categorizedIndexNames["_all"] = append(categorizedIndexNames["_all"], indexName)
-		}
-
-		return categorizedIndexNames
-	}
-
-	// Bucket each returned index with its associated configured index (if any match).
-	for indexName := range resp.Indices {
-		match := indexName
-		for name, matcher := range i.indexMatchers {
-			// If a configured index matches one of the returned indexes, mark it as a match.
-			if matcher.Match(match) {
-				match = name
-				break
-			}
-		}
-
-		// Bucket all matching indices together for sorting.
-		categorizedIndexNames[match] = append(categorizedIndexNames[match], indexName)
-	}
-
-	return categorizedIndexNames
+	return resultMap
 }
6 changes: 2 additions & 4 deletions inputs/elasticsearch/collector/ilm_indices_test.go
@@ -14,7 +14,6 @@
 package collector
 
 import (
-	"flashcat.cloud/categraf/pkg/filter"
 	"io"
 	"net/http"
 	"net/http/httptest"
@@ -99,9 +98,8 @@ elasticsearch_ilm_index_status{action="complete",index="facebook",phase="new",st
 	}
 
 	indicesIncluded := make([]string, 0)
-	var numMostRecentIndices int = 0
-	indexMatchers := make(map[string]filter.Filter)
-	c := NewIlmIndicies(http.DefaultClient, u, indicesIncluded, numMostRecentIndices, indexMatchers)
+	maxIndicesIncludeCount := 80
+	c := NewIlmIndicies(http.DefaultClient, u, indicesIncluded, maxIndicesIncludeCount)
 	if err != nil {
 		t.Fatal(err)
 	}