Probe Requests and OTEL Trace Collection in K8S

I recently added readiness and liveness probes to all of our Java applications, and used the kubectl rollout status deploy command so that the GitLab pipeline can tell whether an application has become ready.

kubectl rollout status deploy $DOCKER_APP_NAME --context=dev-admin@cluster.dev -n recircle-industry-platform-dev

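For context, here is a minimal sketch of what the probe definitions look like in the Deployment's pod spec; the paths, port, and timings below are assumptions and need to match your service (for Spring Boot apps, the actuator health groups are a common choice):

# hypothetical container spec; adjust port, paths, and timings to your application
containers:
  - name: demo-app
    image: registry.example.com/demo-app:latest
    ports:
      - containerPort: 8080
    # readiness: traffic is only routed to the pod once this succeeds
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 20
      periodSeconds: 10
    # liveness: the container is restarted if this keeps failing
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10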

This, however, introduced a new problem: by default the OpenTelemetry java-agent also reports the probe requests to the backend monitoring service.


Because probe requests fire fairly frequently, and every application has them, OpenObserve could not keep up with the ingestion and started reporting MemoryTableOverflowError:

2025-06-18T07:40:22.074188063+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError    
2025-06-18T07:40:22.074317749+00:00 INFO actix_web::middleware::logger: 10.233.71.119 "POST /api/default/v1/traces HTTP/1.1" 503 74 "1706" "-" "OTel-OTLP-Exporter-Java/1.40.0" 0.000299
2025-06-18T07:40:22.449896048+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError
2025-06-18T07:40:22.450004619+00:00 INFO actix_web::middleware::logger: 10.233.80.211 "POST /api/default/v1/traces HTTP/1.1" 503 74 "2230" "-" "OTel-OTLP-Exporter-Java/1.40.0" 0.000270
2025-06-18T07:40:22.517553812+00:00 INFO actix_web::middleware::logger: 10.233.71.0 "POST /api/default/v1/metrics HTTP/1.0" 503 74 "3826" "-" "OpenTelemetry Collector Contrib/0.111.0 (linux/amd64)" 0.000867
2025-06-18T07:40:23.572270669+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError
2025-06-18T07:40:23.572416843+00:00 INFO actix_web::middleware::logger: 10.233.71.1 "POST /api/default/v1/traces HTTP/1.1" 503 74 "2278" "-" "OTel-OTLP-Exporter-Java/1.40.0" 0.000371
2025-06-18T07:40:23.657375098+00:00 INFO actix_web::middleware::logger: 10.233.71.247 "POST /api/default/v1/metrics HTTP/1.1" 503 74 "20634" "-" "OTel-OTLP-Exporter-Java/1.40.0" 0.000578
2025-06-18T07:40:23.803923763+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError
2025-06-18T07:40:24.635078516+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError
2025-06-18T07:40:24.635189325+00:00 INFO actix_web::middleware::logger: 10.233.71.223 "POST /api/default/v1/traces HTTP/1.1" 503 74 "2162" "-" "OTel-OTLP-Exporter-Java/1.40.0" 0.000268
2025-06-18T07:40:25.119413710+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError
2025-06-18T07:40:25.120251549+00:00 ERROR openobserve::service::traces: [TRACES:OTLP] ingestion error while checking memtable size: MemoryTableOverflowError

I found a few possible solutions.

Filtering out actuator requests in the java-agent

opentelemetry-java-instrumentation#1060 and discussions#6605 both mention this problem, but the agent does not intend to implement an Exclude URL feature.

OpenTelemetry does have a third-party samplers package that can filter out actuator requests. You have to package it as a java-agent extension and configure that extension yourself.
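A minimal sketch of such an extension, assuming the RuleBasedRoutingSampler from opentelemetry-java-contrib (the package name, attribute keys, and patterns below are assumptions; adjust them to the semantic conventions your agent version emits):

// Hypothetical extension class. Build it into a jar together with the
// io.opentelemetry.contrib:opentelemetry-samplers dependency, then load it with:
//   -javaagent:opentelemetry-javaagent.jar -Dotel.javaagent.extensions=/path/to/extension.jar
package com.example.otel;

import com.google.auto.service.AutoService;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.contrib.sampler.RuleBasedRoutingSampler;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizer;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;

@AutoService(AutoConfigurationCustomizerProvider.class)
public class DropActuatorSpansCustomizer implements AutoConfigurationCustomizerProvider {

  @Override
  public void customize(AutoConfigurationCustomizer autoConfiguration) {
    // wrap the sampler produced by autoconfiguration and drop actuator SERVER spans
    autoConfiguration.addSamplerCustomizer(
        (fallback, config) ->
            RuleBasedRoutingSampler.builder(SpanKind.SERVER, fallback)
                .drop(AttributeKey.stringKey("url.path"), ".*/actuator.*")
                .drop(AttributeKey.stringKey("http.target"), ".*/actuator.*")
                .build());
  }
}

If you prefer not to pull in AutoService, a plain META-INF/services/io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider file listing the class name works as well.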

Filtering actuator requests with opentelemetry-spring-boot-starter

Another option is to filter out actuator requests through opentelemetry-spring-boot-starter.

However, this approach is invasive to application code. A cleaner variant is to bundle the capability together with spring-boot-actuator into an internal shared library (sketched below), but you would still have to teach every business developer what the library does, so it is not particularly elegant either.
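The starter supports programmatic SDK customization through an AutoConfigurationCustomizerProvider bean, so the same RuleBasedRoutingSampler can be wired up there; the class and package names below are hypothetical:

// Hypothetical auto-configuration class inside the shared library;
// requires the io.opentelemetry.contrib:opentelemetry-samplers dependency.
package com.example.observability;

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.contrib.sampler.RuleBasedRoutingSampler;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DropActuatorSpansConfiguration {

  @Bean
  public AutoConfigurationCustomizerProvider dropActuatorSpans() {
    // drop SERVER spans whose path looks like an actuator endpoint,
    // delegate everything else to the default sampler
    return customizer ->
        customizer.addSamplerCustomizer(
            (fallback, config) ->
                RuleBasedRoutingSampler.builder(SpanKind.SERVER, fallback)
                    .drop(AttributeKey.stringKey("url.path"), ".*/actuator.*")
                    .build());
  }
}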

Dropping actuator requests in the opentelemetry-collector

The third approach is to put an opentelemetry-collector between the Java applications and OpenObserve. The benefits are twofold: the collector aggregates the OTEL data reported by all Java applications and forwards it to OpenObserve in batches, which eases the load on OpenObserve (OpenObserve is built on an LSM-style storage structure with background merges); and the collector ships with a large number of processors for manipulating OTEL data.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  filter/skip_actuator:
    error_mode: ignore
    traces:
      span:
        # drop spans for actuator or healthz requests
        - attributes["http.route"] != nil and IsMatch(attributes["http.route"], ".*actuator.*")
        - attributes["http.target"] != nil and IsMatch(attributes["http.target"], ".*/healthz")
        - attributes["url.path"] != nil and IsMatch(attributes["url.path"], ".*actuator.*")
        # drop probe requests sent by kube-probe
        - attributes["http.user_agent"] != nil and IsMatch(attributes["http.user_agent"], "^kube-probe.*")

exporters:
  otlphttp/openobserve:
    endpoint: "http://openobserve.recircle-industry-platform-dev:5080/api/default"
    headers:
      Authorization: Basic cm9vdEBleGFtcGxlLmNvbTpjNEVxU1Jhb0F5SHNWTDVn
      stream-name: default

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/skip_actuator, memory_limiter, batch]
      exporters: [otlphttp/openobserve]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/openobserve]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/openobserve]
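
Once the collector is running, each application only needs its OTLP endpoint switched from OpenObserve to the collector, for example via the java-agent's standard environment variables (the collector Service name below is an assumption):

# hypothetical container env; point the agent at the collector's OTLP/HTTP receiver
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.recircle-industry-platform-dev:4318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"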

As an aside, otelbin.io is a handy website for visualizing opentelemetry-collector configurations.

This work is licensed under the Creative Commons Attribution 4.0 International License.

When reposting, please cite the original link: https://blog.hufeifei.cn/2025/06/Distribution/opentelemetry-spring-actuator/
