Skip to content

[Performance] A render query with many metrics takes a very long time (at least a minute) #823

@interfan7

Description

@interfan7

Problem description
Query a.*b*.c.d.* times out after a minute.

carbonapi's version
v0.16.0

Does this happened before
irrelevant (I didn't have an earlier version)

carbonapi's config
Will provide if relevant.

backend software and config
Go-Carbon 0.17.3.
Config will be provided if needed.

carbonapi performance metrics

  1. Please enable expvars in the config and provide it's output
    I believe this is the required expvars config:
expvar:
  enabled: true
  pprofEnabled: true
  listen: ""

IDK what output this question refers to.

  1. What's system load on a server? What's the memory consumption?
    The top doesn't seem to indicate any kind of throttling. I believe the userspace CPU load is somewhat higher during the query but still very low.
    Mem - free -h output:
me@graphite-dev:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           30Gi       3.1Gi       628Mi       1.0Mi        27Gi        27Gi
Swap:            0B          0B          0B

This is our dev host.

  1. If possible, please provide profiler's output (https://blog.golang.org/pprof for instruction how to do that, debug handlers can be enabled in carbonapi's config on a separate port). Golang allows different types of profiles. It's always better to give CPU Profile, Memory profile and Lock profile, but that depends on what's the symptoms.
    2 CPU profiles which are captured during the request are attached:
    curl_cpu_3
    curl_cpu_4

  2. Debug logs could be helpful as well
    Setting log-level to debug has immense implication on performance so for now I've set it to "info" level. The only 2 relevant log entries for this request (grepped a\.\*b\*\.c\.d\.\*) are:

"level":"ERROR","timestamp":"2024-02-18T13:51:21.454Z","logger":"access","message":"request failed","data":{"handler":"render","carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","url":"/render","peer_ip":"10.42.136.247","host":"{our graphite dev host}:4040","format":"json","use_cache":true,"targets":["a.*b*.c.d.*"],"cache_timeout":180,"runtime":59.89556336,"http_code":503,"reason":"timeout while fetching Response","from":1708263920,"until":1708264222,"from_raw":"1708263920","until_raw":"1708264222","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Grafana-Org-Id":"1","X-Panel-Id":"1"}}}

and some 10's MB's long entry which starts like this:

{"level":"WARN","timestamp":"2024-02-18T13:52:40.013Z","logger":"zipper","message":"errors occurred while getting results","type":"protoV2Group","name":"http://127.0.0.1:8080","type":"fetch","request":"&MultiFetchRequest{Metrics:[]FetchRequest{FetchRequest{Name:
{{  removed for brevity }}
,"carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","errors":"max tries exceeded","errorsVerbose":"max tries exceeded\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:25\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: error while fetching Response\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:34\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"

Query that causes problems

  1. GF queries metrics. It's render query. Query target is expressed by a.*b*.c.d.*.
  2. According to go-carbon logs, there are ~65.5K files (i.e. metrics) returned.
  3. resolution - I think it's irrelevant.
  4. datapoints amount - Also irrelevant to my mind?

Additional context
The carbonapi log reports status code of 503. The nginx reverse proxy indicates code 499 which is according to the internet means a client has halted the connection. Anyway, 1 minute feels too long anyway.

Metadata

Metadata

Assignees

No one assigned

    Labels

    performanceperformance related issues and questions

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions