LogDNA agent is failing with error ERROR logdna_agent::_main: bad request, check configuration: 400 Bad Request #617

Open
okarasov-sift opened this issue May 27, 2024 · 9 comments

Comments

@okarasov-sift

okarasov-sift commented May 27, 2024

Problem

  • The LogDNA agent is failing with the error "ERROR logdna_agent::_main: bad request, check configuration: 400 Bad Request"
  • It reproduces only on a GKE cluster node pool with a GPU
  • The LogDNA agent works in other node pools in the same GKE cluster with the same configuration
  • The problem can be worked around by excluding the LogDNA agent's own logs (/var/log/containers/logdna-agent-*logdna-)
  • If I replace CMD in the Docker image with sleep infinity, open sh inside the container, and run ./logdna-agent from the /work folder, it works on the GPU node as well.

Environment

  • GCP GKE node pool with a GPU card
  • All LogDNA agent versions are affected
  • Resource requests: cpu: 20m, limits: memory: 1G
@jakedipity
Contributor

Thanks for reporting @okarasov-sift. The error indicates a 400 response from our ingestion API, so it's a bit puzzling why this happens on a node pool with attached GPUs, and even more puzzling that the suggested workarounds work.

We can extract a bit more information about the bad request by setting the agent to log in debug mode. Can you set the following environment variable for the container and share the resulting logs: RUST_LOG=info,mz_http::client=debug. Alternatively, you can use the following patch command to modify a running daemonset:

kubectl patch daemonset -n logdna-agent logdna-agent --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"RUST_LOG","value":"info,mz_http::client=debug"}}]'

@okarasov-sift
Author

@jakedipity, thank you for your response.

I found the following in the log:

2024-06-04T23:56:23.784031Z ERROR logdna_agent::_main: Pod metadata is missing for line (retries=disabled): LazyLineSerializer { annotations: None, app: None, env: None, host: None, labels: None, level: None, meta: None, path: Some("/var/log/containers/logdna-agent-brf2z_logging_logdna-agent-dd1dc4f3ced576b2d2187232d18eac13eff5242264071373a1836c1b13a13bc3.log"), line_buffer: None, file_offset: (1062521, 5630, 5791), reader: Mutex { is_locked: false, has_waiters: false }, retry_events_send: Some(Sender { .. }) }
and the debug message:
2024-06-04T23:56:24.084091Z DEBUG mz_http::client: failed request: 400 Bad Request {"code":"BadRequest","error":"Missing hostname","status":"error"}

@okarasov-sift
Author

It looks like the root cause is the /etc/hostname file. Our GKE clusters use Container-Optimized OS version 105. On a node where the LogDNA agent works, /etc/hostname exists but is a directory. As far as I understand, there was no hostname file on that node at all, and the directory was created when the volume was set up during agent pod startup.
file /etc/hostname
/etc/hostname: directory
On a node where the LogDNA agent does not work, /etc/hostname is an empty file.
file /etc/hostname
/etc/hostname: ASCII text
I added the node hostname to this file, and that fixed the LogDNA agent crash-looping issue.

It is not clear how to fix this properly.
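
The cases described above (no file, a directory, an empty file, a file with a hostname) can be told apart programmatically. The following is a minimal sketch, not code from the agent: the `HostnameFileState` enum and the `inspect_hostname_file` function are hypothetical names introduced here for illustration.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical classification of a hostname file, mirroring the
/// situations observed on the working and failing nodes.
#[derive(Debug, PartialEq)]
enum HostnameFileState {
    /// Nothing exists at the path.
    Missing,
    /// The path exists but is a directory.
    Directory,
    /// The file exists but holds no usable hostname.
    Empty,
    /// The file holds a non-empty hostname (whitespace trimmed).
    Present(String),
}

/// Inspects `path` and reports which of the states above applies.
/// Unreadable files and whitespace-only files are reported as `Empty`.
fn inspect_hostname_file(path: &Path) -> HostnameFileState {
    if !path.exists() {
        return HostnameFileState::Missing;
    }
    if path.is_dir() {
        return HostnameFileState::Directory;
    }
    match fs::read_to_string(path) {
        Ok(contents) if !contents.trim().is_empty() => {
            HostnameFileState::Present(contents.trim().to_string())
        }
        _ => HostnameFileState::Empty,
    }
}
```

Calling `inspect_hostname_file(Path::new("/etc/hostname"))` on a node would show which case applies there; only the `Present` case gives the agent a hostname it can send.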

@jakedipity
Contributor

The agent daemonset mounts the hostname from the node to /etc/logdna-hostname as shown here and here. I don't know enough about Container-Optimized OS to say what's expected for the hostname file, but the agent does expect it to be set.

@okarasov-sift
Author

okarasov-sift commented Jun 5, 2024

@jakedipity, if the node's /etc/hostname (mounted as /etc/logdna-hostname) is absent, the agent takes /etc/hostname from inside the pod. That logic isn't correct, but it doesn't cause crash looping. The problem is that the node's /etc/hostname file (/etc/logdna-hostname) is present but empty. I think the agent should check whether /etc/logdna-hostname is empty and, if it is, take the value from /etc/hostname. Right now it only checks whether the file exists:

if path.exists() {

@jakedipity
Contributor

There are some downstream side effects of the node's /etc/hostname being empty. We use the supplied hostname in Kubernetes contexts to map out additional fields, including the node. If it falls back to the hostname of the container, it muddies the node field, since each container might have a different hostname.

We can definitely make the path check more robust and keep falling back until it finds a non-empty value, but it's also worth understanding why the node's /etc/hostname is empty and what the appropriate way is to get a hostname on GCP Container-Optimized OS.
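
As a rough illustration of "keep falling back until it finds a non-empty value", here is a minimal sketch. It is not the agent's actual implementation; `first_non_empty_hostname` is a hypothetical helper, and the candidate order is an assumption taken from the paths mentioned in this thread.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: returns the first non-empty, trimmed hostname
/// found by reading `candidates` in order.
///
/// `fs::read_to_string` fails for missing paths and for directories, so
/// those candidates are skipped the same way as empty files.
fn first_non_empty_hostname<P: AsRef<Path>>(candidates: &[P]) -> Option<String> {
    candidates.iter().find_map(|path| {
        let contents = fs::read_to_string(path).ok()?;
        let trimmed = contents.trim();
        if trimmed.is_empty() {
            None
        } else {
            Some(trimmed.to_string())
        }
    })
}
```

A call such as `first_non_empty_hostname(&["/etc/logdna-hostname", "/etc/hostname"])` would try the mounted node file first and the container file second. The caveat above still applies: if the second path wins, the value is the container's hostname rather than the node's.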

@jakedipity
Contributor

Additionally, it may be more appropriate to fetch the node name directly from Kubernetes. We already populate the node name into an environment variable here, but such a change would be breaking.
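
One way this could look, as a sketch only: prefer a node name supplied through the environment and use the file-based hostname as a fallback. `resolve_node_name` and `non_empty` are hypothetical helpers, and the variable name `NODE_NAME` in the usage note is an assumption; the thread does not state which variable the daemonset actually sets.

```rust
/// Hypothetical helper: trims a value and treats empty or
/// whitespace-only strings as absent.
fn non_empty(value: Option<String>) -> Option<String> {
    value
        .map(|v| v.trim().to_string())
        .filter(|v| !v.is_empty())
}

/// Hypothetical resolution order: the node name from the environment
/// wins; the hostname read from a file is used only when the
/// environment value is absent or empty.
fn resolve_node_name(
    env_value: Option<String>,
    file_hostname: Option<String>,
) -> Option<String> {
    non_empty(env_value).or_else(|| non_empty(file_hostname))
}
```

In a pod, the first argument could come from `std::env::var("NODE_NAME").ok()`, assuming the daemonset fills that variable from `spec.nodeName` through the Kubernetes Downward API. As noted above, switching the source of the hostname this way would be a breaking change for existing deployments.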

@okarasov-sift
Author

@jakedipity, I see that you merged a commit related to /etc/hostname. Does it resolve the issue? When do you plan to publish an official release?

@jakedipity
Contributor

jakedipity commented Jun 19, 2024

@okarasov-sift That commit doesn't alter the previous behavior of accepting an empty hostname as valid. This issue is minor, and we are currently beta testing the upcoming 3.10.0 release, so it is quite low on our priority list.

You're welcome to make the change yourself. The logic isn't difficult; it just requires an adequate test case for the new behavior. The recent commit should make testing the function straightforward.

Additionally, I think we should still have a discussion about what an appropriate source for the hostname (which is used as the cluster name downstream) is in container contexts.
