
Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields #11239

Open

wants to merge 12 commits into base: main
Conversation

@kaiyan-sheng (Contributor) commented Sep 24, 2024

Proposed commit message

This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration.

This PR also adds aws.metrics_names_fingerprint, a hash of the list of metric names that exist in each document. This way we don't have to use request_id as a dimension.
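The fingerprint idea can be sketched as follows. This is a minimal Python illustration, not the integration's actual implementation: the real fingerprint is computed in the ingest pipeline, and the exact field prefix, separator, and hash algorithm may differ.

```python
import hashlib

def metrics_names_fingerprint(doc):
    """Hash the sorted metric names in a document (illustrative sketch).

    Sorting makes the fingerprint independent of field order, so the
    same set of metric names always yields the same dimension value.
    """
    names = sorted(k for k in doc if ".metrics." in k)
    return hashlib.sha256("|".join(names).encode("utf-8")).hexdigest()

# The two test documents in this PR carry disjoint metric sets, so their
# fingerprints differ even though all other dimensions match.
doc1 = {
    "aws.kinesis.metrics.GetRecords_Success": {"count": 240},
    "aws.kinesis.metrics.GetRecords_Bytes": {"count": 240},
}
doc2 = {
    "aws.kinesis.metrics.GetRecords_IteratorAgeMilliseconds": {"count": 240},
    "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": {"count": 240},
}
```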

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally?

  1. Install firehose integration
  2. Install aws integration
  3. POST two documents using the Dev Tools console into the metrics-aws.cloudwatch-default data stream to mimic metrics coming in from Firehose:
```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "7628c293-ba54-4ea4-9abe-d256976c1dbd",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_Success": { "count": 240, "sum": 240, "avg": 1, "max": 1, "min": 1 },
  "aws.kinesis.metrics.GetRecords_Bytes": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Latency": { "count": 240, "sum": 1864, "avg": 7.766667, "max": 18, "min": 6 },
  "aws.kinesis.metrics.GetRecords_IteratorAge": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Records": { "count": 240, "sum": 0, "avg": 0, "min": 0, "max": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```
```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "70222524-bdc0-47f4-b7f8-adb7c6998bc7",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_IteratorAgeMilliseconds": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```
  4. Both POSTs should return a success response:
```
{
  "_index": ".ds-metrics-aws.kinesis-default-2024.09.25-000001",
  "_id": "kpqzKhZo8euJ-o_TAAABkiuFKMA",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 1,
  "_primary_term": 1
}
```
  5. These two documents have the same timestamp, dimensions, namespace, account ID, exporter ARN and region, but come from two different requests. With `aws.metrics_names_fingerprint` added as a TSDB dimension, both documents can be ingested into ES.
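The behaviour in step 5 can be modelled with a small sketch. This is a simplified model of how TSDB treats dimensions, not Elasticsearch's actual `_id` computation, and the fingerprint values below are hypothetical stand-ins:

```python
# Simplified model: a TSDB data stream derives a document's identity from
# @timestamp plus the values of all dimension fields. Two documents that
# agree on all of these are treated as the same sample, so the second
# write fails with a version conflict.
BASE_DIMENSIONS = [
    "cloud.account.id",
    "cloud.region",
    "aws.cloudwatch.namespace",
    "aws.exporter.arn",
    "aws.dimensions.StreamName",
]

def tsdb_identity(doc, extra_dimensions=()):
    dims = BASE_DIMENSIONS + list(extra_dimensions)
    return (doc["@timestamp"],) + tuple(doc.get(d) for d in dims)

shared = {
    "@timestamp": "2024-09-25T23:12:00.000Z",
    "cloud.account.id": "627286350134",
    "cloud.region": "eu-central-1",
    "aws.cloudwatch.namespace": "AWS/Kinesis",
    "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
    "aws.dimensions.StreamName": "test-esf-encrypted",
}
# Hypothetical fingerprints of the two documents' (different) metric-name sets.
doc1 = {**shared, "aws.metrics_names_fingerprint": "fp-of-getrecords-metrics"}
doc2 = {**shared, "aws.metrics_names_fingerprint": "fp-of-throughput-metrics"}
```

Without the fingerprint dimension the two identities collide; with it they do not, which is why the second POST succeeds once the field is a dimension.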

@andrewkroh andrewkroh added the Integration:awsfirehose Amazon Data Firehose label Sep 24, 2024
@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review September 24, 2024 22:02
@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner September 24, 2024 22:02
@kaiyan-sheng kaiyan-sheng self-assigned this Sep 24, 2024
@andrewkroh andrewkroh added bugfix Pull request that fixes a bug issue Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Sep 24, 2024
@axw (Member) left a comment

Why would we want these as dimensions? These fields describe the conduit for the data, rather than anything related to the metrics themselves. So I would see them as metadata rather than dimensions.

@kaiyan-sheng (Author) commented Sep 25, 2024

@axw For metrics coming in from the same Firehose stream, I see cases where two documents have the same timestamp, dimensions, namespace, account ID, exporter ARN and region but come from two different requests. Without request_id as a dimension, one of the documents gets dropped.

I'm still testing whether aws.firehose.arn needs to be a dimension for the case where the same metrics are ingested from two different Firehose streams, but in that case the exporter ARN will be different, so we should be OK.

@kaiyan-sheng (Author) commented

Testing with only the firehose integration assets installed, I was able to ingest documents with the same timestamp, dimensions, namespace, account ID, exporter ARN and region but different aws.firehose.request_id values.
But testing with both the firehose and aws integration assets installed, I still get an error when ingesting the second document, the one with a different aws.firehose.request_id:

```
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[kpqzKhZo8euJ-o_TAAABkicK60A][LI4OMA7GPJ422d7twNPxBkhU9sqRvtsl0QxVjz5Q9TriPoJxICdKr4x1LVlk@2024-09-25T02:38:00.000Z]: version conflict, document already exists (current version [1])",
        "index_uuid": "5r5AtNd8Qj-HMQkXPit4tw",
        "shard": "0",
        "index": ".ds-metrics-aws.kinesis-default-2024.09.25-000002"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[kpqzKhZo8euJ-o_TAAABkicK60A][LI4OMA7GPJ422d7twNPxBkhU9sqRvtsl0QxVjz5Q9TriPoJxICdKr4x1LVlk@2024-09-25T02:38:00.000Z]: version conflict, document already exists (current version [1])",
    "index_uuid": "5r5AtNd8Qj-HMQkXPit4tw",
    "shard": "0",
    "index": ".ds-metrics-aws.kinesis-default-2024.09.25-000002"
  },
  "status": 409
}
```

But adding aws.firehose.request_id as a dimension in the AWS integration solves the problem. Do we need to add these fields in both integrations? I will look into it more tomorrow.
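For reference, marking a field as a TSDB dimension in an integration package is done in the data stream's fields definition. A minimal sketch (file placement and description are illustrative, not this PR's exact diff):

```yaml
# fields.yml sketch: dimension: true makes the field part of the
# time-series identity for TSDB-enabled data streams.
- name: aws.firehose.request_id
  type: keyword
  dimension: true
  description: The ID of the Firehose request.
```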

@axw (Member) commented Sep 25, 2024

> For metrics coming in from the same Firehose stream, I see cases when two documents have the same timestamp, dimension, namespace, accountID, exportARN and region BUT from two different requests. Without specifying the request_id being a dimension, one of the documents get dropped.

Do we know why there are multiple requests with all the same dimensions? Are they retries, where one of them should get dropped?

@kaiyan-sheng (Author) commented Sep 25, 2024

@axw These are multiple requests with the same dimensions but different metric data points. For example, cpu.utilization and network.in will be in request number 1, and network.out will be in request number 2. Request number 2 can also contain metric data points for other services/timestamps.

@axw (Member) commented Sep 25, 2024

Also discussed on Slack. We're having to choose the best of a bad lot here:

  1. we could send each metric in its own document (see Challenge of one metric per document elasticsearch#91775), and make the metric name a dimension/identifying field for the time series
  2. we can continue grouping, but add the request ID to disambiguate

I don't like the second option (what this PR implements), but I think it's the most reasonable solution at the moment. The reason I don't like it is that it would make automatic rollups ineffective without some knowledge that request_id is something to ignore in rollups. That's not a problem for today, so we can kick that can down the road a bit longer.

@axw (Member) left a comment

@felixbarny and I had a chat on Slack about this, as it's another issue related to elastic/elasticsearch#99123. Although we can technically work around it using the request ID, that would reduce the storage efficiency of TSDB.

It's still a workaround and not ideal, but we should probably hash the metric names instead; the hash can then be included as a dimension. That has been done for the Prometheus integration, so I'd suggest following suit. I think we could do it in the integration's ingest pipeline?
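One possible shape for this in an ingest pipeline: a script processor collects the metric field names into a temporary field, and the built-in fingerprint processor hashes it. This is an illustrative sketch only; the pipeline name, field prefix, and handling of nested vs. flattened keys are assumptions, not the integration's actual pipeline:

```
PUT _ingest/pipeline/metrics-names-fingerprint-sketch
{
  "processors": [
    {
      "script": {
        "description": "Collect metric field names into a temporary field",
        "source": "def names = new ArrayList(); for (entry in ctx.entrySet()) { if (entry.getKey().contains('.metrics.')) { names.add(entry.getKey()); } } Collections.sort(names); ctx._metric_names = names;"
      }
    },
    {
      "fingerprint": {
        "fields": ["_metric_names"],
        "target_field": "aws.metrics_names_fingerprint",
        "method": "SHA-256"
      }
    },
    {
      "remove": { "field": "_metric_names" }
    }
  ]
}
```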

@kaiyan-sheng kaiyan-sheng changed the title Add aws.firehose.arn and aws.firehose.request_id fields Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields Sep 25, 2024
elastic-vault-github-plugin-prod (bot) commented Sep 25, 2024

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

@kaiyan-sheng kaiyan-sheng marked this pull request as draft September 25, 2024 22:57
@kaiyan-sheng kaiyan-sheng added the Integration:aws_bedrock Amazon Bedrock label Sep 25, 2024
@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review September 25, 2024 23:44
@andrewkroh andrewkroh added Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Sep 25, 2024
@elasticmachine commented

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@axw (Member) left a comment

Looks good, thanks. The main issue is the change in field access in the routing rules; it doesn't seem related to the main change?

@efd6 (Contributor) left a comment

LGTM as codeowner for aws_bedrock with one nit.

Some nits in other files.

@axw (Member) left a comment

LGTM, just a couple of minor comments in addition to @efd6's

```diff
@@ -1,7 +1,7 @@
 format_version: 3.0.0
 name: aws
 title: AWS
-version: 2.25.0-preview01
+version: 2.26.0
```
@kaiyan-sheng (Author) commented

I need to figure out the new version here and the Kibana version constraint before merging this PR. We don't want the preview features to go GA by accident.

@kaiyan-sheng (Author) commented

I think I will need to remove all the changes to the AWS integration from this PR and create a separate PR for the aws integration, releasing it as a bug fix that still supports the older 8.14.0 version.

@elasticmachine commented

💚 Build Succeeded

cc @kaiyan-sheng

Quality Gate failed

Failed conditions:
13.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

Labels
bugfix Pull request that fixes a bug issue Integration:aws_bedrock Amazon Bedrock Integration:aws AWS Integration:awsfirehose Amazon Data Firehose Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]