Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(datadog grok): enable the DOTALL mode by default #1022

Merged
merged 7 commits into from
Sep 26, 2024

Conversation

vladimir-dd
Copy link
Contributor

@vladimir-dd vladimir-dd commented Sep 6, 2024

Enables the DOTALL mode(dot includes \n) for the Datadog grok lib by default.

@@ -0,0 +1,2 @@
The `parse_groks` VRL function has the multiline mode enabled by default.
Copy link
Contributor Author

@vladimir-dd vladimir-dd Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it makes sense to introduce an optional parameter for users to control this, keep the existing behaviour or make a breaking change for this function? 🤔 Datadog's grok implementation enables the DOTALL mode by default(see the csv filter for example), which seems like a reasonable behaviour. Users can always use (?s) and (?-s) modifiers to control this behaviour.
Also please note that the parse_grok function does not use the datadog/grok lib, so in this case they would behave differently unless we change it too..
Any thoughts @jszwedko @bruceg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have as much context on vrl but my 2cents - agreed that making this the default behavior feels reasonable, and I think it would also make sense to eventually update parse_grok to have parity. If I'm looking at the vrl docs I would assume they share this base behavior.

@vladimir-dd vladimir-dd marked this pull request as draft September 7, 2024 20:26
@vladimir-dd vladimir-dd removed the request for review from bruceg September 8, 2024 08:06
@vladimir-dd vladimir-dd changed the title fix(datadog grok): support multiline logs fix(datadog grok): enable the DOTALL mode by default Sep 8, 2024
@vladimir-dd vladimir-dd marked this pull request as ready for review September 8, 2024 19:04
@vladimir-dd vladimir-dd marked this pull request as draft September 8, 2024 19:04
Comment on lines 178 to 186
let mut pattern = String::new();
// In Oniguruma the (?m) modifier is used to enable the DOTALL mode(dot includes newlines),
// as opposed to the (?s) modifier in other regex flavors.
// \A, \z - parses from the beginning to the end of string, not line(until \n)
pattern.push_str(r"\A");
pattern.push_str(r"(?m)\A"); // (?m) enables the DOTALL mode by default
pattern.push_str(&context.regex);
pattern.push_str(r"\z");

// our regex engine(onig) uses (?m) mode modifier instead of (?s) to make the dot match all characters
pattern = pattern.replace("(?s)", "(?m)").replace("(?-s)", "(?-m)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could be done all in one go:

    let pattern = [
        // In Oniguruma the (?m) modifier is used to enable the DOTALL mode(dot includes newlines),
        // as opposed to the (?s) modifier in other regex flavors.
        // \A, \z - parses from the beginning to the end of string, not line(until \n)
        r"(?m)\A", // (?m) enables the DOTALL mode by default
        &context.regex.replace("(?s)", "(?m)").replace("(?-s)", "(?-m)"),
        r"\z",
    ].concat();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, updated.

changelog.d/1022.breaking.md Outdated Show resolved Hide resolved
@vladimir-dd vladimir-dd marked this pull request as ready for review September 25, 2024 10:12
@tessneau tessneau added this pull request to the merge queue Sep 26, 2024
Merged via the queue into main with commit e1be30e Sep 26, 2024
15 checks passed
@tessneau tessneau deleted the vladimir-dd/opa-2298 branch September 26, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants