Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esdsl #69254

Merged
merged 25 commits into from
Aug 17, 2020
Merged

esdsl #69254

merged 25 commits into from
Aug 17, 2020

Conversation

ppisljar
Copy link
Member

@ppisljar ppisljar commented Jun 16, 2020

closes #70772

Summary

Adds expression function to access app arch search service from expressions.

esdsl function takes a single dsl parameter which should be stringified elasticsearch search request json

it will try to merge any filters,query,timerange present on the input context into the request.

it will output raw_es_response format, which can be automatically converted to datatable and kibana_table.
if raw es response contains docs and aggregation results the conversion to datatable will only take the docs.

it will log the request and response to inspector

followup:

  • parse_es_response function which allows you to better control conversion from es_raw_response to datatable.
  • handling timeRange on input context

Checklist

Delete any items that are not applicable to this PR.

@ppisljar ppisljar added Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) v8.0.0 Team:AppArch release_note:skip Skip the PR/issue when compiling release notes v7.9.0 labels Jun 16, 2020
@ppisljar ppisljar requested a review from a team as a code owner June 16, 2020 11:21
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@ppisljar ppisljar requested a review from lukeelmers June 16, 2020 11:21
@ppisljar ppisljar added the WIP Work in progress label Jun 16, 2020
Copy link
Member

@lukasolson lukasolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a quick code review pass, added a couple of minor comments below.

Other than that, we need unit tests on esRawResponse and esdsl .

@ppisljar ppisljar force-pushed the expressions/esdsl branch 3 times, most recently from 902b768 to bd89491 Compare July 16, 2020 13:14
@ppisljar
Copy link
Member Author

@elasticmachine merge upstream

@ppisljar ppisljar added review and removed WIP Work in progress labels Jul 16, 2020
Comment on lines 95 to 96
expressions.registerFunction(esdsl);
expressions.registerType(esRawResponse);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a note for myself here - currently we are registering esaggs from plugin.ts, but would be nice to move that here instead for consistency. I'll try to do that as we work toward server migration of esaggs.

src/plugins/data/public/search/expressions/esdsl.ts Outdated Show resolved Hide resolved
src/plugins/data/public/search/expressions/esdsl.ts Outdated Show resolved Hide resolved
src/plugins/data/public/search/expressions/esdsl.ts Outdated Show resolved Hide resolved
* under the License.
*/

import { EsRawResponse, esRawResponse } from './es_raw_response';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could also add a few tests for .to.kibana_datatable()

src/plugins/data/public/search/expressions/esdsl.ts Outdated Show resolved Hide resolved
@ppisljar
Copy link
Member Author

@elasticmachine merge upstream

@ppisljar ppisljar added v7.10.0 and removed v7.9.0 labels Jul 21, 2020
@apmmachine
Copy link
Contributor

apmmachine commented Jul 21, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #69254 updated]

  • Start Time: 2020-07-21T14:03:57.806+0000

  • Duration: 4 min 18 sec

@ppisljar
Copy link
Member Author

ppisljar commented Aug 3, 2020

@elasticmachine merge upstream

Copy link
Member

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the latest changes, and was so far unable to find another way to break it. Like we discussed earlier, I have a feeling we will need to address any other edge cases as they come up. But since there will be nobody initially using this function, and we are confident we have covered the most common query cases, I am comfortable merging as-is since I expect this will address the majority of cases.

dsl: string;
}

export type EsdslExpressionFunctionDefinition = ExpressionFunctionDefinition<
Copy link
Member

@lukeelmers lukeelmers Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: This should get added to the mapping of all function definitions provided by the expressions plugin in src/plugins/expressions/common/expression_functions/types.ts

Never mind; forgot this lives in the data plugin 😉

Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API looks pretty rough, and I've left some comments about it based on the spacetime work that I did. The way I would define "ready to use" here is: can I create a visualization in Canvas from flattening multiple aggregations?

{ inspectorAdapters: {} } as any
);

expect(result).toMatchSnapshot();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These snapshot tests are hard to understand from a reviewing perspective. Can you make specific assertions instead of using snapshots?


expect(result).toMatchSnapshot();
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests look incomplete to me:

  • What happens if the JSON is invalid?
  • What if the JSON has line breaks?
  • What if the context includes query, filters, and timerange?
  • What if the JSON includes a bool query, how is that query merged with the context?

async fn(input, args, { inspectorAdapters, abortSignal }) {
const searchService = getSearchService();

const dsl = JSON.parse(args.dsl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could throw, please add a test

},
{
dsl:
'{"index": "kibana_sample_data_logs", "size": 4, "body": { "_source": false, "query": { "term": { "machine.os.keyword": "osx"}}}}',
Copy link
Contributor

@wylieconlon wylieconlon Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the DSL... the section labelled body is the DSL. index should be a separate parameter to the function.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#query-filter-context-ex

const dsl = JSON.parse(args.dsl);
if (!dsl.body) dsl.body = {};

const dslBody = dsl.body;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument that you're calling body should be the only argument. Everything else should be top-level arguments to the expression function, such as:

esdsl index="my_index" dsl='{ "query": { "match_all": {} }, "size": 0, "aggs": { } }'

};

const convertResult = (body: SearchResponse<unknown>) => {
return body.hits.hits.length ? parseRawDocs(body.hits) : flatten(body.aggregations);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on this earlier, but because of the way the query DSL will return hits by default, I think aggregations should take priority.

id: key,
name: key,
meta: {
type: typeof rows[0][key],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a stricter definition of this type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we have anything at the moment, but hopefully elasticsearch can provide this kind of meta data on the response.

"meta": Object {
"field": "products.created_on",
"params": Object {},
"type": "object",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the JS type, but it's not actually useful for our purposes. I would have expected that arrays would be flattened more.

skipped: 0,
failed: 0,
},
hits: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a copy+paste response with the previous? Can it be stored as a variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, they are different, the second one having aggs and docs

"products.created_on": Array [
"2016-12-12T09:27:22.000Z",
"2016-12-12T09:27:22.000Z",
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This row value is not useful for visualizations, since it's an array. In the spacetime work that Joe and I did, this was the most important case to figure out. Both of us chose to unroll these nested arrays into multiple rows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this response handling is copied from joe's space time. I am hoping we can get elasticsearch to return meta information with responses and thus address this, but if you have a simple solution that would make this parsing better i am open for suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should follow up with this later. It might need special handling

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good to me, but if I understood correctly, there is no way to use date fields at the moment because they won't be converted to actual date objects. At least for canvas this limits possible use cases.

One option would be to add an auto-detection of ISO dates and converting them automatically or to have a separate function for that. Doesn't have to be part of this PR though.

}

const parseRawDocs = (hits: SearchResponse<unknown>['hits']) => {
return hits.hits.map((hit) => hit.fields).filter((hit) => hit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to map the fields? In most cases when we are working with raw documents we rely on the stuff in _source (at least that's how I used the results in most cases). Another option would be to use both and prefer fields if available.

# Conflicts:
#	src/plugins/data/public/public.api.md
#	src/plugins/data/public/search/search_service.ts
Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these API changes, I found it fairly easy to work with in Canvas. The main changes that I want to clarify is how we want fields to behave: it's the new standard, and I think it's wrong to ignore it. I would be comfortable merging this sooner and making those behavior changes later, but only if you add the tests that indicate missing support for fields.

"products.created_on": Array [
"2016-12-12T09:27:22.000Z",
"2016-12-12T09:27:22.000Z",
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should follow up with this later. It might need special handling

});

describe('esRawResponse', () => {
describe('converts aggregations to table', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for a multi-level aggregation, for example Terms + Date histogram, or Terms + Terms? I see that it's working the way I expected, but it wasn't clear until I actually ran this locally.

Screen Shot 2020-08-13 at 5 44 27 PM

});
});

describe('converts raw docs to table', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that there was previous discussion about how to handle the fields parameter if it's present, but the current behavior of just ignoring it is probably not the long-term behavior. I'm fine with changing the behavior for this, but I'd like to see a test that is only for the fields parameter. This test would indicate that it's intentionally ignored.

In the long term, I think fields and _source should be merged when creating the table response. If you want a shorter term solution, you could something like with aggregations and create a hierarchy of aggs > fields > _source.

}

const parseRawDocs = (hits: SearchResponse<unknown>['hits']) => {
return hits.hits.map((hit) => hit._source).filter((hit) => hit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ignoring the hits.fields parameter, which I know you previously discussed. I would encourage you to add support for this because it's now being documented as the primary way to retrieve more specific information. https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-fields.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An easy way would be to flatten out source fields by _source.<field name> and fields fields by fields.<field name>. This stays close to the format of Elasticsearch and gives the user all they need (renaming the stuff they need shouldn't be hard)

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this state. There are things to improve, but IMHO they shouldn't be part of the very first version of this. I left one suggestion about the fields vs source debate, otherwise I think this is a great addition.

Thanks for working on this!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
data 542 +2 540

page load bundle size

id value diff baseline
data 1.4MB +9.2KB 1.4MB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@ppisljar ppisljar merged commit 22f0641 into elastic:master Aug 17, 2020
@ppisljar ppisljar mentioned this pull request Aug 17, 2020
ppisljar added a commit to ppisljar/kibana that referenced this pull request Aug 17, 2020
ppisljar added a commit that referenced this pull request Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) release_note:skip Skip the PR/issue when compiling release notes review v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

esdsl expression function
8 participants