[FLINK-26940][table] Add the built-in function SUBSTRING_INDEX #24972
base: master
Maybe we should also support the Python API? If so, you could refer to how other functions are handled in these two files:
- docs/reference/pyflink.table/expressions.rst
- pyflink/table/expression.py
@dylanhz Thanks for your contribution, I left some comments.
...me/src/main/java/org/apache/flink/table/runtime/functions/scalar/SubstringIndexFunction.java
}

private int find(byte[] str, int beginIdx, byte[] target) {
    final int endIdx = str.length - target.length;
Should we check that str.length > target.length?
I believe it is unnecessary, as the current code would return -1 directly in this case, indicating that no matching substring was found, which aligns with the expected behavior.
BTW, I did find a bug on the next line: beginIndex < endIndex should be beginIndex <= endIndex. Corresponding test cases have been added.
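The boundary discussed above can be illustrated with a minimal sketch. This is not the PR's actual implementation: the class name `ByteFindSketch` and the loop body are assumptions, and only the signature and the `endIdx` computation come from the reviewed snippet. It shows why the loop bound has to be inclusive, and why no separate length check is needed when the target is longer than the string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the code from the PR.
public class ByteFindSketch {

    /** Returns the first index >= beginIdx at which target occurs in str, or -1. */
    public static int find(byte[] str, int beginIdx, byte[] target) {
        // Last start position at which target still fits.
        // Negative when target is longer than str, so the loop never runs.
        final int endIdx = str.length - target.length;
        // The bound is inclusive: a match may start exactly at endIdx.
        for (int i = Math.max(beginIdx, 0); i <= endIdx; i++) {
            int j = 0;
            while (j < target.length && str[i + j] == target[j]) {
                j++;
            }
            if (j == target.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] abc = "abc".getBytes(StandardCharsets.UTF_8);
        // With a strict '<' bound this full-length match would be missed.
        System.out.println(find(abc, 0, abc));
    }
}
```

With a strict `<` bound, a match that ends exactly at the end of the string (including the case where the target equals the whole string) would be reported as not found.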
if (count == 0) {
    return BinaryStringData.EMPTY_UTF8;
}
String str = expr.toString();
Could we use BinaryStringData to check for and find the substring directly, instead of materializing it as a String in advance?
Currently we don't have find and rfind for BinaryStringData. I think it would be great to introduce them into BinaryStringDataUtil because they are commonly used for string matching.
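As a rough illustration of what an `rfind` counterpart could look like, here is a sketch that works on raw UTF-8 bytes. The class name `ByteRFindSketch`, the method signature, and the meaning of `beginIdx` (the highest start position to consider) are all assumptions; nothing here is an existing BinaryStringDataUtil API. Matching on bytes is safe for valid UTF-8 because the encoding is self-synchronizing, so a byte-level match always corresponds to a character-level match.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a backward search; not an existing Flink utility.
public class ByteRFindSketch {

    /** Returns the last index <= beginIdx at which target occurs in str, or -1. */
    public static int rfind(byte[] str, int beginIdx, byte[] target) {
        // Clamp the start so that target always fits inside str.
        // The result is negative when target is longer than str.
        int start = Math.min(beginIdx, str.length - target.length);
        for (int i = start; i >= 0; i--) {
            int j = 0;
            while (j < target.length && str[i + j] == target[j]) {
                j++;
            }
            if (j == target.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] s = "a.b.c".getBytes(StandardCharsets.UTF_8);
        byte[] dot = ".".getBytes(StandardCharsets.UTF_8);
        System.out.println(rfind(s, s.length - 1, dot));
    }
}
```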
What is the purpose of the change
Add support for SUBSTRING_INDEX in SQL & Table API.
Examples:
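The intended semantics can be sketched on plain Java strings, assuming the function follows the usual SUBSTRING_INDEX(expr, delim, count) contract: a positive count returns everything before the count-th delimiter from the left, a negative count returns everything after the count-th delimiter from the right, and a count of 0 returns an empty string. The class `SubstringIndexSketch` is hypothetical and is not the PR's runtime code; the handling of an empty delimiter is an assumption.

```java
// Hypothetical reference sketch of the semantics; not the PR's implementation.
public class SubstringIndexSketch {

    public static String substringIndex(String expr, String delim, int count) {
        if (expr == null || delim == null) {
            return null; // NULL in, NULL out
        }
        // Assumption: an empty input or an empty delimiter yields an empty result.
        if (count == 0 || expr.isEmpty() || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            int idx = -delim.length();
            for (int k = 0; k < count; k++) {
                idx = expr.indexOf(delim, idx + delim.length());
                if (idx < 0) {
                    return expr; // fewer than count delimiters: whole string
                }
            }
            return expr.substring(0, idx);
        }
        int idx = expr.length();
        // Widen to long so that negating Integer.MIN_VALUE does not overflow.
        for (long k = 0; k < -(long) count; k++) {
            idx = expr.lastIndexOf(delim, idx - delim.length());
            if (idx < 0) {
                return expr; // fewer than |count| delimiters: whole string
            }
        }
        return expr.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        System.out.println(substringIndex("www.apache.org", ".", 2));  // www.apache
        System.out.println(substringIndex("www.apache.org", ".", -1)); // org
    }
}
```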
Brief change log
FLINK-26940
Verifying this change
This change added tests and can be verified as follows:
StringFunctionsITCase#substringIndexTestCases
Does this pull request potentially affect one of the following parts:
@Public(Evolving): yes
Documentation