-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix FORMAT function to better comply with Microsoft SQL Server specification #86225
SQL: Fix FORMAT function to better comply with Microsoft SQL Server specification #86225
Conversation
Pinging @elastic/es-ql (Team:QL) |
Hi @luigidellaquila, I've created a changelog YAML for you. |
@@ -136,39 +136,39 @@ SELECT ISO_DAY_OF_WEEK(birth_date) AS d, SUM(salary) s FROM test_emp GROUP BY d | |||
|
|||
d:i | s:l | |||
---------------+--------------- | |||
7 |386466 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for this, it's annoying for me as well...
IntelliJ keeps trimming blank spaces in csv-spec files on save, and there is apparently no way to avoid it, even disabling it from the IDE settings (not sure if it's a mix of IntelliJ + checkstyle + OSX).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luigidellaquila this is probably due to .editorconfig
. You could add some further config to it, which worked when I tested it:
diff --git a/.editorconfig b/.editorconfig
index 92111b6c074..00dc2f6fa83 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -225,3 +225,6 @@ indent_size = 2
[*.{xsd,xml}]
indent_size = 4
+
+[*.{csv,sql}-spec]
+trim_trailing_whitespace = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @pugnascotia, I'll try it
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments.
Also, I suggest changing the title of the PR to Fix FORMAT function to better comply with Microsoft SQL Server specification
. Not all the characteristics of MS format are supported.
// manage patterns that are different from MS to Java and have to be translated | ||
boolean replaced = false; | ||
for (String[] item : MS_TO_JAVA_PATTERNS) { | ||
if (i + item[0].length() <= pattern.length() && item[0].equals(pattern.substring(i, i + item[0].length()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract item[0].length()
outside if
and call it only once.
@@ -67,6 +86,105 @@ protected Function<TemporalAccessor, String> formatterFor(String pattern) { | |||
} | |||
}; | |||
|
|||
private static String msToJavaPattern(String pattern) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method should be unit tested. There are many edge cases with translating one format to another and we need to be extra careful with them and make sure everything is covered. Make this method package protected and add tests for it.
StringBuilder partialQuotedString = new StringBuilder(); | ||
|
||
boolean originQuoting = false; | ||
boolean targetQuoting = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better named as shouldQuoteTarget
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the naming is bad and misleading.
What I mean here is:
originQuoting
: the character in the original (MS) pattern was intended to be quoted (eg. previous char was a backslash or is inside a quoting block).targetQuoting
: last character in the generated (java) pattern has to be quoted, so I'm building a portion of pattern that will then be quoted. Unfortunately I cannot quote characters one by one, I have to accumulate portions of string that have to be quoted and then quote them properly
I'll try to find better names
StringBuilder result = new StringBuilder(pattern.length()); | ||
StringBuilder partialQuotedString = new StringBuilder(); | ||
|
||
boolean originQuoting = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better named as isStartQuote
?
result.append("'"); | ||
result.append(partialQuotedString.toString().replaceAll("'", "''")); | ||
result.append("'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this as a small method and use it here and above as well.
); | ||
assertEquals( | ||
"2019-09-04 04:10:37.12345678 AM", | ||
new Format(Source.EMPTY, dateTime, l("YYYY-MM-dd HH:mm:ss.ffffffff t"), zoneId).makePipe().asProcessor().process(null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add a test with YYYY
to show that the MS pattern is case-sensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added above
- code refactoring - further unit tests - fix .editorconfig to avoid trim of trailing whitespaces in csv-spec files
💚 CLA has been signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there more breaking changes apart from Y
vs. y
?
For Y
vs y
I would suggest to keep supporting upper case Y
and potentially emit a deprecation warning instead of breaking current behavior. Since even the docs uses Y
in the example it seems quite likely that users also do so.
@elasticmachine run CLA |
@Luegg yes, there are many breaking changes here, like A, N, G, VV... IMHO it could still make sense to treat Y as an exception, since it's pretty common, but probably we should not maintain the compatibility with all the other cases. |
I have mixed feelings about this. This is a bug and should be treated like it in my opinion: changing the output means fixing a faulty behavior in the first place. On the other hand, we could do as you suggest and be lenient towards such changes, but this could introduce a precedent: every bug fix should also consider any (if it wouldn't be a bug fix) introduced breaking changes and add deprecation warnings and so on. @bpintea @costin wdyt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Left some comments around the coding style to replace the majority of if .. continue
with else if
.
Regarding the changes, I don't think we need to be lenient - this function is specific to SQL Server and should act accordingly. The fact that Y
has been supported instead of y
is a bug - if folks want to use the Java formatter rules, they can do so using the regular DATE_FORMATTER
or simply change their queries to use y
instead. It is annoying however it has been a bug that affected also our documentation.
In short, I would just apply the changes without any special treatment of the previous behavior.
originalCharacterQuoted = false; | ||
lastTargetCharacterQuoted = true; | ||
partialQuotedString.append(c); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use else if
instead of continue
through-out this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm wondering just how close to the MS specification we will want to get? |
@bpintea MS and Java formats are not different only in the syntax, but also on some concepts. One of them is custom format specifiers, that don't seem to have an equivalent in Java. |
// and then translate the pattern | ||
result.append(item[1]); | ||
characterProcessed = true; | ||
i += (fragmentLength - 1); // fast-forward, because the replaced pattern could be longer than one character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the parentheses shouldn't be needed. Just in case any other changes will be needed.
Also, I'm generally in favour of using a while loop and explicitly incrementing by needed amount (vs. decrementing due to for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Based on the documentation,
FORMAT()
function should comply with Microsoft SQL Server Format specification.Current implementation uses Java DateTimeFormat as it is, only with replacement of basic pattern characters that are slightly different between the two specs, but it's not fully compliant.
The goal of this PR is to better translate Microsoft datetime patterns to the equivalent in Java, and addresses the cases that are not currently covered, ie.
Y
andy
in Java are equivalent, while in MS specY
is just a normal character, not interpreted as a pattern)Fixes #66560