{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1718755714.0","currentOid":""},"activityList":{"items":[{"before":"6161632a9580cd1a222daf8117c1850b87974155","after":"f1eca903f5c25aa08be80e9af2df3477e2a5a6ef","ref":"refs/heads/master","pushedAt":"2024-07-05T14:22:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48719][SQL] Fix the calculation bug of `RegrSlope` & `RegrIntercept` when the first parameter is null\n\n### What changes were proposed in this pull request?\n\nThis PR aims to fix the calculation bug of `RegrSlope`&`RegrIntercept` when the first parameter is null. Regardless of whether the first parameter(y) or the second parameter(x) is null, this tuple should be filtered out.\n\n### Why are the changes needed?\n\nFix bug.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, the calculation changes when the first value of a tuple is null, but the value is truly correct.\n\n### How was this patch tested?\n\nPass GA and test with `build/sbt \"~sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z linear-regression.sql\"`\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47105 from wayneguow/SPARK-48719.\n\nAuthored-by: Wei Guo \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & `RegrInte…"}},{"before":"4c1066d0654d9d1a9b5c7dc76825cd32bd819842","after":"6161632a9580cd1a222daf8117c1850b87974155","ref":"refs/heads/master","pushedAt":"2024-07-05T10:01:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48792][SQL] Fix regression for INSERT with partial column list to a table with char/varchar\n\n### What changes were proposed in this pull request?\n\n#41262 introduced a regression by applying literals with char/varchar type in query output for table insertions, see\n\nhttps://github.com/apache/spark/pull/41262/files#diff-6e331e8f1c67b5920fb46263b6e582ec6e6a253ee45543559c9692a72a1a40ecR187-R188\n\nThis causes bugs\n\n```java\n24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)\norg.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). 
## [SPARK-48792][SQL] Fix regression for INSERT with partial column list to a table with char/varchar
Pushed to `master` on 2024-07-05 by Kent Yao (yaooqinn) · Closes #47198 from yaooqinn/SPARK-48792

#41262 introduced a regression by applying literals with char/varchar type in the query output for table insertions (see https://github.com/apache/spark/pull/41262/files#diff-6e331e8f1c67b5920fb46263b6e582ec6e6a253ee45543559c9692a72a1a40ecR187-R188). This causes failures such as:

```java
24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

```java
org.apache.spark.SparkUnsupportedOperationException: VarcharType(64) is not supported yet.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.dataTypeUnsupportedYetError(QueryExecutionErrors.scala:993)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.newConverter(OrcSerializer.scala:209)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.$anonfun$converters$2(OrcSerializer.scala:35)
	at scala.collection.immutable.List.map(List.scala:247)
```

Bug fix with no user-facing change; covered by new tests.

## [SPARK-48815][CONNECT] Update environment when stopping connect session
Pushed to `master` on 2024-07-05 by Xiduo You (ulysses-you) · Closes #47223 from ulysses-you/env

The environment is now updated if any added files are removed when stopping a Connect session, keeping the resources in sync with the environment event and the UI. This is a user-facing change to the event and the UI. Manually tested: after `spark.addArtifact("/Users/cathy/Desktop/code/spark/dist/jars/zjsonpatch-0.3.0.jar")` followed by `spark.stop`, the resources shown in the UI environment are removed (screenshots in the PR).
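For context on the SPARK-48792 regression above, a minimal sketch of the kind of statement involved (the table name, column layout, and ORC format are illustrative assumptions, not taken from the PR):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical table with a varchar column, stored as ORC.
spark.sql("CREATE TABLE t_chars (c1 INT, c2 VARCHAR(64)) USING orc")

# INSERT with a partial column list: c2 is not listed, so Spark fills it in,
# which is the code path where the regression surfaced.
spark.sql("INSERT INTO t_chars (c1) VALUES (1)")
spark.sql("SELECT * FROM t_chars").show()
```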
## [SPARK-48767][SQL] Fix some error prompts when `variant` type data is invalid
Pushed to `master` on 2024-07-05 by Wenchen Fan (cloud-fan) · Closes #47162 from panbingkun/SPARK-48767 · Authored-by: panbingkun

Fixes some error prompts when `variant` type data is invalid, provides a clear `error-condition` for variant-related errors, and uses `checkError` to check exceptions in `VariantSuite`. Reproduction examples are in https://github.com/apache/spark/pull/47162#issuecomment-2202255944 (screenshots in the PR). When there is only `value` or only `metadata` in variant data, Spark throws "variant with more than two field", which is obviously incorrect (see https://github.com/apache/spark/blob/930422389352b8349e5a845c8cae9993d30dce17/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L405-L408). The error prompt also differs depending on whether `SQLConf.PARQUET_VECTORIZED_READER_ENABLED` is true or false, and should be aligned. This is a user-facing change to the error messages; tested with existing and updated UTs.

## [SPARK-48783][DOCS] Update the table-valued function docs
Pushed to `master` on 2024-07-05 by Wenchen Fan (cloud-fan) · Closes #47184 from allisonwang-db/spark-48783-tvf-docs · Authored-by: allisonwang-db

Updates the table-valued function SQL reference doc to include the new set of TVFs that can be used in the FROM clause of a query, along with examples. Documentation only; verified by existing tests.
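As a quick illustration of table-valued functions used directly in a FROM clause, the area the updated reference covers (a sketch, not an excerpt from the doc; `explode` and `range` are existing Spark TVFs):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A generator-style TVF in the FROM clause.
spark.sql("SELECT * FROM explode(array(10, 20, 30))").show()

# A range TVF with a column alias.
spark.sql("SELECT id * 2 AS doubled FROM range(1, 4)").show()
```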
## [SPARK-48806][SQL] Pass actual exception when url_decode fails
Pushed to `master` on 2024-07-04 by Kent Yao (yaooqinn), and cherry-picked to `branch-3.5` · Closes #47211 from wForget/SPARK-48806 · Lead-authored-by: wforget <643348094@qq.com> · Co-authored-by: Kent Yao

Follow-up to https://issues.apache.org/jira/browse/SPARK-40156: `url_decode` currently swallows the actual exception, which contains information useful for quickly locating the problem. For example:

```
select url_decode('https%3A%2F%2spark.apache.org');
```

previously produced only:

```
org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure that the URL is properly formatted and try again.
 at org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376)
 at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118)
 at org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
```

while the useful underlying exception was dropped:

```
java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 1 in: "2s"
```

After this change the cause is attached:

```
org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure that the URL is properly formatted and try again. SQLSTATE: 22546
	at org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:372)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:119)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
	...
Caused by: java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 1 in: "2s"
	at java.base/java.net.URLDecoder.decode(URLDecoder.java:237)
	at java.base/java.net.URLDecoder.decode(URLDecoder.java:147)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:116)
	... 135 more
```

No user-facing change beyond the richer error; covered by a unit test.
## [SPARK-48808][SQL] Fix NPE when connecting thriftserver through Hive 1.2.1 and the result schema is empty
Pushed to `master` on 2024-07-04 by Kent Yao (yaooqinn) · Closes #47213 from yaooqinn/SPARK-48808 · Lead-authored-by: Kent Yao · Co-authored-by: Yuming Wang

Older Hive JDBC/Thrift clients do not check whether the columns are null before handling them, so this PR sets an empty list when the result schema is empty. Bug fix for older Hive clients; no user-facing change; tested offline by wangyum.
## [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API
Pushed to `master` on 2024-07-04 by YangJie (LuciferYang) · Closes #47210 from LuciferYang/session.sqlContext · Authored-by: yangjie01

In several places Spark's internal code uses the bridged APIs based on `SparkSession#sqlContext` even though a `SparkSession` instance is available. This PR simplifies those call sites:

1. `SparkSession#sqlContext#read` -> `SparkSession#read`
2. `SparkSession#sqlContext#setConf` -> `SparkSession#conf#set`
3. `SparkSession#sqlContext#getConf` -> `SparkSession#conf#get`
4. `SparkSession#sqlContext#createDataFrame` -> `SparkSession#createDataFrame`
5. `SparkSession#sqlContext#sessionState` -> `SparkSession#sessionState`
6. `SparkSession#sqlContext#sharedState` -> `SparkSession#sharedState`
7. `SparkSession#sqlContext#streams` -> `SparkSession#streams`
8. `SparkSession#sqlContext#uncacheTable` -> `SparkSession#catalog#uncacheTable`

Each of these `SQLContext` methods simply delegates to the corresponding `SparkSession` member (for example `def read: DataFrameReader = sparkSession.read`), so calling the session directly decreases the nesting level of the API calls. No user-facing change; verified by GitHub Actions and a manual check of `SparkHiveExample`.

## [SPARK-48280][SQL] Improve collation testing surface area using expression walking
Pushed to `master` on 2024-07-04 by Wenchen Fan (cloud-fan) · Closes #46801 from mihailom-db/SPARK-48280 · Authored-by: Mihailo Milosevic

Introduces an Expression Walker in several forms to broaden collation test coverage: expression evaluation, SQL query examples, and codegen generation. Collations touched many functions and code paths, and these tests aim to catch existing errors and prevent new functions from landing without proper collation support. Relevant tickets opened as a byproduct of this testing: SPARK-48472, SPARK-48572, SPARK-48574, SPARK-48600, SPARK-48662. Testing-only change; no user-facing impact.
## [SPARK-48715][SQL] Integrate UTF8String validation into collation-aware string function implementations
Pushed to `master` on 2024-07-04 by Wenchen Fan (cloud-fan) · Closes #47131 from uros-db/make-valid · Authored-by: Uros Bojanic

Uses Spark's own invalid UTF-8 byte sequence replacement logic in `UTF8String` before all `.toString()` calls, instead of relying on Java to perform the replacement, so that results are consistent. User-facing: collation-aware string function implementations now use Spark's own invalid UTF-8 replacement rather than Java's. Verified by existing tests, with some changes in `UTF8StringSuite` and `CollationSupportSuite`.

## [SPARK-48799][SS] Refactor versioning for operator metadata read/write and callers
Pushed to `master` on 2024-07-04 by Jungtaek Lim (HeartSaVioR) · Closes #47203 from anishshri-db/task/SPARK-48799 · Authored-by: Anish Shrigondekar

Refactors versioning for operator metadata read/write and its callers, providing a clear separation of version management for operator-metadata updates; also needed for subsequent changes in this area. No user-facing change; existing unit tests pass (`OperatorStateMetadataSuite`: 10 tests run, all succeeded, in about 30 seconds).
## [SPARK-47046][BUILD][TESTS] Upgrade `mysql-connector-j` to 9.0.0
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47200 from wayneguow/upgrade_mysql_connector · Authored-by: Wei Guo

Upgrades `mysql-connector-j` from 8.4.0 to 9.0.0, a new GA release recommended for use. Full release notes: https://dev.mysql.com/doc/relnotes/connector-j/en/news-9-0-0.html. No user-facing change; passes GA.

## [SPARK-48780][SQL] Make errors in NamedParametersSupport generic to handle functions and procedures
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47189 from aokolnychyi/spark-48780 · Authored-by: Anton Okolnychyi

Makes the errors in `NamedParametersSupport` generic so the class can be reused for argument rearrangement in both functions and procedures. It is a subset of the changes from PR #47183, in preparation for adding support for stored procedures. No user-facing change; covered by existing tests.
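For readers unfamiliar with the named-argument machinery the SPARK-48780 entry above refers to, here is a small sketch of the kind of call `NamedParametersSupport` rearranges (assuming `mask` accepts named arguments as in the Spark SQL documentation; the literal values are arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Named arguments are passed as `name => value` and may appear in any order;
# NamedParametersSupport rearranges them into the positional order the function expects.
spark.sql(
    "SELECT mask('AbCD123-@$#', lowerChar => 'q', upperChar => 'Q', digitChar => 'd')"
).show()
```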
## [SPARK-48787][BUILD] Upgrade Kafka to 3.7.1
Pushed to `master` on 2024-07-03 by Dongjoon Hyun (dongjoon-hyun) · Closes #47191 from panbingkun/SPARK-48787 · Authored-by: panbingkun

Upgrades `kafka` from 3.7.0 to 3.7.1 (release notes: https://downloads.apache.org/kafka/3.7.1/RELEASE_NOTES.html). No user-facing change; passes GA.

## [SPARK-48790][TESTING] Use checkDatasetUnorderly in DeprecatedDatasetAggregatorSuite
Pushed to `master` on 2024-07-03 by Wenchen Fan (cloud-fan) · Closes #47196 from amaliujia/fix_tests · Authored-by: Rui Wang

Uses `checkDatasetUnorderly` in `DeprecatedDatasetAggregatorSuite`, since the tests should not depend on the ordering of the result. Test-only improvement; no user-facing change.

## [MINOR][TESTS] Replace `getResource` with `getWorkspaceFilePath` to enable `HiveUDFSuite` to run successfully in the IDE
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47194 from panbingkun/minor_HiveUDFSuite · Authored-by: panbingkun

Replaces `getResource` with `getWorkspaceFilePath` so that `HiveUDFSuite#hive struct udf` runs successfully in the IDE (before/after screenshots in the PR). Test-only change; verified by GA and manual testing.

## [SPARK-48282][SQL][FOLLOWUP] Fix FindInSet code generation
Pushed to `master` on 2024-07-03 by Wenchen Fan (cloud-fan) · Closes #47179 from uros-db/fix-findinset · Authored-by: Uros Bojanic

Fixes the codegen path for `FindInSet`. The error was introduced in the original PR (https://github.com/apache/spark/pull/46682) and caught by https://github.com/apache/spark/pull/46801. No user-facing change; covered by existing tests.
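For reference, `FindInSet` backs the `find_in_set` SQL function, whose interpreted and generated code paths should agree; a quick sanity-check sketch (values chosen arbitrarily):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# find_in_set returns the 1-based position of the string in the comma-separated
# list, or 0 if it is not present; here 'ab' is the third element, so 3.
spark.sql("SELECT find_in_set('ab', 'abc,b,ab,c,def') AS pos").show()
```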
## [SPARK-48760][SQL] Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change clustering columns
Pushed to `master` on 2024-07-03 by Wenchen Fan (cloud-fan) · Closes #47156 from zedtang/alter-table-cluster-by · Lead-authored-by: Jiaheng Tang · Co-authored-by: Wenchen Fan

Introduces the `ALTER TABLE ... CLUSTER BY` SQL syntax to change the clustering columns:

```sql
ALTER TABLE tbl CLUSTER BY (a, b); -- update clustering columns to a and b
ALTER TABLE tbl CLUSTER BY NONE;   -- remove clustering columns
```

The change updates the clustering columns for catalogs to use. Clustering columns are maintained in the CatalogTable's `PROP_CLUSTERING_COLUMNS` for the session catalog and in the Table's `partitioning` transform array for V2 catalogs, consistent with CREATE TABLE CLUSTER BY (https://github.com/apache/spark/pull/42577). This is a user-facing change: new SQL syntax and a new keyword, NONE. Covered by new unit tests.

## [SPARK-48774][SQL] Use SparkSession in SQLImplicits
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47173 from HyukjinKwon/minor-implicit-session

Uses `SparkSession` in `SQLImplicits`, since `SparkSession` is encouraged over `SQLContext`. No user-facing change; covered by the CI in the PR.

## [SPARK-48785][DOCS] Add a simple Python data source example in the user guide
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47187 from allisonwang-db/spark-48785-pyds-user-guide · Authored-by: allisonwang-db

Adds a self-contained, simple example implementation of a Python data source to the user guide to help users get started more quickly. Documentation only; verified by existing tests.
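The entry above only touches documentation, and the PR's own example is not reproduced here; the following is a separate rough sketch of the Python data source API shape it documents, assuming the `pyspark.sql.datasource` base classes, with all other names invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader


class DemoReader(DataSourceReader):
    def read(self, partition):
        # Yield rows matching the schema declared by the data source.
        yield (0, "a")
        yield (1, "b")


class DemoDataSource(DataSource):
    @classmethod
    def name(cls):
        return "demo"

    def schema(self):
        return "id INT, value STRING"

    def reader(self, schema):
        return DemoReader()


spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(DemoDataSource)
spark.read.format("demo").load().show()
```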
## [SPARK-48714][PYTHON] Implement `DataFrame.mergeInto` in PySpark
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47086 from xupefei/pyspark-mergeinto · Authored-by: Paddy Xu

Spark 4.0 added a new `df.mergeInto` API, but it was missing from PySpark; this PR adds it. Support for the API in the Spark Connect Python client will be added later by https://github.com/apache/spark/pull/46960. User-facing: PySpark users can now call `df.mergeInto`. Covered by new unit tests.
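A rough sketch of what a `mergeInto` call might look like, assuming the PySpark writer mirrors the Scala `MergeIntoWriter` chain (`whenMatched`/`whenNotMatched`/`merge`); the table name, columns, and condition are invented, and the target must be a table that supports row-level operations:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical source DataFrame to merge into an existing target table named "target".
source = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]).alias("source")

(
    source.mergeInto("target", col("target.id") == col("source.id"))  # assumed API shape
    .whenMatched()
    .updateAll()
    .whenNotMatched()
    .insertAll()
    .merge()
)
```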
## [SPARK-48710][PYTHON] Use NumPy 2.0 compatible types
Pushed to `master` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon) · Closes #47083 from codesorcery/SPARK-48710 · Authored-by: Patrick Marx

Replaces NumPy types removed in NumPy 2.0 with their equivalent counterparts and makes tests compatible with the new `__repr__` of numerical scalars. PySpark referenced code removed in NumPy 2.0: `np.NaN` (should be replaced with `np.nan`), `np.string_` (an alias for `np.bytes_`), `np.float_` (defined the same as `np.double`), and `np.unicode_` (an alias for `np.str_`). NumPy 2.0 also changed the `__repr__` of numerical scalars to include type information (e.g. `np.int32(3)` instead of `3`); the old behavior can be enabled by setting `numpy.printoptions(legacy="1.25")` (or the older `1.21` and `1.13` legacy modes), and multiple tests and doctests rely on it. No user-facing change. Tests for the `pyspark-connect`, `pyspark-core`, `pyspark-errors`, `pyspark-mllib`, `pyspark-pandas`, `pyspark-sql`, `pyspark-resource`, and `pyspark-testing` modules were executed in a local venv with `numpy==2.0.0` installed.
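Spelling out the replacements listed above (the `legacy="1.25"` print mode comes from the entry; the rest is standard NumPy):

```python
import numpy as np

# Aliases removed in NumPy 2.0 and the equivalents used instead:
nan_value = np.nan            # was np.NaN
byte_str = np.bytes_(b"a")    # was np.string_
dbl_value = np.double(1.5)    # was np.float_
uni_str = np.str_("a")        # was np.unicode_

# NumPy 2.0 prints scalars with type info, e.g. repr(np.int32(3)) == "np.int32(3)".
# The pre-2.0 repr can be restored for doctests via the legacy print mode:
with np.printoptions(legacy="1.25"):
    print(repr(np.int32(3)))  # prints "3" under the legacy mode
```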
## [SPARK-48710][PYTHON][3.5] Limit NumPy version to supported range (>=1.15,<2)
Pushed to `branch-3.4` and `branch-3.5` on 2024-07-03 by Hyukjin Kwon (HyukjinKwon); the branch-3.4 push is a cherry-pick of the branch-3.5 commit · Closes #47175 from codesorcery/SPARK-48710-numpy-upper-bound · Authored-by: Patrick Marx

Adds a `numpy<2` constraint to the PySpark package. PySpark references code removed in NumPy 2.0, so executing PySpark may fail when `numpy>=2` is installed. https://github.com/apache/spark/pull/47083 makes the `master` branch compatible with NumPy 2; this PR adds a version bound for the older release lines, where that change is not applied. User-facing: NumPy is limited to `numpy<2` when installing `pyspark` with the `ml`, `mllib`, `sql`, `pandas_on_spark`, or `connect` extras. Verified via existing CI jobs.

## [SPARK-48770][SS] Change to read operator metadata once on driver to check if we can find info for numColsPrefixKey used for session window agg queries
Pushed to `master` on 2024-07-02 by Jungtaek Lim (HeartSaVioR) · Closes #47167 from anishshri-db/task/SPARK-48770 · Authored-by: Anish Shrigondekar

Reads the operator metadata once on the driver to check whether the `numColsPrefixKey` info (used for session window aggregation queries) is available, avoiding reading the operator metadata file multiple times on the executors. No user-facing change; existing unit tests pass (`RocksDBStateDataSourceReadSuite`: 14 tests run, all succeeded).

## [SPARK-48589][SQL][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source
Pushed to `master` on 2024-07-02 by Jungtaek Lim (HeartSaVioR) · Closes #46944 from eason-yuchen-liu/skipSnapshotAtBatch · Lead-authored-by: Yuchen Liu

Defines two new options for the existing state reader, `snapshotStartBatchId` and `snapshotPartitionId`, which must be provided together:

1. If there is no snapshot file at `snapshotStartBatch` (note the off-by-one between version and batch id), an exception is thrown.
2. Otherwise the reader rebuilds the state by reading delta files only, ignoring all snapshot files afterwards.
3. If a `batchId` option is also specified, that batch id is the ending batch id.
4. The feature supports state generated by the HDFS state store provider and by the RocksDB state store provider with changelog checkpointing enabled; it does not support RocksDB with changelog disabled, which is the default for RocksDB.

When a snapshot is corrupted, users may want to bypass it when reading a later state; these options let them specify the starting snapshot version and partition, which is useful for debugging. No user-facing change otherwise. Tested with edge cases for the new options, a test for the new public function `replayReadStateFromSnapshot`, and integration tests (backed by golden files rather than states generated within the tests) against four stateful operators: limit, aggregation, deduplication, and stream-stream join.
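A sketch of how the new options from the SPARK-48589 entry might be passed to the state data source reader; the `statestore` format name and the checkpoint path are assumptions not stated in the entry, and the option values are arbitrary:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rebuild state for partition 3 starting from the snapshot written at batch 10,
# replaying only delta files from that point on.
state_df = (
    spark.read.format("statestore")       # assumed format name of the state data source
    .option("snapshotStartBatchId", 10)
    .option("snapshotPartitionId", 3)
    .load("/tmp/streaming-checkpoint")    # hypothetical streaming checkpoint location
)
state_df.show()
```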
## [SPARK-48177][BUILD] Upgrade `Apache Parquet` to 1.14.1
Pushed to `master` on 2024-07-02 by Dongjoon Hyun (dongjoon-hyun) · Closes #46447 from Fokko/fd-bump-parquet · Authored-by: Fokko Driesprong

Upgrades Apache Parquet to 1.14.1, which fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140. No user-facing change; verified with the existing unit tests.

## [SPARK-48759][SQL] Add migration doc for CREATE TABLE AS SELECT behavior change since Spark 3.4
Pushed to `branch-3.4` and `branch-3.5` on 2024-07-02 by Wenchen Fan (cloud-fan); the branch-3.4 push is a cherry-pick of the branch-3.5 commit · Closes #47178 from asl3/allowNonEmptyLocationInCTAS-3.5 · Authored-by: Amanda Liu

Follow-up to #47152 against `branch-3.5`, adding a migration guide entry for the `CREATE TABLE AS SELECT ...` behavior change. SPARK-41859 changes the behavior of `CREATE TABLE AS SELECT ...` from OVERWRITE to APPEND when `spark.sql.legacy.allowNonEmptyLocationInCTAS` is set to `true`:

```
drop table if exists test_table;
create table test_table location '/tmp/test_table' stored as parquet as select 1 as col union all select 2 as col;
drop table if exists test_table;
create table test_table location '/tmp/test_table' stored as parquet as select 3 as col union all select 4 as col;
select * from test_table;
```

This produces {3, 4} in Spark < 3.4.0 and {1, 2, 3, 4} in Spark 3.4.0 and later, a silent change in `spark.sql.legacy.allowNonEmptyLocationInCTAS` behavior that introduces wrong results in user applications. Documentation only; verified by a doc build.