Pulse · apache/spark · GitHub

June 29, 2024 – July 6, 2024

Overview

56 Active pull requests
- 0 Merged pull requests
- 56 Open pull requests

0 Active issues
- 0 Closed issues
- 0 New issues

56 Pull requests opened by 32 people

[SPARK-48756][CONNECT][PYTHON] Support for `df.debug()` in Connect Mode
#47153 opened Jun 30, 2024
[WIP] [SPARK-46837] [SPARK-48700] Mode expression for complex types (all collations)
#47154 opened Jun 30, 2024
[SPARK-48763][CONNECT][BUILD] Move connect server and common to builtin module
#47157 opened Jul 1, 2024
[SPARK-48750][SQL] AQEPropagateEmptyRelation convert broadcast query stage plan to empty relation causing error
#47158 opened Jul 1, 2024
[SPARK-48769][SQL] Support constant folding for ScalaUDF
#47164 opened Jul 1, 2024
[SPARK-48771][SQL] Speed up `LogicalPlanIntegrity.validateExprIdUniqueness` for large query plans
#47170 opened Jul 1, 2024
[SPARK-48773] Document config "spark.default.parallelism" by config builder framework
#47171 opened Jul 2, 2024
[DO-NOT-MERGE][DRAFT] Extract shared DataFrameSuite for Spark Connect
#47174 opened Jul 2, 2024
[SPARK-48775][SQL][STS] Replace SQLContext with SparkSession in STS
#47176 opened Jul 2, 2024
[SPARK-48776] Fix timestamp formatting for json, xml and csv
#47177 opened Jul 2, 2024
[SPARK-46625] CTE with Identifier clause as reference
#47180 opened Jul 2, 2024
Introduce versioning to jdbc connectors
#47181 opened Jul 2, 2024
[DO-NOT-MERGE] Structured logging style
#47182 opened Jul 2, 2024
[SPARK-44167][SQL] Support loading stored procedures in catalogs
#47183 opened Jul 2, 2024
[SPARK-48307][SQL][FOLLOWUP] Eliminate the use of mutable.ArrayBuffer
#47185 opened Jul 2, 2024
[SPARK-48784][SQL] Add ::: syntax as a shorthand for try_cast
#47186 opened Jul 2, 2024
[SPARK-48772][SS][SQL] State Data Source Change Feed Reader Mode
#47188 opened Jul 2, 2024
[SPARK-44167][SQL] Add Catalog APIs for loading stored procedures
#47190 opened Jul 2, 2024
[SPARK-48628][CORE] Add task peak on/off heap memory metrics
#47192 opened Jul 3, 2024
[Only Test] Make HiveGenericUDF's DeferredObject lazy
#47193 opened Jul 3, 2024
[SPARK-48802][SS][FOLLOWUP] FileStreamSource maxCachedFiles set to 0 causes batch with no files to be processed
#47195 opened Jul 3, 2024
[SPARK-48791][CORE] Fix perf regression caused by the accumulators registration overhead using CopyOnWriteArrayList
#47197 opened Jul 3, 2024
[SPARK-48793][SQL][TESTS] Unify v1 and v2 `ALTER TABLE .. DROP (COLUMN | COLUMNS) ...` tests
#47199 opened Jul 3, 2024
[SPARK-48798][PYTHON] Introduce `spark.profile.render` for SparkSession-based profiling
#47202 opened Jul 3, 2024
[SPARK-48800][CONNECT][SS] Deflake ClientStreamingQuerySuite
#47205 opened Jul 4, 2024
[SPARK-48801][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.1
#47206 opened Jul 4, 2024
[ONLY TEST][HOLD] Upgrade rocksdbjni to 9.3.1
#47207 opened Jul 4, 2024
[SPARK-48803][SQL] Throw internal error in Orc(De)serializer to align with ParquetWriteSupport
#47208 opened Jul 4, 2024
[SPARK-48804][SQL] Add classIsLoadable & OutputCommitter.isAssignableFrom check for output committer class configurations
#47209 opened Jul 4, 2024
[SPARK-48807][SQL] Binary Support for CSV datasource
#47212 opened Jul 4, 2024
[SPARK-48809][PYTHON][DOCS] Reimplemented `spark version drop down` of the `PySpark doc site` and fix bug
#47214 opened Jul 4, 2024
[SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server.
#47215 opened Jul 4, 2024
[SPARK-48280][SQL][FOLLOWUP] Improve collation testing surface area using expression walking
#47216 opened Jul 4, 2024
[SPARK-48812][SQL][TESTS] Add some test suites for `mariadb` jdbc connector
#47217 opened Jul 4, 2024
[MINOR][SQL][TESTS] Optimize the run tests command in the doc of suites in `docker-integration-tests` module
#47218 opened Jul 4, 2024
[SPARK-29430][DOCS] Documented new metric endpoints for Prometheus
#47219 opened Jul 4, 2024
[SPARK-48813][SQL][DOCS] Add a notice that `mariadb` protocol does not apply when the database is `MariaDB`
#47220 opened Jul 4, 2024
[SPARK-48814][BUILD] Upgrade `tink` to 1.14.0
#47221 opened Jul 4, 2024
[MINOR][PYTHON] Eliminating warnings for panda
#47222 opened Jul 5, 2024
[SPARK-48817][SQL] Eagerly execute union multi commands together
#47224 opened Jul 5, 2024
[SPARK-48818][PYTHON] Simplify `percentile` functions
#47225 opened Jul 5, 2024
[SPARK-48820][SQL][DOC] Correct the examples for Collate functions
#47226 opened Jul 5, 2024
[SPARK-48816][SQL] Shorthand for interval converters in UnivocityParser
#47227 opened Jul 5, 2024
[SPARK-48719][SQL][3.5] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
#47230 opened Jul 5, 2024
[SPARK-48821][SQL] Support Update in DataFrameWriterV2
#47233 opened Jul 5, 2024
[MINOR][DOCS] Adding example to `drop_duplicates`
#47234 opened Jul 5, 2024
[MINOR][DOCS] Add example to `countDistinct`
#47235 opened Jul 5, 2024
[SPARK-48823][MINOR][DOCS] Improve clarity in `lag` docstring
#47236 opened Jul 5, 2024
[SPARK-48822][MINOR][DOCS] Add examples section header to `format_number` docstring
#47237 opened Jul 5, 2024
[DO-NOT-MERGE][SPARK-47047][SS] Add changes to support reading transformWithState value state variables
#47238 opened Jul 5, 2024
[SPARK-48592][INFRA] Add structured logging style script and GitHub workflow
#47239 opened Jul 5, 2024
[SPARK-48825][DOCS] Unify the 'See Also' section formatting across PySpark docstrings
#47240 opened Jul 5, 2024
[SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2
#47241 opened Jul 6, 2024
[SPARK-48177][BUILD][FOLLOWUP] Update parquet version in `sql-data-sources-parquet.md` doc
#47242 opened Jul 6, 2024
[WIP][SPARK-48827] Upgrade `RoaringBitmap` to 1.2.0
#47243 opened Jul 6, 2024
[SPARK-48828][DOCS][MINOR] Update documentation to add `column` as alias of `col`
#47244 opened Jul 6, 2024

40 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[SPARK-48726] Create the StateSchemaV3 file format, and write this out for the TransformWithStateExec operator
#47104 commented on Jul 4, 2024 • 75 new comments
[SPARK-48742][SS] Virtual Column Family for RocksDB
#47107 commented on Jul 5, 2024 • 54 new comments
[SPARK-48755] State V2 base implementation and ValueState support
#47133 commented on Jul 3, 2024 • 38 new comments
[SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2
#47097 commented on Jul 3, 2024 • 14 new comments
[SPARK-48343][SQL] Introduction of SQL Scripting interpreter
#47026 commented on Jul 4, 2024 • 12 new comments
[SPARK-48794][CONNECT] df.mergeInto support for Spark Connect (Scala and Python)
#46960 commented on Jul 5, 2024 • 10 new comments
[SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set
#47149 commented on Jul 1, 2024 • 8 new comments
[WIP][SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark
#47145 commented on Jul 6, 2024 • 5 new comments
[WIP][SPARK-48529][SQL] Introduction of Labels in SQL Scripting
#47146 commented on Jul 2, 2024 • 4 new comments
[SPARK-48613][SQL] SPJ: Support auto-shuffle one side + less join keys than partition keys
#47064 commented on Jul 3, 2024 • 4 new comments
[SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect
#46849 commented on Jul 3, 2024 • 3 new comments
[SPARK-48493][PYTHON] Enhance Python Datasource Reader with direct Arrow Batch support for improved performance
#46826 commented on Jul 4, 2024 • 3 new comments
[SPARK-48440][SQL] Fix StringTranslate behaviour for non-UTF8_BINARY collations
#46761 commented on Jul 5, 2024 • 3 new comments
[SPARK-46741][SQL] Cache Table with CTE won't work
#44767 commented on Jul 3, 2024 • 3 new comments
[SPARK-48414][PYTHON] Fix breaking change in python's `fromJson`
#46737 commented on Jul 2, 2024 • 2 new comments
[SPARK-48739][SQL] Disable writing collated data to file formats that don't support them in non managed tables
#47127 commented on Jul 1, 2024 • 2 new comments
[SPARK-43242][CORE] Fix throw 'Unexpected type of BlockId' in shuffle corruption diagnose
#40921 commented on Jul 2, 2024 • 2 new comments
[SPARK-48698][SQL] Support analyze column stats for tables with collated columns
#47072 commented on Jul 1, 2024 • 2 new comments
[SPARK-48740][SQL] Catch missing window specification error early
#47129 commented on Jul 1, 2024 • 1 new comment
[SPARK-39901][CORE][SQL] Redesign `ignoreCorruptFiles` to make it more accurate by adding a new config `spark.files.ignoreCorruptFiles.errorClasses`
#47090 commented on Jul 3, 2024 • 1 new comment
[SPARK-48743][SQL][SS] MergingSessionIterator should better handle when getStruct returns null
#47134 commented on Jul 4, 2024 • 1 new comment
[SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry
#47084 commented on Jul 4, 2024 • 1 new comment
[SPARK-48696][SQL][CONNECT] Also truncate the schema row for show function
#47078 commented on Jul 1, 2024 • 1 new comment
[SPARK-48703][SQL][TESTS] Upgrade `mssql-jdbc` to 12.6.3.jre11
#47075 commented on Jul 4, 2024 • 1 new comment
[SPARK-48694][CORE] Manage memory used by external cache
#47067 commented on Jul 1, 2024 • 1 new comment
[SPARK-48669][K8S] K8s resource name prefix follows `DNS Subdomain Names` rule
#47039 commented on Jul 1, 2024 • 1 new comment
[SPARK-48667][PYTHON] Arrow python UDFS didn't support UDT as outputType
#47036 commented on Jul 1, 2024 • 1 new comment
[SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant
#46831 commented on Jul 6, 2024 • 1 new comment
[SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL
#46707 commented on Jul 4, 2024 • 1 new comment
[SPARK-47780][SQL] Make catalyst-generated classes stable and uniquely named
#45955 commented on Jul 2, 2024 • 1 new comment
[SPARK-47217][SQL] Fix deduplicated expression resolution
#45552 commented on Jul 7, 2024 • 1 new comment
[SPARK-24497][SQL] Support recursive SQL
#40744 commented on Jul 3, 2024 • 1 new comment
[SPARK-40193][SQL] Merge subquery plans with different filters
#37630 commented on Jul 1, 2024 • 1 new comment
[SPARK-48592][INFRA] Add scala style check for logging message inline variables
#46947 commented on Jul 2, 2024 • 0 new comments
[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`
#46873 commented on Jul 1, 2024 • 0 new comments
[SPARK-48441][SQL] Fix StringTrim behaviour for non-UTF8_BINARY collations
#46762 commented on Jul 5, 2024 • 0 new comments
[WIP][SPARK-48725][SQL] Integrate CollationAwareUTF8String.lowerCaseCodePoints into string expressions
#47132 commented on Jul 2, 2024 • 0 new comments
[SPARK-47482] Add HiveDialect to sql module
#45644 commented on Jun 30, 2024 • 0 new comments
[SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre
#42493 commented on Jul 2, 2024 • 0 new comments
[WIP][SPARK-48350][SQL] Introduction of Custom Exceptions for Sql Scripting
#47147 commented on Jul 2, 2024 • 0 new comments