Tags · bluecipher/gporca

v3.97.1

Point ORCA pipelines to the gpdb 5X branch

With ORCA going into GPDB, we'll now be using the infrastructure scripts
in the gpdb5 repo instead of the gpdb master repo.

Apr 2, 2020
f604412

v3.97.0

Only normalize histogram if well defined

When calculating the statistics of a filter node, the histograms of
newly projected columns are set to empty. Such histograms are not "well
defined" and thus should not be used to derive cardinality. Instead, we
use the default cardinality in such cases.

However, there was a bug which calculated the cardinality from non "well
defined" histograms in the presence of a disjunction on newly projected
columns. This would result in gross underestimation of expected rows of
the filter.

This commit fixes this issue.
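
The guard described above can be sketched as follows. This is a minimal
illustration and not ORCA's code: `Histogram`, `filter_cardinality` and
the value of `DEFAULT_SELECTIVITY` are hypothetical, chosen only to show
the control flow.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed fallback selectivity; the default ORCA actually uses is not stated here.
DEFAULT_SELECTIVITY = 0.4


@dataclass
class Histogram:
    """Hypothetical histogram made of (lower, upper, frequency) buckets."""
    buckets: List[Tuple[float, float, float]] = field(default_factory=list)
    well_defined: bool = True

    def selectivity(self, lo: float, hi: float) -> float:
        # Coarse on purpose: count only buckets that lie fully inside [lo, hi].
        return sum(f for (b_lo, b_hi, f) in self.buckets if b_lo >= lo and b_hi <= hi)


def filter_cardinality(input_rows: float, hist: Histogram, lo: float, hi: float) -> float:
    """Estimate the rows passing lo <= col <= hi."""
    if not hist.well_defined:
        # Newly projected column: its histogram is empty, so fall back to the
        # default instead of deriving (and underestimating) from the histogram.
        return input_rows * DEFAULT_SELECTIVITY
    return input_rows * hist.selectivity(lo, hi)
```

Without the `well_defined` check, an empty histogram would yield a
selectivity of zero in this sketch, which is the kind of underestimation
the commit message describes.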

Co-authored-by: Ashuka Xue <[email protected]>
Co-authored-by: Shreedhar Hardikar <[email protected]>

Mar 26, 2020
922dd7c

v3.96.0

Create multi-phase DQAs only if all aggs are splittable

Consider the case below:

    create table foo (a citext, b citext);
    explain select min(a), count(distinct a) from foo;

Today in GPDB, no combine function exists for a `min` on citext, so
`ExecInitAgg` will fail for the top-level aggregate. Aggs with no combine
function are called non-splittable.

So we should create multi-phase DQAs only if all participating aggs are
splittable.
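
The rule above amounts to an "all aggregates are splittable" check before
a multi-phase plan is generated. A minimal sketch, assuming a hypothetical
`AggInfo` descriptor (ORCA's actual metadata classes are different, and
the combine function name used in the usage note is a placeholder):

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AggInfo:
    """Hypothetical aggregate descriptor; combine_fn is None if none exists."""
    name: str
    arg_type: str
    combine_fn: Optional[str]

    @property
    def splittable(self) -> bool:
        # An aggregate can be split into phases only if partial results
        # can be combined.
        return self.combine_fn is not None


def can_use_multi_phase_dqa(aggs: Iterable[AggInfo]) -> bool:
    """Allow multi-phase DQA plans only if every participating agg is splittable."""
    return all(agg.splittable for agg in aggs)
```

For the query above, `can_use_multi_phase_dqa([AggInfo("min", "citext", None),
AggInfo("count", "citext", "placeholder_combine")])` is false in this
sketch, so only a single-phase plan would be considered.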

Mar 26, 2020
7fce634

v3.95.0

Merge pull request #575 from sambitesh/window_filter

Check if there is direct dispatchable filter

Mar 16, 2020
717fdd9

v3.94.0

Allow only equality comparisons for Dynamic Partition Elimination

This commit only allows equality comparisons when doing dynamic
partition elimination. It will be the default behavior moving forward.

Non-equality predicates for dynamic partition elimination are currently
expensive to execute since the executor must iterate over all the
partition rules for each row from its subtree and execute the
non-equality predicate. So for cases where there are a large number of
rows and/or partitions, this process of selecting the partition may
outweigh the savings gained by skipping the eliminated partitions.
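
As a rough sketch of the policy, under the assumption of a hypothetical
`JoinPredicate` type (none of these names come from ORCA), the eligibility
check and the per-row work that motivates it could look like this. The
`allow_non_equality` parameter stands in for a switch that restores the
old behavior.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class JoinPredicate:
    """Hypothetical predicate of the form: part_key <op> other_col."""
    part_key: str
    op: str
    other_col: str


def usable_for_dpe(pred: JoinPredicate, part_keys: Set[str],
                   allow_non_equality: bool = False) -> bool:
    """Decide whether a predicate may drive dynamic partition elimination."""
    if pred.part_key not in part_keys:
        return False
    # Only equality is accepted by default; other operators need the switch.
    return pred.op == "=" or allow_non_equality


def per_row_partition_checks(rows: int, partitions: int) -> int:
    """Predicate evaluations when every row is tested against every partition rule."""
    return rows * partitions
```

In this model, 1,000 rows against 50 partitions already require 50,000
predicate evaluations, which illustrates how partition selection can
outweigh the savings.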

The commit fixes the erroneous logic for removing "IS NOT NULL"
expressions. They should only be removed if the selected partition
expressions are strict. It also adds assert checks for certain
assumptions made to test this logic.

The commit also includes some minor refactors and removal of dead code.

MDP changes:

* Only plan size or cost changes (due to removal of IS NOT NULL
  predicates)

    data/dxl/minidump/DPE-SemiJoin.mdp
    data/dxl/minidump/IndexApply-PartKey-Is-IndexKey.mdp
    data/dxl/minidump/IndexApply-PartResolverExpand.mdp
    data/dxl/minidump/PartTbl-CSQ-PartKey.mdp
    data/dxl/minidump/SpoolShouldInvalidateUnresolvedDynamicScans.mdp
    data/dxl/minidump/IndexApply-Heterogeneous-BothSidesPartitioned.mdp

* A trace flag is added to preserve the old behavior in these tests,
  since they cover specific scenarios which are now disabled

    data/dxl/minidump/DPE-with-unsupported-pred.mdp
    data/dxl/minidump/NLJ-Broadcast-DPE-Outer-Child.mdp
    data/dxl/minidump/PartTbl-IDFNull.mdp
    data/dxl/minidump/PartTbl-RangeJoinPred.mdp

* Addition of IS NOT NULL predicate due to stricter checks

    data/dxl/minidump/PartTbl-CSQ-PartKey.mdp

Co-authored-by: Shreedhar Hardikar <[email protected]>
Co-authored-by: Ashuka Xue <[email protected]>

Mar 11, 2020
1900260

v3.93.0

Implemented Query, Greedy, MinCard with the new DPv2 xform

The existing query, greedy and mincard xforms didn't handle the new NAry joins that
contained LOJs. To solve this, we integrated query, greedy and mincard into the DPv2
xform, using properties. We hope to re-use this property infrastructure in the future
if/when we improve the cost model used for DPv2.

Until now, we stored only the best join expression per group. With this commit, we
store the best expression for each unique property. Properties right now are the
type of join enumeration used, i.e. query, greedy, mincard and DPv2. So, we might
store a separate greedy, mincard and DPv2 expression, for example.
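
The "best expression per property" idea can be sketched as a map keyed by
the enumeration type. This is an illustration only: `JoinEnumProperty`,
`JoinExpr` and `GroupBest` are hypothetical names and do not mirror the
data structures in CJoinOrderDPv2.h.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class JoinEnumProperty(Enum):
    QUERY = "query"
    GREEDY = "greedy"
    MINCARD = "mincard"
    DPV2 = "dpv2"


@dataclass(frozen=True)
class JoinExpr:
    description: str
    cost: float


class GroupBest:
    """Keeps the cheapest expression seen so far for each property."""

    def __init__(self) -> None:
        self.best: Dict[JoinEnumProperty, JoinExpr] = {}

    def offer(self, prop: JoinEnumProperty, expr: JoinExpr) -> bool:
        """Record expr if it beats the current best for prop; return True if kept."""
        current = self.best.get(prop)
        if current is None or expr.cost < current.cost:
            self.best[prop] = expr
            return True
        return False
```

A cheaper DPv2 expression does not evict the greedy one here, because each
property has its own slot. That is the difference from storing a single
best expression per group.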

For a picture of some new data structures introduced, see the "Data structures for DPv2 join enumeration"
comment in file CJoinOrderDPv2.h.

Co-authored-by: Sambitesh Dash <[email protected]>
Co-authored-by: Hans Zeller <[email protected]>

Feb 26, 2020
94c297e

v3.92.2

Use correct CMAKE build option for debug builds

Previously, we used "DEBUG" (in all caps) in our CI/scripts, which isn't
canonical (https://cmake.org/cmake/help/v3.0/variable/CMAKE_BUILD_TYPE.html).

Authored-by: Chris Hajas <[email protected]>

Feb 26, 2020
ad944f6

v3.92.1

Add mdps that weren't being tested.

These appear to be intended to be included in the test suite, but may
have been forgotten. They've also been updated accordingly since they
haven't been run in a while.

Authored-by: Chris Hajas <[email protected]>

Feb 18, 2020
713d081

v3.92.0

Change bitmap index costing to choose bitmap NLJs more often

The experiments and assertions made below were found using the
cal_test.py calibration script. We used regression analysis and isolated
a single variable to determine the coefficients.
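
Determining a coefficient by varying one variable and fitting a line can
be sketched with ordinary least squares. This is not taken from
cal_test.py; `fit_line` is a hypothetical helper, and the inputs in the
usage note are synthetic.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With all other inputs held fixed, `xs` would be the values of the isolated
variable (for example the number of rebinds) and `ys` the measured
execution times. The fitted slope is then a candidate coefficient for that
variable.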

This commit makes substantial changes to costing bitmap indexes. Our
goal was to choose bitmap index NL joins instead of hash joins, as the
bitmap NL joins ran more than 10x faster than hash joins in many
cases.

Previously, we were multiplying the rebinds by the bitmap page cost,
which made the cost much higher than that of a hash join in many
cases.

Now, we no longer multiply the page cost by the number of rebinds, and
instead multiply the rebinds by a much smaller rebind cost.

Additionally, we took this opportunity to simplify the cost model a bit
by removing the separate code path for small vs large NDVs. We did not
see the large NDV path being used in joins, and in non-join cases it had
very minimal impact on the cost.

This functionality is guarded by a traceflag, EopttraceCalibratedBitmapIndexCostModel.
In GPDB, it will be enabled by setting
`optimizer_cost_model=experimental`. The intent is to enable this by
default in the near future.
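
The change in how rebinds enter the cost can be illustrated with a
deliberately simplified model. The function names and every coefficient
below are assumptions made for illustration; ORCA's real cost formulas
contain more terms.

```python
def old_bitmap_nlj_cost(rebinds: float, pages: float, page_cost: float) -> float:
    """Old shape: the page cost is paid again on every rebind."""
    return rebinds * pages * page_cost


def new_bitmap_nlj_cost(rebinds: float, pages: float, page_cost: float,
                        rebind_cost: float) -> float:
    """New shape: page cost is paid once, plus a much smaller cost per rebind."""
    return pages * page_cost + rebinds * rebind_cost
```

With made-up inputs of 1,000 rebinds, 10 pages, a page cost of 1.0 and a
rebind cost of 0.01, the old shape gives 10,000 and the new shape about
20. A gap of that size is what lets the bitmap NL join compete with a
hash join.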

Co-authored-by: Chris Hajas <[email protected]>
Co-authored-by: Ashuka Xue <[email protected]>

Feb 15, 2020
8a5513e

v3.91.0

Bump ORCA version to 3.91.0

Feb 13, 2020
4e52112
