object GpuDecimalSumOverflow
All decimal processing in Spark has overflow detection as a part of it. Either it replaces
the value with a null in non-ANSI mode, or it throws an exception in ANSI mode. Spark also
does the processing for larger values as Decimal values, which are backed by BigDecimal and
have unbounded precision. So in most cases it is impossible to overflow/underflow so much
that an incorrect value is returned. Spark will just use more and more memory to hold the
value, and then check for overflow at some point when the result needs to be turned back
into a 128-bit value.
We cannot do the same thing. Instead we take four strategies to detect overflow.
1. For decimal values with a precision of 8 or under we follow Spark and do the SUM
on the unscaled value as a long, then bit-cast the result back to a Decimal value.
This means that we can SUM 174,467,442,481 maximum or minimum decimal values with a
precision of 8 before overflow can no longer be detected. The limit is much higher for
decimal values with a smaller precision.
2. For decimal values with a precision from 9 to 20 inclusive we sum them as 128-bit values.
This is very similar to the first strategy. The main differences are that we use a 128-bit
value when doing the sum, and we check for overflow after processing each batch. For
group-by and reduction aggregations that check happens after the update stage and also after
each merge stage. This gives us enough room to always detect overflow when summing a single
batch, even on a merge where we could be aggregating a batch made up entirely of maximum
output values.
3. For values with a precision from 21 to 28 inclusive we have enough room to skip the
overflow check on the update aggregation, but for the merge aggregation we need to do some
extra checks. This is done by taking the digits above precision 28 and summing them
separately. We then check to see if they would have overflowed the original limits. This
lets us detect overflow in cases where the original value would have wrapped around. The
reason this works is that we have a hard limit on the maximum number of values in a single
batch being processed: Int.MaxValue, or about 2.1 billion values. So we use a precision on
the higher digits that is large enough to handle 2.1 billion values and still detect
overflow. This equates to a precision of about 10 more than is needed to hold the higher
digits, which effectively gives us unlimited overflow detection.
4. For anything larger than precision 28 we do the same overflow detection as strategy 3,
but also do it on the update aggregation. This lets us fully detect overflow in any stage
of an aggregation.
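The headroom math behind strategies 3 and 4 can be sketched as follows (a minimal illustration of the reasoning, not the plugin's actual code; the names are made up):

```scala
// A batch holds at most Int.MaxValue rows, and log10(Int.MaxValue) < 10,
// so a side sum of the digits above precision 28 needs at most ~10 extra
// digits of headroom and can itself never overflow undetected.
val maxRowsPerBatch = BigInt(Int.MaxValue)
val maxHighDigits = BigInt(10).pow(10) - 1 // worst-case 10-digit high part
val worstCaseHighSum = maxRowsPerBatch * maxHighDigits
// The side sum fits comfortably in 10 + 10 = 20 digits of precision
assert(worstCaseHighSum < BigInt(10).pow(20))
```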
Note that for window operations there is either no merge stage, or only a single value is merged into a batch instead of an entire batch being merged at once. This lets us handle overflow detection with what is built into GpuAdd.
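A per-add overflow check of the kind the single-value merge relies on can be sketched like this (illustrative only, not GpuAdd's actual implementation; a two's-complement add overflows exactly when both operands share a sign and the result's sign differs):

```scala
// Detect overflow of a single 64-bit add: the result wrapped iff both
// inputs have the same sign and the result has the opposite sign.
def addChecked(a: Long, b: Long): Option[Long] = {
  val r = a + b
  if (((a ^ r) & (b ^ r)) < 0) None else Some(r)
}

assert(addChecked(1L, 2L).contains(3L))
assert(addChecked(Long.MaxValue, 1L).isEmpty) // overflow detected
</imports>
```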
Value Members
- val extraGuaranteePrecision: Int
  Generally we want a guarantee that is at least 10x larger than the original overflow.
- val sumPrecisionIncrease: Int
  The increase in precision for the output of a SUM from the input. This is hard-coded by Spark, so we just have it here. This means that for most types (those not already capped at a precision of 38) you get 10-billion+ values before an overflow would even be possible.
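Assuming Spark's hard-coded rule (the output precision of a SUM is the input precision plus 10, capped at the maximum of 38), the headroom works out like this (the names here are illustrative, not the plugin's API):

```scala
// Spark widens SUM's output precision by 10 over the input, capped at 38.
val sumPrecisionIncrease = 10
def sumOutputPrecision(inputPrecision: Int): Int =
  math.min(inputPrecision + sumPrecisionIncrease, 38)

assert(sumOutputPrecision(8) == 18)  // full 10 digits of headroom:
                                     // 10-billion+ values before overflow
assert(sumOutputPrecision(30) == 38) // capped, so overflow is possible sooner
```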
- val updateCutoffPrecision: Int
  The precision above which we need extra overflow checks while doing an update. This is because anything above this precision could, in theory, overflow beyond detection within a single input batch.
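The cutoff itself can be derived from the batch-size limit (a sketch under the assumption that the widest sum has 38 digits of room and one batch of at most Int.MaxValue rows adds at most 10 digits):

```scala
// Inputs of up to 38 - 10 = 28 digits cannot overflow beyond detection
// within one batch of at most Int.MaxValue rows.
val updateCutoffPrecision = 38 - 10
val maxBatchSum =
  BigInt(Int.MaxValue) * (BigInt(10).pow(updateCutoffPrecision) - 1)
assert(updateCutoffPrecision == 28)
assert(maxBatchSum < BigInt(10).pow(38)) // still representable, so detectable
```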