The idealized formula for updating the current value for a key (y0 -> y1) is given as:
delta = (t1 - t0) / halflife
y1 = y0 * 2^(-delta) + n
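As a sanity check, the idealized update can be written out directly (an illustrative Python sketch; the function name is ours, and the variables mirror the formula above):

```python
def ideal_update(y0, t0, t1, n, halflife):
    """Idealized decayed-count update: decay y0 from t0 to t1, then add n."""
    delta = (t1 - t0) / halflife
    return y0 * 2.0 ** (-delta) + n

# After exactly one half-life, a count of 4.0 decays to 2.0, then n is added:
# ideal_update(4.0, t0=0.0, t1=10.0, n=1.0, halflife=10.0) == 3.0
```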
However, we want to avoid having to rescale every single cell every time we update; i.e. a cell with a zero value should continue to have a zero value when n=0.
Therefore, we introduce a change of variable to cell values (z) along with a scale factor (scale), and the following formula:
(1) zN = yN * scaleN
Our constraint is expressed as:
(2) If n=0, z1 = z0
In that case:
(3) If n=0, (y1 * scale1) = (y0 * scale0)
(4) Substituting for y1, (y0 * 2^(-delta) + 0) * scale1 = y0 * scale0
(5) 2^(-delta) * scale1 = scale0
(6) scale1 = scale0 * 2^(delta)
Also, to express z1 in terms of z0, we say:
(7) z1 = y1 * scale1
(8) z1 = (y0 * 2^(-delta) + n) * scale1
(9) z1 = ((z0 / scale0) * 2^(-delta) + n) * scale1
(10) z1 / scale1 = (z0 / (scale1 * 2^(-delta))) * 2^(-delta) + n
(11) z1 / scale1 = z0 / scale1 + n
(12) z1 = z0 + n * scale1
So, for cells where n=0, we just update scale0 to scale1, and for cells where n is non-zero, we update z1 in terms of z0 and scale1.
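The scheme above can be sketched in a few lines (illustrative Python; in the real sketch, one scale is shared by many cells rather than stored per value):

```python
def lazy_update(z0, scale0, t0, t1, n, halflife):
    """Update (z, scale) without rescaling untouched cells.

    scale grows by 2^delta every update (step 6); z changes only when n is
    non-zero (step 12), so zero-valued cells stay at z = 0 for free.
    """
    delta = (t1 - t0) / halflife
    scale1 = scale0 * 2.0 ** delta           # step (6)
    z1 = z0 if n == 0 else z0 + n * scale1   # step (12)
    return z1, scale1

# The true value is always recoverable as y = z / scale, and it agrees
# with the idealized formula y1 = y0 * 2^(-delta) + n:
z1, scale1 = lazy_update(z0=4.0, scale0=1.0, t0=0.0, t1=10.0, n=1.0, halflife=10.0)
assert abs(z1 / scale1 - (4.0 * 0.5 + 1.0)) < 1e-12
```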
If we convert scale to logscale, we have:
(13) logscale1 = logscale0 + delta * log(2)
(14) z1 = z0 + n * exp(logscale1)
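In log space the same update looks like this (an illustrative Python sketch, using the natural log as in the formulas):

```python
import math

def logscale_update(z0, logscale0, t0, t1, n, halflife):
    """Same lazy-rescaling scheme, with the scale kept as a logarithm
    so the scale itself cannot overflow a double."""
    delta = (t1 - t0) / halflife
    logscale1 = logscale0 + delta * math.log(2.0)          # step (13)
    z1 = z0 if n == 0 else z0 + n * math.exp(logscale1)    # step (14)
    return z1, logscale1
```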
When logscale1 gets big, we start to distort z1. For example, exp(36) is close to 2^53. We can measure when n * exp(logscale1) gets big, and in those cases we can rescale all our cells (set each z to its corresponding y) and set the logscale to 0.
(15) y1 = z1 / scale1
(16) y1 = z1 / exp(logscale1)
(17) y1 = z1 * exp(-logscale1)
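The rescaling step can be sketched as: once logscale passes some threshold, fold the scale back into every cell (z := y = z * exp(-logscale)) and reset logscale to 0. Illustrative Python, with a hypothetical threshold constant:

```python
import math

RESCALE_THRESHOLD = 36.0  # hypothetical cutoff; exp(36) is near 2^53

def maybe_rescale(cells, logscale):
    """If logscale is large, set each z to its corresponding y (step 17)
    and reset the shared logscale to 0; otherwise leave everything alone."""
    if logscale < RESCALE_THRESHOLD:
        return cells, logscale
    factor = math.exp(-logscale)              # step (17): y = z * exp(-logscale)
    return [z * factor for z in cells], 0.0
```

Because every z is multiplied by the same exp(-logscale) that the reads divide out, the values the cells represent are unchanged by the rescale.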
Represents a decaying scalar value at a particular point in time.
The value decays according to halfLife. Another way to think about DoubleAt is that it represents a particular decay curve (and in particular, a point along that curve). Two DoubleAt values may be equivalent if they are two points on the same curve.
The timeToZero and timeToUnit methods can be used to "normalize" DoubleAt values. If two DoubleAt
values do not produce the same (approximate) Double values from these methods, they represent different
curves.
DecayingCMS is a module to build count-min sketch instances whose counts decay exponentially.
Similar to a Map[K, com.twitter.algebird.DecayedValue], each key is associated with a single count value that decays over time. Unlike a map, the decaying CMS is an approximate count -- in exchange for the possibility of over-counting, we can bound its size in memory.
The intended use case is for metrics or machine learning where exact values aren't needed.
You can expect the keys with the biggest values to be fairly accurate but the very small values (rare keys or very old keys) to be lost in the noise. For both metrics and ML this should be fine: you can't learn too much from very rare values.
We recommend a depth of at least 5 and a width of at least 100, but you should do some experiments to determine the smallest parameters that will work for your use case.