final class SortedBucketScioContext extends Serializable
Linear Supertypes: Serializable, AnyRef, Any
Instance Constructors
- new SortedBucketScioContext(self: ScioContext)
Type Members
- class SortMergeTransformReadBuilder[K, R] extends Serializable
- class SortMergeTransformWriteBuilder[K, R, W] extends Serializable
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def sortMergeCoGroup[K, A, B, C, D](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], d: Read[D])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C], arg4: Coder[D]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C], Iterable[D]))]
  - Annotations: @experimental()
- def sortMergeCoGroup[K, A, B, C, D](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], d: Read[D], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C], arg4: Coder[D]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C], Iterable[D]))]
  For each key K in a, b, c or d, return a resulting SCollection that contains a tuple with the list of values for that key in a, b, c and d. See note on SortedBucketScioContext.sortMergeJoin() for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.
  - keyClass: cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - targetParallelism: the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.
  - Annotations: @experimental()
- def sortMergeCoGroup[K, A, B, C](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C]))]
  - Annotations: @experimental()
- def sortMergeCoGroup[K, A, B, C](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C]))]
  For each key K in a, b or c, return a resulting SCollection that contains a tuple with the list of values for that key in a, b and c. See note on SortedBucketScioContext.sortMergeJoin() for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.
  - keyClass: cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - targetParallelism: the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.
  - Annotations: @experimental()
- def sortMergeCoGroup[K, A, B](keyClass: Class[K], a: Read[A], b: Read[B])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B]): SCollection[(K, (Iterable[A], Iterable[B]))]
  - Annotations: @experimental()
- def sortMergeCoGroup[K, A, B](keyClass: Class[K], a: Read[A], b: Read[B], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B]): SCollection[(K, (Iterable[A], Iterable[B]))]
  For each key K in a or b, return a resulting SCollection that contains a tuple with the list of values for that key in a and b. See note on SortedBucketScioContext.sortMergeJoin() for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.
  - keyClass: cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - targetParallelism: the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.
  - Annotations: @experimental()
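A minimal usage sketch of the 2-way sortMergeCoGroup. The Avro record classes (UserEvent, UserProfile), the key type, and the input paths are hypothetical, and the Read values are built with the typical AvroSortedBucketIO pattern, which is assumed here rather than documented on this page:

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.smb._ // adds the sortMerge* methods to ScioContext
import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
import org.apache.beam.sdk.values.TupleTag

val sc = ScioContext()

// Hypothetical Avro records, both written with SMB bucketing
// and sorting on the same String key.
val events = AvroSortedBucketIO
  .read(new TupleTag[UserEvent]("events"), classOf[UserEvent])
  .from("gs://bucket/events")
val profiles = AvroSortedBucketIO
  .read(new TupleTag[UserProfile]("profiles"), classOf[UserProfile])
  .from("gs://bucket/profiles")

// One (key, (Iterable[UserEvent], Iterable[UserProfile])) tuple per key,
// merged directly from the pre-sorted buckets without a shuffle.
val cogrouped = sc.sortMergeCoGroup(classOf[String], events, profiles)
```

The sources must have been written with compatible bucket metadata (same key field, compatible bucket counts); the transform verifies this via a filesystem lookup before reading.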
- def sortMergeGroupByKey[K, V](keyClass: Class[K], read: Read[V], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[V]): SCollection[(K, Iterable[V])]
  For each key K in read, return a resulting SCollection that contains a tuple with the list of values for that key in read. See note on SortedBucketScioContext.sortMergeJoin() for information on how an SMB group differs from a regular org.apache.beam.sdk.transforms.GroupByKey operation.
  - keyClass: grouping key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - targetParallelism: the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.
  - Annotations: @experimental()
- def sortMergeGroupByKey[K, V](keyClass: Class[K], read: Read[V])(implicit arg0: Coder[K], arg1: Coder[V]): SCollection[(K, Iterable[V])]
  For each key K in read, return a resulting SCollection that contains a tuple with the list of values for that key in read. See note on SortedBucketScioContext.sortMergeJoin() for information on how an SMB group differs from a regular org.apache.beam.sdk.transforms.GroupByKey operation.
  - keyClass: grouping key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - Annotations: @experimental()
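A usage sketch of sortMergeGroupByKey over a single SMB source. UserEvent and the path are hypothetical, and the AvroSortedBucketIO read setup is assumed from typical usage rather than stated on this page:

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.smb._ // adds the sortMerge* methods to ScioContext
import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
import org.apache.beam.sdk.values.TupleTag

val sc = ScioContext()

// UserEvent is a hypothetical Avro record, bucketed and sorted on a String key.
val events = AvroSortedBucketIO
  .read(new TupleTag[UserEvent]("events"), classOf[UserEvent])
  .from("gs://bucket/events")

// One (key, Iterable[UserEvent]) pair per key; because the input is already
// bucketed and sorted by key, no GroupByKey shuffle is needed.
val grouped = sc.sortMergeGroupByKey(classOf[String], events)
```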
- def sortMergeJoin[K, L, R](keyClass: Class[K], lhs: Read[L], rhs: Read[R], targetParallelism: TargetParallelism = TargetParallelism.auto())(implicit arg0: Coder[K], arg1: Coder[L], arg2: Coder[R]): SCollection[(K, (L, R))]
  Return an SCollection containing all pairs of elements with matching keys in lhs and rhs. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in lhs and (k, v2) is in rhs.
  Unlike a regular PairSCollectionFunctions.join(), the key information (namely, how to extract a comparable K from L and R) is remotely encoded in a org.apache.beam.sdk.extensions.smb.BucketMetadata file in the same directory as the input records. This transform requires a filesystem lookup to ensure that the metadata for each source is compatible. In return for reading pre-sorted data, the shuffle step in a typical org.apache.beam.sdk.transforms.GroupByKey operation can be eliminated.
  - keyClass: join key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry, as custom key coders are not supported yet.
  - targetParallelism: the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.
  - Annotations: @experimental()
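A usage sketch of sortMergeJoin. As above, the record classes and paths are hypothetical and the AvroSortedBucketIO read setup is an assumption based on typical SMB usage:

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.smb._ // adds the sortMerge* methods to ScioContext
import org.apache.beam.sdk.extensions.smb.{AvroSortedBucketIO, TargetParallelism}
import org.apache.beam.sdk.values.TupleTag

val sc = ScioContext()

val lhs = AvroSortedBucketIO
  .read(new TupleTag[UserEvent]("lhs"), classOf[UserEvent])
  .from("gs://bucket/events")
val rhs = AvroSortedBucketIO
  .read(new TupleTag[UserProfile]("rhs"), classOf[UserProfile])
  .from("gs://bucket/profiles")

// Inner join on the shared String key: one (key, (event, profile)) tuple
// per matching pair. targetParallelism defaults to TargetParallelism.auto(),
// so passing it explicitly is optional.
val joined = sc.sortMergeJoin(classOf[String], lhs, rhs, TargetParallelism.auto())
```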
- def sortMergeTransform[K, A, B, C](keyClass: Class[K], readA: Read[A], readB: Read[B], readC: Read[C], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B], Iterable[C])]
  Perform a 3-way SortedBucketScioContext.sortMergeCoGroup() operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
- def sortMergeTransform[K, A, B, C](keyClass: Class[K], readA: Read[A], readB: Read[B], readC: Read[C]): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B], Iterable[C])]
  Perform a 3-way SortedBucketScioContext.sortMergeCoGroup() operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
- def sortMergeTransform[K, A, B](keyClass: Class[K], readA: Read[A], readB: Read[B], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B])]
  Perform a 2-way SortedBucketScioContext.sortMergeCoGroup() operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
- def sortMergeTransform[K, A, B](keyClass: Class[K], readA: Read[A], readB: Read[B]): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B])]
  Perform a 2-way SortedBucketScioContext.sortMergeCoGroup() operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
- def sortMergeTransform[K, R](keyClass: Class[K], read: Read[R], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, Iterable[R]]
  Perform a SortedBucketScioContext.sortMergeGroupByKey() operation, then immediately apply a transformation function to the merged groups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
- def sortMergeTransform[K, R](keyClass: Class[K], read: Read[R]): SortMergeTransformReadBuilder[K, Iterable[R]]
  Perform a SortedBucketScioContext.sortMergeGroupByKey() operation, then immediately apply a transformation function to the merged groups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.
  - Annotations: @experimental()
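A sketch of the read-transform-write flow through the returned builder. The shape below (a to(...) step taking a transform output, then via(...) taking a function that receives the key, the merged values, and an output collector) follows the typical builder pattern for SortMergeTransformReadBuilder and SortMergeTransformWriteBuilder; the record classes, key field, paths, and buildSummary helper are all hypothetical, and the exact method names should be checked against the installed Scio version:

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.smb._ // adds the sortMerge* methods to ScioContext
import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
import org.apache.beam.sdk.values.TupleTag

val sc = ScioContext()

sc.sortMergeTransform(
    classOf[String],
    AvroSortedBucketIO
      .read(new TupleTag[UserEvent]("events"), classOf[UserEvent])
      .from("gs://bucket/events")
  )
  // Re-write with the same bucketing key ("userId") and hashing scheme,
  // so the output is itself a valid SMB source for downstream jobs.
  .to(
    AvroSortedBucketIO
      .transformOutput(classOf[String], "userId", classOf[UserSummary])
      .to("gs://bucket/summaries")
  )
  // Transform each merged group in place; no intermediate shuffle.
  .via { (key, events, outputCollector) =>
    outputCollector.accept(buildSummary(key, events))
  }
```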
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )