This method joins two RDDs whose element types may differ, provided each type has a corresponding ReferenceMapping.
This method computes the cartesian product of the two RDDs, then filters out the pairs whose regions do not overlap.
This is SLOW SLOW SLOW, and shouldn't be used for anything other than correctness-testing on realistically sized sets.
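The cartesian-product-and-filter idea can be sketched in a few lines. This is only an illustration on plain Python lists, not the RDD-based implementation: the `Region` type, the half-open overlap test, and the `left_region` / `right_region` accessor functions (standing in for a ReferenceMapping) are all assumptions made for the example.

```python
from itertools import product
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical region type: half-open interval [start, end) on a reference."""
    ref: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    # Two regions overlap only if they share a reference and their intervals intersect.
    return a.ref == b.ref and a.start < b.end and b.start < a.end


def cartesian_region_join(left, right, left_region, right_region):
    # left_region / right_region play the role of a ReferenceMapping:
    # they extract the Region associated with an element (assumed helpers).
    # Every (x, y) pair is examined, so the cost is O(len(left) * len(right)).
    return [
        (x, y)
        for x, y in product(left, right)
        if overlaps(left_region(x), right_region(y))
    ]


reads = [("r1", Region("chr1", 0, 10)), ("r2", Region("chr1", 20, 30))]
features = [("f1", Region("chr1", 5, 25)), ("f2", Region("chr2", 0, 100))]
pairs = cartesian_region_join(reads, features, lambda x: x[1], lambda y: y[1])
```

Because every pair is compared, a sketch like this is mainly useful as a reference result against which a faster join can be checked.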
Performs a region join between two RDDs (broadcast join).
This implementation first _collects_ the left-side RDD. Therefore, if the left-side RDD is large or otherwise idiosyncratic in a spatial sense (i.e. contains a set of regions whose unions overlap a significant fraction of the genome), the performance of this implementation will likely be quite bad.
Once the left-side RDD is collected, its regions are merged into their distinct (non-overlapping) unions; these unions then define the partitions over which the region join will be computed.
Each region on the left side is keyed by its corresponding partition (each such region should have exactly one partition). Each region on the right side is also keyed by its corresponding partitions (here there can be more than one partition for a region, since a region may cross the boundaries of the partitions defined by the left side).
Finally, within each separate partition, we essentially perform a cartesian-product-and-filter operation. The result is the region-join.
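The steps described above (merge the left-side regions into unions, key both sides by partition, then join within each partition) can be sketched as follows. This is a minimal single-process illustration on plain Python lists, with no collect or broadcast step; `Region`, `merge_regions`, `partitioned_region_join`, and the accessor functions are hypothetical names chosen for the example, and regions are assumed to be non-empty half-open intervals.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical region type: half-open interval [start, end) on a reference."""
    ref: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    return a.ref == b.ref and a.start < b.end and b.start < a.end


def merge_regions(regions):
    # Sort by (ref, start, end), then fold overlapping neighbours into one union.
    merged = []
    for r in sorted(regions):
        if merged and overlaps(merged[-1], r):
            last = merged[-1]
            merged[-1] = Region(last.ref, last.start, max(last.end, r.end))
        else:
            merged.append(r)
    return merged


def partitioned_region_join(left, right, left_region, right_region):
    # 1. The unions of the left-side regions define the partitions.
    partitions = merge_regions(left_region(x) for x in left)

    # 2. Each left element falls into exactly one partition.
    left_by_part = defaultdict(list)
    for x in left:
        idx = next(i for i, p in enumerate(partitions)
                   if overlaps(p, left_region(x)))
        left_by_part[idx].append(x)

    # 3. A right element is keyed by every partition it overlaps (possibly none).
    right_by_part = defaultdict(list)
    for y in right:
        for i, p in enumerate(partitions):
            if overlaps(p, right_region(y)):
                right_by_part[i].append(y)

    # 4. Cartesian-product-and-filter within each partition.
    return [
        (x, y)
        for i in sorted(left_by_part)
        for x in left_by_part[i]
        for y in right_by_part.get(i, [])
        if overlaps(left_region(x), right_region(y))
    ]


left = [("a", Region("chr1", 0, 10)),
        ("b", Region("chr1", 5, 15)),
        ("c", Region("chr1", 100, 110))]
right = [("y1", Region("chr1", 8, 105)), ("y2", Region("chr1", 50, 60))]
joined = partitioned_region_join(left, right, lambda x: x[1], lambda y: y[1])
```

Since each left element belongs to exactly one partition, a given (x, y) pair can be produced in at most one partition, so no deduplication is needed even when a right-side region spans several partitions.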
type of baseRDD
type of joinedRDD
The 'left' side of the join
The 'right' side of the join
implicit type of baseRDD
implicit type of joinedRDD
An RDD of pairs (x, y), where x is from baseRDD, y is from joinedRDD, and the region corresponding to x overlaps the region corresponding to y.
Contains multiple implementations of a 'region join', an operation that joins two sets of regions based on the spatial overlap between the regions.
Different implementations have different performance characteristics, and new implementations will likely be added in the future. See the notes on each individual method for more details.