com.twitter.scalding

CoGroupBuilder

class CoGroupBuilder extends GroupBuilder

Builder class used internally to implement coGroups (joins). It can also be used for more general joins, e.g., star joins.

linear super types: GroupBuilder, Serializable, AnyRef, Any

Instance constructors

  1. new CoGroupBuilder (groupFields: Fields, joinMode: JoinMode)

Value Members

  1. def != (arg0: AnyRef) : Boolean

    attributes: final
    definition classes: AnyRef
  2. def != (arg0: Any) : Boolean

    o != arg0 is the same as !(o == (arg0)).

    arg0

    the object to compare against this object for dis-equality.

    returns

    false if the receiver object is equivalent to the argument; true otherwise.

    attributes: final
    definition classes: Any
  3. def ## () : Int

    attributes: final
    definition classes: AnyRef → Any
  4. def $asInstanceOf [T0] () : T0

    attributes: final
    definition classes: AnyRef
  5. def $isInstanceOf [T0] () : Boolean

    attributes: final
    definition classes: AnyRef
  6. def == (arg0: AnyRef) : Boolean

    o == arg0 is the same as if (o eq null) arg0 eq null else o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: AnyRef
  7. def == (arg0: Any) : Boolean

    o == arg0 is the same as o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: Any
  8. def asInstanceOf [T0] : T0

    This method is used to cast the receiver object to be of type T0.

    Note that the success of a cast at runtime is modulo Scala's erasure semantics. Therefore the expression 1.asInstanceOf[String] will throw a ClassCastException at runtime, while the expression List(1).asInstanceOf[List[String]] will not. In the latter example, because the type argument is erased as part of compilation, it is not possible to check whether the contents of the list are of the requested type.

    returns

    the receiver object.

    attributes: final
    definition classes: Any
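    The erasure note above can be checked with a small plain-Scala sketch. The object and method names here are made up for illustration, and the cast on the boxed value is written as (1: Any) so that it compiles cleanly.

```scala
object ErasureDemo {
  // The element type of a List is erased at runtime, so this cast is
  // unchecked and succeeds even though the list holds Ints.
  def uncheckedCast(): List[String] =
    List(1).asInstanceOf[List[String]]

  // Casting a boxed Int directly to String is checked at runtime and fails.
  def directCastFails(): Boolean =
    try {
      val s = (1: Any).asInstanceOf[String]
      s.length // never reached
      false
    } catch {
      case _: ClassCastException => true
    }

  // For the list, the failure only appears once an element is used as a String.
  def failsOnElementAccess(): Boolean =
    try {
      val head: String = uncheckedCast().head
      head.length // never reached
      false
    } catch {
      case _: ClassCastException => true
    }
}
```

    The unchecked list cast only defers the error to the point where an element is read as a String.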
  9. def average (f: Symbol) : GroupBuilder

    definition classes: GroupBuilder
  10. def average (f: (Fields, Fields)) : GroupBuilder

    Uses a more stable online algorithm, which should be suitable for large numbers of records. Similar to: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

    definition classes: GroupBuilder
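    As a rough illustration of the kind of online algorithm the description points to, the sketch below merges partial means without ever forming one large sum. It is plain Scala, not the library's implementation, and the names Partial, single, merge, and mean are assumptions made for this example.

```scala
object OnlineMean {
  // Partial result for a chunk of records: how many values, and their mean.
  final case class Partial(count: Long, mean: Double)

  def single(x: Double): Partial = Partial(1L, x)

  // Merge two partial means. The update moves the left mean toward the
  // right mean in proportion to the right side's share of the records.
  def merge(a: Partial, b: Partial): Partial =
    if (a.count == 0L) b
    else if (b.count == 0L) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      Partial(n, a.mean + delta * b.count.toDouble / n.toDouble)
    }

  def mean(xs: Seq[Double]): Double =
    xs.map(single).foldLeft(Partial(0L, 0.0))(merge).mean
}
```

    Because merge only needs two partial results, chunks can be combined in any grouping, which is what makes this style of computation a fit for large record counts.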
  11. def buffer (args: Fields)(b: cascading.operation.Buffer[_]) : GroupBuilder

    definition classes: GroupBuilder
  12. def clone () : AnyRef

    This method creates and returns a copy of the receiver object.

    The default implementation of the clone method is platform dependent.

    returns

    a copy of the receiver object.

    attributes: protected
    definition classes: AnyRef
  13. def coGroup (f: Fields, p: Pipe, j: JoinMode = ...) : CoGroupBuilder

  14. var coGroups : List[(Fields, Pipe, JoinMode)]

    attributes: protected
  15. def count [T] (fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  16. def count (f: Symbol = 'count) : GroupBuilder

    definition classes: GroupBuilder
    deprecated: Use size instead to match the scala.collection.Iterable API

  17. def dot [T] (left: Fields, right: Fields, result: Fields)(implicit ttconv: TupleConverter[(T, T)], ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    definition classes: GroupBuilder
  18. def drop (cnt: Int) : GroupBuilder

    definition classes: GroupBuilder
  19. def dropWhile [T] (f: Fields)(fn: (T) ⇒ Boolean)(implicit conv: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  20. def eq (arg0: AnyRef) : Boolean

    This method is used to test whether the argument (arg0) is a reference to the receiver object (this).

    The eq method implements an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation) on non-null instances of AnyRef:
      * It is reflexive: for any non-null instance x of type AnyRef, x.eq(x) returns true.
      * It is symmetric: for any non-null instances x and y of type AnyRef, x.eq(y) returns true if and only if y.eq(x) returns true.
      * It is transitive: for any non-null instances x, y, and z of type AnyRef, if x.eq(y) returns true and y.eq(z) returns true, then x.eq(z) returns true.

    Additionally, the eq method has three other properties:
      * It is consistent: for any non-null instances x and y of type AnyRef, multiple invocations of x.eq(y) consistently return true or consistently return false.
      * For any non-null instance x of type AnyRef, x.eq(null) and null.eq(x) return false.
      * null.eq(null) returns true.

    When overriding the equals or hashCode methods, it is important to ensure that their behavior is consistent with reference equality. Therefore, if two objects are references to each other (o1 eq o2), they should be equal to each other (o1 == o2) and they should hash to the same value (o1.hashCode == o2.hashCode).

    arg0

    the object to compare against this object for reference equality.

    returns

    true if the argument is a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  21. def equals (arg0: Any) : Boolean

    This method is used to compare the receiver object (this) with the argument object (arg0) for equivalence.

    The default implementation of this method is an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation):
      * It is reflexive: for any instance x of type Any, x.equals(x) should return true.
      * It is symmetric: for any instances x and y of type Any, x.equals(y) should return true if and only if y.equals(x) returns true.
      * It is transitive: for any instances x, y, and z of type AnyRef, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

    If you override this method, you should verify that your implementation remains an equivalence relation. Additionally, when overriding this method it is often necessary to override hashCode to ensure that objects that are "equal" (o1.equals(o2) returns true) hash to the same scala.Int (o1.hashCode.equals(o2.hashCode)).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    definition classes: AnyRef → Any
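    A minimal sketch of overriding equals together with hashCode, as the text above recommends. Point is a hypothetical class used only for illustration.

```scala
object EqualsDemo {
  // Hypothetical value-like class; two points are equal when their fields match.
  final class Point(val x: Int, val y: Int) {
    override def equals(other: Any): Boolean = other match {
      case that: Point => x == that.x && y == that.y
      case _           => false
    }

    // Derived from the same fields as equals, so equal points hash alike.
    override def hashCode: Int = 31 * x + y
  }
}
```

    Two separately constructed points with the same coordinates are then equal under == and share a hash code, even though eq still reports them as distinct references.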
  22. def every (ev: (Pipe) ⇒ Every) : GroupBuilder

    definition classes: GroupBuilder
  23. var evs : List[(Pipe) ⇒ Every]

    This is the description of this Grouping in terms of a sequence of Every operations

    attributes: protected
    definition classes: GroupBuilder
  24. def finalize () : Unit

    This method is called by the garbage collector on the receiver object when garbage collection determines that there are no more references to the object.

    The details of when and if the finalize method is invoked, as well as the interaction between finalize and non-local returns and exceptions, are all platform dependent.

    attributes: protected
    definition classes: AnyRef
  25. def foldLeft [X, T] (fieldDef: (Fields, Fields))(init: X)(fn: (X, T) ⇒ X)(implicit setter: TupleSetter[X], conv: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  26. def forall [T] (fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  27. def forceToReducers : GroupBuilder

    definition classes: GroupBuilder
  28. def getClass () : java.lang.Class[_]

    Returns a representation that corresponds to the dynamic class of the receiver object.

    The nature of the representation is platform dependent.

    returns

    a representation that corresponds to the dynamic class of the receiver object.

    attributes: final
    definition classes: AnyRef
  29. def groupMode : GroupMode

    definition classes: GroupBuilder
  30. def hashCode () : Int

    Returns a hash code value for the object.

    The default hashing algorithm is platform dependent.

    Note that it is allowed for two objects to have identical hash codes (o1.hashCode.equals(o2.hashCode)) yet not be equal (o1.equals(o2) returns false). A degenerate implementation could always return 0. However, it is required that if two objects are equal (o1.equals(o2) returns true) that they have identical hash codes (o1.hashCode.equals(o2.hashCode)). Therefore, when overriding this method, be sure to verify that the behavior is consistent with the equals method.

    returns

    the hash code value for the object.

    definition classes: AnyRef → Any
  31. def head (f: Symbol*) : GroupBuilder

    definition classes: GroupBuilder
  32. def head (fd: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  33. def isInstanceOf [T0] : Boolean

    This method is used to test whether the dynamic type of the receiver object is T0.

    Note that the result of the test is modulo Scala's erasure semantics. Therefore the expression 1.isInstanceOf[String] will return false, while the expression List(1).isInstanceOf[List[String]] will return true. In the latter example, because the type argument is erased as part of compilation, it is not possible to check whether the contents of the list are of the requested type.

    returns

    true if the receiver object is an instance of erasure of type T0; false otherwise.

    attributes: final
    definition classes: Any
  34. var isReversed : Boolean

    attributes: protected
    definition classes: GroupBuilder
  35. def last (f: Symbol*) : GroupBuilder

    definition classes: GroupBuilder
  36. def last (fd: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  37. def mapReduceMap [T, X, U] (fieldDef: (Fields, Fields))(mapfn: (T) ⇒ X)(redfn: (X, X) ⇒ X)(mapfn2: (X) ⇒ U)(implicit startConv: TupleConverter[T], middleSetter: TupleSetter[X], middleConv: TupleConverter[X], endSetter: TupleSetter[U]) : GroupBuilder

    Type parameters:
      * T is the type of the input field (input to map, T => X).
      * X is the intermediate type, which your reduce function operates on (reduce is (X, X) => X).
      * U is the final result type (final map is X => U).

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.

    definition classes: GroupBuilder
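    The three stages can be modelled on an ordinary collection. This is a sketch of the semantics described above for a single group's values, not the library's implementation; the average helper and the choice of (sum, count) as the intermediate type are assumptions for illustration.

```scala
object MapReduceMapDemo {
  // Plain-collections model of map -> reduce -> map over one group's values.
  // reduceLeft keeps the accumulated value on the left, as described above.
  def mapReduceMap[T, X, U](values: Seq[T])(mapfn: T => X)(redfn: (X, X) => X)(mapfn2: X => U): U =
    mapfn2(values.map(mapfn).reduceLeft(redfn))

  // Average: T = Double, X = (sum, count), U = Double.
  def average(values: Seq[Double]): Double =
    mapReduceMap(values)(v => (v, 1L))((l, r) => (l._1 + r._1, l._2 + r._2))(p => p._1 / p._2)
}
```

    Note that this model throws on an empty input, since reduceLeft has no initial value to return.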
  38. def mapStream [T, X] (fieldDef: (Fields, Fields))(mapfn: (Iterator[T]) ⇒ TraversableOnce[X])(implicit conv: TupleConverter[T], setter: TupleSetter[X]) : GroupBuilder

    Corresponds to a Cascading Buffer, which allows you to stream through the data, keeping some, dropping, scanning, etc. The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (e.g., to reverse it), you need to be aware that memory constraints may become an issue.

    WARNING: Any fields not referenced by the input fields will be aligned to the first output, and the final Hadoop stream will have a length of the maximum of the output of this and the input stream. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.

    WARNING: mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).

    definition classes: GroupBuilder
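    A function of the shape Iterator[T] => TraversableOnce[X] can stay lazy and still track state, as long as that state is created inside each call rather than captured from outside. The sketch below is plain Scala with a hypothetical function name; it shows only the iterator-handling pattern, not how the library invokes it.

```scala
object MapStreamDemo {
  // Streams through the values once, dropping consecutive repeats.
  // `previous` is local to each call, so repeated calls do not interfere.
  def dropConsecutiveDuplicates(it: Iterator[Int]): Iterator[Int] = {
    var previous: Option[Int] = None
    // Iterator.filter is lazy: elements are only pulled as the result is consumed.
    it.filter { x =>
      val keep = !previous.contains(x)
      previous = Some(x)
      keep
    }
  }
}
```

    Because nothing is materialized, the same function also works on an unbounded iterator as long as the consumer takes a finite prefix.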
  39. def max (fieldDef: Symbol*) : GroupBuilder

    definition classes: GroupBuilder
  40. def max (fieldDef: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  41. def min (fieldDef: Symbol*) : GroupBuilder

    definition classes: GroupBuilder
  42. def min (fieldDef: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  43. def mkString (fieldDef: Symbol) : GroupBuilder

    definition classes: GroupBuilder
  44. def mkString (fieldDef: Symbol, sep: String) : GroupBuilder

    definition classes: GroupBuilder
  45. def mkString (fieldDef: Symbol, start: String, sep: String, end: String) : GroupBuilder

    These will only be called if a tuple is not passed, meaning just one column.

    definition classes: GroupBuilder
  46. def mkString (fieldDef: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  47. def mkString (fieldDef: (Fields, Fields), sep: String) : GroupBuilder

    definition classes: GroupBuilder
  48. def mkString (fieldDef: (Fields, Fields), start: String, sep: String, end: String) : GroupBuilder

    definition classes: GroupBuilder
  49. def ne (arg0: AnyRef) : Boolean

    o.ne(arg0) is the same as !(o.eq(arg0)).

    arg0

    the object to compare against this object for reference dis-equality.

    returns

    true if the argument is not a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  50. def notify () : Unit

    Wakes up a single thread that is waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  51. def notifyAll () : Unit

    Wakes up all threads that are waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  52. def overrideReducers (p: Pipe) : Pipe

    attributes: protected
    definition classes: GroupBuilder
  53. def pivot (fieldDef: (Fields, Fields), defaultVal: Any = null) : GroupBuilder

    Opposite of RichPipe.unpivot. Converts a row-wise representation into a column-wise one; see SQL/Excel for more on this function.

    Example: pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests)) finds the feature named "clicks" and puts its value in the column with the field named clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. NOTE: Duplicated fields will result in an error.

    Hint: if you want more precision, first do a map('value -> 'value) { x : AnyRef => Option(x) } and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check whether any items are present that shouldn't be: map('feature -> 'feature) { fname : String => if (!goodFeatures(fname)) { throw new Exception("ohnoes") } else fname }

    definition classes: GroupBuilder
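    The row-to-column behaviour described above can be modelled for a single group with ordinary collections. This is only a sketch of the documented semantics; the function signature and the use of require for the duplicate check are assumptions, not the library's code.

```scala
object PivotDemo {
  // One group's rows as (feature, value) pairs; output values follow `columns` order.
  def pivot(rows: Seq[(String, Any)], columns: Seq[String], defaultVal: Any = null): Seq[Any] = {
    val names = rows.map(_._1)
    // Mirrors the note above: a feature appearing twice in a group is an error.
    require(names.distinct.size == names.size, "duplicated feature in group")
    val byName = rows.toMap
    // Features not listed in `columns` are ignored; missing ones get the default.
    columns.map(c => byName.getOrElse(c, defaultVal))
  }
}
```

    With rows for "clicks" and "requests" only, the "impressions" column comes back as null, or as the supplied default when one is given.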
  54. def plus [T] (fs: Symbol*)(implicit monoid: Monoid[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    definition classes: GroupBuilder
  55. def plus [T] (fd: (Fields, Fields))(implicit monoid: Monoid[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    Uses Monoid.plus to compute a sum. Not called sum to avoid conflicting with the standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.

    definition classes: GroupBuilder
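    To make the associativity and commutativity requirement concrete, here is a minimal sketch with a hypothetical Monoid trait defined locally for illustration. It is not the library's Monoid, and sumAll is an assumed helper name.

```scala
object MonoidPlusDemo {
  // Hypothetical minimal type class for illustration only.
  trait Monoid[T] {
    def zero: T
    def plus(l: T, r: T): T
  }

  object Monoid {
    implicit val longMonoid: Monoid[Long] = new Monoid[Long] {
      def zero: Long = 0L
      def plus(l: Long, r: Long): Long = l + r
    }

    // Set union is associative and commutative, so it is a valid choice here.
    implicit def setMonoid[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
      def zero: Set[A] = Set.empty[A]
      def plus(l: Set[A], r: Set[A]): Set[A] = l ++ r
    }
  }

  // Folds a group's values with the monoid's plus, starting from zero.
  def sumAll[T](values: Seq[T])(implicit m: Monoid[T]): T =
    values.foldLeft(m.zero)(m.plus)
}
```

    With an associative and commutative plus, the result does not depend on the order in which records reach the fold, which matters because that order is generally not under your control in a distributed job.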
  56. def reduce [T] (fieldDef: Symbol*)(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  57. def reduce [T] (fieldDef: (Fields, Fields))(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]) : GroupBuilder

    Applies an associative/commutative operation on the left field. Example:

      reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left: (Double, Set[Long]), right: (Double, Set[Long])) => (left._1 + right._1, left._2 ++ right._2) }

    Equivalent to a mapReduceMap with trivial (identity) map functions.

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.

    definition classes: GroupBuilder
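    The reduce function from the example above can be tried on an ordinary sequence. This sketch reuses the (Double, Set[Long]) shape from the description; the type alias and helper names are assumptions for illustration.

```scala
object ReduceDemo {
  type MassAndIds = (Double, Set[Long])

  // Same combining logic as the documented example: add masses, union id sets.
  def combine(left: MassAndIds, right: MassAndIds): MassAndIds =
    (left._1 + right._1, left._2 ++ right._2)

  // reduceLeft passes the previous output as the left argument.
  def reduceGroup(rows: Seq[MassAndIds]): MassAndIds =
    rows.reduceLeft((l, r) => combine(l, r))
}
```

    Both addition and set union are associative and commutative, so this function meets the stated requirement.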
  58. def reducers (r: Int) : GroupBuilder

    Override the number of reducers used in the groupBy.

    definition classes: GroupBuilder
  59. def reverse : GroupBuilder

    definition classes: GroupBuilder
  60. def scanLeft [X, T] (fieldDef: (Fields, Fields))(init: X)(fn: (X, T) ⇒ X)(implicit setter: TupleSetter[X], conv: TupleConverter[T]) : GroupBuilder

    Analog of the standard scanLeft (see scala.collection.Iterable.scanLeft). This invalidates map-side aggregation and forces all data to be transferred to reducers. Use only if you REALLY have to.

    BEST PRACTICE: make sure init is an immutable object. NOTE: init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object).

    definition classes: GroupBuilder
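    For reference, this is how the standard-library scanLeft behaves on a plain sequence with an immutable init value. It illustrates only the collection method the description points to; how the grouped version lays out its output rows is not covered here.

```scala
object ScanLeftDemo {
  // Running totals: emits the init value, then one accumulated value per input.
  def runningTotals(values: Seq[Long]): Seq[Long] =
    values.scanLeft(0L)((acc, v) => acc + v)

  // foldLeft with the same arguments keeps only the final accumulated value.
  def total(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, v) => acc + v)
}
```

    Using an immutable init such as 0L follows the best-practice note above, since no accumulated state can leak between groups.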
  61. def schedule (name: String, pipe: Pipe) : Pipe

    definition classes: CoGroupBuilder → GroupBuilder
  62. def size (thisF: Fields) : GroupBuilder

    definition classes: GroupBuilder
  63. def size : GroupBuilder

    definition classes: GroupBuilder
  64. def sizeAveStdev (fieldDef: (Fields, Fields)) : GroupBuilder

    Computes the count, average, and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx)). Uses: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

    definition classes: GroupBuilder
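    The linked parallel algorithm combines partial (count, mean, M2) triples, where M2 is the sum of squared deviations from the mean. The sketch below is a plain-Scala rendering of that merge rule, not the library's code. The names are assumptions, and it reports the population standard deviation (dividing by n) because the text above does not say which variant is used.

```scala
object MomentsDemo {
  // count, mean, and m2 = sum of squared deviations from the mean.
  final case class Moments(count: Long, mean: Double, m2: Double) {
    // Population standard deviation (divides by n); an assumption for this sketch.
    def stdev: Double = if (count == 0L) 0.0 else math.sqrt(m2 / count)
  }

  val empty: Moments = Moments(0L, 0.0, 0.0)

  def single(x: Double): Moments = Moments(1L, x, 0.0)

  // Merge rule of the parallel variance algorithm.
  def merge(a: Moments, b: Moments): Moments =
    if (a.count == 0L) b
    else if (b.count == 0L) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      val mean = a.mean + delta * b.count / n
      val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
      Moments(n, mean, m2)
    }

  def of(xs: Seq[Double]): Moments = xs.map(single).foldLeft(empty)(merge)
}
```

    Since merge takes two partial results, chunks computed separately can be combined afterwards, which is what allows all three statistics to come out of a single pass.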
  65. def sortBy (f: Fields) : GroupBuilder

    definition classes: GroupBuilder
  66. var sortBy : Option[Fields]

    attributes: protected
    definition classes: GroupBuilder
  67. def sortWithTake [T] (f: (Fields, Fields), k: Int)(lt: (T, T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  68. def sortedReverseTake [T] (f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]) : GroupBuilder

    definition classes: GroupBuilder
  69. def sortedTake [T] (f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]) : GroupBuilder

    definition classes: GroupBuilder
  70. def sum (f: Symbol) : GroupBuilder

    definition classes: GroupBuilder
  71. def sum (f: (Fields, Fields)) : GroupBuilder

    definition classes: GroupBuilder
  72. def synchronized [T0] (arg0: T0) : T0

    attributes: final
    definition classes: AnyRef
  73. def take (cnt: Int) : GroupBuilder

    definition classes: GroupBuilder
  74. def takeWhile [T] (f: Fields)(fn: (T) ⇒ Boolean)(implicit conv: TupleConverter[T]) : GroupBuilder

    definition classes: GroupBuilder
  75. def then (fn: (GroupBuilder) ⇒ GroupBuilder) : GroupBuilder

    definition classes: GroupBuilder
  76. def times [T] (fs: Symbol*)(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    definition classes: GroupBuilder
  77. def times [T] (fd: (Fields, Fields))(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    definition classes: GroupBuilder
  78. def toList [T] (fieldDef: (Fields, Fields))(implicit conv: TupleConverter[T]) : GroupBuilder

    Converts a subset of fields into a list of Tuples. You need to provide the types of the tuple fields. Note that the order of the tuples is not preserved, even if you use GroupBuilder.sortBy! If you need ordering, use sortedTake or sortBy + scanLeft.

    definition classes: GroupBuilder
  79. def toString () : String

    Returns a string representation of the object.

    The default representation is platform dependent.

    returns

    a string representation of the object.

    definition classes: AnyRef → Any
  80. def wait () : Unit

    attributes: final
    definition classes: AnyRef
  81. def wait (arg0: Long, arg1: Int) : Unit

    attributes: final
    definition classes: AnyRef
  82. def wait (arg0: Long) : Unit

    attributes: final
    definition classes: AnyRef
