com.twitter.scalding

GroupBuilder

class GroupBuilder extends Serializable

linear super types: Serializable, AnyRef, Any
known subclasses: CoGroupBuilder

Instance constructors

  1. new GroupBuilder (groupFields: Fields)

Value Members

  1. def != (arg0: AnyRef) : Boolean

    attributes: final
    definition classes: AnyRef
  2. def != (arg0: Any) : Boolean

    o != arg0 is the same as !(o == (arg0)).

    arg0

    the object to compare against this object for dis-equality.

    returns

    false if the receiver object is equivalent to the argument; true otherwise.

    attributes: final
    definition classes: Any
  3. def ## () : Int

    attributes: final
    definition classes: AnyRef → Any
  4. def $asInstanceOf [T0] () : T0

    attributes: final
    definition classes: AnyRef
  5. def $isInstanceOf [T0] () : Boolean

    attributes: final
    definition classes: AnyRef
  6. def == (arg0: AnyRef) : Boolean

    o == arg0 is the same as if (o eq null) arg0 eq null else o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: AnyRef
  7. def == (arg0: Any) : Boolean

    o == arg0 is the same as o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: Any
  8. def asInstanceOf [T0] : T0

    This method is used to cast the receiver object to be of type T0.

    Note that the success of a cast at runtime is modulo Scala's erasure semantics. Therefore the expression 1.asInstanceOf[String] will throw a ClassCastException at runtime, while the expression List(1).asInstanceOf[List[String]] will not. In the latter example, because the type argument is erased as part of compilation, it is not possible to check whether the contents of the list are of the requested type.

    returns

    the receiver object.

    attributes: final
    definition classes: Any
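
The erasure note for asInstanceOf can be checked with a small stand-alone sketch. The object and method names below are illustrative only and are not part of GroupBuilder or Scalding; each method reports whether the behaviour described above was observed.

```scala
object ErasureCastDemo {
  // Casting an Int to String is checked at runtime and fails right away.
  def scalarCastFails(): Boolean =
    try {
      val s = (1: Any).asInstanceOf[String]
      s.length // not reached
      false
    } catch {
      case _: ClassCastException => true
    }

  // The element type of the list is erased, so this cast itself succeeds.
  def listCastSucceeds(): Boolean = {
    val xs = (List(1): Any).asInstanceOf[List[String]]
    xs.size == 1
  }

  // The failure only surfaces when an element is actually used as a String.
  def elementUseFails(): Boolean = {
    val xs = (List(1): Any).asInstanceOf[List[String]]
    try {
      xs.head.length
      false
    } catch {
      case _: ClassCastException => true
    }
  }
}
```

The third method shows why such casts are risky: the error is deferred to the first use of an element, far from the cast that caused it.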
  9. def average (f: Symbol) : GroupBuilder

  10. def average (f: (Fields, Fields)) : GroupBuilder

    Uses a more stable online algorithm, which should be suitable for large numbers of records. Similar to: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

  11. def buffer (args: Fields)(b: cascading.operation.Buffer[_]) : GroupBuilder

  12. def clone () : AnyRef

    This method creates and returns a copy of the receiver object.

    The default implementation of the clone method is platform dependent.

    returns

    a copy of the receiver object.

    attributes: protected
    definition classes: AnyRef
  13. def count [T] (fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

  14. def count (f: Symbol ='count) : GroupBuilder

    deprecated: Use size instead to match the scala.collection.Iterable API

  15. def dot [T] (left: Fields, right: Fields, result: Fields)(implicit ttconv: TupleConverter[(T, T)], ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

  16. def drop (cnt: Int) : GroupBuilder

  17. def dropWhile [T] (f: Fields)(fn: (T) ⇒ Boolean)(implicit conv: TupleConverter[T]) : GroupBuilder

  18. def eq (arg0: AnyRef) : Boolean

    This method is used to test whether the argument (arg0) is a reference to the receiver object (this).

    The eq method implements an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation) on non-null instances of AnyRef:
      * It is reflexive: for any non-null instance x of type AnyRef, x.eq(x) returns true.
      * It is symmetric: for any non-null instances x and y of type AnyRef, x.eq(y) returns true if and only if y.eq(x) returns true.
      * It is transitive: for any non-null instances x, y, and z of type AnyRef, if x.eq(y) returns true and y.eq(z) returns true, then x.eq(z) returns true.

    Additionally, the eq method has three other properties:
      * It is consistent: for any non-null instances x and y of type AnyRef, multiple invocations of x.eq(y) consistently return true or consistently return false.
      * For any non-null instance x of type AnyRef, x.eq(null) and null.eq(x) return false.
      * null.eq(null) returns true.

    When overriding the equals or hashCode methods, it is important to ensure that their behavior is consistent with reference equality. Therefore, if two objects are references to each other (o1 eq o2), they should be equal to each other (o1 == o2) and they should hash to the same value (o1.hashCode == o2.hashCode).

    arg0

    the object to compare against this object for reference equality.

    returns

    true if the argument is a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  19. def equals (arg0: Any) : Boolean

    This method is used to compare the receiver object (this) with the argument object (arg0) for equivalence.

    The default implementation of this method is an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation):
      * It is reflexive: for any instance x of type Any, x.equals(x) should return true.
      * It is symmetric: for any instances x and y of type Any, x.equals(y) should return true if and only if y.equals(x) returns true.
      * It is transitive: for any instances x, y, and z of type AnyRef, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

    If you override this method, you should verify that your implementation remains an equivalence relation. Additionally, when overriding this method it is often necessary to override hashCode to ensure that objects that are "equal" (o1.equals(o2) returns true) hash to the same scala.Int (o1.hashCode.equals(o2.hashCode)).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    definition classes: AnyRef → Any
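
The relationship between eq, == and hashCode described above can be illustrated with a minimal sketch. Point, nullSafeEquals and the three values are hypothetical names introduced for this example; nullSafeEquals simply spells out the expansion of == given for AnyRef.

```scala
object ReferenceEqualityDemo {
  // A small class that overrides equals and hashCode consistently.
  final class Point(val x: Int, val y: Int) {
    override def equals(other: Any): Boolean = other match {
      case p: Point => x == p.x && y == p.y
      case _        => false
    }
    // Equal points must produce equal hash codes.
    override def hashCode: Int = 31 * x + y
  }

  // Spelled-out form of `l == r` for references: null-safe, then equals.
  def nullSafeEquals(l: AnyRef, r: AnyRef): Boolean =
    if (l eq null) r eq null else l.equals(r)

  val a = new Point(1, 2)
  val b = new Point(1, 2) // same contents, different object
  val c = a               // same object as a
}
```

Here a and b are equal under == but are not the same reference, while a and c are both; this is the consistency between reference equality, equals and hashCode that the text asks overriding code to preserve.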
  20. def every (ev: (Pipe) ⇒ Every) : GroupBuilder

  21. var evs : List[(Pipe) ⇒ Every]

    This is the description of this Grouping in terms of a sequence of Every operations

    attributes: protected
  22. def finalize () : Unit

    This method is called by the garbage collector on the receiver object when garbage collection determines that there are no more references to the object.

    The details of when and if the finalize method is invoked, as well as the interaction between finalize and non-local returns and exceptions, are all platform dependent.

    attributes: protected
    definition classes: AnyRef
  23. def foldLeft [X, T] (fieldDef: (Fields, Fields))(init: X)(fn: (X, T) ⇒ X)(implicit setter: TupleSetter[X], conv: TupleConverter[T]) : GroupBuilder

  24. def forall [T] (fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

  25. def forceToReducers : GroupBuilder

  26. def getClass () : java.lang.Class[_]

    Returns a representation that corresponds to the dynamic class of the receiver object.

    The nature of the representation is platform dependent.

    returns

    a representation that corresponds to the dynamic class of the receiver object.

    attributes: final
    definition classes: AnyRef
  27. val groupFields : Fields

  28. def groupMode : GroupMode

  29. def hashCode () : Int

    Returns a hash code value for the object.

    The default hashing algorithm is platform dependent.

    Note that it is allowed for two objects to have identical hash codes (o1.hashCode.equals(o2.hashCode)) yet not be equal (o1.equals(o2) returns false). A degenerate implementation could always return 0. However, it is required that if two objects are equal (o1.equals(o2) returns true) that they have identical hash codes (o1.hashCode.equals(o2.hashCode)). Therefore, when overriding this method, be sure to verify that the behavior is consistent with the equals method.

    returns

    the hash code value for the object.

    definition classes: AnyRef → Any
  30. def head (f: Symbol*) : GroupBuilder

  31. def head (fd: (Fields, Fields)) : GroupBuilder

  32. def isInstanceOf [T0] : Boolean

    This method is used to test whether the dynamic type of the receiver object is T0.

    Note that the result of the test is modulo Scala's erasure semantics. Therefore the expression 1.isInstanceOf[String] will return false, while the expression List(1).isInstanceOf[List[String]] will return true. In the latter example, because the type argument is erased as part of compilation, it is not possible to check whether the contents of the list are of the requested type.

    returns

    true if the receiver object is an instance of erasure of type T0; false otherwise.

    attributes: final
    definition classes: Any
  33. var isReversed : Boolean

    attributes: protected
  34. def last (f: Symbol*) : GroupBuilder

  35. def last (fd: (Fields, Fields)) : GroupBuilder

  36. def mapReduceMap [T, X, U] (fieldDef: (Fields, Fields))(mapfn: (T) ⇒ X)(redfn: (X, X) ⇒ X)(mapfn2: (X) ⇒ U)(implicit startConv: TupleConverter[T], middleSetter: TupleSetter[X], middleConv: TupleConverter[X], endSetter: TupleSetter[U]) : GroupBuilder

    Type T is the type of the input field (input to map, T => X).
    Type X is the intermediate type, which your reduce function operates on (reduce is (X, X) => X).
    Type U is the final result type (final map is X => U).

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
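
The three-stage shape of mapReduceMap can be sketched on an ordinary Scala collection. This is only an analogue of the semantics described above, not Scalding's implementation; the helper below is hypothetical and ignores fields, tuple conversion and distribution.

```scala
object MapReduceMapDemo {
  // Map each input (T => X), reduce pairwise with the accumulator on the
  // left ((X, X) => X), then map the single reduced value (X => U).
  def mapReduceMap[T, X, U](items: Seq[T])(mapfn: T => X)(redfn: (X, X) => X)(mapfn2: X => U): Option[U] =
    items.map(mapfn).reduceLeftOption(redfn).map(mapfn2)

  // Average computed through an intermediate (sum, count) pair.
  def average(xs: Seq[Double]): Option[Double] =
    mapReduceMap(xs)((x: Double) => (x, 1L)) { (acc, next) =>
      (acc._1 + next._1, acc._2 + next._2)
    } { case (sum, count) => sum / count }
}
```

The intermediate type X, here a (sum, count) pair, is what makes the average expressible as a pairwise reduce; the final map turns it into the result type U.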

  37. def mapStream [T, X] (fieldDef: (Fields, Fields))(mapfn: (Iterator[T]) ⇒ TraversableOnce[X])(implicit conv: TupleConverter[T], setter: TupleSetter[X]) : GroupBuilder

    Corresponds to a Cascading Buffer, which allows you to stream through the data, keeping some, dropping, scanning, etc. The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (e.g. to reverse), be aware that memory constraints may become an issue.

    WARNING: Any fields not referenced by the input fields will be aligned to the first output, and the final Hadoop stream will have a length equal to the maximum of this output's length and the input stream's length. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.

    WARNING: mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).
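
A function of the shape mapStream expects, Iterator[T] => TraversableOnce[X], might look like the sketch below. It is plain Scala with a hypothetical name, and it is written to stay lazy and to capture no mutable state, in line with the two points above.

```scala
object MapStreamDemo {
  // Differences between consecutive values, built lazily on the iterator.
  // No state outside the call is captured, so repeated calls are safe.
  def runningDeltas(values: Iterator[Long]): Iterator[Long] =
    values.sliding(2).collect { case Seq(previous, current) => current - previous }
}
```

Because the result is itself an iterator, nothing is materialized; the function works even on an unbounded input as long as the consumer takes only a finite prefix. Calling toList or reverse on the input inside such a function would give up that property.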

  38. def max (fieldDef: Symbol*) : GroupBuilder

  39. def max (fieldDef: (Fields, Fields)) : GroupBuilder

  40. def min (fieldDef: Symbol*) : GroupBuilder

  41. def min (fieldDef: (Fields, Fields)) : GroupBuilder

  42. def mkString (fieldDef: Symbol) : GroupBuilder

  43. def mkString (fieldDef: Symbol, sep: String) : GroupBuilder

  44. def mkString (fieldDef: Symbol, start: String, sep: String, end: String) : GroupBuilder

    these will only be called if a tuple is not passed, meaning just one column

  45. def mkString (fieldDef: (Fields, Fields)) : GroupBuilder

  46. def mkString (fieldDef: (Fields, Fields), sep: String) : GroupBuilder

  47. def mkString (fieldDef: (Fields, Fields), start: String, sep: String, end: String) : GroupBuilder

  48. def ne (arg0: AnyRef) : Boolean

    o.ne(arg0) is the same as !(o.eq(arg0)).

    arg0

    the object to compare against this object for reference dis-equality.

    returns

    true if the argument is not a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  49. def notify () : Unit

    Wakes up a single thread that is waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  50. def notifyAll () : Unit

    Wakes up all threads that are waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  51. def overrideReducers (p: Pipe) : Pipe

    attributes: protected
  52. def pivot (fieldDef: (Fields, Fields), defaultVal: Any =null) : GroupBuilder

    Opposite of RichPipe.unpivot. Converts a row-wise representation into a column-wise one; see SQL/Excel for more on this function.

    Example: pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests)) will find the feature named "clicks" and put its value in the column with the field named clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. NOTE: Duplicated fields will result in an error.

    Hint: if you want more precision, first do a map('value -> 'value) { x : AnyRef => Option(x) } and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will then be those truly missing.

    Similarly, if you want to check whether any items are present that shouldn't be: map('feature -> 'feature) { fname : String => if (!goodFeatures(fname)) { throw new Exception("ohnoes") } else fname }
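
The row-wise to column-wise conversion can be sketched for a single group with plain collections. The helper is hypothetical: it uses Option where the text above speaks of null, and the duplicate check reflects one reading of "duplicated fields", which is an assumption rather than documented behaviour.

```scala
object PivotDemo {
  // rows: (feature name, value) pairs belonging to one group.
  // columns: the output columns wanted, in order.
  def pivot[V](rows: Seq[(String, V)], columns: Seq[String], default: Option[V] = None): Seq[Option[V]] = {
    val names = rows.map(_._1)
    // Assumption: a feature appearing twice in one group is treated as an error.
    require(names.distinct.size == names.size, "duplicated feature name")
    val byName = rows.toMap
    // Features not listed in `columns` are ignored; absent ones get the default.
    columns.map(name => byName.get(name).orElse(default))
  }
}
```

For rows ("clicks", 3) and ("requests", 7) with columns clicks, impressions, requests, the impressions slot is empty unless a default is supplied.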

  53. def plus [T] (fs: Symbol*)(implicit monoid: Monoid[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

  54. def plus [T] (fd: (Fields, Fields))(implicit monoid: Monoid[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

    Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with the standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
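
To show what summing with a monoid means, here is a minimal sketch with a hand-written type class. The trait below is hypothetical and deliberately small; it is not the Monoid[T] that Scalding uses, whose definition is not shown on this page.

```scala
object MonoidPlusDemo {
  // Hypothetical minimal type class: an identity element and a combine operation.
  trait Monoid[T] {
    def zero: T
    def plus(left: T, right: T): T
  }

  object Monoid {
    implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
      def zero: Int = 0
      def plus(left: Int, right: Int): Int = left + right
    }

    // Merges maps key by key, combining values that share a key.
    implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
      new Monoid[Map[K, V]] {
        def zero: Map[K, V] = Map.empty
        def plus(left: Map[K, V], right: Map[K, V]): Map[K, V] =
          right.foldLeft(left) { case (acc, (k, v)) =>
            acc.updated(k, acc.get(k).fold(v)(vm.plus(_, v)))
          }
      }
  }

  // Sum a sequence with whichever monoid is in implicit scope for T.
  def sumAll[T](items: Seq[T])(implicit monoid: Monoid[T]): T =
    items.foldLeft(monoid.zero)(monoid.plus)
}
```

Because both instances are associative and commutative, the result does not depend on the order in which partial sums are combined, which is the property the text asks of a Monoid[T] used in a distributed sum.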

  55. def reduce [T] (fieldDef: Symbol*)(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]) : GroupBuilder

  56. def reduce [T] (fieldDef: (Fields, Fields))(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]) : GroupBuilder

    Apply an associative/commutative operation on the left field.

    Example: reduce(('mass,'allids)->('totalMass, 'idset)) { (left:(Double,Set[Long]),right:(Double,Set[Long])) => (left._1 + right._1, left._2 ++ right._2) }

    Equivalent to a mapReduceMap with trivial (identity) map functions.

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
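
The combining function from the reduce example can be exercised on its own with an ordinary collection. This sketch reuses the (mass, id set) pair from the example; the object name and the helper totals are illustrative, and reduceLeftOption stands in for the grouping machinery.

```scala
object ReduceDemo {
  type MassAndIds = (Double, Set[Long])

  // Same shape as the function in the example: add masses, union id sets.
  // The accumulated value arrives as `left`.
  def combine(left: MassAndIds, right: MassAndIds): MassAndIds =
    (left._1 + right._1, left._2 ++ right._2)

  // Pairwise reduction over the rows of one group; None for an empty group.
  def totals(rows: Seq[MassAndIds]): Option[MassAndIds] =
    rows.reduceLeftOption(combine)
}
```

Since the accumulator is always the left argument, `left._2 ++ right._2` grows the accumulated set by the new row's ids rather than the other way round.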

  57. def reducers (r: Int) : GroupBuilder

    Override the number of reducers used in the groupBy.

  58. def reverse : GroupBuilder

  59. def scanLeft [X, T] (fieldDef: (Fields, Fields))(init: X)(fn: (X, T) ⇒ X)(implicit setter: TupleSetter[X], conv: TupleConverter[T]) : GroupBuilder

    Analog of the standard scanLeft (see scala.collection.Iterable.scanLeft). This invalidates map-side aggregation and forces all data to be transferred to reducers. Use only if you REALLY have to.

    BEST PRACTICE: make sure init is an immutable object. NOTE: init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object).
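
For reference, this is how the standard-library scanLeft that the method is modelled on behaves, using immutable initial values as the best-practice note recommends. The sketch uses plain collections only; whether GroupBuilder.scanLeft emits rows in exactly the same way is not stated on this page.

```scala
object ScanLeftDemo {
  // Running totals: the initial value is part of the output.
  def runningTotals(xs: Seq[Int]): Seq[Int] =
    xs.scanLeft(0)(_ + _)

  // An immutable Set as init: each step yields a new set, so no step can
  // observe changes made by a later one.
  def seenSoFar(xs: Seq[Int]): Seq[Set[Int]] =
    xs.scanLeft(Set.empty[Int])(_ + _)
}
```

With a mutable init object, every emitted element could end up aliasing the same instance, which is the class of error the Kryo copy mentioned above guards against.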

  60. def schedule (name: String, pipe: Pipe) : Pipe

  61. def size (thisF: Fields) : GroupBuilder

  62. def size : GroupBuilder

  63. def sizeAveStdev (fieldDef: (Fields, Fields)) : GroupBuilder

    Compute the count, average and standard deviation in one pass.

    Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))

    Uses: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
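
The parallel algorithm referenced above merges partial (count, mean, sum of squared deviations) summaries. The following is a minimal sketch of that merge in plain Scala; the names are hypothetical, and it reports the population standard deviation, which may or may not match the variant GroupBuilder produces.

```scala
object MomentsDemo {
  // count, mean, and m2 = sum of squared deviations from the mean.
  final case class Moments(count: Long, mean: Double, m2: Double) {
    def variance: Double = if (count == 0) 0.0 else m2 / count // population variance
    def stdev: Double = math.sqrt(variance)
  }

  val empty: Moments = Moments(0L, 0.0, 0.0)

  def single(x: Double): Moments = Moments(1L, x, 0.0)

  // Merge two partial summaries without revisiting the underlying records.
  def merge(a: Moments, b: Moments): Moments =
    if (a.count == 0) b
    else if (b.count == 0) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      val mean = a.mean + delta * b.count / n
      val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
      Moments(n, mean, m2)
    }

  def summarize(xs: Seq[Double]): Moments =
    xs.map(single).foldLeft(empty)(merge)
}
```

Because merge only needs the two summaries, partial results computed on different machines can be combined in any grouping, which is what makes a one-pass, distributed computation possible.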

  64. def sortBy (f: Fields) : GroupBuilder

  65. var sortBy : Option[Fields]

    attributes: protected
  66. def sortWithTake [T] (f: (Fields, Fields), k: Int)(lt: (T, T) ⇒ Boolean)(implicit arg0: TupleConverter[T]) : GroupBuilder

  67. def sortedReverseTake [T] (f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]) : GroupBuilder

  68. def sortedTake [T] (f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]) : GroupBuilder

  69. def sum (f: Symbol) : GroupBuilder

  70. def sum (f: (Fields, Fields)) : GroupBuilder

  71. def synchronized [T0] (arg0: T0) : T0

    attributes: final
    definition classes: AnyRef
  72. def take (cnt: Int) : GroupBuilder

  73. def takeWhile [T] (f: Fields)(fn: (T) ⇒ Boolean)(implicit conv: TupleConverter[T]) : GroupBuilder

  74. def then (fn: (GroupBuilder) ⇒ GroupBuilder) : GroupBuilder

  75. def times [T] (fs: Symbol*)(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

  76. def times [T] (fd: (Fields, Fields))(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]) : GroupBuilder

  77. def toList [T] (fieldDef: (Fields, Fields))(implicit conv: TupleConverter[T]) : GroupBuilder

    Convert a subset of fields into a list of Tuples. You need to provide the types of the tuple fields. Note that the order of the tuples is not preserved, EVEN IF YOU USE GroupBuilder.sortBy! If you need ordering, use sortedTake or sortBy + scanLeft.

  78. def toString () : String

    Returns a string representation of the object.

    The default representation is platform dependent.

    returns

    a string representation of the object.

    definition classes: AnyRef → Any
  79. def wait () : Unit

    attributes: final
    definition classes: AnyRef
  80. def wait (arg0: Long, arg1: Int) : Unit

    attributes: final
    definition classes: AnyRef
  81. def wait (arg0: Long) : Unit

    attributes: final
    definition classes: AnyRef
