o != arg0 is the same as !(o == (arg0)).
the object to compare against this object for dis-equality.
false if the receiver object is equivalent to the argument; true otherwise.
o == arg0 is the same as if (o eq null) arg0 eq null else o.equals(arg0).
the object to compare against this object for equality.
true if the receiver object is equivalent to the argument; false otherwise.
o == arg0 is the same as o.equals(arg0).
the object to compare against this object for equality.
true if the receiver object is equivalent to the argument; false otherwise.
This method is used to cast the receiver object to be of type T0.
Note that the success of a cast at runtime is modulo Scala's erasure semantics. Therefore the expression 1.asInstanceOf[String] will throw a ClassCastException at runtime, while the expression List(1).asInstanceOf[List[String]] will not. In the latter example, because the type argument is erased as
part of compilation, it is not possible to check whether the contents of the list are of the requested type.
the receiver object.
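The erasure behaviour described above can be observed directly. The following is a minimal sketch in plain Scala; the object name and values are illustrative only and are not part of the documented API.

```scala
import scala.util.Try

object CastErasureSketch {
  // Static type Any, so the compiler cannot reject the casts up front.
  val number: Any = 1
  val numbers: Any = List(1)

  // Fails at runtime: a boxed integer is not a String.
  val castToString: Try[String] = Try(number.asInstanceOf[String])

  // Succeeds: only the erased class List is checked, not the element type.
  val castToStringList: Try[List[String]] = Try(numbers.asInstanceOf[List[String]])

  // The mismatch only surfaces once an element is actually used as a String.
  val firstLength: Try[Int] = castToStringList.map(_.head.length)

  def main(args: Array[String]): Unit = {
    println(castToString.isFailure)
    println(castToStringList.isSuccess)
    println(firstLength.isFailure)
  }
}
```

The failure is deferred, not avoided: the second cast returns normally, and the ClassCastException appears later at the point of use.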
Uses a more stable online algorithm, which should be suitable for large numbers of records. Similar to: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
This method creates and returns a copy of the receiver object.
The default implementation of the clone method is platform dependent.
a copy of the receiver object.
Use size instead to match the scala.collection.Iterable API.
This method is used to test whether the argument (arg0) is a reference to the
receiver object (this).
The eq method implements an [http://en.wikipedia.org/wiki/Equivalence_relation equivalence relation] on
non-null instances of AnyRef:
* It is reflexive: for any non-null instance x of type AnyRef, x.eq(x) returns true.
* It is symmetric: for any non-null instances x and y of type AnyRef, x.eq(y) returns true if and
only if y.eq(x) returns true.
* It is transitive: for any non-null instances x, y, and z of type AnyRef if x.eq(y) returns true and y.eq(z) returns true, then x.eq(z) returns true.
Additionally, the eq method has three other properties.
* It is consistent: for any non-null instances x and y of type AnyRef, multiple invocations of
x.eq(y) consistently return true or consistently return false.
* For any non-null instance x of type AnyRef, x.eq(null) and null.eq(x) both return false.
* null.eq(null) returns true.
When overriding the equals or hashCode methods, it is important to ensure that their behavior is
consistent with reference equality. Therefore, if two objects are references to each other (o1 eq o2), they
should be equal to each other (o1 == o2) and they should hash to the same value (o1.hashCode == o2.hashCode).
the object to compare against this object for reference equality.
true if the argument is a reference to the receiver object; false otherwise.
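As an illustration of the difference between reference equality and equivalence, here is a small sketch in plain Scala. The object and value names are hypothetical.

```scala
object ReferenceEqualitySketch {
  // Two distinct instances with the same contents, plus an alias of the first.
  val a = new String("scalding")
  val b = new String("scalding")
  val alias = a

  val sameReference: Boolean = a eq alias            // true: same object
  val distinctInstances: Boolean = a eq b            // false: different objects
  val structurallyEqual: Boolean = a == b            // true: equal contents
  val nullEqNull: Boolean = (null: AnyRef) eq null   // true
  val nonNullEqNull: Boolean = a eq null             // false
}
```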
This method is used to compare the receiver object (this) with the argument object (arg0) for equivalence.
The default implementation of this method is an [http://en.wikipedia.org/wiki/Equivalence_relation equivalence
relation]:
* It is reflexive: for any instance x of type Any, x.equals(x) should return true.
* It is symmetric: for any instances x and y of type Any, x.equals(y) should return true if and
only if y.equals(x) returns true.
* It is transitive: for any instances x, y, and z of type AnyRef if x.equals(y) returns true and
y.equals(z) returns true, then x.equals(z) should return true.
If you override this method, you should verify that your implementation remains an equivalence relation.
Additionally, when overriding this method it is often necessary to override hashCode to ensure that objects
that are "equal" (o1.equals(o2) returns true) hash to the same scala.Int (o1.hashCode.equals(o2.hashCode)).
the object to compare against this object for equality.
true if the receiver object is equivalent to the argument; false otherwise.
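One possible way to keep equals and hashCode consistent is to derive both from the same fields. The class below is hypothetical and exists only to illustrate the contract; it is a sketch, not a recommended template for every case.

```scala
// Hypothetical class, used only for illustration.
final class Point(val x: Int, val y: Int) {
  // Equivalence is defined on exactly the fields x and y.
  override def equals(other: Any): Boolean = other match {
    case that: Point => x == that.x && y == that.y
    case _           => false
  }

  // Computed from the same fields as equals, so equal points hash alike.
  override def hashCode(): Int = 31 * x + y
}
```

Because the class is final, symmetry cannot be broken by a subclass that adds fields to the comparison.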
This is the description of this Grouping in terms of a sequence of Every operations
This method is called by the garbage collector on the receiver object when garbage collection determines that there are no more references to the object.
The details of when and if the finalize method is invoked, as well as the interaction between finalize and non-local returns and exceptions, are all platform dependent.
Returns a representation that corresponds to the dynamic class of the receiver object.
The nature of the representation is platform dependent.
a representation that corresponds to the dynamic class of the receiver object.
Returns a hash code value for the object.
The default hashing algorithm is platform dependent.
Note that it is allowed for two objects to have identical hash codes (o1.hashCode.equals(o2.hashCode)) yet
not be equal (o1.equals(o2) returns false). A degenerate implementation could always return 0.
However, it is required that if two objects are equal (o1.equals(o2) returns true) that they have
identical hash codes (o1.hashCode.equals(o2.hashCode)). Therefore, when overriding this method, be sure
to verify that the behavior is consistent with the equals method.
the hash code value for the object.
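A concrete case of unequal objects sharing a hash code, sketched under the assumption that the code runs on the JVM, where the String hashing algorithm is specified. The object name is illustrative.

```scala
object HashCollisionSketch {
  val left = "Aa"
  val right = "BB"

  // Both strings hash to 2112 under the JVM's String.hashCode:
  // 'A' * 31 + 'a' = 65 * 31 + 97 and 'B' * 31 + 'B' = 66 * 31 + 66.
  val sameHash: Boolean = left.hashCode == right.hashCode // true on the JVM

  // Identical hash codes do not imply equality.
  val equal: Boolean = left == right // false
}
```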
This method is used to test whether the dynamic type of the receiver object is T0.
Note that the result of the test is modulo Scala's erasure semantics. Therefore the expression 1.isInstanceOf[String] will return false, while the expression List(1).isInstanceOf[List[String]] will
return true. In the latter example, because the type argument is erased as part of compilation, it is not
possible to check whether the contents of the list are of the requested type.
true if the receiver object is an instance of erasure of type T0; false otherwise.
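The sketch below shows the erased type test and one possible workaround, which is to test the elements individually. The helper isListOfStrings is hypothetical and not part of any library.

```scala
object TypeTestErasureSketch {
  val number: Any = 1
  val numbers: Any = List(1)

  val isString: Boolean = number.isInstanceOf[String] // false

  // The compiler reports an "unchecked" warning here: only List is tested.
  val isStringList: Boolean = numbers.isInstanceOf[List[String]] // true

  // Hypothetical helper: test the erased type first, then every element.
  // An empty list passes, since there is no element to contradict the claim.
  def isListOfStrings(value: Any): Boolean = value match {
    case xs: List[_] => xs.forall(_.isInstanceOf[String])
    case _           => false
  }
}
```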
Type T is the type of the input field (input to map, T => X). Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X). Type U is the final result type (final map is X => U).
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
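The roles of T, X and U can be sketched on an ordinary in-memory sequence. The function below is a hypothetical local stand-in written for illustration; it is not the library method and ignores fields, grouping and distribution.

```scala
object MapReduceMapSketch {
  // Hypothetical local stand-in: map T => X, reduce (X, X) => X, final map X => U.
  // reduceLeft keeps the accumulated value on the left, and throws on empty input.
  def mapReduceMap[T, X, U](input: Seq[T])(mapFn: T => X)(reduceFn: (X, X) => X)(finalFn: X => U): U =
    finalFn(input.map(mapFn).reduceLeft(reduceFn))

  // Here T = Double, X = (sum, count) and U = Double (the average).
  val average: Double =
    mapReduceMap(Seq(1.0, 2.0, 6.0))(v => (v, 1L)) {
      (acc, next) => (acc._1 + next._1, acc._2 + next._2)
    } { case (sum, count) => sum / count }
}
```

The intermediate type carries both the sum and the count so that the reduce step stays associative; dividing only happens in the final map.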
Corresponds to a Cascading Buffer which allows you to stream through the data, keeping some, dropping, scanning, etc... The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (e.g. to reverse), you need to be aware that memory constraints may become an issue.
WARNING: Any fields not referenced by the input fields will be aligned to the first output, and the final hadoop stream will have a length of the maximum of the output of this, and the input stream. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.
WARNING: mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).
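The laziness of a mapped iterator can be seen with the standard library alone. In this sketch the mutable counter exists purely to observe when elements are evaluated; a function passed to the library should not capture state like this.

```scala
object LazyIteratorSketch {
  // Observation-only counter; not something to do in a real mapping function.
  var evaluated = 0

  // Mapping an iterator builds a new lazy iterator; nothing runs yet.
  val mapped: Iterator[Int] = Iterator(1, 2, 3).map { x => evaluated += 1; x * 2 }
  val beforeConsumption: Int = evaluated // 0

  // Pulling one element evaluates exactly one element.
  val first: Int = mapped.next()
  val afterOne: Int = evaluated // 1

  // Converting to a list forces the remainder and holds it in memory.
  val rest: List[Int] = mapped.toList
  val afterAll: Int = evaluated // 3
}
```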
these will only be called if a tuple is not passed, meaning just one column
o.ne(arg0) is the same as !(o.eq(arg0)).
the object to compare against this object for reference dis-equality.
true if the argument is not a reference to the receiver object; false otherwise.
Wakes up a single thread that is waiting on the receiver object's monitor.
Wakes up all threads that are waiting on the receiver object's monitor.
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function. Converts a row-wise representation into a column-wise one. Example: pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests)) will find the feature named "clicks" and put the value in the column with the field named clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. NOTE: Duplicated fields will result in an error.
Hint: if you want more precision, first do a map('value -> 'value) { x : AnyRef => Option(x) } and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check whether any items are present that shouldn't be: map('feature -> 'feature) { fname : String => if (!goodFeatures(fname)) { throw new Exception("ohnoes") } else fname }
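The row-wise to column-wise conversion can be sketched on in-memory data. This is a hypothetical illustration in plain Scala, not the library implementation: the sample rows are invented, absent features are modelled as None where the documented operation yields null, and duplicated features silently keep the last value here instead of raising an error.

```scala
object PivotSketch {
  // Hypothetical row-wise input: (group key, feature name, value).
  val rows: Seq[(String, String, Int)] = Seq(
    ("ad1", "clicks", 3),
    ("ad1", "impressions", 100),
    ("ad2", "clicks", 7)
  )

  // Requested output columns, in order.
  val columns: Seq[String] = Seq("clicks", "impressions", "requests")

  // One output row per key, one Option per requested column.
  val pivoted: Map[String, Seq[Option[Int]]] =
    rows.groupBy(_._1).map { case (key, group) =>
      val byFeature = group.map { case (_, feature, value) => feature -> value }.toMap
      key -> columns.map(byFeature.get)
    }
}
```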
Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with the standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
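To show why associativity and commutativity matter, here is a sketch with a hypothetical, minimal Monoid trait. It is defined locally for illustration; the Monoid type the library actually uses may have a different shape.

```scala
object MonoidPlusSketch {
  // Hypothetical minimal type class, defined only for this example.
  trait Monoid[T] {
    def zero: T
    def plus(left: T, right: T): T
  }

  val intAddition: Monoid[Int] = new Monoid[Int] {
    def zero: Int = 0
    def plus(left: Int, right: Int): Int = left + right
  }

  // Combine all values with plus, starting from zero.
  def plusAll[T](values: Seq[T])(monoid: Monoid[T]): T =
    values.foldLeft(monoid.zero)(monoid.plus)

  // The input order is not guaranteed in a distributed setting, so the
  // result must not depend on it.
  val total: Int = plusAll(Seq(1, 2, 3, 4))(intAddition)    // 10
  val shuffled: Int = plusAll(Seq(4, 2, 1, 3))(intAddition) // 10
}
```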
Apply an associative/commutative operation on the left field. Example: reduce(('mass,'allids)->('totalMass, 'idset)) { (left:(Double,Set[Long]),right:(Double,Set[Long])) => (left._1 + right._1, left._2 ++ right._2) }. Equivalent to a mapReduceMap with trivial (identity) map functions.
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
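The reducing function from the example above can be tried on a local sequence with reduceLeft, which likewise keeps the previous output on the left. The sample rows are invented for illustration.

```scala
object ReduceSketch {
  // Hypothetical (mass, ids) rows belonging to one group.
  val rows: Seq[(Double, Set[Long])] =
    Seq((1.5, Set(1L)), (2.0, Set(2L)), (0.5, Set(1L, 3L)))

  // Sum the masses and union the id sets; both operations are
  // associative and commutative.
  val (totalMass, idSet) = rows.reduceLeft { (left, right) =>
    (left._1 + right._1, left._2 ++ right._2)
  }
}
```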
Override the number of reducers used in the groupBy.
Analog of the standard scanLeft (see scala.collection.Iterable.scanLeft). This invalidates map-side aggregation and forces all data to be transferred to reducers. Use only if you REALLY have to.
BEST PRACTICE: make sure init is an immutable object. NOTE: init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object).
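For reference, this is how the standard library scanLeft behaves on a plain list; the sketch says nothing about how the grouped version is executed.

```scala
object ScanLeftSketch {
  // Every intermediate accumulator is emitted, starting with init itself.
  val runningTotals: List[Int] = List(1, 2, 3).scanLeft(0)(_ + _) // List(0, 1, 3, 6)
}
```

Because init is the first emitted value and feeds every later step, an immutable init avoids one group's updates leaking into another.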
Compute the count, average and standard deviation in one pass. Example: g.cntAveStdev('x -> ('cntx, 'avex, 'stdevx)). Uses: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
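The pairwise merge from the linked article can be sketched as follows. This is an illustration, not the library's code: the names Moments, single and combine are hypothetical, and the population standard deviation is an assumption, since the text above does not say which variant is returned.

```scala
object ParallelMomentsSketch {
  // count, mean, and sum of squared deviations from the mean (often called M2).
  final case class Moments(count: Long, mean: Double, m2: Double) {
    // Assumption: population standard deviation (divide by count).
    def stdev: Double = math.sqrt(m2 / count)
  }

  // Moments of a single observation.
  def single(x: Double): Moments = Moments(1L, x, 0.0)

  // Merge two partial results without revisiting the underlying data.
  def combine(a: Moments, b: Moments): Moments = {
    val count = a.count + b.count
    val delta = b.mean - a.mean
    val mean = a.mean + delta * b.count / count
    val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    Moments(count, mean, m2)
  }

  val data: Seq[Double] = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
  val result: Moments = data.map(single).reduce((a, b) => combine(a, b))
}
```

For the sample data the mean is 5 and the population standard deviation is 2, up to floating-point rounding.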
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields. Note that the order of the tuples is not preserved, EVEN IF YOU USE GroupBuilder.sortBy! If you need ordering, use sortedTake or sortBy + scanLeft.
Returns a string representation of the object.
The default representation is platform dependent.
a string representation of the object.
Builder classes used internally to implement coGroups (joins). Can also be used for more generalized joins, e.g., star joins.