Builder classes used internally to implement coGroups (joins).
Represents the result of a CoGroup operation on two Grouped pipes.
Represents a closed interval of time.
Sets up an implicit dateRange to use in your sources and an implicit timezone.
Mix this in for delimited schemes such as TSV or one-separated values. By default, TSV is given.
This is a base class for file-based sources.
This handles the mapReduceMap work on the map-side of the operation.
Represents a grouping, which is the transition from the map phase to the reduce phase in Hadoop.
Thrown when validateTaps fails.
Allows an iterable object defined in the job (on the submitter) to be used within a Job as you would use a Pipe/RichPipe.
This class is used to construct unit tests for scalding jobs.
Represents sharded lists of items of type T.
MapReduceMapBy Class
This handles the mapReduceMap work on the map-side of the operation.
Usually as soon as we open a source, we read and do some mapping operation on a single column or set of columns.
There are three ways to run jobs. sourceStrictness is set to true
This just blindly uses the first public constructor with the same arity as the fields size.
One separated value (commonly used by Pig).
Packs a tuple into any object with set methods, e.
Below are some serializers for objects in the scalding project.
Scala 2.
Every source must have a correct toString method.
Memory-only testing for unit tests.
The fields here are ('offset, 'line).
This will automatically produce a globbed version of the given path.
Tab-separated value source.
Mixed in to both TupleConverter and TupleSetter to improve arity safety of cascading jobs before we run anything on Hadoop.
Represents a phase in a distributed computation on an input data source. Wraps a Cascading Pipe object and holds the transformation done up until that point.
The Args class does simple command-line parsing.
Holds some conversion functions for dealing with strings as RichDate objects.
This object has all the implicit functions and values that are used to make the scalding DSL.
Represents a millisecond-based duration (non-calendar based): seconds, minutes, hours. calField should be a java.
RichDate adds some nice convenience functions to the Java date/calendar classes. We commonly do Date/Time work in analysis jobs, so having these operations convenient is very helpful.
Implicits for the type-safe DSL. import TDsl.
Base class for classes which pack a Tuple into a serializable object.
Base class for objects which unpack an object into a tuple.
Factory methods for TypedPipe.
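
Several entries above refer to mapReduceMap work. As a rough illustration of that map, reduce, map pattern, here is a minimal sketch using only plain Scala collections. It is not Scalding's implementation: the object name `MapReduceMapSketch`, the function signature, and the `averages` helper are all hypothetical, chosen only to show the shape of the idea. The sketch assumes the reduce function is associative, which is what would allow partial results to be combined on the map side before the final reduce.

```scala
// Plain-Scala sketch of the map, reduce, map pattern (hypothetical names,
// not Scalding's API or implementation).
object MapReduceMapSketch {

  // mapFn  : turns each input value into an intermediate value X
  // redFn  : merges two intermediate values for the same key (assumed associative)
  // mapFn2 : turns the fully reduced X into the final result U
  def mapReduceMap[K, T, X, U](items: Seq[(K, T)])(mapFn: T => X)(
      redFn: (X, X) => X)(mapFn2: X => U): Map[K, U] =
    items
      .groupBy(_._1) // groups are never empty, so reduce below is safe
      .map { case (key, pairs) =>
        val reduced = pairs.map { case (_, v) => mapFn(v) }.reduce(redFn)
        key -> mapFn2(reduced)
      }

  // Example use: per-key average, carrying (sum, count) as the intermediate value.
  def averages(items: Seq[(String, Double)]): Map[String, Double] =
    mapReduceMap[String, Double, (Double, Long), Double](items)(v => (v, 1L)) {
      (a, b) => (a._1 + b._1, a._2 + b._2)
    } { case (sum, n) => sum / n }
}
```

The average is a useful test case because it cannot be reduced directly: averaging averages gives the wrong answer, whereas carrying a (sum, count) pair through the reduce step and dividing only at the end stays correct however the values are grouped.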