org.htuple.examples
Class SecondarySort

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.htuple.examples.SecondarySort
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class SecondarySort
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

An example MapReduce job showing how Tuple and ShuffleUtils can be used together to perform a secondary sort on people's names.


Nested Class Summary
static class SecondarySort.Map
          Tokenizes each input line and emits a (tuple, line) pair, where the tuple contains the last and first name.
static class SecondarySort.Reduce
          Emits the map output values, so that the resulting output can be examined to confirm the effect of the secondary sort.
 
Field Summary
static String[] EXAMPLE_NAMES
          Sample input used by this example job.
 
Constructor Summary
SecondarySort()
           
 
Method Summary
static void main(String[] args)
          Main entry point for the example.
 int run(String[] args)
          The MapReduce driver: sets up and launches the job.
static void setupSecondarySort(org.apache.hadoop.conf.Configuration conf)
          Partition and group on just the last name; sort on both last and first name.
static Tuple stringToTuple(String line)
          Split the input line and return a Tuple representation of the last and first names.
static void writeInput(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inputDir)
          Writes the contents of EXAMPLE_NAMES into a file in the job input directory in HDFS.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

EXAMPLE_NAMES

public static final String[] EXAMPLE_NAMES
Sample input used by this example job.

Constructor Detail

SecondarySort

public SecondarySort()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Main entry point for the example.

Parameters:
args - arguments
Throws:
Exception - when something goes wrong

writeInput

public static void writeInput(org.apache.hadoop.conf.Configuration conf,
                              org.apache.hadoop.fs.Path inputDir)
                       throws IOException
Writes the contents of EXAMPLE_NAMES into a file in the job input directory in HDFS.

Parameters:
conf - the Hadoop config
inputDir - the HDFS input directory where we'll write a file
Throws:
IOException - if the input file cannot be written
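
The following is a minimal sketch of the same idea against the local filesystem, using java.nio rather than the Hadoop FileSystem API, so it can be run without a cluster. The class name WriteInputSketch, the SAMPLE_NAMES array and the file name input.txt are all hypothetical; the actual contents of EXAMPLE_NAMES and the file name used by this class are not documented here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class WriteInputSketch {

    /** Hypothetical sample data: one tab-delimited "last<TAB>first" name per entry. */
    public static final String[] SAMPLE_NAMES = {"Smith\tJohn", "Smith\tAnne", "Jones\tBob"};

    /** Creates inputDir if needed and writes one name per line into a file inside it. */
    public static Path writeInput(Path inputDir, String[] names) throws IOException {
        Files.createDirectories(inputDir);
        Path file = inputDir.resolve("input.txt"); // hypothetical file name
        Files.write(file, Arrays.asList(names), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A temporary local directory stands in for the HDFS input directory.
        Path dir = Files.createTempDirectory("secondary-sort-input");
        Path file = writeInput(dir, SAMPLE_NAMES);
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        System.out.println(lines.size() + " lines written to " + file);
    }
}
```

In a real job the same steps would go through the Hadoop filesystem configured in conf instead of java.nio.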

run

public int run(String[] args)
        throws Exception
The MapReduce driver: sets up and launches the job.

Specified by:
run in interface org.apache.hadoop.util.Tool
Parameters:
args - the command-line arguments
Returns:
the process exit code
Throws:
Exception - if something goes wrong

setupSecondarySort

public static void setupSecondarySort(org.apache.hadoop.conf.Configuration conf)
Partition and group on just the last name; sort on both last and first name.

Parameters:
conf - the Hadoop config
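
To make the three roles concrete, here is a minimal plain-Java sketch of what "partition and group on the last name, sort on last and first name" means. It does not use Hadoop, Tuple or ShuffleUtils; the class SecondarySortSketch, the Name key type and the sample names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    /** Hypothetical stand-in for a (last name, first name) tuple key. */
    public static final class Name {
        private final String last;
        private final String first;

        public Name(String last, String first) {
            this.last = last;
            this.first = first;
        }

        public String last() { return last; }
        public String first() { return first; }
    }

    /** Sort order: last name first, then first name (the "secondary" field). */
    public static final Comparator<Name> SORT_ORDER =
            Comparator.comparing(Name::last).thenComparing(Name::first);

    /** Partitioning: only the last name decides which reducer receives a key. */
    public static int partition(Name key, int numPartitions) {
        return (key.last().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Sorts on both fields, then groups keys that share a last name. */
    public static Map<String, List<String>> sortAndGroup(List<Name> keys) {
        List<Name> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Name n : sorted) {
            groups.computeIfAbsent(n.last(), k -> new ArrayList<>()).add(n.first());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Name> keys = Arrays.asList(
                new Name("Smith", "John"),
                new Name("Jones", "Zoe"),
                new Name("Smith", "Anne"),
                new Name("Jones", "Bob"));
        // prints {Jones=[Bob, Zoe], Smith=[Anne, John]}
        System.out.println(sortAndGroup(keys));
    }
}
```

Each group corresponds to one reduce call: all first names for a given last name arrive together and already in sorted order, which is the effect a secondary sort is meant to produce.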

stringToTuple

public static Tuple stringToTuple(String line)
Split the input line and return a Tuple representation of the last and first names.

Parameters:
line - a line containing a tab-delimited last and first name.
Returns:
a Tuple representation of the line
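
As an illustration of the parsing step, the sketch below splits a tab-delimited line into its last and first name. It returns a plain List<String> rather than a Tuple, and the class NameLineParser, the method parseName and the rejection of malformed lines are assumptions made for this example; how stringToTuple itself handles malformed input is not documented here.

```java
import java.util.Arrays;
import java.util.List;

public class NameLineParser {

    /**
     * Splits "last<TAB>first" into [last, first].
     * The returned list is a hypothetical stand-in for a Tuple.
     */
    public static List<String> parseName(String line) {
        // A limit of -1 keeps trailing empty fields so malformed lines are detected.
        String[] parts = line.split("\t", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 tab-delimited fields but found " + parts.length);
        }
        return Arrays.asList(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(parseName("Smith\tJohn")); // prints [Smith, John]
    }
}
```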


Copyright © 2013. All Rights Reserved.