Secondary sort is one of the most important features of the Hadoop MapReduce framework. We know that, by default, the output of the MapReduce framework is sorted by key, but the values are not sorted. In this tutorial, I am going to show you an example of secondary sort, but before implementing it, it is important to understand custom data types and custom partitioners in the Hadoop MapReduce framework.

What is Secondary Sort?
Secondary sorting in the MapReduce framework is a technique to sort the values that reach the reducer, unlike the default behaviour, where the output of the MapReduce framework is sorted only by the mapper/reducer key. Using secondary sort, the values can be ordered either ascending or descending.

In this tutorial, I will show you an implementation example of secondary sort. We will process a product review sample dataset; below are the columns of the product review dataset.
reviewerID productID reviewText overAllRating reviewTime
And here is a sample of the product review dataset:
31852 B002GZGI4E highly recommend 4 1252800000 
31922 B002GZGI4E not as expected  3 1252800987
32122 B002GZGI4E hat was annoying 4 3252800210
32121 B002GZGI4E i 'm not sure "  3 2252800210
12390 B002R0FABA it was worth a shot 3 2252800000
31852 B002R0FABA highly recommend 5 1252800000
31922 B002R0FABA not as expected  1 1252700120
Here we will sort the product review dataset by reviewerID and rating: if a reviewer has given ratings for multiple products, those ratings should appear in descending order, i.e. the highest rating of a reviewer should come first.
Expected output:
productId=B002R0FABA, reviewerId=12390, reviewTxt=it was worth a shot, rating=3
productId=A002R0XOPQ, reviewerId=12890, reviewTxt=not as expected, rating=2
productId=B002R0XOPQ, reviewerId=31234, reviewTxt=overall ok, rating=3
productId=B002R0FABA, reviewerId=31852, reviewTxt=highly recommend, rating=5
productId=B002GZGI4E, reviewerId=31852, reviewTxt=highly recommend, rating=4
productId=B002GZGI4E, reviewerId=31922, reviewTxt=not as expected, rating=3
productId=B002R0FABA, reviewerId=31922, reviewTxt=not as expected, rating=1
productId=B002GZGI3E, reviewerId=32121, reviewTxt=i 'm not sure ", rating=3
productId=B002GZGI4E, reviewerId=32122, reviewTxt=hat was annoying, rating=4
We know that the output of a MapReduce program is sorted by the mapper's key, but the default behaviour of MapReduce will not fulfil the above requirement: here the values emitted by the mapper need to be sorted in descending order, and this can be done with the help of secondary sorting. To implement secondary sort we need to combine a few utilities; below are the key classes that need to be implemented.
  • Custom key class for the mapper (CustomKey.java)
  • Custom partitioner(CustomPartitioner.java)
  • Custom key comparator(KeyComparator.java)
  • Custom group comparator(GroupComparator.java)

Tools and Technologies we are using to solve this use case

  • Java 7
  • Eclipse Mars
  • Hadoop 2.7.1
  • Maven 3.3
  • Ubuntu 14 (Linux OS)
The main classes of this project are:
  • CustomKey.java
  • CustomPartitioner.java
  • KeyComparator.java
  • GroupComparator.java
  • ProductReviewVO.java
  • ProductMapper.java
  • ProductReducer.java
  • ReviewDriver.java
Step 1. Create a new Maven project
Go to the File menu, then New -> Maven Project, and provide the required details such as the group id and artifact id.
Step 2. Edit pom.xml
Double click on your project's pom.xml file; it will look like this, with very limited information.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javamakeuse.bd.poc</groupId>
  <artifactId>SecondarySorting</artifactId>
  <version>1.0</version>
  <name>SecondarySorting</name>
  <description>SecondarySorting Example in MapReduce</description>
</project>
Now edit this pom.xml file and add the Hadoop dependencies; below is the complete pom.xml file.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javamakeuse.bd.poc</groupId>
  <artifactId>SecondarySorting</artifactId>
  <version>1.0</version>
  <name>SecondarySorting</name>
  <description>SecondarySorting Example in MapReduce</description>
  <dependencies>
  <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-client</artifactId>
   <version>2.7.1</version>
  </dependency>
  <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-mapreduce-client-core</artifactId>
   <version>2.7.1</version>
  </dependency>
 </dependencies>
 <build>
  <plugins>
   <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
     <source>1.7</source>
     <target>1.7</target>
    </configuration>
   </plugin>
  </plugins>
 </build>
</project>
Step 3. CustomKey.java
CustomKey is a custom data type used as the key in the mapper and reducer phases; it is a combination of reviewerID and rating.
package com.javamakeuse.bd.poc;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CustomKey implements WritableComparable<CustomKey> {
	private Integer reviewerID;
	private Integer rating;

	public Integer getReviewerID() {
		return reviewerID;
	}
	public void setReviewerID(Integer reviewerID) {
		this.reviewerID = reviewerID;
	}
	public Integer getRating() {
		return rating;
	}
	public void setRating(Integer rating) {
		this.rating = rating;
	}
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(reviewerID);
		out.writeInt(rating);

	}
	@Override
	public void readFields(DataInput in) throws IOException {
		reviewerID = in.readInt();
		rating = in.readInt();
	}
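	// natural ordering of the key: by reviewerID first, then by rating (ascending);
	// note: the sort order actually used by the job is overridden by KeyComparator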
	@Override
	public int compareTo(CustomKey o) {
		int comparedValue = reviewerID.compareTo(o.reviewerID);
		if (comparedValue != 0) {
			return comparedValue;
		}
		return rating.compareTo(o.getRating());
	}

}
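The custom partitioner below partitions on reviewerID, so CustomKey does not strictly need hashCode() and equals(). If you ever rely on the default HashPartitioner with this key, though, it is good practice to add them, hashing only on reviewerID so that all records of a reviewer land in the same partition. A minimal, optional sketch of methods you could add to CustomKey:

	// optional: hash only on reviewerID so the default HashPartitioner would keep
	// all records of a reviewer in the same partition
	@Override
	public int hashCode() {
		return reviewerID.hashCode();
	}
	@Override
	public boolean equals(Object obj) {
		if (this == obj) {
			return true;
		}
		if (!(obj instanceof CustomKey)) {
			return false;
		}
		CustomKey other = (CustomKey) obj;
		return reviewerID.equals(other.reviewerID) && rating.equals(other.rating);
	}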
Step 4. ProductReviewVO.java
Custom data type used as the value in the MapReduce program; the output value of both the mapper and the reducer is a ProductReviewVO object.
package com.javamakeuse.bd.poc;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ProductReviewVO implements Writable {
	private String productId;
	private Integer reviewerId;
	private String reviewTxt;
	private int rating;

	public String getProductId() {
		return productId;
	}
	public void setProductId(String productId) {
		this.productId = productId;
	}
	public Integer getReviewerId() {
		return reviewerId;
	}
	public void setReviewerId(Integer reviewerId) {
		this.reviewerId = reviewerId;
	}
	public String getReviewTxt() {
		return reviewTxt;
	}
	public void setReviewTxt(String reviewTxt) {
		this.reviewTxt = reviewTxt;
	}
	public int getRating() {
		return rating;
	}
	public void setRating(int rating) {
		this.rating = rating;
	}
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(productId);
		out.writeInt(reviewerId);
		out.writeUTF(reviewTxt);
		out.writeInt(rating);
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		productId = in.readUTF();
		reviewerId = in.readInt();
		reviewTxt = in.readUTF();
		rating = in.readInt();
	}
	@Override
	public String toString() {
		return "ProductReviewVO [productId=" + productId + ", reviewerId=" + reviewerId + ", reviewTxt=" + reviewTxt
				+ ", rating=" + rating + "]";
	}

}
Step 5. CustomPartitioner.java
As our mapper key is a custom key combining reviewerID and rating, we have to implement a custom partitioner that partitions on reviewerID alone; the default HashPartitioner may not solve our problem in all cases, since records of the same reviewer with different ratings could end up in different partitions.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.mapreduce.Partitioner;

// the value type parameter must match the mapper's output value (ProductReviewVO)
public class CustomPartitioner extends Partitioner<CustomKey, ProductReviewVO> {

	@Override
	public int getPartition(CustomKey key, ProductReviewVO value, int numPartitions) {
		// multiply by 127 to perform some mixing
		return Math.abs(key.getReviewerID() * 127) % numPartitions;
	}
}
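Note that a partitioner only takes effect when the job runs with more than one reduce task; with the default single reducer, every record goes to the same partition anyway. To see the custom partitioner in action you could, for example, request two reducers in the driver:

		// optional (example only): run with two reducers so partitioning matters
		job.setNumReduceTasks(2);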
Step 6. KeyComparator.java
Custom implementation of WritableComparator used as the sort comparator: it orders keys by reviewerID and, within the same reviewer, by rating in descending order.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyComparator extends WritableComparator {
    protected KeyComparator() {
      super(CustomKey.class, true);
    }
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
      CustomKey ip1 = (CustomKey) w1;
      CustomKey ip2 = (CustomKey) w2;
      int cmp = ip1.getReviewerID().compareTo(ip2.getReviewerID());
      if (cmp != 0) {
        return cmp;
      }
      return ip2.getRating().compareTo(ip1.getRating()); //reverse
    }
  }
Step 7. GroupComparator.java
Groups the records by reviewerID, so that all reviews of a reviewer reach the reducer in a single reduce() call.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class GroupComparator extends WritableComparator {
    protected GroupComparator() {
      super(CustomKey.class, true);
    }
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
      CustomKey ip1 = (CustomKey) w1;
      CustomKey ip2 = (CustomKey) w2;
      return ip1.getReviewerID().compareTo(ip2.getReviewerID());
    }
  }
Step 8. ProductMapper.java
Mapper class to process the product review dataset and prepare the key/value pairs for the reducer.
package com.javamakeuse.bd.poc;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProductMapper extends Mapper<LongWritable, Text, CustomKey, ProductReviewVO> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, CustomKey, ProductReviewVO>.Context context)
					throws IOException, InterruptedException {
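		// each line is tab-separated: reviewerID  productID  reviewText  overAllRating  reviewTime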
		String[] columns = value.toString().split("\\t");
		if (columns.length > 3) {
			ProductReviewVO productReviewVO = new ProductReviewVO();
			productReviewVO.setReviewerId(Integer.parseInt(columns[0]));
			productReviewVO.setProductId(columns[1]);
			productReviewVO.setReviewTxt(columns[2]);
			productReviewVO.setRating(Integer.parseInt(columns[3]));
			CustomKey customKey = new CustomKey();
			customKey.setReviewerID(productReviewVO.getReviewerId());
			customKey.setRating(productReviewVO.getRating());

			context.write(customKey, productReviewVO);
		}
	}

}
Step 9. ProductReducer.java
Reducer class to process the mapper output and generate the final output of the MapReduce program; it simply emits each value, which already arrives grouped by reviewer and sorted by rating in descending order.
package com.javamakeuse.bd.poc;

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class ProductReducer extends Reducer<CustomKey, ProductReviewVO, NullWritable, ProductReviewVO> {

	NullWritable nullKey = NullWritable.get();

	@Override
	protected void reduce(CustomKey key, Iterable<ProductReviewVO> values,
			Reducer<CustomKey, ProductReviewVO, NullWritable, ProductReviewVO>.Context context)
					throws IOException, InterruptedException {
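		// GroupComparator delivers all records of one reviewer in a single reduce() call,
		// and KeyComparator has already ordered the values by rating in descending order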
		for(ProductReviewVO productReviewVO:values){
			context.write(nullKey, productReviewVO);
		}
	}
}
Step 10. ReviewDriver.java
Driver class to execute the MapReduce program.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReviewDriver extends Configured implements Tool {
	public static void main(String[] args) {
		try {
			int status = ToolRunner.run(new ReviewDriver(), args);
			System.exit(status);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	@Override
	public int run(String[] args) throws Exception {
		if (args.length != 2) {
			System.err.printf("Usage: %s [generic options] <input1> <output>\n", getClass().getSimpleName());
			ToolRunner.printGenericCommandUsage(System.err);
			return -1;
		}
		// use the configuration populated by ToolRunner so generic options are honoured
		Job job = Job.getInstance(getConf());
		job.setJarByClass(ReviewDriver.class);
		job.setJobName("ProductReview");
		// input path
		FileInputFormat.addInputPath(job, new Path(args[0]));

		// output path
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.setMapperClass(ProductMapper.class);
		job.setReducerClass(ProductReducer.class);
		job.setPartitionerClass(CustomPartitioner.class);
		job.setSortComparatorClass(KeyComparator.class);
		job.setGroupingComparatorClass(GroupComparator.class);

		job.setMapOutputKeyClass(CustomKey.class);
		job.setMapOutputValueClass(ProductReviewVO.class);

		job.setOutputKeyClass(NullWritable.class);
		job.setOutputValueClass(ProductReviewVO.class);

		return job.waitForCompletion(true) ? 0 : 1;

	}

}
Done. Next, run this program; you can run it from Eclipse as well, but below are the steps to run it using the terminal.

Step 11. Steps to execute SecondarySorting project
i. Start the Hadoop components: open your terminal and type
subodh@subodh-Inspiron-3520:~/software$ start-dfs.sh
subodh@subodh-Inspiron-3520:~/software$ start-yarn.sh
ii. Verify whether Hadoop has started using the jps command
subodh@subodh-Inspiron-3520:~/software$ jps
8385 NameNode
8547 DataNode
5701 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
9446 Jps
8918 ResourceManager
9054 NodeManager
8751 SecondaryNameNode
You can also verify via the web UI at "http://localhost:50070/explorer.html#/".
iii. Create an input folder on HDFS using the command below.
subodh@subodh-Inspiron-3520:~/software$ hadoop fs -mkdir /input
The above command will create an input folder on HDFS; you can verify it using the web UI or the hadoop fs -ls / command. Now it is time to move the input file that we need to process; below is the command to copy the product_review.data input file into the input folder on HDFS.
subodh@subodh-Inspiron-3520:~$ hadoop fs -copyFromLocal /home/subodh/programs/input/product_review.data /input
Note: the product_review.data dataset is available inside this project's source code; you can download it from the download link for this project.

Step 12. Create & Execute jar file
We are almost done. Now create the jar file of the SecondarySorting source code; you can create it using Eclipse or with the mvn package command, as shown below.
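For example, from the project's root directory (the folder that contains pom.xml), assuming Maven is installed and on the path:
mvn clean package
The build produces target/SecondarySorting-1.0.jar; copy it to a convenient location before running it with the hadoop jar command.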
To execute the SecondarySorting-1.0.jar file, use the command below.
hadoop jar /home/subodh/SecondarySorting-1.0.jar com.javamakeuse.bd.poc.ReviewDriver /input/product_review.data /output
The above command will print the log output below and also create an output folder on HDFS containing the output of the SecondarySorting project.
16/04/06 23:01:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/06 23:01:55 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/06 23:01:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/06 23:01:56 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 23:01:56 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 23:01:56 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 23:01:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1432274046_0001
16/04/06 23:01:56 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/06 23:01:56 INFO mapreduce.Job: Running job: job_local1432274046_0001
16/04/06 23:01:56 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Starting task: attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/04/06 23:01:56 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/product_review.data:0+416
16/04/06 23:01:56 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/06 23:01:56 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/06 23:01:56 INFO mapred.MapTask: soft limit at 83886080
16/04/06 23:01:56 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/06 23:01:56 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/06 23:01:56 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/06 23:01:56 INFO mapred.LocalJobRunner: 
16/04/06 23:01:56 INFO mapred.MapTask: Starting flush of map output
16/04/06 23:01:56 INFO mapred.MapTask: Spilling map output
16/04/06 23:01:56 INFO mapred.MapTask: bufstart = 0; bufend = 407; bufvoid = 104857600
16/04/06 23:01:56 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214364(104857456); length = 33/6553600
16/04/06 23:01:56 INFO mapred.MapTask: Finished spill 0
16/04/06 23:01:56 INFO mapred.Task: Task:attempt_local1432274046_0001_m_000000_0 is done. And is in the process of committing
16/04/06 23:01:56 INFO mapred.LocalJobRunner: map
16/04/06 23:01:56 INFO mapred.Task: Task 'attempt_local1432274046_0001_m_000000_0' done.
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Finishing task: attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO mapred.LocalJobRunner: map task executor complete.
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Starting task: attempt_local1432274046_0001_r_000000_0
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/04/06 23:01:56 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@42c355f5
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/06 23:01:56 INFO reduce.EventFetcher: attempt_local1432274046_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/06 23:01:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1432274046_0001_m_000000_0 decomp: 427 len: 431 to MEMORY
16/04/06 23:01:56 INFO reduce.InMemoryMapOutput: Read 427 bytes from map-output for attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 427, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->427
16/04/06 23:01:56 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/06 23:01:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/06 23:01:56 INFO mapred.Merger: Merging 1 sorted segments
16/04/06 23:01:56 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 417 bytes
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merged 1 segments, 427 bytes to disk to satisfy reduce memory limit
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merging 1 files, 431 bytes from disk
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/06 23:01:56 INFO mapred.Merger: Merging 1 sorted segments
16/04/06 23:01:56 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 417 bytes
16/04/06 23:01:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:56 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/04/06 23:01:57 INFO mapreduce.Job: Job job_local1432274046_0001 running in uber mode : false
16/04/06 23:01:57 INFO mapreduce.Job:  map 100% reduce 0%
16/04/06 23:01:57 INFO mapred.Task: Task:attempt_local1432274046_0001_r_000000_0 is done. And is in the process of committing
16/04/06 23:01:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:57 INFO mapred.Task: Task attempt_local1432274046_0001_r_000000_0 is allowed to commit now
16/04/06 23:01:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1432274046_0001_r_000000_0' to hdfs://localhost:9000/output/_temporary/0/task_local1432274046_0001_r_000000
16/04/06 23:01:57 INFO mapred.LocalJobRunner: reduce > reduce
16/04/06 23:01:57 INFO mapred.Task: Task 'attempt_local1432274046_0001_r_000000_0' done.
16/04/06 23:01:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local1432274046_0001_r_000000_0
16/04/06 23:01:57 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/06 23:01:58 INFO mapreduce.Job:  map 100% reduce 100%
16/04/06 23:01:58 INFO mapreduce.Job: Job job_local1432274046_0001 completed successfully
16/04/06 23:01:58 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=21604
		FILE: Number of bytes written=579171
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=832
		HDFS: Number of bytes written=848
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=9
		Map output records=9
		Map output bytes=407
		Map output materialized bytes=431
		Input split bytes=112
		Combine input records=0
		Combine output records=0
		Reduce input groups=7
		Reduce shuffle bytes=431
		Reduce input records=9
		Reduce output records=9
		Spilled Records=18
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=5
		Total committed heap usage (bytes)=496500736
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=416
	File Output Format Counters 
		Bytes Written=848
Step 13. Verify the output
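Once the job has finished, you can list and print the reducer output from HDFS, for example:
hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000
The records should match the expected output shown at the beginning of this tutorial, with each reviewer's ratings in descending order.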

That's it.

Download the complete example from here: Source Code
