Secondary sort is one of the most important features of the Hadoop MapReduce framework. By default, the output of a MapReduce job is sorted by key, but the values are not sorted. In this tutorial I am going to show you an example of secondary sort; before implementing it, it is important to understand custom data types and custom partitioners in the Hadoop MapReduce framework.
What is Secondary Sort?
Secondary sorting in the MapReduce framework is a technique to sort the values that reach the reducer, unlike the default behaviour, where the output of a MapReduce job is sorted only by the mapper/reducer key. With secondary sort, the values can be ordered either ascending or descending.
In this tutorial I will show you an implementation of secondary sort. We will process a sample product review dataset with the following columns:
reviewerID productID reviewText overAllRating reviewTime
And here is a sample of the product review dataset -
31852 B002GZGI4E highly recommend 4 1252800000
31922 B002GZGI4E not as expected 3 1252800987
32122 B002GZGI4E hat was annoying 4 3252800210
32121 B002GZGI4E i 'm not sure " 3 2252800210
12390 B002R0FABA it was worth a shot 3 2252800000
31852 B002R0FABA highly recommend 5 1252800000
31922 B002R0FABA not as expected 1 1252700120
Here we will sort the product review dataset by reviewerID and rating: if a reviewer has rated multiple products, the ratings should come out in descending order, i.e. the highest rating of a reviewer comes first.
Expected output-
productId=B002R0FABA, reviewerId=12390, reviewTxt=it was worth a shot, rating=3
productId=A002R0XOPQ, reviewerId=12890, reviewTxt=not as expected, rating=2
productId=B002R0XOPQ, reviewerId=31234, reviewTxt=overall ok, rating=3
productId=B002R0FABA, reviewerId=31852, reviewTxt=highly recommend, rating=5
productId=B002GZGI4E, reviewerId=31852, reviewTxt=highly recommend, rating=4
productId=B002GZGI4E, reviewerId=31922, reviewTxt=not as expected, rating=3
productId=B002R0FABA, reviewerId=31922, reviewTxt=not as expected, rating=1
productId=B002GZGI3E, reviewerId=32121, reviewTxt=i 'm not sure ", rating=3
productId=B002GZGI4E, reviewerId=32122, reviewTxt=hat was annoying, rating=4
We know that the output of a MapReduce program is sorted by the mapper key, but the requirement above cannot be fulfilled by that default behaviour: the values emitted by the mapper must be sorted in descending order of rating, and that is exactly what secondary sorting gives us. To implement secondary sort we need to combine a few utilities; below are the key classes to be implemented, followed by a short plain-Java sketch of the compound ordering we are after.
- Custom key class for the mapper (CustomKey.java)
- Custom partitioner(CustomPartitioner.java)
- Custom key comparator(KeyComparator.java)
- Custom group comparator(GroupComparator.java)
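Before diving into Hadoop, here is a minimal, self-contained Java sketch (hypothetical, not part of the project) of that compound ordering: primary key (reviewerID) ascending, secondary key (rating) descending.

import java.util.Arrays;
import java.util.Comparator;

public class SecondarySortIdea {
    public static void main(String[] args) {
        // (reviewerID, rating) pairs, in arbitrary input order.
        Integer[][] keys = { { 31922, 3 }, { 31852, 4 }, { 31852, 5 }, { 31922, 1 } };
        Arrays.sort(keys, new Comparator<Integer[]>() {
            @Override
            public int compare(Integer[] a, Integer[] b) {
                int cmp = a[0].compareTo(b[0]); // primary: reviewerID ascending
                return cmp != 0 ? cmp : b[1].compareTo(a[1]); // secondary: rating descending
            }
        });
        for (Integer[] k : keys) {
            System.out.println("reviewer=" + k[0] + ", rating=" + k[1]);
        }
        // Prints 31852/5, 31852/4, 31922/3, 31922/1 - each reviewer's highest rating first.
    }
}

In MapReduce we cannot simply call Arrays.sort; the partitioner, sort comparator and grouping comparator below split this one comparison across the shuffle phase.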
Tools and technologies we are using to solve this use case -
- Java 7
- Eclipse Mars
- Hadoop 2.7.1
- Maven 3.3
- Ubuntu 14(Linux OS)
Below are the classes we will implement in this project:
- CustomKey.java
- CustomPartitioner.java
- KeyComparator.java
- GroupComparator.java
- ProductReviewVO.java
- ProductMapper.java
- ProductReducer.java
- ReviewDriver.java
Step 1. Create Maven Project
Go to File Menu then New->Maven Project, and provide the required details.
Step 2. Edit pom.xml
Double click on your project's pom.xml file; it will look like this, with very limited information.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javamakeuse.bd.poc</groupId>
  <artifactId>SecondarySorting</artifactId>
  <version>1.0</version>
  <name>SecondarySorting</name>
  <description>SecondarySorting Example in MapReduce</description>
</project>
Now edit this pom.xml file and add the Hadoop dependencies. Below is the complete pom.xml file; just copy and paste, it will work.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javamakeuse.bd.poc</groupId>
  <artifactId>SecondarySorting</artifactId>
  <version>1.0</version>
  <name>SecondarySorting</name>
  <description>SecondarySorting Example in MapReduce</description>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.7.1</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Step 3. CustomKey.java
CustomKey is a custom data type used as the key in the mapper and reducer phases; it is a combination of reviewerID and rating.
package com.javamakeuse.bd.poc;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class CustomKey implements WritableComparable<CustomKey> {

    private Integer reviewerID;
    private Integer rating;

    public Integer getReviewerID() {
        return reviewerID;
    }

    public void setReviewerID(Integer reviewerID) {
        this.reviewerID = reviewerID;
    }

    public Integer getRating() {
        return rating;
    }

    public void setRating(Integer rating) {
        this.rating = rating;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(reviewerID);
        out.writeInt(rating);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        reviewerID = in.readInt();
        rating = in.readInt();
    }

    @Override
    public int compareTo(CustomKey o) {
        int comparedValue = reviewerID.compareTo(o.reviewerID);
        if (comparedValue != 0) {
            return comparedValue;
        }
        return rating.compareTo(o.getRating());
    }
}
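Note that compareTo defines the key's natural order: reviewerID first, then rating ascending (Step 6 will override the rating direction at shuffle time). A quick hypothetical check, not part of the project:

package com.javamakeuse.bd.poc;

public class CustomKeyDemo {
    public static void main(String[] args) {
        CustomKey a = new CustomKey();
        a.setReviewerID(31852);
        a.setRating(4);
        CustomKey b = new CustomKey();
        b.setReviewerID(31852);
        b.setRating(5);
        // Same reviewer, so compareTo falls through to the rating comparison:
        System.out.println(a.compareTo(b)); // negative value: rating 4 sorts before 5
    }
}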
Step 4. ProductReviewVO.java
Custom data type used for the value in the MapReduce program; the output of both the mapper and the reducer is a ProductReviewVO object.
package com.javamakeuse.bd.poc;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ProductReviewVO implements Writable {

    private String productId;
    private Integer reviewerId;
    private String reviewTxt;
    private int rating;

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public Integer getReviewerId() {
        return reviewerId;
    }

    public void setReviewerId(Integer reviewerId) {
        this.reviewerId = reviewerId;
    }

    public String getReviewTxt() {
        return reviewTxt;
    }

    public void setReviewTxt(String reviewTxt) {
        this.reviewTxt = reviewTxt;
    }

    public int getRating() {
        return rating;
    }

    public void setRating(int rating) {
        this.rating = rating;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(productId);
        out.writeInt(reviewerId);
        out.writeUTF(reviewTxt);
        out.writeInt(rating);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        productId = in.readUTF();
        reviewerId = in.readInt();
        reviewTxt = in.readUTF();
        rating = in.readInt();
    }

    @Override
    public String toString() {
        return "ProductReviewVO [productId=" + productId + ", reviewerId=" + reviewerId + ", reviewTxt=" + reviewTxt
                + ", rating=" + rating + "]";
    }
}
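To see what Writable buys us, here is a small hypothetical round-trip (not part of the project): write() serializes the fields to bytes and readFields() restores them, which is exactly what the framework does when moving these objects between mapper and reducer.

package com.javamakeuse.bd.poc;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class ProductReviewVODemo {
    public static void main(String[] args) throws Exception {
        ProductReviewVO in = new ProductReviewVO();
        in.setProductId("B002GZGI4E");
        in.setReviewerId(31852);
        in.setReviewTxt("highly recommend");
        in.setRating(4);

        // Serialize the object the way the MapReduce framework would.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        in.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance and confirm the fields survived.
        ProductReviewVO out = new ProductReviewVO();
        out.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(out); // same field values as 'in'
    }
}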
Step 5. CustomPartitioner.java
Because our mapper key is a custom key combining reviewerID and rating, we have to implement a custom partitioner that partitions on reviewerID alone, so that all records of a reviewer land on the same reducer. The default HashPartitioner hashes the whole key and would not solve our problem in all cases.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.mapreduce.Partitioner;

// Note: the value type parameter must match the mapper's output value type,
// which is ProductReviewVO in this project.
public class CustomPartitioner extends Partitioner<CustomKey, ProductReviewVO> {

    @Override
    public int getPartition(CustomKey key, ProductReviewVO value, int numPartitions) {
        // Partition on reviewerID only; multiply by 127 to perform some mixing.
        return Math.abs(key.getReviewerID() * 127) % numPartitions;
    }
}
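A hypothetical sanity check (not part of the project) that two records from the same reviewer, whatever their rating, land in the same partition:

package com.javamakeuse.bd.poc;

public class PartitionerDemo {
    public static void main(String[] args) {
        CustomPartitioner partitioner = new CustomPartitioner();
        CustomKey k1 = new CustomKey();
        k1.setReviewerID(31852);
        k1.setRating(4);
        CustomKey k2 = new CustomKey();
        k2.setReviewerID(31852);
        k2.setRating(5);
        // Ratings differ, reviewerID is the same -> identical partition number.
        System.out.println(partitioner.getPartition(k1, null, 4));
        System.out.println(partitioner.getPartition(k2, null, 4));
    }
}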
Step 6. KeyComparator.java
Custom implementation of WritableComparator used as the sort comparator: within a reviewer, it orders the keys by rating in descending order.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyComparator extends WritableComparator {

    protected KeyComparator() {
        super(CustomKey.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        CustomKey ip1 = (CustomKey) w1;
        CustomKey ip2 = (CustomKey) w2;
        int cmp = ip1.getReviewerID().compareTo(ip2.getReviewerID());
        if (cmp != 0) {
            return cmp;
        }
        return ip2.getRating().compareTo(ip1.getRating()); // reversed: descending rating
    }
}
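The only difference from CustomKey's natural order is the reversed rating comparison; a hypothetical contrast (not part of the project, placed in the same package because the constructor is protected):

package com.javamakeuse.bd.poc;

public class KeyComparatorDemo {
    public static void main(String[] args) {
        KeyComparator comparator = new KeyComparator();
        CustomKey high = new CustomKey();
        high.setReviewerID(31852);
        high.setRating(5);
        CustomKey low = new CustomKey();
        low.setReviewerID(31852);
        low.setRating(4);
        // Natural order (compareTo) puts the lower rating first...
        System.out.println(low.compareTo(high)); // negative
        // ...but the shuffle-time comparator puts the higher rating first.
        System.out.println(comparator.compare(high, low)); // negative
    }
}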
Step 7. GroupComparator.java
Groups the records reaching the reducer by reviewerID only, so that one reduce() call receives all of a reviewer's reviews even though their keys differ in rating.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class GroupComparator extends WritableComparator {

    protected GroupComparator() {
        super(CustomKey.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        CustomKey ip1 = (CustomKey) w1;
        CustomKey ip2 = (CustomKey) w2;
        return ip1.getReviewerID().compareTo(ip2.getReviewerID());
    }
}
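Grouping is the subtle part of secondary sort: two keys that differ only in rating compare as equal here, which is what tells the framework to feed their values into a single reduce() invocation. A hypothetical check (not part of the project):

package com.javamakeuse.bd.poc;

public class GroupComparatorDemo {
    public static void main(String[] args) {
        GroupComparator grouper = new GroupComparator();
        CustomKey k1 = new CustomKey();
        k1.setReviewerID(31922);
        k1.setRating(3);
        CustomKey k2 = new CustomKey();
        k2.setReviewerID(31922);
        k2.setRating(1);
        // 0 means "same group": both records go to one reduce() call.
        System.out.println(grouper.compare(k1, k2)); // 0
    }
}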
Step 8. ProductMapper.java
Mapper class to process the product review dataset and prepare the key/value pairs for the reducer.
package com.javamakeuse.bd.poc;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProductMapper extends Mapper<LongWritable, Text, CustomKey, ProductReviewVO> {

    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, CustomKey, ProductReviewVO>.Context context)
            throws IOException, InterruptedException {
        String[] columns = value.toString().split("\\t");
        if (columns.length > 3) {
            ProductReviewVO productReviewVO = new ProductReviewVO();
            productReviewVO.setReviewerId(Integer.parseInt(columns[0]));
            productReviewVO.setProductId(columns[1]);
            productReviewVO.setReviewTxt(columns[2]);
            productReviewVO.setRating(Integer.parseInt(columns[3]));
            CustomKey customKey = new CustomKey();
            customKey.setReviewerID(productReviewVO.getReviewerId());
            customKey.setRating(productReviewVO.getRating());
            context.write(customKey, productReviewVO);
        }
    }
}
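The mapper assumes tab-separated input. A hypothetical standalone trace (not part of the project) of how one record is split into the composite key and the value:

public class MapperInputDemo {
    public static void main(String[] args) {
        // One tab-separated line of product_review.data:
        String line = "31852\tB002GZGI4E\thighly recommend\t4\t1252800000";
        String[] columns = line.split("\\t");
        // columns[0]=reviewerID, [1]=productID, [2]=reviewText, [3]=rating
        System.out.println("key   = (" + columns[0] + ", " + columns[3] + ")");
        System.out.println("value = " + columns[1] + " / " + columns[2]);
    }
}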
Step 9. ProductReducer.java
Reducer class to process the mapper output and generate the final output of the MapReduce program. Thanks to the comparators above, the values arrive already sorted by rating in descending order, so the reducer simply writes them out.
package com.javamakeuse.bd.poc;

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class ProductReducer extends Reducer<CustomKey, ProductReviewVO, NullWritable, ProductReviewVO> {

    NullWritable nullKey = NullWritable.get();

    @Override
    protected void reduce(CustomKey key, Iterable<ProductReviewVO> values,
            Reducer<CustomKey, ProductReviewVO, NullWritable, ProductReviewVO>.Context context)
            throws IOException, InterruptedException {
        for (ProductReviewVO productReviewVO : values) {
            context.write(nullKey, productReviewVO);
        }
    }
}
Step 10. ReviewDriver.java
Driver class to configure and execute the MapReduce job.
package com.javamakeuse.bd.poc;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReviewDriver extends Configured implements Tool {

    public static void main(String[] args) {
        try {
            int status = ToolRunner.run(new ReviewDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input1> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        // Use getConf() so the generic options parsed by ToolRunner take effect.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(ReviewDriver.class);
        job.setJobName("ProductReview");
        // input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(ProductMapper.class);
        job.setReducerClass(ProductReducer.class);
        job.setPartitionerClass(CustomPartitioner.class);
        job.setSortComparatorClass(KeyComparator.class);
        job.setGroupingComparatorClass(GroupComparator.class);
        job.setMapOutputKeyClass(CustomKey.class);
        job.setMapOutputValueClass(ProductReviewVO.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(ProductReviewVO.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Done. Next, run the program - you can also run it from Eclipse; below are the steps to run it from the terminal.
Step 11. Steps to execute SecondarySorting project
i. Start the Hadoop components - open your terminal and type:
subodh@subodh-Inspiron-3520:~/software$ start-dfs.sh
subodh@subodh-Inspiron-3520:~/software$ start-yarn.sh
ii. Verify that Hadoop started with the jps command.
subodh@subodh-Inspiron-3520:~/software$ jps
8385 NameNode
8547 DataNode
5701 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
9446 Jps
8918 ResourceManager
9054 NodeManager
8751 SecondaryNameNode
You can also verify with the web UI using the "http://localhost:50070/explorer.html#/" URL.
iii. Create the input folder on HDFS using the below command.
subodh@subodh-Inspiron-3520:~/software$ hadoop fs -mkdir /input
The above command creates an input folder on HDFS; you can verify it using the web UI or the hadoop fs -ls / command. Now it is time to move the input file we need to process. Below is the command to copy the product_review.data input file into the input folder on HDFS.
subodh@subodh-Inspiron-3520:~$ hadoop fs -copyFromLocal /home/subodh/programs/input/product_review.data /input
Note - the product_review.data dataset is available inside this project's source code; you can download it from this project's download link.
Step 12. Create & Execute jar file
We are almost done. Now create the jar file of the SecondarySorting source code; you can build it using Eclipse or with the mvn package command.
To execute the SecondarySorting-1.0.jar file, use the below command:
hadoop jar /home/subodh/SecondarySorting-1.0.jar com.javamakeuse.bd.poc.ReviewDriver /input/product_review.data /output
This will print the log below and create an output folder containing the output of the SecondarySorting project.
16/04/06 23:01:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/06 23:01:55 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/06 23:01:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/06 23:01:56 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 23:01:56 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 23:01:56 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 23:01:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1432274046_0001
16/04/06 23:01:56 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/06 23:01:56 INFO mapreduce.Job: Running job: job_local1432274046_0001
16/04/06 23:01:56 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Starting task: attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/06 23:01:56 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/product_review.data:0+416
16/04/06 23:01:56 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/06 23:01:56 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/06 23:01:56 INFO mapred.MapTask: soft limit at 83886080
16/04/06 23:01:56 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/06 23:01:56 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/06 23:01:56 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/06 23:01:56 INFO mapred.LocalJobRunner:
16/04/06 23:01:56 INFO mapred.MapTask: Starting flush of map output
16/04/06 23:01:56 INFO mapred.MapTask: Spilling map output
16/04/06 23:01:56 INFO mapred.MapTask: bufstart = 0; bufend = 407; bufvoid = 104857600
16/04/06 23:01:56 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214364(104857456); length = 33/6553600
16/04/06 23:01:56 INFO mapred.MapTask: Finished spill 0
16/04/06 23:01:56 INFO mapred.Task: Task:attempt_local1432274046_0001_m_000000_0 is done. And is in the process of committing
16/04/06 23:01:56 INFO mapred.LocalJobRunner: map
16/04/06 23:01:56 INFO mapred.Task: Task 'attempt_local1432274046_0001_m_000000_0' done.
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Finishing task: attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO mapred.LocalJobRunner: map task executor complete.
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/06 23:01:56 INFO mapred.LocalJobRunner: Starting task: attempt_local1432274046_0001_r_000000_0
16/04/06 23:01:56 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/06 23:01:56 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/06 23:01:56 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@42c355f5
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/06 23:01:56 INFO reduce.EventFetcher: attempt_local1432274046_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/06 23:01:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1432274046_0001_m_000000_0 decomp: 427 len: 431 to MEMORY
16/04/06 23:01:56 INFO reduce.InMemoryMapOutput: Read 427 bytes from map-output for attempt_local1432274046_0001_m_000000_0
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 427, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->427
16/04/06 23:01:56 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/06 23:01:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/06 23:01:56 INFO mapred.Merger: Merging 1 sorted segments
16/04/06 23:01:56 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 417 bytes
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merged 1 segments, 427 bytes to disk to satisfy reduce memory limit
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merging 1 files, 431 bytes from disk
16/04/06 23:01:56 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/06 23:01:56 INFO mapred.Merger: Merging 1 sorted segments
16/04/06 23:01:56 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 417 bytes
16/04/06 23:01:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:56 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/04/06 23:01:57 INFO mapreduce.Job: Job job_local1432274046_0001 running in uber mode : false
16/04/06 23:01:57 INFO mapreduce.Job: map 100% reduce 0%
16/04/06 23:01:57 INFO mapred.Task: Task:attempt_local1432274046_0001_r_000000_0 is done. And is in the process of committing
16/04/06 23:01:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/06 23:01:57 INFO mapred.Task: Task attempt_local1432274046_0001_r_000000_0 is allowed to commit now
16/04/06 23:01:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1432274046_0001_r_000000_0' to hdfs://localhost:9000/output/_temporary/0/task_local1432274046_0001_r_000000
16/04/06 23:01:57 INFO mapred.LocalJobRunner: reduce > reduce
16/04/06 23:01:57 INFO mapred.Task: Task 'attempt_local1432274046_0001_r_000000_0' done.
16/04/06 23:01:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local1432274046_0001_r_000000_0
16/04/06 23:01:57 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/06 23:01:58 INFO mapreduce.Job: map 100% reduce 100%
16/04/06 23:01:58 INFO mapreduce.Job: Job job_local1432274046_0001 completed successfully
16/04/06 23:01:58 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=21604
		FILE: Number of bytes written=579171
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=832
		HDFS: Number of bytes written=848
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=9
		Map output records=9
		Map output bytes=407
		Map output materialized bytes=431
		Input split bytes=112
		Combine input records=0
		Combine output records=0
		Reduce input groups=7
		Reduce shuffle bytes=431
		Reduce input records=9
		Reduce output records=9
		Spilled Records=18
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=5
		Total committed heap usage (bytes)=496500736
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=416
	File Output Format Counters
		Bytes Written=848
Step 13. Verify the output
Once the job completes, verify on HDFS that the records are grouped by reviewerID with each reviewer's ratings in descending order, as sketched below - and that's it!
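A minimal way to inspect the result from the terminal (assuming the /output path used above and Hadoop's default part-file naming):

subodh@subodh-Inspiron-3520:~$ hadoop fs -ls /output
subodh@subodh-Inspiron-3520:~$ hadoop fs -cat /output/part-r-00000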
Download the complete example from here Source Code