Map Reduce flow in Hadoop

Question

I'm learning Hadoop using the book Hadoop in Practice, and while reading chapter 1 I came across this diagram:

From the Hadoop docs (http://hadoop.apache.org/docs/current2/api/org/apache/hadoop/mapred/Reducer.html):

1. Shuffle: Reducer is input the grouped output of a Mapper. In the phase the framework, for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.

2. Sort: The framework groups Reducer inputs

Accepted Answer

It is the Partitioner that decides how the output of the mappers is distributed to the different reducers. The Partitioner controls the partitioning of the keys of the intermediate map outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. This in turn controls which of the m reduce tasks each intermediate key (and hence its record) is sent to for reduction.
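To make the hashing step concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class name PartitionSketch and the helpers partitionFor and assign are made up for illustration; only the formula is taken from Hadoop's default HashPartitioner, which masks off the sign bit of the key's hashCode() and takes the result modulo the number of reduce tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Hadoop.
public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit so
    // the value is non-negative, then take it modulo the reduce task count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups intermediate keys by the partition (reduce task) they map to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numReduceTasks);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Keys as they might be emitted by several mappers; 3 reduce tasks assumed.
        List<String> mapOutputKeys = Arrays.asList("a", "b", "c", "a", "b");
        System.out.println(assign(mapOutputKeys, 3));
        // prints {0=[c], 1=[a, a], 2=[b, b]}
    }
}
```

Every occurrence of a key lands in the same partition no matter which mapper emitted it, which is what lets a single reducer see all the values for that key. In a real job you would extend org.apache.hadoop.mapreduce.Partitioner and override getPartition instead of using a standalone helper like this one.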

