MapReduce is a distributed algorithm.
https://www.psc.edu/wp-content/uploads/2023/07/A-Brief-History-of-Big-Data.pdf
- Map: \((in\_key, in\_value) \Rightarrow list(out\_key, intermediate\_value)\).
- Reduce:
  - Group map outputs by \(out\_key\)
  - \((out\_key, list(intermediate\_value)) \Rightarrow list(out\_value)\)
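The two signatures above can be sketched as a small single-process driver. This is only an illustration of the data flow, not a distributed implementation: the names `map_reduce`, `mapper`, `reducer`, and the parity example are hypothetical and chosen just for this sketch.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run a map phase, group by out_key, then run a reduce phase.

    inputs:  iterable of (in_key, in_value) pairs
    mapper:  (in_key, in_value) -> list of (out_key, intermediate_value)
    reducer: (out_key, list of intermediate_value) -> list of out_value
    """
    # Map phase: each call is independent, so a real system could run
    # these on different machines.
    grouped = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, intermediate_value in mapper(in_key, in_value):
            # Group map outputs by out_key.
            grouped[out_key].append(intermediate_value)

    # Reduce phase: one call per distinct out_key.
    return {out_key: reducer(out_key, values) for out_key, values in grouped.items()}


# Hypothetical example: sum the odd and even numbers across several lists.
def parity_mapper(list_name, numbers):
    return [("even" if n % 2 == 0 else "odd", n) for n in numbers]


def sum_reducer(parity, numbers):
    return [sum(numbers)]


totals = map_reduce([("a", [1, 2, 3]), ("b", [4, 5])], parity_mapper, sum_reducer)
```

Here `totals` maps each `out_key` to its list of output values. In a real deployment the grouping step would involve moving data between machines, which this sketch does not model.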
Example of MapReduce
Say you want to count word frequencies in a set of documents.
- Map: \((document\_name, document\_contents) \Rightarrow list(word, \#\ occurrences)\)
This step can be distributed across multiple processors: each processor counts the word frequencies in a single document. The input has now been split into independent pieces that can be handled in a divide-and-conquer fashion.
- Reduce: \((word, list(occurrences\_per\_document)) \Rightarrow (word, sum)\)
For each word, we just add up the occurrence counts that the individual nodes output.
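The word-count example can be sketched in Python as follows. This is a minimal single-machine sketch: a thread pool stands in for the separate processors, words are split on whitespace and lowercased (an assumption, since the notes do not define what counts as a word), and the function names and sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_word_count(document_name, document_contents):
    # Map: (document_name, document_contents) -> list of (word, # occurrences)
    counts = Counter(document_contents.lower().split())
    return list(counts.items())


def reduce_word_count(word, occurrences_per_document):
    # Reduce: (word, list of per-document counts) -> (word, sum)
    return word, sum(occurrences_per_document)


def word_frequencies(documents):
    # Each document is mapped independently; the pool plays the role of
    # the processors that would each handle one document.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda item: map_word_count(*item), documents.items()))

    # Group the map outputs by word.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)

    # Sum the per-document counts for each word.
    return dict(reduce_word_count(word, counts) for word, counts in grouped.items())


documents = {
    "doc1.txt": "the cat sat on the mat",
    "doc2.txt": "the dog sat",
}
frequencies = word_frequencies(documents)
```

With these two sample documents, `frequencies["the"]` is 3: the first document contributes 2 and the second contributes 1.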