
Spark Shell Example for Wordcount

This is a simple example of using the Apache Spark shell for counting words.
It reads a plain-text file (a Sherlock Holmes book) from HDFS, counts the lines containing the word "Sherlock", and then computes a word count over the whole file.
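
The snippets below are meant to be typed into an interactive Spark shell (started with the spark-shell command), where the SparkContext is already available as sc. Adjust the HDFS path if big.txt lives somewhere else on your cluster.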

// Load the book from HDFS as an RDD of lines
val sherlockFile = sc.textFile("hdfs:///user/root/big.txt")

// count() is an action: it triggers the computation and returns the number of lines
sherlockFile.count()

// Keep only the lines that mention "Sherlock"
val linesWithSherlock = sherlockFile
				.filter(line => line.contains("Sherlock"))
linesWithSherlock.count()
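
To sanity-check the filter, the first few matching lines can be printed; take is an action that returns the first n elements to the driver (a minimal check, not part of the original example):

linesWithSherlock.take(5).foreach(println)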

// Split each line into words and pair every word with a count of 1
val wordCountMap = sherlockFile
				.flatMap(line => line.split(" "))
				.map(word => (word, 1))
wordCountMap.take(20)

// Sum the 1s per word to get the total count for each word
val wc2 = wordCountMap.reduceByKey(_ + _)
wc2.take(20)
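
Since the introduction asks for the occurrences of the word "Sherlock", the finished counts can also be queried directly; lookup returns all values stored under a key, here a single summed count:

wc2.lookup("Sherlock")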

// Same word count, but sorted by frequency in descending order:
// swap to (count, word), sort by the count key, then swap back to (word, count)
val wordCountSorted = sherlockFile
				.flatMap(line => line.split(" "))
				.map(word => (word, 1))
				.reduceByKey(_ + _)
				.map(item => item.swap)
				.sortByKey(false)
				.map(item => item.swap)

wordCountSorted.take(20)
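
As a side note, the swap/sortByKey/swap sequence can also be written with sortBy, which orders the pairs by their count field directly; wordCountSorted2 is just a hypothetical name for this sketch:

val wordCountSorted2 = wc2.sortBy(_._2, ascending = false)
wordCountSorted2.take(20)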