Parallel Sorting: MapReduce TeraSort

O’Malley, Owen. "Terabyte sort on apache hadoop." Yahoo, available online at: http://sortbenchmark.org/Yahoo-Hadoop.pdf, (May) (2008): 1-3.

  • Won the general purpose terabyte sort benchmark in 2008 and 2009
  • 2008
    • 910 nodes
    • 4 dual-core Xeons @ 2.0ghz, 4 SATA disks, 8G RAM, 1 gigabit ethernet
    • 40 nodes per rack
    • 3.48 minutes
  • 2009
    • ~3800 nodes
    • 2 quad-core Xesons @ 2.5ghz, 4 SATA disks, 8G RAM, 1 gigabit ethernet
    • 40 nodes per rack
    • 62 seconds (1TB sort - this benchmark was retired after 2008)
    • 975 minutes (1PB sort)

Terasort includes 3 MapReduce steps:

  • Teragen: generates data
  • Terasort: sorts data
  • Teravalidate: validates the output is sorted

In [3]:
# generate 100GB of random data 
!ssh dsciu001 hdfs dfs -rm -r tmp/teragenout
!yarn jar \
    /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-*.jar \
    teragen -Dmapred.map.tasks=700 100000000000 tmp/teragenout


17/03/07 21:47:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://dsci/user/lngo/tmp/teragenout' to trash at: hdfs://dsci/user/lngo/.Trash/Current
17/03/07 21:47:55 INFO impl.TimelineClientImpl: Timeline service address: http://dscim003.palmetto.clemson.edu:8188/ws/v1/timeline/
17/03/07 21:47:56 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 6475 for lngo on ha-hdfs:dsci
17/03/07 21:47:56 INFO security.TokenCache: Got dt for hdfs://dsci; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:dsci, Ident: (HDFS_DELEGATION_TOKEN token 6475 for lngo)
17/03/07 21:47:56 INFO terasort.TeraSort: Generating 100000000000 using 700
17/03/07 21:47:56 INFO mapreduce.JobSubmitter: number of splits:700
17/03/07 21:47:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488826571229_0023
17/03/07 21:47:56 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:dsci, Ident: (HDFS_DELEGATION_TOKEN token 6475 for lngo)
17/03/07 21:47:57 INFO impl.YarnClientImpl: Submitted application application_1488826571229_0023
17/03/07 21:47:57 INFO mapreduce.Job: The url to track the job: http://dscim001.palmetto.clemson.edu:8088/proxy/application_1488826571229_0023/
17/03/07 21:47:57 INFO mapreduce.Job: Running job: job_1488826571229_0023
17/03/07 21:48:12 INFO mapreduce.Job: Job job_1488826571229_0023 running in uber mode : false
17/03/07 21:48:12 INFO mapreduce.Job:  map 0% reduce 0%
17/03/07 21:48:29 INFO mapreduce.Job:  map 1% reduce 0%
17/03/07 21:48:35 INFO mapreduce.Job:  map 2% reduce 0%
17/03/07 21:48:41 INFO mapreduce.Job:  map 3% reduce 0%
17/03/07 21:48:47 INFO mapreduce.Job:  map 4% reduce 0%
17/03/07 21:48:52 INFO mapreduce.Job:  map 5% reduce 0%
17/03/07 21:48:58 INFO mapreduce.Job:  map 6% reduce 0%
17/03/07 21:49:03 INFO mapreduce.Job:  map 7% reduce 0%
17/03/07 21:49:09 INFO mapreduce.Job:  map 8% reduce 0%
17/03/07 21:49:16 INFO mapreduce.Job:  map 9% reduce 0%
17/03/07 21:49:22 INFO mapreduce.Job:  map 10% reduce 0%
17/03/07 21:49:28 INFO mapreduce.Job:  map 11% reduce 0%
17/03/07 21:49:33 INFO mapreduce.Job:  map 12% reduce 0%
17/03/07 21:49:38 INFO mapreduce.Job:  map 13% reduce 0%
17/03/07 21:49:45 INFO mapreduce.Job:  map 14% reduce 0%
17/03/07 21:49:51 INFO mapreduce.Job:  map 15% reduce 0%
17/03/07 21:49:57 INFO mapreduce.Job:  map 16% reduce 0%
17/03/07 21:50:03 INFO mapreduce.Job:  map 17% reduce 0%
17/03/07 21:50:09 INFO mapreduce.Job:  map 18% reduce 0%
17/03/07 21:50:13 INFO mapreduce.Job:  map 19% reduce 0%
17/03/07 21:50:19 INFO mapreduce.Job:  map 20% reduce 0%
17/03/07 21:50:25 INFO mapreduce.Job:  map 21% reduce 0%
17/03/07 21:50:32 INFO mapreduce.Job:  map 22% reduce 0%
17/03/07 21:50:38 INFO mapreduce.Job:  map 23% reduce 0%
17/03/07 21:50:44 INFO mapreduce.Job:  map 24% reduce 0%
17/03/07 21:50:50 INFO mapreduce.Job:  map 25% reduce 0%
17/03/07 21:50:55 INFO mapreduce.Job:  map 26% reduce 0%
17/03/07 21:51:02 INFO mapreduce.Job:  map 27% reduce 0%
17/03/07 21:51:08 INFO mapreduce.Job:  map 28% reduce 0%
17/03/07 21:51:14 INFO mapreduce.Job:  map 29% reduce 0%
17/03/07 21:51:20 INFO mapreduce.Job:  map 30% reduce 0%
17/03/07 21:51:26 INFO mapreduce.Job:  map 31% reduce 0%
17/03/07 21:51:32 INFO mapreduce.Job:  map 32% reduce 0%
17/03/07 21:51:38 INFO mapreduce.Job:  map 33% reduce 0%
17/03/07 21:51:44 INFO mapreduce.Job:  map 34% reduce 0%
17/03/07 21:51:49 INFO mapreduce.Job:  map 35% reduce 0%
17/03/07 21:51:55 INFO mapreduce.Job:  map 36% reduce 0%
17/03/07 21:52:01 INFO mapreduce.Job:  map 37% reduce 0%
17/03/07 21:52:07 INFO mapreduce.Job:  map 38% reduce 0%
17/03/07 21:52:14 INFO mapreduce.Job:  map 39% reduce 0%
17/03/07 21:52:19 INFO mapreduce.Job:  map 40% reduce 0%
17/03/07 21:52:25 INFO mapreduce.Job:  map 41% reduce 0%
17/03/07 21:52:31 INFO mapreduce.Job:  map 42% reduce 0%
17/03/07 21:52:37 INFO mapreduce.Job:  map 43% reduce 0%
17/03/07 21:52:43 INFO mapreduce.Job:  map 44% reduce 0%
17/03/07 21:52:49 INFO mapreduce.Job:  map 45% reduce 0%
17/03/07 21:52:55 INFO mapreduce.Job:  map 46% reduce 0%
17/03/07 21:53:01 INFO mapreduce.Job:  map 47% reduce 0%
17/03/07 21:53:07 INFO mapreduce.Job:  map 48% reduce 0%
17/03/07 21:53:13 INFO mapreduce.Job:  map 49% reduce 0%
17/03/07 21:53:19 INFO mapreduce.Job:  map 50% reduce 0%
17/03/07 21:53:25 INFO mapreduce.Job:  map 51% reduce 0%
17/03/07 21:53:31 INFO mapreduce.Job:  map 52% reduce 0%
17/03/07 21:53:38 INFO mapreduce.Job:  map 53% reduce 0%
17/03/07 21:53:44 INFO mapreduce.Job:  map 54% reduce 0%
17/03/07 21:53:50 INFO mapreduce.Job:  map 55% reduce 0%
17/03/07 21:53:56 INFO mapreduce.Job:  map 56% reduce 0%
17/03/07 21:54:02 INFO mapreduce.Job:  map 57% reduce 0%
17/03/07 21:54:08 INFO mapreduce.Job:  map 58% reduce 0%
17/03/07 21:54:14 INFO mapreduce.Job:  map 59% reduce 0%
17/03/07 21:54:20 INFO mapreduce.Job:  map 60% reduce 0%
17/03/07 21:54:26 INFO mapreduce.Job:  map 61% reduce 0%
17/03/07 21:54:32 INFO mapreduce.Job:  map 62% reduce 0%
17/03/07 21:54:38 INFO mapreduce.Job:  map 63% reduce 0%
17/03/07 21:54:44 INFO mapreduce.Job:  map 64% reduce 0%
17/03/07 21:54:50 INFO mapreduce.Job:  map 65% reduce 0%
17/03/07 21:54:56 INFO mapreduce.Job:  map 66% reduce 0%
17/03/07 21:55:03 INFO mapreduce.Job:  map 67% reduce 0%
17/03/07 21:55:09 INFO mapreduce.Job:  map 68% reduce 0%
17/03/07 21:55:15 INFO mapreduce.Job:  map 69% reduce 0%
17/03/07 21:55:20 INFO mapreduce.Job:  map 70% reduce 0%
17/03/07 21:55:26 INFO mapreduce.Job:  map 71% reduce 0%
17/03/07 21:55:33 INFO mapreduce.Job:  map 72% reduce 0%
17/03/07 21:55:39 INFO mapreduce.Job:  map 73% reduce 0%
17/03/07 21:55:46 INFO mapreduce.Job:  map 74% reduce 0%
17/03/07 21:55:51 INFO mapreduce.Job:  map 75% reduce 0%
17/03/07 21:55:57 INFO mapreduce.Job:  map 76% reduce 0%
17/03/07 21:56:03 INFO mapreduce.Job:  map 77% reduce 0%
17/03/07 21:56:09 INFO mapreduce.Job:  map 78% reduce 0%
17/03/07 21:56:17 INFO mapreduce.Job:  map 79% reduce 0%
17/03/07 21:56:24 INFO mapreduce.Job:  map 80% reduce 0%
17/03/07 21:56:30 INFO mapreduce.Job:  map 81% reduce 0%
17/03/07 21:56:37 INFO mapreduce.Job:  map 82% reduce 0%
17/03/07 21:56:45 INFO mapreduce.Job:  map 83% reduce 0%
17/03/07 21:56:52 INFO mapreduce.Job:  map 84% reduce 0%
17/03/07 21:57:01 INFO mapreduce.Job:  map 85% reduce 0%
17/03/07 21:57:09 INFO mapreduce.Job:  map 86% reduce 0%
17/03/07 21:57:17 INFO mapreduce.Job:  map 87% reduce 0%
17/03/07 21:57:25 INFO mapreduce.Job:  map 88% reduce 0%
17/03/07 21:57:34 INFO mapreduce.Job:  map 89% reduce 0%
17/03/07 21:57:43 INFO mapreduce.Job:  map 90% reduce 0%
17/03/07 21:57:51 INFO mapreduce.Job:  map 91% reduce 0%
17/03/07 21:57:59 INFO mapreduce.Job:  map 92% reduce 0%
17/03/07 21:58:09 INFO mapreduce.Job:  map 93% reduce 0%
17/03/07 21:58:17 INFO mapreduce.Job:  map 94% reduce 0%
17/03/07 21:58:26 INFO mapreduce.Job:  map 95% reduce 0%
17/03/07 21:58:35 INFO mapreduce.Job:  map 96% reduce 0%
17/03/07 21:58:45 INFO mapreduce.Job:  map 97% reduce 0%
17/03/07 21:58:57 INFO mapreduce.Job:  map 98% reduce 0%
17/03/07 21:59:12 INFO mapreduce.Job:  map 99% reduce 0%
17/03/07 21:59:39 INFO mapreduce.Job:  map 100% reduce 0%
17/03/07 22:01:22 INFO mapreduce.Job: Job job_1488826571229_0023 completed successfully
17/03/07 22:01:22 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=102111973
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=61565
		HDFS: Number of bytes written=10000000000000
		HDFS: Number of read operations=2800
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1400
	Job Counters 
		Killed map tasks=12
		Launched map tasks=712
		Other local map tasks=712
		Total time spent by all maps in occupied slots (ms)=761804916
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=253934972
		Total vcore-seconds taken by all map tasks=253934972
		Total megabyte-seconds taken by all map tasks=3273729659024
	Map-Reduce Framework
		Map input records=100000000000
		Map output records=100000000000
		Input split bytes=61565
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=600886
		CPU time spent (ms)=168512640
		Physical memory (bytes) snapshot=469840396288
		Virtual memory (bytes) snapshot=9273390780416
		Total committed heap usage (bytes)=236617465856
	org.apache.hadoop.examples.terasort.TeraGen$Counters
		CHECKSUM=-6612818753306859195
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=10000000000000

In [4]:
# sort 100GB of data
!yarn jar \
    /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-*.\
    terasort -Dmapred.reduce.tasks=700 tmp/teragenout tmp/terasortout


17/03/07 22:02:00 INFO terasort.TeraSort: starting
17/03/07 22:02:01 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 6476 for lngo on ha-hdfs:dsci
17/03/07 22:02:01 INFO security.TokenCache: Got dt for hdfs://dsci; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:dsci, Ident: (HDFS_DELEGATION_TOKEN token 6476 for lngo)
17/03/07 22:02:03 INFO input.FileInputFormat: Total input paths to process : 700
Spent 1979ms computing base-splits.
Spent 200ms computing TeraScheduler splits.
Computing input splits took 2180ms
Sampling 10 splits of 74900
Making 700 from 100000 sampled records
Computing parititions took 680ms
Spent 2862ms computing partitions.
17/03/07 22:02:04 INFO impl.TimelineClientImpl: Timeline service address: http://dscim003.palmetto.clemson.edu:8188/ws/v1/timeline/
17/03/07 22:02:05 INFO mapreduce.JobSubmitter: number of splits:74900
17/03/07 22:02:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488826571229_0024
17/03/07 22:02:05 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:dsci, Ident: (HDFS_DELEGATION_TOKEN token 6476 for lngo)
17/03/07 22:02:05 INFO impl.YarnClientImpl: Submitted application application_1488826571229_0024
17/03/07 22:02:05 INFO mapreduce.Job: The url to track the job: http://dscim001.palmetto.clemson.edu:8088/proxy/application_1488826571229_0024/
17/03/07 22:02:05 INFO mapreduce.Job: Running job: job_1488826571229_0024
17/03/07 22:02:23 INFO mapreduce.Job: Job job_1488826571229_0024 running in uber mode : false
17/03/07 22:02:23 INFO mapreduce.Job:  map 0% reduce 0%
17/03/07 22:02:55 INFO mapreduce.Job:  map 1% reduce 0%
17/03/07 22:03:12 INFO mapreduce.Job:  map 2% reduce 0%
17/03/07 22:03:34 INFO mapreduce.Job:  map 3% reduce 0%
17/03/07 22:03:56 INFO mapreduce.Job:  map 4% reduce 0%
17/03/07 22:04:18 INFO mapreduce.Job:  map 5% reduce 0%
17/03/07 22:04:38 INFO mapreduce.Job:  map 6% reduce 0%
17/03/07 22:05:01 INFO mapreduce.Job:  map 7% reduce 0%
17/03/07 22:05:24 INFO mapreduce.Job:  map 8% reduce 0%
17/03/07 22:05:48 INFO mapreduce.Job:  map 9% reduce 0%
17/03/07 22:06:10 INFO mapreduce.Job:  map 10% reduce 0%
17/03/07 22:06:33 INFO mapreduce.Job:  map 11% reduce 0%
17/03/07 22:06:57 INFO mapreduce.Job:  map 12% reduce 0%
17/03/07 22:07:21 INFO mapreduce.Job:  map 13% reduce 0%
17/03/07 22:07:45 INFO mapreduce.Job:  map 14% reduce 0%
17/03/07 22:08:09 INFO mapreduce.Job:  map 15% reduce 0%
17/03/07 22:08:34 INFO mapreduce.Job:  map 16% reduce 0%
17/03/07 22:08:45 INFO mapreduce.Job:  map 16% reduce 1%
17/03/07 22:08:59 INFO mapreduce.Job:  map 17% reduce 1%
17/03/07 22:09:24 INFO mapreduce.Job:  map 18% reduce 1%
17/03/07 22:09:49 INFO mapreduce.Job:  map 19% reduce 1%
17/03/07 22:10:15 INFO mapreduce.Job:  map 20% reduce 1%
17/03/07 22:10:41 INFO mapreduce.Job:  map 21% reduce 1%
17/03/07 22:11:07 INFO mapreduce.Job:  map 22% reduce 1%
17/03/07 22:11:33 INFO mapreduce.Job:  map 23% reduce 1%
17/03/07 22:12:00 INFO mapreduce.Job:  map 24% reduce 1%
17/03/07 22:12:28 INFO mapreduce.Job:  map 25% reduce 1%
17/03/07 22:12:56 INFO mapreduce.Job:  map 26% reduce 1%
17/03/07 22:13:24 INFO mapreduce.Job:  map 27% reduce 1%
17/03/07 22:13:49 INFO mapreduce.Job:  map 27% reduce 2%
17/03/07 22:13:51 INFO mapreduce.Job:  map 28% reduce 2%
17/03/07 22:14:20 INFO mapreduce.Job:  map 29% reduce 2%
17/03/07 22:14:49 INFO mapreduce.Job:  map 30% reduce 2%
17/03/07 22:15:18 INFO mapreduce.Job:  map 31% reduce 2%
17/03/07 22:15:49 INFO mapreduce.Job:  map 32% reduce 2%
17/03/07 22:16:19 INFO mapreduce.Job:  map 33% reduce 2%
17/03/07 22:16:48 INFO mapreduce.Job:  map 34% reduce 2%
17/03/07 22:17:20 INFO mapreduce.Job:  map 35% reduce 2%
17/03/07 22:17:36 INFO mapreduce.Job:  map 35% reduce 3%
17/03/07 22:17:51 INFO mapreduce.Job:  map 36% reduce 3%
17/03/07 22:18:24 INFO mapreduce.Job:  map 37% reduce 3%
17/03/07 22:18:56 INFO mapreduce.Job:  map 38% reduce 3%
17/03/07 22:19:29 INFO mapreduce.Job:  map 39% reduce 3%
17/03/07 22:20:01 INFO mapreduce.Job:  map 40% reduce 3%
17/03/07 22:20:35 INFO mapreduce.Job:  map 41% reduce 3%
17/03/07 22:20:58 INFO mapreduce.Job:  map 41% reduce 4%
17/03/07 22:21:09 INFO mapreduce.Job:  map 42% reduce 4%
17/03/07 22:21:43 INFO mapreduce.Job:  map 43% reduce 4%
17/03/07 22:22:17 INFO mapreduce.Job:  map 44% reduce 4%
17/03/07 22:22:53 INFO mapreduce.Job:  map 45% reduce 4%
17/03/07 22:23:29 INFO mapreduce.Job:  map 46% reduce 4%
17/03/07 22:23:54 INFO mapreduce.Job:  map 46% reduce 5%
17/03/07 22:24:05 INFO mapreduce.Job:  map 47% reduce 5%
17/03/07 22:24:42 INFO mapreduce.Job:  map 48% reduce 5%
17/03/07 22:25:19 INFO mapreduce.Job:  map 49% reduce 5%
17/03/07 22:25:57 INFO mapreduce.Job:  map 50% reduce 5%
17/03/07 22:26:36 INFO mapreduce.Job:  map 51% reduce 5%
17/03/07 22:26:51 INFO mapreduce.Job:  map 51% reduce 6%
17/03/07 22:27:16 INFO mapreduce.Job:  map 52% reduce 6%
17/03/07 22:27:54 INFO mapreduce.Job:  map 53% reduce 6%
17/03/07 22:28:33 INFO mapreduce.Job:  map 54% reduce 6%
17/03/07 22:29:11 INFO mapreduce.Job:  map 55% reduce 6%
17/03/07 22:29:48 INFO mapreduce.Job:  map 56% reduce 6%
17/03/07 22:30:26 INFO mapreduce.Job:  map 57% reduce 6%
17/03/07 22:30:42 INFO mapreduce.Job:  map 57% reduce 7%
17/03/07 22:31:06 INFO mapreduce.Job:  map 58% reduce 7%
17/03/07 22:31:42 INFO mapreduce.Job:  map 59% reduce 7%
17/03/07 22:32:21 INFO mapreduce.Job:  map 60% reduce 7%
17/03/07 22:32:57 INFO mapreduce.Job:  map 61% reduce 7%
17/03/07 22:33:36 INFO mapreduce.Job:  map 62% reduce 7%
17/03/07 22:34:14 INFO mapreduce.Job:  map 63% reduce 7%
17/03/07 22:34:46 INFO mapreduce.Job:  map 63% reduce 8%
17/03/07 22:34:51 INFO mapreduce.Job:  map 64% reduce 8%
17/03/07 22:35:28 INFO mapreduce.Job:  map 65% reduce 8%
17/03/07 22:36:06 INFO mapreduce.Job:  map 66% reduce 8%
17/03/07 22:36:43 INFO mapreduce.Job:  map 67% reduce 8%
17/03/07 22:37:21 INFO mapreduce.Job:  map 68% reduce 8%
17/03/07 22:37:58 INFO mapreduce.Job:  map 69% reduce 8%
17/03/07 22:38:34 INFO mapreduce.Job:  map 70% reduce 8%
17/03/07 22:39:01 INFO mapreduce.Job:  map 70% reduce 9%
17/03/07 22:39:11 INFO mapreduce.Job:  map 71% reduce 9%
17/03/07 22:39:49 INFO mapreduce.Job:  map 72% reduce 9%
17/03/07 22:40:26 INFO mapreduce.Job:  map 73% reduce 9%
17/03/07 22:41:03 INFO mapreduce.Job:  map 74% reduce 9%
17/03/07 22:41:41 INFO mapreduce.Job:  map 75% reduce 9%
17/03/07 22:42:16 INFO mapreduce.Job:  map 76% reduce 9%
17/03/07 22:42:53 INFO mapreduce.Job:  map 77% reduce 9%
17/03/07 22:43:30 INFO mapreduce.Job:  map 78% reduce 9%
17/03/07 22:43:42 INFO mapreduce.Job:  map 78% reduce 10%
17/03/07 22:44:08 INFO mapreduce.Job:  map 79% reduce 10%
17/03/07 22:44:43 INFO mapreduce.Job:  map 80% reduce 10%
17/03/07 22:45:21 INFO mapreduce.Job:  map 81% reduce 10%
17/03/07 22:45:57 INFO mapreduce.Job:  map 82% reduce 10%
17/03/07 22:46:34 INFO mapreduce.Job:  map 83% reduce 10%
17/03/07 22:47:11 INFO mapreduce.Job:  map 84% reduce 10%
17/03/07 22:47:48 INFO mapreduce.Job:  map 85% reduce 10%
17/03/07 22:48:19 INFO mapreduce.Job:  map 85% reduce 11%
17/03/07 22:48:25 INFO mapreduce.Job:  map 86% reduce 11%
17/03/07 22:49:01 INFO mapreduce.Job:  map 87% reduce 11%
17/03/07 22:49:38 INFO mapreduce.Job:  map 88% reduce 11%
17/03/07 22:50:15 INFO mapreduce.Job:  map 89% reduce 11%
17/03/07 22:50:52 INFO mapreduce.Job:  map 90% reduce 11%
17/03/07 22:51:29 INFO mapreduce.Job:  map 91% reduce 11%
17/03/07 22:52:05 INFO mapreduce.Job:  map 92% reduce 11%
17/03/07 22:52:42 INFO mapreduce.Job:  map 93% reduce 11%
17/03/07 22:53:14 INFO mapreduce.Job:  map 93% reduce 12%
17/03/07 22:53:18 INFO mapreduce.Job:  map 94% reduce 12%
17/03/07 22:53:55 INFO mapreduce.Job:  map 95% reduce 12%
17/03/07 22:54:31 INFO mapreduce.Job:  map 96% reduce 12%
17/03/07 22:55:08 INFO mapreduce.Job:  map 97% reduce 12%
17/03/07 22:55:44 INFO mapreduce.Job:  map 98% reduce 12%
17/03/07 22:56:22 INFO mapreduce.Job:  map 99% reduce 12%
17/03/07 22:56:54 INFO mapreduce.Job:  map 100% reduce 12%
17/03/07 22:57:15 INFO mapreduce.Job:  map 100% reduce 13%
17/03/07 22:57:36 INFO mapreduce.Job:  map 100% reduce 15%
17/03/07 22:57:37 INFO mapreduce.Job:  map 100% reduce 16%
17/03/07 22:57:38 INFO mapreduce.Job:  map 100% reduce 18%
17/03/07 22:57:40 INFO mapreduce.Job:  map 100% reduce 19%
17/03/07 22:57:41 INFO mapreduce.Job:  map 100% reduce 20%
17/03/07 22:57:43 INFO mapreduce.Job:  map 100% reduce 21%
17/03/07 22:57:44 INFO mapreduce.Job:  map 100% reduce 22%
17/03/07 22:57:46 INFO mapreduce.Job:  map 100% reduce 23%
17/03/07 22:57:49 INFO mapreduce.Job:  map 100% reduce 24%
17/03/07 22:57:52 INFO mapreduce.Job:  map 100% reduce 25%
17/03/07 22:57:57 INFO mapreduce.Job:  map 100% reduce 26%
17/03/07 22:58:09 INFO mapreduce.Job:  map 100% reduce 27%
17/03/07 22:58:29 INFO mapreduce.Job:  map 100% reduce 28%
17/03/07 22:58:51 INFO mapreduce.Job:  map 100% reduce 29%
17/03/07 22:59:14 INFO mapreduce.Job:  map 100% reduce 30%
17/03/07 22:59:37 INFO mapreduce.Job:  map 100% reduce 31%
17/03/07 23:00:01 INFO mapreduce.Job:  map 100% reduce 32%
17/03/07 23:00:30 INFO mapreduce.Job:  map 100% reduce 33%
17/03/07 23:00:53 INFO mapreduce.Job:  map 100% reduce 34%
17/03/07 23:01:17 INFO mapreduce.Job:  map 100% reduce 35%
17/03/07 23:01:41 INFO mapreduce.Job:  map 100% reduce 36%
17/03/07 23:02:06 INFO mapreduce.Job:  map 100% reduce 37%
17/03/07 23:02:30 INFO mapreduce.Job:  map 100% reduce 38%
17/03/07 23:02:53 INFO mapreduce.Job:  map 100% reduce 39%
17/03/07 23:03:16 INFO mapreduce.Job:  map 100% reduce 40%
17/03/07 23:03:41 INFO mapreduce.Job:  map 100% reduce 41%
17/03/07 23:04:04 INFO mapreduce.Job:  map 100% reduce 42%
17/03/07 23:04:30 INFO mapreduce.Job:  map 100% reduce 43%
17/03/07 23:04:56 INFO mapreduce.Job:  map 100% reduce 44%
17/03/07 23:05:25 INFO mapreduce.Job:  map 100% reduce 45%
17/03/07 23:05:59 INFO mapreduce.Job:  map 100% reduce 46%
17/03/07 23:06:31 INFO mapreduce.Job:  map 100% reduce 47%
17/03/07 23:07:03 INFO mapreduce.Job:  map 100% reduce 48%
17/03/07 23:07:36 INFO mapreduce.Job:  map 100% reduce 49%
17/03/07 23:08:11 INFO mapreduce.Job:  map 100% reduce 50%
17/03/07 23:08:44 INFO mapreduce.Job:  map 100% reduce 51%
17/03/07 23:08:50 INFO mapreduce.Job:  map 100% reduce 52%
17/03/07 23:08:54 INFO mapreduce.Job:  map 100% reduce 53%
17/03/07 23:08:59 INFO mapreduce.Job:  map 100% reduce 54%
17/03/07 23:09:08 INFO mapreduce.Job:  map 100% reduce 55%
17/03/07 23:09:26 INFO mapreduce.Job:  map 100% reduce 56%
17/03/07 23:09:28 INFO mapreduce.Job:  map 100% reduce 55%
17/03/07 23:09:29 INFO mapreduce.Job:  map 100% reduce 56%
17/03/07 23:09:45 INFO mapreduce.Job:  map 100% reduce 57%
17/03/07 23:10:06 INFO mapreduce.Job:  map 100% reduce 58%
17/03/07 23:10:28 INFO mapreduce.Job:  map 100% reduce 59%
17/03/07 23:10:51 INFO mapreduce.Job:  map 100% reduce 60%
17/03/07 23:11:14 INFO mapreduce.Job:  map 100% reduce 61%
17/03/07 23:11:43 INFO mapreduce.Job:  map 100% reduce 62%
17/03/07 23:12:12 INFO mapreduce.Job:  map 100% reduce 63%
17/03/07 23:12:43 INFO mapreduce.Job:  map 100% reduce 64%
17/03/07 23:13:13 INFO mapreduce.Job:  map 100% reduce 65%
17/03/07 23:13:44 INFO mapreduce.Job:  map 100% reduce 66%
17/03/07 23:14:08 INFO mapreduce.Job:  map 100% reduce 67%
17/03/07 23:14:38 INFO mapreduce.Job:  map 100% reduce 68%
17/03/07 23:15:00 INFO mapreduce.Job:  map 100% reduce 69%
17/03/07 23:15:23 INFO mapreduce.Job:  map 100% reduce 70%
17/03/07 23:15:47 INFO mapreduce.Job:  map 100% reduce 71%
17/03/07 23:16:12 INFO mapreduce.Job:  map 100% reduce 72%
17/03/07 23:16:29 INFO mapreduce.Job:  map 100% reduce 73%
17/03/07 23:16:46 INFO mapreduce.Job:  map 100% reduce 74%
17/03/07 23:17:09 INFO mapreduce.Job:  map 100% reduce 75%
17/03/07 23:17:37 INFO mapreduce.Job:  map 100% reduce 76%
17/03/07 23:18:03 INFO mapreduce.Job:  map 100% reduce 77%
17/03/07 23:18:32 INFO mapreduce.Job:  map 100% reduce 78%
17/03/07 23:19:10 INFO mapreduce.Job:  map 100% reduce 79%
17/03/07 23:19:38 INFO mapreduce.Job:  map 100% reduce 80%
17/03/07 23:20:03 INFO mapreduce.Job:  map 100% reduce 81%
17/03/07 23:20:22 INFO mapreduce.Job:  map 100% reduce 82%
17/03/07 23:20:39 INFO mapreduce.Job:  map 100% reduce 83%
17/03/07 23:20:53 INFO mapreduce.Job:  map 100% reduce 84%
17/03/07 23:21:05 INFO mapreduce.Job:  map 100% reduce 85%
17/03/07 23:21:17 INFO mapreduce.Job:  map 100% reduce 86%
17/03/07 23:21:29 INFO mapreduce.Job:  map 100% reduce 87%
17/03/07 23:21:40 INFO mapreduce.Job:  map 100% reduce 88%
17/03/07 23:21:51 INFO mapreduce.Job:  map 100% reduce 89%
17/03/07 23:22:02 INFO mapreduce.Job:  map 100% reduce 90%
17/03/07 23:22:14 INFO mapreduce.Job:  map 100% reduce 91%
17/03/07 23:22:25 INFO mapreduce.Job:  map 100% reduce 92%
17/03/07 23:22:41 INFO mapreduce.Job:  map 100% reduce 93%
17/03/07 23:22:56 INFO mapreduce.Job:  map 100% reduce 94%
17/03/07 23:23:18 INFO mapreduce.Job:  map 100% reduce 95%
17/03/07 23:23:39 INFO mapreduce.Job:  map 100% reduce 96%
17/03/07 23:24:09 INFO mapreduce.Job:  map 100% reduce 97%
17/03/07 23:24:42 INFO mapreduce.Job:  map 100% reduce 98%
17/03/07 23:25:23 INFO mapreduce.Job:  map 100% reduce 99%
17/03/07 23:26:23 INFO mapreduce.Job:  map 100% reduce 100%
17/03/07 23:31:08 INFO mapreduce.Job: Job job_1488826571229_0024 completed successfully
17/03/07 23:31:08 INFO mapreduce.Job: Counters: 52
	File System Counters
		FILE: Number of bytes read=10400575971110
		FILE: Number of bytes written=20812705218772
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=10000008538600
		HDFS: Number of bytes written=10000000000000
		HDFS: Number of read operations=226800
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1400
	Job Counters 
		Failed reduce tasks=11
		Killed map tasks=2
		Launched map tasks=74901
		Launched reduce tasks=711
		Data-local map tasks=74883
		Rack-local map tasks=18
		Total time spent by all maps in occupied slots (ms)=2711442153
		Total time spent by all reduces in occupied slots (ms)=4150755153
		Total time spent by all map tasks (ms)=903814051
		Total time spent by all reduce tasks (ms)=1383585051
		Total vcore-seconds taken by all map tasks=903814051
		Total vcore-seconds taken by all reduce tasks=1383585051
		Total megabyte-seconds taken by all map tasks=11651970745492
		Total megabyte-seconds taken by all reduce tasks=17837178477492
	Map-Reduce Framework
		Map input records=100000000000
		Map output records=100000000000
		Map output bytes=10200000000000
		Map output materialized bytes=10400314580000
		Input split bytes=8538600
		Combine input records=0
		Combine output records=0
		Reduce input groups=100000000000
		Reduce shuffle bytes=10400314580000
		Reduce input records=100000000000
		Reduce output records=100000000000
		Spilled Records=200000000000
		Shuffled Maps =52430000
		Failed Shuffles=0
		Merged Map outputs=52430000
		GC time elapsed (ms)=57925615
		CPU time spent (ms)=2132972310
		Physical memory (bytes) snapshot=169946145001472
		Virtual memory (bytes) snapshot=1000650242908160
		Total committed heap usage (bytes)=203119801139200
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=10000000000000
	File Output Format Counters 
		Bytes Written=10000000000000
17/03/07 23:31:08 INFO terasort.TeraSort: done

real	89m10.018s
user	0m0.036s
sys	0m0.018s

Terasort in a nutshell: Bucketsort based on Keys

  • Map processes data and emits items to be sorted as Keys and some random value as filler
  • Assumption: Ranges of values of keys are known
  • Customized Partitioner: balance and distribute keys to various reducers such that keys(i-1) < keys(i) < keys (i+1)
  • Reducers sort their own set of keys
  • Profit!!!

Partitioner design is based on trie tree

Trie tree advantages (https://en.wikipedia.org/wiki/Trie)

  • Data look up is faster in the worst case
  • No collisions of different key
  • No hash function is needed
  • Alphabetical ordering of entries by key

Trie tree disadvantages (https://en.wikipedia.org/wiki/Trie)

  • Slower than hash table in looking up data (we are not looking up data)
  • Floating point numbers can lead to chains and prefixes that are not meaningful (we don't care because we are not analyzing data)
  • Some trie might require more space than a hash table (What is Hadoop for? big data)