Testing matters for verifying a system's correctness and analyzing its performance, yet it is easily overlooked. To understand the system more fully, locate its bottlenecks, and improve its performance, I decided to start with testing and learn Hadoop's main benchmarking tools.
TestDFSIO
TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to perform reads and writes concurrently: each map task reads or writes one file, the map output collects statistics about the file just processed, and the reduce task accumulates those statistics and produces a summary.
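For orientation, the invocations used in this post follow the pattern below (a sketch based on the TestDFSIO 1.7 flags that appear later; exact options can vary between Hadoop versions, and <hadoop-test-jar> stands for the test jar used below):

# Write test: create 10 files of 1000 MB each under /benchmarks/TestDFSIO
hadoop jar <hadoop-test-jar> TestDFSIO -write -nrFiles 10 -fileSize 1000
# Read test: read back the files produced by a previous -write run
hadoop jar <hadoop-test-jar> TestDFSIO -read -nrFiles 10 -fileSize 1000
# Clean up everything under /benchmarks/TestDFSIO
hadoop jar <hadoop-test-jar> TestDFSIO -clean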
The NameNode address is 10.*.*.131:7180.
Running the command hadoop version prints, among other things, the path where the Hadoop jar files live.
Change into that directory and run hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar with no arguments; it returns the following:
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  dfsthroughput: measure hdfs throughput
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  testarrayfile: A test for flat files of binary key/value pairs.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testrpc: A test for rpc.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testsetfile: A test for flat files of binary key/value pairs.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Run the command hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000.
It returns the following:
19/04/02 16:22:30 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/02 16:22:30 INFO fs.TestDFSIO: nrFiles = 10
19/04/02 16:22:30 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/02 16:22:30 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/02 16:22:30 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/02 16:22:31 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
java.io.IOException: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Error! java.io.IOException: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
The job was run as root, but the HDFS root directory is owned by hdfs:supergroup with mode drwxr-xr-x, so root cannot write to it. Run su hdfs to switch to the hdfs user.
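If switching users is inconvenient, two common alternatives exist (both assume a cluster without Kerberos, where the HDFS client simply trusts the supplied user name):

# Run a single command as the hdfs user
sudo -u hdfs hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
# Or make the Hadoop client act as hdfs for the rest of the shell session
export HADOOP_USER_NAME=hdfs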
Run the command hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 again.
It returns the following:
bash-4.2$ hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
19/04/02 16:26:39 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/02 16:26:39 INFO fs.TestDFSIO: nrFiles = 10
19/04/02 16:26:39 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/02 16:26:39 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/02 16:26:39 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/02 16:26:40 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/02 16:26:40 INFO fs.TestDFSIO: created control files for: 10 files
19/04/02 16:26:40 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/02 16:26:40 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/02 16:26:41 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/02 16:26:41 INFO mapreduce.JobSubmitter: number of splits:10
19/04/02 16:26:41 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/02 16:26:41 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/02 16:26:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0002
19/04/02 16:26:41 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0002
19/04/02 16:26:41 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0002/
19/04/02 16:26:41 INFO mapreduce.Job: Running job: job_1552358721447_0002
19/04/02 16:26:48 INFO mapreduce.Job: Job job_1552358721447_0002 running in uber mode : false
19/04/02 16:26:48 INFO mapreduce.Job: map 0% reduce 0%
19/04/02 16:27:02 INFO mapreduce.Job: map 30% reduce 0%
19/04/02 16:27:03 INFO mapreduce.Job: map 100% reduce 0%
19/04/02 16:27:08 INFO mapreduce.Job: map 100% reduce 100%
19/04/02 16:27:08 INFO mapreduce.Job: Job job_1552358721447_0002 completed successfully
19/04/02 16:27:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=379
        FILE: Number of bytes written=1653843
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2310
        HDFS: Number of bytes written=10485760082
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=128477
        Total time spent by all reduces in occupied slots (ms)=2621
        Total time spent by all map tasks (ms)=128477
        Total time spent by all reduce tasks (ms)=2621
        Total vcore-milliseconds taken by all map tasks=128477
        Total vcore-milliseconds taken by all reduce tasks=2621
        Total megabyte-milliseconds taken by all map tasks=131560448
        Total megabyte-milliseconds taken by all reduce tasks=2683904
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=784
        Map output materialized bytes=1033
        Input split bytes=1190
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=1033
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=2657
        CPU time spent (ms)=94700
        Physical memory (bytes) snapshot=7229349888
        Virtual memory (bytes) snapshot=32021716992
        Total committed heap usage (bytes)=6717702144
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1120
    File Output Format Counters
        Bytes Written=82
java.io.FileNotFoundException: TestDFSIO_results.log (Permission denied)
Error! java.io.FileNotFoundException: TestDFSIO_results.log (Permission denied)
The MapReduce job itself succeeded; the failure happens at the very end, when TestDFSIO appends its summary to TestDFSIO_results.log in the current local directory, where the hdfs user has no write permission (see the comments under the referenced post).
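Instead of changing directory permissions, the results log can also be pointed at a writable path. This version of TestDFSIO (1.7) accepts a -resFile option for exactly that; a sketch:

# Write the summary to /tmp instead of the current working directory
hadoop jar hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_results.log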
Solution: create a new directory ** (command: mkdir **), grant the hdfs user access to it (command: sudo chmod -R 777 **), cd into **, and run hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000. It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
19/04/03 10:26:32 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 10:26:32 INFO fs.TestDFSIO: nrFiles = 10
19/04/03 10:26:32 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/03 10:26:32 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 10:26:32 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 10:26:32 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/03 10:26:33 INFO fs.TestDFSIO: created control files for: 10 files
19/04/03 10:26:33 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:26:33 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:26:33 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/03 10:26:33 INFO mapreduce.JobSubmitter: number of splits:10
19/04/03 10:26:33 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 10:26:33 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/03 10:26:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0006
19/04/03 10:26:34 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0006
19/04/03 10:26:34 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0006/
19/04/03 10:26:34 INFO mapreduce.Job: Running job: job_1552358721447_0006
19/04/03 10:26:39 INFO mapreduce.Job: Job job_1552358721447_0006 running in uber mode : false
19/04/03 10:26:39 INFO mapreduce.Job: map 0% reduce 0%
19/04/03 10:26:53 INFO mapreduce.Job: map 30% reduce 0%
19/04/03 10:26:54 INFO mapreduce.Job: map 90% reduce 0%
19/04/03 10:26:55 INFO mapreduce.Job: map 100% reduce 0%
19/04/03 10:27:00 INFO mapreduce.Job: map 100% reduce 100%
19/04/03 10:27:00 INFO mapreduce.Job: Job job_1552358721447_0006 completed successfully
19/04/03 10:27:00 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=392
        FILE: Number of bytes written=1653853
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2310
        HDFS: Number of bytes written=10485760082
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=125653
        Total time spent by all reduces in occupied slots (ms)=2636
        Total time spent by all map tasks (ms)=125653
        Total time spent by all reduce tasks (ms)=2636
        Total vcore-milliseconds taken by all map tasks=125653
        Total vcore-milliseconds taken by all reduce tasks=2636
        Total megabyte-milliseconds taken by all map tasks=128668672
        Total megabyte-milliseconds taken by all reduce tasks=2699264
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=783
        Map output materialized bytes=1030
        Input split bytes=1190
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=1030
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1881
        CPU time spent (ms)=78110
        Physical memory (bytes) snapshot=6980759552
        Virtual memory (bytes) snapshot=31983017984
        Total committed heap usage (bytes)=6693060608
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1120
    File Output Format Counters
        Bytes Written=82
19/04/03 10:27:00 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
19/04/03 10:27:00 INFO fs.TestDFSIO:            Date & time: Wed Apr 03 10:27:00 CST 2019
19/04/03 10:27:00 INFO fs.TestDFSIO:        Number of files: 10
19/04/03 10:27:00 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
19/04/03 10:27:00 INFO fs.TestDFSIO:      Throughput mb/sec: 114.77630098937172
19/04/03 10:27:00 INFO fs.TestDFSIO: Average IO rate mb/sec: 115.29634094238281
19/04/03 10:27:00 INFO fs.TestDFSIO:  IO rate std deviation: 7.880011777295818
19/04/03 10:27:00 INFO fs.TestDFSIO:     Test exec time sec: 27.05
19/04/03 10:27:00 INFO fs.TestDFSIO:
bash-4.2$
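The two headline numbers measure different things: Throughput mb/sec is the total data volume divided by the sum of the per-task I/O times, while Average IO rate mb/sec is the mean of the individual per-file rates, which is why they are close but not equal. A rough estimate of cluster-wide write bandwidth is nrFiles × Throughput; a sketch that derives it from the results log (assuming the log sits in the current directory and nrFiles = 10, as above):

# 10 files x ~114.8 MB/s per file ≈ 1148 MB/s estimated aggregate write throughput
awk -F': ' '/Throughput mb\/sec/ {printf "estimated aggregate MB/s: %.1f\n", 10 * $2}' TestDFSIO_results.log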
After the test command completes successfully, a directory is created in HDFS to hold the generated test files, as shown below:
It also contains a series of small files:
Downloading one of the small files shows that it is 1 KB in size.
Opened in Notepad++, its content looks like this:
It is not human-readable.
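That is expected: these control files are binary SequenceFiles. hadoop fs -text (unlike -cat) understands SequenceFiles and prints their key/value pairs; a sketch, assuming the default control-file naming (the exact file names under io_control may differ):

# Decode a TestDFSIO control file instead of dumping raw bytes
hadoop fs -text /benchmarks/TestDFSIO/io_control/in_file_test_io_0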
Run the command hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -read -nrFiles 10 -fileSize 1000.
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
19/04/03 10:51:05 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 10:51:05 INFO fs.TestDFSIO: nrFiles = 10
19/04/03 10:51:05 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
19/04/03 10:51:05 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 10:51:05 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 10:51:05 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
19/04/03 10:51:06 INFO fs.TestDFSIO: created control files for: 10 files
19/04/03 10:51:06 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:51:06 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 10:51:06 INFO mapred.FileInputFormat: Total input paths to process : 10
19/04/03 10:51:06 INFO mapreduce.JobSubmitter: number of splits:10
19/04/03 10:51:06 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 10:51:06 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
19/04/03 10:51:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0007
19/04/03 10:51:07 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0007
19/04/03 10:51:07 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0007/
19/04/03 10:51:07 INFO mapreduce.Job: Running job: job_1552358721447_0007
19/04/03 10:51:12 INFO mapreduce.Job: Job job_1552358721447_0007 running in uber mode : false
19/04/03 10:51:12 INFO mapreduce.Job: map 0% reduce 0%
19/04/03 10:51:19 INFO mapreduce.Job: map 100% reduce 0%
19/04/03 10:51:25 INFO mapreduce.Job: map 100% reduce 100%
19/04/03 10:51:25 INFO mapreduce.Job: Job job_1552358721447_0007 completed successfully
19/04/03 10:51:25 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=345
        FILE: Number of bytes written=1653774
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=10485762310
        HDFS: Number of bytes written=81
        HDFS: Number of read operations=53
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=50265
        Total time spent by all reduces in occupied slots (ms)=2630
        Total time spent by all map tasks (ms)=50265
        Total time spent by all reduce tasks (ms)=2630
        Total vcore-milliseconds taken by all map tasks=50265
        Total vcore-milliseconds taken by all reduce tasks=2630
        Total megabyte-milliseconds taken by all map tasks=51471360
        Total megabyte-milliseconds taken by all reduce tasks=2693120
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=774
        Map output materialized bytes=1020
        Input split bytes=1190
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=1020
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1310
        CPU time spent (ms)=35780
        Physical memory (bytes) snapshot=6365962240
        Virtual memory (bytes) snapshot=31838441472
        Total committed heap usage (bytes)=6873415680
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1120
    File Output Format Counters
        Bytes Written=81
19/04/03 10:51:25 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
19/04/03 10:51:25 INFO fs.TestDFSIO:            Date & time: Wed Apr 03 10:51:25 CST 2019
19/04/03 10:51:25 INFO fs.TestDFSIO:        Number of files: 10
19/04/03 10:51:25 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
19/04/03 10:51:25 INFO fs.TestDFSIO:      Throughput mb/sec: 897.4243919949744
19/04/03 10:51:25 INFO fs.TestDFSIO: Average IO rate mb/sec: 898.6844482421875
19/04/03 10:51:25 INFO fs.TestDFSIO:  IO rate std deviation: 33.68623587810037
19/04/03 10:51:25 INFO fs.TestDFSIO:     Test exec time sec: 19.035
19/04/03 10:51:25 INFO fs.TestDFSIO:
bash-4.2$
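Per-file read throughput (~897 MB/s) is far higher than write throughput (~115 MB/s). As a rough worked comparison: 10 × 897.4 ≈ 8974 MB/s aggregate read versus 10 × 114.8 ≈ 1148 MB/s aggregate write. One plausible factor is that each write must be pipelined to three replicas (replication factor 3), while reads can often be served from a local replica or the OS page cache; treat these as rough figures, not a definitive analysis.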
Run the command hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -clean.
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar TestDFSIO -clean
19/04/03 11:17:25 INFO fs.TestDFSIO: TestDFSIO.1.7
19/04/03 11:17:25 INFO fs.TestDFSIO: nrFiles = 1
19/04/03 11:17:25 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
19/04/03 11:17:25 INFO fs.TestDFSIO: bufferSize = 1000000
19/04/03 11:17:25 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/04/03 11:17:26 INFO fs.TestDFSIO: Cleaning up test files
bash-4.2$
The TestDFSIO directory is removed from HDFS as well.
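A quick way to verify the cleanup (a sketch; any HDFS client will do):

# The TestDFSIO subdirectory should no longer appear
hadoop fs -ls /benchmarks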
nnbench
nnbench load-tests the NameNode: it issues a large number of HDFS metadata requests, putting the NameNode under significant pressure. The test can simulate creating, reading, renaming, and deleting files on HDFS.
The nnbench options are as follows:
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
    -operation <Available operations are create_write open_read rename delete. This option is mandatory>
     * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
    -maps <number of maps. default is 1. This is not mandatory>
    -reduces <number of reduces. default is 1. This is not mandatory>
    -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time. default is launch time + 2 mins. This is not mandatory>
    -blockSize <Block size in bytes. default is 1. This is not mandatory>
    -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
    -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
    -numberOfFiles <number of files to create. default is 1. This is not mandatory>
    -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
    -baseDir <base DFS path. default is /benchmarks/NNBench. This is not mandatory>
    -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
    -help: Display the help statement
To create 1000 files using 12 mappers and 6 reducers, run the command hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench.
It returns the following:
bash-4.2$ hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench
NameNode Benchmark 0.4
19/04/03 16:11:22 INFO hdfs.NNBench: Test Inputs:
19/04/03 16:11:22 INFO hdfs.NNBench:            Test Operation: create_write
19/04/03 16:11:22 INFO hdfs.NNBench:                Start time: 2019-04-03 16:13:22,755
19/04/03 16:11:22 INFO hdfs.NNBench:            Number of maps: 12
19/04/03 16:11:22 INFO hdfs.NNBench:         Number of reduces: 6
19/04/03 16:11:22 INFO hdfs.NNBench:                Block Size: 1
19/04/03 16:11:22 INFO hdfs.NNBench:            Bytes to write: 0
19/04/03 16:11:22 INFO hdfs.NNBench:        Bytes per checksum: 1
19/04/03 16:11:22 INFO hdfs.NNBench:           Number of files: 1000
19/04/03 16:11:22 INFO hdfs.NNBench:        Replication factor: 3
19/04/03 16:11:22 INFO hdfs.NNBench:                  Base dir: /benchmarks/NNBench
19/04/03 16:11:22 INFO hdfs.NNBench:      Read file after open: true
19/04/03 16:11:23 INFO hdfs.NNBench: Deleting data directory
19/04/03 16:11:23 INFO hdfs.NNBench: Creating 12 control files
19/04/03 16:11:24 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 16:11:24 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 16:11:24 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 16:11:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/03 16:11:24 INFO mapred.FileInputFormat: Total input paths to process : 12
19/04/03 16:11:24 INFO mapreduce.JobSubmitter: number of splits:12
19/04/03 16:11:24 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/04/03 16:11:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0009
19/04/03 16:11:24 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0009
19/04/03 16:11:24 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0009/
19/04/03 16:11:24 INFO mapreduce.Job: Running job: job_1552358721447_0009
19/04/03 16:11:31 INFO mapreduce.Job: Job job_1552358721447_0009 running in uber mode : false
19/04/03 16:11:31 INFO mapreduce.Job: map 0% reduce 0%
19/04/03 16:11:48 INFO mapreduce.Job: map 50% reduce 0%
19/04/03 16:11:49 INFO mapreduce.Job: map 67% reduce 0%
19/04/03 16:13:26 INFO mapreduce.Job: map 100% reduce 0%
19/04/03 16:13:31 INFO mapreduce.Job: map 100% reduce 17%
19/04/03 16:13:32 INFO mapreduce.Job: map 100% reduce 100%
19/04/03 16:13:32 INFO mapreduce.Job: Job job_1552358721447_0009 completed successfully
19/04/03 16:13:32 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=519
        FILE: Number of bytes written=2736365
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2908
        HDFS: Number of bytes written=170
        HDFS: Number of read operations=66
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12012
    Job Counters
        Launched map tasks=12
        Launched reduce tasks=6
        Data-local map tasks=12
        Total time spent by all maps in occupied slots (ms)=1363711
        Total time spent by all reduces in occupied slots (ms)=18780
        Total time spent by all map tasks (ms)=1363711
        Total time spent by all reduce tasks (ms)=18780
        Total vcore-milliseconds taken by all map tasks=1363711
        Total vcore-milliseconds taken by all reduce tasks=18780
        Total megabyte-milliseconds taken by all map tasks=1396440064
        Total megabyte-milliseconds taken by all reduce tasks=19230720
    Map-Reduce Framework
        Map input records=12
        Map output records=84
        Map output bytes=2016
        Map output materialized bytes=3276
        Input split bytes=1418
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=3276
        Reduce input records=84
        Reduce output records=7
        Spilled Records=168
        Shuffled Maps =72
        Failed Shuffles=0
        Merged Map outputs=72
        GC time elapsed (ms)=2335
        CPU time spent (ms)=35880
        Physical memory (bytes) snapshot=9088864256
        Virtual memory (bytes) snapshot=52095377408
        Total committed heap usage (bytes)=11191975936
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1490
    File Output Format Counters
        Bytes Written=170
19/04/03 16:13:32 INFO hdfs.NNBench: -------------- NNBench -------------- :
19/04/03 16:13:32 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
19/04/03 16:13:32 INFO hdfs.NNBench:                            Date & time: 2019-04-03 16:13:32,475
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                         Test Operation: create_write
19/04/03 16:13:32 INFO hdfs.NNBench:                             Start time: 2019-04-03 16:13:22,755
19/04/03 16:13:32 INFO hdfs.NNBench:                            Maps to run: 12
19/04/03 16:13:32 INFO hdfs.NNBench:                         Reduces to run: 6
19/04/03 16:13:32 INFO hdfs.NNBench:                     Block Size (bytes): 1
19/04/03 16:13:32 INFO hdfs.NNBench:                         Bytes to write: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                     Bytes per checksum: 1
19/04/03 16:13:32 INFO hdfs.NNBench:                        Number of files: 1000
19/04/03 16:13:32 INFO hdfs.NNBench:                     Replication factor: 3
19/04/03 16:13:32 INFO hdfs.NNBench:             Successful file operations: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:         # maps that missed the barrier: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                           # exceptions: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
19/04/03 16:13:32 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
19/04/03 16:13:32 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
19/04/03 16:13:32 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
19/04/03 16:13:32 INFO hdfs.NNBench:
19/04/03 16:13:32 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
19/04/03 16:13:32 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
19/04/03 16:13:32 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0
19/04/03 16:13:32 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0
19/04/03 16:13:32 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
19/04/03 16:13:32 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
19/04/03 16:13:32 INFO hdfs.NNBench:
bash-4.2$
After the job finishes, its details can be viewed on the job history page at http://*.*.*.*:19888/jobhistory/job/job_1552358721447_0009, as shown below:
An NNBench directory is also created in HDFS to store the files produced by the job:
Going into the directory /benchmarks/NNBench/control and viewing the metadata of one of the files shows that the file is stored on three nodes:
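The same information is available from the command line with fsck (a sketch; point it at a concrete file to see per-block detail):

# -files -blocks -locations lists each block and the DataNodes holding its replicas
hdfs fsck /benchmarks/NNBench/control -files -blocks -locations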
Downloading one of them and opening it in Notepad++ again shows unreadable binary content:
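Since open_read, rename, and delete assume that the files from create_write already exist, a full pass over the remaining operations might look like this (a sketch reusing the jar and base directory from above):

hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation open_read -maps 12 -reduces 6 -numberOfFiles 1000 -readFileAfterOpen true -baseDir /benchmarks/NNBench
hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation rename -maps 12 -reduces 6 -numberOfFiles 1000 -baseDir /benchmarks/NNBench
hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar nnbench -operation delete -maps 12 -reduces 6 -numberOfFiles 1000 -baseDir /benchmarks/NNBench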
mrbench
mrbench runs a small job many times in a row. It checks whether small jobs on the cluster run repeatably and efficiently. Its usage is as follows:
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input data lines, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]
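For example, a slightly larger configuration might look like this (a sketch; by default mrbench uses 2 maps, 1 reduce, and 1 input line):

hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar mrbench -numRuns 10 -maps 4 -reduces 2 -inputLines 100 -inputType random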
Run the command hadoop jar ../jars/hadoop-test-2.6.0-mr1-cdh5.16.1.jar mrbench -numRuns 50.
It returns the following:
……
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=3
    File Output Format Counters
        Bytes Written=3
19/04/03 17:10:15 INFO mapred.MRBench: Running job 49: input=hdfs://node1:8020/benchmarks/MRBench/mr_input output=hdfs://node1:8020/benchmarks/MRBench/mr_output/output_299739316
19/04/03 17:10:15 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 17:10:15 INFO client.RMProxy: Connecting to ResourceManager at node1/10.200.101.131:8032
19/04/03 17:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
19/04/03 17:10:15 INFO mapreduce.JobSubmitter: number of splits:2
19/04/03 17:10:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0059
19/04/03 17:10:15 INFO impl.YarnClientImpl: Submitted application application_1552358721447_0059
19/04/03 17:10:15 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0059/
19/04/03 17:10:15 INFO mapreduce.Job: Running job: job_1552358721447_0059
19/04/03 17:10:21 INFO mapreduce.Job: Job job_1552358721447_0059 running in uber mode : false
19/04/03 17:10:21 INFO mapreduce.Job: map 0% reduce 0%
19/04/03 17:10:25 INFO mapreduce.Job: map 100% reduce 0%
19/04/03 17:10:30 INFO mapreduce.Job: map 100% reduce 100%
19/04/03 17:10:30 INFO mapreduce.Job: Job job_1552358721447_0059 completed successfully
19/04/03 17:10:30 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=27
        FILE: Number of bytes written=450422
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=239
        HDFS: Number of bytes written=3
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=5134
        Total time spent by all reduces in occupied slots (ms)=2562
        Total time spent by all map tasks (ms)=5134
        Total time spent by all reduce tasks (ms)=2562
        Total vcore-milliseconds taken by all map tasks=5134
        Total vcore-milliseconds taken by all reduce tasks=2562
        Total megabyte-milliseconds taken by all map tasks=5257216
        Total megabyte-milliseconds taken by all reduce tasks=2623488
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Map output bytes=5
        Map output materialized bytes=39
        Input split bytes=236
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=39
        Reduce input records=1
        Reduce output records=1
        Spilled Records=2
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=196
        CPU time spent (ms)=2550
        Physical memory (bytes) snapshot=1503531008
        Virtual memory (bytes) snapshot=8690847744
        Total committed heap usage (bytes)=1791492096
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=3
    File Output Format Counters
        Bytes Written=3
DataLines    Maps    Reduces    AvgTime (milliseconds)
1            2       1          15357
bash-4.2$
The last line shows that the average job completion time over the 50 runs was 15357 ms, i.e. about 15 seconds.
Opening http://*.*.*.*:8088/cluster shows information about the jobs that were executed:
Corresponding directories were also created in HDFS, but their contents are empty, as shown below: