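Option | Type | Default | Description |
dfs.block.size | int | 64M | HDFS block size. For a 1 TB file with the default 64 MB block size, HDFS creates 1,048,576 / 64 = 16,384 blocks |
dfs.replication | int | 3 | Number of replicas stored for each data file |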
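
The block-count arithmetic in the dfs.block.size row can be checked with a short sketch. The helper names (`block_count`, `replica_count`) are hypothetical and only illustrate the calculation; binary units (1 TB = 1024 × 1024 MB) are assumed.

```python
MB = 1024 * 1024
TB = 1024 * 1024 * MB

def block_count(file_bytes: int, block_bytes: int) -> int:
    """Blocks needed for one file; the last block may be only partly filled."""
    return (file_bytes + block_bytes - 1) // block_bytes  # integer ceiling

def replica_count(file_bytes: int, block_bytes: int, replication: int = 3) -> int:
    """Total block replicas stored across the cluster (dfs.replication copies each)."""
    return block_count(file_bytes, block_bytes) * replication

if __name__ == "__main__":
    print(block_count(TB, 64 * MB))       # default 64 MB blocks
    print(block_count(TB, 128 * MB))      # doubling the block size halves the count
    print(replica_count(TB, 64 * MB, 3))  # with the default replication of 3
```

A larger block size means fewer blocks to track for the same file, which is the trade-off the table row hints at.
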
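Option | Type | Default | Description |
io.sort.mb | int | 100 | Size (MB) of the buffer that holds intermediate map output |
io.sort.record.percent | float | 0.05 | Fraction of io.sort.mb used to store map output record boundaries; the rest of the buffer holds the data itself |
io.sort.spill.percent | float | 0.8 | Buffer fill threshold at which the map starts to spill to disk |
io.sort.factor | int | 10 | Upper limit on the number of streams merged at once during a merge |
min.num.spill.for.combine | int | 3 | Minimum number of spills required for the combine function to run |
mapred.compress.map.output | boolean | false | Whether intermediate map output is compressed |
mapred.map.output.compression.codec | class name | org.apache.hadoop.io.compress.DefaultCodec | Compression codec used for intermediate map output |
mapred.tasktracker.map.tasks.maximum | int | 2 | Maximum number of map tasks a task tracker can run simultaneously |
mapred.map.tasks | int | 2 | Number of task tracker map slots a job uses; this value is ≤ mapred.tasktracker.map.tasks.maximum |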
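
To make the map-side buffer settings concrete, here is a minimal sketch of how the defaults above divide the sort buffer. The function names are hypothetical, and the sketch assumes the spill threshold applies separately to the record-boundary area and the data area; it is an illustration of the arithmetic, not Hadoop's implementation.

```python
def sort_buffer_layout(io_sort_mb=100, record_percent=0.05, spill_percent=0.8):
    """Split the map-side sort buffer using the table's defaults (sizes in MB)."""
    metadata_mb = io_sort_mb * record_percent  # area for record boundaries
    data_mb = io_sort_mb - metadata_mb         # area for the map output itself
    return {
        "metadata_mb": metadata_mb,
        "data_mb": data_mb,
        # Assumption: a spill starts when either area reaches the threshold.
        "metadata_spill_at_mb": metadata_mb * spill_percent,
        "data_spill_at_mb": data_mb * spill_percent,
    }

def combiner_runs_on_merge(num_spills, min_num_spill_for_combine=3):
    """True when there are enough spill files for the combiner to run on merge."""
    return num_spills >= min_num_spill_for_combine

if __name__ == "__main__":
    for name, value in sort_buffer_layout().items():
        print(f"{name}: {value:.1f}")
    print(combiner_runs_on_merge(2), combiner_runs_on_merge(3))
```

With many small records the record-boundary area tends to fill first, which is the usual reason to raise io.sort.record.percent.
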
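Option | Type | Default | Description |
mapred.reduce.parallel.copies | int | 5 | Maximum number of threads each reduce uses to fetch map output in parallel |
mapred.reduce.copy.backoff | int | 300 | Maximum wait time (in seconds) for a reduce fetch thread |
io.sort.factor | int | 10 | Upper limit on the number of streams merged at once during a merge |
mapred.job.shuffle.input.buffer.percent | float | 0.7 | Fraction of the reduce task heap used to buffer shuffle data |
mapred.job.shuffle.merge.percent | float | 0.66 | Fraction of the shuffle buffer that must fill before a merge starts |
mapred.job.reduce.input.buffer.percent | float | 0 | Fraction of the heap used to buffer data during the reduce computation phase, after the sort completes |
mapred.tasktracker.reduce.tasks.maximum | int | 2 | Maximum number of reduce tasks a task tracker can run simultaneously |
mapred.reduce.tasks | int | 1 | Number of task tracker reduce slots a job uses |
mapred.child.java.opts | string | -Xmx200m | JVM options for each map or reduce child process; the default limits the heap to 200 MB |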
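
The reduce-side percentages are all relative to the child JVM heap, so a rough sketch of the resulting sizes may help. The function names are hypothetical, and the 200 MB heap is an assumption taken from the default listed for mapred.child.java.opts.

```python
def shuffle_memory(heap_mb=200, input_buffer_percent=0.7, merge_percent=0.66):
    """Return (shuffle buffer size, fill level that triggers a merge), both in MB."""
    shuffle_buffer_mb = heap_mb * input_buffer_percent
    merge_trigger_mb = shuffle_buffer_mb * merge_percent
    return shuffle_buffer_mb, merge_trigger_mb

def node_child_heap_mb(map_slots=2, reduce_slots=2, heap_mb=200):
    """Worst-case heap used by task JVMs on one node when every slot is busy."""
    return (map_slots + reduce_slots) * heap_mb

if __name__ == "__main__":
    buffer_mb, trigger_mb = shuffle_memory()
    print(f"shuffle buffer: {buffer_mb:.1f} MB, merge starts at: {trigger_mb:.1f} MB")
    print(f"child heap per node: {node_child_heap_mb()} MB")
```

Raising the child heap enlarges the shuffle buffer proportionally, but the per-node total grows with the number of map and reduce slots, so the slot maximums and the heap size have to be chosen together.
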
Item | Value | Path | Effect |
HADOOP_OPTS | HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true" | bin/hadoop | Disables IPv6 |
Job scheduling mode | FIFO or Fair | | |