I installed Hadoop on four nodes: one node runs the namenode and the secondary namenode, and the other three are datanodes. I ran a Sqoop job with a replication factor of 3; the job succeeded, the data ended up on all three datanodes, and it took about 1.5 hours with 6 mappers. I then ran the same job with a replication factor of 1. That job also succeeded, running in about 1 hour with 12 mappers.
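For reference, here is a sketch of roughly how such an import can be invoked; the JDBC URL, credentials, table name, and target directory are placeholders, not the actual values from my job:

```sh
# Hypothetical Sqoop import. The generic -D option must come first:
# dfs.replication sets the per-job HDFS replication factor, and
# --num-mappers controls how many parallel map tasks are launched.
sqoop import \
  -D dfs.replication=1 \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username dbuser -P \
  --table my_table \
  --num-mappers 12 \
  --target-dir /user/hadoop/my_table
```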
My questions are:
1. When I ran the job the second time with a replication factor of 1, where is the data stored? Is it split and stored across all three datanodes, or is it stored on the machine from which I ran the job?
2. Each datanode has a 6-core processor and 64 GB of RAM. Which properties should I set to get optimal performance from the Sqoop job?
Here is the log from the first job (replication factor 3, 6 mappers):

```
15/06/30 00:21:28 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=749046
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=864
        HDFS: Number of bytes written=253986997858
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Launched map tasks=6
        Other local map tasks=6
        Total time spent by all maps in occupied slots (ms)=20582400
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=20582400
        Total vcore-seconds taken by all map tasks=20582400
        Total megabyte-seconds taken by all map tasks=73767321600
    Map-Reduce Framework
        Map input records=162991238
        Map output records=162991238
        Input split bytes=864
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=187671
        CPU time spent (ms)=21216950
        Physical memory (bytes) snapshot=5210345472
        Virtual memory (bytes) snapshot=57549950976
        Total committed heap usage (bytes)=6410469376
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=253986997858
15/06/30 00:21:28 INFO mapreduce.ImportJobBase: Transferred 236.5438 GB in 5,524.6156 seconds (43.8439 MB/sec)
15/06/30 00:21:28 INFO mapreduce.ImportJobBase: Retrieved 162991238 records.
```
Here is the log from the second job (replication factor 1, 12 mappers):

```
15/06/30 10:21:02 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=1498130
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1744
        HDFS: Number of bytes written=253986997858
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=24
    Job Counters
        Launched map tasks=12
        Other local map tasks=12
        Total time spent by all maps in occupied slots (ms)=22551454
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=22551454
        Total vcore-seconds taken by all map tasks=22551454
        Total megabyte-seconds taken by all map tasks=80824411136
    Map-Reduce Framework
        Map input records=162991238
        Map output records=162991238
        Input split bytes=1744
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=186898
        CPU time spent (ms)=21910100
        Physical memory (bytes) snapshot=9802846208
        Virtual memory (bytes) snapshot=115099107328
        Total committed heap usage (bytes)=12298747904
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=253986997858
15/06/30 10:21:02 INFO mapreduce.ImportJobBase: Transferred 236.5438 GB in 3,647.7444 seconds (66.4029 MB/sec)
15/06/30 10:21:02 INFO mapreduce.ImportJobBase: Retrieved 162991238 records.
```
1 Answer
Here are my answers to your two questions.

1. When you run with a replication factor of 1, HDFS keeps only one copy of each block, but the blocks are still distributed across all three datanodes. HDFS places blocks across the cluster automatically, which is why the data is not stored only on the machine you ran the job from.
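One way to confirm this (a hedged aside; the path below is a placeholder for your import's target directory): HDFS's fsck tool lists every block of a file and the datanodes holding each replica.

```sh
# Lists each block of the imported data set and the datanode(s)
# that hold its replicas. With replication factor 1 you should see
# one replica per block, spread across all three datanodes.
hdfs fsck /user/hadoop/my_table -files -blocks -locations
```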
2. Size the number of mappers to the cores/slots available in the cluster; that is what gives optimal throughput. You have 6-core machines: assume 4 cores are allotted to mappers and 2 to reducers. That gives you 4 * 3 * 2 (two mappers can run per core) = 24 mappers as the best fit for this job. By default, Sqoop launches only 4 mappers, so you have to raise the count explicitly with -m/--num-mappers, as in the sketch below.
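As an illustration of the knobs involved (a sketch under the 4-cores-for-mappers assumption above; the memory figures and connection details are hypothetical, not tested values):

```sh
# Per-job MapReduce settings passed as generic -D options.
# With 64 GB of RAM per datanode, 4 GB map containers leave plenty
# of headroom for the OS and the datanode/nodemanager daemons even
# with 8 map tasks running concurrently on a node.
sqoop import \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3276m \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username dbuser -P \
  --table my_table \
  --num-mappers 24 \
  --target-dir /user/hadoop/my_table
```

Cluster-wide capacity is configured in yarn-site.xml rather than per job, chiefly yarn.nodemanager.resource.memory-mb (memory available for containers on each node) and yarn.nodemanager.resource.cpu-vcores (vcores offered per node).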
Hope this clears up your doubts.