Monitoring a multi-node Hadoop cluster with Ganglia

Asked by wxclj1h5 on 2021-06-03, in Hadoop

I want to monitor a multi-node Hadoop cluster (Hadoop version 0.20.2) with Ganglia. Hadoop itself is working fine. I installed Ganglia after reading the following blog posts:
http://hakunamapdata.com/ganglia-configuration-for-a-small-hadoop-cluster-and-some-troubleshooting/
http://hokamblogs.blogspot.in/2013/06/ganglia-overview-and-installation-on.html
I have also studied the monitoring chapter of Ganglia.pdf (Appendix B: Ganglia and Hadoop/HBase).

I have modified only the following lines in **hadoop-metrics.properties** (the same on all Hadoop nodes):

```
# Configuration of the "dfs" context for Ganglia
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=192.168.1.182:8649

# Configuration of the "mapred" context for Ganglia
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=192.168.1.182:8649

# Configuration of the "jvm" context for Ganglia
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=192.168.1.182:8649
```

**gmetad.conf** (only on the Hadoop master node):

```
data_source "Hadoop-slaves" 5 192.168.1.182:8649
RRAs "RRA:AVERAGE:0.5:1:302400"   # because I want to analyse one week of data
```

**gmond.conf** (on all Hadoop slave nodes and the Hadoop master):

```
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 0
}

cluster {
  name = "Hadoop-slaves"
  owner = "Sandeep Priyank"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location. */
host {
  location = "CASL"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel. */
udp_send_channel {
  host = 192.168.1.182
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8649
}

/* You can specify as many tcp_accept_channels as you like to share
   an XML description of the state of the cluster. */
tcp_accept_channel {
  port = 8649
}
```
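One way to sanity-check this setup is to pull the cluster-state XML that gmond serves on its `tcp_accept_channel` (port 8649 above) and look at which metric names are present. Below is a minimal Python sketch; the embedded XML sample and metric names are illustrative only, and in practice you would fetch the XML from `192.168.1.182:8649` (e.g. with a socket or `nc`) rather than a hard-coded string.

```python
import xml.etree.ElementTree as ET

# Illustrative sample of the XML that gmond serves on its tcp_accept_channel;
# real output contains many more hosts, metrics, and attributes.
SAMPLE_XML = """
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="Hadoop-slaves" OWNER="Sandeep Priyank">
    <HOST NAME="node1" IP="192.168.1.182">
      <METRIC NAME="mem_free" VAL="123456" TYPE="float" UNITS="KB"/>
      <METRIC NAME="jvm.metrics.memHeapUsedM" VAL="55.2" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

def metric_names(xml_text):
    """Return the metric names reported for every host in the cluster XML."""
    root = ET.fromstring(xml_text)
    return [m.get("NAME") for m in root.iter("METRIC")]

print(metric_names(SAMPLE_XML))
# → ['mem_free', 'jvm.metrics.memHeapUsedM']
```

If only system metrics (`mem_*`, `disk_*`, `cpu_*`, ...) show up and no `jvm.*` or `mapred.*` names, then gmond is healthy but Hadoop is not sending its metrics to it, which points at the Hadoop-side metrics configuration.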

Right now Ganglia only shows the system metrics (memory, disk, etc.) for all nodes, but it does not show the Hadoop metrics (JVM, mapred metrics, etc.) on the web interface. How can I fix this?

ktca8awb — Answer 1

I do use Hadoop with Ganglia, and yes, I see a lot of Hadoop metrics on Ganglia (containers, map tasks, vmem). In fact, Hadoop reports a few hundred Ganglia-specific metrics.
The blog post you mention was enough for me.
I edit hadoop-metrics2.properties on the master node with this content:

```
namenode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
namenode.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmetad_hostname_or_ip:8649

resourcemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
resourcemanager.sink.ganglia.period=10
resourcemanager.sink.ganglia.servers=gmetad_hostname_or_ip:8649
```

And I edit the same file on the slaves:

```
datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
datanode.sink.ganglia.period=10
datanode.sink.ganglia.servers=gmetad_hostname_or_ip:8649

nodemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
nodemanager.sink.ganglia.period=10
nodemanager.sink.ganglia.servers=gmetad_hostname_or_ip:8649
```
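A quick way to catch typos in sink definitions like these (for example, a duplicated port such as `192.168.1.182:8649:8649`) is to parse the properties file and check every `*.sink.ganglia.servers` value. A minimal sketch, assuming the plain `key=value` Java-properties format shown above; the helper name and the sample lines are just illustrations:

```python
def check_ganglia_sinks(properties_text):
    """Validate *.sink.ganglia.servers entries in a metrics2 properties file.

    Returns a list of (key, bad_server) tuples; an empty list means every
    server entry parsed as a single host:port pair.
    """
    problems = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        if not key.endswith(".sink.ganglia.servers"):
            continue  # only the servers entries need this check
        # Each comma-separated server entry should be host:port,
        # i.e. exactly one colon followed by a numeric port.
        for server in value.split(","):
            parts = server.strip().split(":")
            if len(parts) != 2 or not parts[1].isdigit():
                problems.append((key, server.strip()))
    return problems

print(check_ganglia_sinks(
    "nodemanager.sink.ganglia.servers=192.168.1.182:8649:8649"))
# → [('nodemanager.sink.ganglia.servers', '192.168.1.182:8649:8649')]
```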

Remember to restart Hadoop and Ganglia after changing the files.
I hope this helps.

nfg76nw0 — Answer 2

Thanks, everyone. If you are using an older version of Hadoop, copy the following files from a newer Hadoop release:
GangliaContext31.java
GangliaContext.java
found under the path hadoop/src/core/org/apache/hadoop/metrics/ganglia in the newer release.
Then compile Hadoop with ant (setting an appropriate proxy for the build if needed). If the build reports errors such as missing function definitions, copy those function definitions (from the newer version) into the appropriate Java files and compile Hadoop again. That should work.
