Nutch + Solr indexer causes java.lang.OutOfMemoryError: Java heap space

Posted by nwnhqdif on 2021-05-30 in Hadoop

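I have configured my two servers to run in distributed mode (with Hadoop). My crawl is set up with Nutch 2.2.1, HBase as the storage backend, and Solr, which is hosted in Tomcat. The problem appears every time I run the last step, i.e. when I index the data from HBase into Solr: the error in [1] is thrown. I tried adding CATALINA_OPTS (or JAVA_OPTS) like this:
CATALINA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"
and started the server with Tomcat's catalina.sh script, but it didn't help. I also added the properties in [2] to nutch-site.xml, but I still get an OutOfMemoryError. Can you help me?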
[1]

2014-09-06 22:52:50,683 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space 
    at java.util.Arrays.copyOf(Arrays.java:2367) 
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) 
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) 
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587) 
    at java.lang.StringBuffer.append(StringBuffer.java:332) 
    at java.io.StringWriter.write(StringWriter.java:77) 
    at org.apache.solr.common.util.XML.escape(XML.java:204) 
    at org.apache.solr.common.util.XML.escapeCharData(XML.java:77) 
    at org.apache.solr.common.util.XML.writeXML(XML.java:147) 
    at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:161) 
    at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:129) 
    at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:355) 
    at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:271) 
    at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:66) 
    at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:94) 
    at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:104) 
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:247) 
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) 
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) 
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) 
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:96) 
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117) 
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54) 
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650) 
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
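
The bottom frames of this trace (`org.apache.hadoop.mapred.Child`, `MapTask.run`) show that the error is thrown inside a Hadoop map-task JVM while Nutch's `SolrIndexWriter.close` serializes documents with SolrJ, not inside the Tomcat JVM that hosts Solr. `CATALINA_OPTS` only sizes the Tomcat process, so it cannot change this heap. As a minimal diagnostic sketch (the class name `HeapCheck` is hypothetical), the standard `Runtime.maxMemory()` call reports the heap limit of whatever JVM it runs in; logging the same value from code that runs inside a task would show which `-Xmx` the task actually received.

```java
// HeapCheck.java: reports the maximum heap of the JVM this code runs in.
public class HeapCheck {

    /** Maximum heap this JVM may use, in MiB (Long.MAX_VALUE / 2^20 if there is no limit). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this (or log the same value) in the JVM you are tuning.
        // A value far below the -Xmx you configured means the flag never
        // reached this process. Expect a number close to, not exactly, -Xmx:
        // some collectors exclude part of the young generation.
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```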

[2]

<property>
  <name>http.content.limit</name>
  <value>150000000</value>
</property>

<property>
  <name>indexer.max.tokens</name>
  <value>100000</value>
</property>

<property>
  <name>http.timeout</name>
  <value>50000</value>
</property>

<property>
  <name>solr.commit.size</name>
  <value>100</value>
</property>
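
These properties also bound how much text one SolrJ request can carry. The trace shows `UpdateRequest.getXML` writing the whole batch into a single in-memory `StringWriter`, so the request grows with both the per-document size and the number of documents per request. The rough upper-bound calculation below assumes `http.content.limit` is in bytes (as Nutch documents it) and assumes `solr.commit.size` is the number of documents buffered before each `add` call, which is how we understand Nutch 2.x's `SolrIndexWriter` to use it. The class and method names are hypothetical.

```java
// BatchEstimate.java: back-of-the-envelope upper bound for one SolrJ update request.
public class BatchEstimate {

    /** Worst-case raw content in one batch: every document hits the fetch limit. */
    static long worstCaseBatchBytes(long contentLimitBytes, int docsPerBatch) {
        return contentLimitBytes * docsPerBatch;
    }

    /**
     * Heap taken by the char[] behind a Java 7 String/StringBuffer holding
     * `chars` characters (UTF-16, 2 bytes per char), ignoring object headers.
     * Decoded text never has more chars than its UTF-8 source has bytes,
     * but XML escaping (&amp;, &lt;, ...) can push the count back up.
     */
    static long charArrayBytes(long chars) {
        return chars * 2L;
    }

    public static void main(String[] args) {
        long contentLimit = 150_000_000L; // http.content.limit from the question
        int commitSize = 100;             // solr.commit.size from the question (assumed batch size)
        long mib = 1024L * 1024L;

        System.out.println("raw content per batch (upper bound): "
                + worstCaseBatchBytes(contentLimit, commitSize) / mib + " MiB");
        System.out.println("one max-size document as UTF-16 chars: "
                + charArrayBytes(contentLimit) / mib + " MiB");
    }
}
```

With the question's values this prints 14305 MiB and 286 MiB. Parsed text is usually much smaller than the fetch limit, so these are ceilings, but even one maximum-size document needs a few hundred MiB as it is serialized, and the `Arrays.copyOf` frames in the trace are where `StringBuffer` allocates a larger array while the old one is still live. Lowering `http.content.limit` or `solr.commit.size` would shrink each request; raising the task heap accommodates it.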

Answer #1, by 8dtrkrch

I solved this problem with the following configuration in mapred-site.xml:

<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <value>3600000</value>
</property>

<property>
  <name>mapred.job.tracker.retiredjobs.cache.size</name>
  <value>100</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>

<property>
  <name>mapred.child.ulimit</name>
  <value>6000000</value>
</property>

<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>5</value>
</property>

<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>5</value>
</property>
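
Of these properties, `mapred.child.java.opts` and `mapred.child.ulimit` apply to the task JVMs; the other four tune the JobTracker. In Hadoop 1.x, `mapred.child.java.opts` is passed to every map and reduce task JVM (its default in `mapred-default.xml` is `-Xmx200m`), which is why it reaches the indexing task where `CATALINA_OPTS` did not. `mapred.child.ulimit` is a virtual-memory limit in kilobytes, and Hadoop's documentation notes it must be at least the `-Xmx` passed to the task JVM or the VM might not start. The sketch below checks the two values against each other; the class `ChildOptsCheck` and its parser are hypothetical and only understand the `k`/`m`/`g` suffixes or plain bytes.

```java
import java.util.Locale;

// ChildOptsCheck.java: sanity-checks mapred.child.ulimit (KiB) against the -Xmx in mapred.child.java.opts.
public class ChildOptsCheck {

    /** Returns the -Xmx value in KiB, or -1 if the options contain no usable -Xmx flag. */
    static long parseXmxKiB(String javaOpts) {
        long result = -1;
        for (String token : javaOpts.trim().split("\\s+")) {
            if (!token.startsWith("-Xmx") || token.length() == 4) {
                continue;
            }
            String size = token.substring(4).toLowerCase(Locale.ROOT);
            String digits = size.substring(0, size.length() - 1);
            long kib;
            switch (size.charAt(size.length() - 1)) {
                case 'k': kib = Long.parseLong(digits); break;
                case 'm': kib = Long.parseLong(digits) * 1024L; break;
                case 'g': kib = Long.parseLong(digits) * 1024L * 1024L; break;
                default:  kib = Long.parseLong(size) / 1024L; // plain bytes
            }
            result = kib; // a later -Xmx overrides an earlier one
        }
        return result;
    }

    /** True if the ulimit covers the heap; non-heap memory (PermGen, stacks) needs headroom on top. */
    static boolean ulimitCoversHeap(String javaOpts, long ulimitKiB) {
        long heap = parseXmxKiB(javaOpts);
        return heap >= 0 && ulimitKiB >= heap;
    }

    public static void main(String[] args) {
        String opts = "-Xmx4000m -XX:+UseConcMarkSweepGC"; // mapred.child.java.opts
        long ulimit = 6_000_000L;                          // mapred.child.ulimit
        System.out.println("heap KiB: " + parseXmxKiB(opts));
        System.out.println("ulimit covers heap: " + ulimitCoversHeap(opts, ulimit));
    }
}
```

For the values above this prints `heap KiB: 4096000` and `ulimit covers heap: true`, leaving roughly 1.8 GiB of headroom. Keep in mind that every task slot running concurrently on a node starts its own JVM with these options, so the node needs about (number of slots) × 4000 MB of memory for the tasks alone.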
