Execution Error, return code 1 when running a query in Hive for Twitter sentiment analysis

3vpjnl9f · posted 2021-06-28 in Hive

I am doing Twitter sentiment analysis with Hadoop, Flume, and Hive. I created the tables by running

    hive -f tweets.sql

tweets.sql

    -- create the tweets_raw table containing the records as received from Twitter
    SET hive.support.sql11.reserved.keywords=false;
    CREATE EXTERNAL TABLE Mytweets_raw (
        id BIGINT,
        created_at STRING,
        source STRING,
        favorited BOOLEAN,
        retweet_count INT,
        retweeted_status STRUCT<
            text:STRING,
            user:STRUCT<screen_name:STRING,name:STRING>>,
        entities STRUCT<
            urls:ARRAY<STRUCT<expanded_url:STRING>>,
            user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
            hashtags:ARRAY<STRUCT<text:STRING>>>,
        text STRING,
        user STRUCT<
            screen_name:STRING,
            name:STRING,
            friends_count:INT,
            followers_count:INT,
            statuses_count:INT,
            verified:BOOLEAN,
            utc_offset:INT,
            time_zone:STRING>,
        in_reply_to_screen_name STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/user/flume/tweets';
    -- create sentiment dictionary
    CREATE EXTERNAL TABLE dictionary (
        type string,
        length int,
        word string,
        pos string,
        stemmed string,
        polarity string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/dictionary';
    -- loading data to the table dictionary
    load data inpath 'data/dictionary/dictionary.tsv' INTO TABLE dictionary;
    CREATE EXTERNAL TABLE time_zone_map (
        time_zone string,
        country string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/time_zone_map';
    -- loading data to the table time_zone_map
    load data inpath 'data/time_zone_map/time_zone_map.tsv' INTO TABLE time_zone_map;
    -- Clean up tweets
    CREATE VIEW tweets_simple AS
    SELECT
        id,
        cast ( from_unixtime( unix_timestamp(concat( '2014 ', substring(created_at,5,15)), 'yyyy MMM dd hh:mm:ss')) as timestamp) ts,
        text,
        user.time_zone
    FROM Mytweets_raw;
    CREATE VIEW tweets_clean AS
    SELECT
        id,
        ts,
        text,
        m.country
    FROM tweets_simple t LEFT OUTER JOIN time_zone_map m ON t.time_zone = m.time_zone;
    -- Compute sentiment
    create view l1 as select id, words from Mytweets_raw lateral view explode(sentences(lower(text))) dummy as words;
    create view l2 as select id, word from l1 lateral view explode( words ) dummy as word;
    create view l3 as select
        id,
        l2.word,
        case d.polarity
            when 'negative' then -1
            when 'positive' then 1
            else 0 end as polarity
    from l2 left outer join dictionary d on l2.word = d.word;
    create table tweets_sentiment as select
        id,
        case
            when sum( polarity ) > 0 then 'positive'
            when sum( polarity ) < 0 then 'negative'
            else 'neutral' end as sentiment
    from l3 group by id;
    -- put everything back together and re-name sentiments...
    CREATE TABLE tweetsbi AS
    SELECT
        t.*,
        s.sentiment
    FROM tweets_clean t LEFT OUTER JOIN tweets_sentiment s on t.id = s.id;
    -- data with tweet counts.....
    CREATE TABLE tweetsbiaggr AS
    SELECT
        country, sentiment, count(sentiment) as tweet_count
    FROM tweetsbi
    group by country, sentiment;
    -- store data for analysis......
    CREATE VIEW A as select country, tweet_count as positive_response from tweetsbiaggr where sentiment='positive';
    CREATE VIEW B as select country, tweet_count as negative_response from tweetsbiaggr where sentiment='negative';
    CREATE VIEW C as select country, tweet_count as neutral_response from tweetsbiaggr where sentiment='neutral';
    CREATE TABLE tweetcompare as select A.*, B.negative_response as negative_response, C.neutral_response as neutral_response from A join B on A.country = B.country join C on B.country = C.country;
    -- permission to show data in Excel sheet for analysis ....
    grant SELECT ON TABLE tweetcompare to user hue;
    grant SELECT ON TABLE tweetcompare to user root;
    -- for Tableau or Excel
    -- UDAF sentiscore = sum(sentiment)*50 / count(sentiment)
    -- context n-gram made readable

When I execute the query

    SELECT t.retweeted_screen_name, sum(retweets) AS total_retweets, count(*) AS tweet_count
    FROM (SELECT retweeted_status.user.screen_name as retweeted_screen_name,
                 retweeted_status.text,
                 max(retweet_count) as retweets
          FROM mytweets
          GROUP BY retweeted_status.user.screen_name, retweeted_status.text) t
    GROUP BY t.retweeted_screen_name
    ORDER BY total_retweets DESC
    LIMIT 10;

the following error is shown:

    Query ID = root_20161114140028_852cb526-011f-4a25-95c8-8c6587a88759
    Total jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
        at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:179)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar)'
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar

hive-site.xml

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/usr/lib/warehouse</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/usr/lib/warehouse/metastore_db;create=true</value>
      </property>
      <property>
        <name>hive.exec.reducers.bytes.per.reducer</name>
        <value>256000000</value>
      </property>
      <property>
        <name>hive.exec.reducers.max</name>
        <value>1009</value>
      </property>
    </configuration>

mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.job.reduces</name>
        <value>1</value>
      </property>
    </configuration>

core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

/etc/hosts

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

However, even though I have already added the jar file to Hive, the same error still appears:

    ADD JAR file:///usr/lib/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
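(One way to see what the session actually registered is Hive's built-in LIST JARS command. Note that the stack trace references json-serde-1.3.6-SNAPSHOT while the statement above adds 1.3.8-SNAPSHOT, so a jar registered earlier may still be in play:)

    LIST JARS;
    -- prints the resources added to the current session; a lingering
    -- json-serde-1.3.6 entry would explain the path in the stack trace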

Please help me fix this.

dgiusagp1#

Try:

    hadoop fs -put /usr/lib/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar hdfs://localhost:9000/usr/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar
    ADD JAR hdfs://localhost:9000/usr/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
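This uploads the jar to HDFS and registers it from there, so the distributed cache resolves a path that actually exists. With the original file:/// registration, the job apparently tried to fetch the jar from the per-session /tmp/..._resources scratch path on HDFS (hdfs://localhost:9000/tmp/..._resources/...), where it had never been uploaded.

As a more permanent alternative (a minimal sketch, reusing the jar path from the question), the SerDe jar can be made available to every session via the hive.aux.jars.path property in hive-site.xml, which avoids re-running ADD JAR each time:

    <!-- sketch: add to hive-site.xml; path assumed from the question -->
    <property>
      <name>hive.aux.jars.path</name>
      <value>file:///usr/lib/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar</value>
    </property>

After editing hive-site.xml, restart the Hive session so the property takes effect.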
