Usage and code examples for the org.apache.tika.Tika.parse() method

Reposted by x33g5p2x on 2022-01-29 in Other

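This article collects code examples for the Java method org.apache.tika.Tika.parse() and shows how Tika.parse() is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, and are intended as practical reference material. Details of the Tika.parse() method:
Package: org.apache.tika
Class name: Tika
Method name: parse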

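About Tika.parse

Parses the given file and returns the extracted text content as a Reader. In the examples below, overloads accept a File, Path, URL, or InputStream; the File, Path, and InputStream overloads also have a variant that takes a Metadata instance, which receives the metadata extracted from the document.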

Code examples

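Code example source: apache/tika

    import java.io.File;
    import java.io.Reader;
    import java.nio.CharBuffer;
    import org.apache.tika.Tika;

    public static void parseToReaderExample() throws Exception {
        File document = new File("example.doc");
        // try-with-resources closes the reader once the text has been consumed.
        try (Reader reader = new Tika().parse(document)) {
            char[] buffer = new char[1000];
            int n = reader.read(buffer);
            // Copy the extracted text to standard output, one buffer at a time.
            while (n != -1) {
                System.out.append(CharBuffer.wrap(buffer, 0, n));
                n = reader.read(buffer);
            }
        }
    }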
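The loop above streams the extracted text to standard output. When the text is needed as a string instead, the same read loop can collect into a StringBuilder with an upper bound, so that a very large document does not exhaust memory. The following is a minimal sketch using only the JDK: the names ReaderDrain, drain, and maxChars are hypothetical, and a StringReader stands in for the Reader that Tika.parse() would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderDrain {

    /**
     * Reads at most maxChars characters from the reader and returns them.
     * The caller remains responsible for closing the reader.
     */
    static String drain(Reader reader, int maxChars) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1000];
        while (text.length() < maxChars) {
            // Never request more than the remaining budget.
            int wanted = Math.min(buffer.length, maxChars - text.length());
            int n = reader.read(buffer, 0, wanted);
            if (n == -1) {
                break; // end of the extracted text
            }
            text.append(buffer, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in (assumption); with Tika on the classpath this
        // would be the Reader returned by new Tika().parse(file).
        try (Reader reader = new StringReader("extracted text content")) {
            System.out.println(drain(reader, 9));
        }
    }
}
```

Capping the length is a design choice for indexing or preview use cases; passing Integer.MAX_VALUE as maxChars reads the reader to the end.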

Code example source: apache/tika

    /**
     * Parses the file at the given path and returns the extracted text content.
     *
     * @param path the path of the file to be parsed
     * @return extracted text content
     * @throws IOException if the file can not be read or parsed
     */
    public Reader parse(Path path) throws IOException {
        return parse(path, new Metadata());
    }

Code example source: apache/tika

    /**
     * Parses the given file and returns the extracted text content.
     *
     * @param file the file to be parsed
     * @return extracted text content
     * @throws IOException if the file can not be read or parsed
     * @see #parse(Path)
     */
    public Reader parse(File file) throws IOException {
        return parse(file, new Metadata());
    }

Code example source: apache/tika

    /**
     * Parses the given document and returns the extracted text content.
     * <p>
     * The returned reader will be responsible for closing the given stream.
     * The stream and any associated resources will be closed at or before
     * the time when the {@link Reader#close()} method is called.
     *
     * @param stream the document to be parsed
     * @return extracted text content
     * @throws IOException if the document can not be read or parsed
     */
    public Reader parse(InputStream stream) throws IOException {
        return parse(stream, new Metadata());
    }
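The contract in the Javadoc above means the caller closes only the returned Reader, not the original stream. The sketch below illustrates that ownership pattern with JDK classes only; it is not Tika's implementation. An InputStreamReader stands in for the Reader that parse(InputStream) returns (closing it also closes the wrapped stream), and StreamOwnership, TrackingStream, and readerClosesStream are hypothetical names introduced for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class StreamOwnership {

    /** Hypothetical helper: an in-memory stream that records whether it was closed. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Reads one character through a reader that owns the stream and reports
     * whether closing the reader also closed the stream.
     */
    static boolean readerClosesStream(TrackingStream stream) throws IOException {
        // Stand-in (assumption) for tika.parse(stream): the reader takes ownership.
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            reader.read();
        }
        // No explicit stream.close() here; closing the reader was enough.
        return stream.closed;
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("stream closed by reader: " + readerClosesStream(stream));
    }
}
```

With this pattern, wrapping the returned Reader in try-with-resources is sufficient; a second try-with-resources around the input stream is harmless but redundant.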

Code example source: apache/tika

    // tika and writer are fields of the enclosing class (not shown in this excerpt).
    public void indexDocument(File file) throws Exception {
        try (Reader fulltext = tika.parse(file)) {
            Document document = new Document();
            // The file name is stored in the index; the full text is indexed from the reader.
            document.add(new TextField("filename", file.getName(), Store.YES));
            document.add(new TextField("fulltext", fulltext));
            writer.addDocument(document);
        }
    }

Code example source: apache/tika

    /**
     * Parses the file at the given path and returns the extracted text content.
     * <p>
     * Metadata information extracted from the document is returned in
     * the supplied metadata instance.
     *
     * @param path the path of the file to be parsed
     * @param metadata where document's metadata will be populated
     * @return extracted text content
     * @throws IOException if the file can not be read or parsed
     */
    public Reader parse(Path path, Metadata metadata) throws IOException {
        InputStream stream = TikaInputStream.get(path, metadata);
        return parse(stream, metadata);
    }

Code example source: apache/tika

    /**
     * Parses the given file and returns the extracted text content.
     * <p>
     * Metadata information extracted from the document is returned in
     * the supplied metadata instance.
     *
     * @param file the file to be parsed
     * @param metadata where document's metadata will be populated
     * @return extracted text content
     * @throws IOException if the file can not be read or parsed
     * @see #parse(Path)
     */
    public Reader parse(File file, Metadata metadata) throws IOException {
        @SuppressWarnings("deprecation")
        InputStream stream = TikaInputStream.get(file, metadata);
        return parse(stream, metadata);
    }

Code example source: apache/tika

    /**
     * Parses the resource at the given URL and returns the extracted
     * text content.
     *
     * @param url the URL of the resource to be parsed
     * @return extracted text content
     * @throws IOException if the resource can not be read or parsed
     */
    public Reader parse(URL url) throws IOException {
        Metadata metadata = new Metadata();
        InputStream stream = TikaInputStream.get(url, metadata);
        return parse(stream, metadata);
    }

Code example source: apache/tika

    // tika and writer are fields of the enclosing class (not shown in this excerpt).
    public void indexContentSpecificMet(File file) throws Exception {
        Metadata met = new Metadata();
        try (InputStream is = new FileInputStream(file)) {
            // Only the metadata is used here; the returned reader is ignored.
            tika.parse(is, met);
            Document document = new Document();
            for (String key : met.names()) {
                String[] values = met.getValues(key);
                for (String val : values) {
                    document.add(new TextField(key, val, Store.YES));
                }
                // As in the source, this runs once per metadata name.
                writer.addDocument(document);
            }
        }
    }
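In the example above, writer.addDocument(document) sits inside the loop over metadata names, so the document is submitted once per name. If one index entry per file is the goal, the call can move after the loop. The following JDK-only sketch shows that loop shape under stated assumptions: a Map<String, String[]> stands in for Tika's Metadata (its names() and getValues(key) calls), a list of "key=value" strings stands in for the Lucene Document, and the names MetadataFlatten and flatten are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetadataFlatten {

    /**
     * Collects every value of every metadata name into one list of fields.
     * The list is complete before it is returned, which corresponds to
     * calling addDocument once, after the outer loop.
     */
    static List<String> flatten(Map<String, String[]> metadata) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : metadata.entrySet()) {
            for (String value : entry.getValue()) {
                // One field per value, so multi-valued names are kept in full.
                fields.add(entry.getKey() + "=" + value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in (assumption) for a populated org.apache.tika.metadata.Metadata.
        Map<String, String[]> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", new String[] {"text/plain"});
        metadata.put("subject", new String[] {"File", "Indexing"});

        List<String> document = flatten(metadata);
        System.out.println(document.size() + " fields: " + document);
    }
}
```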

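Code example source: apache/tika

    // Depending on the content type, either read the stream directly as UTF-8
    // text or delegate to secondaryParser.parse().
    .equals(metadata.get(Metadata.CONTENT_TYPE))
            ? new InputStreamReader(inputStream, StandardCharsets.UTF_8)
            : secondaryParser.parse(inputStream);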

Code example source: apache/tika

        reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
    } else {
        reader = secondaryParser.parse(inputStream);

Code example source: apache/tika

    private Metadata getMetadata(String name) throws TikaException, IOException, SAXException {
        URL url = this.getClass().getResource("/org/apache/tika/config/" + name);
        assertNotNull("couldn't find: " + name, url);
        // The configuration file is both the Tika config and the document being parsed.
        TikaConfig tikaConfig = new TikaConfig(url);
        Tika tika = new Tika(tikaConfig);
        Metadata metadata = new Metadata();
        tika.parse(url.openStream(), metadata);
        return metadata;
    }

Code example source: apache/tika

    // tika and writer are fields of the enclosing class (not shown in this excerpt).
    public void indexWithDublinCore(File file) throws Exception {
        Metadata met = new Metadata();
        // Pre-populate descriptive metadata before parsing.
        met.add(TikaCoreProperties.CREATOR, "Manning");
        met.add(TikaCoreProperties.CREATOR, "Tika in Action");
        met.set(TikaCoreProperties.CREATED, new Date());
        met.set(TikaCoreProperties.FORMAT, tika.detect(file));
        met.set(DublinCore.SOURCE, file.toURI().toURL().toString());
        met.add(TikaCoreProperties.SUBJECT, "File");
        met.add(TikaCoreProperties.SUBJECT, "Indexing");
        met.add(TikaCoreProperties.SUBJECT, "Metadata");
        met.set(Property.externalClosedChoise(TikaCoreProperties.RIGHTS.getName(), "public",
                "private"), "public");
        try (InputStream is = new FileInputStream(file)) {
            tika.parse(is, met);
            Document document = new Document();
            for (String key : met.names()) {
                String[] values = met.getValues(key);
                for (String val : values) {
                    document.add(new TextField(key, val, Store.YES));
                }
                writer.addDocument(document);
            }
        }
    }

Code example source: apache/tika

    @Test
    public void testInitializableParser() throws Exception {
        URL configFileUrl = getClass().getClassLoader().getResource(TIKA_CFG_FILE);
        assert configFileUrl != null;
        TikaConfig config = new TikaConfig(configFileUrl);
        Tika tika = new Tika(config);
        Metadata md = new Metadata();
        // Parse an in-memory byte array; the parser under test writes its result into md.
        tika.parse(TikaInputStream.get("someString".getBytes(StandardCharsets.ISO_8859_1)), md);
        assertEquals("5", md.get(DummyInitializableParser.SUM_FIELD));
    }

The parse(Path), parse(File), and parse(URL) methods shown above appear unchanged in the org.apache.tika/tika-core and com.github.lafa.tikaNoExternal/tika-core artifacts.
