Usage and Code Examples of org.apache.tika.Tika.detect()

x33g5p2x, reposted on 2022-01-29 under Other

This article collects code examples of the Java method org.apache.tika.Tika.detect() to show how Tika.detect() is used in practice. The examples are drawn mainly from projects on GitHub, Stack Overflow, and Maven. Method details:
Package: org.apache.tika
Class: Tika
Method: detect

Tika.detect overview

Detects the media type of the given file. The type detection is based on the document content and a potential known file extension.

Use the #detect(String) method when you want to detect the type of the document without actually accessing the file.

Code examples

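Source: stackoverflow.com

    import java.io.File;
    import org.apache.tika.Tika;

    Tika tika = new Tika();
    File file = ...; // the file to inspect (elided in the original answer)
    String mimeType = tika.detect(file); // throws IOException if the file can't be read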

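Source: BroadleafCommerce/BroadleafCommerce

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.tika.Tika;
    import org.apache.tika.mime.MediaType;

    protected void getMimeType(InputStream inputStream, String fileName, StaticAsset newAsset) {
        Tika tika = new Tika();
        // Name-based detection is cheap, but it returns "application/octet-stream"
        // (not null) when the file name gives no clue.
        String tikaMimeType = tika.detect(fileName);
        if (tikaMimeType == null || MediaType.OCTET_STREAM.toString().equals(tikaMimeType)) {
            try {
                tikaMimeType = tika.detect(inputStream);
            } catch (IOException e) {
                // if Tika can't resolve the type, don't throw an exception
            }
        }
        if (tikaMimeType != null) {
            newAsset.setMimeType(tikaMimeType);
        }
    }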
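`Tika.detect(String)` answers `application/octet-stream`, not `null`, when a file name gives no clue, so a name-then-content fallback is best keyed on that value. The sketch below isolates just that control flow using JDK types only; the two `Supplier` arguments are hypothetical stand-ins for `tika.detect(fileName)` and `tika.detect(inputStream)` (real code must also handle the `IOException` from the stream call).

```java
import java.util.function.Supplier;

class MimeFallback {
    static final String OCTET_STREAM = "application/octet-stream";

    // Try the cheap name-based guess first; only run the (possibly expensive)
    // content-based guess when the name was inconclusive.
    // byName / byContent are hypothetical stand-ins for the two Tika calls.
    static String detect(Supplier<String> byName, Supplier<String> byContent) {
        String type = byName.get();
        if (type == null || OCTET_STREAM.equals(type)) {
            String fromContent = byContent.get();
            if (fromContent != null) {
                type = fromContent;
            }
        }
        return type;
    }
}
```

Because the content supplier is only invoked on the fallback path, the stream is never read when the extension already settles the question.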

Source: apache/tika

    /**
     * Detects the media type of a document with the given file name.
     * The type detection is based on known file name extensions.
     * <p>
     * The given name can also be a URL or a full file path. In such cases
     * only the file name part of the string is used for type detection.
     *
     * @param name the file name of the document
     * @return detected media type
     */
    public String detect(String name) {
        try {
            return detect((InputStream) null, name);
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected IOException", e);
        }
    }
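As the Javadoc notes, `detect(String)` uses only the file-name part of a path or URL. The following is a rough JDK-only sketch of that idea, not Tika's glob matching: it strips any URL query or fragment, keeps the last path segment, and looks the extension up in a small, hypothetical table.

```java
import java.util.Locale;
import java.util.Map;

class NameOnlyDetector {
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical extension table; Tika's real glob database is far larger.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "html", "text/html");

    static String detect(String name) {
        String s = name;
        // Drop a URL query string and fragment, if any.
        int query = s.indexOf('?');
        if (query >= 0) {
            s = s.substring(0, query);
        }
        int fragment = s.indexOf('#');
        if (fragment >= 0) {
            s = s.substring(0, fragment);
        }
        // Keep only the last path segment ('/' or '\' separators).
        int slash = Math.max(s.lastIndexOf('/'), s.lastIndexOf('\\'));
        String fileName = s.substring(slash + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return OCTET_STREAM; // no extension: nothing to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, OCTET_STREAM);
    }
}
```

Looking only at the last segment matters: in a path like `/srv/v1.2/README`, the dot belongs to a directory name, not to the file.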

Source: apache/tika

    @Override
    public String probeContentType(Path path) throws IOException {
        // Try to detect based on the file name only, for efficiency
        String fileNameDetect = tika.detect(path.toString());
        if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileNameDetect;
        }
        // Then check the file content if necessary
        String fileContentDetect = tika.detect(path);
        if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileContentDetect;
        }
        // The specification says to return null if we could not
        // conclusively determine the file type
        return null;
    }
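`probeContentType` overrides the JDK's `java.nio.file.spi.FileTypeDetector`, whose contract is to return `null` when the type cannot be determined. A minimal provider following the same contract might look like this; the class name and extension rules are hypothetical placeholders. `Files.probeContentType` picks up installed providers through the service-loader mechanism, i.e. a `META-INF/services/java.nio.file.spi.FileTypeDetector` file naming the class.

```java
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Locale;

// Hypothetical provider; the extension rules are placeholders.
class SimpleFileTypeDetector extends FileTypeDetector {
    // Narrower than the inherited signature: this sketch never does I/O,
    // so it does not declare IOException.
    @Override
    public String probeContentType(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return null; // e.g. a root path: nothing to go on
        }
        String name = fileName.toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return "application/pdf";
        }
        if (name.endsWith(".txt")) {
            return "text/plain";
        }
        // FileTypeDetector contract: null means "not recognized".
        return null;
    }
}
```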

Source: apache/tika

    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        for (String file : args) {
            String type = tika.detect(new File(file));
            System.out.println(file + ": " + type);
        }
    }

Source: apache/tika

    /**
     * Detects the media type of the given document. The type detection is
     * based on the content of the given document stream.
     * <p>
     * If the document stream supports the
     * {@link InputStream#markSupported() mark feature}, then the stream is
     * marked and reset to the original position before this method returns.
     * Only a limited number of bytes are read from the stream.
     * <p>
     * The given document stream is <em>not</em> closed by this method.
     *
     * @param stream the document stream
     * @return detected media type
     * @throws IOException if the stream can not be read
     */
    public String detect(InputStream stream) throws IOException {
        return detect(stream, new Metadata());
    }
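The mark/reset behavior described above is what lets a caller detect the type and then still read the whole stream. Here is a JDK-only sketch of that "peek at the prefix, then rewind" step; `StreamPeek` and `peek` are hypothetical names, and the I/O exception is wrapped only to keep the example compact.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class StreamPeek {
    // Reads up to `limit` leading bytes, then rewinds with reset() so the
    // caller can still consume the stream from the start. Like
    // Tika.detect(InputStream), it does not close the stream.
    static byte[] peek(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        try {
            in.mark(limit);
            byte[] buf = new byte[limit];
            int n = in.readNBytes(buf, 0, limit); // stops early at end of stream
            in.reset();
            return Arrays.copyOf(buf, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the read limit to `mark()` matters: if more bytes than that were read before `reset()`, the mark could be invalidated.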

Source: apache/tika

    /**
     * Detects the media type of the given document. The type detection is
     * based on the first few bytes of a document.
     * <p>
     * For best results at least a few kilobytes of the document data
     * are needed. See also the other detect() methods for better
     * alternatives when you have more than just the document prefix
     * available for type detection.
     *
     * @since Apache Tika 0.9
     * @param prefix first few bytes of the document
     * @return detected media type
     */
    public String detect(byte[] prefix) {
        try {
            try (InputStream stream = TikaInputStream.get(prefix)) {
                return detect(stream);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected IOException", e);
        }
    }
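Prefix-based detection boils down to comparing the leading bytes against known "magic" signatures. The sketch below is a hypothetical, JDK-only illustration with three well-known signatures (PDF `%PDF-`, PNG `89 50 4E 47 0D 0A 1A 0A`, ZIP `50 4B 03 04`). Tika's own type database has far more rules, plus priorities and offsets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class MagicSniffer {
    // Hypothetical, tiny signature table, checked in insertion order.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", new byte[] {'%', 'P', 'D', 'F', '-'});
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04});
    }

    static String detect(byte[] prefix) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            byte[] sig = entry.getValue();
            // Compare only the first sig.length bytes of the prefix.
            if (prefix.length >= sig.length
                    && Arrays.equals(prefix, 0, sig.length, sig, 0, sig.length)) {
                return entry.getKey();
            }
        }
        return "application/octet-stream"; // no signature matched
    }
}
```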

Source: apache/tika

    public static String customMimeInfo() throws Exception {
        String path = "file:///path/to/prescription-type.xml";
        MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
        Tika tika = new Tika(typeDatabase);
        String type = tika.detect("/path/to/prescription.xpd");
        return type;
    }

Source: apache/tika

    public static String detectWithCustomConfig(String name) throws Exception {
        String config = "/org/apache/tika/mime/tika-mimetypes.xml";
        Tika tika = new Tika(MimeTypesFactory.create(config));
        return tika.detect(name);
    }

Source: apache/tika

    /**
     * Find the Mime Content Type of a document stored in the given file.
     * Returns application/octet-stream if no better match is found.
     *
     * @deprecated Use {@link Tika#detect(File)} instead
     * @param file file to analyze
     * @return the Mime Content Type of the specified document
     * @throws MimeTypeException if the type can't be detected
     * @throws IOException if the file can't be read
     */
    public MimeType getMimeType(File file)
            throws MimeTypeException, IOException {
        return forName(new Tika(this).detect(file));
    }

Source: apache/tika

    /**
     * Detects the media type of the file at the given path. The type
     * detection is based on the document content and a potential known
     * file extension.
     * <p>
     * Use the {@link #detect(String)} method when you want to detect the
     * type of the document without actually accessing the file.
     *
     * @param path the path of the file
     * @return detected media type
     * @throws IOException if the file can not be read
     */
    public String detect(Path path) throws IOException {
        Metadata metadata = new Metadata();
        try (InputStream stream = TikaInputStream.get(path, metadata)) {
            return detect(stream, metadata);
        }
    }

Source: apache/tika

    /**
     * Detects the media type of the resource at the given URL. The type
     * detection is based on the document content and a potential known
     * file extension included in the URL.
     * <p>
     * Use the {@link #detect(String)} method when you want to detect the
     * type of the document without actually accessing the URL.
     *
     * @param url the URL of the resource
     * @return detected media type
     * @throws IOException if the resource can not be read
     */
    public String detect(URL url) throws IOException {
        Metadata metadata = new Metadata();
        try (InputStream stream = TikaInputStream.get(url, metadata)) {
            return detect(stream, metadata);
        }
    }

Source: apache/tika

    /**
     * Detects the media type of the given file. The type detection is
     * based on the document content and a potential known file extension.
     * <p>
     * Use the {@link #detect(String)} method when you want to detect the
     * type of the document without actually accessing the file.
     *
     * @param file the file
     * @return detected media type
     * @throws IOException if the file can not be read
     * @see #detect(Path)
     */
    public String detect(File file) throws IOException {
        Metadata metadata = new Metadata();
        try (@SuppressWarnings("deprecation")
             InputStream stream = TikaInputStream.get(file, metadata)) {
            return detect(stream, metadata);
        }
    }

Source: apache/tika

    private static void benchmark(File file) throws Exception {
        if (file.isHidden()) {
            // ignore hidden files
        } else if (file.isFile()) {
            try (InputStream input = new FileInputStream(file)) {
                byte[] content = IOUtils.toByteArray(input);
                String type = tika.detect(new ByteArrayInputStream(content));
                int iterations = 1000;
                long start = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    tika.detect(new ByteArrayInputStream(content));
                }
                // Average elapsed nanoseconds per call (the label below says "ns")
                long nsPerCall = (System.nanoTime() - start) / iterations;
                System.out.printf(
                        Locale.ROOT,
                        "%6dns per Tika.detect(%s) = %s%n",
                        nsPerCall, file, type);
            }
        } else if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // listFiles() returns null on I/O error
                for (File child : children) {
                    benchmark(child);
                }
            }
        }
    }

Source: apache/tika

    public static String customCompositeDetector() throws Exception {
        String path = "file:///path/to/prescription-type.xml";
        MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
        Tika tika = new Tika(new CompositeDetector(typeDatabase,
                new EncryptedPrescriptionDetector()));
        String type = tika.detect("/path/to/tmp/prescription.xpd");
        return type;
    }

Source: apache/tika

    public static String detectWithCustomDetector(String name) throws Exception {
        String config = "/org/apache/tika/mime/tika-mimetypes.xml";
        Detector detector = MimeTypesFactory.create(config);
        Detector custom = new Detector() {
            private static final long serialVersionUID = -5420638839201540749L;

            @Override
            public MediaType detect(InputStream input, Metadata metadata) {
                String type = metadata.get("my-custom-type-override");
                if (type != null) {
                    return MediaType.parse(type);
                } else {
                    return MediaType.OCTET_STREAM;
                }
            }
        };
        Tika tika = new Tika(new CompositeDetector(custom, detector));
        return tika.detect(name);
    }
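`CompositeDetector` merges the answers of several detectors. In the Tika sources I am familiar with, it starts from `application/octet-stream` and adopts a detector's result only when the media-type registry reports it as a specialization of the current result, so a custom detector that answers octet-stream simply leaves the decision to the others. The sketch below models that rule with plain strings; the parent map is a hypothetical stand-in for Tika's `MediaTypeRegistry`, and the `Supplier`s stand in for real detectors.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class CompositeSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical type hierarchy (child -> parent).
    static final Map<String, String> PARENT = Map.of(
            "application/xml", "text/plain",
            "application/x-prescription", "application/xml");

    // octet-stream is the root; unlisted types hang directly off it.
    static String parentOf(String type) {
        if (type == null || OCTET_STREAM.equals(type)) {
            return null;
        }
        return PARENT.getOrDefault(type, OCTET_STREAM);
    }

    // True if `type` is a strict descendant of `ancestor`.
    static boolean isSpecializationOf(String type, String ancestor) {
        for (String p = parentOf(type); p != null; p = parentOf(p)) {
            if (p.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Start from octet-stream; keep a detector's answer only if it is
    // more specific than the answer so far.
    static String detect(List<Supplier<String>> detectors) {
        String type = OCTET_STREAM;
        for (Supplier<String> detector : detectors) {
            String detected = detector.get();
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }
}
```

Under this rule a later, more generic answer (e.g. `text/plain` after `application/xml`) never overrides a more specific one, regardless of detector order.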

Source: apache/tika

    @Test
    public void testByteOrderMark() throws Exception {
        assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
                new ByteArrayInputStream("\ufefftest".getBytes(UTF_16LE)),
                new Metadata()));
        assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
                new ByteArrayInputStream("\ufefftest".getBytes(UTF_16BE)),
                new Metadata()));
        assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
                new ByteArrayInputStream("\ufefftest".getBytes(UTF_8)),
                new Metadata()));
    }
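The test expects `text/plain` for a BOM-prefixed string in all three encodings. A byte-order mark is a strong text signal even in UTF-16, where half of the bytes of ASCII text are zero. The sketch below recognizes only the three BOMs used in the test. It is an illustration, not Tika's code, and it would misreport a UTF-32LE BOM (`FF FE 00 00`) as UTF-16LE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Returns the charset implied by a leading byte-order mark, or null if none.
    static Charset charsetFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;    // EF BB BF
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE; // FE FF
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE; // FF FE
        }
        return null;
    }
}
```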

Source: apache/tika

    private void testStream(String expected, String urlOrFileName,
                            InputStream in) throws IOException {
        assertNotNull("Test stream: [" + urlOrFileName + "] is null!", in);
        if (!in.markSupported()) {
            in = new java.io.BufferedInputStream(in);
        }
        try {
            Metadata metadata = new Metadata();
            // String mime = this.proDetector.detect(in, metadata).toString();
            String mime = tika.detect(in, metadata).toString();
            assertEquals(
                    urlOrFileName + " is not properly detected: detected.",
                    expected, mime);
            // Add resource name and test again
            metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, urlOrFileName);
            // mime = this.proDetector.detect(in, metadata).toString();
            mime = tika.detect(in, metadata).toString();
            assertEquals(urlOrFileName
                    + " is not properly detected after adding resource name.",
                    expected, mime);
        } finally {
            in.close();
        }
    }

Source: apache/tika

    /**
     * Test for things like javascript files whose content is enclosed in XML
     * comment delimiters, but that aren't actually XML.
     *
     * @see <a
     * href="https://issues.apache.org/jira/browse/TIKA-426">TIKA-426</a>
     */
    @Test
    public void testNotXML() throws IOException {
        assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
                new ByteArrayInputStream("<!-- test -->".getBytes(UTF_8)),
                new Metadata()));
    }
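TIKA-426 concerns content, such as JavaScript, that is wrapped in XML comment delimiters. One way to avoid the false positive is to skip leading whitespace and comments and then require an XML declaration or a start tag. The heuristic below is a hypothetical illustration of that idea, not Tika's actual XML detection.

```java
class XmlSniffer {
    // Hypothetical heuristic: skip whitespace and <!-- ... --> comments, then
    // require "<?xml" or a start tag. Comment delimiters alone are not enough.
    static boolean looksLikeXml(String text) {
        int i = 0;
        while (true) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (text.startsWith("<!--", i)) {
                int end = text.indexOf("-->", i + 4);
                if (end < 0) {
                    return false; // unterminated comment
                }
                i = end + 3;
            } else {
                break;
            }
        }
        if (text.startsWith("<?xml", i)) {
            return true;
        }
        // A start tag: '<' followed by a plausible name-start character.
        return i + 1 < text.length()
                && text.charAt(i) == '<'
                && (Character.isLetter(text.charAt(i + 1))
                    || text.charAt(i + 1) == '_'
                    || text.charAt(i + 1) == ':');
    }
}
```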

Source: apache/tika

    /**
     * Test for type detection of empty documents.
     *
     * @see <a
     * href="https://issues.apache.org/jira/browse/TIKA-483">TIKA-483</a>
     */
    @Test
    public void testEmptyDocument() throws IOException {
        assertEquals(MediaType.OCTET_STREAM.toString(), tika.detect(
                new ByteArrayInputStream(new byte[0]), new Metadata()));
        Metadata namehint = new Metadata();
        namehint.set(TikaCoreProperties.RESOURCE_NAME_KEY, "test.txt");
        assertEquals(MediaType.TEXT_PLAIN.toString(),
                tika.detect(new ByteArrayInputStream(new byte[0]), namehint));
        Metadata typehint = new Metadata();
        typehint.set(Metadata.CONTENT_TYPE, "text/plain");
        assertEquals(MediaType.TEXT_PLAIN.toString(),
                tika.detect(new ByteArrayInputStream(new byte[0]), typehint));
    }
