org.apache.tika.Tika.detect()方法的使用及代码示例

x33g5p2x  于2022-01-29 转载在 其他  
字(10.0k)|赞(0)|评价(0)|浏览(682)

本文整理了Java中org.apache.tika.Tika.detect()方法的一些代码示例,展示了Tika.detect()的具体用法。这些代码示例主要来源于Github/Stackoverflow/Maven等平台,是从一些精选项目中提取出来的代码,具有较强的参考意义,能在一定程度帮忙到你。Tika.detect()方法的具体详情如下:
包路径:org.apache.tika.Tika
类名称:Tika
方法名:detect

Tika.detect介绍

[英]Detects the media type of the given file. The type detection is based on the document content and a potential known file extension.

Use the #detect(String) method when you want to detect the type of the document without actually accessing the file.
[中]检测给定文件的媒体类型。类型检测基于文档内容和潜在的已知文件扩展名。
如果希望在不实际访问文件的情况下检测文档类型,请使用#detect(String)方法。

代码示例

代码示例来源:origin: stackoverflow.com

Tika tika = new Tika();
File file = ...
String mimeType = tika.detect(file);

代码示例来源:origin: BroadleafCommerce/BroadleafCommerce

protected void getMimeType(InputStream inputStream, String fileName, StaticAsset newAsset) {
  Tika tika = new Tika();
  String tikaMimeType = tika.detect(fileName);
  if (tikaMimeType == null) {
    try {
      tikaMimeType = tika.detect(inputStream);
    } catch (IOException e) {
      //if tika can't resolve, don't throw exception
    }
  }
  if (tikaMimeType != null) {
    newAsset.setMimeType(tikaMimeType);
  }
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of a document with the given file name.
 * The type detection is based on known file name extensions.
 * <p>
 * The given name can also be a URL or a full file path. In such cases
 * only the file name part of the string is used for type detection. 
 *
 * @param name the file name of the document
 * @return detected media type
 */
public String detect(String name) {
  try {
    return detect((InputStream) null, name);
  } catch (IOException e) {
    throw new IllegalStateException("Unexpected IOException", e);
  }
}

代码示例来源:origin: apache/tika

@Override
public String probeContentType(Path path) throws IOException {
  // Try to detect based on the file name only for efficiency
  String fileNameDetect = tika.detect(path.toString());
  if(!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
    return fileNameDetect;
  }
  
  // Then check the file content if necessary
  String fileContentDetect = tika.detect(path);
  if(!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
    return fileContentDetect;
  }
  
  // Specification says to return null if we could not 
  // conclusively determine the file type
  return null;
}

代码示例来源:origin: apache/tika

public static void main(String[] args) throws Exception {
    Tika tika = new Tika();

    for (String file : args) {
      String type = tika.detect(new File(file));
      System.out.println(file + ": " + type);
    }
  }
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of the given document. The type detection is
 * based on the content of the given document stream.
 * <p>
 * If the document stream supports the
 * {@link InputStream#markSupported() mark feature}, then the stream is
 * marked and reset to the original position before this method returns.
 * Only a limited number of bytes are read from the stream.
 * <p>
 * The given document stream is <em>not</em> closed by this method.
 *
 * @param stream the document stream
 * @return detected media type
 * @throws IOException if the stream can not be read
 */
public String detect(InputStream stream) throws IOException {
  return detect(stream, new Metadata());
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of the given document. The type detection is
 * based on the first few bytes of a document.
 * <p>
 * For best results at least a few kilobytes of the document data
 * are needed. See also the other detect() methods for better
 * alternatives when you have more than just the document prefix
 * available for type detection.
 *
 * @since Apache Tika 0.9
 * @param prefix first few bytes of the document
 * @return detected media type
 */
public String detect(byte[] prefix) {
  try {
    try (InputStream stream = TikaInputStream.get(prefix)) {
      return detect(stream);
    }
  } catch (IOException e) {
    throw new IllegalStateException("Unexpected IOException", e);
  }
}

代码示例来源:origin: apache/tika

public static String customMimeInfo() throws Exception {
  String path = "file:///path/to/prescription-type.xml";
  MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
  Tika tika = new Tika(typeDatabase);
  String type = tika.detect("/path/to/prescription.xpd");
  return type;
}

代码示例来源:origin: apache/tika

public static String detectWithCustomConfig(String name) throws Exception {
  String config = "/org/apache/tika/mime/tika-mimetypes.xml";
  Tika tika = new Tika(MimeTypesFactory.create(config));
  return tika.detect(name);
}

代码示例来源:origin: apache/tika

/**
 * Find the Mime Content Type of a document stored in the given file.
 * Returns application/octet-stream if no better match is found.
 *
 * @deprecated Use {@link Tika#detect(File)} instead
 * @param file file to analyze
 * @return the Mime Content Type of the specified document
 * @throws MimeTypeException if the type can't be detected
 * @throws IOException if the file can't be read
 */
public MimeType getMimeType(File file)
    throws MimeTypeException, IOException {
  return forName(new Tika(this).detect(file));
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of the file at the given path. The type
 * detection is based on the document content and a potential known
 * file extension.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the file.
 *
 * @param path the path of the file
 * @return detected media type
 * @throws IOException if the file can not be read
 */
public String detect(Path path) throws IOException {
  Metadata metadata = new Metadata();
  try (InputStream stream = TikaInputStream.get(path, metadata)) {
    return detect(stream, metadata);
  }
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of the resource at the given URL. The type
 * detection is based on the document content and a potential known
 * file extension included in the URL.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the URL.
 *
 * @param url the URL of the resource
 * @return detected media type
 * @throws IOException if the resource can not be read
 */
public String detect(URL url) throws IOException {
  Metadata metadata = new Metadata();
  try (InputStream stream = TikaInputStream.get(url, metadata)) {
    return detect(stream, metadata);
  }
}

代码示例来源:origin: apache/tika

/**
 * Detects the media type of the given file. The type detection is
 * based on the document content and a potential known file extension.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the file.
 *
 * @param file the file
 * @return detected media type
 * @throws IOException if the file can not be read
 * @see #detect(Path)
 */
public String detect(File file) throws IOException {
  Metadata metadata = new Metadata();
  try (@SuppressWarnings("deprecation")
  InputStream stream = TikaInputStream.get(file, metadata)) {
    return detect(stream, metadata);
  }
}

代码示例来源:origin: apache/tika

private static void benchmark(File file) throws Exception {
  if (file.isHidden()) {
    // ignore
  } else if (file.isFile()) {
    try (InputStream input = new FileInputStream(file)) {
      byte[] content = IOUtils.toByteArray(input);
      String type =
          tika.detect(new ByteArrayInputStream(content));
      long start = System.currentTimeMillis();
      for (int i = 0; i < 1000; i++) {
        tika.detect(new ByteArrayInputStream(content));
      }
      System.out.printf(
          Locale.ROOT,
          "%6dns per Tika.detect(%s) = %s%n",
          System.currentTimeMillis() - start, file, type);
    }
  } else if (file.isDirectory()) {
    for (File child : file.listFiles()) {
      benchmark(child);
    }
  }
}

代码示例来源:origin: apache/tika

public static String customCompositeDetector() throws Exception {
  String path = "file:///path/to/prescription-type.xml";
  MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
  Tika tika = new Tika(new CompositeDetector(typeDatabase,
      new EncryptedPrescriptionDetector()));
  String type = tika.detect("/path/to/tmp/prescription.xpd");
  return type;
}

代码示例来源:origin: apache/tika

public static String detectWithCustomDetector(String name) throws Exception {
    String config = "/org/apache/tika/mime/tika-mimetypes.xml";
    Detector detector = MimeTypesFactory.create(config);

    Detector custom = new Detector() {
      private static final long serialVersionUID = -5420638839201540749L;

      public MediaType detect(InputStream input, Metadata metadata) {
        String type = metadata.get("my-custom-type-override");
        if (type != null) {
          return MediaType.parse(type);
        } else {
          return MediaType.OCTET_STREAM;
        }
      }
    };

    Tika tika = new Tika(new CompositeDetector(custom, detector));
    return tika.detect(name);
  }
}

代码示例来源:origin: apache/tika

@Test
public void testByteOrderMark() throws Exception {
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_16LE)),
      new Metadata()));
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_16BE)),
      new Metadata()));
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_8)),
      new Metadata()));
}

代码示例来源:origin: apache/tika

private void testStream(String expected, String urlOrFileName,
    InputStream in) throws IOException {
  assertNotNull("Test stream: [" + urlOrFileName + "] is null!", in);
  if (!in.markSupported()) {
    in = new java.io.BufferedInputStream(in);
  }
  try {
    Metadata metadata = new Metadata();
    // String mime = this.proDetector.detect(in, metadata).toString();
    String mime = tika.detect(in, metadata).toString();
    assertEquals(
        urlOrFileName + " is not properly detected: detected.",
        expected, mime);
    // Add resource name and test again
    metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, urlOrFileName);
    // mime = this.proDetector.detect(in, metadata).toString();
    mime = tika.detect(in, metadata).toString();
    assertEquals(urlOrFileName
        + " is not properly detected after adding resource name.",
        expected, mime);
  } finally {
    in.close();
  }
}

代码示例来源:origin: apache/tika

/**
 * Test for things like javascript files whose content is enclosed in XML
 * comment delimiters, but that aren't actually XML.
 * 
 * @see <a
 *      href="https://issues.apache.org/jira/browse/TIKA-426">TIKA-426</a>
 */
@Test
public void testNotXML() throws IOException {
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("<!-- test -->".getBytes(UTF_8)),
      new Metadata()));
}

代码示例来源:origin: apache/tika

/**
 * Test for type detection of empty documents.
 * 
 * @see <a
 *      href="https://issues.apache.org/jira/browse/TIKA-483">TIKA-483</a>
 */
@Test
public void testEmptyDocument() throws IOException {
  assertEquals(MediaType.OCTET_STREAM.toString(), tika.detect(
      new ByteArrayInputStream(new byte[0]), new Metadata()));
  Metadata namehint = new Metadata();
  namehint.set(TikaCoreProperties.RESOURCE_NAME_KEY, "test.txt");
  assertEquals(MediaType.TEXT_PLAIN.toString(),
      tika.detect(new ByteArrayInputStream(new byte[0]), namehint));
  Metadata typehint = new Metadata();
  typehint.set(Metadata.CONTENT_TYPE, "text/plain");
  assertEquals(MediaType.TEXT_PLAIN.toString(),
      tika.detect(new ByteArrayInputStream(new byte[0]), typehint));
}

相关文章