Apache Tika detects the wrong MIME type for a CSV file

ulmd4ohb, posted on 2023-01-10 in Apache

I created a .csv file with Excel and wrote the following code using Apache Tika:

public static boolean checkThatMimeTypeIsCsv(InputStream inputStream) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(inputStream);
    AutoDetectParser parser = new AutoDetectParser();
    Detector detector = parser.getDetector();
    Metadata md = new Metadata();
    MediaType mediaType = detector.detect(bis, md);
    return "text/csv".equals(mediaType.toString());
}

public static void main(String[] args) throws IOException {
    System.out.println(checkThatMimeTypeIsCsv(new FileInputStream("Data.csv")));
}

But it returns false.
Is Tika really that bad, or am I missing something?

Answer #1 (6pp0gazn)

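Try this. Unlike the code in the question, it passes the file name to the detector through the metadata, so the .csv extension can be used as a hint: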

public static String checkThatMimeTypeIsCsv(String fileName) throws Exception {
    File sourceFile = new File(fileName);
    DefaultDetector detector = new DefaultDetector();
    Metadata metadata = new Metadata();
    // Pass the file name as a hint so the detector can take the .csv extension into account.
    // Newer Tika versions (such as the 2.x API used in the next answer) expose this key
    // as TikaCoreProperties.RESOURCE_NAME_KEY.
    metadata.set(Metadata.RESOURCE_NAME_KEY, sourceFile.getName());
    // try-with-resources closes the stream once detection is done.
    try (TikaInputStream stream = TikaInputStream.get(sourceFile.toPath())) {
        String fileType = detector.detect(stream, metadata).toString();
        System.out.println(fileType);
        return fileType;
    }
}
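
The same idea can be sketched with the Tika facade class, keeping the boolean return type from the question. This is only a minimal sketch, assuming tika-core is on the classpath; the class name CsvMimeCheck, the method name isCsv and the in-memory sample data are made up for illustration. It relies on Tika.detect(InputStream, String), which takes the file name as a hint alongside the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;

// Hypothetical helper class, not part of Tika or of the original question.
public class CsvMimeCheck {

    // Returns true when Tika reports text/csv for the given content and file name.
    // The file name is only a hint; Tika still inspects the leading bytes of the stream.
    public static boolean isCsv(InputStream inputStream, String fileName) throws IOException {
        Tika tika = new Tika();
        String mimeType = tika.detect(inputStream, fileName);
        return "text/csv".equals(mimeType);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content standing in for the Excel export.
        byte[] sample = "id,name\n1,Alice\n2,Bob\n".getBytes(StandardCharsets.UTF_8);

        // With the file name supplied as a hint.
        System.out.println(isCsv(new ByteArrayInputStream(sample), "Data.csv"));

        // Without a name, the detector only sees the bytes, as in the question.
        System.out.println(new Tika().detect(new ByteArrayInputStream(sample)));
    }
}
```

The second call mirrors the question's setup: with no file name available, the detector has only the raw text to go on, which is the likely reason the original check came back false.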

Answer #2 (wkyowqbh)

Here is an example of how to do this with Apache Tika 2.6.0 (the current version at the time of writing):

// Read a CSV file.
File file = new File("src/test/resources/testcsv/entities.csv");

// Prepare Tika data for detection; the file name serves as a hint.
Metadata metadata = new Metadata();
metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, file.getName());

// The detector expects a stream that supports mark/reset, hence the BufferedInputStream.
try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
    String detectedMimeType = MimeTypes.getDefaultMimeTypes().detect(is, metadata).toString();
    assertEquals("text/csv", detectedMimeType);
}

For a file that is not really a CSV but has a faked extension:

// Read a file that is not a CSV. I downloaded https://upload.wikimedia.org/wikipedia/commons/7/74/Apache_Tika_Logo.svg
// and renamed it to a '.csv' extension for the test.
File file = new File("src/test/resources/testcsv/Apache_Tika_Logo.csv");

// Prepare Tika data for detection; the file name serves as a hint.
Metadata metadata = new Metadata();
metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, file.getName());

// The detector expects a stream that supports mark/reset, hence the BufferedInputStream.
try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
    String detectedMimeType = MimeTypes.getDefaultMimeTypes().detect(is, metadata).toString();
    assertNotEquals("text/csv", detectedMimeType);
}

For this sample file, the value of the detectedMimeType variable is image/svg+xml.
