How to split a CSV file into multiple files by size

Posted by oaxa6hgo on 2022-12-06 in Other


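In a Java project I generate a large CSV file (about 500 MB), and I need to split it into multiple files of at most 10 MB each. I found many similar posts, but none of them answers my question: in all of them, the Java code splits the original file into pieces of exactly 10 MB and (obviously) truncates records. I need every record to stay complete and intact; no record should be truncated. If copying a record from the original CSV file into an output file would push that file over 10 MB, I should skip the copy, close that file, create a new one, and copy the record into the new file. Is this possible? Can someone help me? Thanks!
I tried this code: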

File f = new File("/home/luca/Desktop/test/images.csv");
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(f));
FileOutputStream out;
String name = f.getName();
int partCounter = 1;
int sizeOfFiles = 10 * 1024 * 1024; // 10 MB
byte[] buffer = new byte[sizeOfFiles];
int tmp = 0;
while ((tmp = bis.read(buffer)) > 0) {
 File newFile=new File("/home/luca/Desktop/test/"+name+"."+String.format("%03d", partCounter++));
 newFile.createNewFile();
 out = new FileOutputStream(newFile);
 out.write(buffer,0,tmp);
 out.close();
}

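But obviously it does not work. This code splits the source file into n files of 10 MB each and truncates records. In my case the CSV file has 16 columns, and with the procedure above I get, for example, a last record with only 5 columns filled in; the rest is cut off.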

Solution: here is the code I wrote.

FileReader fileReader = new FileReader("/home/luca/Desktop/test/images.csv");
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line = "";
long fileSize = 0; // bytes written to the current output file
BufferedWriter fos = new BufferedWriter(new FileWriter("/home/luca/Desktop/test/images_" + new Date().getTime() + ".csv", true));
while ((line = bufferedReader.readLine()) != null) {
    // Size of this record on disk, including the newline written after it
    int lineSize = (line + "\n").getBytes().length;
    if (fileSize + lineSize > 9.5 * 1024 * 1024) {
        // The record would not fit: close the current file and start a new one
        fos.flush();
        fos.close();
        fos = new BufferedWriter(new FileWriter("/home/luca/Desktop/test/images_" + new Date().getTime() + ".csv", true));
        fileSize = 0;
    }
    fos.write(line + "\n");
    fileSize += lineSize;
}
fos.flush();
fos.close();
bufferedReader.close();

This code reads a CSV file and splits it into n files of at most 10 MB each; every CSV line is either copied in full or not copied at all.


Source: https://stackoverflow.com/questions/19635844/how-to-split-csv-file-into-multiple-files-by-size

3 Answers


#1 5lwkijsr

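It is quite simple in principle.
Create a 10 MB buffer (byte[]) and read as many bytes as possible from the source file into it. Then search backward from the end of the buffer for a line feed. The part from the start of the buffer up to that line feed becomes a new file. Keep the leftover bytes and copy them to the start of the buffer (offset 0). Repeat until the source file is exhausted.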
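
The buffer approach described above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name BufferSplitter, the part_%03d.csv naming scheme, and the decision to throw when a single record exceeds the limit are all assumptions. It also assumes an ASCII-compatible encoding such as UTF-8, and it treats every line feed as a record boundary, so quoted CSV fields containing embedded newlines are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferSplitter {

    /** Splits source into parts of at most maxBytes bytes, cutting only after a line feed. Returns the part count. */
    static int split(Path source, Path outDir, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int filled = 0;     // bytes currently held in the buffer
        int partCount = 0;
        try (InputStream in = Files.newInputStream(source)) {
            while (true) {
                // Top the buffer up until it is full or the input ends.
                int n;
                while (filled < buffer.length
                        && (n = in.read(buffer, filled, buffer.length - filled)) != -1) {
                    filled += n;
                }
                if (filled == 0) {
                    break; // nothing left to write
                }
                int end = filled; // at end of input, everything left fits in one part
                if (filled == buffer.length) {
                    // Buffer is full: search backward for the last line feed.
                    int i = filled - 1;
                    while (i >= 0 && buffer[i] != '\n') {
                        i--;
                    }
                    if (i < 0) {
                        throw new IOException("A single record is larger than " + maxBytes + " bytes");
                    }
                    end = i + 1;
                }
                // Hypothetical naming scheme: part_001.csv, part_002.csv, ...
                Path part = outDir.resolve(String.format("part_%03d.csv", ++partCount));
                try (OutputStream out = Files.newOutputStream(part)) {
                    out.write(buffer, 0, end);
                }
                // Move the leftover bytes (the incomplete record) to offset 0.
                System.arraycopy(buffer, end, buffer, 0, filled - end);
                filled -= end;
            }
        }
        return partCount;
    }
}
```

Because it works on raw bytes, this variant never decodes the text and copies the original line endings unchanged.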


#2 mf98qq94

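Use this: split -a 3 -b 100m -d filename.tar.gz newfilename
Note that -b cuts at exact byte offsets, so it can split a record in two; GNU split's -C (--line-bytes) option keeps whole lines together.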


#3 p1iqtdky

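This splits any line-based file (including CSV) into files that are within (line length - 1) bytes of the specified size. If asked to, it repeats the header row in every output file (e.g. for a CSV with a header row):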

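// Note: processDocument(File) is assumed to be a separate single-file overload defined elsewhere.
protected void processDocument(File inFile, long maxFileSize, boolean containsHeaderRow) throws Exception {
    if (maxFileSize > 0 && inFile.length() > maxFileSize) {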
        FileReader fileReader = new FileReader(inFile);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        try {
            byte[] headerRow = new byte[0];
            if (containsHeaderRow) {
                try {
                    String headerLine = bufferedReader.readLine();
                    if (headerLine != null) {
                        headerRow = (headerLine + "\n").getBytes();
                    }
                } catch (IOException e1) {
                    throw new Exception("Failed to read header row from input file.", e1);
                }
            }
            long headerRowByteCount = headerRow.length;
            if (maxFileSize < headerRowByteCount) {
                // Would just write header repeatedly so throw error
                throw new Exception("Split file size is less than the header row size.");
            }
            int fileCount = 0;
            boolean notEof = true;
            while (notEof) {
                fileCount += 1;
                long fileSize = 0;
                // create a new file with same path but appended count
                String newFilename = inFile.getAbsolutePath() + "-" + fileCount;
                File outFile = new File(newFilename);
                BufferedOutputStream fos = null;
                try {
                    try {
                        fos = new BufferedOutputStream(new FileOutputStream(outFile));
                    } catch (IOException e) {
                        throw new Exception("Failed to initialise output file for file splitting on file " + fileCount, e);
                    }
                    if (containsHeaderRow) {
                        try {
                            fos.write(headerRow);
                        } catch (IOException e) {
                            throw new Exception("Failed to write header row to output file for file splitting on file " + fileCount, e);
                        }
                        fileSize += headerRowByteCount;
                    }
                    while (fileSize < maxFileSize) {
                        String line = null;
                        try {
                            line = bufferedReader.readLine();
                        } catch (IOException e) {
                            throw new Exception("Failed to read input file for file splitting on file " + fileCount, e);
                        }
                        if (line == null) {
                            notEof = false;
                            break;
                        }
                        byte[] lineBytes = (line + "\n").getBytes();
                        fos.write(lineBytes);
                        fileSize += lineBytes.length;
                    }
                    fos.flush();
                    fos.close();
                    processDocument(outFile); 
                } catch (IOException e) {
                    throw new Exception("Failed to write output file for file splitting on file number " + fileCount, e);
                } finally {
                    try {
                        if (fos != null) {
                            fos.close();
                        }
                    } catch (IOException e) {
                    }
                }
            }
        } finally {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                throw new Exception("Failed to close reader for input file.", e);
            }
        }

    } else {
        processDocument(inFile); 
    }
}
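
Since the method above checks the size only after writing a line, each output file can overshoot the limit by up to one line. If the limit must be strict, a check-before-write variant might look like the sketch below. The class name HeaderAwareSplitter, the part_%03d.csv naming scheme, and the UTF-8 encoding are assumptions for illustration, and the first line of the input is always treated as the header. Line endings are normalized to "\n", and quoted fields with embedded newlines are not handled.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class HeaderAwareSplitter {

    /** Writes parts of at most maxBytes bytes, each starting with the header row; returns the part paths. */
    static List<Path> split(Path source, Path outDir, long maxBytes) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return parts; // empty input: nothing to split
            }
            long headerBytes = header.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            BufferedWriter writer = null;
            long written = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    long lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1;
                    if (headerBytes + lineBytes > maxBytes) {
                        throw new IOException("Header plus one record exceeds " + maxBytes + " bytes");
                    }
                    // Check BEFORE writing, so a part never grows past maxBytes.
                    if (writer == null || written + lineBytes > maxBytes) {
                        if (writer != null) {
                            writer.close();
                        }
                        // Hypothetical naming scheme: part_001.csv, part_002.csv, ...
                        Path part = outDir.resolve(String.format("part_%03d.csv", parts.size() + 1));
                        parts.add(part);
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        writer.write(header);
                        writer.write('\n');
                        written = headerBytes;
                    }
                    writer.write(line);
                    writer.write('\n');
                    written += lineBytes;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return parts;
    }
}
```

An input that contains only a header row produces no output files in this sketch; whether that is the desired behavior depends on the consumer of the parts.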
