C: Why are my line counts different?

Posted by oxf4rvwz on 2022-12-17 in Other

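I wrote the following programs in different programming languages to count the number of lines in a file, and found that the output differs depending on the program. Oddly, some of the programs agree with each other. I tested with a 6 GB UTF-8 XML file that has about 146 million lines.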

# Python
# Output -> 146114085 lines
import time

lines = 0

start = time.perf_counter()

with open('file_path') as myfile:
    for line in myfile:
        lines += 1

print("{} lines".format(lines))

end = time.perf_counter()

elapsed = end - start

print(f'Elapsed time: {elapsed:.3f} seconds')
// Java
// Output -> 146114085 lines (just as with python)

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        try {
            long startTime = System.currentTimeMillis();
            int BUFFER_SIZE = 1024*1024;
            String filePath = "file_path";
            FileReader file = new FileReader(filePath);
            BufferedReader reader = new BufferedReader(file, BUFFER_SIZE);
            long lines = reader.lines().count();
            reader.close();
            System.out.println("The number of lines is " + lines);
            long elapsedTime = System.currentTimeMillis() - startTime;
            System.out.println("Duration in seconds: " + elapsedTime/1000);
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
// Rust
// Output -> 146113746 lines
use std::fs::File;
use std::io::{BufRead, BufReader, Error};
use std::time::Instant;

fn main() {
    let file_path = "file_path";
    let buffer_size = 1024*1024;
    let start = Instant::now();
    if let Err(err) = read_file(buffer_size, file_path) {
        println!("{}", err);
    }
    let duration = start.elapsed();
    println!("The function took {} seconds to execute", duration.as_secs());
}

fn read_file(buffer_size: usize, file_path: &str) -> Result<(), Error> {
    let file = File::open(file_path)?;
    let reader = BufReader::with_capacity(buffer_size, file);
    let lines = reader.lines().fold(0, |sum, _| sum + 1);
    println!("Number of lines {}", lines);
    Ok(())
}
// C
// Output -> 146113745 lines (one line less than rust output)
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    // start time
    clock_t start = clock();

    // File path
    const char* file_path = "file_path";

    // Open the file for reading
    FILE *fp = fopen(file_path, "r");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    // Allocate a buffer to hold the data
    const size_t BUFFER_SIZE = 1024*1024;
    char *buffer = malloc(BUFFER_SIZE);

    // Declare the number of lines variable
    unsigned int lines = 0;

    // Read the data in chunks until fread returns 0 (end of file or error)
    size_t bytes_read;
    while ((bytes_read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
        // Count the newline characters in this chunk
        for (size_t i = 0; i < bytes_read; i++) {
            if (buffer[i] == '\n') {
                lines++;
            }
        }
    }

    printf("The number of lines %u\n", lines);

    // Clean up
    free(buffer);
    fclose(fp);

    // End
    clock_t end = clock();

    // Calculate the elapsed time in seconds
    // Cast before dividing so the fractional part is not truncated
    double elapsed = (double) (end - start) / CLOCKS_PER_SEC;

    printf("Elapsed time: %f seconds", elapsed);

    return 0;
}

Finally, the wc command:

# Output -> 146113745 lines (same as C)
wc -l file_path

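I think the correct answer is Rust's: it is one more than wc/C, and the extra line is the last one, which has no newline before the end of the file. The cases that confuse me are Java and Python.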
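The off-by-one between wc/C and Rust can be reproduced on a tiny in-memory sample. This is only an illustrative sketch: the helper names count_newline_bytes and count_lines_lf are hypothetical and the sample data is made up. The first helper mimics what wc -l and the C loop do (count '\n' bytes); the second also counts a final line that has no trailing newline, which is how Rust's lines() behaves.

```python
def count_newline_bytes(data: bytes) -> int:
    # Count LF bytes only, like wc -l and the C loop in the question.
    return data.count(b"\n")


def count_lines_lf(data: bytes) -> int:
    # Count LF-terminated lines, plus one more if the data ends
    # with a non-empty line that has no trailing LF.
    count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        count += 1
    return count


# Made-up sample: three lines, the last one without a trailing newline.
sample = b"<a>\n<b/>\n</a>"

print(count_newline_bytes(sample))  # 2
print(count_lines_lf(sample))       # 3
```

If the file ended with a newline, both helpers would return the same number, so a difference of exactly one is consistent with an unterminated last line.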

Answer #1 (46scxncf)

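My definition of a line is .*?\\n|.+, which works at https://regexr.com/. For some reason, the file-reading implementations I used in Python and Java interpret the '\r' character as a line break. This does not happen in the Rust implementation, nor in wc, and clearly not in the implementation I wrote in C (where the check is explicit). However, if I change the condition (buffer[i] == '\n') to ((buffer[i] == '\n') || (buffer[i] == '\r')), I get the same value as in Python and Java minus 1.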
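The Python and Java behavior is documented rather than accidental: Python's open() in text mode defaults to newline=None (universal newlines), so '\r', '\n' and '\r\n' each end a line, and Java's BufferedReader also accepts any of the three as a line terminator. A minimal sketch of the effect, using made-up in-memory data in place of the 6 GB file and a hypothetical helper count_lines:

```python
import io


def count_lines(data: bytes, newline):
    # Wrap the bytes in a text stream, the same kind of object that
    # open() returns in text mode, and count the lines it yields.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return sum(1 for _ in stream)


# Made-up sample containing one lone '\r' and no trailing newline.
sample = b"one\rtwo\nthree\nfour"

print(count_lines(sample, None))  # 4: universal newlines, '\r' ends a line
print(count_lines(sample, "\n"))  # 3: only '\n' ends a line
```

Assuming the real file contains lone '\r' characters as described above, passing newline='\n' to open() in the Python program should make its count agree with Rust's.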
