C program does not find substrings when searching a file created with the echo command

chhkpiq4, posted 2023-03-01 in Other

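I'm writing a C program that should count the occurrences of a list of substrings in a given file. When I test it with a file that I created and filled with text by hand, it works and counts each substring correctly. But when I run it on a file created with the echo command, it does not seem to find any instance of the substrings, even though I can see them in the file when I open it in a text editor.
I have checked the program's logic and I believe it is correct, but I don't know why it fails on files created with echo.
Here is a simplified version of the program: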

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define BUFFER_SIZE 1024
int num_substrings = 0;
int use_systemcall = 0; 

void search_file(char *filename, char *substring) {
    // Open the file with the given filename in read mode
    FILE *file = fopen(filename, "r");
    
    // Check if the file was successfully opened
    if (file == NULL) {
        // Print an error message and exit the program with an error code
        fprintf(stderr, "Error: could not open file '%s'\n", filename);
        exit(1);
    }
    
    int count = 0;
    char buffer[BUFFER_SIZE];
    char *line;
    size_t len = 0;
    ssize_t read;
    
    // Read the file line by line until the end
    while ((read = getline(&line, &len, file)) != -1) {
        // Skip the last line if it is empty
        if (read == 1 && line[0] == '\n') {
            continue;
        }

        // Strip any newline characters from the end of the line
        if (line[read - 1] == '\n') {
            line[read - 1] = '\0';
            read--;
        }
    
        // Find the first occurrence of the given substring in the current line
        char *match = strstr(line, substring);
    
        // While there are still occurrences of the substring in the current line
        while (match != NULL) {
            // Increment the counter and find the next occurrence of the substring
            count++;
            match = strstr(match + 1, substring);
        }
    }
    
    // Close the file
    fclose(file);
    
    // Print the number of occurrences of the substring found in the file
    printf("Found %d occurrences of substring '%s' in file '%s'\n",
           count, substring, filename);
}

int main(int argc, char *argv[]) {
    // Get the filename from the first command-line argument
    char *filename = argv[1];
    
    // Initialize an array to store the substrings and a counter for the number of substrings
    char substrings[10][100];
    int num_substrings = 0;
    
    // Loop through the remaining command-line arguments (starting from the second one)
    for (int i = 2; i < argc; i++) {
        // Copy the current argument (substring) into the substrings array
        strcpy(substrings[num_substrings], argv[i]);
        
        // Increment the counter for the number of substrings
        num_substrings++;
    }
    
    // Ask the user if they want to use a system call
    printf("Do you want to use system call? (y/n): ");

    char answer[10];
    fgets(answer, 10, stdin);
    
    // Check if the user answered yes (y or Y) and set the use_systemcall variable accordingly
    int use_systemcall = 0;
    if (answer[0] == 'y' || answer[0] == 'Y') {
        use_systemcall = 1;
    }

    printf("Filename: %s\n", filename);
    printf("Substrings: ");
    for (int i = 0; i < num_substrings; i++) {
        printf("%s ", substrings[i]);
    }
    printf("\n");
    
    // Open the file for reading
    FILE *file = fopen(filename, "rb");
    
    if (file == NULL) {
        printf("Error: Cannot open file %s\n", filename);
        return 1;
    }
    
    // Initialize a buffer to read the file in blocks of 100 characters
    char buffer[101];
    
    // Loop through each substring and search for it in the file
    for (int i = 0; i < num_substrings; i++) {
        // Reset the file pointer to the beginning of the file
        fseek(file, 0, SEEK_SET);
        
        // Initialize a counter for the number of occurrences of the substring
        int count = 0;
        
        // Loop through the file in blocks of 100 characters
        while (fread(buffer, sizeof(char), 100, file) > 0) {
            // Add a null terminator at the end of the buffer
            buffer[100] = '\0';
            
            // Search for the substring in the buffer
            char *result = strstr(buffer, substrings[i]);
            
            // If the substring is found, increment the count
            while (result != NULL) {
                count++;
                
                // Move the result pointer to the next character after the match
                result++;
                
                // Search for the substring again starting from the result pointer
                result = strstr(result, substrings[i]);
            }
        }
        
        // Print the number of occurrences of the substring
        printf("'%s' appears %d times in the file.\n", substrings[i], count);
    }
    return 0;
}

Commands:

echo "hello world" > foo.txt        # create the file
./substring_search foo.txt world    # search for substrings

Output:

'world' appears 0 times in the file.

Can anyone help me work out what might be causing this and how to fix it?

bkhjykvo #1

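The code has several problems:

Incomplete block read

If the file is, say, 50 bytes long, then buffer[100] = '\0' does not make buffer[] a proper *string*: the terminator lands after 50 bytes of leftover or uninitialized data. Better to use the length returned by fread(). I suspect this is OP's key problem.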
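A minimal sketch of that fix, assuming the same 100-byte block size as the question: the buffer is terminated at the length fread() actually returned. The helper name count_in_blocks is hypothetical, and the sketch still misses matches that straddle two blocks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count occurrences of needle inside each block
   read from fp. Matches that span two blocks are NOT detected here. */
long count_in_blocks(FILE *fp, const char *needle)
{
    char buffer[101];
    size_t n;
    long count = 0;

    if (needle[0] == '\0')
        return 0;                       /* an empty needle matches nothing useful */

    while ((n = fread(buffer, 1, sizeof buffer - 1, fp)) > 0) {
        buffer[n] = '\0';               /* terminate at the bytes actually read */
        for (const char *p = strstr(buffer, needle); p != NULL;
             p = strstr(p + 1, needle))
            count++;
    }
    return count;
}
```

Called as count_in_blocks(file, substrings[i]) after fseek(file, 0, SEEK_SET), it would stand in for the inner loop of main.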

Substring spanning a block boundary

If part of the substring is in one block and the rest is in the next, strstr(buffer, substrings[i]); does not detect it.
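One possible way to handle this, sketched under the assumption that the file is text with no embedded null characters: carry the last strlen(needle) - 1 bytes of each block over to the front of the next one. The carried tail is shorter than the needle, so no match is counted twice. The names count_across_blocks and BLOCK are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK 100   /* read size, same as in the question */

/* Hypothetical helper: count occurrences of needle in fp, including
   matches that straddle two consecutive blocks. */
long count_across_blocks(FILE *fp, const char *needle)
{
    size_t len = strlen(needle);
    if (len == 0 || len > BLOCK)
        return 0;                        /* keep the sketch simple */

    char buffer[2 * BLOCK + 1];
    size_t keep = 0;                     /* bytes carried over from the last block */
    size_t n;
    long count = 0;

    while ((n = fread(buffer + keep, 1, BLOCK, fp)) > 0) {
        size_t total = keep + n;
        buffer[total] = '\0';
        for (const char *p = strstr(buffer, needle); p != NULL;
             p = strstr(p + 1, needle))
            count++;

        /* Keep at most len - 1 trailing bytes: too short to hold a
           complete match, long enough to complete a split one. */
        keep = total < len - 1 ? total : len - 1;
        memmove(buffer, buffer + total - keep, keep);
    }
    return count;
}
```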

Out-of-bounds access

char substrings[10][100];

When num_substrings >= 10, substrings[num_substrings] is out of bounds.
When the source string is 100 or more characters long, strcpy(substrings[num_substrings], argv[i]); overflows the destination.
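A sketch of how both checks might look, keeping the question's 10 x 100 array. The helper collect_substrings and the macros MAX_SUBSTRINGS and MAX_LENGTH are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SUBSTRINGS 10
#define MAX_LENGTH 100

/* Hypothetical helper: copy argv[2..argc-1] into out.
   Returns the number of substrings stored, or -1 if there are too
   many arguments or one of them does not fit. */
int collect_substrings(int argc, char *argv[],
                       char out[MAX_SUBSTRINGS][MAX_LENGTH])
{
    int n = 0;
    for (int i = 2; i < argc; i++) {
        if (n >= MAX_SUBSTRINGS) {
            fprintf(stderr, "Error: at most %d substrings\n", MAX_SUBSTRINGS);
            return -1;
        }
        if (strlen(argv[i]) >= MAX_LENGTH) {
            fprintf(stderr, "Error: substring %d is too long\n", i - 1);
            return -1;
        }
        strcpy(out[n++], argv[i]);   /* safe: the length was checked above */
    }
    return n;
}
```

Since the program never modifies the substrings, another option is to skip the copy and search for argv[i] directly.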

Null characters in the file

If the source file contains '\0', strstr(buffer, substrings[i]) stops before it has scanned the whole of buffer[].
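If such files need to be supported, one option is a length-based search that compares raw bytes instead of relying on a terminator. The sketch below uses a plain memcmp() loop; count_bytes is a hypothetical name, and the simple quadratic scan is chosen for clarity rather than speed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences (overlapping ones included)
   of needle in the first hay_len bytes of hay. Embedded '\0' bytes
   are treated like any other byte. */
size_t count_bytes(const char *hay, size_t hay_len,
                   const char *needle, size_t needle_len)
{
    size_t count = 0;

    if (needle_len == 0 || needle_len > hay_len)
        return 0;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            count++;
    }
    return count;
}
```

Here hay_len would be the value returned by fread(), not the result of strlen().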

Check argc first

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "Error: Insufficient arguments\n");
    return EXIT_FAILURE;
  }
  
  // OK now to save the argument for later fopen() use. 
  char* filename = argv[1];
  ... 
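  FILE *file = fopen(filename, "rb");

"\n" versus "\r\n"

The manually created file and the echo-created file may have different line endings. I do not think this affects OP here, but be aware of it when debugging.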
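For line-based reading such as getline(), a small helper can make the search indifferent to the line-ending style. This is only a sketch, and strip_line_ending is a hypothetical name: it removes any trailing '\n' and '\r' bytes and returns the new length.

```c
#include <stddef.h>

/* Hypothetical helper: remove trailing "\n" or "\r\n" from line, where
   len is the current length in bytes. Returns the new length. */
size_t strip_line_ending(char *line, size_t len)
{
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}
```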

0lvr5msh #2

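The *simplified* version of the program does not produce the posted output: the posted output has no system-call prompt and no filename line. For a foo.txt file created with echo, I get this output: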

Do you want to use system call? (y/n): y
Filename: foo.txt
Substrings: world
'world' appears 1 times in the file.

The program has some problems, but none that should prevent the expected output:

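  • There is no sanity check on the number of command-line arguments or on the length of the individual strings.
  • Unused global variables are shadowed by local variables of the same name.
  • The function search_file is unused; it reads the file with a different method and should also produce the expected output.
  • The main function reads the file 100 bytes at a time, so it does not count matches that straddle block boundaries.
  • A partial read at the end of the file is not null-terminated where the data ends, so matches left in the buffer from the end of the previous block are counted twice.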

You should simplify the posted code and make sure it still exhibits the problem.
