I get an error when I try to read a CSV file and store the information in a struct in C

xfyts7mz  posted on 2022-12-17  in Other

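If the program does not read 4 values, it should display an error message. My file does have 4 values per line, yet even if I change the value being tested to 2, 3, or 5, I get the same output.
The output of this program is: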

File format incorrect.

However, when I change read == 4 and read != 4 to read == 1 and read != 1, my output is:

8 records read.
Iskandar  0.000000 0.000000 0.000000
Kholmatov,100,100,100 0 -0.001162 0.000000 0.000000
George  0.000000 0.000000 20.625134
Washington,90,50,100  -0.001162 0.000000 0.000000
Dennis  0.000000 0.000000 0.000000
Ritchie,90,0,10  0.000000 0.000000 0.000000
Bill  0.000000 0.000000 0.000000
Gates,60,50,77  0.000000 0.000000 -0.001162

My data.csv file:

Iskandar Kholmatov,100,100,100
George Washington,90,50,100
Dennis Ritchie,90,0,10
Bill Gates,60,50,77

My program:

#include <stdio.h>

// struct to hold the name of a student
struct name
{
  char first[20]; // string to hold the first name
  char last[20]; // string to hold the last name
};

// struct to hold the grades of a student
struct student
{
  struct name Name; // name struct from above
  float grades[3]; // array to hold 3 grades
  float average; // float to hold the average of 3 grades above
};

int main(void)
{
  // file pointer variable for accessing the file
  FILE *file;

  // attempt to open file.txt in read mode to read the file contents
  file = fopen("data.csv", "r");

  // if the file failed to open, exit with an error message and status
  if (file == NULL)
  {
    printf("Error opening file.\n");
    return 1;
  }

  struct student students[5];

  int read = 0;

  // records will keep track of the number of Student records read from the file
  int records = 0;

  // read all records from the file and store them into the students array
  do
  {
    read = fscanf(file, "%s,%s,%f,%f,%f\n",
           students[records].Name.first,
           students[records].Name.last,
           &students[records].grades[0],
           &students[records].grades[1],
           &students[records].grades[2]);

    // if fscanf read 4 values from the file then we've successfully read
    // in another record
    if (read == 4)
      records++;

    // The only time that fscanf should NOT read 4 values from the file is
    // when we've reached the end of the file, so if fscanf did not read in
    // exactly 4 values and we're not at the end of the file, there has been
    // an error (likely due to an incorrect file format) and so we exit with
    // an error message and status.
    if (read != 4 && !feof(file))
    {
      printf("File format incorrect.\n");
      return 1;
    }

    // if there was an error reading from the file exit with an error message
    // and status
    if (ferror(file))
    {
      printf("Error reading file.\n");
      return 1;
    }

  } while (!feof(file));

  // close the file as we are done working with it
  fclose(file);

  // print out the number of records read
  printf("\n%d records read.\n\n", records);

  // print out each of the records that was read
  for (int i = 0; i < records; i++)
    printf("%s %s %f %f %f\n",
           students[i].Name.first,
           students[i].Name.last,
           students[i].grades[0],
           students[i].grades[1],
           students[i].grades[2]);
  printf("\n");

  return 0;
}

The output I expect is just the information in the .csv file.

6vl6ewon


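Reading CSV (comma-separated values) files is hard in the general case, where fields can be enclosed in double quotes, can then contain commas and doubled double quotes to embed a double quote, and where a single field can extend over multiple lines.
With your data, you don't have to worry about those special cases. Instead, you've imposed an inconsistency of your own, because you split the name field in two based on the space that separates the names. As long as there is no name like "Alice Betty Clarke" in the data, you can still do that.
You are trying to use: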

read = fscanf(file, "%s,%s,%f,%f,%f\n",
           students[records].Name.first,
           students[records].Name.last,
           &students[records].grades[0],
           &students[records].grades[1],
           &students[records].grades[2]);

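This alone has a number of problems:
1. You're trying to read names separated by a comma, but they're separated by a space.
2. You put a newline (white space) at the end of the format string.
3. The second %s reads up to the next white space, which means it gobbles up the commas and the numbers.
4. You don't protect against buffer overflow from names that are too long.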
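To see how problems 1 and 3 play out on the question's data, the sketch below applies the question's conversions to a single line using sscanf(), which parses input the same way fscanf() does. The helper name fields_with_original_format is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many conversions the question's
 * format string completes on one line of text. */
static int fields_with_original_format(const char *line)
{
    char first[64];
    char last[64];
    float g[3];

    assert(line != NULL);
    /* Same conversions as the question. Widths are added only to keep
     * this demo safe, and the trailing "\n" is dropped because it does
     * not affect the returned count. */
    return sscanf(line, "%63s,%63s,%f,%f,%f",
                  first, last, &g[0], &g[1], &g[2]);
}
```

For the line `Iskandar Kholmatov,100,100,100` this returns 1: the first %s stops at the space, and the literal comma in the format then fails to match that space. The next call would start at `Kholmatov,100,100,100`, which the first %s swallows whole, so it returns 1 again. That is consistent with the 8 "records" the question reports for a 4-line file when it tests read == 1.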
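The fixes for these problems are:
1. This one is easy to fix: replace the first comma in the format string with a blank (or omit it altogether: "%s%s" reads two words separated by white space).
2. See "What is the effect of trailing white space in a scanf() format string?" When you read from a file, as in your code, this is not as serious as when you read what a user types at a terminal, but when the input comes from a terminal, trailing white space in a format string is a disastrous UI/UX mistake. The fix is trivial: omit the \n from the format string. The next call will skip leading white space, including the newline left over from the previous call (for conversions such as %s that skip leading white space).
3. Use a negated scan set: %[^,]. For simplicity and consistency, you could use one for the first field too.
4. Limit the length of the input: "%19[^, ] %19[^, ],%f,%f,%f". Note that three conversion specifiers do not skip leading white space: %c, %[…] (scan sets), and %n. When using scan sets, you must include the blank between the conversion specifications.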
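A minimal sketch of the corrected format string applied to one line follows. The function name parse_student_line and its parameter layout are assumptions made for illustration; only the format string comes from the list above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "First Last,g1,g2,g3" from one line.
 * first and last must each have room for 20 bytes (19 characters
 * plus the terminating null byte).
 * Returns the number of successful conversions (5 on success). */
static int parse_student_line(const char *line, char *first, char *last,
                              float grades[3])
{
    assert(line != NULL && first != NULL && last != NULL && grades != NULL);
    /* %19[^, ] stops at a comma or a space and stores at most 19
     * characters; the blank in the format skips the space between
     * the two names. */
    return sscanf(line, "%19[^, ] %19[^, ],%f,%f,%f",
                  first, last, &grades[0], &grades[1], &grades[2]);
}
```

With `George Washington,90,50,100` the function returns 5. With a three-word name such as `Betty Alice Clarke,94,95,97` it returns 2, because after the second name the format expects a comma and finds a space instead.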
You have experimented with various values in:

if (read == 4)
      records++;

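Since you try to read 5 values, you should test for 5. If you don't get 5, the cause may be EOF (return value EOF), some sort of encoding error (unlikely, but the return value could also be EOF), or a format error in the data (return value in the range 0..4). You should exit the loop when you get EOF. If the data has a format error and you want to continue, you should read and ignore data up to the next newline: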

int c;
/* discard the rest of the current line of the data file */
while ((c = getc(file)) != EOF && c != '\n')
    ;

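It might be wiser to abandon ship immediately. Alternatively, count the bad records and read the rest of the file so that you can report on more of them, perhaps abandoning further processing once EOF is finally detected.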
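The count-and-continue strategy can be sketched with fscanf() as follows. The names Record, skip_to_eol, and read_records are hypothetical. The leading blank in the format string is an addition, made so that the first scan set (which does not skip white space by itself) starts at the next non-blank character.

```c
#include <assert.h>
#include <stdio.h>

struct Record
{
    char first[20];
    char last[20];
    float grades[3];
};

/* Discards the rest of the current line; returns the last character
 * read, which is either '\n' or EOF. */
static int skip_to_eol(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return c;
}

/* Hypothetical helper: stores up to max good records in out, counts
 * malformed lines in *bad, and returns the number of good records. */
static int read_records(FILE *fp, struct Record *out, int max, int *bad)
{
    int good = 0;

    assert(fp != NULL && out != NULL && bad != NULL);
    *bad = 0;
    while (good < max)
    {
        struct Record *r = &out[good];
        int n = fscanf(fp, " %19[^, ] %19[^, ],%f,%f,%f",
                       r->first, r->last,
                       &r->grades[0], &r->grades[1], &r->grades[2]);
        if (n == EOF)
            break;          /* end of file, or a read error */
        if (n == 5)
            good++;         /* a complete record */
        else
            (*bad)++;       /* 0..4 conversions: format error */
        /* Drop whatever is left on this line before the next record. */
        if (skip_to_eol(fp) == EOF)
            break;
    }
    return good;
}
```

This sketch silently ignores trailing junk after a complete record. Because fscanf() treats a newline like any other white space, a line that is too short can also bleed into the following line, which is one reason reading whole lines first gives better diagnostics.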
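You should make sure you don't try to read more records than the array can hold.
You can improve the error reporting by reading whole lines with fgets() or POSIX getline() and then passing each line to sscanf(). Note that if you do this, you may want to check for junk after the third number, perhaps using the %n conversion specification to identify where the conversion stopped, and making sure there are no non-white-space characters after the number. The scanf() family of functions does not count %n conversions in the return value.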
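Note that error messages should be written to stderr, not stdout. It is also generally a bad idea to call a file-opening function (such as fopen() or open()) with a string literal as the file name. You must check that the open succeeded and, if it did not, report the error (on standard error, stderr), and you should include the file name in the error message. To avoid repetition, pass a variable that points to the file name to the open function; you can then use the same variable when formatting the error message. If you have no better mechanism, you can use perror() to report the problem. For example: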

/* needs <stdio.h> for fopen() and perror(), <stdlib.h> for exit() */
const char *filename = "data.csv";
FILE *fp = fopen(filename, "r");
if (fp == NULL)
{
    perror(filename);
    exit(EXIT_FAILURE);
}

Putting all these changes and refinements together, you might end up with code like this:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct Name
{
    char first[20];
    char last[20];
};

struct Student
{
    struct Name name;
    float grades[3];
    float average;
};

static int trailing_white_space_only(const char *buffer)
{
    const unsigned char *data = (const unsigned char *)buffer;
    while (*data != '\0' && isspace(*data))
        data++;
    return *data == '\0';
}

int main(void)
{
    const char *filename = "data.csv";
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
    {
        fprintf(stderr, "Error opening file '%s' for reading\n", filename);
        return 1;
    }

    enum { MAX_STUDENTS = 5 };
    struct Student students[MAX_STUDENTS];

    int n_fields = 0;
    int records = 0;
    int lineno = 0;
    int fail = 0;

    char buffer[2048];
    while (records < MAX_STUDENTS && fgets(buffer, sizeof(buffer), fp) != NULL)
    {
        buffer[strcspn(buffer, "\n")] = '\0';
        lineno++;
        int offset = 0;
        n_fields = sscanf(buffer, "%19[^, ] %19[^, ],%f,%f,%f%n",
                          students[records].name.first,
                          students[records].name.last,
                          &students[records].grades[0],
                          &students[records].grades[1],
                          &students[records].grades[2],
                          &offset);

        if (n_fields == 5)
        {
            if (trailing_white_space_only(&buffer[offset]))
                records++;
            else
            {
                fprintf(stderr, "Trailing junk on line %d\n    [%s]\n",
                        lineno, buffer);
                fail++;
            }
        }
        else
        {
            fail++;
            fprintf(stderr, "Format error on line %d (field %d)\n    [%s]\n",
                    lineno, n_fields + 1, buffer);
        }
    }

    fclose(fp);

    if (fail == 0)
        printf("\n%d records read successfully.\n\n", records);
    else
        printf("\n%d records read successfully (and %d invalid records "
               "were discarded).\n\n", records, fail);

    for (int i = 0; i < records; i++)
    {
        char name[sizeof(struct Name)];
        snprintf(name, sizeof(name), "%.19s %.19s",
                 students[i].name.first, students[i].name.last);
        printf("%-39s %6.2f %6.2f %6.2f\n", name,
               students[i].grades[0],
               students[i].grades[1],
               students[i].grades[2]);
    }
    printf("\n");

    return 0;
}

With the data file data.csv from the question, the output is:

4 records read successfully.

Iskandar Kholmatov                      100.00 100.00 100.00
George Washington                        90.00  50.00 100.00
Dennis Ritchie                           90.00   0.00  10.00
Bill Gates                               60.00  50.00  77.00

Now consider this variant data file, which has bad data on lines 3, 5, and 6:

Iskandar Kholmatov,100,100,100
George Washington,90,50,100
Garbage Disposal,read,me,a,riddle
Dennis Ritchie,90,0,10
Steve Jobs,60,70,80,
Betty Alice Clarke,94,95,97
Bill Gates,60,50,77

The output is:

Format error on line 3 (field 3)
    [Garbage Disposal,read,me,a,riddle]
Trailing junk on line 5
    [Steve Jobs,60,70,80,]
Format error on line 6 (field 3)
    [Betty Alice Clarke,94,95,97]

4 records read successfully (and 3 invalid records were discarded).

Iskandar Kholmatov                      100.00 100.00 100.00
George Washington                        90.00  50.00 100.00
Dennis Ritchie                           90.00   0.00  10.00
Bill Gates                               60.00  50.00  77.00

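There are many ways to improve the program further. For example, if the file contains more records than the array can hold, the program could read and diagnose the excess records (while reporting the error). Or the code could be revised to allocate the array of students dynamically and grow it as needed.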
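One way the dynamic-allocation idea might look is sketched below. The StudentList type and the student_list_append and student_list_free functions are hypothetical names, and doubling the capacity is just one common growth policy, not something the program above prescribes.

```c
#include <assert.h>
#include <stdlib.h>

struct Name
{
    char first[20];
    char last[20];
};

struct Student
{
    struct Name name;
    float grades[3];
    float average;
};

/* Hypothetical growable container for Student records. */
struct StudentList
{
    struct Student *items;
    size_t count;
    size_t capacity;
};

/* Appends a copy of *s, doubling the capacity when the array is full.
 * Returns 1 on success, or 0 if memory could not be allocated (the
 * list is left unchanged in that case). */
static int student_list_append(struct StudentList *list,
                               const struct Student *s)
{
    assert(list != NULL && s != NULL);
    if (list->count == list->capacity)
    {
        size_t new_cap = (list->capacity == 0) ? 4 : list->capacity * 2;
        /* realloc(NULL, n) behaves like malloc(n), so an empty list works. */
        struct Student *grown = realloc(list->items,
                                        new_cap * sizeof(*grown));
        if (grown == NULL)
            return 0;
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count++] = *s;
    return 1;
}

/* Releases the storage and resets the list to the empty state. */
static void student_list_free(struct StudentList *list)
{
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

A list would start out as `struct StudentList list = { NULL, 0, 0 };`. The reading loop would then parse each line into a temporary struct Student and call student_list_append() instead of indexing a fixed-size array, so the MAX_STUDENTS limit is no longer needed.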
