C++: parsing text and writing a CSV file

4ioopgfo posted 12 months ago in Other

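My friend sent me this, and I can't find a way to do it.

I have made countless attempts with AI, but none of them got anywhere. I would be very grateful if someone could help me. It has to be C++.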

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <unordered_map>
#include <regex>

// Function to split a string based on a delimiter
std::vector<std::string> splitString(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::stringstream ss(s);
    std::string token;
    while (std::getline(ss, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}

int main() {
    // Input and output file paths
    const std::string inputFile = "file_list.csv";
    const std::string outputFile = "output.csv";

    // Open the input file
    std::ifstream inFile(inputFile);
    if (!inFile.is_open()) {
        std::cerr << "Error opening file: " << inputFile << std::endl;
        return 1;
    }

    // Create a map to store information about each student's group and questions answered
    std::unordered_map<std::string, std::pair<std::string, std::vector<int>>> studentInfo;

    // Regular expression pattern for extracting information
    std::regex pattern(R"(([a-zA-Z]+)_?(\d+)_?(\d+)?.*)");

    // Read the input file line by line
    std::string line;
    while (std::getline(inFile, line)) {
        // Use regex to match the pattern
        std::smatch match;
        if (std::regex_match(line, match, pattern)) {
            // Extract matched groups
            std::string groupName = match[1];
            std::string studentID = match[2];
            std::string questionNumberStr = match[3];

            // Convert question number to integer
            int questionNumber = (questionNumberStr.empty()) ? -1 : std::stoi(questionNumberStr);

            // Update the map with the student's information
            if (!studentID.empty()) {
                if (studentInfo.find(studentID) == studentInfo.end()) {
                    // If the student is not in the map, add a new entry
                    studentInfo[studentID] = std::make_pair(groupName, std::vector<int>{questionNumber});
                } else {
                    // If the student is already in the map, update the existing entry
                    if (questionNumber != -1) {
                        studentInfo[studentID].second.push_back(questionNumber);
                    }
                }
            }
        }
    }

    // Close the input file
    inFile.close();

    // Open the output file for writing
    std::ofstream outFile(outputFile);
    if (!outFile.is_open()) {
        std::cerr << "Error opening file: " << outputFile << std::endl;
        return 1;
    }

    // Write the header to the output file
    outFile << "StudentList,GroupName,QuestionsAnswered" << std::endl;

    // Write the student information to the output file
    for (const auto& entry : studentInfo) {
        // Sort the list of questions answered by each student
        std::vector<int> questions = entry.second.second;
        std::sort(questions.begin(), questions.end());

        // Write the student information to the output file
        outFile << entry.first << "," << entry.second.first << ",";
        for (size_t i = 0; i < questions.size(); ++i) {
            outFile << questions[i];
            if (i < questions.size() - 1) {
                outFile << ",";
            }
        }
        outFile << std::endl;
    }

    // Close the output file
    outFile.close();

    std::cout << "Output file created successfully: " << outputFile << std::endl;

    return 0;
}

I can't find a way to strip the words qrub, grup, or group from the elements of the group column. The output has to be a csv file like the one in the example.
Part of file_list.csv:

d_3_2211011228.cpp
d_3_2211012211.cpp.txt
D_3_2211011054.cpp.txt
d_3_2211011096 .txt
d_question1_2111011034.txt
d_question2_2111011034.txt
d_question3_2111011034.txt
Group a_1_2211011032.cpp
Group a_2_2211011032.cpp
Group a_3_2211011032.cpp
group c_QUESTION 1_2211011024.txt

Answer #1 (qxgroojn)

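"I can't find a way to strip the words qrub, grup, or group from the elements of the group column."
One approach is to split each line into three parts, using the underscore (_) as the delimiter.
In the demo here I use a struct Fields whose members are all strings; they are the "raw" fields parsed from one line of input.
For convenience, I gave struct Fields a stream insertion operator using the hidden friends idiom.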

struct Fields
{
    std::string student_id;
    std::string group;
    std::string question;

    friend std::ostream& operator<< (std::ostream& ost, Fields const& fields)
    {
        ost << "student_id : " << std::quoted(fields.student_id)
            << "\ngroup      : " << std::quoted(fields.group)
            << "\nquestion   : " << std::quoted(fields.question)
            << "\n\n";
        return ost;
    }
};

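Given one line from the input file, the function parse_fields extracts the three fields (as strings) and places them in a Fields struct. The struct is passed in as a reference parameter.
If parsing succeeds, parse_fields returns true. If parsing fails, it writes an error message to std::cerr and returns false.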

bool parse_fields(std::string const& line, Fields& fields)
{
    std::stringstream sst{ line };
    if (!std::getline(sst, fields.group, '_'))
    {
        std::cerr
            << "`parse_fields` - Could not extract `group` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!std::getline(sst, fields.question, '_'))
    {
        std::cerr
            << "`parse_fields` - Could not extract `question` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!std::getline(sst, fields.student_id))  // no underscore here!
    {
        std::cerr
            << "`parse_fields` - Could not extract `student_id` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!sst.eof() && !(sst >> std::ws).eof())
    {
        std::string s;
        std::getline(sst, s);
        std::cerr
            << fields
            << "`parse_fields` - Expecting `sst.eof()`, "
            << "i.e., end-of-line, but more remains: : "
            << std::quoted(s) 
            << "\n\n";
        return false;
    }
    fields.student_id = trim_whitespace(fields.student_id);
    if (fields.student_id.empty())
    {
        std::cerr
            << "`parse_fields` - `student_id` field is empty: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    fields.group = trim_whitespace(fields.group);
    if (fields.group.empty())
    {
        std::cerr
            << "`parse_fields` - `group` field is empty: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    fields.question = trim_whitespace(fields.question);
    if (fields.question.empty())
    {
        std::cerr
            << "`parse_fields` - `question` field is empty: "
            << std::quoted(line)
            << '\n';
        return false;
    }
    return true;
}


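After the fields have been parsed, further processing can convert each one to its final form. group is probably the easiest: just convert its last character to lowercase and discard all the other characters.
The other fields take more work. For the question number, I would probably use a couple of the member functions of std::string, such as find_first_of and find_first_not_of, calling them with "0123456789" as the search-for argument.
Once the position of the question number is known, a call to the member function substr will extract it for you.
A similar technique using find_first_of and find_first_not_of will get you student_id.

I can't take all the fun

Of course, there is still plenty of work to do: you have to look up student_id and group in the map, then push question onto the back of the questions vector, and so on.
But I have to leave some of the fun for you!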

Demo program

Here is a short demo program.
I ran it on a truncated version of the data file from the Google link:

File Name
A_ 2_ 2211011100.cpp
A_1_ 2211011043.cpp
a_1_2111088001.cpp
group c_QUESTION 1_2211011024.txt
GroupA_Quesiton2_2211011034.cpp
grupc_question3_2111013335_cpp.txt


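The program includes a couple of functions from my toolbox: to_lower_in_place and trim_whitespace. They are fairly simple, as you can see in the source code. Beyond those, I added the function parse_header to skip the first line of the data file.
The function main opens the file and reads records in a loop, stopping at end of file.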

// main.cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

void to_lower_in_place(std::string& s) noexcept
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) {
            return std::tolower(c);
        }
    );
}

std::string trim_whitespace(std::string const& s)
{
    // Trim leading and trailing whitespace from string `s`.
    auto const first{ s.find_first_not_of(" \f\n\r\t\v") };
    if (first == std::string::npos)
        return {};
    auto const last{ s.find_last_not_of(" \f\n\r\t\v") };
    enum : std::string::size_type { one = 1u };
    return s.substr(first, (last - first + one));
}

struct Fields
{
    std::string student_id;
    std::string group;
    std::string question;

    friend std::ostream& operator<< (std::ostream& ost, Fields const& fields)
    {
        ost << "student_id : " << std::quoted(fields.student_id)
            << "\ngroup      : " << std::quoted(fields.group)
            << "\nquestion   : " << std::quoted(fields.question)
            << "\n\n";
        return ost;
    }
};

bool parse_header(std::istream& ist)
{
    std::string line;
    if (!std::getline(ist, line))
    {
        std::cerr << "parse_header - `std::getline` failed unexpectedly "
            "while parsing header. \n\n";
        return false;
    }
    std::string s{ trim_whitespace(line) };
    to_lower_in_place(s);
    if (s != "file name")
    {
        std::cerr 
            << "parse_header - Could not parse header: "
            << std::quoted(line)
            << " \n\n";
        return false;
    }
    return true;
}

bool parse_fields(std::string const& line, Fields& fields)
{
    std::stringstream sst{ line };
    if (!std::getline(sst, fields.group, '_'))
    {
        std::cerr
            << "`parse_fields` - Could not extract `group` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!std::getline(sst, fields.question, '_'))
    {
        std::cerr
            << "`parse_fields` - Could not extract `question` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!std::getline(sst, fields.student_id))  // no underscore here!
    {
        std::cerr
            << "`parse_fields` - Could not extract `student_id` from `line`: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    if (!sst.eof() && !(sst >> std::ws).eof())
    {
        std::string s;
        std::getline(sst, s);
        std::cerr
            << fields
            << "`parse_fields` - Expecting `sst.eof()`, "
            << "i.e., end-of-line, but more remains: : "
            << std::quoted(s) 
            << "\n\n";
        return false;
    }
    fields.student_id = trim_whitespace(fields.student_id);
    if (fields.student_id.empty())
    {
        std::cerr
            << "`parse_fields` - `student_id` field is empty: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    fields.group = trim_whitespace(fields.group);
    if (fields.group.empty())
    {
        std::cerr
            << "`parse_fields` - `group` field is empty: "
            << std::quoted(line)
            << "\n\n";
        return false;
    }
    fields.question = trim_whitespace(fields.question);
    if (fields.question.empty())
    {
        std::cerr
            << "`parse_fields` - `question` field is empty: "
            << std::quoted(line)
            << '\n';
        return false;
    }
    return true;
}

int main()
{
    enum : int { no_errors, open_error, header_error, parsing_error };
    int exit_code = no_errors;
    const std::string inputFile = "file_list.csv";
    std::ifstream ist{ inputFile };
    if (!ist.is_open())
    {
        std::cerr
            << "ERROR: Could not open `inputFile`: "
            << std::quoted(inputFile)
            << "\n\n";
        return open_error;
    }
    if (!parse_header(ist))
    {
        std::cerr
            << "ERROR: Could not parse header (i.e., first line) of `inputFile`: "
            << std::quoted(inputFile)
            << "\n\n";
        return header_error;
    }
    std::string line;
    while (std::getline(ist, line))
    {
        std::cout<< "line       : " << std::quoted(line) << '\n';
        Fields fields;
        if (parse_fields(line, fields))
            std::cout << fields;
        else
        {
            std::cerr << "Skipping invalid line. \n\n";
            exit_code = parsing_error;
        }
    }
    return exit_code;
}
// end file: main.cpp


Here is the output:

line       : "A_ 2_ 2211011100.cpp"
student_id : "2211011100.cpp"
group      : "A"
question   : "2"

line       : "A_1_ 2211011043.cpp"
student_id : "2211011043.cpp"
group      : "A"
question   : "1"

line       : "a_1_2111088001.cpp"
student_id : "2111088001.cpp"
group      : "a"
question   : "1"

line       : "group c_QUESTION 1_2211011024.txt"
student_id : "2211011024.txt"
group      : "group c"
question   : "QUESTION 1"

line       : "GroupA_Quesiton2_2211011034.cpp"
student_id : "2211011034.cpp"
group      : "GroupA"
question   : "Quesiton2"

line       : "grupc_question3_2111013335_cpp.txt"
student_id : "2111013335_cpp.txt"
group      : "grupc"
question   : "question3"

Answer #2 (g0czyy6m)

The main problem is that the regular expression is incorrect. Below is the program with my changes.

#include <iostream>
#include <fstream>
//#include <sstream> // removed
#include <vector>
#include <unordered_map>
#include <regex>
#include <algorithm> // added
#include <cctype> // added

/* removed
// Function to split a string based on a delimiter
std::vector<std::string> splitString(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::stringstream ss(s);
    std::string token;
    while (std::getline(ss, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
*/

int main() {
    // Input and output file paths
    const std::string inputFile = "file_list.csv";
    const std::string outputFile = "output.csv";

    // Open the log file // added section
    const std::string logFileName = "skipped.csv";
    std::ofstream logFile(logFileName);
    if (!logFile.is_open()) {
        std::cerr << "Error opening file: " << logFileName << std::endl;
        return 1;
    }

    // Open the input file
    std::ifstream inFile(inputFile);
    if (!inFile.is_open()) {
        std::cerr << "Error opening file: " << inputFile << std::endl;
        return 1;
    }

    // Create a map to store information about each student's group and questions answered
    std::unordered_map<std::string, std::pair<std::string, std::vector<int>>> studentInfo;

    // Regular expression pattern for extracting information
    //std::regex pattern(R"(([a-zA-Z]+)_?(\d+)_?(\d+)?.*)");
    std::regex pattern(R"(([a-zA-Z]+)_[^\d_]*(\d+)_(\d+)[^\d.]*\.)"); // fixed

    // Read the input file line by line
    std::string line;
    while (std::getline(inFile, line)) {
        // Use regex to match the pattern
        std::smatch match;
        //if (std::regex_match(line, match, pattern)) {
        if (std::regex_search(line, match, pattern)) { // fixed
            // Extract matched groups
            std::string groupName = match[1];
            //std::string studentID = match[2];
            std::string studentID = match[3]; // fixed
            //std::string questionNumberStr = match[3];
            std::string questionNumberStr = match[2]; // fixed

            // Convert groupName to lower case // added
            std::transform(groupName.cbegin(), groupName.cend(), groupName.begin(), [](unsigned char c) { return std::tolower(c); });

            // Convert question number to integer
            // int questionNumber = (questionNumberStr.empty()) ? -1 : std::stoi(questionNumberStr);
            int questionNumber = std::stoi(questionNumberStr); // fixed

            // Update the map with the student's information
            //if (!studentID.empty()) { // removed
                if (studentInfo.find(studentID) == studentInfo.end()) {
                    // If the student is not in the map, add a new entry
                    studentInfo[studentID] = std::make_pair(groupName, std::vector<int>{questionNumber});
                } else {
                    // If the student is already in the map, update the existing entry
                    //if (questionNumber != -1) { // removed
                        studentInfo[studentID].second.push_back(questionNumber);
                    //} // removed
                }
            //} // removed
        } else if (!line.empty()) { // added
            logFile << line << std::endl; // log filenames that fail to parse
        }
    }

    // Close the input file
    inFile.close();

    // Open the output file for writing
    std::ofstream outFile(outputFile);
    if (!outFile.is_open()) {
        std::cerr << "Error opening file: " << outputFile << std::endl;
        return 1;
    }

    // Write the header to the output file
    outFile << "StudentList,GroupName,QuestionsAnswered" << std::endl;

    // Write the student information to the output file
    //for (const auto& entry : studentInfo) {
    for (auto& entry : studentInfo) { // fixed
        // Sort the list of questions answered by each student
        //std::vector<int> questions = entry.second.second;
        std::vector<int>& questions = entry.second.second; // fixed
        std::sort(questions.begin(), questions.end());

        // Write the student information to the output file
        outFile << entry.first << "," << entry.second.first << ",";
        outFile << '"'; // added
        for (size_t i = 0; i < questions.size(); ++i) {
            outFile << questions[i];
            //if (i < questions.size() - 1) {
            if (i + 1 < questions.size()) { // better way
                outFile << ",";
            }
        }
        outFile << '"'; // added
        outFile << std::endl;
    }

    // Close the output file
    outFile.close();
    logFile.close(); // added

    std::cout << "Output file created successfully: " << outputFile << std::endl;

    return 0;
}


Answer #3 (wvt8vs2t)

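This looks more like a puzzle than a student assignment. Either way, I find it odd. I will show one way to make ends meet here, and leave an example in C++ together with its output. My reading is by no means the final word, since the description does not seem formal enough to settle on one (or at least not formal enough for me to see one).

    • This is a boring, long post, because I coded along and also posted the intermediate code and files. TL;DR: don't read it. Or read it, just in case. :)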

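Input and output files

It looks like we have an input csv file with one line per answer, 3 fields, and '_' as the delimiter. The expected output is a consolidated csv file with one line per student, with that student's answers grouped. The ordering of the output is stable, meaning that students are listed in the order in which they appear in the input.
In the original, complete input csv, provided as a link to Google Drive, there is even a header, File Name, but I believe it is only there to distract the reader. There are no files and no contents; it is just text. If we treat the file as a 3-field csv that uses the underscore as the delimiter, the third field starts with the student number. It is merely disguised as a 1-column csv of file names, for files that do not matter.
As for what a csv file is, RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files, is the reference.

The input csv file as provided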

a_1_2122011022.cpp
a_2_2111011011.txt.cpp
B_2_2111011243.cpp.txt
A_3_211011011.txt

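and the expected output is as shown in the image
[![This shows the way to extract the data][1]][1]
This seems to be done just to confuse the student and to test his/her ability to parse text. It is not a list of files in a 1-field csv file. It is just a plain 3-field csv with _ as the delimiter.
It lists 4 answers from 3 students.

  • The first field is the group name, but only its last letter counts.
  • As we can see in the output, case must be ignored on input and lowercase used on output, since student 2111011011 appears in the description in group a for question 2 and in group A for question 3.
  • The second field is the question number.
  • The third field is the student number, a 10-digit field. Any other data can be ignored, or should be better explained :D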

The output csv per the task description

According to that description (and the notes above):

Student list    GroupName   QuestionsAnswered
2122011022          a              1
2111011011          a              2,3
2111011243          b              2


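The output csv is also a 3-field file.

  • For consistency, the underscore is the obvious choice, since it is the delimiter used for the input.
  • The other obvious choice, the comma, is no good, since the table above already uses commas to group each student's answers.

So in this example I will use _ as the delimiter.
Nothing is said about a header for the csv, but one is clearly expected, since the task has the line "The structure of the output file should be as follows:", so I will write one.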

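About the code provided in the question

The provided code is more on the C side than on the C++ side. So, for a csv parser, it is odd not to see scanf used, and to see regex included instead. Regular expressions are a great resource, but the expression here is fixed and simple, so I think that in this case regex is more of a problem than a solution.
Sorting a vector, like matching with a regex, is of course a case for <algorithm>, and for_each would be a basic choice for a parser that really works on a single line. But looking at the task description, it is clear that the output must not be sorted. The order of the students is kept in the output: only the additional answers of each student are grouped in the last field. See the case of student 2111011011 in the example, and note that the order is preserved.
The code seems a bit more complicated than it needs to be, but it appears to be almost a solution.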

Testing the original input data with Google Sheets

This is the data from the original question:

a_1_2122011022.cpp
a_2_2111011011.txt.cpp
B_2_2111011243.cpp.txt
A_3_211011011.txt
d_3_2211011228.cpp
d_3_2211012211.cpp.txt
D_3_2211011054.cpp.txt
d_3_2211011096 .txt
d_question1_2111011034.txt
d_question2_2111011034.txt
d_question3_2111011034.txt
Group a_1_2211011032.cpp
Group a_2_2211011032.cpp
Group a_3_2211011032.cpp
group c_QUESTION 1_2211011024.tx


Using File | Import in Sheets, with _ as the separator of course, we get

      a          1  2122011022.cpp
      a          2  2111011011.txt.cpp
      B          2  2111011243.cpp.txt
      A          3  2111011011.txt
      d          3  2211011228.cpp
      d          3  2211012211.cpp.txt
      D          3  2211011054.cpp.txt
      d          3  2211011096 .txt
      d  question1  2111011034.txt
      d  question2  2111011034.txt
      d  question3  2111011034.txt
Group a          1  2211011032.cpp
Group a          2  2211011032.cpp
Group a          3  2211011032.cpp
group c QUESTION 1  2211011024.txt


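So it seems the interpretation is correct: 3 fields, all fine. We just need to insert the field names and group the answers by student.

C++ and the abstraction of the problem

Because the order is to be kept, the unordered_map in the provided code seems a natural choice of container. Bear in mind, though, that std::unordered_map makes no guarantee about iteration order, so the input order is only kept if it is recorded separately. The answers of each student could also be stored in a vector, but since we do not need to test for unique values in the input, we can just use a string and append each answer of the student to it.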

First prototype

The program below takes a file name as an argument and parses that file, displaying the result on screen

#include <cstdio>   // sscanf
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using std::ifstream;
using std::string;

int main(int argc, char** argv)
{
    string default_file{"file_list.csv"};
    string file_name{};
    if (argc < 2)
        file_name = default_file;
    else
        file_name = argv[1];
    std::cerr << "\n\tInput File is \"" << file_name
              << "\"\n";
    ifstream inFile{file_name};
    if (!inFile.is_open())
    {
        std::cerr << "\tError: " << file_name << "\n";
        return 0;
    }
    std::cout << "\t\"" << file_name << "\" is open\n";
    string      name;
    auto        lines  = 0;
    char        f[3][80]{0};  // 3 fields, 80 chars
    const char* mask = "%79[^_]_%79[^_]_%10[^_]";
    while (getline(inFile, name))  // stops when no more lines can be read
    {
        if (name != "")
            std::cerr << "#" << 1+lines << ":\t" << name
                      << " (" << name.size() << ")\n";
        // each f[i] decays to char*, which is what %[ expects
        auto res =
            sscanf(name.c_str(), mask, f[0], f[1], f[2]);
        if (res == 3)
        {
            ++lines;
            std::cout << "    Fields:" << std::left
                      << "\n\t1:  \"" << f[0] << "\""
                      << "\n\t2:  \"" << f[1] << "\""
                      << "\n\t3:  \"" << f[2] << "\"\n\n";
        }
    };  // while()
    inFile.close();
    std::cerr << "\n\t\"" << file_name
              << "\" File is closed.\n\t" << lines
              << " lines read.\n";
    return 0;
}


For this file

a_1_2122011022.cpp
a_2_2111011011.txt.cpp
B_2_2111011243.cpp.txt
A_3_2111011011.txt

it shows

#1:     a_1_2122011022.cpp (18)
    Fields:
        1:  "a"
        2:  "1"
        3:  "2122011022"

#2:     a_2_2111011011.txt.cpp (22)
    Fields:
        1:  "a"
        2:  "2"
        3:  "2111011011"

#3:     B_2_2111011243.cpp.txt (22)
    Fields:
        1:  "B"
        2:  "2"
        3:  "2111011243"

#4:     A_3_2111011011.txt (18)
    Fields:
        1:  "A"
        2:  "3"
        3:  "2111011011"

        "file_list.csv" File is closed.
        4 lines read.

Creating the csv file

Since this looks fine, we can use a simple struct to hold the map, and a function to write the csv file.

A simple struct in Answers.h

#pragma once
#include <cstddef>   // size_t
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>

using std::string;
using Umap =
    std::unordered_map<string, std::pair<char, string>>;

struct Answers
{
    char   delimiter;  // for the csv output file
    size_t lines;
    Umap   answers;               // the map
    Answers(string, string);      // constructor from a file
    int show();                   // display contents
    int create_csv(string);       // write csv
    int insert_if_ok(char[][80]);
};

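The implementation in Answers.cpp

It is simple:

  • Answers() takes the file names and loads the data into the map
  • show() displays the data on screen
  • create_csv() takes a file name and writes the map to disk as a csv file.
  • insert_if_ok() parses the data and inserts it into the map.
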
#include "Answers.h"

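#include <algorithm>
#include <cstdio>    // sscanf
#include <cstring>   // strlen
#include <iostream>
#include <iterator>  // std::ostream_iterator
#include <string>
#include <utility>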

// the format for the 3 fields in the map
using Data = std::pair<string, std::pair<char, string>>;
using std::cerr;
using std::cout;

Answers::Answers(
  string file_name = "input.csv",
  string out       = "output.csv")
  : delimiter('_'), lines(0), answers({})
{
  string default_file{"input.csv"};
  if (file_name == "") file_name = default_file;
  std::ifstream inFile{file_name};
  if (!inFile.is_open())
  {
      cerr << "\tError opening \"" << file_name << "\"\n";
      return;
  }
  string      name;
  char        f[3][80]{0};  // 3 fields, 80 chars
  const char* mask = "%79[^_]_%79[^_]_%10[^_]";
  while (getline(inFile, name))  // ends at EOF or on a read failure
  {
      // each f[i] decays to char*, which is what %[ expects
      auto res =
          sscanf(name.c_str(), mask, f[0], f[1], f[2]);
      if (res == 3) insert_if_ok(f);
  }
  inFile.close();
}

int Answers::show()
{
  cout << "    " << answers.size()
       << " students on file\n";
  for (auto asw : answers)
  {
      cout << "\t" << asw.first << " " << asw.second.first
           << " " << asw.second.second << "\n";
  }
  cout << std::endl;
  return 0;
}

int Answers::create_csv(string file_name = "output.csv")
{
  if (answers.size() < 1)
  {
      std::cerr << "no data recorded yet. Aborting\n";
      return -1;  // nothing to write
  }
  string default_file{"output.csv"};
  if (file_name == "") file_name = default_file;
  std::ofstream outFile{file_name};
  if (!outFile.is_open())
  {
      cerr << "\tError creating \"" << file_name
           << "\"\n";
      return -1;
  }
  string header{
      "\"Student "
      "list\"_\"GroupName\"_\"QuestionsAnswered\""};
  std::cerr << "    Created: \"" << file_name << "\"\n";
  std::ostream_iterator<string> oit{outFile, "\n"};
  *oit++ = header;  // csv file field names
  std::for_each(
      answers.begin(), answers.end(),
      [&](Data asw)
      {
          *oit++ = asw.first + delimiter +
                   asw.second.first + delimiter +
                   asw.second.second;
      });
  outFile.close();
  return 0;
}

int Answers::insert_if_ok(char f[][80])
{
  string student{f[2]};
  // test just size not contents
  if (student.length() != 10) return -1;

  // get group as lowercase char at the end of
  // f[0]
  size_t len = strlen(f[0]);
  if (len < 1) return 0;
  char group = f[0][len - 1];
  if ((group >= 'A') and (group <= 'Z'))
      group += 0x20;  // convert to lowercase
  else if ((group < 'a') or (group > 'z'))
      return 0;

  // get answer as last digit number in f[1]
  len = strlen(f[1]);
  if (len < 1) return 0;
  char asw = f[1][len - 1];
  if ((asw < '1') or (asw > '9')) return 0;

  // now insert into map
  auto ref = answers.find(student);
  if (ref != answers.end())
  {  // check if same group
      if (group != ref->second.first) return 0;
      ref->second.second += ",";
      ref->second.second += asw;
  }
  else
  {  // first answer for this student
      string str_asw{asw};
      answers.insert(std::make_pair(
          student, std::make_pair(group, str_asw)));
  }
  return 1;
}

A main for testing

#include <iostream>
#include <string>
#include "Answers.h"

int main(int argc, char** argv)
{
    string def_in_file{"file_list.csv"};
    string def_out_file{"output.csv"};
    string in_file_name{def_in_file};
    string out_file_name{def_out_file};
    if (argc > 2) out_file_name = argv[2];
    if (argc > 1) in_file_name = argv[1];
    std::cerr << "\n\tGenerating \"" << out_file_name
              << "\" from \"" << in_file_name
              << "\"\n";
    Answers a_test(in_file_name,out_file_name);
    a_test.show();
    a_test.create_csv(out_file_name);
    return 0;
}

Output of a simple test

SO > cat file_list.csv
a_1_2122011022.cpp
a_2_2111011011.txt.cpp
B_2_2111011243.cpp.txt
A_3_2111011011.txt

SO > p

        Generating "output.csv" from "file_list.csv"
    3 students on file
        2122011022 a 1
        2111011011 a 2,3
        2111011243 b 2

    Created: "output.csv"

SO > cat output.csv
"Student list"_"GroupName"_"QuestionsAnswered"
2122011022_a_1
2111011011_a_2,3
2111011243_b_2

SO >

SO > cat input2.csv
a_1_2122011022.cpp
a_2_2122011022.cpp
a_2_2111011011.txt.cpp
B_2_2111011243.cpp.txt
A_3_2110110111.txt
d_3_2211011228.cpp
d_3_2211012211.cpp.txt
D_3_2211011054.cpp.txt
d_3_2211011096 .txt
d_question1_2111011034.txt
d_question2_2111011034.txt
d_question3_2111011034.txt
Group a_1_2211011032.cpp
Group a_2_2211011032.cpp
Group a_3_2211011032.cpp
group c_QUESTION 1_2211011024.txt
SO >
SO > p input2.csv output2.csv

        Generating "output2.csv" from "input2.csv"
    11 students on file
        2211011096 d 3
        2122011022 a 1,2
        2211012211 d 3
        2110110111 a 3
        2111011011 a 2
        2111011243 b 2
        2211011228 d 3
        2211011054 d 3
        2111011034 d 1,2,3
        2211011032 a 1,2,3
        2211011024 c 1

    Created: "output2.csv"

SO > cat output2.csv
"Student list"_"GroupName"_"QuestionsAnswered"
2211011096_d_3
2122011022_a_1,2
2211012211_d_3
2110110111_a_3
2111011011_a_2
2111011243_b_2
2211011228_d_3
2211011054_d_3
2111011034_d_1,2,3
2211011032_a_1,2,3
2211011024_c_1

SO >

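Running with the original csv from the Drive

The original file

Thanks to @tbxfreeware for posting the link to this file. It is a minimalist csv with 1 field and 174 records. It appears to hold file names, and even has a header, "File Name", but I believe that is just a distraction for the student. There are no files and no contents, just text in which the 10-digit student number follows the second underscore on each line. Some lines have extra words and/or extra spaces, and there is no formal description of what should be there, nor of what to do in case of a perceived error.
Anyway, this is the file (as of this writing):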

File Name
A_ 2_ 2211011100.cpp
A_1_ 2211011043.cpp
a_1_2111088001.cpp
a_1_2111012229.txt
A_1_2111011900.txt
a_1_2111012073.txt
a_1_1211011066.txt
a_1_2122011022.cpp
a_1_2111011011.txt
A_1_2111011243.txt
a_1_2111012004.txt
A_1_2211011011.cpp.txt
a_1_2211012221.txt
A_1_2211011029.txt
a_1_2211011456.cpp
a_1_2211022046.cpp
a_1_2211011099.cpp.txt
a_1_2211042092.txt
a_1_2311012023.txt
a_2_2111015001.txt
a_2_2111022009.txt
A_2_2111022021.txt
a_2_2112211033.txt
a_2_2111011077.txt
a_2_2111011062.cpp
a_2_2111011068.txt
A_2_2111011073.txt
a_2_2111011094.txt
A_2_2211011007.cpp.txt
a_2_2211011021.txt
A_2_2211011029.txt
a_2_2211011035.cpp
a_2_2211011046.cpp
a_2_2211011072.cpp.txt
a_2_2211011082.txt
a_3_2111011009.txt
A_3_2111011021.txt
a_3_2111011033.txt
a_3_2111011055.txt
a_3_2111011062.cpp
a_3_2111011068.txt
a_3_2111011094.txt
A_3_2211011007.cpp.txt
A_3_2211011029.txt
a_3_2211011072.cpp.txt
a_3_2211011082.txt
a_q1_2111011025.txt
b _ 2_ 2211011014 (2).txt
b _ 3_ 2211011014 (1).txt
B_1_2011011022.txt
b_1_2111011024.cpp
b_1_2111011045.cpp
B_1_2211011002.txt
b_1_2211011004.txt.txt
b_1_2211011017.txt
b_1_2211011033.txt
b_1_2211011080.txt
b_1_2311011095.txt
B_2_2011011022.txt
b_2_2111011024.cpp
b_2_2111011028.txt
B_2_2111011040.cpp
B_2_2111011088.txt
B_2_2211011002.txt
B_2_2211011003.cpp
b_2_2211011004.txt.txt
b_2_2211011017.txt
b_2_2211011033.txt
B_2_2211011067_.txt
b_2_2211011073.txt
b_2_2211011074.cpp
b_2_2211011080.txt
b_2_2211011104 (2).txt
b_2_2311011095.txt
B_3_2011011022.txt
b_3_2111011024.cpp
B_3_2111011040.cpp
B_3_2111011088.txt
B_3_2211011002.txt
B_3_2211011003.cpp
b_3_2211011004.txt.txt
b_3_2211011033.txt
b_3_2211011080.txt
b_3_2211011104 (1).txt
B_Q1_2111011070.cpp
B_Q2_2111011070.txt
B_Q3_2111011070.cpp
c_1_2011011051.cpp
c_1_2111011038.txt
c_1_2111011076.cpp.txt
c_1_2211011006.txt
c_1_2211011012.cpp
C_1_2211011026.txt
c_1_2211011027.cpp
C_1_2211011041.txt
c_1_2211011070.cpp
c_1_2211011101.txt
c_2_2011011051.cpp
c_2_2111011038.txt
c_2_2111011076.cpp.cpp
c_2_2211011006.txt
c_2_2211011012.cpp
C_2_2211011026.txt
c_2_2211011027.cpp
c_2_2211011037.cpp
C_2_2211011041.txt
c_2_2211011045.cpp.txt
C_2_2211011048.cpp
C_2_2211011050.cpp.txt
C_2_2211011069.txt
c_2_2211011080.cpp
c_2_2211011101.txt
c_3_2011011051.cpp
c_3_2111011038.txt
c_3_2111011076.cpp.cpp
c_3_2211011006.txt
c_3_2211011012.cpp
c_3_2211011027.cpp
C_3_2211011041.txt
c_3_2211011070.cpp
c_3_2211011101.txt
d _1 _2111011226  (1).cpp
d _2 _2111011226  (2).cpp
d _3 _2111011226  (3).cpp
d_1_2111011453.cpp
D_1_2111011059.txt
D_1_2111011090.cpp.txt
d_1_2211011048.cpp
d_1_2211411011.cpp.txt
D_1_2211011054.cpp.txt
D_1_2211011062.cpp
d_1_2241011094.txt
d_1_2211044096.txt
d_2_2111014453.cpp
D_2_2111013359.txt
D_2_2111013390.cpp.txt
d_2_2211013308.cpp
D_2_2211331054.cpp.txt
d_2_2211331096.txt
d_3_2111011053.cpp
D_3_2122011059.txt
D_3_2111221090.cpp.txt
d_3_2211011228.cpp
d_3_2211012211.cpp.txt
D_3_2211011054.cpp.txt
d_3_2211011096 .txt
d_question1_2111011034.txt
d_question2_2111011034.txt
d_question3_2111011034.txt
Group a_1_2211011032.cpp
Group a_2_2211011032.cpp
Group a_3_2211011032.cpp
group c_QUESTION 1_2211011024.txt
group c_QUESTION 2 _2211011024.txt
GroupA_Quesiton2_2211011034.cpp
GroupA_Question1_2211011034.cpp
GroupA_Question3_2211011034.cpp
GroupB_1_2211011081.txt
GroupB_2_2211011081.txt
GroupB_3_2211011081.txt
groupB_Question2_2011011038.cpp
GroupB_Question3_2011011038.cpp.txt
GroupC_Q1_2111022004.txt
GroupC_Q2_2111022004.txt
GroupC_Q3_2111011094.txt
Groupd_Q1_2111012211.txt
Groupd_Q2_2111012211.txt
Groupd_Q3_2111011046.txt
grupc_question1_2111014405.cpp.txt
Grupc_question2_2111012005.cpp.txt
grupc_question3_2111013335_cpp.txt
qroupa_question1_2211011681.txt
qroupa_question2_2211080051.txt
qroupa_question3_2211011901.txt

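Sample output of the code for this file

Some lines were deleted, because the complete output and the CSV are just a matter of running the program, and the resulting CSV can be imported into any program. Here it was simply imported once into Microsoft Excel.
Notes on the code above:

  • Each student is accepted in only 1 group.
  • Duplicate answers from a student are accepted if they are in the same group.
  • This is just a toy. It was not tested extensively and was run only a few times.

SO > p full-original.csv full-output.csv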

        Generating "full-output.csv" from "full-original.csv"
    107 students on file
        2311011095 b 1,2
        2211011029 a 1,2,3   
         
 ... some lines deleted ...

    Created: "full-output.csv"

SO >

  [1]: https://i.stack.imgur.com/oB8Mb.jpg

tp5buhyn4#

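I could not find a way to separate the words qrub, grup, or group from the elements of the group column.
One way is to use the underscore (_) as a delimiter and split each line into three parts.
This answer describes a program I wrote that reads the CSV file from the OP, using two main classes, StudentAnswers and FileList.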

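  1. StudentAnswers - manages a std::map that holds the data from the input file. It also contains a function that writes the map to the output file.
  2. FileList - contains the parsing functions that read the input file.

This answer is long. I suggest you read a little first, and then decide whether your interest has been piqued. If not, you can bail out at any time!
The source code for the complete program begins in the section StudentAnswers.

The section FileList describes the parsing algorithm at the heart of the program.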

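Layout of the input file

The problem specification leaves many details of the input file layout unspecified. I have filled them in by making assumptions based on the data found in the sample data file posted to Google Drive.

  1. The input file is a CSV file with no quoted fields.
  2. Each record is a single line that contains one field.
  3. The first line is a header that contains the field name, i.e., File Name.
  4. Subsequent lines contain data. Each line holds one file name.

A file name can be divided into three parts, none of which contains an underscore character ('_'). Underscores serve as delimiters: one appears between the first and second parts, and another between the second and third parts.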

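  1. group_name - This part can contain any number of characters. Its last alphabetic character, converted to a lowercase char, is the group_name. All other characters are discarded.
  2. question_number - This part can contain any number of characters. Its last run of digit characters, converted to type int, is the question_number. In other words, it is the number closest to the end of this part. All other characters are discarded.
  3. student_id - This part can contain any number of characters. The run of digits closest to its beginning, taken as a string, is the student_id. All other characters are discarded.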
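
As a rough illustration of this layout (not the parser used in this answer), the sketch below splits a file name at its first two underscores. The names Parts and split_parts are hypothetical, and the sketch assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper type: the three raw parts of a file name.
struct Parts
{
    std::string group_part;     // text before the first underscore
    std::string question_part;  // text between the first and second underscores
    std::string student_part;   // everything after the second underscore
};

// Split `file_name` at its first two underscores.
// Returns std::nullopt when fewer than two underscores are present.
std::optional<Parts> split_parts(std::string const& file_name)
{
    auto const first = file_name.find('_');
    if (first == std::string::npos)
        return std::nullopt;
    auto const second = file_name.find('_', first + 1);
    if (second == std::string::npos)
        return std::nullopt;
    assert(first < second);
    return Parts{
        file_name.substr(0, first),
        file_name.substr(first + 1, second - first - 1),
        file_name.substr(second + 1) };
}
```

For example, split_parts("GroupB_Question3_2011011038.cpp.txt") yields the parts GroupB, Question3, and 2011011038.cpp.txt. The three rules above are then applied to each part separately.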

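The map used by the application

Three students in the sample data file belong to two groups. Student 2211011080 is a member of groups b and c. Likewise, 2211011048 belongs to c and d, and 2111011094 belongs to a and c.
Based on this, I assume that a student can belong to more than one group. The map defined below therefore uses a composite key that contains both group_name and student_id.
These trivial type aliases are intended to improve the readability of the definitions that follow.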

using StudentID = std::string;
using GroupName = char;
using QuestionNumber = int;

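The struct GroupStudentKey defines the key used in the map. It places group_name before student_id, so that the map is sorted first by group and then by student within each group. The "spaceship" operator (<=>) makes the compiler generate the default comparison operators that enforce this ordering.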

struct GroupStudentKey
{
    GroupName group_name{};
    StudentID student_id;

    // ...

    auto operator<=> (GroupStudentKey const&) const
        = default;
};
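
The ordering that a defaulted operator<=> produces is member-wise and follows declaration order. A hedged way to picture it is the hand-written comparison below, which uses std::tie and compiles without C++20. KeySketch and keys_in_order are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for GroupStudentKey, written without operator<=>.
struct KeySketch
{
    char group_name{};
    std::string student_id;
};

// Member-wise lexicographic comparison: group first, then student.
// A defaulted operator<=> derives the same order from the declaration
// order of the members.
bool operator< (KeySketch const& a, KeySketch const& b)
{
    return std::tie(a.group_name, a.student_id)
         < std::tie(b.group_name, b.student_id);
}

// Return the keys of `m` in iteration order, formatted as "group:student".
std::vector<std::string> keys_in_order(std::map<KeySketch, int> const& m)
{
    std::vector<std::string> result;
    for (auto const& entry : m)
        result.push_back(
            std::string(1, entry.first.group_name) + ':' + entry.first.student_id);
    assert(result.size() == m.size());
    return result;
}
```

Inserting the keys (c, 2211011080), (b, 2211011080), (c, 2111011094), and (a, 2111011094) in any order and then iterating visits group a first, then b, then the two students of group c in ascending student_id order.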


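QuestionsAnswered manages a vector of QuestionNumber objects, each of which is an int. It is the element type used by the map.
Class QuestionsAnswered provides six functions:

  • sort - sorts the vector of question numbers into ascending order.
  • operator() - returns a reference to the vector.
  • operator() const - returns a const reference to the vector.
  • operator== const - defaulted equality operator.
  • operator<< - writes the question numbers to a stream.
  • operator>> - reads the question numbers from a stream.

We now have the types used by the map. Its key is GroupStudentKey, and its elements are QuestionsAnswered objects.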

using map_type = std::map<GroupStudentKey, QuestionsAnswered>;


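StudentAnswers

StudentAnswers encapsulates a map object and provides a minimal set of tools for managing it.
The file StudentAnswers.h contains its declaration, along with the map definitions given above.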

// StudentAnswers.h
#ifndef STUDENT_ANSWERS_H
#define STUDENT_ANSWERS_H

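#include <compare>      // operator<=>
#include <cstddef>      // std::size_t
#include <iosfwd>       // std::istream, std::ostream
#include <map>
#include <string>
#include <string_view>
#include <type_traits>  // std::is_same_v
#include <utility>
#include <vector>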

using StudentID = std::string;
using GroupName = char;
using QuestionNumber = int;

struct GroupStudentKey
{
    GroupName group_name{};
    StudentID student_id;

    GroupStudentKey()
        = default;

    GroupStudentKey(GroupName const group_name, StudentID const& student_id)
        : group_name{ group_name }
        , student_id{ student_id }
    {}

    GroupStudentKey(GroupName const group_name, StudentID&& student_id)
        : group_name{ group_name }
        , student_id{ std::move(student_id) }
    {}

    auto operator<=> (GroupStudentKey const&) const
        = default;
};

class QuestionsAnswered
{
public:
    using vector_type = std::vector<QuestionNumber>;
    using size_type = vector_type::size_type;
    static_assert(std::is_same_v<size_type, std::size_t>);

    vector_type& operator()();
    vector_type const& operator()() const;

    bool operator== (QuestionsAnswered const&) const noexcept
        = default;

    void sort();

    friend std::ostream& operator<< (
        std::ostream& ost,
        QuestionsAnswered const& questions_answered);

    friend std::istream& operator>> (
        std::istream& ist,
        QuestionsAnswered& questions_answered);

private:
    vector_type questions_answered;
};

class StudentAnswers
{
public:
    using map_type = std::map<GroupStudentKey, QuestionsAnswered>;

    enum : int { 
        no_errors
        , open_error
        , header_error
        , skipped_record
        , stream_failed
    };

    std::string static exit_msg(int const exit_code);

    void sort();

    int read_csv(
        std::string const& file_name,
        std::string const& file_name_skipped_records = "skipped.csv");

    int write_csv(std::string const& file_name);

    map_type& operator()();
    map_type const& operator()() const;

private:
    map_type student_answers;
    bool parse_field_name(std::istream& ist, std::string_view sv);
    bool parse_header(std::istream& ist, std::ostream& log);
};

#endif  // !STUDENT_ANSWERS_H
// end file: StudentAnswers.h


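Most of the member functions of StudentAnswers are self-explanatory. One important group handles file I/O.

  • read_csv - reads a CSV file that was written by member function write_csv. It clears the map and then loads the data from the file. This function includes many checks, so it can give a detailed error message when a record in the input file cannot be parsed.
  • write_csv - writes a CSV file that contains the fields student_id, group_name, and questions_answered from the map.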
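
Because the question numbers are themselves separated by commas, a record in the output file has a variable number of commas. The first two delimit student_id and group_name, and everything after them is the question list. The sketch below shows one way to decode such a line. It is a simplified stand-in for read_csv, not the code of this answer, and RecordSketch and decode_record are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain struct that holds one decoded output record.
struct RecordSketch
{
    std::string student_id;
    char group_name{};
    std::vector<int> questions;
};

// Decode one line such as "2211011080,b,1,2,3".
// Returns false when a field is missing or malformed.
bool decode_record(std::string const& line, RecordSketch& out)
{
    assert(line.find('\n') == std::string::npos);  // one record per call
    std::istringstream sst{ line };
    std::string field;
    if (!std::getline(sst, out.student_id, ',') || out.student_id.empty())
        return false;
    if (!std::getline(sst, field, ',') || field.length() != 1)
        return false;
    out.group_name = field.front();
    out.questions.clear();
    while (std::getline(sst, field, ','))  // everything left is a question number
    {
        std::size_t used{};
        int n{};
        try { n = std::stoi(field, &used); }
        catch (std::exception const&) { return false; }
        if (used != field.length())
            return false;
        out.questions.push_back(n);
    }
    return !out.questions.empty();
}
```

Decoding "2211011080,b,1,2,3" this way gives student_id 2211011080, group_name b, and the questions 1, 2, and 3. A line with no question numbers, or with a group field longer than one character, is rejected.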

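The two versions of operator() (one const, the other non-const) return a reference to the map.

  • operator() - returns a reference to the map.
  • operator() const - returns a const reference to the map.

The only other member functions are these:

  • sort - makes one pass over the map and calls questions_answered.sort() for each element.
  • exit_msg - returns a std::string that describes the exit_code supplied as its argument. These are the exit codes returned by member functions read_csv and write_csv.
  • parse_header - reads the first line of the CSV file.
  • parse_field_name - a helper function for parse_header.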

// StudentAnswers.cpp
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

#include "StudentAnswers.h"
#include "tbx.utility.h"

//======================================================================
// QuestionsAnswered
//======================================================================
QuestionsAnswered::vector_type& QuestionsAnswered::operator()() {
    return questions_answered;
}

QuestionsAnswered::vector_type const& QuestionsAnswered::operator()() const {
    return questions_answered;
}

void QuestionsAnswered::sort() {
    std::sort(questions_answered.begin(), questions_answered.end());
}

std::ostream& operator<< (
    std::ostream& ost, 
    QuestionsAnswered const& questions_answered)
{
    if (!questions_answered().empty())
    {
        using size_type = QuestionsAnswered::size_type;
        enum : size_type { one = 1u };
        size_type const n_commas{ questions_answered().size() - one };
        for (size_type  i{}; i < n_commas; ++i)
            ost << questions_answered()[i] << ',';
        ost << questions_answered().back();
    }
    return ost;
}

std::istream& operator>> (
    std::istream& ist, 
    QuestionsAnswered& questions_answered)
{
    questions_answered().clear();
    std::string s;
    if (getline(ist, s))
    {
        std::stringstream sst{ s };
        int n;
        while (std::getline(sst, s, ','))
            if (tbx::convert_to_int(s, n))
            {
                questions_answered().push_back(n);
            }
            else
            {
                std::cerr << "QuestionsAnswered: `operator>>` could not "
                    "convert " << std::quoted(s) << " to `int`.\n"
                    "Skipping this number.\n\n";
            }
    }
    return ist;
}

//======================================================================
// StudentAnswers
//======================================================================
std::string StudentAnswers::exit_msg(int const exit_code)
{
    switch (exit_code)
    {
    case StudentAnswers::no_errors:
        return "no_errors";
    case StudentAnswers::open_error:
        return "open_error";
    case StudentAnswers::header_error:
        return "header_error";
    case StudentAnswers::skipped_record:
        return "skipped_record";
    case StudentAnswers::stream_failed:
        return "stream_failed";
    default:
        return "[invalid exit_code]";
    }
}

void StudentAnswers::sort()
{
    for (auto& [key, questions_answered] : student_answers)
        questions_answered.sort();
}

int StudentAnswers::read_csv(
    std::string const& file_name,
    std::string const& file_name_skipped_records)
{
    int exit_code = no_errors;
    std::ofstream log{ file_name_skipped_records };
    if (!log.is_open())
    {
        std::cerr
            << "StudentAnswers: Could not open log file: "
            << std::quoted(file_name_skipped_records)
            << "\nThis is the file where skipped records are logged."
            << "\n\n";
        return open_error;
    }
    std::ifstream ist{ file_name };
    if (!ist.is_open())
    {
        std::cerr
            << "StudentAnswers: Could not open input file: "
            << std::quoted(file_name)
            << "\n\n";
        return open_error;
    }
    if (!parse_header(ist, log))
    {
        std::cerr
            << "StudentAnswers: Could not parse header of input file: "
            << std::quoted(file_name)
            << "\n\n";
        return header_error;
    }

    GroupStudentKey key;
    QuestionsAnswered questions_answered;
    std::string record, s;
    std::stringstream sst;

    student_answers.clear();
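    // Test the stream state rather than `eof()` alone, so that a final
    // record without a trailing newline is still processed.
    while (std::getline(ist, record) || !ist.eof())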
    {
        if (ist.fail())
        {
            std::cerr << "StudentAnswers: `std::getline` failed unexpectedly mid-record.\n\n";
            exit_code = stream_failed;
            break;
        }
        sst.clear();
        sst.str(record);
        if (!std::getline(sst, key.student_id, ','))
        {
            std::cerr << "StudentAnswers: Could not parse `student_id`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        enum : std::string::size_type { one = 1u };
        if (!std::getline(sst, s, ',') 
            || s.length() != one
            || !tbx::is_alpha(s.front()))
        {
            std::cerr << "StudentAnswers: Could not parse `group_name`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        key.group_name = tbx::to_lower(s.front());
        sst >> questions_answered;
        if (questions_answered().empty())
        {
            std::cerr << "StudentAnswers: Could not parse `questions_answered`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        if (student_answers.count(key))
        {
            std::cerr << "StudentAnswers: Duplicate `key`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        student_answers[key] = questions_answered;
    }
    ist.close();
    log.close();
    return exit_code;
}

int StudentAnswers::write_csv(std::string const& file_name)
{
    int exit_code = no_errors;
    std::ofstream ost{ file_name };
    if (ost.is_open())
    {
        sort();
        char const* const header = "StudentID,GroupName,QuestionsAnswered\n";
        ost << header;
        for (auto const& [key, questions_answered] : student_answers)
        {
            ost << key.student_id
                << ',' << key.group_name
                << ',' << questions_answered
                << '\n';
        }
        ost.close();
    }
    else
    {
        std::cerr
            << "StudentAnswers: Could not open output file: "
            << std::quoted(file_name)
            << "\n\n";
        exit_code = open_error;
    }
    return exit_code;
}

bool StudentAnswers::parse_field_name(
    std::istream& ist, 
    std::string_view sv)
{
    std::string s;
    if (std::getline(ist, s, ','))
    {
        s = tbx::trim_whitespace(s);
        tbx::to_lower_in_place(s);
        return s == sv;
    }
    return false;
}

bool StudentAnswers::parse_header(std::istream& ist, std::ostream& log)
{
    std::string s;
    if (!std::getline(ist, s))
    {
        std::cerr << "StudentAnswers: `std::getline` failed unexpectedly "
            "while parsing header. \n\n";
        return false;
    }
    std::stringstream sst{ s };
    if (!parse_field_name(sst, "studentid") ||
        !parse_field_name(sst, "groupname") ||
        !parse_field_name(sst, "questionsanswered") ||
        !sst.eof())
    {
        std::cerr 
            << "StudentAnswers: Invalid header: "
            << std::quoted(s)
            << "\nExpected: \"StudentID,GroupName,QuestionsAnswered\"\n\n";
        return false;
    }
    return true;
}

StudentAnswers::map_type& StudentAnswers::operator()()
{
    return student_answers;
}

StudentAnswers::map_type const& StudentAnswers::operator()() const
{
    return student_answers;
}
// end file: StudentAnswers.cpp

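FileList

Class FileList is a file reader.

  • It has no data members.
  • All of its member functions are static.
  • Function read_csv accepts a StudentAnswers object as an argument and populates it by parsing the input file file_list.csv.
  • The records in file_list.csv have the layout described at the beginning of this answer.

// FileList.h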
#ifndef FILE_LIST_H
#define FILE_LIST_H

#include <fstream>
#include <iostream>
#include <string>

#include "StudentAnswers.h"

class FileList
{
public:
    enum : int {
        no_errors
        , open_error
        , header_error
        , skipped_record
        , stream_failed
    };
    std::string static exit_msg(int const exit_code);

    int static read_csv(
        std::string const& file_name,
        StudentAnswers& student_answers,
        std::string const& file_name_skipped_records = "skipped.csv");

private:
    enum : std::string::size_type { zero, one };

    bool static parse_header(std::istream& ist, std::ostream& log);

    bool static parse_group_name(
        std::string const& record,
        std::string::size_type& pos,
        GroupStudentKey& key);

    bool static parse_question_number(
        std::string const& record,
        std::string::size_type& pos,
        QuestionNumber& question_number);

    bool static parse_student_id(
        std::string const& record,
        std::string::size_type& pos,
        GroupStudentKey& key);
};

#endif // !FILE_LIST_H
// end file: FileList.h

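Parsing is not difficult, but it is finicky work, and if you are not careful it is easy to make off-by-one errors.
Member function read_csv runs a loop. Each iteration reads one line from the input file and stores it in the string variable record.
The parsing routines make a single pass over variable record, using variable pos to track the current position. pos is the subscript of the next unparsed character in record. It is passed as a reference argument to each parsing routine in turn. When one routine updates it, the others see the change.
The parsing routines make heavy use of the find functions of class std::string. When a search fails, they all return the sentinel std::string::npos.
The first pair is used to find underscores.

  • record.find('_') - returns the subscript of the first underscore in record.
  • record.find('_', pos) - returns the subscript of the "next" underscore in record. The search starts at position pos.

The next group is used to find digit strings. The first two search forward from position pos and are used to parse student_id.

  • record.find_first_of("0123456789", pos) - searches forward from position pos and returns the subscript of the first digit encountered.
  • record.find_first_not_of("0123456789", pos) - searches forward from position pos and returns the subscript of the first non-digit encountered.

The next two search backward from position pos and are used to parse question_number.

  • record.find_last_of("0123456789", pos) - searches backward from position pos and returns the subscript of the first digit encountered.
  • record.find_last_not_of("0123456789", pos) - searches backward from position pos and returns the subscript of the first non-digit encountered.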
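
A minimal sketch of both search directions follows. The helper names first_digit_run and last_digit_run are hypothetical. Unlike the routines in this answer, they return the digits directly instead of updating pos.

```cpp
#include <cassert>
#include <string>

char const* const digits = "0123456789";

// First run of digits at or after `pos` (forward search),
// the direction used for student_id.
std::string first_digit_run(std::string const& s, std::string::size_type pos)
{
    assert(pos <= s.length());
    auto const begin = s.find_first_of(digits, pos);
    if (begin == std::string::npos)
        return {};
    // npos here means the digits run to the end of the string.
    auto const end = s.find_first_not_of(digits, begin);
    return s.substr(begin, end == std::string::npos ? end : end - begin);
}

// Last run of digits at or before `pos` (backward search),
// the direction used for question_number.
std::string last_digit_run(std::string const& s, std::string::size_type pos)
{
    auto const last = s.find_last_of(digits, pos);
    if (last == std::string::npos)
        return {};
    // npos here means the digits start at subscript 0.
    auto const before = s.find_last_not_of(digits, last);
    auto const begin = (before == std::string::npos) ? 0 : before + 1;
    return s.substr(begin, last - begin + 1);
}
```

With the record "d _1 _2111011226  (1).cpp", a forward search from subscript 6 returns 2111011226. With "GroupA_Quesiton2_2211011034.cpp", a backward search from subscript 15 returns 2.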

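In function parse_group_name, the strategy is to find the first underscore and then search backward from there, looking for an alphabetic character. The first one found is converted to lowercase and stored as group_name.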
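
The backward scan can be sketched in isolation as follows. The function name group_letter is hypothetical, and it signals failure with '\0' rather than with a bool.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the lowercase group letter found by scanning backward from the
// first underscore, or '\0' when no letter precedes an underscore.
char group_letter(std::string const& record)
{
    auto p = record.find('_');
    if (p == std::string::npos)
        return '\0';
    assert(p < record.length());
    while (p > 0)
    {
        --p;  // step left before testing, so the underscore itself is skipped
        auto const c = static_cast<unsigned char>(record[p]);
        if (std::isalpha(c))
            return static_cast<char>(std::tolower(c));
    }
    return '\0';
}
```

This is why prefixes such as GroupB, qroupa, and "Group a" all reduce to a single letter: only the letter nearest the first underscore is kept.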
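Function parse_question_number first finds the next underscore and then searches backward from there twice. The first backward search finds the last digit in question_number. The second backward search finds the non-digit character that precedes the first digit in question_number.
Once the positions of the first and last digits in question_number have been determined, a call to std::from_chars converts them to an int.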
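
For readers who have not used std::from_chars (C++17, header <charconv>), here is a small sketch of the conversion step. The name digits_to_int is hypothetical. The sketch also checks the returned error code, so that a number too large for int is rejected.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Convert the characters record[first..last] (both inclusive) to int.
// Returns std::nullopt if the range is not entirely a valid int.
std::optional<int> digits_to_int(
    std::string const& record,
    std::string::size_type first,
    std::string::size_type last)
{
    assert(first <= last && last < record.length());
    char const* const begin = record.data() + first;
    char const* const end = record.data() + last + 1;  // one past the final digit
    int value{};
    auto const [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc{} || ptr != end)
        return std::nullopt;
    return value;
}
```

In the record "GroupC_Q3_2111011094.txt", the question number occupies subscript 8 only, so digits_to_int(record, 8, 8) produces 3.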
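Parsing student_id is similar to parsing question_number, except that you search forward from the second underscore rather than backward. No conversion to int is needed. A call to substr extracts student_id as a string.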

// FileList.cpp
#include <charconv>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

#include "FileList.h"
#include "tbx.utility.h"

std::string FileList::exit_msg(int const exit_code)
{
    switch (exit_code)
    {
    case FileList::no_errors:
        return "no_errors";
    case FileList::open_error:
        return "open_error";
    case FileList::header_error:
        return "header_error";
    case FileList::skipped_record:
        return "skipped_record";
    case FileList::stream_failed:
        return "stream_failed";
    default:
        return "[invalid exit_code]";
    }
}

int FileList::read_csv(
    std::string const& file_name,
    StudentAnswers& student_answers,
    std::string const& file_name_skipped_records)
{
    int exit_code = no_errors;
    std::ofstream log{ file_name_skipped_records };
    if (!log.is_open())
    {
        std::cerr
            << "FileList: Could not open log file: "
            << std::quoted(file_name_skipped_records)
            << "\nThis is the file where skipped records are logged."
            << "\n\n";
        return open_error;
    }
    std::ifstream ist{ file_name };
    if (!ist.is_open())
    {
        std::cerr
            << "FileList: Could not open input file: "
            << std::quoted(file_name)
            << "\n\n";
        return open_error;
    }
    if (!parse_header(ist, log))
    {
        std::cerr
            << "FileList: Could not parse header of input file: "
            << std::quoted(file_name)
            << "\n\n";
        return header_error;
    }

    GroupStudentKey key;
    QuestionNumber question_number{};
    std::string record;
    std::string::size_type pos{};

    student_answers().clear();
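    // Test the stream state rather than `eof()` alone, so that a final
    // record without a trailing newline is still processed.
    while (std::getline(ist, record) || !ist.eof())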
    {
        if (ist.fail())
        {
            std::cerr << "FileList: `std::getline` failed unexpectedly mid-record.\n\n";
            exit_code = stream_failed;
            break;
        }
        pos = zero;
        if (!parse_group_name(record, pos, key))
        {
            std::cerr << "FileList: Could not parse `group_name`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        if (!parse_question_number(record, pos, question_number))
        {
            std::cerr << "FileList: Could not parse `question_number`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        if (!parse_student_id(record, pos, key))
        {
            std::cerr << "FileList: Could not parse `student_id`. Skipping record.\n\n";
            log << record << '\n';
            exit_code = skipped_record;
            continue;
        }
        student_answers()[key]().push_back(question_number);
    }
    ist.close();
    student_answers.sort();
    return exit_code;
}

bool FileList::parse_header(std::istream& ist, std::ostream& log)
{
    std::string s;
    if (!std::getline(ist, s))
    {
        std::cerr << "FileList: `std::getline` failed unexpectedly "
            "while parsing header. \n\n";
        return false;
    }
    s = tbx::trim_whitespace(s);
    tbx::to_lower_in_place(s);
    if (s != "file name")
    {
        std::cerr
            << "FileList: Could not parse header: "
            << std::quoted(s)
            << " \n\n";
        log << s << '\n';
        return false;
    }
    return true;
}

bool FileList::parse_group_name(
    std::string const& record,
    std::string::size_type& pos,
    GroupStudentKey& key)
{
    pos = record.find('_');
    if (pos == std::string::npos || pos == zero)
        return false;
    auto p{ pos++ };
    do {
        if (tbx::is_alpha(record[--p]))
        {
            key.group_name = tbx::to_lower(record[p]);
            return true;
        }
    } while (p > zero);
    return false;
}

bool FileList::parse_question_number(
    std::string const& record,
    std::string::size_type& pos,
    QuestionNumber& question_number)
{
    auto const start{ pos };
    pos = record.find('_', pos);
    if (pos == std::string::npos || pos == start)
        return false;
    auto p{ pos - one };
    p = record.find_last_of("0123456789", p);
    if (p == std::string::npos || p < start)
        return false;
    auto const q{ p };  // final digit in `question_number`
    p = record.find_last_not_of("0123456789", p);
    if (p == std::string::npos || p + one < start)
        return false;
    p += one;  // first digit in question_number
    auto [ptr, ec] = std::from_chars(
        &record[p],
        &record[q + one],
        question_number);
    // Checking `ec` also rejects a value that is too large for `int`.
    if (ec != std::errc{} || ptr == &record[p])
        return false;
    ++pos;
    return true;
}

bool FileList::parse_student_id(
    std::string const& record,
    std::string::size_type& pos,
    GroupStudentKey& key)
{
    pos = record.find_first_of("0123456789", pos);
    if (pos == std::string::npos)
        return false;
    auto const p{ pos };
    pos = record.find_first_not_of("0123456789", pos);
    if (pos == std::string::npos)
        pos = record.length();  // the digits run to the end of the record
    key.student_id = record.substr(p, pos - p);
    return true;
}
// end file: FileList.cpp

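Function main

Given classes StudentAnswers and FileList, writing function main is trivial. It calls the read and write functions of those classes and reports the exit_code.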

// main.cpp
#include <iomanip>
#include <iostream>
#include <string>

#include "FileList.h"
#include "StudentAnswers.h"

int main()
{
    const std::string input_file_name = "file_list.csv";
    const std::string output_file_name = "output.csv";

    std::cout << "Parsing file: " 
        << std::quoted(input_file_name) 
        << '\n';
    StudentAnswers student_answers;
    int exit_code = FileList::read_csv(input_file_name, student_answers);
    if (exit_code == FileList::no_errors 
        || exit_code == FileList::skipped_record)
    {
        student_answers.write_csv(output_file_name);

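        // Verify that function `read_csv` works as expected.
        // A separate log file is used here, so that this pass does not
        // overwrite the "skipped.csv" written by `FileList::read_csv`.
        StudentAnswers test;
        test.read_csv(output_file_name, "skipped_verify.csv");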
        if (test() != student_answers())
            std::cerr << "Output file cannot be read back in.\n\n";
    }

    std::cout 
        << "exit_code: " 
        << FileList::exit_msg(exit_code) 
        << "\n\n";
    return exit_code;
}
// end file: main.cpp

Some functions from the toolbox

// tbx.utility.h
#ifndef TBX_UTILITY_H
#define TBX_UTILITY_H

#include <string>

namespace tbx
{
    bool convert_to_int(std::string const& s, int& n);
    bool is_alpha(char const c) noexcept;
    char to_lower(char const c) noexcept;
    void to_lower_in_place(std::string& s) noexcept;
    std::string trim_whitespace(std::string const& s);
}
#endif // !TBX_UTILITY_H
// end file: tbx.utility.h

// tbx.utility.cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

#include "tbx.utility.h"

namespace tbx
{
    bool convert_to_int(std::string const& s, int& n)
    {
        std::stringstream sst{ s };
        if ((sst >> n) && !sst.eof())
            sst >> std::ws;
        return sst.eof() && !sst.fail();
    }
    bool is_alpha(char const c) noexcept
    {
        return static_cast<bool>(std::isalpha(static_cast<unsigned char>(c)));
    }
    char to_lower(char const c) noexcept
    {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    void to_lower_in_place(std::string& s) noexcept
    {
        std::transform(s.begin(), s.end(), s.begin(),
            [](unsigned char c) {
                return std::tolower(c);
            }
        );
    }
    std::string trim_whitespace(std::string const& s)
    {
        // Trim leading and trailing whitespace from string `s`.
        auto const first{ s.find_first_not_of(" \f\n\r\t\v") };
        if (first == std::string::npos)
            return {};
        auto const last{ s.find_last_not_of(" \f\n\r\t\v") };
        enum : std::string::size_type { one = 1u };
        return s.substr(first, (last - first + one));
    }
}
// end file: tbx.utility.cpp

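Input

I ran the program twice, once against the complete input file and once against a truncated file.
The complete file, file_list.csv (from Google Drive), contains 174 records.
The truncated file has only a dozen records. Several students in the truncated file belong to more than one group, and several students answered more than one question.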

File Name
b_1_2211011080.txt
b_2_2211011080.txt
b_3_2211011080.txt
c_2_2211011080.cpp
C_2_2211011048.cpp
d_1_2211011048.cpp
a_2_2111011094.txt
a_3_2111011094.txt
GroupC_Q3_2111011094.txt
d _1 _2111011226  (1).cpp
d _2 _2111011226  (2).cpp
d _3 _2111011226  (3).cpp

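Output

The output of this program is sent to two files:

  1. output.csv - the contents of the map. This is the data that was parsed successfully. The file contains three fields: StudentID, GroupName, and QuestionsAnswered.
  2. skipped.csv - any record that could not be parsed is written to this file.

No parsing errors occurred in either run of the program. File skipped.csv was empty after both runs.
Here is the output generated from the truncated input file:

StudentID,GroupName,QuestionsAnswered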
2111011094,a,2,3
2211011080,b,1,2,3
2111011094,c,3
2211011048,c,2
2211011080,c,2
2111011226,d,1,2,3
2211011048,d,1
