Avoiding the first newline in a C++11 raw string literal?

wooyq4lh  posted on 2023-01-15  in  Other

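Raw string literals in C++11 are very nice, except that the obvious way to format them leaves a redundant newline \n as the first character.
Consider this example: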

    some_code();
    std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

The obvious workaround looks very ugly:

    some_code();
    std::string text = R"(This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

Has anyone found an elegant solution?

u1ehiz5o1#

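By adding 1 to the const char* that the string literal automatically converts to, you get a pointer to the second character (skipping the leading newline):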

    some_code();
    std::string text = 1 + R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

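IMHO, the code above is flawed because it breaks the indentation of the surrounding code. Some languages provide a built-in or library function that does something like this:

  • strips an empty leading line, and
  • looks at the indentation of the second line and removes the same amount of indentation from all further lines

That allows usage like this: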

some_code();
std::string text = unindent(R"(
    This is the first line.
    This is the second line.
    This is the third line.
    )");
more_code();

Writing such a function is relatively simple...

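#include <cctype>   // std::isspace
#include <cstddef>  // std::size_t
#include <string>

std::string unindent(const char* p)
{
    std::string result;
    // Drop the newline that directly follows the opening R"(.
    if (*p == '\n') ++p;
    // Measure the indentation of the first real line.
    const char* p_leading = p;
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;
    std::size_t leading_len = p - p_leading;
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            // Skip the indentation of the next line, but only if it is
            // character-for-character the same as on the first line.
            for (std::size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}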

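(The slightly weird p_leading[i] approach is intended to make life for people who mix tabs and spaces no harder than they make it for themselves ;-P, as long as the lines start with the same sequence.)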
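
To make the behaviour concrete, here is a small self-contained check. The unindent below is the function from this answer, repeated only so that the snippet compiles on its own; check_unindent and the sample inputs are made up for illustration. The second call has a tab-indented first line and a space-indented second line, so the prefixes differ and the second line keeps its indentation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

std::string unindent(const char* p)
{
    std::string result;
    if (*p == '\n') ++p;
    const char* p_leading = p;
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;
    std::size_t leading_len = p - p_leading;
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            for (std::size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}

// Hypothetical self-check, not part of the answer.
inline void check_unindent()
{
    // Matching indentation is stripped from every line, including the
    // whitespace-only last line in front of the closing )".
    assert(unindent("\n    a\n    b\n    ") == "a\nb\n");
    // Tab on the first line, spaces on the second: no match, nothing stripped.
    assert(unindent("\n\tx\n    y\n") == "x\n    y\n");
}
```

Note that the indentation is taken from the first line only, so a less indented later line is simply left as it is.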

31moq8wy2#

This is probably not what you want, but just in case, you should be aware of automatic string literal concatenation:

std::string text =
"This is the first line.\n"
"This is the second line.\n"
"This is the third line.\n";

yks3o0rb3#

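I recommend @Brian's answer, especially if you only need a few lines of text, or text that you can handle with your text editor. If that is not the case, I have an alternative.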

std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";

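Raw string literals can still be concatenated with "normal" string literals, as the code shows. The "\ at the start is meant to "eliminate" the " character from the first line by putting it on a line of its own.
Still, if it were up to me, I would put that much text into a separate file and load it at runtime. No pressure on you, though :-).
Also, this is some of the uglier code I have written lately.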
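
A quick way to convince yourself of what the splice produces is a static_assert. This is only a sketch: it assumes C++17 for constexpr std::string_view, and the name spliced and the shortened text are made up for illustration.

```cpp
#include <string_view>

// The backslash has to be the very last character on its line, and the
// following line has to start in column 0; otherwise its indentation
// becomes part of the string.
constexpr std::string_view spliced =
"\
first" R"(
second)";

// The ordinary literal contributes "first", the raw literal "\nsecond".
static_assert(spliced == "first\nsecond");
```

The leading newline is still there inside the raw part; it has just been moved to where a line break is wanted anyway.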

prdp8dxp4#

The closest I can see is:

std::string text = ""
R"(This is the first line.
This is the second line.
This is the third line.
)";

It would be nicer if whitespace were allowed in the delimiter sequence:

std::string text = R"
    (This is the first line.
This is the second line.
This is the third line.
)
    ";

My preprocessor will give you a warning for this, but unfortunately that is not much use. Clang and GCC are thrown off completely.

nszi6y055#

The accepted answer triggers the clang-tidy warning cppcoreguidelines-pro-bounds-constant-array-index. See Pro.bounds: Bounds safety profile for details.
If you don't have std::span but are compiling with at least C++17, consider:

constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
This is the third line.
)").substr(1);

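The main advantages are readability (IMHO) and that you can keep the clang-tidy warning turned on for the rest of your code.
With gcc, if someone inadvertently reduces the raw string to an empty string, this approach gives a compiler error (demo), whereas the accepted approach either produces nothing (demo) or, depending on the compiler settings, an "outside bounds of constant string" warning.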
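
If you would rather tolerate a literal that is empty or does not start with a newline, a small constexpr helper can make the skip conditional. This is a sketch under that assumption, not part of the approach above; strip_leading_newline and guarded_text are made-up names, and it gives up the compile error described in the previous paragraph.

```cpp
#include <string_view>

// Hypothetical helper: drops exactly one leading '\n' if there is one,
// and otherwise returns the view unchanged.
constexpr std::string_view strip_leading_newline(std::string_view s) {
    if (!s.empty() && s.front() == '\n')
        s.remove_prefix(1);
    return s;
}

constexpr auto guarded_text = strip_leading_newline(R"(
This is the first line.
This is the second line.
)");

static_assert(guarded_text == "This is the first line.\nThis is the second line.\n");
static_assert(strip_leading_newline("").empty());
```

Whether silently accepting an empty literal is an improvement depends on whether you want that mistake to be caught at compile time.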

ecfdbz9o6#

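I ran into the same problem, and I think the following solution is the best of all the ones above. I hope it helps you too (see the example in the comment):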

#include <cstddef>  // std::size_t
#include <sstream>  // std::stringstream
#include <string>

/**
 * Strips a multi-line string's indentation prefix.
 *
 * Example:
 * \code
 *   std::string s = R"(|line one
 *                      |line two
 *                      |line three
 *                      |)"_multiline;
 *   std::cout << s;
 * \endcode
 *
 * This prints three lines: @c "line one\nline two\nline three\n"
 *
 * @author Christian Parpart <christian@parpart.family>
 */
inline std::string operator ""_multiline(const char* text, std::size_t /*size*/) {
  if (!*text)
    return {};

  enum class State {
    LineData,
    SkipUntilPrefix,
  };

  constexpr char LF = '\n';
  State state = State::LineData;
  std::stringstream sstr;
  char sep = *text++;

  while (*text) {
    switch (state) {
      case State::LineData: {
        if (*text == LF) {
          state = State::SkipUntilPrefix;
          sstr << *text++;
        } else {
          sstr << *text++;
        }
        break;
      }
      case State::SkipUntilPrefix: {
        if (*text == sep) {
          state = State::LineData;
          text++;
        } else {
          text++;
        }
        break;
      }
    }
  }

  return sstr.str();
}

tp5buhyn7#

Yes, this is annoying. Maybe there should be raw literals (R"PREFIX(") and multiline raw literals (M"PREFIX).
I came up with this alternative, which almost describes itself:

#include<iterator> // std::next
...
{
    ...
    ...
    std::string atoms_text = 
std::next/*_line*/(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");
    assert( atoms_text[0] != '\n' );
    ...
}

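Limitations:
1. If the raw literal is empty, it produces an invalid string, but that should be obvious.
2. If the raw literal does not start with a newline, it drops the first character.
3. std::next is constexpr only from C++17 on; otherwise you can use 1+(char const*)R"XYZ(", but that is less clear and may produce a warning.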
constexpr auto atom_text = 1 + (R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");

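Also, no guarantees ;). After all, I don't know whether doing arithmetic on a pointer to static data is legal.
Another advantage of the + 1 approach is that it can be placed at the end: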

constexpr auto atom_text = R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ" + 1;
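
On the legality question: pointer arithmetic is well-defined as long as the result stays inside the array (or one past its end), and a string literal is such an array. One way to gain some confidence is a static_assert, because an expression whose evaluation has undefined behaviour in the core language is not a constant expression and would be rejected. A shortened sketch (the name tail and the sample text are made up):

```cpp
#include <string_view>

constexpr const char* tail = R"XYZ(
  O123 12.48
)XYZ" + 1;

// These only compile if the arithmetic above stayed inside the literal.
static_assert(tail[0] == ' ');
static_assert(std::string_view(tail) == "  O123 12.48\n");
```

This says nothing about the empty-literal case from the limitations list; there the pointer ends up one past the end and must not be read.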

The possibilities are endless.

qzlgjiam8#

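In C++20 this can be done entirely at compile time with a string literal operator template.
This has several major benefits:

  • Only the unindented string is stored in the resulting binary.
  • No allocations and zero runtime overhead.
  • The resulting value is a reference to a character array (const char (&)[N]), just like an ordinary string literal in C++, so there are no std::array shenanigans and no lifetime issues.

Usage example: godbolt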

std::cout << R"(
     a
    b
     c
    d
)"_M << std::endl;
/* Will print the following:
 a
b
 c
d
*/

// The type of R"(...)"_M is const char (&)[N],
// so it can be used like a normal string literal:
std::cout << std::size(R"(asdf)"_M) << std::endl;
// (will print 5)
constexpr std::string_view str = R"(
  foo
  bar
)"_M;
// str == "foo\nbar"

// also works with wchar_t, char8_t, char16_t and char32_t literals:
std::wcout << LR"(
  foo
  bar
)"_M;
std::wcout << std::endl;

Normally it is not possible to pass a string literal as a template argument, e.g.:

template<const char* str>
void foo();

// ill-formed
foo<"bar">();
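
For comparison, a pointer template parameter does accept a named array with static storage duration; it is only the literal itself that is rejected. A small sketch (first_char and bar_text are made-up names):

```cpp
// A function template with a pointer non-type template parameter.
template<const char* str>
constexpr char first_char() { return str[0]; }

// A named array at namespace scope is an acceptable template argument...
constexpr char bar_text[] = "bar";
static_assert(first_char<bar_text>() == 'b');

// ...whereas first_char<"bar">() does not compile, because a string
// literal cannot be the argument for a pointer template parameter.
```

That works, but it forces every string into a separately named variable, which is exactly what a literal suffix is supposed to avoid.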

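In C++20, however, template parameters can be of class type, and such parameters can be initialized from string literals.
Combined with the new string literal operator template, this makes it possible to receive the whole string literal as a template argument: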

template<class _char_type, std::size_t size>
struct string_wrapper {
    using char_type = _char_type;

    consteval string_wrapper(const char_type (&arr)[size]) {
        std::ranges::copy(arr, str);
    }

    char_type str[size];
};

template<string_wrapper str>
consteval decltype(auto) operator""_M() {
    /*...*/
}

// R"(foobar)"_M
// would now result in the following code:
// operator""_M<string_wrapper<char, 7>{"foobar"}>()

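Having both the length and the individual characters available as constant expressions lets us compute the required length of the unindented string at compile time and store the resulting string in another template parameter (so that we only need to return a reference to the final string value):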

// unindents the individual lines of a raw string literal
// e.g. unindent_string("  \n  a\n  b\n  c\n") -> "a\nb\nc"
template<class char_type>
consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
    /* ... */
}

// returns the size required for the unindented string
template<class char_type>
consteval std::size_t unindent_string_size(string_view<char_type> str) {
    /* ... */
}

// used for sneakily creating and storing
// the unindented string in a template parameter.
template<string_wrapper sw>
struct unindented_string_wrapper {
    using char_type = typename decltype(sw)::char_type;
    static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
    using array_ref = const char_type (&)[buffer_size];

    consteval unindented_string_wrapper(int) {
        auto newstr = unindent_string<char_type>(sw.str);
        std::ranges::copy(newstr, buffer);
    }

    consteval array_ref get() const {
        return buffer;
    }

    char_type buffer[buffer_size];
};

// uses a defaulted template argument that depends on the str
// to initialize the unindented string within a template parameter.
// this enables us to return a reference to the unindented string.
template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
consteval decltype(auto) do_unindent() {
    return unindented.get();
}

// the actual user-defined string literal operator
template<string_wrapper str>
consteval decltype(auto) operator""_M() {
    return do_unindent<str>();
}

Full code: godbolt

#include <algorithm>
#include <cstddef>      // std::size_t
#include <string_view>
#include <utility>      // std::declval
#include <vector>
#include <ranges>

namespace multiline_raw_string {
    template<class char_type>
    using string_view = std::basic_string_view<char_type>;

    // characters that are considered space
    // we need this because std::isspace is not constexpr
    template<class char_type>
    constexpr string_view<char_type> space_chars = std::declval<string_view<char_type>>();
    template<>
    constexpr string_view<char> space_chars<char> = " \f\n\r\t\v";
    template<>
    constexpr string_view<wchar_t> space_chars<wchar_t> = L" \f\n\r\t\v";
    template<>
    constexpr string_view<char8_t> space_chars<char8_t> = u8" \f\n\r\t\v";
    template<>
    constexpr string_view<char16_t> space_chars<char16_t> = u" \f\n\r\t\v";
    template<>
    constexpr string_view<char32_t> space_chars<char32_t> = U" \f\n\r\t\v";
    
    
    // list of all potential line endings that could be encountered
    template<class char_type>
    constexpr string_view<char_type> potential_line_endings[] = std::declval<string_view<char_type>[]>();
    template<>
    constexpr string_view<char> potential_line_endings<char>[] = {
        "\r\n",
        "\r",
        "\n"
    };
    template<>
    constexpr string_view<wchar_t> potential_line_endings<wchar_t>[] = {
        L"\r\n",
        L"\r",
        L"\n"
    };
    template<>
    constexpr string_view<char8_t> potential_line_endings<char8_t>[] = {
        u8"\r\n",
        u8"\r",
        u8"\n"
    };
    template<>
    constexpr string_view<char16_t> potential_line_endings<char16_t>[] = {
        u"\r\n",
        u"\r",
        u"\n"
    };
    template<>
    constexpr string_view<char32_t> potential_line_endings<char32_t>[] = {
        U"\r\n",
        U"\r",
        U"\n"
    };

    // null-terminator for the different character types
    template<class char_type>
    constexpr char_type null_char = std::declval<char_type>();
    template<>
    constexpr char null_char<char> = '\0';
    template<>
    constexpr wchar_t null_char<wchar_t> = L'\0';
    template<>
    constexpr char8_t null_char<char8_t> = u8'\0';
    template<>
    constexpr char16_t null_char<char16_t> = u'\0';
    template<>
    constexpr char32_t null_char<char32_t> = U'\0';

    // detects the line ending used within a string.
    // e.g. detect_line_ending("foo\nbar\nbaz") -> "\n"
    template<class char_type>
    consteval string_view<char_type> detect_line_ending(string_view<char_type> str) {
        return *std::ranges::max_element(
            potential_line_endings<char_type>,
            {},
            [str](string_view<char_type> line_ending) {
                // count the number of lines we would get with line_ending
                auto view = std::views::split(str, line_ending);
                return std::ranges::distance(view);
            }
        );
    }

    // returns a view to the leading sequence of space characters within a string
    // e.g. get_leading_space_sequence(" \t  foo") -> " \t  "
    template<class char_type>
    consteval string_view<char_type> get_leading_space_sequence(string_view<char_type> line) {
        return line.substr(0, line.find_first_not_of(space_chars<char_type>));
    }

    // checks if a line consists purely out of space characters
    // e.g. is_line_empty("    \t") -> true
    //      is_line_empty("   foo") -> false
    template<class char_type>
    consteval bool is_line_empty(string_view<char_type> line) {
        return get_leading_space_sequence(line).size() == line.size();
    }

    // splits a string into individual lines
    // and removes the first & last line if they are empty
    // e.g. split_lines("\na\nb\nc\n", "\n") -> {"a", "b", "c"}
    template<class char_type>
    consteval std::vector<string_view<char_type>> split_lines(
        string_view<char_type> str,
        string_view<char_type> line_ending
    ) {
        std::vector<string_view<char_type>> lines;

        for (auto line : std::views::split(str, line_ending)) {
            lines.emplace_back(line.begin(), line.end());
        }

        // remove first/last lines in case they are completely empty
        if(lines.size() > 1 && is_line_empty(lines[0])) {
            lines.erase(lines.begin());
        }
        if(lines.size() > 1 && is_line_empty(lines[lines.size()-1])) {
            lines.erase(lines.end()-1);
        }

        return lines;
    }

    // determines the longest possible sequence of space characters
    // that we can remove from each line.
    // e.g. determine_common_space_prefix_sequence({" \ta", " foo", " \t\tbar"}) -> " "
    template<class char_type>
    consteval string_view<char_type> determine_common_space_prefix_sequence(
        std::vector<string_view<char_type>> const& lines
    ) {
        std::vector<string_view<char_type>> space_sequences = {
            string_view<char_type>{} // empty string
        };

        for(string_view<char_type> line : lines) {
            string_view<char_type> spaces = get_leading_space_sequence(line);
            for(std::size_t len = 1; len <= spaces.size(); len++) {
                space_sequences.emplace_back(spaces.substr(0, len));
            }
   
            // remove duplicates
            std::ranges::sort(space_sequences);
            auto [first, last] = std::ranges::unique(space_sequences);
            space_sequences.erase(first, last);
        }

        // only consider space prefix sequences that apply to all lines
        auto shared_prefixes = std::views::filter(
            space_sequences,
            [&lines](string_view<char_type> prefix) {
                return std::ranges::all_of(
                    lines,
                    [&prefix](string_view<char_type> line) {
                        return line.starts_with(prefix);
                    }
                );
            }
        );

        // select the longest possible space prefix sequence
        return *std::ranges::max_element(
            shared_prefixes,
            {},
            &string_view<char_type>::size
        );
    }

    // unindents the individual lines of a raw string literal
    // e.g. unindent_string("  \n  a\n  b\n  c\n") -> "a\nb\nc"
    template<class char_type>
    consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
        string_view<char_type> line_ending = detect_line_ending(str);
        std::vector<string_view<char_type>> lines = split_lines(str, line_ending);
        string_view<char_type> common_space_sequence = determine_common_space_prefix_sequence(lines);

        std::vector<char_type> new_string;
        bool is_first = true;
        for(auto line : lines) {
            // append newline
            if(is_first) {
                is_first = false;
            } else {
                new_string.insert(new_string.end(), line_ending.begin(), line_ending.end());
            }

            // append unindented line
            auto unindented = line.substr(common_space_sequence.size());
            new_string.insert(new_string.end(), unindented.begin(), unindented.end());
        }

        // add null terminator
        new_string.push_back(null_char<char_type>);

        return new_string;
    }

    // returns the size required for the unindented string
    template<class char_type>
    consteval std::size_t unindent_string_size(string_view<char_type> str) {
        return unindent_string(str).size();
    }

    // simple type that stores a raw string
    // we need this to get around the limitation that string literals
    // are not considered valid non-type template arguments.
    template<class _char_type, std::size_t size>
    struct string_wrapper {
        using char_type = _char_type;

        consteval string_wrapper(const char_type (&arr)[size]) {
            std::ranges::copy(arr, str);
        }

        char_type str[size];
    };

    // used for sneakily creating and storing
    // the unindented string in a template parameter.
    template<string_wrapper sw>
    struct unindented_string_wrapper {
        using char_type = typename decltype(sw)::char_type;
        static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
        using array_ref = const char_type (&)[buffer_size];

        consteval unindented_string_wrapper(int) {
            auto newstr = unindent_string<char_type>(sw.str);
            std::ranges::copy(newstr, buffer);
        }

        consteval array_ref get() const {
            return buffer;
        }

        char_type buffer[buffer_size];
    };

    // uses a defaulted template argument that depends on the str
    // to initialize the unindented string within a template parameter.
    // this enables us to return a reference to the unindented string.
    template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
    consteval decltype(auto) do_unindent() {
        return unindented.get();
    }

    // the actual user-defined string literal operator
    template<string_wrapper str>
    consteval decltype(auto) operator""_M() {
        return do_unindent<str>();
    }
}

using multiline_raw_string::operator""_M;
