gcc: Why is wcrtomb ASCII-only?

Posted by bmp9r5qi on 2024-01-08 in Other


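On my system, wcrtomb() seems to treat "narrow multibyte representation" as meaning "ASCII only", even when I compile with -fexec-charset=utf-8. My impression was that the -fexec-charset gcc flag controls what "narrow multibyte representation" means, and that wcrtomb converts from the "wide character set" to the "narrow multibyte representation". If the "narrow multibyte representation" is UTF-8 and the "wide character set" is UTF-32, then wcrtomb should convert from UTF-32 to UTF-8. I know the practical answer is probably to just use an explicit UTF-32 to UTF-8 conversion instead of depending on the "wide character set" and "narrow multibyte representation". I want to know why this doesn't do what I expect.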

#include <clocale>
#include <cstdlib>   // MB_CUR_MAX
#include <cwchar>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Note: no setlocale() call, so the program runs in the "C" locale.
    wchar_t max = 0x10FFFF;
    std::vector<char> out(MB_CUR_MAX * max);
    char *end = &out[0];
    for (wchar_t c = 0; c < max; ++c) {
        std::mbstate_t state{};
        std::size_t ret = std::wcrtomb(end, c, &state);
        // Skip characters that cannot be represented in the current locale.
        if (ret != static_cast<std::size_t>(-1)) {
            end += ret;
        }
    }
    std::ofstream outfile("out", std::ios::out | std::ios::binary);
    outfile.write(&out[0], end - &out[0]);
    return 0;
}

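What I've tried:
1. Setting -fexec-charset=utf-8, even though the gcc documentation says that is the default
2. Setting -fwide-exec-charset=utf-32le, even though that already appears to be the case
3. Setting LC_ALL=en_US.UTF-8 for both compilation and execution
4. Compiling with clang instead of gcc (it does not support -fwide-exec-charset, but printing __clang_wide_literal_encoding__ shows UTF-32)

System information:
Ubuntu 22.04.3 LTS
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Ubuntu clang version 14.0.0-1ubuntu1.1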


Source: https://stackoverflow.com/questions/77520909/why-is-wcrtomb-ascii-only

1 answer


#1, by 6kkfgxo0

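"Why is wcrtomb ASCII-only?"
Because the locale in your program is "C". The initial locale of a C program at startup is "C", and in that locale the multibyte encoding is ASCII. The conversion is locale-dependent. If you want to inherit the locale from the environment, call setlocale(LC_ALL, ""). See the setlocale and locale.h documentation. The example you linked sets the locale; your code does not.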
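"The -fexec-charset gcc flag controls what 'narrow multibyte representation' means"
No. -fexec-charset selects the encoding the compiler uses when translating string literals in the source code, such as "π", into bytes in the binary. -fwide-exec-charset does the same for wide literals such as L"π". The C standard library functions choose the multibyte character encoding based on the locale.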

Answered 2024-01-08
