Exporting Cyrillic characters from R?

Posted by smdnsysy on 2022-12-06 in Other

I have a dataset in which one column contains Russian words:

raw_data2 = structure(list(word = c("абрикос",
                                    "автомобиль",
                                    "аист",
                                    "ананас",
                                    "апрель",
                                    "атака",
                                    "баклажан"),
                           subject_nr = c(3L, 21L, 12L, 17L, 8L, 1L, 17L),
                           acc = c(98.976109215, 91.8803418803, 94.8979591837,
                                   94.5273631841, 94.4444444444, 94.5355191257,
                                   94.3661971831)),
                      row.names = c(1L, 100L, 200L, 300L, 400L, 500L, 600L),
                      class = "data.frame")

When I view the data in RStudio, everything displays correctly.

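However, when I export the data to a table for further processing in Excel, I get this mess of Unicode escape codes, which Excel cannot convert back into Russian words (even when I select UTF-8 during the data import):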

"word";"subject_nr";"acc"
"<U+0430><U+0431><U+0440><U+0438><U+043A><U+043E><U+0441>";3;98,976109215
"<U+0430><U+0432><U+0442><U+043E><U+043C><U+043E><U+0431><U+0438><U+043B><U+044C>";21;91,8803418803
"<U+0430><U+0438><U+0441><U+0442>";12;94,8979591837
"<U+0430><U+043D><U+0430><U+043D><U+0430><U+0441>";17;94,5273631841
"<U+0430><U+043F><U+0440><U+0435><U+043B><U+044C>";8;94,4444444444
"<U+0430><U+0442><U+0430><U+043A><U+0430>";1;94,5355191257
"<U+0431><U+0430><U+043A><U+043B><U+0430><U+0436><U+0430><U+043D>";17;94,3661971831

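Is there a way to force R to write the corresponding Cyrillic letters instead of these strings when saving the table? It clearly "knows" what the letters are, since it displays them in the preview. I used the following code (which does not work):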

write.table(raw_data2,
            file = "raw_data2.csv",
            append = FALSE,
            quote = TRUE,
            sep = ";",
            eol = "\n",
            na = "NA",
            dec = ",",
            row.names = FALSE,
            col.names = TRUE,
            qmethod = c("escape", "double"),
            fileEncoding = "UTF-8")
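The <U+...> escapes are most likely produced because R converts each string to the session's native encoding before writing it. On Windows with R older than 4.2, the native encoding is a single-byte code page (for example Windows-1252) with no Cyrillic letters, so fileEncoding = "UTF-8" only re-encodes text that has already been escaped; l10n_info()[["UTF-8"]] reports whether the native encoding is UTF-8. Below is a minimal base-R sketch that sidesteps that conversion, assuming the goal is a semicolon-separated, comma-decimal file for Excel. The helper name write_utf8_csv2 is hypothetical: it builds each line itself, converts it to UTF-8 with enc2utf8(), writes the bytes unchanged with useBytes = TRUE, and prepends a UTF-8 byte-order mark, which Excel typically uses to detect the encoding when a CSV is opened directly. It does not replicate write.table() options such as qmethod.

```r
# Hypothetical helper (not part of base R): write a data frame as a
# semicolon-separated, comma-decimal CSV encoded as UTF-8 with a BOM.
# Strings are converted to UTF-8 explicitly and written with useBytes = TRUE,
# so they are never translated through the (possibly non-Unicode) native encoding.
write_utf8_csv2 <- function(df, path) {
  # Wrap values in double quotes and double any embedded quotes.
  quote_field <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

  cols <- lapply(df, function(col) {
    if (is.numeric(col)) {
      # Comma as decimal mark, matching dec = "," in the question.
      out <- sub(".", ",", as.character(col), fixed = TRUE)
    } else {
      out <- quote_field(enc2utf8(as.character(col)))
    }
    out[is.na(col)] <- "NA"  # unquoted NA, like write.table's default na = "NA"
    out
  })

  header <- paste(quote_field(enc2utf8(names(df))), collapse = ";")
  body <- do.call(paste, c(unname(cols), sep = ";"))
  lines <- enc2utf8(c(header, body))

  con <- file(path, open = "wb")  # binary mode: no re-encoding, LF line endings
  on.exit(close(con))
  writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # UTF-8 byte-order mark
  writeLines(lines, con, sep = "\n", useBytes = TRUE)
  invisible(path)
}
```

Usage with the data above: write_utf8_csv2(raw_data2, "raw_data2.csv").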
Answer 1

If you write it to an xlsx file instead, it works fine for me:

openxlsx::write.xlsx(raw_data2, 'temp.xlsx')
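One way to confirm that the .xlsx route keeps the Cyrillic words intact is to read the file back with openxlsx::read.xlsx() and compare it with the original. This is a sketch assuming the openxlsx package is installed; the helper name export_and_check_xlsx is hypothetical, and it compares only character columns, since numeric columns may come back as doubles rather than integers.

```r
# Hypothetical helper: export with openxlsx and check that every character
# column survives a write/read round trip unchanged.
# Requires the openxlsx package (install.packages("openxlsx")).
export_and_check_xlsx <- function(df, path) {
  openxlsx::write.xlsx(df, path)
  back <- openxlsx::read.xlsx(path)
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  # TRUE only if each character column reads back identically.
  all(vapply(chr_cols, function(nm) identical(back[[nm]], df[[nm]]), logical(1)))
}
```

For example, export_and_check_xlsx(raw_data2, "temp.xlsx") should return TRUE if the words were stored correctly.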
Answer 2

For me, Sys.setlocale("LC_CTYPE", "russian") works fine (code source: page 10).
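The locale approach from this answer can be wrapped so that the original LC_CTYPE is restored afterwards, even if write.table() fails. This is a sketch under stated assumptions: the helper name write_table_in_locale is hypothetical; "russian" is a Windows locale name whose code page (Windows-1251) can represent Cyrillic, which is presumably why the switch helps; and the switch is skipped when the native encoding is already UTF-8 (as in R 4.2 and later on recent Windows) or when not running on Windows, where that locale name is not valid.

```r
# Hypothetical helper: temporarily switch LC_CTYPE around write.table(),
# then restore the previous setting on exit.
write_table_in_locale <- function(df, path, locale = "russian",
                                  sep = ";", dec = ",") {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Only switch on Windows, and only if the native encoding is not UTF-8.
  if (.Platform$OS.type == "windows" && !isTRUE(l10n_info()[["UTF-8"]])) {
    Sys.setlocale("LC_CTYPE", locale)
  }
  write.table(df, file = path, sep = sep, dec = dec,
              row.names = FALSE, fileEncoding = "UTF-8")
  invisible(path)
}
```

Usage with the question's data: write_table_in_locale(raw_data2, "raw_data2.csv").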
