tidyverse's read_delim() cannot directly correct a misaligned header in a text file the way base read.table() does

irlmq6kh  posted on 2022-12-05 in Other

I am trying to use tidyverse's read_delim() to read a tab-separated text file. Base R's read.table() reads it without any problem, but read_delim() with delim = "\t" does not. For example, I have the file below, "test.txt". As you can see, the header is misaligned with the data because the first column holds row names and has no header of its own.

T1  T2  T3
A   1   4   7
B   2   5   8
C   3   6   9

I can read this file successfully with base R:

dat <- read.table("test.txt", header = TRUE, sep = "\t")

dat
   T1 T2 T3
A  1  4  7
B  2  5  8
C  3  6  9

But when I tried tidyverse's read_delim(), I ran into problems:

dat1 <- read_delim("test.txt", delim ="\t")
Rows: 3 Columns: 3                                                                                                   
── Column specification ──────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): T1, T3
dbl (1): T2
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)

I know base R's read.table() corrects this automatically, but could someone tell me whether tidyverse's read_delim() has a way to resolve this issue? Thank you! -Xiaokuan

3qpi33ja

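The problem isn't exactly that the header is misaligned; it's that readr doesn't support or recognize row names at all.* readr::read_delim() therefore doesn't account for the fact that the row names have no column header, and simply sees three column names followed by four columns of data.
If your goal is to import the data as a tibble, your best bet is probably to use base::read.table() and then tibble::as_tibble(), using its rownames argument to convert the row names into a regular column.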
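A minimal sketch of that approach follows. It recreates the question's file in a temporary location so it runs on its own; the temporary path and the column name "id" are assumptions made for illustration, not part of the original post.

```r
library(tibble)

# Recreate the example file from the question: tab-separated, with a header
# that has one field fewer than the data rows (hypothetical temp path).
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# read.table() uses the first column as row names when the header is
# one field shorter than the data rows.
dat <- read.table(path, header = TRUE, sep = "\t")

# Promote the row names to an ordinary character column.
# "id" is an arbitrary name chosen for this sketch.
dat_tbl <- as_tibble(dat, rownames = "id")
dat_tbl
```

The result should be a tibble with four columns: the former row names in id, followed by T1, T2 and T3.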
Another option is to manually edit the input file to include a column header above the row names.
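If editing the file by hand is impractical, a similar effect can probably be had without touching it: read the header line separately, prepend a name for the row-name column, and pass the full set of names to read_delim() while skipping the original header. This is only a sketch of that idea, not something from the original answer; the temporary path and the name "id" are assumptions.

```r
library(readr)

# Recreate the example file from the question (hypothetical temp path).
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# Read only the header line and split it on tabs.
header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

# Skip the original header line and supply a complete set of column names,
# with "id" (an arbitrary choice) standing in for the missing first header.
dat2 <- read_delim(
  path,
  delim = "\t",
  skip = 1,
  col_names = c("id", header),
  show_col_types = FALSE
)
dat2
```

Because the number of names now matches the number of fields in each row, the parsing warning from the question should no longer appear.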

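  * Incidentally, this isn't an oversight but an intentional choice by the tidyverse team, who consider row names bad practice. For example, from the tibble documentation: "Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column." See also this interesting discussion on the tibble GitHub.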
