R: How to look at a certain value/category of one variable according to another variable (large dataset)

Posted by x759pob2 on 2023-11-14 in Other


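I am writing this question because of another problem I ran into in RStudio. I have a very large dataset of bird movement data (ACC), with many rows per individual (each row is one timestamp). I need to find out how many individuals are in a given territory. The problem is that, because each individual has many rows, simple functions such as table or summary return the number of rows assigned to the territory. What I want is to find out, with simple functions, which individuals belong to that territory.
Here is what I have done so far: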

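My data frame has a lot of rows, but only about 50 individuals (each with many rows).
There are about 15 territories in total, and each row has a territory ID (repeated across rows).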

I tried using table:

table(df$territory_id) %>% sort(decreasing = TRUE) %>% head

This gives me the output:

ter1  ter2  ter3  ter4  ter5  ter6 
275034 207746 232739 165260 162103 259644

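This gives me the number of rows per territory ID. Because I want to know how many different individuals belong to a territory, I split the territories into separate objects and made a table for each one: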

t <- filter(df, territory_id == "ter1")

Then:

table(t$individualID)

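This gives me the output I want. However, I have to repeat this process for every territory.
Is there a simpler way to do this? I only have 15 territories, but with more territories, repeating the function for each one would take a lot of time.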
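For reference, base R can produce the per-territory results in one call each, instead of one filter() plus table() per territory. This is only a sketch on made-up toy data that borrows the column names from the question (individualID, territory_id); the values are hypothetical. A two-argument table() cross-tabulates rows per territory and individual, tapply() counts distinct individuals per territory, and split() lists which individuals were seen in each.

```r
# Hypothetical toy data using the column names from the question
df <- data.frame(
  individualID = c("b1", "b1", "b1", "b2", "b2", "b3", "b3", "b3"),
  territory_id = c("ter1", "ter1", "ter2", "ter1", "ter1", "ter1", "ter2", "ter2"),
  stringsAsFactors = FALSE
)

# Rows per (territory, individual) for every territory at once
rows_per_pair <- table(df$territory_id, df$individualID)
print(rows_per_pair)

# Number of distinct individuals seen in each territory
birds_per_territory <- tapply(df$individualID, df$territory_id,
                              function(ids) length(unique(ids)))
print(birds_per_territory)

# Which individuals were seen in each territory (one vector per territory)
ids_per_territory <- lapply(split(df$individualID, df$territory_id), unique)
print(ids_per_territory)
```

All three scale with the number of territories without any extra code, because the grouping is done by the second argument rather than by hand.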

Source: https://stackoverflow.com/questions/77350614/how-to-look-at-a-certain-value-category-of-a-variable-according-to-another-one

2 answers

Answer #1 (vshtjzan)

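Your data seems large, so although your head() output shows what the data looks like, it is not very useful (it looks like six timestamps of one bird in one location). So I created my own, which is hopefully still similar: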

library(tidyverse)
library(lubridate) # for ymd_hms(); tidyverse >= 2.0.0 already attaches it
set.seed(0)

df <- data.frame(
    bird_id = rep(1:10, each = 10),
    territory_id = sample(LETTERS[1:10], 100, replace = TRUE),
    timestamp =  ymd_hms("2023-01-01 12:00:00") + sample(1:10000000, 100, replace = TRUE))

> head(df)
  bird_id territory_id           timestamp
1       1            I 2023-03-05 03:57:14
2       1            D 2023-01-01 21:06:37
3       1            G 2023-03-01 07:23:02
4       1            A 2023-02-23 01:09:48
5       1            B 2023-03-29 22:41:45
6       1            G 2023-01-29 03:29:01

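So, while it's clear that you want to analyse your dataset, I'm not sure exactly what you want to do. Here are a few things you might want, and how to do each of them: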

# 1. get the number of birds you have seen at any point in each territory
df |>
  distinct(territory_id, bird_id) |>
  count(territory_id)

# 2. count the number of rows in your dataset for each territory
count(df, territory_id)

# 3. count the number of rows in your dataset for each territory and bird

count(df, territory_id, bird_id)
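Building on option 1, a sketch that returns one row per territory with both the number of distinct birds and which birds they are, so the count and the list of IDs sit in one table. It assumes dplyr and R >= 4.1 (for the native |> pipe); the toy data and the column names n_birds and birds are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the df above
df <- data.frame(
  bird_id      = c(1L, 1L, 2L, 2L, 3L, 4L),
  territory_id = c("A", "B", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# One row per territory: how many distinct birds were seen there, and which
birds_by_territory <- df |>
  distinct(territory_id, bird_id) |>
  group_by(territory_id) |>
  summarise(
    n_birds = n(),
    birds   = paste(sort(bird_id), collapse = ", ")
  )
print(birds_by_territory)
```

distinct() runs first so that a bird with many timestamps in a territory is counted and listed only once.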

Posted 2023-11-14

Answer #2 (gg58donl)

Yes! That's exactly what I wanted to know! Thank you very much! I started from the first piece of code you provided:

df |>
  distinct(territory_id, bird_id) |>
  count(territory_id)

It returns something like this:

  territory_id     n
  <chr>        <int>
1 GR002            2
2 GR009            1
3 GR011            1

and so on.
But here I want to know which individualIDs belong to each territory, so I then changed it to:

df |>
  distinct(territory_id, bird_id) |>
  count(territory_id, bird_id)

It then gives me:

  <chr>        <chr>                    <int>
1 GR002        individual1 (eobs 5860)      1
2 GR002        individual2 (eobs 5861)      1
3 GR009        individual3 (eobs 6483)      1

That gives me what I want. So I just needed to use the count function... Thanks!
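A small sketch on made-up toy data (the IDs below are illustrative only) of why the last pipeline always reports n = 1: after distinct() every (territory_id, bird_id) pair appears exactly once, so count(territory_id, bird_id) only appends a column of 1s, and distinct() on its own already gives the list of birds per territory. Dropping distinct() instead makes count() report how many rows (timestamps) each bird has in each territory. Assumes dplyr and R >= 4.1 for |>.

```r
library(dplyr)

# Hypothetical toy data: individual1 has three rows in GR002
df <- data.frame(
  territory_id = c("GR002", "GR002", "GR002", "GR002", "GR009"),
  bird_id      = c("individual1", "individual1", "individual1",
                   "individual2", "individual3"),
  stringsAsFactors = FALSE
)

# distinct() alone lists each (territory, bird) pair once
pairs <- df |> distinct(territory_id, bird_id)
print(pairs)

# Without distinct(), count() gives rows (timestamps) per bird per territory
rows_per_pair <- df |> count(territory_id, bird_id)
print(rows_per_pair)
```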

Posted 2023-11-14
