Use reframe or complete to generate a dataset based on min/max values in the data

Posted by shstlldc on 2024-01-03 in Other

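I am trying to use the values in one dataset to create another dataset for model predictions.
My dataset has two sites (A and B) with data from different years. Each site covers a different range of years, and there are many individuals (at different ratios across sites and years).
I need the final dataset to include every unique site, every year from that site's minimum to its maximum year, and mass values from the minimum to the maximum in increments of 0.1. For example, site A has data over 5 years and masses ranging from 2 to 5, so there should be 155 combinations (1 site x 5 years x 31 mass values).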
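As a quick sanity check on that count, assuming years 1 to 5 and masses 2.0 to 5.0 for site A as described, base R's expand.grid() can build the grid directly and count its rows (the object names here are illustrative):

```r
# Expected grid for site A: every year from 1 to 5 crossed with
# every mass from 2.0 to 5.0 in steps of 0.1
years_A  <- 1:5
masses_A <- seq(2, 5, by = 0.1)

length(masses_A)   # 31 mass values
grid_A <- expand.grid(site = "A", year = years_A, mass = masses_A)
nrow(grid_A)       # 155 = 1 site x 5 years x 31 masses
```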

library(dplyr)  # %>%, reframe()
library(tidyr)  # complete(), nesting()

# example dataset
df <- data.frame(site = c(rep("A", 20),                      # 20 obs for site A
                          rep("B", 30)),                     # 30 obs for site B
                 year = c(sample(1:5, 20, replace = TRUE),           # 5 years for site A
                          sample(c(1:4, 6:7), 30, replace = TRUE)),  # 6 years for site B, resulting range should span 1-7 (including 5)
                 mass = c(sample(seq(2, 5, 0.1), 20, replace = TRUE),    # different range for A than B
                          sample(seq(1, 6, 0.1), 30, replace = TRUE)))   # different range for A than B

# I've tried using complete, but it doesn't recognize mass
df %>% complete(year, nesting(site), 
                fill = list(seq(min(mass), max(mass), 0.1)))
Error in seq(min(mass), max(mass), 0.1) : object 'mass' not found

# I've also tried reframe, but it doesn't cover the full range of masses
df %>% reframe(year = min(year):max(year), .by = c(site, mass))
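The complete() attempt fails because tidyr's fill argument is not evaluated inside the data frame: it expects a named list of values that replace the NAs that complete() introduces, so mass is looked up outside df and is not found. A sketch of one way around this, assuming dplyr and tidyr are installed and using a small hypothetical data frame toy in place of df, moves the sequences into the column specifications (where column names are visible) and groups by site so each site gets its own ranges. It uses tidyr's expand() rather than complete() so that repeated observations in the data are not carried into the grid:

```r
library(dplyr)  # group_by(), ungroup(), count(), %>%
library(tidyr)  # expand(), full_seq()

# Hypothetical stand-in for df, same columns (site, year, mass)
toy <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  year = c(1, 3, 5, 1, 4, 7),
  mass = c(2.0, 3.4, 5.0, 1.0, 2.5, 6.0)
)

grid <- toy %>%
  group_by(site) %>%                      # expand() works within each site
  expand(year = full_seq(year, 1),        # every year from min to max
         mass = full_seq(mass, 0.1)) %>%  # every mass in 0.1 steps
  ungroup()

count(grid, site)  # A: 5 x 31 = 155 rows, B: 7 x 51 = 357 rows
```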

Source: https://stackoverflow.com/questions/77705643/use-reframe-or-complete-to-generate-dataset-based-on-min-max-values-in-data

1 Answer

jvidinwx, answer #1

You could build a seq along the range of year and of mass within each site, then combine them with expand.grid:

> res <-
+   by(df, df$site, \(x) 
+      cbind(site=x$site[1], 
+            expand.grid(year=do.call('seq.int', c(as.list(range(x$year)), 1)),
+                        mass=do.call('seq.int', c(as.list(range(x$mass)), .1))))) |>
+   do.call(what='rbind')
> 
> by(res, res$site, summary)
res$site: A
     site                year        mass     
 Length:130         Min.   :1   Min.   :2.00  
 Class :character   1st Qu.:2   1st Qu.:2.60  
 Mode  :character   Median :3   Median :3.25  
                    Mean   :3   Mean   :3.25  
                    3rd Qu.:4   3rd Qu.:3.90  
                    Max.   :5   Max.   :4.50  
--------------------------------------------------------------------------- 
res$site: B
     site                year        mass      
 Length:336         Min.   :1   Min.   :1.100  
 Class :character   1st Qu.:2   1st Qu.:2.275  
 Mode  :character   Median :4   Median :3.450  
                    Mean   :4   Mean   :3.450  
                    3rd Qu.:6   3rd Qu.:4.625  
                    Max.   :7   Max.   :5.800
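The row counts in the summary follow directly from the site ranges in the data: site A spans years 1 to 5 and masses 2.0 to 4.5, and site B spans years 1 to 7 and masses 1.1 to 5.8. A small hypothetical helper, grid_size(), reproduces them from those ranges alone:

```r
# Hypothetical helper: number of rows in a year x mass grid
# built from a year range and a mass range
grid_size <- function(years, masses, step = 0.1) {
  n_years  <- length(seq(years[1], years[2], by = 1))
  n_masses <- length(seq(masses[1], masses[2], by = step))
  n_years * n_masses
}

grid_size(c(1, 5), c(2.0, 4.5))  # site A: 5 x 26 = 130
grid_size(c(1, 7), c(1.1, 5.8))  # site B: 7 x 48 = 336
```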

Data:

> dput(df)
structure(list(site = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B"), year = c(1L, 5L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 
4L, 1L, 5L, 4L, 2L, 2L, 3L, 1L, 1L, 3L, 4L, 6L, 6L, 6L, 4L, 2L, 
4L, 3L, 2L, 1L, 2L, 7L, 3L, 7L, 2L, 4L, 4L, 7L, 2L, 6L, 4L, 6L, 
4L, 2L, 2L, 3L, 1L, 6L, 2L, 2L, 7L), mass = c(2.5, 2.1, 3.9, 
2.2, 4.1, 4, 2.1, 4.2, 2.5, 4.5, 2.9, 2.7, 2.4, 2, 3.6, 2.6, 
2.3, 3.2, 2.9, 2.8, 3.8, 2.1, 2.9, 1.8, 5.2, 4.4, 3.8, 2.5, 4.6, 
3.7, 5.5, 1.4, 3.7, 1.1, 2.7, 3.3, 5.8, 2.7, 1.4, 5.5, 4.9, 4.9, 
3, 4.5, 4.5, 4.8, 5.1, 2.7, 3.6, 2.2)), class = "data.frame", row.names = c(NA, 
-50L))
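Since the question asks about reframe(), a dplyr counterpart of the answer may also be useful. The original reframe() call put mass in .by, so it only ever saw masses that were actually recorded, and it computed the year range within each site-mass group rather than per site. A sketch that groups by site alone, assuming dplyr 1.1.0 or later (for reframe() and .by) and tidyr (for expand_grid()), returns an unnamed data frame from each group, which reframe() unpacks into year and mass columns. A small hypothetical data frame toy stands in for df; applied to the df above, the same call should give the 130 and 336 rows per site shown in the summary.

```r
library(dplyr)  # reframe() with .by requires dplyr >= 1.1.0
library(tidyr)  # expand_grid()

# Hypothetical stand-in for df
toy <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  year = c(1, 3, 5, 1, 4, 7),
  mass = c(2.0, 3.4, 5.0, 1.0, 2.5, 6.0)
)

grid <- toy %>%
  reframe(
    # an unnamed data frame result is unpacked into year and mass columns
    expand_grid(year = seq(min(year), max(year), by = 1),
                mass = seq(min(mass), max(mass), by = 0.1)),
    .by = site
  )

nrow(grid)  # 155 rows for A + 357 rows for B = 512
```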

Answered 2024-01-03
