Using reframe or complete to generate a dataset from the min/max values in the data

Posted by shstlldc on 2024-01-03 in Other

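I'm trying to use the values in one dataset to create another dataset for model predictions.
My dataset has two sites (A and B), data from different years with a different range of years at each site, and a large number of individuals (the number per site and year also varies).
I need the final dataset to contain, for each site, every unique combination of the site, each year from that site's minimum to maximum year, and mass values from that site's minimum to maximum in increments of 0.1. For example, site A has data over 5 years and masses ranging from 2 to 5, so it should have 155 combinations (1 site x 5 years x 31 mass values).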
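
As a quick sanity check of the expected count for a site with 5 years and masses from 2 to 5, here is a minimal base R sketch. The names `years_a` and `mass_a` are illustrative only and are not part of the original post.

```r
# Hypothetical names; the ranges are the ones stated for site A above
years_a <- 1:5                    # 5 years
mass_a  <- seq(2, 5, by = 0.1)    # masses 2.0, 2.1, ..., 5.0

n_mass <- length(mass_a)                                      # number of mass values
n_rows <- nrow(expand.grid(year = years_a, mass = mass_a))    # years x mass values
```

Here `n_mass` is 31 and `n_rows` is 155 (5 x 31).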

library(dplyr)  # reframe(), %>%
library(tidyr)  # complete(), nesting()

# example dataset
df <- data.frame(site = c(rep("A", 20),                      # 20 obs for site A
                          rep("B", 30)),                     # 30 obs for site B
                 year = c(sample(1:5, 20, replace = TRUE),           # 5 years for site A
                          sample(c(1:4, 6:7), 30, replace = TRUE)),  # 6 years for site B, resulting range should span 1-7 (including 5)
                 mass = c(sample(seq(2, 5, 0.1), 20, replace = TRUE),    # mass range 2-5 for site A
                          sample(seq(1, 6, 0.1), 30, replace = TRUE)))   # mass range 1-6 for site B (differs from A)

# I've tried using complete, but it doesn't recognize mass
df %>% complete(year, nesting(site), 
                fill = list(seq(min(mass), max(mass), 0.1)))
Error in seq(min(mass), max(mass), 0.1) : object 'mass' not found

# I've also tried reframe, but it doesn't cover the full range of masses
df %>% reframe(year = min(year):max(year), .by = c(site, mass))


Answer #1, by jvidinwx

You could build `expand.grid`s from `seq`uences along the `range`s of year and mass, separately for each site.

> res <-
+   by(df, df$site, \(x) 
+      cbind(site=x$site[1], 
+            expand.grid(year=do.call('seq.int', c(as.list(range(x$year)), 1)),
+                        mass=do.call('seq.int', c(as.list(range(x$mass)), .1))))) |>
+   do.call(what='rbind')
> 
> by(res, res$site, summary)
res$site: A
     site                year        mass     
 Length:130         Min.   :1   Min.   :2.00  
 Class :character   1st Qu.:2   1st Qu.:2.60  
 Mode  :character   Median :3   Median :3.25  
                    Mean   :3   Mean   :3.25  
                    3rd Qu.:4   3rd Qu.:3.90  
                    Max.   :5   Max.   :4.50  
--------------------------------------------------------------------------- 
res$site: B
     site                year        mass      
 Length:336         Min.   :1   Min.   :1.100  
 Class :character   1st Qu.:2   1st Qu.:2.275  
 Mode  :character   Median :4   Median :3.450  
                    Mean   :4   Mean   :3.450  
                    3rd Qu.:6   3rd Qu.:4.625  
                    Max.   :7   Max.   :5.800
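
Since the question asks about `reframe`, here is a hedged dplyr/tidyr sketch of the same idea. It assumes dplyr >= 1.1.0 (for `reframe()` and `.by`). The helper `make_grid()` and the names `df_small` and `pred_grid` are hypothetical, and `df_small` is a small stand-in with the same per-site ranges as the data in this thread, not the original data. For context, the `complete()` attempt fails because `fill` only supplies replacement values for `NA`s and is evaluated outside the data frame, so `mass` is not found there.

```r
library(dplyr)   # reframe() with .by, assumes dplyr >= 1.1.0
library(tidyr)   # expand_grid()

# Stand-in data (assumption): same min/max year and mass per site as in the thread
df_small <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  year = c(1L, 3L, 5L, 1L, 4L, 7L),
  mass = c(2.0, 3.1, 4.5, 1.1, 2.7, 5.8)
)

# Hypothetical helper: full year x mass grid for one site's observations
make_grid <- function(year, mass, step = 0.1) {
  expand_grid(
    year = seq(min(year), max(year)),
    # round to 1 decimal (matches the 0.1 step) to limit floating-point drift
    mass = round(seq(min(mass), max(mass), by = step), 1)
  )
}

# reframe() may return several rows per group; the unnamed data frame
# returned by make_grid() is unpacked into year and mass columns
pred_grid <- df_small %>%
  reframe(make_grid(year, mass), .by = site)

count(pred_grid, site)
```

With these ranges `pred_grid` has 130 rows for site A and 336 for site B, the same lengths as in the summary above.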


Data:
> dput(df)
structure(list(site = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B"), year = c(1L, 5L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 
4L, 1L, 5L, 4L, 2L, 2L, 3L, 1L, 1L, 3L, 4L, 6L, 6L, 6L, 4L, 2L, 
4L, 3L, 2L, 1L, 2L, 7L, 3L, 7L, 2L, 4L, 4L, 7L, 2L, 6L, 4L, 6L, 
4L, 2L, 2L, 3L, 1L, 6L, 2L, 2L, 7L), mass = c(2.5, 2.1, 3.9, 
2.2, 4.1, 4, 2.1, 4.2, 2.5, 4.5, 2.9, 2.7, 2.4, 2, 3.6, 2.6, 
2.3, 3.2, 2.9, 2.8, 3.8, 2.1, 2.9, 1.8, 5.2, 4.4, 3.8, 2.5, 4.6, 
3.7, 5.5, 1.4, 3.7, 1.1, 2.7, 3.3, 5.8, 2.7, 1.4, 5.5, 4.9, 4.9, 
3, 4.5, 4.5, 4.8, 5.1, 2.7, 3.6, 2.2)), class = "data.frame", row.names = c(NA, 
-50L))
