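For the data below, I would like to do the following:
[1] Remove the observations for any id that has only one repeated measurement/observation
[2] If a remaining id has fewer than 5 repeated measurements, repeat its observations until there are at least 5 (for example, if an id has 2 rows, repeat them 3 times)
Data: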
structure(list(id = c("0101", "0102", "0102", "0103", "0103",
"0103", "0104", "0104", "0104", "0104", "0104", "0105", "0105",
"0105", "0105", "0105", "0106", "0106", "0106", "0106", "0106",
"0107", "0107", "0107", "0107", "0107", "0108", "0108", "0108",
"0108"), date = c("10/01/91", "12/03/91", "05/05/92", "06/22/92",
"12/17/92", "07/14/93", "07/28/92", "01/14/93", "08/11/93", "02/03/94",
"08/23/94", "09/24/92", "03/05/93", "10/18/93", "04/14/94", "05/31/94",
"01/13/93", "07/27/93", "03/10/94", "09/01/94", "03/09/95", "01/15/93",
"07/23/93", "02/07/94", "07/28/94", "02/07/95", "03/19/93", "10/04/93",
"05/17/94", "11/15/94"), y = c(0, 0, 9, 0, -11, -11, 0, 10, 9,
4, 5, 0, -7, -17, -13, -17, 0, 6, 6, 1, 3, 0, -9, -13, -18, -17,
0, -8, -8, -10)), row.names = c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L), class = "data.frame")
4 Answers
pbpqsu0x1#
This looks straightforward, but I may have misunderstood; does this solve your problem?
Created on 2023-04-20 with reprex v2.0.2
nxowjjhe2#
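This removes groups with n = 1 and recycles the rows of groups with fewer than 5 observations until they have 5 (groups with 5 or more rows are left unchanged).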
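A minimal sketch of that behavior, assuming dplyr is available. This is one way to get the described result, not necessarily the answerer's code; the toy data frame `df` is hypothetical and much smaller than the question's data.

```r
library(dplyr)

# Hypothetical toy data (not the question's data): ids with 1, 2, 3 and 7 rows.
df <- data.frame(
  id = rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 7)),
  y  = 1:13
)

out <- df %>%
  group_by(id) %>%
  filter(n() > 1) %>%                              # [1] drop ids observed only once
  slice(rep_len(seq_len(n()), max(n(), 5L))) %>%   # [2] cycle row indices up to 5 rows
  ungroup()

# Rows per id after the two steps
count(out, id)
```

Here `rep_len()` cycles through a group's row indices and stops at exactly 5, so an id with 2 rows contributes rows 1, 2, 1, 2, 1. Groups that already have 5 or more rows keep their original rows because the target length is `max(n(), 5L)`.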
lyfkaqu13#
An approach that combines `split` and `rep` with `[`.
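A base R sketch of how `split`, `rep` and `[` can be combined for this task. The exact code of this answer is not shown here, so the details below are assumptions: the toy data frame `df` is hypothetical, and the choice of `length.out` (padding to exactly 5 rows) is one possible reading of the requirement.

```r
# Hypothetical toy data (not the question's data): ids with 1, 2, 3 and 7 rows.
df <- data.frame(
  id = rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 7)),
  y  = 1:13
)

# Split into one data frame per id
parts <- split(df, df$id)

# [1] keep only ids with more than one row
parts <- parts[sapply(parts, nrow) > 1]

# [2] recycle row indices with rep() and pick the rows with `[`
parts <- lapply(parts, function(g) {
  n <- nrow(g)
  g[rep(seq_len(n), length.out = max(n, 5)), ]
})

# Bind the pieces back together and reset the row names
out <- do.call(rbind, parts)
rownames(out) <- NULL

table(out$id)
```

Indexing a data frame with repeated row numbers duplicates those rows, which is why `rep` alone is enough to do the padding. No packages are needed for this version.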
uajslkp64#
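I have made the repetition explicit, so each group is always repeated a whole number of times, even if that takes you past exactly 5 rows per group. Using the data.table package: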
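A minimal sketch of the whole-multiple idea with data.table, under stated assumptions: the toy table `dt` and the name `min_rows` are hypothetical, and this is an illustration of the approach rather than the answerer's exact code.

```r
library(data.table)

# Hypothetical toy data (not the question's data): ids with 1, 2, 3 and 7 rows.
dt <- data.table(
  id = rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 7)),
  y  = 1:13
)

min_rows <- 5L  # assumed name for the minimum number of rows per id

out <- dt[
  ,
  # Groups with a single row return NULL and are therefore dropped.
  # Other groups are repeated ceiling(min_rows / .N) whole times.
  if (.N > 1L) .SD[rep(seq_len(.N), times = ceiling(min_rows / .N))],
  by = id
]

# Rows per id after the two steps
out[, .N, by = id]
```

With this rule an id with 2 rows is repeated 3 times (6 rows) and an id with 3 rows is repeated twice (6 rows), which matches the question's example of repeating a 2-row id 3 times. Ids that already have 5 or more rows are repeated once, so they stay as they are.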