Keeping pairs of data with the same date and time in R

ryevplcw  posted on 2023-01-10  in Other

I have two datasets that look like this (this is a very small subset).

data1 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("F", "F", "F", "F", "F", "F", "F"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
             "2000-01-03", "2000-01-07", "2000-01-07"),
  "Time" = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
             "17:00:00", "20:00:00", "20:00:00"),
  "Value" = c(100, 200, 300, 100, 400, 500, 500)
)

data2 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("P", "P", "P", "P", "P", "P", "P"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
             "2000-01-03", "2000-01-04", "2000-01-07"),
  "Time" = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
             "13:00:00", "16:00:00", "20:00:00"),
  "Value" = c(100, 100, 200, 900, 100, 400, 999)
)

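I want to keep the rows that share the same date and time in both tables and put them in a new table (data3). Sometimes data1 and data2 contain duplicates; I don't want data3 to contain those duplicates, only one of each, together with its pair from the other table. I would also like the output table to be sorted so that the two rows of each pair sit directly under one another (so my "Type" column alternates F, P, F, P, and so on).
Here is my desired output: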

data3 <- data.frame(
  "Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al", "Al"),
  "Type" = c("F", "P", "F", "P", "F", "P", "F", "P"),
  "Date" = c("2000-01-01", "2000-01-01", "2000-01-03", "2000-01-03",
             "2000-01-03", "2000-01-03", "2001-01-07", "2001-01-07"),
  "Time" = c("11:00:00", "11:00:00", "13:00:00", "13:00:00",
             "17:00:00", "17:00:00", "20:00:00", "20:00:00"),
  "Value" = c(100, 100, 100, 100, 400, 900, 500, 999)
)

I have tried the various kinds of joins in dplyr, but none of them joins the data the way I want.
Thank you for your help!!

wribegjk #1

We may need to bind the datasets and then, after grouping, filter for the date-times that occur more than once:

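library(dplyr)
library(data.table)  # only needed for rowid()
# stack both tables; grp records which table each row came from
bind_rows(data1, data2, .id = 'grp') %>%
  # keep only the Metal/Date/Time combinations that occur more than once
  group_by(Metal, Date, Time) %>%
  filter(n() > 1) %>%
  arrange(Date, Time, rowid(grp)) %>%
  # within each group, take the first "F" row and the first "P" row
  slice(match(c("F", "P"), Type)) %>%
  ungroup %>% 
  select(-grp)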
  • Output
# A tibble: 8 × 5
  Metal Type  Date       Time     Value
  <chr> <chr> <chr>      <chr>    <dbl>
1 Al    F     2000-01-01 11:00:00   100
2 Al    P     2000-01-01 11:00:00   100
3 Al    F     2000-01-03 13:00:00   100
4 Al    P     2000-01-03 13:00:00   100
5 Al    F     2000-01-03 17:00:00   400
6 Al    P     2000-01-03 17:00:00   900
7 Al    F     2000-01-07 20:00:00   500
8 Al    P     2000-01-07 20:00:00   999
  • OP's data
> data3
  Metal Type       Date     Time Value
1    Al    F 2000-01-01 11:00:00   100
2    Al    P 2000-01-01 11:00:00   100
3    Al    F 2000-01-03 13:00:00   100
4    Al    P 2000-01-03 13:00:00   100
5    Al    F 2000-01-03 17:00:00   400
6    Al    P 2000-01-03 17:00:00   900
7    Al    F 2001-01-07 20:00:00   500
8    Al    P 2001-01-07 20:00:00   999
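
For comparison, here is a dplyr-only sketch of the same idea that does not need data.table. It is not the answerer's code: the names `keys` and `paired` are my own, and it assumes that "duplicate" means a repeated Metal/Date/Time combination within one table, of which only the first row is kept.

```r
library(dplyr)

# Same sample data as in the question.
data1 <- data.frame(
  Metal = "Al", Type = "F",
  Date = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
           "2000-01-03", "2000-01-07", "2000-01-07"),
  Time = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
           "17:00:00", "20:00:00", "20:00:00"),
  Value = c(100, 200, 300, 100, 400, 500, 500)
)
data2 <- data.frame(
  Metal = "Al", Type = "P",
  Date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
           "2000-01-03", "2000-01-04", "2000-01-07"),
  Time = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
           "13:00:00", "16:00:00", "20:00:00"),
  Value = c(100, 100, 200, 900, 100, 400, 999)
)

keys <- c("Metal", "Date", "Time")  # hypothetical name for the matching columns

paired <- bind_rows(
  semi_join(data1, data2, by = keys),  # rows of data1 whose key also occurs in data2
  semi_join(data2, data1, by = keys)   # rows of data2 whose key also occurs in data1
) %>%
  # keep the first row per key and Type (assumption: later repeats are unwanted)
  distinct(Metal, Type, Date, Time, .keep_all = TRUE) %>%
  # Type sorts "F" before "P", which puts each pair on adjacent rows
  arrange(Metal, Date, Time, Type)

paired
```

The two `semi_join()` calls only filter rows, so no `.x`/`.y` columns are created and nothing has to be reshaped afterwards.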

knsnq2tg #2

This one was not easy :-)

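library(dplyr)

bind_rows(data1, data2) %>% 
  # keep only the date-times that occur more than once in the combined data
  group_by(Date, Time) %>% 
  filter(n() > 1) %>% 
  ungroup() %>% 
  # sort by Date as well as Time so that rows from different days cannot interleave
  arrange(Date, Time) %>% 
  # flag every P row that directly follows an F row ...
  mutate(Flag = ifelse(Type == "P" & lag(Type, default = last(Type)) == "F", 1, NA)) %>% 
  # ... and the F row directly above it
  mutate(Flag1 = lead(Flag)) %>% 
  filter(if_any(.cols = starts_with("Flag"), .fns = ~ . == 1)) %>% 
  select(-starts_with("Flag"))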
  Metal Type  Date       Time     Value
  <chr> <chr> <chr>      <chr>    <dbl>
1 Al    F     2000-01-01 11:00:00   100
2 Al    P     2000-01-01 11:00:00   100
3 Al    F     2000-01-03 13:00:00   100
4 Al    P     2000-01-03 13:00:00   100
5 Al    F     2000-01-03 17:00:00   400
6 Al    P     2000-01-03 17:00:00   900
7 Al    F     2000-01-07 20:00:00   500
8 Al    P     2000-01-07 20:00:00   999

7qhs6swi #3

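An approach with inner_join
The difficulty here is getting the output into the right format; the filtering of the data itself is already done once the inner_join has run.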

library(dplyr)
library(tidyr)

joined <- inner_join(data1 %>% distinct(), data2 %>% distinct(), 
            c("Metal", "Date", "Time"))

joined
  Metal Type.x       Date     Time Value.x Type.y Value.y
1    Al      F 2000-01-01 11:00:00     100      P     100
2    Al      F 2000-01-03 13:00:00     100      P     100
3    Al      F 2000-01-03 17:00:00     400      P     900
4    Al      F 2000-01-07 20:00:00     500      P     999

Tidying the data

joined %>% 
  pivot_longer(starts_with("Type"), values_to="Type") %>% 
  rowwise() %>% 
  mutate(Value = c_across(starts_with("Value"))[c(F=1, P=2)[Type]]) %>% 
  select(-contains("."), -name) %>% 
  ungroup()
# A tibble: 8 × 5
  Metal Date       Time     Type  Value
  <chr> <chr>      <chr>    <chr> <dbl>
1 Al    2000-01-01 11:00:00 F       100
2 Al    2000-01-01 11:00:00 P       100
3 Al    2000-01-03 13:00:00 F       100
4 Al    2000-01-03 13:00:00 P       100
5 Al    2000-01-03 17:00:00 F       400
6 Al    2000-01-03 17:00:00 P       900
7 Al    2000-01-07 20:00:00 F       500
8 Al    2000-01-07 20:00:00 P       999
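
As a possible simplification of the tidying step, `pivot_longer()` can split the suffixed columns in one go with the special `".value"` name, which avoids `rowwise()` and `c_across()`. This is only a sketch and not the answerer's code; the column name `source` and the result name `long` are my own, and it assumes the join suffixes are the default `.x` and `.y`.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question.
data1 <- data.frame(
  Metal = "Al", Type = "F",
  Date = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
           "2000-01-03", "2000-01-07", "2000-01-07"),
  Time = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
           "17:00:00", "20:00:00", "20:00:00"),
  Value = c(100, 200, 300, 100, 400, 500, 500)
)
data2 <- data.frame(
  Metal = "Al", Type = "P",
  Date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
           "2000-01-03", "2000-01-04", "2000-01-07"),
  Time = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
           "13:00:00", "16:00:00", "20:00:00"),
  Value = c(100, 100, 200, 900, 100, 400, 999)
)

joined <- inner_join(distinct(data1), distinct(data2),
                     by = c("Metal", "Date", "Time"))

long <- joined %>%
  pivot_longer(
    cols = matches("\\.[xy]$"),         # Type.x, Value.x, Type.y, Value.y
    names_to = c(".value", "source"),   # "Type"/"Value" become columns, x/y goes to `source`
    names_sep = "\\."
  ) %>%
  select(Metal, Type, Date, Time, Value)

long
```

One caveat: `distinct()` only removes rows that are identical in every column, so if one table held two different values for the same date and time, the join would return more than one pair for that key.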

3hvapo4f #4

There are some tidyverse functions that should work for the sorting; here they are combined with part of Beeflight31's solution:

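library(dplyr)

data3 <- rbind(data1, data2)
data3 <- data3 %>%
  # Type is part of the key so that an F row and a P row with equal Value are both kept
  distinct(Metal, Type, Date, Time, Value, .keep_all = TRUE) %>%
  group_by(Date, Time, Metal)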
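
The snippet above stops after `group_by()`. One way it could be finished, as a sketch under my own assumptions rather than the answerer's intent: de-duplicate per Type, keep only the groups that contain both an F and a P row, and sort so that the pairs interleave. The result name `data3_sketch` is hypothetical.

```r
library(dplyr)

# Same sample data as in the question.
data1 <- data.frame(
  Metal = "Al", Type = "F",
  Date = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
           "2000-01-03", "2000-01-07", "2000-01-07"),
  Time = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
           "17:00:00", "20:00:00", "20:00:00"),
  Value = c(100, 200, 300, 100, 400, 500, 500)
)
data2 <- data.frame(
  Metal = "Al", Type = "P",
  Date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
           "2000-01-03", "2000-01-04", "2000-01-07"),
  Time = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
           "13:00:00", "16:00:00", "20:00:00"),
  Value = c(100, 100, 200, 900, 100, 400, 999)
)

data3_sketch <- bind_rows(data1, data2) %>%
  # one row per Metal/Type/Date/Time (assumption: the first occurrence wins)
  distinct(Metal, Type, Date, Time, .keep_all = TRUE) %>%
  group_by(Metal, Date, Time) %>%
  # keep only the date-times that are present in both tables
  filter(n_distinct(Type) == 2) %>%
  ungroup() %>%
  arrange(Date, Time, Type)

data3_sketch
```

Filtering on `n_distinct(Type) == 2` rather than `n() > 1` means a date-time that is merely repeated inside a single table is not mistaken for a pair.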

jjhzyzn0 #5

Edit: is this what you are looking for?

data3 <- rbind(data1,data2)
data3$unique_date <- paste(data3$Date, data3$Time, sep="_")
matching_rows <- which(duplicated(data3$unique_date))
data3 <- data3[matching_rows,]
data3 <- data3[!duplicated(data3$unique_date),]
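
On the sample data, the snippet above keeps a single row for each repeated date-time. If both the F and the P row are wanted, a base R sketch along the same lines might look like this. The variable names (`both`, `key`, `shared`, `data3_base`) are my own, and the code assumes that only the first row per key and Type should survive.

```r
# Same sample data as in the question.
data1 <- data.frame(
  Metal = "Al", Type = "F",
  Date = c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
           "2000-01-03", "2000-01-07", "2000-01-07"),
  Time = c("11:00:00", "12:00:00", "15:00:00", "13:00:00",
           "17:00:00", "20:00:00", "20:00:00"),
  Value = c(100, 200, 300, 100, 400, 500, 500)
)
data2 <- data.frame(
  Metal = "Al", Type = "P",
  Date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03",
           "2000-01-03", "2000-01-04", "2000-01-07"),
  Time = c("11:00:00", "11:00:00", "14:00:00", "17:00:00",
           "13:00:00", "16:00:00", "20:00:00"),
  Value = c(100, 100, 200, 900, 100, 400, 999)
)

both <- rbind(data1, data2)

# Drop repeats of the same key within one Type (keeps the first occurrence).
key <- paste(both$Metal, both$Date, both$Time, sep = "_")
both <- both[!duplicated(paste(key, both$Type)), ]

# Keys that occur for both "F" and "P".
key <- paste(both$Metal, both$Date, both$Time, sep = "_")
shared <- intersect(key[both$Type == "F"], key[both$Type == "P"])

# Keep the paired rows and sort so that F and P alternate.
data3_base <- both[key %in% shared, ]
data3_base <- data3_base[order(data3_base$Date, data3_base$Time, data3_base$Type), ]
rownames(data3_base) <- NULL

data3_base
```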
