r或sql为不同的类别选择具有不同值的记录

olhwl3o2  于 2021-06-26  发布在  Hive
关注(0)|答案(3)|浏览(301)

我有这样的Dataframe/或表

RowNumber   Category    Value 
   1 .        A          12
   2 .        A           3 
   3 .        B          24     
   4.         B          32
   5 .        B          11
   6 .        C          30   
   7 .        D           2
   8 .        D          33

..
使用sql(hive)或r希望得到这两种语言的指导:根据不同类别有不同的截止点来选择记录
对于类别a,我想选择值>=10,但是对于所有其他类别,b、c、d需要选择值>=20
结果是:

RowNumber   Category    Value 
   1 .        A          12
   3 .        B          24     
   4.         B          32
   6 .        C          30   
   8 .        D          33

我怎么能这么做?
谢谢您!!

9rnv2umw

9rnv2umw1#

简单的查询

select c1,c2 from tbl where c2 >= 10 and c1 = 'A' 
union all
select c1,c2 from tbl where c2 >= 20 and c1 != 'A' 

+---------+---------+--+
| _u1.c1  | _u1.c2  |
+---------+---------+--+
| A       | 12      |
| B       | 24      |
| B       | 32      |
| C       | 30      |
| D       | 33      |
+---------+---------+--+
rjzwgtxy

rjzwgtxy2#

在base r中,可以使用:

df <- data.frame(RowNumber = c(1, 2, 3, 4, 5, 6, 7 ,8), Category = c("A", "A", "B", "B", "B", "C", "D", "D"), Value = c(12, 3, 24, 32, 11, 30, 2, 33))
df[df$Category == "A" & df$Value >= 10 | df$Category != "A" & df$Value >= 20, ]

您将获得期望的结果:

RowNumber Category  Value
1         1        A    12
3         3        B    24
4         4        B    32
6         6        C    30
8         8        D    33
wz8daaqr

wz8daaqr3#

这里有一些选择。

library(sqldf)

# 1

sqldf("select * from DF 
       where (Category = 'A' and Value >= 10) or (not Category = 'A' and Value >= 20)")

# 2

sqldf("select * from DF where Value >= (case when Category = 'A' then 10 else 20 end)")

# 3

sqldf("select * from DF where Value >= (10 * (not Category = 'A') + 10)")

# 4

subset(DF, (Category == "A" & Value >= 10) | (Category != "A" & Value >= 20))

# 5

subset(DF, Value >= ifelse(Category == "A", 10, 20))

# 6

subset(DF, Value >= 10 * (Category != "A") + 10)

以上任何一项都可以给出:

RowNumber Category Value
1         1        A    12
2         3        B    24
3         4        B    32
4         6        C    30
5         8        D    33

注意

可复制形式的输入为:

Lines <- "RowNumber   Category    Value 
   1        A          12
   2        A           3 
   3        B          24     
   4        B          32
   5        B          11
   6        C          30   
   7        D           2
   8        D          33"

DF <- read.table(text = Lines,  header = TRUE)

相关问题