R: Stacked bar chart with two different frequencies

9wbgstp7  posted on 2023-04-09  in Other

I have a combined frequency table with two conditions over one set of readings.
The data frame is reproduced here:

dput(Merchant_Category_Frequency_with_Target)
structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports", 
"Alcohol", "Auto", "Books & stationery", "Business Services", 
"Cloth stores", "Contracted services", "Dept stores", "Digital goods", 
"Direct marketing", "Education", "Electronics", "Food", "Fuel", 
"Govt services", "Home furnishing", "Hotels", "Insurance", "Medical", 
"Misc Services", "Music stores", "Professional services & memberships", 
"Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail", 
"Transportation services", "Utility", "Wallet load"), class = "factor"), 
    Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L, 
    3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L, 
    1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L, 
    3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L, 
    19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L, 
    40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L, 
    148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")

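I would like a combined frequency distribution plot of all readings (Var1) with both frequencies: Freq.x should be a bar in one colour and, stacked on top of it, Freq.y should be a bar in another colour.
I tried following various online tutorials, but they did not seem to work, because the variable here is a character rather than numeric data.
Cheers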

zpjtge22  1#

First, you need to reshape the data into long format with pivot_longer or gather; I use pivot_longer:

df <- structure(list(Var1 = structure(1:31, .Label = c("Airline", "Airports", 
"Alcohol", "Auto", "Books & stationery", "Business Services", 
"Cloth stores", "Contracted services", "Dept stores", "Digital goods", 
"Direct marketing", "Education", "Electronics", "Food", "Fuel", 
"Govt services", "Home furnishing", "Hotels", "Insurance", "Medical", 
"Misc Services", "Music stores", "Professional services & memberships", 
"Quasi cash", "Railways", "Rent Payments", "Restaurants", "Retail", 
"Transportation services", "Utility", "Wallet load"), class = "factor"), 
    Freq.x = c(429L, 1L, 325L, 499L, 239L, 1324L, 5242L, 38L, 
    3881L, 355L, 91L, 1554L, 2200L, 424L, 5588L, 1935L, 264L, 
    1409L, 2384L, 1789L, 971L, 23L, 505L, 5L, 1662L, 4408L, 1820L, 
    3135L, 1297L, 4660L, 1543L), Freq.y = c(16L, NA, 11L, 34L, 
    19L, 56L, 179L, 1L, 141L, 10L, 8L, 100L, 229L, 8L, 142L, 
    40L, 13L, 37L, 142L, 75L, 39L, NA, 18L, NA, 62L, 389L, 33L, 
    148L, 39L, 437L, 194L)), row.names = c(NA, -31L), class = "data.frame")

library(tidyr)  # provides pivot_longer(); the native pipe |> needs R >= 4.1

data <- df |>
  pivot_longer(cols = c(Freq.x, Freq.y), names_to = "freq")

> head(data)
# A tibble: 6 × 3
  Var1     freq   value
  <fct>    <chr>  <int>
1 Airline  Freq.x   429
2 Airline  Freq.y    16
3 Airports Freq.x     1
4 Airports Freq.y    NA
5 Alcohol  Freq.x   325
6 Alcohol  Freq.y    11
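
For comparison, the gather alternative mentioned above can be sketched as follows. This is only an illustration: `df_small` and `data_gather` are names made up here, and the three rows are copied from the question's data so that the block runs on its own.

```r
library(tidyr)

# Small subset of the question's data, used here only for illustration
df_small <- data.frame(
  Var1   = factor(c("Airline", "Airports", "Alcohol")),
  Freq.x = c(429L, 1L, 325L),
  Freq.y = c(16L, NA, 11L)
)

# gather() is the older tidyr interface; key and value name the new columns
data_gather <- gather(df_small, key = "freq", value = "value", Freq.x, Freq.y)

# Same long layout as pivot_longer(), but rows are ordered by freq, not by Var1
data_gather
```

Either function gives ggplot the layout it needs: one row per (Var1, freq) pair with the count in a single value column.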

Then use the ggplot function:

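library(ggplot2)

# Var1 is a factor, so it is drawn on a discrete x axis; fill = freq colours the stacked segments
ggplot(data, aes(x = Var1, y = value, fill = freq)) + 
  geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5, hjust = 0.05))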

The output is a stacked bar chart with one bar per Var1 category, coloured by freq.
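
Two details of that plot may be worth adjusting. By default ggplot2 stacks the first fill level on top, so Freq.x would sit above Freq.y, whereas the question asks for Freq.y on top. The NA counts are also dropped with a warning. The sketch below is one possible variant, not part of the original answer. It assumes that a missing count means zero, and it uses a three-row subset of the question's data under the made-up names `df_small` and `data_small`.

```r
library(tidyr)
library(ggplot2)

# Small subset of the question's data, used here only for illustration
df_small <- data.frame(
  Var1   = factor(c("Airline", "Airports", "Alcohol")),
  Freq.x = c(429L, 1L, 325L),
  Freq.y = c(16L, NA, 11L)
)

data_small <- df_small |>
  pivot_longer(cols = c(Freq.x, Freq.y), names_to = "freq") |>
  replace_na(list(value = 0L))  # assumption: a missing count means zero

p <- ggplot(data_small, aes(x = Var1, y = value, fill = freq)) +
  # geom_col() is equivalent to geom_bar(stat = "identity");
  # reverse = TRUE puts Freq.x at the bottom and Freq.y on top
  geom_col(position = position_stack(reverse = TRUE)) +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5, hjust = 0.05))

p
```

If the NA values should stay missing rather than become zero, the replace_na() step can simply be left out; ggplot2 then drops those rows from the plot.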
