Filtering using R to a specific date range

ehxuflar  posted on 2023-09-27  in Other

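I have a set of categorical variables listed by date. The desired result is a plot of counts of the categorical variable, restricted to specific date ranges.
I can produce a plot of the whole set, but none of the variations I have tried produces that result. date is formatted as a Date and libloc is character. The end result should be a plot of the number of instructions we did at different locations, by semester.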

Here is what I have so far:

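library(ggplot2)   # the package is named ggplot2, not ggplot
library(dplyr)     # provides %>%, select(), filter(), group_by(), summarise()
library(lubridate)
library(readr)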

#df <- read_excel("C:/Users/12083/Desktop/instructions/datasetd.xlsx")
df <- structure(list(Location = c("8", "Boise", "Idaho Falls","Meridian", 
                                  "Other", "Pocatello", "REND", "Salt Lake City", 
                                  "Sun Valley", "Twin Falls", NA), 
                     counts = c(1L, 12L, 780L, 61L, 18L, 3446L, 
                                2L, 1L, 1L, 4L, 24L)), 
                row.names = c(NA, -11L), class = c("tbl_df","tbl","data.frame"))

df %>%
  select(date,Location) %>%
  filter(date >= as.Date("2017-01-05") & date <= as.Date("2018-01-10"))%>%
  group_by(Location) %>%
  summarise(count=n())
g <- ggplot(df, aes(Location))
g + geom_bar()

woobm2wo1#


You might find my santoku package helpful. It can chop dates into intervals:

library(santoku)
library(dplyr)

df_summary <- df %>%
  select(date,Location) %>%
  filter(date >= as.Date("2017-01-05") & date <= as.Date("2018-01-10")) %>%
  mutate(semester = chop(date, as.Date(c("2017-01-05", "2017-01-09")))) %>%
  group_by(Location, semester) %>%
  summarise(count=n())

Obviously, you will want to choose your semester dates appropriately.
Then you could plot something like this:

# The column is named Location (capital L), so facet on that name
ggplot(df_summary, aes(semester, count)) + geom_col() + facet_wrap(vars(Location))
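If you would rather not add a package, the same idea can be sketched with base R's cut() on Date values. This is only an illustration under assumptions: the toy data frame `sessions`, the semester boundaries, and the labels below are all made up, so substitute your own. With Date breaks, cut() builds intervals that include the start date and exclude the next start date, and dates outside every interval become NA.

```r
library(dplyr)

# Hypothetical toy data: one row per instruction session
sessions <- data.frame(
  date = as.Date(c("2017-01-20", "2017-03-02", "2017-09-15",
                   "2017-10-01", "2017-10-20", "2018-02-01")),
  Location = c("Pocatello", "Pocatello", "Pocatello",
               "Idaho Falls", "Pocatello", "Boise")
)

# Assumed semester boundaries; each interval is [start, next start)
semester_breaks <- as.Date(c("2017-01-05", "2017-08-15", "2018-01-10"))
semester_labels <- c("Spring 2017", "Fall 2017")

semester_counts <- sessions %>%
  mutate(semester = cut(date, breaks = semester_breaks,
                        labels = semester_labels)) %>%
  filter(!is.na(semester)) %>%               # drop dates outside every semester
  count(Location, semester, name = "count")  # one row per Location + semester

semester_counts
```

The resulting `semester_counts` has the same shape as `df_summary` above, so it can be passed to the same geom_col() + facet_wrap() call.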

fxnxkyjh2#

Hope this helps:

#### Filtering using R to a specific date range ####
# From: https://stackoverflow.com/questions/62926802/filtering-using-r-to-a-specific-date-range

# First, I downloaded a sample dataset with dates and categorical data from here: 
# https://vincentarelbundock.github.io/Rdatasets/datasets.html
# Specifically, I got weather.csv

setwd("F:/Home Office/R")

data = read.csv("weather.csv", stringsAsFactors = TRUE) # Read the data into R; factors make summary() show per-city counts (not the default since R 4.0)
head(data)                     # Quality control, looks good
data = data[,2:3]              # For this example, I cut it to only take the relevant columns
data$date = as.Date(data$date) # This formats the date as dates for R
library(tidyverse)             # This will import some functions that you need, specifically %>% and ggplot

# Step 0: look that the data makes sense to you
summary(data$date)
summary(data$city)

# Step 1: filter the right data
filtered = data %>% 
  filter(date > as.Date("2016-07-01") & date < as.Date("2017-07-01")) # This will only take rows between those dates

# Step 2: Plot the filtered data
# Using a bar plot: 
plot = ggplot(filtered, aes(x=city, fill = city)) + geom_bar() # You don't really need the fill, but I like it
plot

# Quality control: look at the numbers before and after the filtering:
summary(data$city)
summary(filtered$city)

Output:

> summary(data$city)
 Auckland   Beijing   Chicago    Mumbai San Diego 
      731       731       731       731       731 
> summary(filtered$city)
 Auckland   Beijing   Chicago    Mumbai San Diego 
      364       364       364       364       364

You could probably make it more elegant, but I think it works well.
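One possible tidier variant, as a sketch rather than a definitive version: dplyr's between() filters with both endpoints included, count() produces the per-city totals, and the result is stored in a variable so the plot uses the filtered counts instead of the full data. The data frame `toy` and the column name `n_rows` are invented for this example.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the weather data: one row per city per day
toy <- data.frame(
  date = rep(seq(as.Date("2016-06-29"), as.Date("2016-07-03"), by = "day"),
             times = 2),
  city = rep(c("Auckland", "Beijing"), each = 5)
)

city_counts <- toy %>%
  # between() is inclusive on both ends
  filter(between(date, as.Date("2016-07-01"), as.Date("2016-07-02"))) %>%
  count(city, name = "n_rows")

# geom_col() plots the precomputed counts; geom_bar() would count rows again
p <- ggplot(city_counts, aes(x = city, y = n_rows)) + geom_col()
# print(p) draws the plot
```

Note that the filter in my code above uses > and <, which excludes the two boundary dates, whereas between() keeps them.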

Edit: line plot

This edit follows your request in the comments:

# Line plot
# The major difference between geom_bar() and geom_line() is that 
# geom_line() requires both an X and Y values.
# So first I created a new data frame which has these values:
summarised.data = filtered %>%
  group_by(city) %>%
  tally()

# Now you can create the plot with ggplot:
# Notes: 
# 1. group = 1 is necessary
# 2. I added geom_point() so that each X value gets a point. I think it's easier to read. You can remove this if you like
plot.line = ggplot(summarised.data, aes(x=city, y=n, group = 1)) + geom_line() + geom_point()
plot.line

Output:

> summarised.data
# A tibble: 5 x 2
  city          n
  <fct>     <int>
1 Auckland    364
2 Beijing     364
3 Chicago     364
4 Mumbai      364
5 San Diego   364


bd1hkmkf3#

This is a new answer because the approach is different.

#### Filtering using R to a specific date range ####
# From: https://stackoverflow.com/questions/62926802/filtering-using-r-to-a-specific-date-range

# First, the data I took by copy and pasting from here: 
# https://stackoverflow.com/questions/63006201/mapping-sample-data-to-actual-csv-data
# and saved it as bookdata.csv with Excel

setwd("C:/Users/di58lag/Documents/scratchboard/Scratchboard")
data = read.csv("bookdata.csv", stringsAsFactors = TRUE) # Read the data into R; factors make summary() show per-city counts (not the default since R 4.0)

head(data)                                            # Quality control, looks good
data$dates = as.Date(data$dates, format = "%d/%m/%Y") # This formats the date as dates for R
library(tidyverse)                                    # This will import some functions that you need, specifically %>% and ggplot

# Step 0: look that the data makes sense to you
summary(data$dates)
summary(data$city)

# Step 1: filter the right data
start.date = as.Date("2020-01-02")
end.date   = as.Date("2020-01-04")

filtered = data %>% 
  filter(dates >= start.date & 
         dates <= end.date) # This will only take rows between those dates

# Step 2: Plotting
# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete

Plot = ggplot(filtered, aes(x=dates, y=classes, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot

# For a clean version of the plot:
clean.plot = ggplot(filtered, aes(x=dates, y=classes, group = city)) + geom_line(aes(linetype=city))
clean.plot

Output:

Plot: [figure]

clean.plot: [figure]

Edit: added a table function!

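After reading your comment, I think I understand what you are trying to do. You asked for:
"a count of the instructor's locations on the vertical axis and a count of dates on the horizontal axis."
The problem is that the raw data does not actually give you the counts, i.e. "how many times a specific location appears on a specific date". So I had to add another line, using the table function, to calculate them: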

data.table = as.data.frame(table(filtered))

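This counts how many times each date + location combination occurs and stores the result in a column called "Freq".
Now you can plot this frequency as the count, like so: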

# Step 1.5: Counting the values
data.table = as.data.frame(table(filtered)) # This calculates the frequency of each date+location combination
data.table = data.table %>% filter(Freq>0)  # This is used to cut out any Freq=0 values (you don't want to plot cases where no event occurred)
data.table$dates = as.Date(data.table$dates) # You need to rerun the "as.Date" func because it formats the dates back to "Factors"

#Quality control:
dim(filtered)   # Gives you the size of the dataframe before the counting
dim(data.table) # Gives the size after the counting
summary(data.table) # Will give you a summary of how many values are for each city, what is the date range and what is the Frequency range

# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete

Plot = ggplot(data.table, aes(x=dates, y=Freq, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot

# For a clean version of the plot (uses the counted data, so y is Freq):
clean.plot = ggplot(data.table, aes(x=dates, y=Freq, group = city)) + geom_line(aes(linetype=city))
clean.plot

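I have a feeling this is not exactly what you wanted, because the numbers are fairly low (counts ranging between 1 and 12), but this is how I understood it.
Output: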

> summary(data.table) 
          city        dates                 Freq      
 Pocatello  :56   Min.   :2015-01-12   Min.   :1.000  
 Idaho Falls:10   1st Qu.:2015-02-10   1st Qu.:1.000  
 Meridian   : 8   Median :2015-03-04   Median :1.000  
            : 0   Mean   :2015-03-11   Mean   :1.838  
 8          : 0   3rd Qu.:2015-04-06   3rd Qu.:2.000  
 Boise      : 0   Max.   :2015-06-26   Max.   :5.000  
 (Other)    : 0
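As an alternative sketch of the counting step, dplyr's count() gives the same date + location frequencies without going through table(). It keeps the Date class of the dates column and never creates zero-count rows, so the Freq > 0 filter and the second as.Date() call are not needed. The data frame `visits` is made-up example data; only the column names dates, city and Freq follow the code above.

```r
library(dplyr)

# Hypothetical example data: one row per event
visits <- data.frame(
  dates = as.Date(c("2015-01-12", "2015-01-12", "2015-01-12", "2015-02-10")),
  city  = c("Pocatello", "Pocatello", "Meridian", "Pocatello")
)

# One row per date + city combination that actually occurs
freq <- visits %>%
  count(dates, city, name = "Freq")

freq
```

`freq` can then be plotted the same way as data.table above, with x = dates and y = Freq.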
