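I am using R version 4.1.3 on Windows 10 and I am running into a memory usage problem.
My program needs the arrow and dplyr libraries. When I compare the memory reported by the Windows Task Manager with the value returned by memory.size(max=F), the Task Manager reports much more: 243.5 MB of RAM in Windows, versus 75.77 MB for memory.size(max=F).
However, I did delete the objects I created with rm(), and then called gc() to reclaim the memory they were using.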
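To make the rm()/gc() step concrete, here is a minimal base-R sketch. The helper r_heap_mb() is hypothetical (not part of R), and it assumes that the "(Mb)" column of gc()'s "used" output is a fair measure of R's own heap. The point it illustrates is that rm() only removes the name; the memory is reclaimed by the next garbage collection, which calling gc() triggers. As far as I know, memory.size() is only a stub from R 4.2.0 onwards, so gc() is also the more portable measure.

```r
# Hypothetical helper: sum of the "used (Mb)" column of gc() (Ncells + Vcells).
# gc() runs a full collection first, so unreachable objects are not counted.
r_heap_mb <- function() sum(gc()[, 2])

before <- r_heap_mb()

x <- rnorm(1e6)          # 1e6 doubles, roughly 7.6 MB of vector heap
with_x <- r_heap_mb()

rm(x)                    # removes the binding only; rm() itself frees nothing
after_rm <- r_heap_mb()  # the gc() inside the helper reclaims the vector

cat(sprintf("before: %.1f Mb | with x: %.1f Mb | after rm() + gc(): %.1f Mb\n",
            before, with_x, after_rm))
```

Even when R's heap shrinks like this, the Task Manager figure may not follow immediately: the allocator can keep freed pages for reuse instead of returning them to Windows.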
Below is the R code I used to illustrate my problem, first with its output and then without:
- Code with output
> gc(verbose = TRUE)
Garbage collection 2 = 0+0+2 (level 2) ...
14.2 Mbytes of cons cells used (41%)
3.9 Mbytes of vectors used (6%)
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 264908 14.2     648748 34.7   401965 21.5
Vcells 500529  3.9    8388608 64.0  1671274 12.8
>
> # basic memory
> memory.size(max=F)
[1] 28.78
>
> library(arrow)
Attaching package: ‘arrow’
The following object is masked from ‘package:utils’:
    timestamp
>
> # Memory after loading the arrow library with memory.size
> memory.size(max=F)
[1] 51.32
>
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
>
> # Memory after loading the dplyr library with memory.size
> memory.size(max=F)
[1] 90.2
>
> df <- data.frame(
+ col1 = rnorm(1000000),
+ col2 = rnorm(1000000),
+ col3 = runif(1000000),
+ col4 = sample(1:999, size = 1000000, replace = T),
+ col5 = sample(c("GroupA", "GroupB"), size = 1000000, replace = T),
+ col6 = sample(c("TypeA", "TypeB"), size = 1000000, replace = T)
+ )
>
> # Memory after df object creation
> memory.size(max=F)
[1] 132.83
>
> arrow::write_dataset(
+ df,
+ paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"),
+ format = "parquet"
+ )
>
> # Memory after writing to disk
> memory.size(max=F)
[1] 120.07
>
> rm(df)
>
> # Memory after deletion of df
> memory.size(max=F)
[1] 120.07
>
> gc(verbose = TRUE)
Garbage collection 15 = 9+2+4 (level 2) ...
45.0 Mbytes of cons cells used (61%)
38.0 Mbytes of vectors used (49%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  842160   45    1380031 73.8  1380031 73.8
Vcells 4976056   38   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 101.27
>
> gc(verbose = TRUE)
Garbage collection 16 = 9+2+5 (level 2) ...
45.0 Mbytes of cons cells used (61%)
11.3 Mbytes of vectors used (15%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  842053 45.0    1380031 73.8  1380031 73.8
Vcells 1475891 11.3   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 74.34
>
> ds <- arrow::open_dataset(paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"))
>
> # Memory after ds creation
> memory.size(max=F)
[1] 79.02
>
> req <-
+ ds %>%
+ collect()
>
> # Memory after req creation
> memory.size(max=F)
[1] 84.45
>
> rm(req)
>
> # Memory after deletion of req
> memory.size(max=F)
[1] 84.45
>
> gc(verbose = TRUE)
Garbage collection 17 = 9+2+6 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  927293 49.6    1797205 96.0  1380031 73.8
Vcells 1627658 12.5   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 75.77
>
> gc(verbose = TRUE)
Garbage collection 18 = 9+2+7 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  927239 49.6    1797205 96.0  1380031 73.8
Vcells 1627568 12.5   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 75.77
>
> rm(ds)
>
> # Memory after deletion of ds
> memory.size(max=F)
[1] 75.77
>
> gc(verbose = TRUE)
Garbage collection 19 = 9+2+8 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  927149 49.6    1797205 96.0  1380031 73.8
Vcells 1627532 12.5   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 75.77
>
> gc(verbose = TRUE)
Garbage collection 20 = 9+2+9 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  927146 49.6    1797205 96.0  1380031 73.8
Vcells 1627527 12.5   10146329 77.5  8388368 64.0
>
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 75.77
- Code without output
gc(verbose = TRUE)
# basic memory
memory.size(max=F)
library(arrow)
# Memory after loading the arrow library with memory.size
memory.size(max=F)
library(dplyr)
# Memory after loading the dplyr library with memory.size
memory.size(max=F)
df <- data.frame(
  col1 = rnorm(1000000),
  col2 = rnorm(1000000),
  col3 = runif(1000000),
  col4 = sample(1:999, size = 1000000, replace = T),
  col5 = sample(c("GroupA", "GroupB"), size = 1000000, replace = T),
  col6 = sample(c("TypeA", "TypeB"), size = 1000000, replace = T)
)
# Memory after df object creation
memory.size(max=F)
arrow::write_dataset(
  df,
  paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"),
  format = "parquet"
)
# Memory after writing to disk
memory.size(max=F)
rm(df)
# Memory after deletion of df
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
ds <- arrow::open_dataset(paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"))
# Memory after ds creation
memory.size(max=F)
req <-
  ds %>%
  collect()
# Memory after req creation
memory.size(max=F)
rm(req)
# Memory after deletion of req
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
rm(ds)
# Memory after deletion of ds
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
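Do you think this memory difference is normal? Could it be caused by the libraries I use and/or by bad practices in how I use R?
I would like to understand why the Windows Task Manager and R's memory.size(max=F) function report different memory usage.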
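One way to see that these figures measure different things is to compare R's heap with Arrow's own allocator. The sketch below assumes the arrow package is installed; r_heap_mb() and arrow_mb() are hypothetical helpers, while default_memory_pool() and Table$create() are arrow functions. Converting a character vector to an Arrow Table copies the strings into buffers owned by Arrow's C++ memory pool, which R's gc() does not see. As far as I know memory.size() does not count that memory either, whereas Task Manager counts everything in the process, including the code of loaded DLLs such as arrow's.

```r
library(arrow)

# Hypothetical helpers for the two views of memory
r_heap_mb <- function() sum(gc()[, 2])                                 # R's own heap
arrow_mb  <- function() default_memory_pool()$bytes_allocated / 1024^2  # Arrow C++ pool

s <- rep(c("GroupA", "GroupB"), 5e5)   # 1e6-element R character vector
r0 <- r_heap_mb(); a0 <- arrow_mb()

tab <- Table$create(s = s)             # strings are copied into Arrow-owned buffers
r1 <- r_heap_mb(); a1 <- arrow_mb()

cat(sprintf("R heap: %.1f -> %.1f Mb | Arrow pool: %.1f -> %.1f Mb (backend: %s)\n",
            r0, r1, a0, a1, default_memory_pool()$backend_name))
```

The allocator named by backend_name may also keep memory it has freed for later reuse, so the Task Manager figure can stay high even after bytes_allocated drops.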
Thank you for your help; I will be happy to provide any further information you may need.
Best regards,
1 answer
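As a complement, I used default_memory_pool()$bytes_allocated and default_memory_pool()$max_memory. Here is what they returned:
1- After loading all the necessary libraries:
2- After creating the df object with data.frame:
No arrow function is used here, which I assume is why the values of $bytes_allocated and $max_memory are not affected?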
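The reasoning in step 2 is consistent with how arrow works, as far as I know: data.frame() is plain R, so its memory comes from R's allocator and never passes through Arrow's pool. A minimal check, assuming arrow is installed (object.size() is base R):

```r
library(arrow)

pool <- default_memory_pool()
before <- pool$bytes_allocated

# Pure base-R object: allocated by R, not by Arrow's memory pool
df <- data.frame(
  col1 = rnorm(1e6),
  col5 = sample(c("GroupA", "GroupB"), size = 1e6, replace = TRUE)
)

after <- pool$bytes_allocated
print(object.size(df), units = "Mb")   # size as seen by R
```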
3- After using arrow::write_dataset:
Using an arrow function affects the values of $bytes_allocated and $max_memory.
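The 19000128 bytes quoted in step 4 can be reproduced with simple arithmetic under one assumption: that only the two character columns (col5 and col6) were copied into Arrow's pool, each as a UTF-8 data buffer plus an int32 offsets buffer rounded up to a multiple of 64 bytes, while the numeric columns were passed to Arrow without a copy. This is a hypothesis that happens to fit the number exactly, not something confirmed by the arrow documentation; pad64() is a hypothetical helper.

```r
n <- 1e6                                           # rows in df
pad64 <- function(bytes) ceiling(bytes / 64) * 64  # hypothetical: round up to 64 bytes

# UTF-8 column = character data (nchar * n bytes) + int32 offsets ((n + 1) * 4 bytes)
col5 <- pad64(nchar("GroupA") * n) + pad64((n + 1) * 4)  # "GroupA"/"GroupB"
col6 <- pad64(nchar("TypeA") * n) + pad64((n + 1) * 4)   # "TypeA"/"TypeB"

col5 + col6   # compare with default_memory_pool()$bytes_allocated after write_dataset()
```

If the hypothesis holds, the three double columns and the integer column of df never appeared in bytes_allocated at all.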
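4- After deleting the df object and calling gc():
I don't understand why default_memory_pool()$bytes_allocated is 0 after deleting df, when it was 0 after creating df and 19000128 after arrow::write_dataset. Shouldn't it be 19000128?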
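My understanding of step 4, hedged: write_dataset() converts df into a temporary Arrow object whose buffers live in the pool, and those buffers are only released when R's garbage collector finalizes that object. So the drop to 0 would come from gc(), not from rm(df) itself. The sketch reproduces this with an explicit Table, assuming arrow is installed:

```r
library(arrow)

pool <- default_memory_pool()
invisible(gc())
base_bytes <- pool$bytes_allocated

tab <- Table$create(s = rep(c("GroupA", "GroupB"), 5e5))  # strings copied into the pool
held <- pool$bytes_allocated

rm(tab)                  # removes the R binding; the C++ buffers are still allocated
after_rm <- pool$bytes_allocated

invisible(gc())          # the unreachable R6 object is finalized and its buffers released
after_gc <- pool$bytes_allocated

print(c(base = base_bytes, held = held, after_rm = after_rm, after_gc = after_gc))
```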
5- After using arrow::open_dataset to create the ds object:
Using an arrow function to create ds does not affect the values of $bytes_allocated and $max_memory. Why not?
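As far as I know, open_dataset() is lazy: it discovers the files and reads the schema from the Parquet metadata, but no column data is read until a query is executed (collect() in step 6). That would explain why creating ds barely touches the pool. A small sketch, assuming arrow is installed; the folder under tempdir() is hypothetical:

```r
library(arrow)

path <- file.path(tempdir(), "lazy_demo")   # hypothetical output folder
write_dataset(data.frame(s = rep(c("TypeA", "TypeB"), 5e5)), path, format = "parquet")

ds <- open_dataset(path)   # file discovery + schema only; no column data is read
print(ds$schema)           # known without scanning the data
print(ds$files)            # the Parquet file(s) a later scan will read
```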
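6- After passing the contents of ds to collect() to create the req object:
Once again, using an arrow function affects the values of $bytes_allocated and $max_memory. Why?
7- After deleting the req object and calling gc():
Deleting the req object affects the value of $bytes_allocated.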
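Steps 6 and 7 seem to fit the same pattern: collect() is where the Parquet data is actually read and decoded through the pool. Depending on the arrow version, the resulting columns may stay backed by Arrow memory (for example through R's ALTREP mechanism) until req itself is garbage-collected, which would explain why deleting req changes bytes_allocated. This is my interpretation, not documented behaviour. A sketch, assuming arrow and dplyr are installed; the folder name is hypothetical:

```r
library(arrow)
library(dplyr)

path <- file.path(tempdir(), "collect_demo")   # hypothetical output folder
write_dataset(data.frame(s = rep(c("GroupA", "GroupB"), 5e5)), path, format = "parquet")
invisible(gc())

pool <- default_memory_pool()
ds <- open_dataset(path)

req <- ds %>% collect()              # the scan reads and decodes the data here
n_rows <- nrow(req)
after_collect <- pool$bytes_allocated
peak <- pool$max_memory              # high-water mark, includes buffers used by the scan

rm(req)
invisible(gc())                      # releases any Arrow buffers still tied to req
after_gc <- pool$bytes_allocated

print(c(after_collect = after_collect, after_gc = after_gc, max_memory = peak))
```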
8- After deleting the ds object and calling gc():
I don't fully understand how $bytes_allocated and $max_memory work. Could you explain?
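How I understand the two counters (worth checking against the arrow documentation for your version): bytes_allocated is the number of bytes currently allocated through the pool, so it rises and falls as Arrow objects are created and garbage-collected; max_memory is the high-water mark of bytes_allocated since the pool was created, so it never decreases. Neither includes R's own heap. The sketch logs both after each step with a hypothetical snap() helper and assumes arrow is installed:

```r
library(arrow)

pool <- default_memory_pool()

# Hypothetical helper: one row per step with both counters
snap <- function(step) {
  data.frame(step = step,
             bytes_allocated = pool$bytes_allocated,
             max_memory = pool$max_memory)
}

steps <- snap("start")
tab1 <- Table$create(s = rep("GroupA", 1e6))
steps <- rbind(steps, snap("tab1 created"))
rm(tab1); invisible(gc())
steps <- rbind(steps, snap("tab1 removed + gc()"))
tab2 <- Table$create(s = rep("GroupB", 1e6))
steps <- rbind(steps, snap("tab2 created"))
rm(tab2); invisible(gc())
steps <- rbind(steps, snap("tab2 removed + gc()"))

print(steps)
```

Because tab1 is released before tab2 is created, max_memory should end up close to the size of one table rather than two, unless something earlier in the session allocated more.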