[R] Improving R's Efficiency: `system.time()` & The R Profiler

By Min

Published in Becoming a data scientist, Dec 3, 2019

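These are my notes from the Coursera R Programming course.

Why learn the R profiler?

You want your program to run faster, but you don't know which part takes the longest
A profiler gives you a systematic way to measure how much time each part of the code takes

What to know before optimizing

Always finish writing the code first. Once everything you need works correctly, then think about optimization.

Two ways to measure time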

system.time()
The R Profiler

system.time()

Usage

Input: the code or function call you want to time
Return: an object of class proc_time, which reports user time, system time, and elapsed time
user time: the CPU time charged to executing this code
elapsed time: "wall clock" time
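As a minimal sketch of the usage described here, the snippet below times a small CPU-bound loop and reads the individual fields of the returned object. The loop itself is only an illustrative workload, not something from the course.

```r
# Time a CPU-bound expression; braces let system.time() wrap several statements
timing <- system.time({
  x <- numeric(1e5)
  for (i in seq_along(x)) x[i] <- sqrt(i)
})

class(timing)           # the returned object has class "proc_time"
timing[["user.self"]]   # user CPU time, in seconds
timing[["elapsed"]]     # wall-clock time, in seconds
```

Printing `timing` directly gives the familiar three-column `user  system elapsed` display.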

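Usually these two times are very close, but they differ noticeably in two situations

elapsed time > user time: the CPU spends much of the time waiting (for example on the network, the disk, or other processes) rather than computing your code
elapsed time < user time: your machine has multiple cores/processors and the code actually uses them, so the CPU time summed over the cores exceeds the wall-clock time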
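The first case can be reproduced without a network connection. In this small sketch, `Sys.sleep()` stands in for any kind of waiting: the wall clock keeps running while the CPU does almost no work.

```r
# Sleeping consumes wall-clock time but almost no CPU time
waiting <- system.time(Sys.sleep(0.5))

waiting[["elapsed"]]    # roughly half a second
waiting[["user.self"]]  # close to zero
```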

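Example

> system.time(readLines("https://medium.com"))
   user  system elapsed 
  0.022   0.008   1.069

Here the elapsed time is much longer than the user time, most likely because nearly all of the time goes to waiting for the connection to the website rather than to computation.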

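Limitation

With system.time() you have to time each block separately, so you already need to know which part of the code is slow.
What if you don't know where to start? That is when you need The R Profiler.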
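To make the limitation concrete, here is a sketch of what block-by-block timing looks like. The three steps (simulating data, building a response, fitting a model) are hypothetical and chosen only for illustration; every step needs its own `system.time()` call.

```r
set.seed(1)

# Each stage has to be wrapped and timed by hand
t_simulate <- system.time(d <- data.frame(x = rnorm(2e5)))
t_response <- system.time(d$y <- 2 * d$x + rnorm(2e5))
t_fit      <- system.time(fit <- lm(y ~ x, data = d))

# Collect the elapsed time of every stage for comparison
elapsed <- sapply(
  list(simulate = t_simulate, response = t_response, fit = t_fit),
  function(t) t[["elapsed"]]
)
elapsed
```

This works for three steps, but it scales poorly once the code has many stages or deeply nested function calls.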

The R profiler

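Two functions

Rprof(): put the code you want to profile between two Rprof() calls. The first call, given a file name, starts the profiler; the second, Rprof(NULL), stops it. See the example later in this section.
summaryRprof(): summarizes the results

How to use Rprof()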
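A minimal sketch of the pattern follows. The `workload()` function is a hypothetical stand-in for the code you actually want to profile. It is deliberately slow enough for the sampling profiler to collect some samples. This assumes an R build with profiling support, which is not available on every platform.

```r
# Hypothetical workload, only here so the profiler has something to sample
workload <- function() {
  for (i in 1:100) sort(runif(2e5))
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")  # file that receives the raw samples
Rprof(prof_file)   # start profiling
workload()         # the code under test goes between the two calls
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
names(prof)        # "by.self" "by.total" "sample.interval" "sampling.time"
unlink(prof_file)  # remove the temporary file
```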

Two methods for normalizing the data

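by.total: time spent in each function, including the functions it calls ÷ total run time
by.self: (time spent in each function − time spent in the lower-level functions it calls) ÷ total run time

What is the time spent in lower-level functions? When a function runs, it usually calls many helper functions to do the work. As a result, most of the time is typically spent inside those helpers rather than in the top-level function itself. by.self shows how much time the top-level function itself takes.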
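The difference can be seen in a small sketch with two hypothetical functions: `top_level()` does almost nothing itself and delegates the work to `slow_helper()`. Assuming the profiler is available on your platform, `top_level` should show a large total time but a small self time, while `slow_helper` carries most of the self time.

```r
# Hypothetical helper that does the real work
slow_helper <- function(n) {
  x <- 0
  for (i in seq_len(n)) x <- x + sqrt(i)
  x
}

# Hypothetical top-level function that only delegates to the helper
top_level <- function() {
  total <- 0
  for (k in 1:20) total <- total + slow_helper(1e6)
  total
}

prof_file <- tempfile()
Rprof(prof_file)
result <- top_level()
Rprof(NULL)

by_total <- summaryRprof(prof_file)$by.total
# Row names are stored with literal quotes; strip them for easier indexing
rownames(by_total) <- gsub('"', "", rownames(by_total), fixed = TRUE)
by_total[c("top_level", "slow_helper"), ]
unlink(prof_file)
```

The exact numbers depend on the machine, since the profiler samples the call stack at fixed intervals rather than measuring every call.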

Example

> Rprof(tmp <- tempfile())
> example(glm)
> Rprof(NULL)
> summaryRprof(tmp)
$by.self
                self.time self.pct total.time total.pct
"FUN"                0.04    15.38       0.06     23.08
"tryCatch"           0.02     7.69       0.20     76.92
"cat"                0.02     7.69       0.06     23.08
"prettyNum"          0.02     7.69       0.04     15.38
".Rd_get_title"      0.02     7.69       0.02      7.69
"%in%"               0.02     7.69       0.02      7.69
...

$by.total
                      total.time total.pct self.time self.pct
"eval"                      0.22     84.62      0.00     0.00
"tryCatch"                  0.20     76.92      0.02     7.69
"doTryCatch"                0.20     76.92      0.00     0.00
"tryCatchList"              0.20     76.92      0.00     0.00
"tryCatchOne"               0.20     76.92      0.00     0.00
"withVisible"               0.20     76.92      0.00     0.00
...

$sample.interval
[1] 0.02

$sampling.time
[1] 0.26

References

Coursera R-programming: https://www.coursera.org/learn/r-programming?specialization=data-science-foundations-r
Profiling R code: https://www.r-bloggers.com/profiling-r-code/
