@jundoll
Last active August 29, 2015 14:25
blog code: Speeding up sort with LC_ALL=C
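##### Comparing sort speed across LC_ALL settings
# First, generate words.txt in bash
# cat /dev/urandom | LC_ALL=C tr -dc '[:alnum:]' | fold -w 10 | head -n 1000000 > words.txt
# Read the words as a character vector (before R 4.0.0 the default was a factor)
dat <- read.table("words.txt", stringsAsFactors = FALSE)[[1]]
# Simple speed comparison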
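# Sort `dat` under the given locale and return the elapsed time.
# The sorted vector is stored in the global environment under the locale name.
bench <- function(lc_all) {
  Sys.setlocale(locale = lc_all)  # category defaults to "LC_ALL"
  time <- proc.time()
  assign(lc_all, sort(dat), pos = globalenv())
  proc.time() - time
}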
bench("C")
bench("ja_JP.UTF-8")
bench("en_US.UTF-8")
# Check whether the results are identical
all(get("C") == get("ja_JP.UTF-8"))
all(get("C") == get("en_US.UTF-8"))
all(get("ja_JP.UTF-8") == get("en_US.UTF-8"))
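# Side note (a minimal sketch, not part of the original benchmark): the checks
# above can legitimately return FALSE, because sort() orders character vectors
# by the collation rules of the current locale. `x` is a hypothetical toy input.
x <- c("b", "A", "a", "B")
old_collate <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(x)  # byte order in the C locale: uppercase before lowercase
Sys.setlocale("LC_COLLATE", old_collate)  # restore the previous collation
print(sorted_c)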
# Speed comparison using dplyr
library(dplyr)  # provides arrange()
bench2 <- function(lc_all) {
  Sys.setlocale(locale = lc_all)
  time <- proc.time()
  assign(lc_all, arrange(as.data.frame(dat), dat)[[1]], pos = globalenv())
  proc.time() - time
}
bench2("C")
bench2("ja_JP.UTF-8")
bench2("en_US.UTF-8")
# Check the results (the variables were overwritten by bench2)
all(get("C") == get("ja_JP.UTF-8"))
all(get("C") == get("en_US.UTF-8"))
all(get("ja_JP.UTF-8") == get("en_US.UTF-8"))
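# A possible alternative (a sketch, assuming R >= 3.3.0; not part of the
# original benchmark): sort(method = "radix") sorts character vectors in the
# C locale regardless of the current locale, so no Sys.setlocale() call is
# needed. `y` is a hypothetical toy input.
y <- c("b", "A", "a", "B")
sorted_radix <- sort(y, method = "radix")
print(sorted_radix)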