Why is rbindlist "better" than rbind?

Answer 1

rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when using rbind.data.frame

Where does it really excel

Some questions that show where rbindlist shines are

Fast vectorized merge of list of data.frames by row

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

These have benchmarks that show how fast it can be.
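For a rough sense of the gap, here is a minimal timing sketch. The list size (2,000 small data frames) and the variable name `frames` are arbitrary choices made for illustration, and timings vary by machine and data.table version, so no numbers are quoted.

```r
library(data.table)

# Hypothetical test data: 2,000 small data.frames with identical columns.
set.seed(1)
frames <- lapply(1:2000, function(i) data.frame(id = i, x = runif(5)))

# Base R: do.call(rbind, ...) dispatches to rbind.data.frame.
t_base <- system.time(slow <- do.call(rbind, frames))

# data.table: bind the whole list in a single rbindlist() call.
t_dt <- system.time(fast <- rbindlist(frames))

print(t_base["elapsed"])
print(t_dt["elapsed"])

# Both approaches should agree on the combined contents.
stopifnot(nrow(slow) == nrow(fast), isTRUE(all.equal(slow$x, fast$x)))
```

Increasing the number of data frames in the list should make the difference more visible.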

rbind.data.frame is slow, for a reason

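rbind.data.frame does a lot of checking and matches columns by name (that is, it allows for the columns being in a different order in each data frame and lines them up by name). rbindlist does none of this checking and binds by position.

For example: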

do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##    a b
## 1  1 2
## 2  2 3
## 3  2 1
## 4  3 2

rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##     a b
##  1: 1 2
##  2: 2 3
##  3: 1 2
##  4: 2 3
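If matching by name is what you want, more recent data.table releases provide a `use.names` argument on rbindlist. The by-position result shown above reflects the version this answer was written against, and newer releases may also warn when the column names differ in order. A minimal sketch, with `dfs` and `by_name` as illustrative variable names:

```r
library(data.table)

dfs <- list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3))

# Ask rbindlist to line columns up by name instead of by position.
by_name <- rbindlist(dfs, use.names = TRUE)
print(by_name)

# The values now match the do.call(rbind, ...) result.
stopifnot(identical(by_name$a, c(1L, 2L, 2L, 3L)),
          identical(by_name$b, c(2L, 3L, 1L, 2L)))
```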

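Some other limitations of rbindlist

It used to struggle with factors, because of a bug that has since been fixed:

rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)

It has problems with duplicate column names:

see Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table? (Bug #2384)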

rbind.data.frame rownames can be frustrating

rbindlist can handle lists, data.frames and data.tables, and returns a data.table without rownames.

You can get into a muddle with rownames when using do.call(rbind, list(...)); see

How to avoid renaming of rows when using rbind inside do.call?
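A small sketch of the rowname muddle, assuming a named list of data frames (the names `parts`, `x` and `y` are made up for illustration). Resetting the rownames afterwards is one common base R workaround, not the only one.

```r
library(data.table)

# A named list: the names leak into the row names under rbind.data.frame.
parts <- list(x = data.frame(a = 1:2), y = data.frame(a = 3:4))

base_res <- do.call(rbind, parts)
rn <- rownames(base_res)
print(rn)   # row names built from the list names

# One common workaround in base R: drop the row names afterwards.
rownames(base_res) <- NULL

# rbindlist returns a data.table, which does not carry custom row names.
dt_res <- rbindlist(parts)
print(dt_res)
```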

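Memory efficiency

In terms of memory, rbindlist is implemented in C, so it is memory efficient, and it uses setattr to set attributes by reference.

rbind.data.frame is implemented in R. It does a lot of assigning and uses attr<- (as well as class<- and rownames<-), all of which (internally) create copies of the data.frame being built.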
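To illustrate what "by reference" means here, the sketch below uses data.table's setattr together with its address helper on a plain named vector. The vector `x` is a made-up example, and this only demonstrates setattr itself, not the internals of rbindlist.

```r
library(data.table)

x <- c(a = 1, b = 2)
addr_before <- address(x)   # memory address of the vector

# setattr() changes the attribute in place and returns x invisibly.
setattr(x, "names", c("u", "v"))

print(names(x))

# The vector was not reallocated: the address is unchanged.
stopifnot(identical(addr_before, address(x)))
```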
