Why does redirecting lines of ≥1022 characters take 83 times longer than redirecting lines of ≥1021 characters?

2024-06-22


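I have a text file of about 10 MB containing about 50,000 lines. Selecting all lines of ≥1021 bytes and redirecting the output to a regular file, or piping it to cat, takes 0.135 seconds. When I changed the threshold to ≥1022 bytes, it took 15.9 seconds, 83 times as long. The results are identical.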

$ time grep '^.{1021,}$' my_file > /tmp/grep1021

real    0m0.135s
user    0m0.120s
sys     0m0.013s
$ time grep '^.{1022,}$' my_file > /tmp/grep1022

real    0m11.483s
user    0m11.036s
sys     0m0.441s
$ cmp /tmp/grep102?
$
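To experiment without the original file, a file of similar shape can be generated. This is a minimal sketch: make_test_file is a hypothetical helper, not something from the question, and the layout (mostly short lines with evenly spaced long lines) is an assumption rather than the real data, so it may or may not reproduce the slowdown on a given grep version.

```shell
# make_test_file OUT TOTAL_LINES LONG_EVERY LONG_LEN SHORT_LEN
# (hypothetical helper) Writes TOTAL_LINES lines to OUT: every LONG_EVERY-th
# line is LONG_LEN bytes of "x", every other line is SHORT_LEN bytes of "x".
make_test_file() {
  local out=$1 total=$2 every=$3 long_len=$4 short_len=$5
  awk -v total="$total" -v every="$every" \
      -v long_len="$long_len" -v short_len="$short_len" '
    # repeat(s, n): concatenate n copies of the string s
    function repeat(s, n,   r) { r = ""; while (n-- > 0) r = r s; return r }
    BEGIN {
      long_line = repeat("x", long_len)
      short_line = repeat("x", short_len)
      for (i = 1; i <= total; i++) {
        if (i % every == 0) print long_line
        else print short_line
      }
    }' > "$out"
}
```

For example, make_test_file /tmp/my_file 50000 190 1500 180 writes 50,000 lines, 263 of them 1,500 bytes long, about 9.4 MB in total. The commands above can then be rerun against /tmp/my_file; note that {n,} without backslashes is ERE interval syntax, so with plain GNU grep (which parses BREs) the pattern needs grep -E.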

From there it gets slower still: for lines of ≥2,100 characters it takes 52 seconds (the results are still the same).

$ time grep '^.{1200,}$' my_file | cat > /dev/null

real    0m15.903s
user    0m15.182s
sys     0m0.737s
$ time grep '^.{1500,}$' my_file | cat > /dev/null

real    0m27.114s
user    0m24.584s
sys     0m2.545s
$ time grep '^.{1800,}$' my_file | cat > /dev/null

real    0m36.468s
user    0m34.889s
sys     0m1.594s
$ time grep '^.{2100,}$' my_file | cat > /dev/null

real    0m52.164s
user    0m47.949s
sys     0m4.221s
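The threshold sweep above can be scripted. A minimal sketch, assuming GNU date (for the %N nanosecond format) and grep -E; sweep is a hypothetical name. It sends grep's output through a pipe into wc -l, mirroring the piped measurements in the question rather than the > /dev/null ones, and prints the threshold, the number of matching lines, and the elapsed wall-clock seconds.

```shell
# sweep FILE N... (hypothetical helper, not from the question)
# For each threshold N, pipe the lines of at least N characters into wc -l
# and print: N, number of matching lines, elapsed wall-clock seconds.
# Assumes GNU date, whose +%N format gives nanoseconds.
sweep() {
  local file=$1 n count start end
  shift
  for n in "$@"; do
    start=$(date +%s.%N)
    # With pipefail set, grep's exit status 1 (no match) would fail the
    # pipeline; "|| true" keeps such scripts running. wc still prints 0.
    count=$(grep -E "^.{$n,}\$" "$file" | wc -l || true)
    end=$(date +%s.%N)
    printf '%s %s %s\n' "$n" "$count" \
      "$(awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f", b - a }')"
  done
}
```

For example, sweep my_file 1021 1022 1200 1500 1800 2100 prints one line per threshold. On the question's file the match count stayed the same across these thresholds while the time grew.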

This doesn't happen with grep alone: when its output goes straight to /dev/null, grep itself is fast enough.

$ time grep '^.{1022,}$' my_file > /dev/null

real    0m0.073s
user    0m0.060s
sys     0m0.012s
$ time grep '^.{3000,}$' my_file > /dev/null

real    0m0.495s
user    0m0.411s
sys     0m0.084s

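Why does this happen? My guess is that it has something to do with chunking, but I can't explain why passing less data into the pipeline would take so much more processing time. The boundary is clearly close to 1024.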

The system runs openSUSE 15.3 with Linux kernel 5.3.18-59.19-default.

Additional information:

Adding --line-buffered to grep makes no difference.
Using awk 'length >= n' instead, the process runs very quickly.

$ time grep '^.{1021,}$' my_file | wc
    263   30511 1899841

real    0m0.162s
user    0m0.147s
sys     0m0.031s
$ time grep '^.{1022,}$' my_file | wc
    263   30511 1899841

real    0m11.514s
user    0m11.044s
sys     0m0.487s

$ ulimit -p
8
$ time grep --line-buffered '^.{1021,}$' my_file | cat > /dev/null

real    0m0.137s
user    0m0.120s
sys     0m0.027s
$ time grep --line-buffered '^.{1022,}$' my_file | cat > /dev/null

real    0m11.528s
user    0m10.989s
sys     0m0.547s
$ time awk 'length >= 1021' my_file | cat > /dev/null

real    0m0.044s
user    0m0.041s
sys     0m0.008s
$ time awk 'length >= 1022' my_file | cat > /dev/null

real    0m0.044s
user    0m0.045s
sys     0m0.005s
$ time awk 'length >= 3000' my_file | cat > /dev/null

real    0m0.045s
user    0m0.038s
sys     0m0.012s
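Since awk is fast here, it also makes a convenient cross-check that grep and awk select the same lines. A minimal sketch with a hypothetical compare_counts helper: it sets LC_ALL=C so that "." in grep and length in awk both count bytes rather than multibyte characters (an assumption about which unit is meant, since the question mentions both). It compares counts only and is not meant as a timing test, because changing the locale may itself change grep's speed.

```shell
# compare_counts FILE N (hypothetical helper, not from the question)
# Counts lines of at least N bytes with grep -E and with awk and prints
# "grep=<count> awk=<count>". LC_ALL=C makes "." and length() byte-based.
compare_counts() {
  local file=$1 n=$2 g a
  # grep -c prints 0 but exits 1 when nothing matches
  g=$(LC_ALL=C grep -E -c "^.{$n,}\$" "$file" || true)
  a=$(LC_ALL=C awk -v n="$n" 'length($0) >= n { c++ } END { print c + 0 }' "$file")
  printf 'grep=%s awk=%s\n' "$g" "$a"
}
```

For example, compare_counts my_file 1022 should report the same number from both tools if they agree on which lines qualify.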