Out of memory when using sed with multi-line expressions on a large file

Answer 1

The first three commands are the culprit:

:a
N
$!ba

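These commands keep appending the next line with N until the last line is reached, so the whole file ends up in memory at once. The following script instead holds only one segment (the lines up to and including one that ends in a closing paren) in memory at a time.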

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space. 
# To avoid an extra newline at the start, replace instead of append.
1h
1!H

# If we find a paren at the end...
/)$/{
    # Bring the hold space into the pattern space
    g
    # Remove the newlines
    s/\n//g 
    # Print what we have
    p
    # Clear the hold space: empty the pattern space, then copy it over
    s/.*//
    h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()
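
To make the difference visible on the same sample, here is a small sketch that runs both approaches side by side. The variable names (sample, slurped, segmented) are only illustrative and not part of the original answer, and the '|' separator is added purely so the contents of the pattern space can be seen on one line.

```shell
# Sample input from the answer, kept in a variable (illustrative name) instead of test.in.
sample=$(printf '%s\n' 'a' 'b' 'c()' 'd()' 'e' 'fghi' 'j()')

# Slurp loop: N keeps appending until the last line, so the pattern space
# ends up holding every line. Newlines become '|' only to make that visible.
slurped=$(printf '%s\n' "$sample" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/|/g')
echo "$slurped"

# Segment at a time: the same commands as test.sed, passed with -e.
segmented=$(printf '%s\n' "$sample" |
  sed -n -e '1h' -e '1!H' -e '/)$/{' -e 'g' -e 's/\n//g' -e 'p' -e 's/.*//' -e 'h' -e '}')
echo "$segmented"
```

The first echo prints a|b|c()|d()|e|fghi|j(), showing that every input line was sitting in the pattern space together. The second prints the same three lines as ./test.sed above, with each segment flushed as soon as its closing line is seen.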

This awk solution prints each line as soon as it is read, so only one line is held in memory at a time.

% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()
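
For readability, the same one-liner can be spelled out with comments. This is only a sketch of the logic above: the variable name at_line_start replaces nl, the paren in the pattern is escaped as a precaution for awk implementations that reject an unmatched ')', and the input is piped in rather than read from test.in.

```shell
# Sample input from the answer, fed through a pipe instead of test.in.
joined=$(printf '%s\n' 'a' 'b' 'c()' 'd()' 'e' 'fghi' 'j()' | awk '
  # A line ending in ")" closes the segment: print it with a newline.
  /\)$/ { print; at_line_start = 1; next }
  # Any other line is written without a newline, so the next line continues it.
        { printf "%s", $0; at_line_start = 0 }
  # If the input ended in the middle of a segment, finish the line.
  END   { if (!at_line_start) print "" }
')
echo "$joined"
```

Because each piece is written out immediately, awk never needs to keep more than the current record, regardless of how long a segment grows.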
