Truncate the string on every n-th line of a very large document

2024-06-21

Here is document A:

@rand1
ABCDEFBHIJKLM
+
<</////
@rand2
NOPQRSTUVW
+
<<//<<<
@anotherrand
XYZABCDE
+
<<//<<<

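I need output that contains every line of document A, except that lines 2, 6, 10, ... (the pattern 2 + n * 4) are truncated so that they keep only their first 3 characters. The output should look like this: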

@rand1
ABC
+
<</////
@rand2
NOP
+
<<//<<<
@anotherrand
XYZ
+
<<//<<<

I'm doing this on a very large file (> 10 million lines), and I can't find a fast way to do it. The following code does what I want, but it takes far too long:

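r=0  # line number of document A currently being read
l=2  # next line number that needs to be trimmed

# IFS= and -r keep leading spaces and backslashes in each line intact
while IFS= read -r line; do
  r=$(echo "$r + 1" | bc)
  echo "$r"  # progress output
  if [ "$r" -eq "$l" ]
  then
    printf '%s\n' "$line" | cut -c -3 >> outputfile
    l=$(echo "$l + 4" | bc)
  else
    printf '%s\n' "$line" >> outputfile
  fi
done < documentA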
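Most of the time in the loop above goes into starting new processes: every line runs bc inside a command substitution, and every trimmed line also runs echo | cut. As a sketch (not the asker's code), here is the same loop kept entirely inside a POSIX shell, using $(( )) arithmetic and parameter expansion instead of bc and cut, with the output file opened by a single redirection instead of being appended to on every line. The function name trim_lines_sh is hypothetical. Even without the extra processes, a shell read loop over 10 million lines will likely still be much slower than one sed or awk process.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, keeping only the first 3
# characters of lines 2, 6, 10, ... (line numbers n with n % 4 == 2).
# No external command runs per line: arithmetic uses $(( )) and the
# truncation uses parameter expansion instead of cut.
trim_lines_sh() {
    n=0
    # IFS= and -r keep leading spaces and backslashes intact;
    # the [ -n ] test also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if [ $((n % 4)) -eq 2 ]; then
            case $line in
                # ${line#???} is everything after the first 3 characters;
                # removing that (quoted, literal) suffix leaves the first 3.
                ???*) line=${line%"${line#???}"} ;;
                # Lines shorter than 3 characters are left unchanged.
            esac
        fi
        printf '%s\n' "$line"
    done
}

# Usage (file names taken from the question):
# trim_lines_sh < documentA > outputfile
```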

Best answer

With GNU sed (available on OS X as gsed), use the "first~step" address form: 2~4 matches line 2 and then every 4th line after it (2, 6, 10, ...).
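A quick way to check which lines a first~step address selects is to run it over a numbered sequence. This assumes GNU sed is installed as sed (on OS X it would be gsed instead):

```shell
# Print only the line numbers that the GNU sed address 2~4 matches among 1..10.
# -n turns off sed's automatic printing, so only lines hit by "p" appear.
seq 10 | sed -n '2~4p'
```

This prints 2, 6 and 10, one per line, which are exactly the lines the substitution below applies to.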

sed -E '2~4s/(.{3}).*/\1/' inputfile > outputfile
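If GNU sed isn't available (first~step is a GNU extension, and as far as I know the stock sed on OS X does not accept it), awk can do the same job in a single pass. This is a sketch, not part of the original answer; the function name trim_every_fourth is made up for illustration, and only standard POSIX awk features (NR, substr) are assumed:

```shell
#!/bin/sh
# Hypothetical wrapper around a POSIX awk one-liner.
# NR is awk's current line number, so NR % 4 == 2 selects lines 2, 6, 10, ...
# substr($0, 1, 3) keeps the first three characters of such a line
# (shorter lines come back unchanged); every line is then printed.
# "$@" passes file names through; with no arguments awk reads stdin.
trim_every_fourth() {
    awk 'NR % 4 == 2 { $0 = substr($0, 1, 3) } { print }' "$@"
}

# Usage:
# trim_every_fourth inputfile > outputfile
```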
