How do I fix and consolidate a corrupted file containing multiple columns and rows using awk?


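I have a multi-line CSV file with 5 columns (fields). I need to fix and consolidate the corrupted first column, which contains several variants of a code that should all be unified. The complete final format of the first column is 00AB[0-9][0-9][0-9][0-9][0-9], where [0-9] can be any digit, e.g. 00AB21345. The first four characters, 00AB, must always stay as they are. The next five positions ([0-9][0-9][0-9][0-9][0-9]) can be any digits, and if there are fewer than five digits, the missing leftmost digits must be filled with 0.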

Example: <111> --> <00AB00111>, or <1111> --> <00AB01111>.
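
The padding rule in these examples corresponds to a zero-padded integer of width 5, which awk's printf writes as `%05d`. A minimal sketch, assuming the input is already a bare number; the `pad_code` helper name is hypothetical and not part of the original post:

```shell
# Hypothetical helper: turn a bare number into the 00AB##### form.
# %05d pads the number on the left with zeros up to five digits.
pad_code() {
    awk -v n="$1" 'BEGIN { printf "00AB%05d\n", n }'
}

pad_code 111     # prints 00AB00111
pad_code 1111    # prints 00AB01111
```

This covers only the padding step; stripping a partial prefix such as A or 0AB is a separate problem.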

For example, suppose I have the following file:

111     xx  yy  zzz ddd
1111    xx  yy  zzz ddd
11111   xx  yy  zzz ddd
A111    xx  yy  zzz ddd
A1111   xx  yy  zzz ddd
A11111  xx  yy  zzz ddd
AB111   xx  yy  zzz ddd
AB1111  xx  yy  zzz ddd
AB11111 xx  yy  zzz ddd
0A111   xx  yy  zzz ddd
0A1111  xx  yy  zzz ddd
0A11111 xx  yy  zzz ddd
0AB111  xx  yy  zzz ddd
0AB1111 xx  yy  zzz ddd
0AB11111 xx yy  zzz ddd
00A111  xx  yy  zzz ddd
00A1111 xx  yy  zzz ddd
00A11111 xx yy  zzz ddd
00AB111 xx  yy  zzz ddd
00AB1111 xx yy  zzz ddd
0AB11111 xx yy  zzz ddd
00AB12344   xx  yy  zzz ddd
00AB34527   xx  yy  zzz ddd
00AB56278   xx  yy  zzz ddd
00AB98902   xx  yy  zzz ddd

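To cover every possible scenario, I wrote the long awk script below. Each ## comment names the scenario in the file that the command after it is meant to fix.
My request: does anyone know a short awk script that can solve this problem? And could you explain it in detail so that I can learn from it? :)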

##111
awk -F',' '{if($0~/[0-9][0-9][0-9]/){print "001AB00"substr($1,1,3)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' SC3.csv > y1.csv
##1111
awk -F',' '{if($0~/[0-9][0-9][0-9][0-9]/){print "001AB"substr($1,1,4)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y1.csv > y2.csv
##11111
awk -F',' '{if($0~/[0-9][0-9][0-9][0-9][0-9]/){print "001AB" substr($1,1,5)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y2.csv > y3.csv
##A111
awk -F',' '{if($0~/[A-Z][0-9][0-9][0-9]/){print "001"substr($1,1,1) "B00"substr($1,2,4)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y3.csv > y4.csv
##A1111
awk -F',' '{if($0~/[A-Z][0-9][0-9][0-9][0-9]/){print "001"substr($1,1,1) "B0" substr($1,2,5)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y4.csv > y5.csv
##A11111
awk -F',' '{if($0~/[A-Z][0-9][0-9][0-9][0-9][0-9]/){print "001"substr($1,1,1) "B" substr($1,2,6)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y5.csv > y6.csv
##AB111
awk -F',' '{if($0~/[A-Z][A-Z][0-9][0-9][0-9]/){print "001"substr($1,1,2) "00" substr($1,3,5)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y6.csv > y7.csv
##AB1111
awk -F',' '{if($0~/[A-Z][A-Z][0-9][0-9][0-9][0-9]/){print "001"substr($1,1,2)"0" substr($1,3,6)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y7.csv > y8.csv
##AB11111
awk -F',' '{if($0~/[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]/){print "001"substr($1,1,7)","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y8.csv > y9.csv
##1A111
awk -F',' '{if($0~/[0-9][A-Z][0-9][0-9][0-9]/){print "00"substr($1,1,2) ",B00" substr($1,3,5) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y9.csv > y10.csv
##1A1111
awk -F',' '{if($0~/[0-9][A-Z][0-9][0-9][0-9][0-9]/){print "00"substr($1,1,1) "B0" substr($1,3,6) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y10.csv > y11.csv
##1A11111
awk -F',' '{if($0~/[0-9][A-Z][0-9][0-9][0-9][0-9][0-9]/){print "00"substr($1,1,2) "B" substr($1,3,7) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y11.csv > y12.csv
##1AB111
awk -F',' '{if($0~/[0-9][A-Z][A-Z][0-9][0-9][0-9]/){print "00"substr($1,1,1) substr($1,1,3)"00" substr($1,4,6) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y12.csv > y13.csv
##1AB1111
awk -F',' '{if($0~/[0-9][A-Z][A-Z][0-9][0-9][0-9][0-9]/){print "00" substr($1,1,3) "0" substr($1,4,7) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y13.csv > y14.csv
##1AB11111
awk -F',' '{if($0~/[0-9][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]/){print "00" substr($1,1,8) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y14.csv > y15.csv
##11A111
awk -F',' '{if($0~/[0-9][0-9][A-Z][0-9][0-9][0-9]/){print "0" substr($1,1,3)"B00" substr($1,4,6) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y15.csv > y16.csv
##11A1111
awk -F',' '{if($0~/[0-9][0-9][A-Z][0-9][0-9][0-9][0-9]/){print "0" substr($1,1,3)"B0" substr($1,4,7) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y16.csv > y17.csv
##11A11111
awk -F',' '{if($0~/[0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]/){print "0" substr($1,1,3)"B" substr($1,4,8) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y17.csv > y18.csv
##11AB111
awk -F',' '{if($0~/[0-9][0-9][A-Z][A-Z][0-9][0-9][0-9]/){print "0" substr($1,1,4)"00" substr($1,5,7) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y18.csv > y19.csv
##11AB1111
awk -F',' '{if($0~/[0-9][0-9][A-Z][A-Z][0-9][0-9][0-9][0-9]/){print "0" substr($1,1,4)"0" substr($1,5,8) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y19.csv > y20.csv
##11AB11111
awk -F',' '{if($0~/[0-9][0-9][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]/){print "0" substr($1,5,9) ","$2","$3","$4","$5;}else{print $1","$2","$3","$4","$5;}}' y20.csv > y21.csv

Best Answer 1

Perhaps:

awk 'sub("^0?0?A?B?","",$1) && $1=sprintf("00AB%05d",$1)'

This strips any leading fragment of 00AB from field 1, then rebuilds the field as 00AB followed by the remaining digits, zero-padded to a width of 5.

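The whole expression is always true, so the implicit action { print } fires for every line. The sub call is always true because the regex ^0?0?A?B? can match the empty string. It's a bit sneaky! An empty match is still a successful match, so the substitution takes place and sub returns 1. The assignment on the right-hand side then evaluates to the new, non-empty field value, which is also true.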
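
The return value of sub can be inspected directly. This sketch prints it next to the stripped field and also gives the same transformation with an explicit action block, for readers who prefer not to rely on the pattern-only form. Both function names are hypothetical:

```shell
# Print sub()'s return value followed by field 1 after stripping.
show_sub_result() {
    awk '{ n = sub("^0?0?A?B?", "", $1); print n, $1 }'
}

# The same transformation with an explicit action instead of a bare pattern.
normalize_codes_explicit() {
    awk '{ sub("^0?0?A?B?", "", $1); $1 = sprintf("00AB%05d", $1); print }'
}

echo '111 xx'    | show_sub_result            # 1 111  (empty match still counts)
echo '0AB111 xx' | show_sub_result            # 1 111
echo 'AB1111 xx' | normalize_codes_explicit   # 00AB01111 xx
```

The explicit form is longer but does not depend on the truthiness of the expression, which may be easier to maintain.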
