awk/sed: Adding counters to several different strings

2024-06-27

I have a text file and want to add a counter to various strings of interest. An example infile:

string_of_interest
abcd
efgh
another_string_of_interest
ijkl
abcd
another_string_of_interest
mnop
wxyz
string_of_interest
ijkl
wxyz
another_good_string
abcd
efgh
another_string_of_interest

As you can see, there are several strings to ignore, some of which may repeat. I only want to count the repetitions of a subset of the strings, producing the following outfile:

string_of_interest_1
abcd
efgh
another_string_of_interest_1
ijkl
abcd
another_string_of_interest_2
mnop
wxyz
string_of_interest_2
ijkl
wxyz
another_good_string_1
abcd
efgh
another_string_of_interest_3

The counter is appended to each string in snake case, so it becomes part of the string.

I have tried sed and awk here and there, but I am too new to them and have not gotten anywhere close. Any suggestions?

Best Answer 1

If every line holding a string of interest contains the key string "string", you can do this:

awk '/string/{ $0=$0 "_" ++seen[$0] }1' infile
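To see how the counter behaves, here is a small sketch that runs the same command on a made-up four-line input. The sample lines and the `result` variable are illustrative assumptions, not part of the question. `seen` is an associative array keyed by the whole line, so each distinct matching line keeps its own count.

```shell
# Illustrative input only: two repeats of one line, one unmatched line,
# and one other matching line.
result=$(printf '%s\n' string_of_interest abcd string_of_interest another_good_string |
    awk '/string/ { $0 = $0 "_" ++seen[$0] } 1')

printf '%s\n' "$result"
# string_of_interest_1
# abcd
# string_of_interest_2
# another_good_string_1
```

Note that `/string/` is an unanchored regular expression, so any line containing "string" anywhere would get a counter as well.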

Otherwise, use the following code, which appends an incrementing counter to each line that exactly matches one of the strings of interest:

awk '
    $0 == "string_of_interest" ||
    $0 == "another_string_of_interest" ||
    $0 == "another_good_string" { $0=$0 "_" ++seen[$0] } 1
' infile
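If the list of strings grows, one possible variation is to keep them in a separate file instead of hard-coding them. The sketch below is an assumption built on the same idea, not part of the original answer: the function name `add_counters` and the patterns file are hypothetical, and the patterns file is assumed to be non-empty with one exact line per entry.

```shell
# Hypothetical helper: $1 is a file listing the exact lines to count
# (one per line, assumed non-empty), $2 is the input file.
add_counters() {
    awk '
        # First file: remember each listed line as a key in want[].
        NR == FNR { want[$0]; next }
        # Second file: append "_<count>" to lines that exactly match a key.
        $0 in want { $0 = $0 "_" ++seen[$0] }
        # Print every line, modified or not.
        1
    ' "$1" "$2"
}
```

With the three strings of interest saved in, say, `patterns.txt`, it could be called as `add_counters patterns.txt infile > outfile`. Because the lookup uses `$0 in want`, only whole-line matches are counted, the same as the `==` comparisons above.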
