Summarizing sentences [repetition]

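I have some data, and I want to summarize its sentences so I can draw a conclusion from them. The example below is unrelated to my actual data, but it makes the idea clear and can be reproduced.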

Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.

I would like to summarize these sentences as follows:

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)

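I have played around with awk, but it seems awkward to do this with it. I then tried sed without success; sed seems to be meant for finding and replacing things.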

Best answer

The usual way to do this:

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%s signed %d time(s)\n", name, count[name])
       }' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)

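That is, use an associative array (a hash) keyed by the name in the second field ($2) to store the number of times each name appears. Then, in the END block, loop over all the names and print a summary line for each one.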
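Note that the output above is not in the order shown in the question: awk's for (name in count) loop visits array keys in an unspecified, implementation-dependent order. If a stable order is wanted, one option (not part of the original answer) is to pipe the summary through sort. A minimal sketch, assuming a POSIX shell with awk and sort; the function name summarize_sorted is hypothetical:

```shell
# Hypothetical helper (not from the original answer): reads the log lines on
# stdin, counts how often each name (field 2) appears, and prints one summary
# line per name.
summarize_sorted() {
    awk '{ count[$2]++ }
         END {
             for (name in count)
                 printf("%s signed %d time(s)\n", name, count[name])
         }' |
    # Sort by the count (field 3, numeric, descending), then break ties
    # by the name (field 1, ascending).
    sort -k3,3nr -k1,1
}

# Usage with the sample lines from the question, fed in via a here-document.
summarize_sorted <<'EOF'
Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.
EOF
```

With the sample data this lists Suzie (4) first, then Jordan (2), then the names that signed once in alphabetical order. For a file, use summarize_sorted <file.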

For nicer formatting, change the %s placeholder in the printf() call to %-10s, which reserves 10 characters for the name and left-aligns it:

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%-10s signed %d time(s)\n", name, count[name])
       }' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)
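The fixed width of 10 only keeps the columns aligned while every name is at most 10 characters long. One way around that (again not from the original answer; the helper name summarize_aligned is hypothetical) is to track the longest name while reading the input and build the printf format string in the END block. A minimal sketch, assuming a POSIX awk:

```shell
# Hypothetical helper (not from the original answer): like the %-10s version,
# but pads each name to the length of the longest name seen in the input.
summarize_aligned() {
    awk '{ count[$2]++
           # Remember the length of the longest name so far.
           if (length($2) > width) width = length($2) }
         END {
             # Build a format such as "%-9s signed %d time(s)\n" at run time.
             fmt = "%-" width "s signed %d time(s)\n"
             for (name in count)
                 printf(fmt, name, count[name])
         }'
}

# Usage with a few lines from the question (output order may vary,
# as with the for-in loop above).
summarize_aligned <<'EOF'
Employee Sebastian signed one time.
Employee Dan signed one time.
Employee Dan signed one time.
EOF
```

Building the format string by concatenation keeps the program to plain string operations, so it does not depend on any printf extensions of a particular awk.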

Out of boredom, let's tweak the output a little more, printing "time" for a count of 1 and "times" otherwise:

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%-10s signed %d time%s\n", name, count[name],
                      count[name] > 1 ? "s" : "" )
       }' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time

おすすめ記事