Aggregating and grouping a text file in Perl or bash

Question

In Perl:

perl -F';' -lane 'push @{$h{join ";",@F[0..2]}},$F[3];
                  END{
                    for(sort keys %h){
                        print "$_: ". join ",",@{$h{$_}};
                    }
                  }' your_file

You should be able to do something similar in awk using associative arrays, but I am not familiar with it, so I can't contribute actual awk code.

Explanation

Here is an expanded version of the code above that uses as little "magic" as possible.

open($FH,"<","your_file") or die "Cannot open your_file: $!";
while($line=<$FH>){ # For each line in the file (accomplished by -n)
    chomp $line; # Remove the newline at the end (done by -l)
    # The ; is set by -F and storing the split in @F done by -a
    @F = split /;/,$line; # Split the line into fields on ;
    $app_id = join ";",@F[0..2]; # AppID is the first 3 fields
    push @{$h{$app_id}},$F[3]; # The 4th field is added onto the hash
} # The whole file has been read at this point.
foreach $key (sort keys %h){ # Sort the hash by AppID
     print "$key: " . join(",",@{$h{$key}}) . "\n"; # Print the array values
     # The newline ("\n") added at the end is also done by -l
}

That leaves only the push statement to explain in detail.

push is normally used to add an element to an array variable. For example,
```
push @a,$x
```
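adds the contents of the variable $x to the array @a.
The loop that reads the file line by line is populating a hash table (%h). The keys of the hash are the AppIDs, and the value for each key is an array containing all the user IDs associated with that AppID. This is an anonymous array (it has no name), which Perl implements as an array reference (somewhat like a C pointer). The value in %h for a given AppID is written $h{$app_id}, so prefixing it with the array sigil (@), as in @{$h{$app_id}}, tells Perl to treat the hash value as an array (that is, to dereference the array reference), and the current user ID is pushed onto it.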
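To make the array-reference behaviour concrete, here is a minimal self-contained sketch. The sample records in @lines are hypothetical (the real input file is not shown here); the grouping logic is the same as in the expanded code above.

```perl
use strict;
use warnings;

# Hypothetical sample records: three key fields followed by a user ID.
my @lines = (
    "app1;1.0;linux;alice",
    "app2;2.1;win;bob",
    "app1;1.0;linux;carol",
);

my %h;
for my $line (@lines) {
    my @F      = split /;/, $line;
    my $app_id = join ";", @F[0..2];
    # On first use, $h{$app_id} is created as a reference to an empty
    # anonymous array; @{ ... } dereferences it so push can append to it.
    push @{ $h{$app_id} }, $F[3];
}

# ref() shows that each hash value really is an array reference.
print ref($h{"app1;1.0;linux"}), "\n";    # prints ARRAY

for my $key (sort keys %h) {
    print "$key: ", join(",", @{ $h{$key} }), "\n";
}
```

With this sample data the loop prints one line per AppID, with the user IDs in the order they were read.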
Another option, depending on how "Perlish" it feels to you, is to concatenate the 4th field onto the current value as a string:
```
while(...) { ... $h{$app_id} = defined $h{$app_id} ? $h{$app_id} . ",$F[3]" : $F[3] }
foreach $key (sort keys %h) { print "$key: $h{$key}\n" }
```
In Perl, . is the string concatenation operator.
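For comparison, the concatenation approach could be written out in full roughly as follows. This is only a sketch: the records in @lines and the hash name %joined are made up for illustration, and the exists check is one way (an assumption, not the only one) to avoid a leading comma on the first value.

```perl
use strict;
use warnings;

# Hypothetical sample records: three key fields followed by a user ID.
my @lines = (
    "app1;1.0;linux;alice",
    "app2;2.1;win;bob",
    "app1;1.0;linux;carol",
);

my %joined;
for my $line (@lines) {
    my @F      = split /;/, $line;
    my $app_id = join ";", @F[0..2];
    # Each hash value is a plain string here, not an array reference.
    # Prepend a comma only when a value has already been stored.
    $joined{$app_id} = exists $joined{$app_id}
        ? $joined{$app_id} . ",$F[3]"
        : $F[3];
}

for my $key (sort keys %joined) {
    print "$key: $joined{$key}\n";
}
```

The trade-off is that the user IDs are no longer available as a list afterwards; getting them back would require splitting the string on commas again.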

In the expanded code, I left out the perl -e '...' wrapper so that syntax highlighting applies to the code and it is easier to read.

Answer 1