How do I match two CSV files using awk?

Question

Answer 1

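The following is fast, and it is structured so that only the sort step has to hold a large amount of data in memory and do paging and so on. That is what sort is designed to handle, so it should be fine.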

$ cat tst.sh
#!/usr/bin/env bash

# First awk output is hdrFlag,fileNr,ID,VALs1-4 then we sort on
# the hdrFlag to handle the header line first, then the key values
# so we can process all matching keys together from both input
# files so we only have to store the IDs for the current key set.
awk 'BEGIN{FS=OFS=","} FNR==1{++fileNr} {print (FNR>1), fileNr, $0}' "$@" |
sort -t, -k1,1n -k4 |
awk '
    BEGIN { FS=OFS="," }
    {
        curr = $4 FS $5 FS $6 FS $7
        if ( curr != prev ) {
            prt()
            prev = curr
        }
        ids[$2] = ($2 in ids ? ids[$2] " " : "") $3
    }
    END { prt() }

    function prt(       file,numFiles) {
        for (file in ids) {
            numFiles++
        }
        if (numFiles > 1) {
            print ids[1], ids[2]
        }
        delete ids
    }
'


$ ./tst.sh file1 file2
ID1,ID2
1,55
2,84
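
To see what the second awk receives, the first two stages of the pipeline can be run on their own. The sketch below assumes hypothetical sample data, since the real file1 and file2 are not shown here; the column names VAL1 to VAL4, the values, and the variables tmp and decorated are all made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical sample input; the original file1/file2 are not shown.
tmp=$(mktemp -d)
printf '%s\n' 'ID1,VAL1,VAL2,VAL3,VAL4' '1,a,b,c,d' '2,e,f,g,h' '3,i,j,k,l' > "$tmp/file1"
printf '%s\n' 'ID2,VAL1,VAL2,VAL3,VAL4' '55,a,b,c,d' '84,e,f,g,h' '90,x,y,z,w' > "$tmp/file2"

# Stage 1 prefixes each line with hdrFlag,fileNr (hdrFlag is 0 for a
# header line, 1 otherwise). Stage 2 sorts headers first, then by the
# key fields, so lines sharing the same 4 values become adjacent.
decorated=$(
    awk 'BEGIN{FS=OFS=","} FNR==1{++fileNr} {print (FNR>1), fileNr, $0}' \
        "$tmp/file1" "$tmp/file2" |
    sort -t, -k1,1n -k4
)
printf '%s\n' "$decorated"

rm -rf "$tmp"
```

With this sample data the two header lines come out first, followed by the data lines grouped by key, for example 1,1,1,a,b,c,d immediately followed by 1,2,55,a,b,c,d. Feeding that stream to the final awk gives the ID1,ID2 / 1,55 / 2,84 output shown above, because only the a,b,c,d and e,f,g,h keys appear in both files.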

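I'm guessing at how you would want to handle the case where several entries match between the files for the same set of 4 values. As written, the script prints all of the matching IDs from each file on one output line, with the IDs from the same file separated by spaces.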
