Merging CSV files when the field delimiter also appears inside quotes

Answer 1

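Obviously, using a CSV parser would be better, but if we can safely assume that

1. The first field never contains a comma.
2. You only want the IDs that are present in the first file (an ID that is in file2 or file3 but not in file1 is ignored).
3. The files are small enough to fit in RAM.

then this Perl approach should work: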

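#!/usr/bin/env perl
use strict;
use warnings;

my %f;
## Read the files
while (<>) {
    ## remove trailing newlines
    chomp;
    ## Replace any commas within quotes with '|'.
    ## I am using a while loop to deal with multiple commas.
    ## Caveat: the separator between two adjacent quoted fields
    ## ("a","b") also matches this pattern and becomes '|'.
    while (s/\"([^"]*?),([^"]*?)\"/"$1|$2"/){}
    ## match the id and the rest; skip lines that do not match
    ## (blank lines, or lines with only one field).
    /^(.+?)(,.+)/ or next;
    ## The keys of the %f hash are the ids
    ## each line with the same id is appended to
    ## the current value of the key in the hash.
    $f{$1}.=$2;
}
## Print the lines (hash order, so not necessarily the input order)
foreach my $id (keys(%f)) {
    print "$id$f{$id}\n";
}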
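
To see what the comma-masking loop does on its own, here is a small sketch that runs it on two sample lines. `mask_loop` wraps the substitution from the script; `mask_fields` is a hypothetical alternative of mine, not part of the answer, that visits one quoted field at a time. Both assume that quoted fields never contain escaped quotes (`""`), and the sample lines are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

## mask_loop() repeats the substitution used in the script.
sub mask_loop {
    my ($line) = @_;
    while ($line =~ s/\"([^"]*?),([^"]*?)\"/"$1|$2"/) {}
    return $line;
}

## mask_fields() is a hypothetical alternative: it matches each quoted
## field in turn and changes commas only inside that field.
sub mask_fields {
    my ($line) = @_;
    $line =~ s/("[^"]*")/ (my $q = $1) =~ tr{,}{|}; $q /ge;
    return $line;
}

## Print both results side by side, separated by a tab.
for my $line ('1,"a,b,c",x', '2,"a","b"') {
    print mask_loop($line), "\t", mask_fields($line), "\n";
}
```

On the first line both helpers give `1,"a|b|c",x`. On the second, the loop turns the separator between the two quoted fields into `|`, while the per-field version leaves the line unchanged, so the loop is only safe when two quoted fields never sit next to each other.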

Save the script above as foo.pl and run it like this:

perl foo.pl file1.csv file2.csv file3.csv

The same script can also be written as a one-liner:

perl -lne 'while(s/\"([^"]*?),([^"]*)\"/"$1|$2"/){} /^(.+?)(,.+)/ and $k{$1}.=$2;
           END{print "$_$k{$_}" for keys(%k)}' file1 file2 file3
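
Two things the code above leaves open: `keys` returns IDs in no particular order, and nothing actually drops IDs that are missing from the first file. The following is a minimal sketch of one way to handle both, not the answer's method. `merge_lines` is a hypothetical helper that takes one array reference of lines per file, and the sample data is invented. Because the first field is assumed to be comma-free, it splits at the first comma only and leaves quoted commas untouched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

## merge_lines() is a hypothetical helper: each argument is an array
## reference holding the lines of one file, first file first.
sub merge_lines {
    my @files = @_;
    my (%rest, @order);
    for my $i (0 .. $#files) {
        for my $line (@{ $files[$i] }) {
            ## Split at the first comma only; skip lines without one.
            my ($id, $tail) = $line =~ /^([^,]+)(,.*)$/ or next;
            if ($i == 0 && !exists $rest{$id}) {
                push @order, $id;    # remember the first file's ID order
                $rest{$id} = '';
            }
            ## IDs that never appeared in the first file are ignored.
            next unless exists $rest{$id};
            $rest{$id} .= $tail;
        }
    }
    return map { "$_$rest{$_}" } @order;
}

my @merged = merge_lines(
    [ '1,"a,b",x', '2,c,d' ],      # stands in for file1
    [ '2,"e,f"', '3,ignored' ],    # stands in for file2
    [ '1,g' ],                     # stands in for file3
);
print "$_\n" for @merged;
```

This prints `1,"a,b",x,g` followed by `2,c,d,"e,f"`; ID 3 is dropped because it does not occur in the first list. To use it on real files you would read each file into its own array first, which is fine under the assumption that everything fits in RAM.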
