Regular expression on both files

Question

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = shift;
my $f2 = shift;

open(F1,"<",$f1) || die "couldn't open '$f1' for read: $!\n";
open(F2,"<",$f2) || die "couldn't open '$f2' for read: $!\n";

# set the input record separator (IRS) to '@'
# (this assumes '@' occurs only at the start of each record)
$/='@';

# Normally the IRS is found at the END of a record, but your input
# files START with the input record separator, so we need to throw
# away the first (bogus) input record, i.e. everything from the start
# of each file up to and including its first @ character. In other
# words, just the first @ character of each file.
my $junk = <F1>;
$junk = <F2>;

while (!eof(F1) && !eof(F2)) {
  my @record1 = split(/\n/, <F1>);
  my @record2 = split(/\n/, <F2>);

  printf "%s%s\n", $/, $record1[0];  # prepend the IRS
  printf "%s%s\n", substr($record2[1],0,4), $record1[1];
  printf "%s\n",   $record2[2];
  printf "%s%s\n", $record2[3], $record1[3];
}

close(F1);
close(F2);

This opens both files for reading and sets $/ (Perl's input record separator variable) to the single character @.
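To see what setting $/ to @ does to reading, here is a minimal sketch that reads from a string through an in-memory filehandle instead of a real file. The sample data and the read_chunks helper are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical two-record sample, held in a string rather than a file.
my $data = "\@id1\nACGT\n+\nEEEE\n\@id2\nTTGA\n+\nAAAA\n";

# Return every chunk that readline produces when the separator is '@'.
sub read_chunks {
    my ($text) = @_;
    local $/ = '@';    # change the separator only inside this sub
    open(my $fh, '<', \$text) or die "couldn't open in-memory file: $!\n";
    my @chunks = <$fh>;    # list context: read all records at once
    close($fh);
    return @chunks;
}

my @chunks = read_chunks($data);
printf "%d chunks\n", scalar @chunks;
printf "first chunk: '%s'\n", $chunks[0];
```

Because the data starts with @, the first chunk is just the separator itself, which is why the script reads and discards one record from each file before entering its loop.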

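Then, as long as neither file has reached EOF, it reads one record from each file, splits each record into an array (using the newline character \n as the delimiter), and prints the merged record as you specified.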
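The merge step can be sketched on its own as a function that takes one record string from each file. The merge_records name and the two sample records are assumptions made up for this illustration; the field handling mirrors the printf lines in the script.

```perl
use strict;
use warnings;

# Merge one record from each file the same way the script's loop does.
# Each argument is a record as read with $/ = '@', so the leading '@'
# has already been consumed and a trailing '@' may be present.
sub merge_records {
    my ($rec1, $rec2) = @_;
    my @r1 = split(/\n/, $rec1);
    my @r2 = split(/\n/, $rec2);
    return join("\n",
        '@' . $r1[0],                     # header line from file 1
        substr($r2[1], 0, 4) . $r1[1],    # first 4 characters of file 2's line + file 1's line
        $r2[2],                           # third line taken from file 2
        $r2[3] . $r1[3],                  # fourth lines concatenated, file 2 first
    ) . "\n";
}

# Made-up sample records, not taken from the real input files.
my $rec1 = "read1 2:N:0:NATC\nCAATC\n+\nAA/A/\n\@";
my $rec2 = "read1 1:N:0:NATC\nNATC\n+\n#EEE\n\@";

print merge_records($rec1, $rec2);
```

Returning the merged text instead of printing it directly makes the step easy to check in isolation before running it over whole files.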

Perl arrays are indexed from 0, not 1, so $record1[0], for example, is the first line of the record read from file1.
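A short illustration of the zero-based indexing, using a made-up record string rather than real input:

```perl
use strict;
use warnings;

# Hypothetical record text; split discards the trailing empty field.
my @record = split(/\n/, "header line\nACGT\n+\nEEEE\n");

print "first element:  $record[0]\n";     # index 0 is the first line
print "fourth element: $record[3]\n";     # index 3 is the fourth line
print "last index:     ", $#record, "\n"; # one less than the element count
```

This is why the script uses indices 0 through 3 to address the four lines of each record.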

Save the script to a file (e.g. hassan.pl), make it executable with chmod +x hassan.pl, then run it with your two input files as arguments.

Example output:

$ ./hassan.pl file1.txt file2.txt  
@NB551168:120:HTKN2BGX5:1:11101:3598:1051 2:N:0:NATC
NATCCAATCTCTAAAGTTT
+
#EEEAA/A/EEEE///EEE
@NB551168:120:HTKN2BGX5:1:11101:24202:1051 2:N:0:NTCG
NTCGTGAGACCGGGTGTTG
+
#EEEAAAAAAEEE///<AA
@NB551168:120:HTKN2BGX5:1:11101:4381:1051 2:N:0:NCTT
NCTTGCTACTCCTAAGGCA
+
#EEAA////6/////EE//

(I verified with diff that this output exactly matches the output you asked for.)
