Nesting awk in a while loop to parse two files line by line and compare column values

Question

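The first problem is that you can't use bash variables inside awk like that. Inside awk, $a is evaluated as field number a, but a is empty because it is defined in bash, not in awk (so $a ends up meaning $0, the whole line). One way around this is to use awk's -v option to define the variable: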

-v var=val
--assign var=val
   Assign the value val to the variable var,  before  execution  of
   the  program  begins.  Such variable values are available to the
   BEGIN rule of an AWK program.

So you could do:

while read chr a b cov; do 
  awk -v a="$a" -v b="$b" '($2<=a && b <= $3) {print NR}' exons.bed > out$a$b 
done < reads.bed
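
As a minimal sketch of the difference, the snippet below runs the same one-line input through awk with and without -v. The value 150 and the sample line are made up for illustration and are not taken from the question's files.

```shell
# Hypothetical value standing in for a read's start position.
a=150

# Inside single quotes the shell does not expand $a. awk sees its own
# (unset) variable a, so $a means field 0, i.e. the whole line.
without_v=$(echo "chr1 100 200" | awk '{print $a}')

# With -v, awk's variable a holds the shell value and can be compared
# against the fields of the line.
with_v=$(echo "chr1 100 200" | awk -v a="$a" '$2 <= a && a <= $3 {print "inside"}')

echo "without -v: $without_v"
echo "with -v:    $with_v"
```

Without -v the whole line is printed back; with -v the comparison uses the value that was set in the shell.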

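However, there is another error. For a read to fall within an exon, the read's start position must be greater than or equal to the exon's start position, and its end position must be smaller than or equal to the exon's end position. You are using $2<=a && b <= $3, which selects reads that start outside the exon's boundaries. What you want is $2>=a && $3<=b.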
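
The containment test can be sketched in isolation. This example assumes the exon bounds are passed in with -v and each input line is a read (chromosome, start, end). The names exon_start and exon_end and all coordinates are hypothetical, chosen only to make the roles of the two sides of the comparison explicit.

```shell
# Three made-up reads checked against one made-up exon (100 to 200).
result=$(printf '%s\n' 'chr1 120 180' 'chr1 90 150' 'chr1 150 250' |
    awk -v exon_start=100 -v exon_end=200 '{
        # A read is inside only if it starts at or after the exon start
        # and ends at or before the exon end.
        if ($2 >= exon_start && $3 <= exon_end) verdict = "inside"
        else verdict = "outside"
        print $2, $3, verdict
    }')

printf '%s\n' "$result"
```

Only the first read passes both halves of the test; the second starts before the exon and the third ends after it.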

In any case, doing this kind of task in a bash loop is very inefficient, because the input file has to be read once for each pair of a and b. Why not do the whole thing in awk?

awk 'NR==FNR{a[NR]=$2;b[NR]=$3; next} {
        for (i in a){
           if($2>=a[i] && $3<=b[i]){
            out[i]=out[i]" "FNR 
        }}}
        END{for (i in out){
                   print "Exon",i,"contains reads of line(s)"out[i],\
                   "of reads file" 
        }}' exons.bed reads.bed

Running the above script on your example files produces the following output:

Exon 1 contains reads of line(s) 1 of reads file
Exon 2 contains reads of line(s) 2 3 4 5 of reads file

For clarity, here is the same script in a less condensed form:

#!/usr/bin/awk -f

## While we're reading the 1st file, exons.bed
NR==FNR{
    ## Save the start position in array a and the end 
    ## in array b. The keys of the arrays are the line numbers.
    a[NR]=$2;
    b[NR]=$3; 
    ## Move to the next line, without continuing
    ## the script.
    next;
}
## Once we move on to the 2nd file, reads.bed
{
    ## For each set of exon start and end positions
    for (i in a){
        ## If the current read starts at or after this exon's start
        ## (2nd field) and ends at or before this exon's end (3rd field),
        ## add this line number (FNR is the current file's line number)
        ## to the list of reads for the current value of i.
        if($2>=a[i] && $3<=b[i]){
            out[i]=out[i]" "FNR
        }
    }
}
## After both files have been processed
END{
    ## For each exon in the out array
    for (i in out){
        ## Print the exon's line number and the reads it contains
        print "Exon",i,"contains reads of line(s)"out[i],
            "of reads file"
    }
}
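
To try the two-file approach without touching real data, here is a rough sketch that writes two small files into a temporary directory and runs the same logic on them. The file contents are invented for illustration and are not the example files from the question. The output is piped through sort because awk does not guarantee the order in which for (i in out) visits the array.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Hypothetical exons: chromosome, start, end.
printf '%s\n' 'chr1 100 200' 'chr1 300 400' > "$tmpdir/exons.bed"
# Hypothetical reads: chromosome, start, end, coverage.
printf '%s\n' 'chr1 110 150 5' 'chr1 310 350 7' \
              'chr1 360 390 2' 'chr1 500 550 1' > "$tmpdir/reads.bed"

# NR==FNR is true only while the first file (exons.bed) is being read,
# so the first block stores the exons and the second block tests reads.
result=$(awk 'NR==FNR { a[NR]=$2; b[NR]=$3; next }
    { for (i in a) if ($2>=a[i] && $3<=b[i]) out[i]=out[i]" "FNR }
    END { for (i in out)
              print "Exon", i, "contains reads of line(s)" out[i], "of reads file" }' \
    "$tmpdir/exons.bed" "$tmpdir/reads.bed" | sort)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With these made-up files, exon 1 should list read line 1 and exon 2 should list read lines 2 and 3; the last read falls in no exon and is not reported. Each file is read only once, no matter how many reads there are.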

Answer 1