Reading and computing on every nth repeated column

2024-06-20

I have a matrix containing gene counts for different samples.

Col1: GeneName
Col2: Length
Col3, Col4, Col5: Counts for genes in sampleA/sampleB/sampleC
Col6, Col7, Col8: Total counts in sampleA/sampleB/sampleC

Here is a sample matrix.

A1BG    1758    53  4373    207 46005749    43849471    31554941 
A1BG-AS1    2126    5   88  12  46005749    43849471    31554941
A1CF    9695    8882    3522    437 46005749    43849471    31554941 
A2M 5399    15963   12325   7227    46005749    43849471    31554941 
A2M-AS1 6660    50  33  36  46005749    43849471    31554941

I want to compute counts_sampleA / (total_counts_sampleA * Length), and the same for each of the other samples.

cat file | awk 'BEGIN {OFS="\t"} { print $1,$2,$3/($6*$2),$4/($7*$2),$5/($8*$2) }'

This is the expected result:

A1BG    1758    6.55307e-10 5.67278e-08 3.73151e-09  
A1BG-AS1    2126    5.11204e-11 9.43963e-10 1.78875e-10   
A1CF    9695    1.99136e-08 8.28471e-09 1.42845e-09   
A2M 5399    6.42672e-08 5.20606e-08 4.24207e-08   
A2M-AS1 6660    1.63186e-10 1.12999e-10 1.71301e-10

This works fine, but it does not scale to a larger matrix. How can I do the same for 100 samples, with geneCountinEachSample in column3-column102 and totalCountinEachSample in column103-column202?

I would like to use a for loop so that it handles however many columns there are when more samples are present.

cat inFile | awk 'BEGIN {OFS="\t"} { row=NF; samples=3; size=$samples+2; for ( i=3; i<=$size; i++); END print $i/$[$i+$samples] }'

Any suggestions on how to do this? Thanks!

Best answer 1

Well, you almost had the answer:

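awk '
     {
      # Columns 1-2 are name and length; the other NF-2 columns are half counts, half totals.
      # cols is the index of the last count column (5 for the 8-column sample).
      cols=((NF/2) + 1)
      for (i=1; i <= cols; i++) {
          if (i >= 3) {
              # The total for count column i sits (cols - 2) columns to the right.
              count_index= i + cols - 2
              printf("%s\t", 1.0 * $i / ($count_index * $2))
          } else {
              # Gene name and length are printed unchanged.
              printf("%s\t", $i)
          }
      }
      printf("\n")
     }' inFile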
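
As a variation on the loop above, here is a minimal sketch that wraps the same calculation in a shell function and joins the fields with tabs, so no trailing tab is left at the end of each row. The function name normalize_counts and the check that skips rows whose column count is not 2 + 2*samples are my own assumptions, not part of the answer above. Writing to "/dev/stderr" works in the common awk implementations (gawk, mawk, BSD awk), but treat that as an assumption too.

```shell
# normalize_counts is a hypothetical helper name, used here for illustration.
# Assumed layout: col1 = gene, col2 = length, then N count columns, then N total columns.
normalize_counts() {
    awk '
    BEGIN { OFS = "\t" }
    {
        n = (NF - 2) / 2                      # number of samples in this row
        if (n < 1 || n != int(n)) {           # assumption: skip rows with an unexpected column count
            printf("skipping line %d: %d fields\n", NR, NF) > "/dev/stderr"
            next
        }
        line = $1 OFS $2
        for (i = 3; i <= n + 2; i++)          # count in column i, matching total in column i + n
            line = line OFS ($i / ($(i + n) * $2))
        print line
    }' "$@"
}
```

It can be called as normalize_counts inFile, or fed from a pipe with no arguments. Because the sample count is derived from NF on every row, the same function should handle 3 samples or 100 without changes.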

Using cat file | awk ... is suboptimal. awk accepts files directly as arguments, and even awk ... < infile is cheaper than the pipe. It is a useless use of cat.
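
To make the point concrete, the three invocations below should produce identical output; only the first one starts an extra cat process. This is a small sketch that uses a throwaway file created with mktemp and a trivial awk program, { print $1 }, both chosen purely for illustration.

```shell
# Throwaway input file, for illustration only.
infile=$(mktemp)
printf 'A1BG\t1758\nA2M\t5399\n' > "$infile"

via_cat=$(cat "$infile" | awk '{ print $1 }')    # extra process and pipe
via_arg=$(awk '{ print $1 }' "$infile")          # awk opens the file itself
via_stdin=$(awk '{ print $1 }' < "$infile")      # the shell redirects stdin, no cat

rm -f "$infile"
```

One practical difference between the last two forms: when the file is passed as an argument, awk knows its name (FILENAME), whereas with redirection it only sees standard input.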

