How to select only the information that starts with a pattern in a column and print it in another column

Answer 1

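This can be done with POSIX awk as shown below. For every line after the first, split the 3rd field on commas, select the elements that start with S:, join them with commas, and store the result in a new field after the last one (field NF+1).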

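awk -F '\t' '
  BEGIN {
    OFS = FS        # write tab-separated output, same as the input
    _SEP_ = ","     # delimiter inside the 3rd field
  }

  # Header row: append the name of the new column.
  NR==1{$(NF+1) = "S_mutation"}

  # Data rows with a non-empty 3rd field.
  NR>1&&length($3)>0{
    nf = split($3, a, _SEP_)
    t = ""
    for (i=1; i<=nf; i++) {
      if (a[i] ~ /^S:/) {
        # Join matches with commas, without a leading separator.
        t = t (t=="" ? t : _SEP_) a[i]
      }
    }
    $(NF+1) = t
  }1   # the trailing 1 prints every line
' file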
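
For comparison, the split/filter/join step can be pulled into a small awk function. This is only a sketch: the function name `pick` and the sample rows are assumptions (the rows are reconstructed from the output shown in this answer, and the row for id 254 is assumed to have empty clade and mutation fields). Unlike the script above, it appends the new column to every row, including rows whose 3rd field is empty.

```shell
# Hypothetical sample input; the real file may differ.
sample=$(printf 'id\tclade\tmutation\n243\t40A\tS:ojo,L:juju,S:lili\n254\t\t\n267\t40B\tJ:jijy,S:asel,M:ase\n')

result=$(printf '%s\n' "$sample" | awk -F '\t' '
  # pick(list, prefix): comma-separated items of list that start with prefix.
  # Names after the extra spaces are local variables.
  function pick(list, prefix,    n, i, parts, out) {
    n = split(list, parts, ",")
    out = ""
    for (i = 1; i <= n; i++)
      if (index(parts[i], prefix) == 1)   # literal prefix test, no regex
        out = out (out == "" ? "" : ",") parts[i]
    return out
  }
  BEGIN { OFS = FS }
  NR == 1 { print $0, "S_mutation"; next }   # header row
  { print $0, pick($3, "S:") }               # data rows, column always added
')

printf '%s\n' "$result"
```

Because the prefix is an argument, the same function could select `L:` or `M:` entries by changing the second argument.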

The same in Perl, but using a regex:

perl -lnse '$,="\t";
  print $_,($.==1?q(S_mutation):
  "@{[/(?:\t|,)\KS:[^,]*/g]}"||());
' -- -\"=, ./file

Output:

id   clade  mutation             S_mutation
243  40A    S:ojo,L:juju,S:lili  S:ojo,S:lili
254                              
267  40B    J:jijy,S:asel,M:ase  S:asel
