Extracting strings with a regex produces empty strings

I have the following file:

awk -F'\t' '$3=="mRNA"'  GCF_000390325.2_Ntom_v01_genomic.gff | head
NW_008828495.1  Gnomon  mRNA    35293   38211   .   +   .   ID=rna-XM_009608413.3;Parent=gene-LOC104084433;Dbxref=GeneID:104084433,Genbank:XM_009608413.3;Name=XM_009608413.3;gbkey=mRNA;gene=LOC104084433;model_evidence=Supporting evidence includes similarity to: 6 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 76 samples with support for all annotated introns;product=cytochrome P450 CYP82D47-like;transcript_id=XM_009608413.3
NW_008828515.1  Gnomon  mRNA    6799    11530   .   +   .   ID=rna-XM_009591409.3;Parent=gene-LOC104116524;Dbxref=GeneID:104116524,Genbank:XM_009591409.3;Name=XM_009591409.3;gbkey=mRNA;gene=LOC104116524;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=protein JASON-like%2C transcript variant X2;transcript_id=XM_009591409.3
NW_008828515.1  Gnomon  mRNA    6799    11530   .   +   .   ID=rna-XM_009630598.3;Parent=gene-LOC104116524;Dbxref=GeneID:104116524,Genbank:XM_009630598.3;Name=XM_009630598.3;gbkey=mRNA;gene=LOC104116524;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 34 samples with support for all annotated introns;product=protein JASON-like%2C transcript variant X1;transcript_id=XM_009630598.3
NW_008828528.1  Gnomon  mRNA    2303    14453   .   +   .   ID=rna-XM_033657931.1;Parent=gene-LOC117278374;Dbxref=GeneID:117278374,Genbank:XM_033657931.1;Name=XM_033657931.1;gbkey=mRNA;gene=LOC117278374;model_evidence=Supporting evidence includes similarity to: 72%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC117278374;transcript_id=XM_033657931.1
NW_008828528.1  Gnomon  mRNA    5510    7652    .   -   .   ID=rna-XM_033657569.1;Parent=gene-LOC117278090;Dbxref=GeneID:117278090,Genbank:XM_033657569.1;Name=XM_033657569.1;gbkey=mRNA;gene=LOC117278090;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized mitochondrial protein AtMg00810-like%2C transcript variant X1;transcript_id=XM_033657569.1
NW_008828528.1  Gnomon  mRNA    5873    8848    .   -   .   ID=rna-XM_033657711.1;Parent=gene-LOC117278090;Dbxref=GeneID:117278090,Genbank:XM_033657711.1;Name=XM_033657711.1;gbkey=mRNA;gene=LOC117278090;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=uncharacterized mitochondrial protein AtMg00810-like%2C transcript variant X2;transcript_id=XM_033657711.1
NW_008828570.1  Gnomon  mRNA    5   6611    .   -   .   ID=rna-XM_009610342.3;Parent=gene-LOC104102329;Dbxref=GeneID:104102329,Genbank:XM_009610342.3;Name=XM_009610342.3;gbkey=mRNA;gene=LOC104102329;model_evidence=Supporting evidence includes similarity to: 27 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 56 samples with support for all annotated introns;partial=true;product=TATA-box-binding protein-like;start_range=.,5;transcript_id=XM_009610342.3
NW_008828592.1  Gnomon  mRNA    9998    13370   .   +   .   ID=rna-XM_033658453.1;Parent=gene-LOC104103684;Dbxref=GeneID:104103684,Genbank:XM_033658453.1;Name=XM_033658453.1;gbkey=mRNA;gene=LOC104103684;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 10 samples with support for all annotated introns;product=pentatricopeptide repeat-containing protein At1g15510%2C chloroplastic;transcript_id=XM_033658453.1
NW_008828592.1  Gnomon  mRNA    13457   18285   .   -   .   ID=rna-XM_009612846.3;Parent=gene-LOC104104451;Dbxref=GeneID:104104451,Genbank:XM_009612846.3;Name=XM_009612846.3;gbkey=mRNA;gene=LOC104104451;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 23 samples with support for all annotated introns;product=uncharacterized LOC104104451;transcript_id=XM_009612846.3
NW_008828641.1  Gnomon  mRNA    4417    7406    .   +   .   ID=rna-XM_009613787.3;Parent=gene-LOC104105226;Dbxref=GeneID:104105226,Genbank:XM_009613787.3;Name=XM_009613787.3;gbkey=mRNA;gene=LOC104105226;model_evidence=Supporting evidence includes similarity to: 8 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 75 samples with support for all annotated introns;product=heat shock factor protein HSF30%2C transcript variant X1;transcript_id=XM_009613787.3

I am using the following command to extract the ID and product values, but all I get is .mrna1, (the captured values are empty):

awk -F'\t' '$3=="mRNA"'  GCF_000390325.2_Ntom_v01_genomic.gff | perl -F'\t' -lane 'if($F[2] eq "mRNA"){/ID=([^\;]+).*product="([^"]+)/; print "$1.mrna1,$2"}' > GCF_000390325.2_Ntom_v01_genomic.gff.csv

The result I want looks like this:

rna-XM_009608413.3,cytochrome P450 CYP82D47-like
rna-XM_009591409.3,protein JASON-like%2C transcript variant X2
rna-XM_009630598.3,protein JASON-like%2C transcript variant X1
rna-XM_033657931.1,uncharacterized LOC117278374
rna-XM_033657569.1,uncharacterized mitochondrial protein AtMg00810-like%2C transcript variant X1
...

What am I missing?

Thanks in advance.

Best answer

Whenever you use capture variables such as $1 and $2, you should first check that the match actually succeeded.

In this case $1 and $2 are empty, and because warnings are not turned on, you are never told about it.

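Your regex requires a double quote after product=, but the data contains no quotes, so the match never succeeds. I recommend invoking Perl with the -w option so that problems like this are reported.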

perl  -w -F'\t' -lane 'if(($F[2] eq "mRNA")&&/ID=([^\;]+).*product=([^;]+)/){print "$1.mrna1,$2"}'
