I have the following text data (df1.txt):
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
I also have the following text data (df2.txt):
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
I want to merge the two files on the identifier they share, so I need output like this:
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
I have been trying the following command without success. Any ideas?
cat df1.txt | seqkit replace -k df2.txt -p '(.+)' -r '$1 {kv}'
Best answer 1
With awk, you can do the following:
awk '
NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1
' df2.txt df1.txt
Example:
$ awk 'NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1' df2.txt df1.txt
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
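In case it helps to see how the one-liner works, here is the same logic unpacked into a commented shell function. This is only a sketch: the function name `merge_headers` and the array name `desc` are made up for illustration, and it assumes the annotation file is non-empty and that its first whitespace-separated field is exactly the FASTA ID without the leading `>`. The only change from the one-liner is an explicit `/^>/` test, so that only header lines are ever considered for replacement.

```shell
# merge_headers is a hypothetical helper name, not part of awk or seqkit.
# Usage: merge_headers annotations.txt sequences.fasta
merge_headers() {
    awk '
        # First file (annotations): NR == FNR holds only while reading it.
        # Store the full line, prefixed with ">", under the key ">" + ID.
        NR == FNR { desc[">" $1] = ">" $0; next }

        # Second file (FASTA): on a header line whose ID has an annotation,
        # swap the first field for the stored full header.
        /^>/ && ($1 in desc) { $1 = desc[$1] }

        # Print every FASTA line, modified or not.
        { print }
    ' "$1" "$2"
}
```

With the files from the question it would be called as `merge_headers df2.txt df1.txt`. Headers with no entry in the annotation file pass through unchanged.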