列1の「Query_」ヘッダーの間にあるすべての行を削除するsedまたはawkコマンドがあるかどうか疑問に思います(各ヘッダー間の行数が5個未満の場合)。以下は、1Gb程度の大容量ファイルから抜粋した内容です。さまざまな方法を試しましたが、すべて失敗しました。
Query_10 26 KMGKWYPTEDAPAKKRKTQSWRQNKSKLRGGIVPGQVLIILAGKHKGKRVVYLTQLSTGE 205
XP_010718494 131 KMPRYYPTEDVPRKSHGKKPFSQHKRRLRASITPGTVLILLTGRHRGKRVVFLKQLGTGL 192
NP_001291831 111 KMPRYYPTEDVPRKSHGKKPFSQHVRKLRASITPGTILIILTGRHRGKRVVFLKQLSSGL 172
Query_10 206 IVVTGPHKFNRCPLKKLAQSFTMPTSTFVDI*GLNFDITEQHFVKEKP**SSEEAQFFAK 385
XP_010718494 193 LLVTGPLVVNRVPLRRAHQKFVIATSTKVDISGVKIHLTDAYFKKKKLRKPKQEGEIFDT 255
NP_001291831 173 LLVTGPLSLNRVPLRRTHQKFVIATSTKIDISSVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_012359817 173 LLVTGPLVLNRVPLRRTHQKFVIATSTKIDISNVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_009246541 173 LLVTGPLVLNRVPLRRTHQKFVIATSTKIDISNVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_003225150 155 LLVTGPLAINRVPLRRAHQKFVIATSTKVDISSVKLHLNDVYFKKKKLRKPKQEGEIFDT 217
Query_13 31 MEEQKEKGLSNPEVV*KYRQCSEIVNQVLSTVVSSCVPGADVASICTNGDFLIEDGLRNI 210
XP_002947167 7 IQGEQEPNLSVPEVVTKYKAAADICNRALQAVIDGCKDGSKIVDLCRTGDNFITKECGNI 66
XP_004993505 1 MELDRQSKVVDADALSKYRAAAAIANDCVQQLVANCIAGADVYTLAVEADTYIEQKLKEL 60
XP_006961234 1 MSETKEYSLNNPDTLTKYKTAAQISEKVLAAVSDLCVPGAKIVDICQQGDKLIEEELAKV 62
XP_008089018 1 MSEETDYTLNNPDTLTKYKTAAQISEKVLAAVAELVVPGEKIVTICEKGDKLIEEELAKV 60
Query_13 211 EPDTNIEKGIAIPVCLNINNICSYYSPLPDASTTLQEGDLVKVDLGAHFDGYIVSAASSI 390
XP_004029906 65 YTKKKVEKGPAFPTCISINEICGHYSPLLSDSSLLKEGDVVKIDLGTHIDGFIALGAHTV 131
XP_004031065 64 FTKKKLQKGPAFPTCISVNEICGHYSPLISDSSLLKEGDVVKIDLGAQIDGFIALAAHTV 130
XP_003223249 65 KKEKDMKKGIAFPTSISVNNCVCHFSPLKDQDYILKEGDLVKIDLGVHVDGFISNVAHSF 125
XP_002947167 67 YKGKQIEKGVAFPTCVSVNSVVGHFSPNADDTSALKAGDVVKFDMGCHIDGFIATQATTV 126
XP_003880798 73 ENGKKMEKGIAFPTCISINEICGHFSPVEENAETLTEGDVVKIDMGCHIDGYISVVAYTV 135
XP_004348044 69 KANKKVKKGIAFPTCVSLNSTVCHQSPLSDAAITLQAGDVAKVDLGVHVDGLIAVVAHTI 129
XP_003284133 69 HSKKKIEKGIAFPTCISVNNCVGHYSPLKATSRSLVDGDIVKIDLGVHINGFIAVGAHTI 128
NP_001241588 65 YKNVKIERGVAFPTCLSINNVVCHFSPLASDEAVLEEGDILKIDMACHIDGFIAVVAHTH 126
XP_009039553 76 YQKKIIDKGVAFPTCVSVNECVCHNSPLESDTTSLSEGDLVKLDVGCYVDGYIAVAAHTM 141
必要な結果は次のとおりです。
Query_10 206 IVVTGPHKFNRCPLKKLAQSFTMPTSTFVDI*GLNFDITEQHFVKEKP**SSEEAQFFAK 385
XP_010718494 193 LLVTGPLVVNRVPLRRAHQKFVIATSTKVDISGVKIHLTDAYFKKKKLRKPKQEGEIFDT 255
NP_001291831 173 LLVTGPLSLNRVPLRRTHQKFVIATSTKIDISSVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_012359817 173 LLVTGPLVLNRVPLRRTHQKFVIATSTKIDISNVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_009246541 173 LLVTGPLVLNRVPLRRTHQKFVIATSTKIDISNVKIHLTDAYFKKKKP--RHQEGEIFDT 235
XP_003225150 155 LLVTGPLAINRVPLRRAHQKFVIATSTKVDISSVKLHLNDVYFKKKKLRKPKQEGEIFDT 217
Query_13 211 EPDTNIEKGIAIPVCLNINNICSYYSPLPDASTTLQEGDLVKVDLGAHFDGYIVSAASSI 390
XP_004029906 65 YTKKKVEKGPAFPTCISINEICGHYSPLLSDSSLLKEGDVVKIDLGTHIDGFIALGAHTV 131
XP_004031065 64 FTKKKLQKGPAFPTCISVNEICGHYSPLISDSSLLKEGDVVKIDLGAQIDGFIALAAHTV 130
XP_003223249 65 KKEKDMKKGIAFPTSISVNNCVCHFSPLKDQDYILKEGDLVKIDLGVHVDGFISNVAHSF 125
XP_002947167 67 YKGKQIEKGVAFPTCVSVNSVVGHFSPNADDTSALKAGDVVKFDMGCHIDGFIATQATTV 126
XP_003880798 73 ENGKKMEKGIAFPTCISINEICGHFSPVEENAETLTEGDVVKIDMGCHIDGYISVVAYTV 135
XP_004348044 69 KANKKVKKGIAFPTCVSLNSTVCHQSPLSDAAITLQAGDVAKVDLGVHVDGLIAVVAHTI 129
XP_003284133 69 HSKKKIEKGIAFPTCISVNNCVGHYSPLKATSRSLVDGDIVKIDLGVHINGFIAVGAHTI 128
NP_001241588 65 YKNVKIERGVAFPTCLSINNVVCHFSPLASDEAVLEEGDILKIDMACHIDGFIAVVAHTH 126
XP_009039553 76 YQKKIIDKGVAFPTCVSVNECVCHNSPLESDTTSLSEGDLVKLDVGCYVDGYIAVAAHTM 141
私が試したPythonスクリプト:
lines = [line.rstrip() for line in open('infile.txt')]
for line in lines:
data = line.split()
sequence = data[2]
if data[0].startswith("Query_"):
hits = [i for i,c in enumerate(sequence) if c == <50]
continue
else:
print(list(sequence[plus50] for plus50 in hits))
ベストアンサー1
そしてsed:
sed '
/^Query_/{ #starts loop when meet patten
:a
$!{
N
/\nQuery_/!ba #untill meet next pattern
}
/\(\n.*\)\{6,\}/{ #checks how many lines in block
$b #for end of file
s/\nQuery_/\n&/ #marks lines to print
}
}
/\n\n/P #prints marked lines
D #remove 1st line in block, go to start
'
その他のスクリプトフォームアッ:
awk '
/^Query/{c=0;lines=$0;next}
++c<5{lines=lines "\n" $0;next}
c==5{print lines}
1 #short for {print}
'