Find strings from one file in another file and delete them from the original file if they are not present

2024-06-24

text-processing


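I am trying to write a script that goes through each line of a file and, if that line does not match anywhere in any line of another text file, removes that line from the original file.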

Here is an example of the input and output this script needs to handle.

Example input: file 1 (the groups file):

hello
hi hello
hi
great
interesting

file 2:
this is a hi you see
this is great don't ya think
sometimes hello is a good expansion of its more commonly used shortening hi
interesting how brilliant coding can be just wish i could get the hang of it

Sample script output: file 1 is changed to:

hello
hi
great
interesting

The line hi hello was removed because it does not appear anywhere in the second file.

Here is my script. It seems to work up to the point where the variable is generated.

#take first line from stability.contigs.groups
echo | head -n1 ~/test_folder/stability.contigs.groups > ~/test_folder/ErrorFix.txt
#remove the last 5 character
sed -i -r '$ s/.{5}$//' ~/test_folder/ErrorFix.txt 

#find match of the word string in errorfix.txt in stability.trim.contigs.fasta if not found then delete the line containing the string in stability.contigs.groups
STRING=$(cat ~/test_folder/MothurErrorFix.txt)
FILE=~/test_folder/stability.trim.contigs.fasta
if [ ! -z $(grep "$STRING" "$FILE") ]
    then
        perl -e 's/.*\$VAR\s*\n//' ~/test_folder/stability.contigs.groups
fi

Best answer

If you have GNU grep, you can run the following:

grep -oFf file1 file2 | sort | uniq | grep -Fxf - file1

Drop the last grep if you do not need to preserve the order of the lines in file1.
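To see what each stage of the pipeline contributes, it can be run on the sample data from the question. The following is only a sketch: it assumes GNU grep (for -o), works in a scratch directory created with mktemp, and the names found.txt and file1.new are made up for illustration. The final grep uses -x so that only whole lines of file1 are compared; without it, hi hello would be kept because it contains hi. The result is written back over file1, since the question asks for the original file to be changed.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > file1
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' > file2

# Stage 1: print each file1 string found anywhere in file2, deduplicated.
# found.txt is a hypothetical scratch file name.
grep -oFf file1 file2 | sort -u > found.txt

# Stage 2: keep the file1 lines that are exactly one of those strings,
# in their original file1 order.
grep -Fxf found.txt file1 > file1.new

# Replace the original file with the filtered copy and show it.
mv file1.new file1
cat file1
```

One limitation to keep in mind: grep -o reports only the longest match at a given position, so when one file1 string is a prefix of another and both occur at the same place in file2, the shorter one may not be reported.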
If you do not have access to GNU grep, use awk instead:

awk 'NR==FNR{z[$0]++;next};{for (l in z){if (index($0, l)) y[l]++}}
END{for (i in y) print i}' file1 file2
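The awk command above prints the surviving strings in whatever order awk iterates its array, and it only writes to standard output. If the order of file1 matters and the file should be updated in place, one possible variation (an assumption, not part of the original answer) is to read file2 into memory first and then filter file1 line by line. The array name hay and the temporary file file1.new are illustrative, and the sketch assumes file2 is not empty.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > file1
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' > file2

# First input (file2): store every line in the array hay.
# Second input (file1): print a line only if some stored file2 line contains it.
awk 'NR == FNR { hay[NR] = $0; n = NR; next }
     { for (i = 1; i <= n; i++) if (index(hay[i], $0)) { print; break } }' \
    file2 file1 > file1.new

# Replace the original file with the filtered copy and show it.
mv file1.new file1
cat file1
```

Because index() does a plain substring search, hi also matches inside words such as this, which is the same behaviour as the fixed-string grep approach.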
