How to count lines that contain only one of two words, but not both

2024-06-21

I need to count the number of lines in a text file (poem.txt) that contain the word "the" or the word "an", but not both at the same time.

I tried this:

grep -c the poem.txt | grep -c an poem.txt

But it gives the wrong answer of 6, when the correct total for "the" and "an" is 9 lines.
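One way to see where the 6 comes from is to rerun the attempt on the sample file below. When grep is given a filename it reads that file and ignores standard input, so the second grep never sees the count from the first one; it simply counts lines of poem.txt that contain the substring "an", case-sensitively ("mainlands", "command", "many", and so on). The sketch below assumes bash and a grep with -c; the temporary file is only there to make the example self-contained.

```bash
#!/usr/bin/env bash
# Reproduce the attempt from the question on the sample poem.txt.
# Assumption: bash plus any grep that supports -c.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
poem="$tmp/poem.txt"

# The sample file from the question, verbatim.
cat > "$poem" <<'EOF'
Where is the misty shark?
Where is she?
The small reef roughly fights the mast.
Where is the small gull?
Where is he?
The gull grows like a clear pirate.
Clouds fall like old mainlands.

She will Rise calmly like a dead pirate.
Eat an orange.
Warm, sunny sharks quietly pull a cold, old breeze.
All ships command rough, rainy sails.

Elvis Aaron Presley also known simply as the Elvis
He is also referred to as the King
The best-selling solo music artist of all time
He was the most commercially successful artist in many genres

He has many awards including a Grammy lifetime achievement
Elvis in the 1970s has numerous jumpsuits including an eagle one.
EOF

# The original attempt: the second grep has a filename argument,
# so it reads poem.txt and ignores the number piped in from the first grep.
pipeline=$(grep -c the "$poem" | grep -c an "$poem" || true)

# Counting lines with the substring "an" on its own gives the same number.
an_only=$(grep -c an "$poem" || true)

echo "pipeline: $pipeline"
echo "an only:  $an_only"
```

Both lines print 6 for this sample.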

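I want to count the number of lines that contain the words, not the number of occurrences of the words themselves. Also, only the actual words should count: "there" is not "the", and "Pan" is not "an".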
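The difference between matching a substring and matching a whole word can be seen with grep's -w option, which only accepts a match bounded by non-word characters. This is a small sketch with two made-up lines; it assumes GNU or BSD grep, since -w is a common extension rather than POSIX.

```bash
#!/usr/bin/env bash
# Substring matching vs whole-word matching with grep.
# -c counts matching lines, -i ignores case, -w requires a whole-word match.
# Assumption: GNU or BSD grep (-w is not in POSIX).
set -u

# Two illustrative lines (made up for this example).
sample=$'there is a Pan\nthe cat sat\n'

sub_the=$(printf '%s' "$sample" | grep -ci the)    # also matches "there"
word_the=$(printf '%s' "$sample" | grep -ciw the)  # only the word "the"
sub_an=$(printf '%s' "$sample" | grep -ci an)      # matches inside "Pan"
word_an=$(printf '%s' "$sample" | grep -ciw an)    # no standalone "an" here

echo "the: substring=$sub_the word=$word_the"
echo "an:  substring=$sub_an word=$word_an"
```

Substring matching counts 2 lines for "the" and 1 for "an"; whole-word matching counts 1 and 0.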

Sample file: poem.txt

Where is the misty shark?
Where is she?
The small reef roughly fights the mast.
Where is the small gull?
Where is he?
The gull grows like a clear pirate.
Clouds fall like old mainlands.

She will Rise calmly like a dead pirate.
Eat an orange.
Warm, sunny sharks quietly pull a cold, old breeze.
All ships command rough, rainy sails.

Elvis Aaron Presley also known simply as the Elvis
He is also referred to as the King
The best-selling solo music artist of all time
He was the most commercially successful artist in many genres

He has many awards including a Grammy lifetime achievement
Elvis in the 1970s has numerous jumpsuits including an eagle one.

Additional clarification: how many lines contain "the" or "an"? Lines that contain both "the" and "an" must not be counted.

the car is red - counted
an apple is in the corner - not counted
hello i am big - not counted
where is an apple - counted

So the output here should be 2.
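A different technique from the answers below, offered only as a sketch: count the lines with each word separately and correct for the overlap. Lines containing both words are counted once in each total, so exactly_one = lines_with_the + lines_with_an - 2 * lines_with_both. The function name count_exactly_one is hypothetical, and grep's -w and -i options are assumed (GNU and BSD grep both have them).

```bash
#!/usr/bin/env bash
# Count lines containing exactly one of two words via inclusion-exclusion:
#   exactly_one = lines_with_A + lines_with_B - 2 * lines_with_both
# Assumption: grep supports -i and -w (GNU and BSD grep do).
set -u

# count_exactly_one WORD1 WORD2 FILE  (hypothetical helper name)
count_exactly_one() {
  local a b both
  a=$(grep -ciw -- "$1" "$3")
  b=$(grep -ciw -- "$2" "$3")
  # Lines with WORD1, then, of those, the ones that also contain WORD2.
  both=$(grep -iw -- "$1" "$3" | grep -ciw -- "$2")
  echo $(( a + b - 2 * both ))
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# The four example lines from the clarification above.
printf '%s\n' \
  'the car is red' \
  'an apple is in the corner' \
  'hello i am big' \
  'where is an apple' > "$tmp"

result=$(count_exactly_one the an "$tmp")
echo "$result"
```

Here "the" is on 2 lines, "an" is on 2 lines, and 1 line has both, so the result is 2 + 2 - 2 = 2.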

Edit: Case doesn't matter.

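Final edit: Thanks for all the help. I solved the problem by taking one of the answers and modifying it a little. This is what I ended up with:

cat poem.txt | grep -Evi -e '\<an .* the\>' -e '\<the .* an\>' | grep -Eci -e '\<(an|the)\>'

To get more information, I changed the -c in the second grep to -n. Thanks again for all the help! :)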
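The pipeline from the final edit can be checked against the four clarification lines. The sketch below passes the file to grep directly instead of through cat, which gives the same result, and assumes GNU grep (it accepts \< and \> as word boundaries with -E). It also shows one detail of the -n variant: the second grep numbers the lines of its own input, which is the filtered output of the first grep, so the numbers do not match line numbers in the original file.

```bash
#!/usr/bin/env bash
# Run the asker's final pipeline on the four clarification lines.
# Assumption: GNU grep, which accepts \< and \> as word boundaries in -E mode.
set -u

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  'the car is red' \
  'an apple is in the corner' \
  'hello i am big' \
  'where is an apple' > "$tmp"

# First grep drops lines with "an ... the" or "the ... an";
# second grep counts the remaining lines that contain either word.
count=$(grep -Evi -e '\<an .* the\>' -e '\<the .* an\>' "$tmp" |
  grep -Eci -e '\<(an|the)\>')

# With -n instead of -c, the numbers refer to the filtered stream
# coming out of the first grep, not to the original file.
numbered=$(grep -Evi -e '\<an .* the\>' -e '\<the .* an\>' "$tmp" |
  grep -Eni -e '\<(an|the)\>')

echo "$count"
echo "$numbered"
```

The count is 2. The -n output shows "where is an apple" as line 3, although it is line 4 of the file, because the first grep already removed line 2.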

Best Answer

perl -nE 'END {say $c+0} ++$c if /\bthe\b/i xor /\ban\b/i' file

gawk 'END {print c+0} /\<the\>/ != /\<an\>/ {++c}' IGNORECASE=1 file

Comparing the match results of the two expressions gives the desired result.

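For example, the result of matching \<the\> is 0 or 1. If the result of the other match is the same, then either both regexes were found or neither was, and the line is not counted. If the results differ, exactly one of the words was found, so the counter is incremented.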
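The 0/1 comparison described above can be made visible by printing both match results for each line. This sketch deliberately avoids gawk, assuming only a POSIX awk: since plain awk has no \< \> or IGNORECASE, a whole word is spelled out as "start of line or a non-word character" on each side, and case is folded with tolower(). The function name has_word is hypothetical.

```bash
#!/usr/bin/env bash
# Print the 0/1 match results that the gawk answer compares, line by line.
# Assumption: a POSIX awk (no gawk \< \> or IGNORECASE), so word boundaries
# are written out explicitly and case is folded with tolower().
set -u

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  'the car is red' \
  'an apple is in the corner' \
  'hello i am big' \
  'where is an apple' > "$tmp"

table=$(awk '
  # has_word(s, w): 1 if w appears in s as a whole word, else 0.
  function has_word(s, w) {
    return s ~ ("(^|[^a-z0-9_])" w "([^a-z0-9_]|$)")
  }
  {
    line = tolower($0)
    t = has_word(line, "the")
    a = has_word(line, "an")
    counted = (t != a)          # 1 only when exactly one word matched
    c += counted
    print t, a, counted, $0
  }
  END { print "total:", c + 0 }
' "$tmp")

echo "$table"
```

Each output row is "the-match an-match counted line". Only the first and last lines have differing match results, so the total is 2.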

gawk also has a built-in xor() function:

gawk 'END {print c+0} xor(/\<the\>/,/\<an\>/) {++c}' IGNORECASE=1 file

