Error getting the string between two patterns

2024-06-25

grep search

I would like to get the string between two patterns. The pattern is the first <p> </p> environment in an HTML file.

<p>Sorcery, 
          R (1)
          </p>
        <p class="ctext"><b>As an additional cost to cast Goblin Grenade, sacrifice a Goblin.<br><br>Goblin Grenade deals 5 damage to target creature or player.</b></p>


      <p><i>Don't underestimate the aerodynamic qualities of the common goblin.</i></p>
      <p>Illus. Kev Walker</p>

Since that environment is the first one in the file, one idea is to delete everything up to the first <p> and everything after its closing </p>.

name="goblin grenade"
wget -O- http://magiccards.info/query?q="$name" | grep -oP '<p>\K[^<]+'

I don't understand why this doesn't work properly. I get:

Sorcery, 
Illus. Kev Walker

Best answer 1

Don't parse HTML with regular expressions; use a proper HTML parser instead.

Theory:

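According to compiler theory, HTML can't be parsed with regular expressions, which are based on finite state machines. Because HTML is hierarchical, you need a pushdown automaton and have to work with an LALR grammar using a tool like YACC.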
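The failure in the question can be reproduced locally, which may make the point more concrete. The sketch below assumes GNU grep built with PCRE support (for -P) and uses a hypothetical temporary file holding a shortened copy of the question's HTML instead of fetching the page. grep processes its input one line at a time, so the match for the first <p> stops at the end of that line, <p class="ctext"> does not contain the literal <p>, and <p><i> has no plain text directly after the tag.

```shell
# Hypothetical sample file: a shortened copy of the HTML from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p>Sorcery, 
          R (1)
          </p>
        <p class="ctext"><b>As an additional cost to cast Goblin Grenade, sacrifice a Goblin.</b></p>
      <p><i>Don't underestimate the aerodynamic qualities of the common goblin.</i></p>
      <p>Illus. Kev Walker</p>
EOF

# Same pattern as in the question. Only two lines can match:
#  - the first line, where the match ends at the line break (so "R (1)" is lost)
#  - the last line, where the text follows <p> directly on the same line
result=$(grep -oP '<p>\K[^<]+' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

This prints only the "Sorcery," fragment and the "Illus. Kev Walker" line, which is the output the question reports.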

realLife©®™ everyday tools:

Instead, use the right tool for the job.

...and this is a job for xmllint:

By string matching:

string="Sorcery"
xmllint --html --xpath "//p[contains(text(), '$string')]/text()" file_or_URL

By the Nth <p> node (here N is 1):

xmllint --html --xpath "//p[1]/text()" file_or_URL
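As a rough end-to-end sketch of the node-based approach, the lines below run xmllint against a hypothetical temporary file rather than the live page, and assume xmllint (from libxml2) is installed. One detail worth knowing: in XPath, //p[1] selects every <p> that is the first <p> child of its parent, while (//p)[1] selects the first <p> in the whole document. The two agree when all paragraphs are siblings, as in the question's snippet, but they can differ on a full page.

```shell
# Hypothetical sample file standing in for the downloaded page.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html><body>
<p>Sorcery, 
          R (1)
          </p>
<p>Illus. Kev Walker</p>
</body></html>
EOF

# (//p)[1] picks the first <p> in document order; text() returns its text,
# including the line breaks and indentation that a per-line grep cannot span.
first=$(xmllint --html --xpath "(//p)[1]/text()" "$tmp" 2>/dev/null)

# Collapse runs of whitespace (newlines, indentation) into single spaces.
clean=$(printf '%s' "$first" | tr -s '[:space:]' ' ')
printf '%s\n' "$clean"

rm -f "$tmp"
```

The whitespace cleanup with tr is an addition for readability and is not part of the original answer.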

See https://stackoverflow.com/questions/1732348/regex-match-open-tags-book-xhtml-self-contained-tags
