How to remove JavaScript from an HTML file and keep the plain text

Question

How can I remove the JavaScript from an HTML file while keeping the plain text?

Answer 1

I find this an interesting question because it highlights another problem with using regular expressions to parse tokens: the result is hard to maintain.

The following script does the job, provided PHP is available on your system.

#!/usr/local/bin/php
<?php
// point the #! to wherever your PHP commandline binary is

// report fatal errors only; this hides the warnings loadHTML() raises on malformed markup
error_reporting(1);

$html = file_get_contents('http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags');

// create an object representing the DOM tree of the webpage
$document = new DOMDocument;
$document->loadHTML($html);

// store the <script> elements as a DOMNodeList
$script_nodes = $document->getElementsByTagName('script');

// getElementsByTagName() returns a live DOMNodeList: removing a node
// while iterating over the list shifts the remaining items, so some
// <script> elements would be skipped. Queue the nodes in an array first. See
// http://php.net/manual/en/domnode.removechild.php
$scripts_to_remove = [];
for ( $i=0; $i < $script_nodes->length; $i++ ) {
    $scripts_to_remove[] = $script_nodes->item($i);
}

// now we can iterate through the <script> nodes removing them
foreach ( $scripts_to_remove as $s_node ) {
    $parent = $s_node->parentNode;
    $parent->removeChild($s_node);
}

// print out the new DOM as HTML
echo $document->saveHTML();
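
The script above prints HTML, while the question asks for plain text. One way to get there, sketched here under a few assumptions, is to read the text of the `<body>` element once the `<script>` nodes are gone. The helper name `html_to_text` and the sample markup are invented for illustration and are not part of the original answer. `iterator_to_array()` stands in for the manual `for` loop as a way of copying the live `DOMNodeList`, and `libxml_use_internal_errors()` replaces `error_reporting(1)` as the way to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (not from the original answer): removes <script>
// elements and returns the text that remains inside <body>.
function html_to_text(string $html): string
{
    // Keep libxml's warnings about malformed markup out of the output.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument;
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live DOMNodeList into an array before removing nodes.
    $scripts = iterator_to_array($document->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // textContent concatenates the text of all descendant nodes.
    $body = $document->getElementsByTagName('body')->item(0);

    return $body === null ? '' : trim($body->textContent);
}

// Sample markup, invented for this example.
$sample = '<html><body><p>Hello</p><script>alert("x");</script>'
        . '<p>World</p></body></html>';

$text = html_to_text($sample);
echo $text, PHP_EOL;
```

Text from adjacent elements is joined without separators, so real pages will usually need some whitespace clean-up afterwards. Other non-text elements such as `<style>` could be removed in the same way.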

Usage

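To use the script, save the code above to a file and make it executable. As written, it fetches the URL hardcoded in the file_get_contents() call, so point that call at the HTML file whose <script> tags you want removed, then run the script and redirect its output to a new file.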
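
To run the script against an arbitrary file without editing it each time, the path can be taken from the command line. The following is a rough sketch of that variant, not the answer's own code: the function name `strip_scripts_from_file` and the file names `strip_scripts.php`, `page.html` and `page-clean.html` in the comment are hypothetical.

```php
<?php
// Hypothetical variant of the script above: reads the HTML from a local
// file instead of the hardcoded URL.
function strip_scripts_from_file(string $path): string
{
    $html = file_get_contents($path);
    if ($html === false || $html === '') {
        throw new RuntimeException("Could not read any HTML from $path");
    }

    // Keep libxml's warnings about malformed markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument;
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Queue the nodes first, because the DOMNodeList is live.
    $scripts = iterator_to_array($document->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    return $document->saveHTML();
}

// Run only when a readable file path is given on the command line, e.g.
//   php strip_scripts.php page.html > page-clean.html
if (PHP_SAPI === 'cli' && isset($argv[1]) && is_file($argv[1])) {
    echo strip_scripts_from_file($argv[1]);
}
```

Keeping the work in a function and guarding the command-line part means the same file can be included from other code or run directly.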
