Default web scraping from the CLI

Question

Running

curl "http://clojurescript.net/" | scrape -be '//body/script' | xml2json | jq '.html.body.script[].src'

prints

"http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"
"http://kanaka.github.io/cljs-bootstrap/web/jqconsole.min.js"
"http://kanaka.github.io/cljs-bootstrap/web/jq_readline.js"
"http://kanaka.github.io/cljs-bootstrap/web/repl-web.js"
"http://kanaka.github.io/cljs-bootstrap/web/repl-main.js"

The tools used are:

jq: https://stedolan.github.io/jq/
scrape: https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape
xml2json: https://github.com/Inist-CNRS/node-xml2json-command

Alternatively, with hxnormalize and hxselect (from the html-xml-utils package):

curl "http://clojurescript.net/" | hxnormalize -x | hxselect -i 'body > script' | grep -oP '(http:.*?)(")' | sed 's/"//g'

you get:

http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js
http://kanaka.github.io/cljs-bootstrap/web/jqconsole.min.js
http://kanaka.github.io/cljs-bootstrap/web/jq_readline.js
http://kanaka.github.io/cljs-bootstrap/web/repl-web.js
http://kanaka.github.io/cljs-bootstrap/web/repl-main.js
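The grep and sed steps at the end of that pipeline can probably be collapsed into one grep call that stops matching at the closing quote. The sketch below is only an illustration: the function name extract_urls and the sample markup are made up, and unlike the original pattern it also accepts https URLs. It reads inline sample HTML so it runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any of the tools above):
# print every http(s) URL found on stdin, one per line.
# [^"]+ stops at the closing double quote, so no sed cleanup is needed.
extract_urls() {
  grep -oE 'https?://[^"]+'
}

# Made-up sample markup standing in for the hxselect output.
sample='<body><script src="http://example.com/a.js"></script><script src="https://example.com/b.js"></script></body>'

printf '%s\n' "$sample" | extract_urls
```

In the real pipeline this would replace the last two stages, for example: curl ... | hxnormalize -x | hxselect -i 'body > script' | extract_urls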
