Get meta tag content property with BeautifulSoup and Python

2024-07-05

I am trying to use Python and Beautiful Soup to extract the content part of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

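BeautifulSoup loads the page fine and I can find other things (the code below also grabs the article id from the id attribute hidden in the source), but I don't know the correct way to search the HTML for these bits. I've tried several variations of find and findAll to no avail. The code currently iterates over a list of URLs...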

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(i)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article") :
        id = tag.get('id')
        print id
# the hard part that doesn't work - I know this example is well off the mark!        
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
# end of problem

for i in range (1,100):
    get_data(i)

If anyone can help me sort out the bit that finds og:title and og:content, I would be very grateful!

Best Answer

Pass the tag name, meta, as the first argument to find(). Then use keyword arguments to match the specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks here are optional if you know that the title and url meta properties will always be present.
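
As a minimal sketch of the same idea, the lookup can be wrapped in a small helper that returns a fallback value when the tag is absent. The helper name `meta_content`, the inline `HTML` sample (copied from the tags in the question), and the use of the built-in `html.parser` instead of `lxml` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup based on the tags in the question; a real page would be
# downloaded first. The surrounding html/head/body is a hypothetical wrapper.
HTML = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head><body><article id="post-1"></article></body></html>
"""


def meta_content(soup, prop, default=None):
    """Return the content attribute of <meta property=prop>, or default."""
    tag = soup.find("meta", property=prop)
    # .get() also covers a tag that exists but has no content attribute
    return tag.get("content", default) if tag else default


# "html.parser" ships with Python, so this sketch does not need lxml
soup = BeautifulSoup(HTML, "html.parser")

title = meta_content(soup, "og:title", "No meta title given")
url = meta_content(soup, "og:url", "No meta url given")
missing = meta_content(soup, "og:image", "No meta image given")

print(title)
print(url)
print(missing)
```

Keeping the fallback inside the helper means a loop over many pages does not need to repeat the if/else check for every property it reads.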
