Cannot retrieve web content containing Unicode characters using Python 3

2024-06-29


When I try to read specific tags from a web page with Python 3, the Unicode characters cannot be handled and I get an error such as:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 145: ordinal not in range(256)

What is the correct syntax to retrieve the tags?

Here is the MWE I have tried so far:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.biblegateway.com/passage/?search=Genesis+35&version=NIV")

soup = BeautifulSoup(page.content, 'html.parser')
story = soup.find_all('p')  # collect every <p> element on the page
periods = [pt.get_text() for pt in story]  # keep only the text of each <p>
print(periods)


Best answer 1
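One plausible reading of the traceback, assuming the error is raised by the print call rather than by requests or BeautifulSoup, is that standard output is using the Latin-1 encoding, which has no left curly quote (U+201C). The following is a minimal sketch under that assumption, not a confirmed diagnosis. The helper name `printable` and the sample string are hypothetical and stand in for the text returned by `pt.get_text()`.

```python
import sys


def printable(text: str, encoding: str) -> str:
    """Return text with characters the target encoding lacks replaced by '?'."""
    # errors="replace" substitutes '?' for every character that cannot be encoded.
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample standing in for one paragraph from pt.get_text().
sample = "He said, \u201chello\u201d"

# Option 1: switch stdout to UTF-8 (TextIOWrapper.reconfigure, Python 3.7+).
# The hasattr guard covers environments where stdout has been replaced.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
print(sample)

# Option 2: keep a Latin-1 output stream and degrade unsupported characters.
latin1_safe = printable(sample, "latin-1")
print(latin1_safe)
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before starting Python is another way to get the effect of option 1 without changing the script. Option 2 loses the curly quotes, so it is only suitable when the output stream really cannot display them.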
