What does the 'b' character do in front of a string literal?

Answer

Python 3.x makes a clear distinction between the types:

str = '...' literals = a sequence of characters. A “character” is a basic unit of text: a letter, digit, punctuation mark, symbol, space, or “control character” (like tab or backspace). The Unicode standard assigns each character to an integer code point between 0 and 0x10FFFF. (Well, more or less. Unicode includes ligatures and combining characters, so a string might not have the same number of code points as user-perceived characters.) Internally, str uses a flexible string representation that can use either 1, 2, or 4 bytes per code point.
bytes = b'...' literals = a sequence of bytes. A “byte” is the smallest integer type addressable on a computer, which is nearly universally an octet, or 8-bit unit, thus allowing numbers between 0 and 255.
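To make the difference between code points, user-perceived characters, and bytes concrete, here is a minimal sketch. The variable names are only illustrative, and the example text ('é' written two ways) is an assumption chosen for the demonstration.

```python
import unicodedata

# 'é' written two ways: one precomposed code point,
# or a plain 'e' followed by a combining acute accent.
composed = '\u00e9'
decomposed = 'e\u0301'

print(len(composed))    # 1 code point
print(len(decomposed))  # 2 code points, but still one user-perceived character

# Normalizing to NFC folds the pair into the single precomposed code point.
print(unicodedata.normalize('NFC', decomposed) == composed)  # True

# The number of bytes depends on the encoding, not on len() of the str.
print(len(composed.encode('utf-8')))    # 2
print(len(composed.encode('latin-1')))  # 1
```

len() on a str counts code points, so neither the number of visible characters nor the number of stored bytes can be read off it directly.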

If you're familiar with:

Java or C#, think of str as String and bytes as byte[];
SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
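As a rough illustration of binary data moving in both directions, the sketch below unpacks the same eight bytes and packs a float back into bytes. The value 1.5 and the variable names are arbitrary choices for the example.

```python
import math
import struct

# Interpret 8 raw bytes as a big-endian IEEE 754 double.
raw = b'\xff\xf8\x00\x00\x00\x00\x00\x00'
value = struct.unpack('>d', raw)[0]
print(math.isnan(value))  # True

# Going the other way: a float becomes exactly 8 bytes.
packed = struct.pack('>d', 1.5)
print(len(packed))  # 8
print(packed)       # b'?\xf8\x00\x00\x00\x00\x00\x00'
```

The leading byte 0x3F is printed as '?' because the bytes repr shows printable ASCII values as characters. The data is still binary, not text.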

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
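The two conversions are inverses only when the same encoding is used on both sides. A small sketch, with encodings picked purely for illustration:

```python
text = '€'  # U+20AC

utf8 = text.encode('utf-8')       # three bytes
utf16 = text.encode('utf-16-le')  # two bytes
print(len(utf8), len(utf16))      # 3 2

# Decoding with the encoding that produced the bytes restores the text.
print(utf8.decode('utf-8') == text)       # True
print(utf16.decode('utf-16-le') == text)  # True

# Decoding with a different encoding silently yields the wrong text...
mojibake = utf8.decode('latin-1')
print(mojibake == text)  # False

# ...or fails outright when the bytes are not valid in that encoding.
try:
    b'\xff'.decode('utf-8')
except UnicodeDecodeError as exc:
    print('cannot decode:', exc.reason)
```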

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
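The usual way around this error is to convert one operand explicitly so that both sides have the same type. A minimal sketch, assuming UTF-8 is the intended encoding:

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

# Option 1: work in bytes by encoding the text first.
as_bytes = bom + text.encode('utf-8')

# Option 2: work in text by decoding the bytes first.
as_text = bom.decode('utf-8') + text

print(type(as_bytes).__name__)  # bytes
print(type(as_text).__name__)   # str

# Both routes describe the same data once they are in the same form.
print(as_text.encode('utf-8') == as_bytes)  # True
```

Which option fits depends on what the result is for: bytes for writing to a file or socket, str for further text processing.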

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But, I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False
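One way to see that a bytes object holds numbers rather than characters is to index it or iterate over it. The sample value below is arbitrary.

```python
data = b'A\x00\xff'

# Indexing a bytes object yields an int, not a one-character string.
print(data[0])     # 65
print(list(data))  # [65, 0, 255]

# Slicing, by contrast, yields another bytes object.
print(data[0:1])   # b'A'

# bytes can be built from integers directly.
print(bytes([65]) == b'A')  # True

# A str only compares equal to bytes after an explicit encode.
print('A'.encode('ascii') == b'A')  # True
```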

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

unicode = u'...' literals = sequence of Unicode characters = 3.x str
str = '...' literals = sequences of confounded bytes/characters
- Usually text, encoded in some unspecified encoding.
- But also used to represent binary data like struct.pack output.

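In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.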

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
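A short sketch of these two features. The variable names are illustrative only.

```python
tab = '\t'          # escape sequence: a single tab character
raw_text = r'\t'    # raw string: a backslash followed by the letter t

print(len(tab))       # 1
print(len(raw_text))  # 2
print(raw_text == '\\t')  # True

# Triple quotes let a literal span several lines; the newline is part of it.
multi = '''first
second'''
print(multi.splitlines())  # ['first', 'second']

# Prefixes can be combined: rb gives raw bytes, with escapes left untouched.
raw_bytes = rb'\x41'
print(len(raw_bytes))  # 4
```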

The f prefix (introduced in Python 3.6) creates a “formatted string literal” which can reference Python variables. For example, f'My name is {name}.' is shorthand for 'My name is {0}.'.format(name).
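The equivalence can be checked directly. In this sketch the name 'Ada' is just a placeholder value.

```python
name = 'Ada'

with_f = f'My name is {name}.'
with_format = 'My name is {0}.'.format(name)
print(with_f)                 # My name is Ada.
print(with_f == with_format)  # True

# The braces accept arbitrary expressions and format specifiers,
# not only bare variable names.
print(f'{2 + 3}')        # 5
print(f'{3.14159:.2f}')  # 3.14
```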
