What does the 'b' character do in front of a string literal?

Apparently, the following is the valid syntax:

b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but it is about PHP. It states that the b indicates a binary string, as opposed to Unicode, which was needed to keep code written for PHP < 6 compatible when migrating to PHP 6. I don't think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

Best answer

Python 3.x makes a clear distinction between the types:

  • str = '...' literals = a sequence of characters. A “character” is a basic unit of text: a letter, digit, punctuation mark, symbol, space, or “control character” (like tab or backspace). The Unicode standard assigns each character to an integer code point between 0 and 0x10FFFF. (Well, more or less. Unicode includes ligatures and combining characters, so a string might not have the same number of code points as user-perceived characters.) Internally, str uses a flexible string representation that can use either 1, 2, or 4 bytes per code point.
  • bytes = b'...' literals = a sequence of bytes. A “byte” is the smallest integer type addressable on a computer, which is nearly universally an octet, or 8-bit unit, thus allowing numbers between 0 and 255.
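As a small illustration of the difference (the variable names here are illustrative, not from the original answer), the same text can have different lengths as a str and as a bytes object, because one code point may need several bytes once encoded:

```python
# 'caf\u00e9' is "café" written with an escape so the source stays ASCII.
text = 'caf\u00e9'            # str: a sequence of 4 code points
data = text.encode('utf-8')   # bytes: U+00E9 takes 2 bytes in UTF-8

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5
print(list(data))                       # [99, 97, 102, 195, 169]
```

Each element of a bytes object is an integer between 0 and 255, which is why list(data) prints numbers rather than characters.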

If you're familiar with:

  • Java or C#, think of str as String and bytes as byte[];
  • SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
  • Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct

# Interpret 8 bytes as a big-endian IEEE 754 double.
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
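A slightly fuller sketch of the same idea, using only the standard struct and math modules: struct.unpack turns bytes into Python values, and struct.pack goes the other way and always returns bytes. The names raw, value, and packed are illustrative.

```python
import math
import struct

raw = b'\xff\xf8\x00\x00\x00\x00\x00\x00'   # 8 bytes: a big-endian double
value = struct.unpack('>d', raw)[0]          # unpack returns a tuple
print(math.isnan(value))                     # True

packed = struct.pack('>d', 1.5)              # float -> 8 bytes
print(type(packed).__name__)                 # bytes
print(packed)                                # b'?\xf8\x00\x00\x00\x00\x00\x00'
```

The leading '?' in the last line is simply how Python displays the byte 0x3F; the object still holds eight integers, not text.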

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
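The bytes you get depend on the encoding you choose, and decoding with the wrong encoding can fail. A minimal sketch (variable names are illustrative):

```python
euro = '\u20ac'                      # the euro sign as a str

utf8 = euro.encode('utf-8')          # three bytes
utf16 = euro.encode('utf-16-le')     # two bytes
print(utf8)                          # b'\xe2\x82\xac'
print(len(utf8), len(utf16))         # 3 2

# Decoding with the encoding used to encode gives the text back.
print(utf8.decode('utf-8') == euro)  # True

# Decoding with a different encoding may raise an error.
try:
    utf8.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
print(decoded_as_ascii)              # False
```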

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
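To combine the two, convert one side first so that both operands have the same type. A short sketch, assuming UTF-8 is the intended encoding:

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

# Option 1: encode the text and work with bytes.
as_bytes = bom + text.encode('utf-8')
print(as_bytes)                        # b'\xef\xbb\xbfText with a UTF-8 BOM'

# Option 2: decode the bytes and work with text.
as_text = bom.decode('utf-8') + text
print(as_text == '\ufeff' + text)      # True
```

Which direction is right depends on what the result is for: bytes for writing to a binary file or socket, str for further text processing.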

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But it must be emphasized that a character is not a byte.

>>> 'A' == b'A'
False
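The comparison is False because the operands have different types, not because the underlying number differs. A brief sketch of how to compare them after an explicit conversion (ASCII is assumed here as the encoding):

```python
same_number = b'A'[0] == ord('A')            # both are the integer 65
encoded_equal = 'A'.encode('ascii') == b'A'  # compare as bytes
decoded_equal = b'A'.decode('ascii') == 'A'  # compare as text

print(same_number, encoded_equal, decoded_equal)   # True True True
print(bytes([65, 66, 67]))                         # b'ABC'
```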

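In Python 2.x

Versions of Python before 3.0 lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = a sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data, like the output of struct.pack.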

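To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but it tells the 2to3 script not to convert the literal to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.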

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
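A quick sketch of the r prefix and triple quotes in action (the variable names are illustrative):

```python
tab = '\t'          # one character: a tab
raw = r'\t'         # two characters: a backslash and the letter t
print(len(tab), len(raw))     # 1 2
print(raw == '\\t')           # True

multi = '''first line
second line'''
print(multi.splitlines())     # ['first line', 'second line']

# Prefixes can be combined: rb gives raw bytes.
raw_bytes = rb'\x41'
print(len(raw_bytes))         # 4
```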

The f prefix (introduced in Python 3.6) creates a “formatted string literal” which can reference Python variables. For example, f'My name is {name}.' is shorthand for 'My name is {0}.'.format(name).
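A short check of that equivalence, using a made-up value for name (requires Python 3.6 or later):

```python
name = 'Alice'   # hypothetical value, for illustration only

with_f = f'My name is {name}.'
with_format = 'My name is {0}.'.format(name)
print(with_f)                    # My name is Alice.
print(with_f == with_format)     # True

# The braces may hold any expression, optionally with a format spec.
print(f'{1 + 2}')                # 3
print(f'{3.14159:.2f}')          # 3.14
```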
