Pyspark: Replace column values by looking them up in a dictionary

Answer 1

You can use either na.replace:

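# Lookup dictionary (inferred from the outputs shown below)
deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}

# `spark` is an existing SparkSession (e.g. the pyspark shell)
df = spark.createDataFrame([
    ('Tablet', ), ('Phone', ), ('PC', ), ('Other', ), (None, )
], ["device_type"])

# With a dict as to_replace, the value argument (1 here) is ignored
df.na.replace(deviceDict, 1).show()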

+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|      Other|
|       null|
+-----------+
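The following is a plain-Python model of what na.replace does with a dict on a single column. It is not how Spark executes the operation. Matched values are swapped and every other value, including None, is left unchanged. The helper name replace_like_na is hypothetical, and the dictionary is the one implied by the output above.

```python
# Plain-Python model (not PySpark) of df.na.replace(deviceDict) on one column.
deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}


def replace_like_na(values, mapping):
    """Swap each value found in mapping; keep every other value, including None."""
    return [mapping.get(v, v) for v in values]


column = ['Tablet', 'Phone', 'PC', 'Other', None]
print(replace_like_na(column, deviceDict))
# ['Mobile', 'Mobile', 'Desktop', 'Other', None]
```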

or a map literal:

from itertools import chain
from pyspark.sql.functions import create_map, lit

# Flatten the dict into key1, value1, key2, value2, ... and build a map literal
mapping = create_map([lit(x) for x in chain(*deviceDict.items())])
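create_map takes its arguments as alternating keys and values (key1, value1, key2, value2, ...). The snippet below is a minimal plain-Python check, assuming the same deviceDict. It shows that chain(*deviceDict.items()) produces exactly that flat sequence from the dict's (key, value) pairs.

```python
from itertools import chain

deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}

# items() yields (key, value) tuples; chain(*...) splices them into one flat
# sequence: the alternating key/value order that create_map expects.
flat = list(chain(*deviceDict.items()))
print(flat)
# ['Tablet', 'Mobile', 'Phone', 'Mobile', 'PC', 'Desktop']
```

chain.from_iterable(deviceDict.items()) gives the same sequence without argument unpacking.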

# Look up each value in the map; keys that are missing from it yield null
df.select(mapping[df['device_type']].alias('device_type')).show()

+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|       null|
|       null|
+-----------+
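In this model, the map lookup behaves like dict.get(key) without a default. A key that is not in the map ('Other') returns None, and so does a null key. Here is a plain-Python sketch of that behaviour. The helper lookup_like_map is hypothetical.

```python
# Plain-Python model (not PySpark) of mapping[df['device_type']].
deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}


def lookup_like_map(values, mapping):
    """Model of mapping[col]: a missing key, or a None key, yields None."""
    return [mapping.get(v) for v in values]


column = ['Tablet', 'Phone', 'PC', 'Other', None]
print(lookup_like_map(column, deviceDict))
# ['Mobile', 'Mobile', 'Desktop', None, None]
```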

Note that with the latter solution, values that are not present in the mapping are converted to NULL. If that is not the behaviour you want, you can add coalesce:

from pyspark.sql.functions import coalesce

# Fall back to the original value when the map lookup returns null
df.select(
    coalesce(mapping[df['device_type']], df['device_type']).alias('device_type')
).show()

+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|      Other|
|       null|
+-----------+
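coalesce returns its first non-null argument. Wrapping the lookup with the original column therefore keeps any value the map does not know about. The plain-Python model below uses hypothetical helper names (first_non_null, lookup_with_fallback) and is only an illustration. It reproduces the table above, which is the same result as the na.replace version.

```python
# Plain-Python model (not PySpark) of coalesce(mapping[col], col).
deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}


def first_non_null(*args):
    """Like SQL COALESCE: the first argument that is not None, else None."""
    return next((a for a in args if a is not None), None)


def lookup_with_fallback(values, mapping):
    """Map lookup that keeps the original value when the key is missing."""
    return [first_non_null(mapping.get(v), v) for v in values]


column = ['Tablet', 'Phone', 'PC', 'Other', None]
print(lookup_with_fallback(column, deviceDict))
# ['Mobile', 'Mobile', 'Desktop', 'Other', None]
```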