While cleaning the values of a mixed-type DataFrame in Python/pandas, I want to trim the strings. I am currently doing it in two statements:
import pandas as pd
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df.replace(r'^\s+', '', regex=True, inplace=True)  # leading whitespace
df.replace(r'\s+$', '', regex=True, inplace=True)  # trailing whitespace
df.values
This is quite slow. What could I improve?
Best Answer 1
You can use DataFrame.select_dtypes to select the string columns and then apply the str.strip function.
Note: the values must not be types such as dicts or lists, because their dtype is also object.
df_obj = df.select_dtypes('object')
# to also process string categorical columns, use:
# df_obj = df.select_dtypes(['object', 'category'])
print (df_obj)
0 a
1 c
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
0 1
0 a 10
1 c 5
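The note about dicts and lists matters in practice: on an object column that mixes strings with other Python objects, .str.strip() returns NaN for the non-string entries. Below is a minimal sketch of one way around that, assuming you want non-string values left untouched. strip_strings_only and the column names are hypothetical, chosen for illustration; they are not part of pandas or of the question.

```python
import pandas as pd

def strip_strings_only(df):
    """Return a copy of df with whitespace stripped from str values in object columns.

    Hypothetical helper: non-string values (lists, dicts, numbers) are kept as-is.
    """
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        # Element-wise map: slower than .str.strip(), but safe for mixed content.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out

# Assumed example data: the list in "label" forces the column to object dtype.
mixed = pd.DataFrame({
    "label": [" a ", ["x"], " c "],
    "count": [10, 7, 5],
})
cleaned = strip_strings_only(mixed)
print(cleaned)
```

This trades speed for safety, so it is probably only worth using when a column really does hold mixed types.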
If there are only a few columns, use str.strip directly:
df[0] = df[0].str.strip()
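Since the question is about speed, it may help to measure both approaches on your own data. The sketch below is one way to do that, under assumptions: the 10,000-row frame and the function names are made up for illustration, and the regex variant merges the question's two patterns into one pass. Timings depend on the pandas version and the data, so none are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical test data: padded strings plus an integer column.
df = pd.DataFrame({"text": ["  a  ", " c "] * 5000, "n": range(10000)})

def strip_with_regex(frame):
    # The question's approach, collapsed into a single pattern for both ends.
    return frame.replace(r"^\s+|\s+$", "", regex=True)

def strip_with_str(frame):
    # The answer's approach, applied to the one known string column.
    out = frame.copy()
    out["text"] = out["text"].str.strip()
    return out

# Check that both approaches agree before comparing their speed.
same = strip_with_regex(df)["text"].tolist() == strip_with_str(df)["text"].tolist()
print("same result:", same)
print("regex replace:", timeit.timeit(lambda: strip_with_regex(df), number=5))
print("str.strip:    ", timeit.timeit(lambda: strip_with_str(df), number=5))
```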