Creating dataframe from a dictionary where entries have different lengths

Say I have a dictionary with 10 key-value pairs. Each value is a NumPy array, but the arrays do not all have the same length.

How can I create a dataframe where each column holds a different entry?

When I try:

import pandas as pd
import numpy as np
from string import ascii_uppercase  # from the standard library

# repeatable sample data
np.random.seed(2023)
data = {k: np.random.randn(v) for k, v in zip(ascii_uppercase[:10], range(10, 20))}

df = pd.DataFrame(data)

I get:

ValueError: arrays must all be the same length

Is there any way to overcome this? I am happy to have pandas pad the shorter columns with NaN.

Desired Result

           A         B         C         D         E         F         G         H         I         J
0   0.711674 -1.076522 -1.502178 -1.519748  0.340619  0.051132  0.036537  0.367296  1.056500 -1.186943
1  -0.324485 -0.325682 -1.379593  2.097329 -1.253501 -0.238061  2.431822 -0.576828 -0.733918 -0.540638
2  -1.001871 -1.035498 -0.204455  0.892562  0.370788 -0.208009  0.422599 -0.416005 -0.083968 -0.638495
3   0.236251 -0.426320  0.642125  1.596488  0.455254  0.401304  1.843922 -0.137542  0.127288  0.150411
4  -0.102160 -1.029361 -0.181176 -0.638762 -2.283720  0.183169 -0.221562  1.294987  0.344423  0.919450
5  -1.141293 -0.521774  0.771749 -1.133047 -0.000822  1.235830  0.337117  0.520589  0.685970  0.910146
6   2.654407 -0.422758  0.741523  0.656597  2.398876 -0.291800 -0.557180 -0.194273  0.399908  1.605234
7   1.440605 -0.099244  1.324763  0.595787 -2.583105  0.029992  0.053141 -0.385593  0.893458  0.667165
8   0.098902 -1.380258  0.439287 -0.811120  1.311009 -0.868404  1.053804 -3.065784  0.384793  0.950338
9  -3.121532  0.301903 -0.557873 -0.300535 -1.579478  0.604346 -0.658515 -0.668181  0.641113  0.734329
10       NaN -1.033599  0.927080  1.008391 -0.840683  0.728554  1.844449  0.056965 -0.577314  1.015465
11       NaN       NaN -0.600727 -1.087762 -0.165509  1.364820 -0.075514 -0.909368 -0.819947  0.627386
12       NaN       NaN       NaN -1.787079 -2.068410  1.342694  0.264263 -1.487910  0.746819  1.062655
13       NaN       NaN       NaN       NaN  0.452739 -1.456708 -1.395359  1.169611  1.836805  0.262885
14       NaN       NaN       NaN       NaN       NaN  0.969357  0.708416  0.393677 -1.455490 -2.086486
15       NaN       NaN       NaN       NaN       NaN       NaN  0.762756  0.530569 -0.828721 -1.076369
16       NaN       NaN       NaN       NaN       NaN       NaN       NaN -0.586429 -0.609144 -0.507519
17       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN -1.071297 -0.274501
18       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  1.848811

Best Answer 1

In Python 3.x:

import pandas as pd
import numpy as np

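# two arrays of different lengths
d = dict(A=np.array([1, 2]), B=np.array([1, 2, 3, 4]))

# wrap each array in a Series so pandas aligns on the index and pads with NaN
pd.DataFrame({k: pd.Series(v) for k, v in d.items()})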

Out[7]: 
     A  B
0  1.0  1
1  2.0  2
2  NaN  3
3  NaN  4

In Python 2.x:

replace d.items() with d.iteritems().
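
For completeness, here is a minimal sketch of the same approach applied to the sample data from the question. It assumes a reasonably recent pandas and Python 3; the variable names `data` and `df` are taken from the question, and the printed checks are only there for illustration.

```python
import numpy as np
import pandas as pd
from string import ascii_uppercase

# same repeatable sample data as in the question: arrays of length 10..19
np.random.seed(2023)
data = {k: np.random.randn(v) for k, v in zip(ascii_uppercase[:10], range(10, 20))}

# Each Series gets its own 0..n-1 index. The DataFrame constructor aligns the
# columns on the union of those indexes and fills the missing rows with NaN.
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

print(df.shape)                    # rows = longest array, columns = number of keys
print(df.isna().sum().to_dict())   # number of padded NaN values per column
```

Note that NaN is a float, so a column built from an integer array that gets padded (such as A in the small example above) ends up with dtype float64.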
