Is there a simple way to construct a pandas DataFrame from an Iterable of attrs objects?

Posted by vwhgwdsa on 2022-12-02 in Other

We can do this with dataclasses:

from dataclasses import dataclass
import pandas as pd

@dataclass
class MyDataClass:
    i: int
    s: str

# Arguments follow the field order: i (int) first, then s (str)
df = pd.DataFrame([MyDataClass(1, "a"), MyDataClass(2, "b")])

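This gives a DataFrame df with the columns i and s, as one would expect.
Is there a simple way to do the same with an attrs class?
I can do it by iterating over the objects' attributes, building an object of type dict[str, list] (in this example {"i": [1, 2], "s": ["a", "b"]}) and constructing the DataFrame from that, but direct support for attrs objects would be nicer.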

Answer #1 (gfttwv5a)

You can access the dictionary at the heart of a dataclass instance like so:

a = MyDataClass(1, "a")
a.__dict__

This outputs:

{'i': 1, 's': 'a'}

Knowing this, if you have an iterable arr of MyDataClass instances, you can read the __dict__ attribute of each one and construct a DataFrame from the resulting dictionaries:

arr = [MyDataClass(1, "a"), MyDataClass(2, "b")]
df = pd.DataFrame([x.__dict__ for x in arr])

df outputs:

   i  s
0  1  a
1  2  b

The limitation of this approach is that it does not work if the slots option is used, because instances of a slotted class have no __dict__.
Alternatively, it is possible to convert the data from a dataclass to a tuple or dictionary using dataclasses.astuple and dataclasses.asdict respectively.
The DataFrame can then also be constructed using either of the following:

import dataclasses

# using astuple (column names are taken from the dataclass fields)
df = pd.DataFrame(
    [dataclasses.astuple(x) for x in arr],
    columns=[f.name for f in dataclasses.fields(MyDataClass)],
)

# using asdict
df = pd.DataFrame([dataclasses.asdict(x) for x in arr])
