pandas: insert row differences into the original DataFrame

Posted by b4lqfgs4 on 2023-11-15

I have a DataFrame read from a CSV file that gains a new row every time the script runs. It looks like this:

df = pd.read_csv(file_path)
print(df.to_string(index=False))

timestamp    Puts   Calls  PutCh  CallCh  ChDiff

09:41:12 AM 2027891 1820724 280101  200974   79127
09:48:51 AM 2075976 1862053 328186  242303   85883
09:58:48 AM 2091487 1885842 343697  266092   77605
10:08:21 AM 2091879 1918592 344089  298842   45247
02:26:00 PM 1995234 1941917 247444  322167  -74723
02:44:36 PM 1990071 1934874 242281  315124  -72843
02:56:17 PM 1970892 1938472 223102  318722  -95620

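Now, based on what I have read about df.diff(), I want the difference between each row and the one before it. So I dropped the timestamp column to get a new DataFrame named df1 and ran: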

df1.diff()


The output I get is:

Puts   Calls    PutCh  CallCh    ChDiff
     NaN     NaN      NaN     NaN       NaN
 48085.0 41329.0  48085.0 41329.0    6756.0
 15511.0 23789.0  15511.0 23789.0   -8278.0
   392.0 32750.0    392.0 32750.0  -32358.0
-96645.0 23325.0 -96645.0 23325.0 -119970.0
 -5163.0 -7043.0  -5163.0 -7043.0    1880.0
-19179.0  3598.0 -19179.0  3598.0  -22777.0
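
For reference, the behaviour above can be reproduced without the CSV file. The following is a minimal sketch that inlines the first four rows of the sample data; the names csv_text, diffs and int_diffs are illustrative and not part of the original script. It shows that the first row of diff() is all NaN (it has no predecessor), that this NaN is why the result is float, and that dropping that row allows a cast back to integers.

```python
import io

import pandas as pd

# First four rows of the sample data from the question, inlined so the
# sketch runs without the CSV file.
csv_text = """timestamp,Puts,Calls,PutCh,CallCh,ChDiff
09:41:12 AM,2027891,1820724,280101,200974,79127
09:48:51 AM,2075976,1862053,328186,242303,85883
09:58:48 AM,2091487,1885842,343697,266092,77605
10:08:21 AM,2091879,1918592,344089,298842,45247
"""
df = pd.read_csv(io.StringIO(csv_text))

# diff() only makes sense for the numeric columns, so set timestamp aside.
diffs = df.drop(columns="timestamp").diff()

# Row 0 has no predecessor, so it is all NaN, and NaN forces a float dtype.
print(diffs.dtypes)

# Dropping the all-NaN first row allows a cast back to integers.
int_diffs = diffs.iloc[1:].astype(int)
print(int_diffs.to_string())
```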


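What I would like is to have these differences added, in parentheses, under each column of the original DataFrame (df). In more detail, my output should look like this (the timestamp column should also be present, as in my df):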

Puts    Calls   PutCh   CallCh  ChDiff
2027891 1820724 280101  200974  79127
2075976 1862053 328186  242303  85883
(48085) (41329) (48085) (41329) (6756)
2091487 1885842 343697  266092  77605
(15511) (23789) (15511) (23789) (-8278)
2091879 1918592 344089  298842  45247
(392)   (32750) (392)   (32750) (-32358)


Is there a way to do this?


Answer 1 (yduiuuwa)

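Convert the output of diff to strings, wrap them in parentheses, concat them back onto the original DataFrame, and finally call sort_index to put the rows back in order: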

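# Differences of the numeric columns, skipping the all-NaN first row,
# with each value wrapped in parentheses
tmp = (df.drop(columns='timestamp').diff()
         .iloc[1:]
         .apply(lambda s: '('+s.astype(str)+')')
      )

# A stable sort keeps each original row ahead of its diff row
out = pd.concat([df, tmp]).sort_index(kind='stable')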

Output:

     timestamp        Puts      Calls       PutCh     CallCh       ChDiff
0  09:41:12 AM     2027891    1820724      280101     200974        79127
1  09:48:51 AM     2075976    1862053      328186     242303        85883
1          NaN   (48085.0)  (41329.0)   (48085.0)  (41329.0)     (6756.0)
2  09:58:48 AM     2091487    1885842      343697     266092        77605
2          NaN   (15511.0)  (23789.0)   (15511.0)  (23789.0)    (-8278.0)
3  10:08:21 AM     2091879    1918592      344089     298842        45247
3          NaN     (392.0)  (32750.0)     (392.0)  (32750.0)   (-32358.0)
4  02:26:00 PM     1995234    1941917      247444     322167       -74723
4          NaN  (-96645.0)  (23325.0)  (-96645.0)  (23325.0)  (-119970.0)
5  02:44:36 PM     1990071    1934874      242281     315124       -72843
5          NaN   (-5163.0)  (-7043.0)   (-5163.0)  (-7043.0)     (1880.0)
6  02:56:17 PM     1970892    1938472      223102     318722       -95620
6          NaN  (-19179.0)   (3598.0)  (-19179.0)   (3598.0)   (-22777.0)

A variant using termcolor:

from termcolor import colored

tmp = (df.drop(columns='timestamp').diff()
         .iloc[1:]
         .applymap(lambda x: colored(f'({x})',
                                     'green' if x>0 else 'red')
                  )
      )
out = (pd.concat([df.applymap(lambda x: colored(x, 'white')), tmp])
         .fillna(colored('', 'white'))
         .sort_index(kind='stable')
         .rename(columns=lambda x: colored(x, 'white', None))
      )

print(out.to_string(index=False))


Output: [image of the colored output, not preserved]


Answer 2 (bzzcjhmw)

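You can convert the differences to strings in order to add the '()', then concatenate them with the original DataFrame before sorting the result: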

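import numpy as np
# Integer differences as strings, wrapped in parentheses
df1 = '(' + df.iloc[:, 1:].diff().dropna(how='all').astype(int).astype(str) + ')'
out = pd.concat([df, df1]).fillna('').sort_index(kind='stable')
# Blank out the repeated index labels on the diff rows
out.index = np.where(out.index.duplicated(), '', out.index)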

Output:

>>> out
     timestamp      Puts    Calls     PutCh   CallCh     ChDiff
0  09:41:12 AM   2027891  1820724    280101   200974      79127
1  09:48:51 AM   2075976  1862053    328186   242303      85883
                 (48085)  (41329)   (48085)  (41329)     (6756)
2  09:58:48 AM   2091487  1885842    343697   266092      77605
                 (15511)  (23789)   (15511)  (23789)    (-8278)
3  10:08:21 AM   2091879  1918592    344089   298842      45247
                   (392)  (32750)     (392)  (32750)   (-32358)
4  02:26:00 PM   1995234  1941917    247444   322167     -74723
                (-96645)  (23325)  (-96645)  (23325)  (-119970)
5  02:44:36 PM   1990071  1934874    242281   315124     -72843
                 (-5163)  (-7043)   (-5163)  (-7043)     (1880)
6  02:56:17 PM   1970892  1938472    223102   318722     -95620
                (-19179)   (3598)  (-19179)   (3598)   (-22777)
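
The approach in the two answers can be wrapped in a small helper and run end to end. This is only a sketch under stated assumptions: the function name interleave_diffs and its key parameter are hypothetical, the sample data is inlined from the first four rows of the question, and the index is reset at the end so the result can be printed with to_string(index=False), as in the question.

```python
import io

import pandas as pd


def interleave_diffs(df: pd.DataFrame, key: str = "timestamp") -> pd.DataFrame:
    """Return df with a '(diff)' row inserted after every row except the first.

    Hypothetical helper for illustration; assumes every column other than
    `key` is an integer column and that df has a default RangeIndex.
    """
    # Row-to-row differences of the numeric columns, without the all-NaN first row.
    diffs = df.drop(columns=key).diff().iloc[1:].astype(int)
    # Wrap each difference in parentheses; the result is a frame of strings.
    wrapped = "(" + diffs.astype(str) + ")"
    # Stack originals and diffs. The diff rows have no `key` value, so fill it
    # with ''. A stable sort keeps each original row ahead of its diff row.
    out = pd.concat([df, wrapped]).fillna("").sort_index(kind="stable")
    return out.reset_index(drop=True)


csv_text = """timestamp,Puts,Calls,PutCh,CallCh,ChDiff
09:41:12 AM,2027891,1820724,280101,200974,79127
09:48:51 AM,2075976,1862053,328186,242303,85883
09:58:48 AM,2091487,1885842,343697,266092,77605
10:08:21 AM,2091879,1918592,344089,298842,45247
"""
df = pd.read_csv(io.StringIO(csv_text))

out = interleave_diffs(df)
print(out.to_string(index=False))
```

Note that the numeric columns of the result hold a mix of integers and strings, so the output is suitable for display rather than for further arithmetic.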
