python-3.x How to move a DataFrame row's value down onto the following rows in pandas

Posted by 9udxz4iz on 2023-03-31 in Python

I'm a novice Python user. My goal is to take each name and move it onto the rows that follow it.

import pandas as pd
import numpy as np
df = pd.DataFrame({"1": ['Alfred', 'car', 'bike','Alex','car'],
                   "2": [np.nan, 'Ford', 'Giant',np.nan,'Toyota'],
                   "3": [pd.NaT, pd.Timestamp("2018-01-01"),
                            pd.Timestamp("2018-07-01"),np.nan,pd.Timestamp("2021-01-01")]})

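In my target result, each name sits in its own column next to the rows that follow it, and the name-only rows are gone.
I tried looking for a way to do this but could not solve it. Thanks for reading my post and for any help.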
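Thanks mozway, jezrael, and mcsoini for the help. It works, and I will study these different approaches.
Joseph Assaker, I have a question about your answer: when I run your code, it raises an error. What am I missing?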

y3bcpkx1 #1

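The idea is to keep the first column's values only where Mark is missing, forward-fill them into a new Name column, and then filter the rows with the same mask: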

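# rename the columns
df.columns = ["Transportation", "Mark", "BuyDate"]
# True for rows that carry vehicle data, False for the name rows
m = df["Mark"].notna()
# blank out the vehicle rows so only names remain, then forward-fill the names
df["Name"] = df["Transportation"].mask(m).ffill()
# drop the name rows and renumber the index
df = df[m].reset_index(drop=True)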
print(df)
  Transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
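
To see why this works, it can help to look at the intermediate values one step at a time. Below is a minimal sketch on the question's sample data (date column left out for brevity); the names `raw`, `is_detail`, `names_only`, and `names` are only for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Transportation": ["Alfred", "car", "bike", "Alex", "car"],
    "Mark": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
})

# True on the detail rows (car, bike, ...), False on the name rows
is_detail = raw["Mark"].notna()

# mask() replaces values with NaN wherever the condition is True,
# so only the names survive in this Series
names_only = raw["Transportation"].mask(is_detail)

# ffill() copies each name downward until the next name appears
names = names_only.ffill()

print(names_only.tolist())
print(names.tolist())
```

Because the same mask is reused for the final filter, the name rows themselves are dropped after their values have been copied down.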

lzfw57am #2

You can use a helper column and then a forward fill:

# rename columns
df.columns = ["transportation", "Mark", "BuyDate"]
# assumption: the rows where "Mark" is NaN defines the name for the following rows
df["is_name"] = df["Mark"].isna()
# create a new column which is NaN everywhere except for the name rows
df["name"] = np.where(df.is_name, df["transportation"], np.nan)
# do a forward fill to extend the names to all rows
# (.ffill() replaces fillna(method="ffill"), which is deprecated in recent pandas)
df["name"] = df["name"].ffill()
# filter by non-name rows and drop the temporary is_name column
df = df.loc[~df.is_name].drop("is_name", axis=1)
print(df)

Out:
  transportation    Mark    BuyDate    name
1            car    Ford 2018-01-01  Alfred
2           bike   Giant 2018-07-01  Alfred
4            car  Toyota 2021-01-01    Alex
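
Note that this result keeps the original index labels (1, 2, 4). If a fresh 0-based index is wanted, `reset_index(drop=True)` can be chained on. A small sketch, where `filtered` is a hypothetical stand-in for the frame printed above:

```python
import pandas as pd

# Hypothetical stand-in for the filtered result above (index labels 1, 2, 4)
filtered = pd.DataFrame(
    {
        "transportation": ["car", "bike", "car"],
        "name": ["Alfred", "Alfred", "Alex"],
    },
    index=[1, 2, 4],
)

# drop=True discards the old labels instead of adding them as a column
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())
```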

z9smfwbn #3

You can use this pipeline:

m = df.iloc[:,1].notna()
(df.assign(Name=df.iloc[:,0].mask(m).ffill()) # add new column
   .loc[m] # keep only the rows with info
   # below: rework df to fit output
   .rename(columns={'1': 'transportation', '2': 'Mark', '3': 'BuyDate'})
   .reset_index(drop=True)
)

Output:

  transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
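
If the same reshaping is needed in more than one place, the pipeline could be wrapped in a small function. This is only a sketch: `names_to_column` and its `new_col` parameter are hypothetical names, and it assumes the layout from the question (names in the first column, and a missing value in the second column marking a name row).

```python
import numpy as np
import pandas as pd


def names_to_column(df, new_col="Name"):
    """Copy name-row values into a new column and drop the name rows."""
    # rows that carry real data have a value in the second column
    has_info = df.iloc[:, 1].notna()
    # keep only the names from the first column, then forward-fill them
    names = df.iloc[:, 0].mask(has_info).ffill()
    return df.assign(**{new_col: names}).loc[has_info].reset_index(drop=True)


sample = pd.DataFrame({
    "1": ["Alfred", "car", "bike", "Alex", "car"],
    "2": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
    "3": [pd.NaT, pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"),
          np.nan, pd.Timestamp("2021-01-01")],
})

result = names_to_column(sample)
print(result)
```

Using positional `iloc` means the function does not depend on the column labels, so the renaming step can happen before or after the call.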

0vvn1miw #4

You can do it like this:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"1": ['Alfred', 'car', 'bike','Alex','car'],
...                    "2": [np.nan, 'Ford', 'Giant',np.nan,'Toyota'],
...                    "3": [pd.NaT, pd.Timestamp("2018-01-01"),
...                             pd.Timestamp("2018-07-01"),np.nan,pd.Timestamp("2021-01-01")]})
>>>
>>> df
        1       2          3
0  Alfred     NaN        NaT
1     car    Ford 2018-01-01
2    bike   Giant 2018-07-01
3    Alex     NaN        NaT
4     car  Toyota 2021-01-01
>>>
>>> new_df = pd.DataFrame(columns=['Transportation', 'Mark', 'BuyDate', 'Name'])
>>>
>>> j = 0
>>> for i in range(df.shape[0]):  # start at row 0 so the first name is picked up
...   if pd.isna(df.iloc[i, 1]):
...     running_name = df.iloc[i, 0]
...     continue
...   new_df.loc[j] = list(df.loc[i]) + [running_name]
...   j += 1
...
>>> new_df
  Transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
>>>
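
Growing a DataFrame one row at a time with `.loc` tends to get slow on larger inputs. A variation on the same loop, sketched below, collects the rows in a plain list and builds the DataFrame once at the end. The date column is left out for brevity, and `rows` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": ["Alfred", "car", "bike", "Alex", "car"],
    "2": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
})

rows = []
running_name = None
# itertuples(index=False, name=None) yields plain tuples, one per row
for first, second in df.itertuples(index=False, name=None):
    if pd.isna(second):
        # a name row: remember the name and move on
        running_name = first
        continue
    rows.append({"Transportation": first, "Mark": second, "Name": running_name})

# build the result in one go instead of enlarging it row by row
new_df = pd.DataFrame(rows)
print(new_df)
```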

j1dl9f46 #5

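# take the value from column '1' where column '2' is missing, then forward-fill the names
col1=df.assign(col1=np.where(pd.isna(df['2']),df['1'],pd.NA)).col1.ffill()
# attach the names as a new column and keep only the rows that have data in column '2'
df.assign(Name=col1).loc[pd.notna(df['2'])]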
