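Can you help me?
I have two pandas DataFrames. I need to take every 10th value from specific columns of the first DataFrame, then create new columns in the second DataFrame and put those values into them. The rows of the two DataFrames also need to match up. Here is the code: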
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
# we will use random forest classifier as our classifier
forest_classifier=RandomForestClassifier(max_depth=4)
# reading in accelerometer data
train_time = pd.read_csv("https://courses.edx.org/assets/courseware/v1/b98039c3648763aae4f153a6ed32f38b/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_time_series.csv", index_col=0)
train_labels = pd.read_csv("https://courses.edx.org/assets/courseware/v1/d64e74647423e525bbeb13f2884e9cfa/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_labels.csv", index_col=0)
x = []
y = []
z = []
# making lists out of the x, y, z columns
for i in range(3, len(train_time), 10):
    x.append(train_time.iloc[i]["x"])
    y.append(train_time.iloc[i]["y"])
    z.append(train_time.iloc[i]["z"])
print(z) # checking the list
# making new columns in train_labels file with the obtained lists
train_labels["x"] = pd.Series([x])
train_labels["y"] = pd.Series([y])
train_labels["z"] = pd.Series([z])
train_labels.head()
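But in the output I get, the newly created columns contain only NaN values.
The expected output is a DataFrame with the new x, y and z columns, each holding the corresponding observations.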
1 answer
qhhrdooz #1
You can stack x, y and z and assign them to the new columns in one step.
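A minimal sketch of the stacking approach, using small synthetic frames in place of the course CSV files. The stand-in data and its index values are assumptions for illustration, and np.column_stack is one way to do the stacking, not necessarily the call used in the original answer:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real CSV files (assumed shapes, not the course data):
# 40 accelerometer rows and 4 label rows with a non-default index.
train_time = pd.DataFrame(
    {"x": np.arange(40.0), "y": np.arange(40.0) * 2, "z": np.arange(40.0) * 3},
    index=range(1000, 1040),
)
train_labels = pd.DataFrame({"label": [1, 2, 1, 3]}, index=[1003, 1013, 1023, 1033])

# Same sampling loop as in the question: every 10th row starting at position 3.
x, y, z = [], [], []
for i in range(3, len(train_time), 10):
    row = train_time.iloc[i]
    x.append(row["x"])
    y.append(row["y"])
    z.append(row["z"])

# Stack the three lists into an (n_rows, 3) array. A plain NumPy array carries
# no index, so the assignment is positional and no label alignment takes place.
stacked = np.column_stack([x, y, z])
train_labels[["x", "y", "z"]] = stacked

print(train_labels)
```

Because the values arrive as an array rather than a Series, the number of sampled rows has to equal the number of rows in train_labels, otherwise the assignment raises an error instead of silently filling NaN.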
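Note 1: Your code does not work because you create the pd.Series without using the index from train_labels, so the indexes are not aligned.
Note 2: You can also avoid the loop.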
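The two notes can be illustrated on a small synthetic example. The frames below are assumed stand-ins rather than the course data, and positional slicing with iloc[3::10] is one possible way to replace the loop:

```python
import pandas as pd

# Synthetic stand-ins (assumed for illustration): labels indexed by 1003, 1013, ...
train_time = pd.DataFrame({"x": [float(v) for v in range(40)]}, index=range(1000, 1040))
train_labels = pd.DataFrame({"label": [1, 2, 1, 3]}, index=[1003, 1013, 1023, 1033])

# Note 2: positional slicing picks rows 3, 13, 23, 33 without a Python loop.
x = train_time["x"].iloc[3::10].tolist()

# Wrapping the list in another list, as in the question, gives a one-element
# Series whose single value is the whole list, stored under index label 0.
wrapped = pd.Series([x])

# Even without the extra brackets, the default index 0..3 shares no labels with
# train_labels, so the assignment aligns on labels and fills the column with NaN.
misaligned = train_labels.assign(x=pd.Series(x))

# Note 1: reusing the index of train_labels makes the labels line up.
aligned = train_labels.assign(x=pd.Series(x, index=train_labels.index))

print(len(wrapped))
print(misaligned["x"].isna().all())
print(aligned["x"].tolist())
```

pandas matches a Series to a DataFrame by index label, not by position, which is why the values disappear when the labels differ.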
So the code could be:
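A minimal sketch of a complete version, under stated assumptions: add_sampled_columns is a hypothetical helper introduced here for illustration, rows are matched purely by position, and the demo frames are synthetic so the example runs without downloading the course files.

```python
import numpy as np
import pandas as pd


def add_sampled_columns(labels, time_series, columns=("x", "y", "z"), start=3, step=10):
    """Return a copy of `labels` with every `step`-th row of `time_series` attached.

    Hypothetical helper for illustration; rows are matched by position, not by label.
    """
    cols = list(columns)
    # Slice rows start, start + step, ... and keep only the requested columns.
    sampled = time_series.iloc[start::step][cols]
    if len(sampled) != len(labels):
        raise ValueError(f"{len(sampled)} sampled rows but {len(labels)} label rows")
    out = labels.copy()
    # to_numpy() drops the index of `sampled`, so values are placed row by row.
    out[cols] = sampled.to_numpy()
    return out


# Synthetic stand-ins for train_time and train_labels (assumed data).
train_time = pd.DataFrame(
    {"x": np.arange(40.0), "y": np.arange(40.0, 80.0), "z": np.arange(80.0, 120.0)}
)
train_labels = pd.DataFrame({"label": [1, 2, 1, 3]}, index=[501, 511, 521, 531])

result = add_sampled_columns(train_labels, train_time)
print(result.head())
```

The frames returned by the two pd.read_csv calls in the question could be passed in the same way. The explicit length check is a design choice: it turns a mismatch between the sampled rows and the label rows into a clear error message.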