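I am doing exploratory data analysis on the Loan Prediction dataset (a pandas DataFrame). The DataFrame has two columns: Property_Area, whose values are of three types (Rural, Urban, Semiurban), and Loan_Status, whose values are of two types (Y, N). I want to draw a chart like this: Property_Area along the X axis and, for each of the 3 area types, the percentage of loans accepted or rejected along the Y axis. How can I do this?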
Here is a sample of my data:
import pandas as pd

data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
                     'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                                       'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban']})
I tried this:
# data is the DataFrame for the original dataset
status = data['Loan_Status']
index = data['Property_Area']
df = pd.DataFrame({'Loan Status': status}, index=index)
ax = df.plot.bar(rot=0)
Output: [image omitted]
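A quick way to see why this attempt cannot produce the chart, sketched on the sample data above: when a Series is passed together with `index=`, pandas aligns the Series on the new labels. `status` is indexed 0..10, so none of the area names match and every value in the new column becomes NaN. Even with the values intact, nothing in the attempt aggregates the 'Y'/'N' strings into per-area percentages.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
                     'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                                       'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban']})

status = data['Loan_Status']      # a Series indexed 0..10
index = data['Property_Area']     # area names, used as the new row labels

# The constructor reindexes `status` onto the area-name labels;
# no label matches 0..10, so every value becomes NaN
df = pd.DataFrame({'Loan Status': status}, index=index)
print(df['Loan Status'].isna().all())
```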
**Edit:** I managed to do what I wanted, but I had to write a rather long piece of code for it:
import numpy as np
import matplotlib.pyplot as plt

new_data = data[['Property_Area', 'Loan_Status']].copy()

# Rural: percentage of approved ('Y') loans
count_rural_y = new_data[(new_data.Property_Area == 'Rural') & (new_data.Loan_Status == 'Y')].count()
count_rural = new_data[new_data.Property_Area == 'Rural'].count()
rural_y_percent = (count_rural_y.iloc[0] / count_rural.iloc[0]) * 100

# Urban: percentage of approved ('Y') loans
count_urban_y = new_data[(new_data.Property_Area == 'Urban') & (new_data.Loan_Status == 'Y')].count()
count_urban = new_data[new_data.Property_Area == 'Urban'].count()
urban_y_percent = (count_urban_y.iloc[0] / count_urban.iloc[0]) * 100

# Semiurban: percentage of approved ('Y') loans
count_semiurban_y = new_data[(new_data.Property_Area == 'Semiurban') & (new_data.Loan_Status == 'Y')].count()
count_semiurban = new_data[new_data.Property_Area == 'Semiurban'].count()
semiurban_y_percent = (count_semiurban_y.iloc[0] / count_semiurban.iloc[0]) * 100
objects = ('Rural', 'Urban', 'Semiurban')
y_pos = np.arange(len(objects))
performance = [rural_y_percent,urban_y_percent,semiurban_y_percent]
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Loan Approval Percentage')
plt.title('Area Wise Loan Approval Percentage')
plt.show()
Output: [image omitted]
Is there a simpler way to do this?
1 Answer
Pandas `crosstab` with `normalize` makes this easy. A simple way to take two or more columns of a pandas DataFrame and get the percentages *for each row* is to use the pandas `crosstab` function with `normalize='index'`.
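Written as a formula, with $n_{a,s}$ the number of loans in area $a$ that have status $s$, `normalize='index'` replaces each count by its share of the row total:

$$p_{a,s} = \frac{n_{a,s}}{\sum_{s' \in \{\mathrm{Y},\,\mathrm{N}\}} n_{a,s'}}$$

For example, in the question's sample, Urban has 5 approved and 1 rejected loan, so $p_{\mathrm{Urban},\mathrm{Y}} = 5/6 \approx 0.833$, i.e. about 83.3%.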
Here is what the crosstab call looks like:
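A minimal sketch of such a call, assuming the question's sample `data` with Property_Area as rows and Loan_Status as columns. The `* 100` that turns fractions into percentages is an assumption added here, not necessarily part of the original answer:

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
                     'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                                       'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban']})

# Rows: Property_Area; columns: Loan_Status ('N', 'Y').
# normalize='index' divides each count by its row total, so each row sums to 1;
# multiplying by 100 turns the fractions into percentages.
df_percent = pd.crosstab(data['Property_Area'], data['Loan_Status'],
                         normalize='index') * 100
print(df_percent.round(1))
```

On this sample, Rural comes out 50/50, Semiurban is 100% N, and Urban is about 16.7% N versus 83.3% Y, which matches the Y percentages the question's long version computes area by area.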
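This returns `df_percent`, with one row per Property_Area and one column per Loan_Status value; because of `normalize='index'`, each row of the crosstab sums to 1.
Then you can easily plot it as a bar chart: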
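A sketch of the plotting step, again on the question's sample data. The axis label and title strings are illustrative choices, and the `Agg` backend line only exists so the script also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs anywhere; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
                     'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                                       'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban']})

df_percent = pd.crosstab(data['Property_Area'], data['Loan_Status'],
                         normalize='index') * 100

# Property_Area on the x-axis; within each area, one bar per Loan_Status value
ax = df_percent.plot.bar(rot=0)
ax.set_ylabel('Percentage of loans')
ax.set_title('Loan status by property area')
plt.show()
```

Passing `stacked=True` to `plot.bar` instead draws one bar per area split into N and Y segments that add up to 100%.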
Here you can see the code working in Google Colab.
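If only the approval percentage is needed, as in the question's own long version, a `groupby` on a boolean mask is another compact option. This is a different technique from the crosstab approach, sketched here on the question's sample data and reusing the question's plot labels:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs anywhere; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
                     'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                                       'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban']})

# True where the loan was approved; the mean of a boolean per group
# is the fraction of approved loans in that area
approved = data['Loan_Status'] == 'Y'
approval_pct = approved.groupby(data['Property_Area']).mean() * 100

# Same bar order as the question's hand-written version
approval_pct = approval_pct.reindex(['Rural', 'Urban', 'Semiurban'])

ax = approval_pct.plot.bar(rot=0, alpha=0.5)
ax.set_ylabel('Loan Approval Percentage')
ax.set_title('Area Wise Loan Approval Percentage')
plt.show()
```

This replaces the three near-identical count/divide blocks with a single grouped mean, so adding a fourth area type needs no new code.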
Here is the code I used to generate the sample DataFrame for this answer:
It creates the following sample DataFrame:
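The generating code and the table it printed are not preserved here. For a larger frame of the same shape to experiment with, a hypothetical stand-in can be built with NumPy's random generator. The row count, the seed, and the uniform random choices are assumptions for illustration, not the answerer's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: size and seed are arbitrary choices
rng = np.random.default_rng(seed=0)
n_rows = 100

sample = pd.DataFrame({
    'Property_Area': rng.choice(['Rural', 'Urban', 'Semiurban'], size=n_rows),
    'Loan_Status': rng.choice(['Y', 'N'], size=n_rows),
})

# The same crosstab call works unchanged on this frame
pct = pd.crosstab(sample['Property_Area'], sample['Loan_Status'], normalize='index')
print(pct)
```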