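I'm not very familiar with regression. In the past I have only used simple linear regression with one dependent and one independent variable, but now I have a different case with only two dependent variables. On the web, here, I read that regression can also be performed between two dependent variables, and that it is called Multivariate Regression (maybe it has other names too, I don't know). I searched the web and StackOverflow, but found little or nothing about multivariate regression. If possible I would prefer to use scipy, but really any library would be welcome.
I want to determine the relationship between these two dependent variables in order to make a prediction, because I need to know how many goals Team A will score against Team B, which is the same as how many goals Team B will concede to Team A. I want to relate the attack and defence variables: y is Team A's ability to score goals, while x is Team B's tendency to concede goals.
My dependent variables are: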

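x = [1, 2, 2, 3, 1] are the goals conceded by Team B (in its last 5 matches, none of them against Team A)
y = [2, 1, 3, 0, 1] are the goals scored by Team A (in its last 5 matches, none of them against Team B)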

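Now, in the sixth match, Team A and Team B meet, and I would like to calculate the probability that Team A scores exactly 2 goals against Team B, taking both dependent variables x and y into account.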

import numpy
from scipy import stats

#dependent variable (x)
x = [1, 2, 2, 3, 1] #goals conceded by Team B

#dependent variable (y)
y = [2, 1, 3, 0, 1] #goals scored by Team A



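As I said in the comments, a blind series of goal counts is not enough; what matters is who scored and who was scored on. This can be thought of as an offence ability and a defence ability for each team. Below is a linear model that approximates the number of goals as ~ (offence ability of the scoring team) - (defence ability of the team scored on) + some top-level constant. If you add more match history for games that don't involve Liverpool or Chelsea, the accuracy will improve.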
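In symbols, the model described above could be written roughly as follows. The names $o_i$, $d_j$ and $c$ are notation introduced here for illustration only; they do not appear in the code.

```latex
\hat{g}_{ij} = o_i - d_j + c,
\qquad
(o, d, c) = \arg\min \sum_{(i,j)\,\in\,\text{games}} \left( g_{ij} - \hat{g}_{ij} \right)^2
```

Here $g_{ij}$ is the number of goals team $i$ scored on team $j$ in a recorded game, $o_i$ is the offence ability of team $i$, $d_j$ is the defence ability of team $j$, and $c$ is the shared constant. Each recorded game contributes two terms to the sum, one per direction.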

import numpy as np
import pandas as pd

goals_long = pd.Series(
    name='goals',
    data=(
        2, 1, 1, 0, 3, 2, 0, 1, 1, 1,
        1, 2, 2, 3, 2, 1, 3, 3, 1, 2,
    ),
    index=pd.MultiIndex.from_arrays(
        names=('scored by', 'scored on'),
        arrays=(
            (
                'Liverpool',   'West Ham',
                'Liverpool',   'Tottenham',
                'Liverpool',   'Crystal Palace',
                'Liverpool',   'Leeds',
                'Liverpool',   'Arsenal',
                'Wolver',      'Chelsea',
                'Aston Villa', 'Chelsea',
                'Arsenal',     'Chelsea',
                'Tottenham',   'Chelsea',
                'Everton',     'Chelsea',
            ),
            (
                'West Ham',       'Liverpool',
                'Tottenham',      'Liverpool',
                'Crystal Palace', 'Liverpool',
                'Leeds',          'Liverpool',
                'Arsenal',        'Liverpool',
                'Chelsea',        'Wolver',
                'Chelsea',        'Aston Villa',
                'Chelsea',        'Arsenal',
                'Chelsea',        'Tottenham',
                'Chelsea',        'Everton',
            ),
        ),
    ),
)
print('Game history:')
print(goals_long, end='\n\n')

pd.options.display.width = 0

names = goals_long.index.levels[0]
lhs = pd.DataFrame(
    data=np.zeros((goals_long.size, 2*len(names))),
    index=goals_long.index,
    columns=pd.MultiIndex.from_product(
        names=('group', 'name'),
        iterables=(
            ('offence', 'defence'), names,
        ),
    ),
)

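# One row per (scorer, conceder) pair: +1 in the scorer's offence column,
# -1 in the conceder's defence column, plus a shared intercept column.
for (scored_by, scored_on), val in goals_long.items():
    lhs.loc[(scored_by, scored_on), ('offence', scored_by)] = 1
    lhs.loc[(scored_by, scored_on), ('defence', scored_on)] = -1
lhs.loc[:, ('constant', '')] = 1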

x, residuals, rank, s = np.linalg.lstsq(
    a=lhs,
    b=goals_long,
    rcond=None,
)

# Split the solution into per-team coefficients and the trailing intercept
stats, (constant,) = np.split(x, (2*len(names),))
stats = pd.DataFrame(
    data=stats.reshape((2, -1)).T,
    columns=('offence', 'defence'),
    index=pd.Index(name='name', data=names),
)
print('Regressed statistics:')
print(stats, end='\n\n')

print('Predicted scores (for historical games):')
print(lhs @ x, end='\n\n')

lhs_lc = pd.DataFrame(
    columns=lhs.columns,
    data=np.zeros((2, lhs.shape[1])),
    index=pd.MultiIndex.from_arrays(
        names=('scored by', 'scored on'),
        arrays=(
            ('Liverpool', 'Chelsea'),
            ('Chelsea', 'Liverpool'),
        ),
    ),
)
lhs_lc.loc[('Liverpool', 'Chelsea'), ('offence', 'Liverpool')] = 1
lhs_lc.loc[('Liverpool', 'Chelsea'), ('defence', 'Chelsea')] = -1
lhs_lc.loc[('Chelsea', 'Liverpool'), ('offence', 'Chelsea')] = 1
lhs_lc.loc[('Chelsea', 'Liverpool'), ('defence', 'Liverpool')] = -1
# Include the intercept, as in the historical rows; otherwise it is dropped
lhs_lc.loc[:, ('constant', '')] = 1
print('Predicted scores (future game):')
print(lhs_lc @ x)
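The regression above yields an expected goal count, not the probability of an exact score that the question asks for. One possible way to bridge the two, sketched here under the assumption that goals follow a Poisson distribution with the predicted value as its mean, is shown below. The value of `expected_goals` is an illustrative placeholder, not an output of the regression.

```python
from scipy.stats import poisson

# Hypothetical expected goal count, standing in for one entry of the
# "future game" prediction; 1.6 is a placeholder chosen for illustration.
expected_goals = 1.6

# Assumption: goals are Poisson-distributed with this mean. A linear model
# can predict zero or negative values, so clamp to a small positive rate.
rate = max(expected_goals, 1e-9)

# Probability mass at exactly 2 goals
p_exactly_two = poisson.pmf(2, rate)
print(f'P(exactly 2 goals) = {p_exactly_two:.1%}')
```

Unlike a normal distribution, the Poisson distribution is defined only on non-negative integers, so `pmf` returns an actual probability for each score.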

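A dumber approach, which combines the scoring ability of the first team with the goal susceptibility of the second team to produce a normal distribution, and which ignores how these teams differ from the other teams, is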

import pandas as pd
from scipy.stats import norm

goals = pd.Series(
    name='goals',
    data=(
        2, 1, 1, 0, 3, 2, 0, 1, 1, 1,
        1, 2, 2, 3, 2, 1, 3, 3, 1, 2,
    ),
    index=pd.MultiIndex.from_arrays(
        names=('scored by', 'scored on'),
        arrays=(
            (
                'Liverpool',   'West Ham',
                'Liverpool',   'Tottenham',
                'Liverpool',   'Crystal Palace',
                'Liverpool',   'Leeds',
                'Liverpool',   'Arsenal',
                'Wolver',      'Chelsea',
                'Aston Villa', 'Chelsea',
                'Arsenal',     'Chelsea',
                'Tottenham',   'Chelsea',
                'Everton',     'Chelsea',
            ),
            (
                'West Ham',       'Liverpool',
                'Tottenham',      'Liverpool',
                'Crystal Palace', 'Liverpool',
                'Leeds',          'Liverpool',
                'Arsenal',        'Liverpool',
                'Chelsea',        'Wolver',
                'Chelsea',        'Aston Villa',
                'Chelsea',        'Arsenal',
                'Chelsea',        'Tottenham',
                'Chelsea',        'Everton',
            ),
        ),
    ),
)

pd.options.display.width = 0
print('Game history:')
print(goals, end='\n\n')

team_a = 'Liverpool'
team_b = 'Chelsea'
goal_test = 2
distro = norm(*norm.fit(
    pd.concat((
        goals.xs(key=team_a, level='scored by'),
        goals.xs(key=team_b, level='scored on'),
    ))
))
print(f'The probability of {team_a} scoring {goal_test} goals on {team_b} is '
      f'{distro.pdf(goal_test):.1%}')

The probability of Liverpool scoring 2 goals on Chelsea is 39.6%
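One caveat about the figure above: for a continuous distribution, `pdf` returns a probability density rather than a probability. A possible variant, offered as a sketch and not as the method used above, takes the probability that the fitted normal assigns to the interval that rounds to the target score. The list `pooled` repeats the ten values that the snippet above concatenates (Liverpool's goals scored, then Chelsea's goals conceded).

```python
from scipy.stats import norm

# Liverpool's goals scored followed by Chelsea's goals conceded,
# copied from the game history above.
pooled = [2, 1, 3, 0, 1, 1, 2, 2, 3, 1]

# Maximum-likelihood fit of a normal distribution to the pooled goals
mean, std = norm.fit(pooled)
distro = norm(mean, std)

goal_test = 2
# Probability of landing in the interval that rounds to goal_test
p_interval = distro.cdf(goal_test + 0.5) - distro.cdf(goal_test - 0.5)
print(f'P({goal_test - 0.5} < goals < {goal_test + 0.5}) = {p_interval:.1%}')
```

The half-goal interval width is an assumption (a simple continuity correction); other interval choices would give different numbers.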


