I am writing a Gaussian process regression algorithm. Here is the code:
```matlab
% Data generating function
fh = @(x)(2*cos(2*pi*x/10).*x);
% range
x = -5:0.01:5;
N = length(x);
% Sampled data points from the generating function
M = 50;
selection = boolean(zeros(N,1));
j = randsample(N, M);
% mark them
selection(j) = 1;
Xa = x(j);
% compute the function and extract mean
f = fh(Xa) - mean(fh(Xa));
sigma2 = 1;
% computing the interpolation using all x's
% It is expected that for points used to build the GP cov. matrix, the
% uncertainty is reduced...
K = squareform(pdist(x'));
K = exp(-(0.5*K.^2)/sigma2);
% upper left corner of K
Kaa = K(selection,selection);
% lower right corner of K
Kbb = K(~selection,~selection);
% upper right corner of K
Kab = K(selection,~selection);
% mean of posterior
m = Kab'*inv(Kaa+0.001*eye(M))*f';
% cov. matrix of posterior
D = Kbb - Kab'*inv(Kaa + 0.001*eye(M))*Kab;
% sampling M functions from GP
[A,B,C] = svd(Kaa);
F0 = A*sqrt(B)*randn(M,M);
% mean from GP using sampled points
F0m = mean(F0,2);
F0d = std(F0,0,2);
%%
% put together data and estimation
F = zeros(N,1);
S = zeros(N,1);
F(selection) = f' + F0m;
S(selection) = F0d;
% sampling M function from posterior
[A,B,C] = svd(D);
a = A*sqrt(B)*randn(N-M,M);
% mean from posterior GPs
Fm = m + mean(a,2);
Fmd = std(a,0,2);
F(~selection) = Fm;
S(~selection) = Fmd;
%%
figure;
% show what we got...
plot(x, F, ':r', x, F-2*S, ':b', x, F+2*S, ':b'), grid on;
hold on;
% show points we got
plot(Xa, f, 'Ok');
% show the whole curve
plot(x, fh(x)-mean(fh(x)), 'k');
grid on;
```
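I expected to get a nice figure in which the uncertainty is large at the unobserved data points and small around the sampled data points. Instead I got an odd figure, and, stranger still, the uncertainty around the sampled data points is larger than at the other points. What am I doing wrong?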
1 Answer
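There are several things wrong with your code. Here are the most important points:
- The indexing of `f`. You define `Xa = x(j)`, but you should use `Xa = x(selection)` so that the indexing is consistent with the indexing you use on the kernel matrix `K`. (`randsample` returns `j` in random order, whereas logical indexing with `selection` follows the ascending order of `x`.)
- `f = fh(Xa) - mean(fh(Xa))` serves no purpose and shifts the circles in the plot away from the actual function. (If you do choose to subtract something, it should be a fixed number or function that does not depend on the randomly sampled observations.)
- `m` and `D` already give the posterior mean and covariance (the pointwise variances are `diag(D)`); there is no need to sample from the posterior and then compute estimates from those samples.

Here is a revised version of the script with the points above fixed.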
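The three fixes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the exact script behind the plot described below: `M = 5` is taken from that description, `randperm` stands in for `randsample` and implicit expansion for `pdist`/`squareform` so that no toolbox is needed, backslash replaces `inv`, and the observed points are simply drawn with zero uncertainty.

```matlab
% Data-generating function (same as in the question)
fh = @(x) 2*cos(2*pi*x/10).*x;

% Evaluation grid
x = -5:0.01:5;
N = length(x);

% Pick M observation points (M = 5 is an assumption taken from the plot description)
M = 5;
selection = false(N, 1);
selection(randperm(N, M)) = true;      % randperm stands in for randsample

% Fix 1: index with the logical mask so the order matches K(selection, selection)
Xa = x(selection);
% Fix 2: no sample mean is subtracted from the observations
f = fh(Xa)';

% Squared-exponential kernel on the full grid
sigma2 = 1;
K = exp(-0.5 * (x' - x).^2 / sigma2);  % implicit expansion, needs R2016b or later

Kaa = K(selection,  selection);        % observed vs. observed
Kbb = K(~selection, ~selection);       % unobserved vs. unobserved
Kab = K(selection,  ~selection);       % observed vs. unobserved

% Fix 3: use the posterior mean and covariance directly, without sampling.
% Backslash replaces inv(); 0.001 is the jitter used in the question.
Kreg = Kaa + 0.001*eye(M);
m = Kab' * (Kreg \ f);
D = Kbb - Kab' * (Kreg \ Kab);

% Assemble mean and standard deviation over the whole grid
F = zeros(N, 1);
S = zeros(N, 1);
F(selection)  = f;                     % assumption: observed points shown as exact
F(~selection) = m;
S(~selection) = sqrt(max(diag(D), 0)); % clip tiny negative round-off before sqrt

% True function in black, posterior mean in blue, 2-sigma band in green
figure; hold on; grid on;
plot(x, fh(x), 'k');
plot(x, F, 'b');
plot(x, F - 2*S, 'g', x, F + 2*S, 'g');
plot(Xa, f, 'ok');
```

With only a handful of observations the band should pinch in near each circle and widen toward the prior standard deviation of 1 far from any of them, which is the behaviour the question was expecting.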
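The resulting plot, with 5 randomly selected observations: the true function is shown in black, the posterior mean in blue, and the confidence interval in green.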
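For reference, the plotted quantities can be written in the notation of the code, where $f$ is the vector of observed values and $\epsilon$ denotes the $0.001$ jitter added to `Kaa`. The two-standard-deviation width of the band is an assumption carried over from the `F-2*S` and `F+2*S` lines in the question; the answer itself does not state the width.

```latex
\begin{aligned}
k(x, x') &= \exp\!\left(-\frac{(x - x')^{2}}{2\sigma^{2}}\right), \qquad \sigma^{2} = 1,\\
m &= K_{ab}^{\top}\,(K_{aa} + \epsilon I)^{-1} f,\\
D &= K_{bb} - K_{ab}^{\top}\,(K_{aa} + \epsilon I)^{-1} K_{ab},\\
\text{band} &= m \pm 2\sqrt{\operatorname{diag}(D)}.
\end{aligned}
```

Since $k(x, x) = 1$ and the subtracted term in $D$ is positive semidefinite, each diagonal entry of $D$ lies between 0 and 1: close to 0 next to an observation and close to 1 far from all of them.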