R: approximating a curve with regression

6fe3ivhb, posted on 2023-09-27 in Other

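I am trying to find the tangent vector of a curve. The equation of the curve is the problem: I only have a set of points, and from those points I am looking for an approximating function that describes the curve and fits the points.
When I plot my data, it looks like this:

After applying polynomial regression (based on this article: https://www.statology.org/curve-fitting-in-r/) I get the following result: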

    fit <- lm(cl2[,3] ~ poly(cl2[,2], 3))
    summary(fit)

    Call:
    lm(formula = cl2[, 3] ~ poly(cl2[, 2], 3))

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.31834 -0.10187  0.02132  0.09577  0.27393

    Coefficients:
                         Estimate Std. Error   t value Pr(>|t|)
    (Intercept)        -109.89121    0.03789 -2900.217  < 2e-16 ***
    poly(cl2[, 2], 3)1    7.33365    0.16516    44.403  < 2e-16 ***
    poly(cl2[, 2], 3)2   -4.43572    0.16516   -26.857 4.25e-14 ***
    poly(cl2[, 2], 3)3    1.14772    0.16516     6.949 4.66e-06 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1652 on 15 degrees of freedom
    Multiple R-squared:  0.9946, Adjusted R-squared:  0.9935
    F-statistic: 913.7 on 3 and 15 DF,  p-value: < 2.2e-16

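When I draw the fitted curve, the result looks good:

    lines(cl2[,2], predict(fit, data.frame(cl2[,2:3])))

Based on the coefficients, I assumed that the equation of the curve is:
1.14*x**3 - 4.43*x**2 + 7.33*x - 109
But when I compute the estimated y values from it, I get very strange numbers: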
y_actual:

     [1] -108.4569 -108.1504 -108.0895 -108.0728 -108.0461 -108.1777 -108.2751 -108.4619 -108.6918
    [10] -108.9750 -109.3552 -109.7625 -110.3328 -110.9580 -111.4312 -112.0062 -112.7337 -113.5880
    [19] -114.3681

y_predicted:

     [1] -8935267 -8980331 -9044297 -9115821 -9166614 -9270340 -9355643 -9456574 -9533497 -9602089
    [11] -9631113 -9670175 -9715100 -9754453 -9798813 -9851816 -9880888 -9926067 -9940310
What is going on here?
I tried setting the raw argument of poly to TRUE, which gave different coefficients, but the problem persists.

Edit

Data from dput:

    y_actual <-
      c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777,
        -108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625,
        -110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588,
        -114.3681)
    y_predicted <-
      c(-8935267, -8980331, -9044297, -9115821, -9166614, -9270340,
        -9355643, -9456574, -9533497, -9602089, -9631113, -9670175, -9715100,
        -9754453, -9798813, -9851816, -9880888, -9926067, -9940310)

Edit:

    x_values <- c(-197.3419, -197.6753, -198.1467, -198.6710, -199.0418, -199.7946,
                  -200.4095, -201.1323, -201.6797, -202.1654, -202.3702, -202.6451,
                  -202.9605, -203.2359, -203.5455, -203.9142, -204.1158, -204.4285,
                  -204.5268)

Solution:
Thanks, everyone, for the valuable input. It helped a lot.
I came up with the following function:

    # find the tangent line at one point of the centerline
    tangent_xy <- function(point_index, centerline) {
      # fit the polynomial regression (raw = TRUE gives plain power-series coefficients)
      fit <- lm(centerline[, 3] ~ poly(centerline[, 2], 3, raw = TRUE))
      # get coefficients
      cf <- fit$coefficients
      # equation of the fitted curve as a string
      eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse = '+')
      # first derivative of the fitted curve
      f <- D(parse(text = eq), "x")
      # coordinates of the point
      x0 <- centerline[point_index, 2]
      y0 <- centerline[point_index, 3]
      # slope (value of the derivative at the given point);
      # uses the centerline argument rather than the global cl2
      slope <- eval(f, envir = list(x = x0))
      # tangent line evaluated at every x, for plotting with lines()
      y <- slope * centerline[, 2] - slope * x0 + y0
      return(y)
    }

I use the function like this:

    # ff is not defined in the post; it is presumably the fitted curve as a function of x
    curve(ff, min(cl2[,2]), max(cl2[,2]))
    lines(cl2[,2], tangent_xy(3, cl2))
    lines(cl2[,2], tangent_xy(12, cl2))
    lines(cl2[,2], tangent_xy(15, cl2))
    lines(cl2[,2], tangent_xy(7, cl2))

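I get the following output:

It is not perfect, but I only need an approximation, so it will do. I will look into GAMs, as Roland suggested. They may work better.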
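
For a cubic, the slope can also be read straight off the raw coefficients, without building a string and calling D(). What follows is only a minimal sketch, assuming the x_values and y_actual posted in the question; poly_slope and tangent_line are hypothetical helper names and are not part of the function above.

```r
# Data as posted in the question
x_values <- c(-197.3419, -197.6753, -198.1467, -198.6710, -199.0418, -199.7946,
              -200.4095, -201.1323, -201.6797, -202.1654, -202.3702, -202.6451,
              -202.9605, -203.2359, -203.5455, -203.9142, -204.1158, -204.4285,
              -204.5268)
y_actual <- c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777,
              -108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625,
              -110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588,
              -114.3681)

# Raw (non-orthogonal) cubic fit, so the coefficients belong to 1, x, x^2, x^3
fit <- lm(y_actual ~ poly(x_values, 3, raw = TRUE))
cf <- unname(coef(fit))

# Hypothetical helper: d/dx of sum(cf[k + 1] * x^k) is sum(k * cf[k + 1] * x^(k - 1))
poly_slope <- function(cf, x) {
  k <- seq_along(cf)[-1] - 1L                       # exponents 1, 2, 3
  sapply(x, function(xi) sum(k * cf[-1] * xi^(k - 1)))
}

# Hypothetical helper: tangent through data point i, evaluated at every x
tangent_line <- function(i, x, y, cf) {
  y[i] + poly_slope(cf, x[i]) * (x - x[i])
}

slope_3 <- poly_slope(cf, x_values[3])
tan_3 <- tangent_line(3, x_values, y_actual, cf)
```

As in tangent_xy, the line passes through the observed point rather than through the fitted curve; replacing y[i] with the fitted value would anchor it on the curve instead.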

zrfyljdw #1

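To build the equation from the coefficients, use sprintf to write out the power series. Once it is parsed into a function, we can plot it with curve on top of the points.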

    > cf <- lm(cl2[, 3] ~ poly(cl2[, 2], 3, raw=TRUE)) |> coef()
    > (eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse='+'))
    [1] "-2007.84158109665*x^0+-24.0105660224912*x^1+-0.0949876187837792*x^2+-0.000111799887426291*x^3"
    > f <- eval(parse(text=paste('\\(x)', eq)))
    > curve(f, min(cl2[, 2]), max(cl2[, 2]), col=2, panel.first=points(cl2[, -1]))

Data:

    cl2 <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, -205, -204, -203, -202, -201, -200, -199, -198, -197, -196,
    -195, -194, -193, -192, -191, -190, -189, -188, -187, -114.3681,
    -113.588, -112.7337, -112.0062, -111.4312, -110.958, -110.3328,
    -109.7625, -109.3552, -108.975, -108.6918, -108.4619, -108.2751,
    -108.1777, -108.0461, -108.0728, -108.0895, -108.1504, -108.4569
    ), dim = c(19L, 3L), dimnames = list(NULL, c("", "x", "y")))
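
If you would rather not build and parse a string, the same power series can be evaluated numerically. This is just a sketch of that alternative, assuming the cl2 matrix from this answer; poly_eval is a hypothetical helper name. It also checks that the hand evaluation agrees with the fitted values from lm.

```r
# Data from this answer: columns are an id, x and y
cl2 <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, -205, -204, -203, -202, -201, -200, -199, -198, -197, -196,
-195, -194, -193, -192, -191, -190, -189, -188, -187, -114.3681,
-113.588, -112.7337, -112.0062, -111.4312, -110.958, -110.3328,
-109.7625, -109.3552, -108.975, -108.6918, -108.4619, -108.2751,
-108.1777, -108.0461, -108.0728, -108.0895, -108.1504, -108.4569
), dim = c(19L, 3L), dimnames = list(NULL, c("", "x", "y")))

x <- cl2[, 2]
y <- cl2[, 3]
fit <- lm(y ~ poly(x, 3, raw = TRUE))
cf <- unname(coef(fit))

# Hypothetical helper: evaluates cf[1] + cf[2]*x + cf[3]*x^2 + cf[4]*x^3
poly_eval <- function(cf, x) {
  powers <- outer(x, seq_along(cf) - 1L, "^")   # columns x^0, x^1, x^2, x^3
  drop(powers %*% cf)
}

# Largest gap between the hand evaluation and lm's own fitted values
max_diff <- max(abs(poly_eval(cf, x) - fitted(fit)))
```

With raw = TRUE and full-precision coefficients the two should agree up to floating-point noise; the function can be passed to curve() as function(x) poly_eval(cf, x).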

zbdgwd5y #2

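Here is an example in code. Since the x values were not provided, I approximated them. My comment above was: "use raw=TRUE for non-orthogonal polynomials".
The blue line shows the values from the predict function, and the red line is computed from the fitted coefficients. Note that the green line uses the same coefficients rounded to about 5 significant digits and produces a slightly different curve.
This is the problem with higher-order equations: a tiny difference in a coefficient gets multiplied into a large difference in the result. In effect, a chaotic system.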

    y_actual <-
      c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777,
        -108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625,
        -110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588,
        -114.3681)
    # approximating the x values
    x <- rev(seq(-204.5, -197, length.out=19))
    fit <- lm(y_actual ~ poly(x, 3, raw=TRUE))
    summary(fit)
    # Call:
    # lm(formula = y_actual ~ poly(x, 3, raw = TRUE))
    #
    # Residuals:
    #       Min        1Q    Median        3Q       Max
    # -0.123971 -0.030200 -0.000881  0.033129  0.075555
    #
    # Coefficients:
    #                           Estimate Std. Error t value Pr(>|t|)
    # (Intercept)             -1.924e+04  1.086e+04  -1.771   0.0968 .
    # poly(x, 3, raw = TRUE)1 -2.537e+02  1.623e+02  -1.563   0.1390
    # poly(x, 3, raw = TRUE)2 -1.099e+00  8.087e-01  -1.359   0.1942
    # poly(x, 3, raw = TRUE)3 -1.546e-03  1.343e-03  -1.151   0.2678
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    # Residual standard error: 0.05382 on 15 degrees of freedom
    # Multiple R-squared:  0.9994, Adjusted R-squared:  0.9993
    # F-statistic:  8647 on 3 and 15 DF,  p-value: < 2.2e-16
    y_predicted <- predict(fit, data.frame(x))
    # plotting the prediction
    plot(x, y_actual)
    lines(x, y_predicted, col="blue", lwd=3)
    # plotting the prediction computed from the full-precision coefficients
    # (shifted by 0.001 so the red line stays visible next to the blue one)
    y_pred_2 <- fit$coefficients[4]*x^3 + fit$coefficients[3]*x^2 + fit$coefficients[2]*x + fit$coefficients[1]
    lines(x, y_pred_2+0.001, col="red")
    # same polynomial with rounded coefficients
    y_pred_3 <- -1.5455E-3*x^3 - 1.09927*x^2 - 253.68*x - 19238
    lines(x, y_pred_3, col="green")
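
To make the raw=TRUE point concrete, here is a small sketch comparing the two parameterisations. It reuses the approximated x from this answer; the variable names (fit_orth, fit_raw, by_hand_orth, wrong) are my own. Both fits describe the same curve, but the default poly() coefficients multiply orthogonal basis columns, not powers of x, so plugging them into a power series goes badly wrong.

```r
y_actual <- c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777,
              -108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625,
              -110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588,
              -114.3681)
x <- rev(seq(-204.5, -197, length.out = 19))   # approximated x, as in this answer

fit_orth <- lm(y_actual ~ poly(x, 3))               # default: orthogonal basis
fit_raw  <- lm(y_actual ~ poly(x, 3, raw = TRUE))   # plain powers x, x^2, x^3

# Both parameterisations give the same fitted curve
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw), tolerance = 1e-6))

# Orthogonal coefficients reproduce the fit only against the poly() basis columns
basis <- cbind(1, poly(x, 3))
by_hand_orth <- drop(basis %*% coef(fit_orth))

# Treating them as power-series coefficients is the mistake from the question
cf <- unname(coef(fit_orth))
wrong <- cf[1] + cf[2] * x + cf[3] * x^2 + cf[4] * x^3
```

Comparing range(wrong) with range(y_actual) shows the same kind of blow-up the question reports, while by_hand_orth matches fitted(fit_orth).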


uwopmtnx #3

If you don't have a science-based model and have to work empirically, I suggest using a GAM.

    DF <- data.frame(x = x_values, y = y_actual)
    library(ggplot2)
    p <- ggplot(DF, aes(x, y)) +
      geom_point()
    library(mgcv)
    fit <- gam(y ~ s(x), data = DF)
    summary(fit)
    plot(fit)
    p + stat_function(fun = \(x) predict(fit, newdata = data.frame(x)), n = 1e3)

    GAMderivative <- function(fit, x0, eps = 1e-7) {
      # finite difference method
      Y0 <- predict(fit, data.frame(x = x0))
      Y1 <- predict(fit, data.frame(x = x0 + eps))
      dydx <- (Y1-Y0)/eps
      setNames(dydx, x0)
    }
    GAMderivative(fit, x0 = c(-203, -201, -199))
    #     -203      -201      -199
    #1.7973066 0.3248605 0.1396629
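
GAMderivative uses a one-sided (forward) difference. A central difference is a common alternative with a smaller truncation error for the same step size. The sketch below is only an illustration of that variant: central_diff is a hypothetical helper, and g is a made-up stand-in curve with a known derivative so that the example runs without mgcv.

```r
# Generic central-difference derivative; f is any vectorised function of x.
# For the GAM above one could pass, as an assumption,
#   f = function(x) as.numeric(predict(fit, newdata = data.frame(x = x)))
central_diff <- function(f, x0, h = 1e-4) {
  (f(x0 + h) - f(x0 - h)) / (2 * h)
}

# Stand-in curve with a known derivative, used only to check the helper
g  <- function(x) -0.05 * (x + 198)^2 - 108
dg <- function(x) -0.1 * (x + 198)

x0 <- c(-203, -201, -199)
approx_slope <- setNames(central_diff(g, x0), x0)
exact_slope  <- dg(x0)
max_err <- max(abs(approx_slope - exact_slope))
```

The step h trades truncation error against rounding error; with y values around -110, a step that is too small lets rounding dominate.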