R - predict.lm() in a loop. Warning: prediction from a rank-deficient fit may be misleading
    # R code
    # Fit a regression model for each cluster
    y <- list()
    length(y) <- k
    vars <- list()
    length(vars) <- k
    f <- list()
    length(f) <- k
    for (i in 1:k) {
      # predictor names selected for cluster i
      vars[[i]] <- names(corc[[i]][corc[[i]] != "1"])
      f[[i]] <- as.formula(paste("death ~", paste(vars[[i]], collapse = "+")))
      y[[i]] <- lm(f[[i]], data = c1[[i]])                 # training set
      c1[[i]] <- cbind(c1[[i]], fitted(y[[i]]))
      c2[[i]] <- cbind(c2[[i]], predict(y[[i]], c2[[i]]))  # test set
    }
Hello,
I have a training data set (c1) and a test data set (c2). Each one has 129 variables. I did a k-means cluster analysis on c1, split the data set based on cluster membership, and created a list of the different clusters (c1[[1]], c1[[2]], ..., c1[[k]]). I then assigned a cluster membership to each case in c2 and created c2[[1]], ..., c2[[k]]. Next I fit a linear regression to each cluster in c1. The dependent variable is "death". The predictors are different in each cluster, and vars[[i]] (i = 1, ..., k) holds the list of predictor names. I want to predict death for each case in the test data set (c2[[1]], ..., c2[[k]]). When I run the code above, for some of the clusters I get this warning: In predict.lm(y[[i]], c2[[i]]) : prediction from a rank-deficient fit may be misleading
I have read a lot about this warning but couldn't figure out what the issue is. I would appreciate it if you could help me with this.
Thanks, Mahsa
You can inspect the predict function with body(predict.lm). There you will see this line:

    if (p < ncol(X) && !(missing(newdata) || is.null(newdata))) warning("prediction from a rank-deficient fit may be misleading")

This warning checks whether the rank of your data matrix is at least equal to the number of parameters you want to fit. One way to trigger it is to have collinear covariates:
    data <- data.frame(y=c(1,2,3,4), x1=c(1,1,2,3), x2=c(3,4,5,2), x3=c(4,2,6,0), x4=c(2,1,3,0))
    data2 <- data.frame(x1=c(3,2,1,3), x2=c(3,2,1,4), x3=c(3,4,5,1), x4=c(0,0,2,3))
    fit <- lm(y ~ ., data=data)
    predict(fit, data2)
           1        2        3        4
    4.076087 2.826087 1.576087 4.065217
    Warning message:
    In predict.lm(fit, data2) :
      prediction from a rank-deficient fit may be misleading
Notice that x3 and x4 have the same direction in data: one is a multiple of the other (x3 = 2 * x4). This can be checked with length(fit$coefficients) > fit$rank, which is TRUE for a rank-deficient fit.
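To see which term is responsible, rather than only that the fit is rank-deficient, one option is to look for coefficients that lm() reports as NA. The following is a minimal sketch that rebuilds the same toy data; the variable names X and dropped are only illustrative.

```r
# Same toy data as in the example above
data <- data.frame(y  = c(1, 2, 3, 4),
                   x1 = c(1, 1, 2, 3),
                   x2 = c(3, 4, 5, 2),
                   x3 = c(4, 2, 6, 0),
                   x4 = c(2, 1, 3, 0))
fit <- lm(y ~ ., data = data)

# Columns in the design matrix versus the rank lm() actually achieved
X <- model.matrix(fit)
ncol(X)    # number of coefficients requested (intercept + 4 predictors)
fit$rank   # number of coefficients that could be estimated

# Coefficients that lm() could not estimate are reported as NA
dropped <- names(which(is.na(coef(fit))))
dropped
```

Here x4 should be the dropped term, since it comes after x3 in the formula and adds no new information. alias(fit) from the stats package reports the same dependency in more detail.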
Another way is to have more parameters than available observations:
    fit2 <- lm(y ~ x1*x2*x3*x4, data=data)
    predict(fit2, data2)
    Warning message:
    In predict.lm(fit2, data2) :
      prediction from a rank-deficient fit may be misleading
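For the per-cluster loop in the question, the same check can be run over the whole list of fitted models to find out which clusters are affected and which predictors were dropped in each. This is only a sketch under assumptions: the clusters list, the make_cluster() generator, the columns a, b and z, and the dropped_terms() helper are all made up for illustration and stand in for c1[[i]], vars[[i]] and y[[i]].

```r
# Hypothetical helper: names of the coefficients lm() could not estimate
dropped_terms <- function(fit) names(which(is.na(coef(fit))))

# Made-up stand-in for the per-cluster training sets c1[[1]], ..., c1[[k]]
set.seed(1)
make_cluster <- function(n) {
  d <- data.frame(a = rnorm(n), b = rnorm(n))
  d$death <- d$a + rnorm(n)
  d
}
clusters <- list(make_cluster(20), make_cluster(20))
clusters[[1]]$z <- rnorm(20)             # independent predictor
clusters[[2]]$z <- 2 * clusters[[2]]$a   # exact multiple of a, so collinear

# One model per cluster, as in the loop from the question
fits <- lapply(clusters, function(d) lm(death ~ a + b + z, data = d))

# One entry per cluster; character(0) means the fit has full rank
lapply(fits, dropped_terms)

# Indices of the clusters whose fit is rank-deficient
deficient <- which(vapply(fits,
                          function(f) length(coef(f)) > f$rank,
                          logical(1)))
deficient
```

Once the affected clusters are known, the dropped predictors can be removed from the corresponding formula (or the cluster can be given more observations) before predicting on the test set.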