
A Study on Cross Validation for Model Selection and Estimation


Affiliations
1 Division of Agricultural Statistics (SKUAST-K), Shalimar (J&K), India
2 Division of Statistics and Computer Science, (SKUAST-J), Main Campus, Chatha (Jammu), India
     



In the present study, the k-fold cross-validation method was examined for evaluating the performance of different regression models. A multistage sampling technique was adopted in which districts, villages within districts and fodder trees within the selected villages formed the first-, second- and third-stage units, respectively. Ten trees were randomly selected from each village to make up a predetermined total sample of 60 trees. Primary data on height, bole height, diameter at breast height (dbh), number of primary branches, number of secondary branches, average number of leaves per secondary branch, age, canopy diameter and green fodder yield (the dependent variable) were collected for each selected tree by visiting farmers' fields in the selected area and following standard forest mensuration procedures. Regression analysis was used to study the relationship between fodder yield and the other parameters. Different regression models were tried and the best five were selected on the basis of adjusted R². Goodness of fit of the selected models was tested with the chi-square test. The test results were non-significant, indicating that the models fitted the data adequately and could be used for further study. The models were then validated for adequacy using several criteria, namely adjusted R², bias, variance, root mean square error and coefficient of dispersion, and were ranked on the basis of these criteria. After applying the Wilcoxon signed rank test to the fitting data set, final ranks were obtained by considering the ranks from both the fitting (Rf) and validating (Rv) data sets. Finally, on the basis of all the criteria adopted in the present investigation, the regression model Ŷ = 8.480 + 0.000004 L²S ranked first, where Ŷ is the estimated fodder yield, L is the average number of leaves per secondary branch and S denotes secondary branches; it is therefore recommended for fodder yield prediction of Grewia optiva in the present study area.
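
The k-fold procedure and two of the validation criteria named above (bias and root mean square error) can be illustrated with a minimal sketch. It is not the authors' code. The data are synthetic stand-ins generated around the reported equation, the ranges chosen for L and S are invented for illustration, S is assumed to be the number of secondary branches, bias is assumed to mean the average of observed minus predicted yield, and the helper names `fit_line` and `k_fold_cv` are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_cv(x, y, k=5, seed=0):
    """Return (bias, rmse) of held-out prediction errors pooled over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the k-1 training folds, predict the held-out fold.
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        errors.extend(y[i] - (a + b * x[i]) for i in held_out)
    bias = mean(errors)                        # assumed: mean(observed - predicted)
    rmse = mean(e * e for e in errors) ** 0.5  # root mean square error
    return bias, rmse


# Synthetic data only: 60 "trees" with invented ranges for L and S,
# generated around the reported model form with added Gaussian noise.
rng = random.Random(42)
L = [rng.uniform(20, 60) for _ in range(60)]    # avg. leaves per secondary branch
S = [rng.uniform(50, 300) for _ in range(60)]   # assumed: no. of secondary branches
x = [l * l * s for l, s in zip(L, S)]           # single predictor L^2 * S
y = [8.480 + 0.000004 * xi + rng.gauss(0, 0.5) for xi in x]

bias, rmse = k_fold_cv(x, y, k=5)
print(f"bias={bias:.3f} rmse={rmse:.3f}")
```

Pooling the held-out errors across folds before computing the criteria means every tree contributes exactly one out-of-sample prediction; averaging per-fold criteria instead would be an equally reasonable design.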

Keywords

Cross Validation, Regression Analysis, Goodness of Fit, Grewia optiva.




Authors

Fehim Jeelani Wani
Division of Agricultural Statistics (SKUAST-K), Shalimar (J&K), India
S. E. H. Rizvi
Division of Statistics and Computer Science, (SKUAST-J), Main Campus, Chatha (Jammu), India
Manish Kumar Sharma
Division of Statistics and Computer Science, (SKUAST-J), Main Campus, Chatha (Jammu), India
M. Iqbal Jeelani Bhat
Division of Statistics and Computer Science, (SKUAST-J), Main Campus, Chatha (Jammu), India

