Interpretable machine learning methods to explain on-farm yield variability of high productivity wheat in Northwest India

Nayak, H S and Silva, J S and Parihar, C M and Krupnik, T J and Sena, D R and Kakraliya, S K and Jat, H S and Sidhu, H S and Sharma, P C and Jat, M L and Sapkota, T B (2022) Interpretable machine learning methods to explain on-farm yield variability of high productivity wheat in Northwest India. Field Crops Research (TSI), 287. pp. 1-14. ISSN 0378-4290

Full text not available from this repository.

Abstract

The increasing availability of complex, geo-referenced on-farm data demands analytical frameworks that can guide crop management recommendations. Recent developments in interpretable machine learning techniques offer opportunities to use these methods in agronomic studies. Our objectives were two-fold: (1) to assess the performance of different machine learning methods to explain on-farm wheat yield variability in the Northwestern Indo-Gangetic Plains of India, and (2) to identify the most important drivers and interactions explaining wheat yield variability. A suite of fine-tuned machine learning models (ridge and lasso regression, classification and regression trees, k-nearest neighbor, support vector machines, gradient boosting, extreme gradient boosting, and random forest) was statistically compared using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The best performing model was again fine-tuned using a grid search approach for the bias-variance trade-off. Three post-hoc model-agnostic techniques were used to interpret the best performing model: variable importance (a variable was considered “important” if shuffling its values increased or decreased the model error considerably), interaction strength (based on Friedman’s H-statistic), and two-way interaction (i.e., how much of the total variability in wheat yield was explained by a particular two-way interaction). Model outputs were compared against empirical data to contextualize results and provide a blueprint for future analysis in other production systems. Tree-based and decision boundary-based methods outperformed regression-based methods in explaining wheat yield variability. Random forest was the best performing method in terms of goodness-of-fit and model precision and accuracy, with RMSE, MAE, and R² ranging over 367–470 kg ha⁻¹, 276–345 kg ha⁻¹, and 0.44–0.63, respectively. Random forest was then used for selection of important variables and interactions.
The most important management variables explaining wheat yield variability were nitrogen application rate and crop residue management, whereas the average of monthly cumulative solar radiation during February and March (coinciding with the reproductive phase of wheat) was the most important biophysical variable. The effect size of these variables on wheat yield ranged from 227 kg ha⁻¹ for nitrogen application rate to 372 kg ha⁻¹ for cumulative solar radiation during February and March. Important interactions affecting wheat yield were also detected in the data, namely between crop residue management and disease management, and between nitrogen application rate and seeding rate. For instance, farmers’ fields with moderate disease incidence yielded 750 kg ha⁻¹ less when crop residues were removed than when crop residues were retained. Similarly, wheat yield response to residue retention was higher under low seeding and N application rates. As an inductive research approach, interpretable machine learning methods, when applied appropriately, can extract agronomically actionable information from large-scale farmer field data.
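
The abstract describes variable importance as the change in model error after a variable's values are shuffled, with models compared on RMSE, MAE, and R². The sketch below illustrates that idea in plain Python. It is not the authors' code: the function names, the stand-in `toy_model`, the three synthetic columns, and all numeric settings are assumptions made purely for illustration, and the data are randomly generated rather than taken from the study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Return (baseline RMSE, mean change in RMSE per shuffled column).

    `predict` is any callable mapping one row (a list) to a prediction,
    so the procedure is model-agnostic.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        changes = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            changes.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(changes) / n_repeats)
    return baseline, importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model; ignores the third column."""
    n_rate, radiation, _unused = row
    return 3000.0 + 10.0 * n_rate + 2.0 * radiation


# Synthetic data (assumption): columns loosely named after variables in the
# abstract, with values and units invented for the example.
data_rng = random.Random(42)
X = [
    [data_rng.uniform(100, 200), data_rng.uniform(300, 500), data_rng.uniform(0, 1)]
    for _ in range(200)
]
y = [toy_model(row) + data_rng.gauss(0, 50) for row in X]

baseline_rmse, importances = permutation_importance(toy_model, X, y)
predictions = [toy_model(row) for row in X]
print("baseline RMSE:", round(baseline_rmse, 1))
print("MAE:", round(mae(y, predictions), 1), "R2:", round(r2(y, predictions), 3))
for name, value in zip(["n_rate", "radiation", "unused"], importances):
    print(name, round(value, 1))
```

In this toy setup the column the model ignores receives an importance of zero, because shuffling it leaves every prediction unchanged, while the column with the largest influence on predictions produces the largest increase in error. A real analysis would replace `toy_model` with a fitted model and evaluate on held-out data.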

Item Type:	Article
Divisions:	Global Research Program - Resilient Farm and Food Systems
CRP:	UNSPECIFIED
Uncontrolled Keywords:	Random forest, Variable importance, Interaction strength, Accumulated local effect plot, Partial dependency plot, Quantile regression, Big data
Subjects:	Others > Data & Analytics; Others > Wheat; Others > India
Depositing User:	Mr Nagaraju T
Date Deposited:	25 Oct 2023 04:15
Last Modified:	25 Oct 2023 04:15
URI:	http://oar.icrisat.org/id/eprint/12238
Official URL:	https://www.sciencedirect.com/science/article/pii/...
Projects:	UNSPECIFIED
Funders:	UNSPECIFIED
Acknowledgement:	HSN sincerely acknowledges the Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute (IARI) and the International Maize and Wheat Improvement Center (CIMMYT) for support of PhD research work. The research was carried out by CIMMYT, ICAR-Central Soil Salinity Research Institute (CSSRI), ICAR-IARI, Borlaug Institute for South Asia-CIMMYT with the support of CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS; https://ccafs.cgiar.org). CCAFS’ work was funded through CGIAR Fund Donors and through bilateral funding agreements (https://ccafs.cgiar.org/donors). Portions of this work were also supported by the Cereal Systems Initiative for South Asia (https://csisa.org) funded by the USAID and Bill and Melinda Gates Foundation (BMGF). We acknowledge Dr Zia Ahmed, Dr Raj Singh, Dr V K Singh, Dr Rajkumar Dhakar, Dr S L Jat, Dr D K Sharma, Dr B N Mandal, Gokul Paudel, Asif Faisal, Khaled Hossain, Saral Karki, Sanjay Pothireddy, and Noufa Cheerakkollil Konath for technical assistance and Manish Kumar, Deepak Bejarniya, Dr Kajod Mal Choudhary, Yogesh Kumar, Love Singh, Sushil Kumar, and Kailash Kalvania for help during data collection. This manuscript should not be taken as endorsement by CCAFS, USAID or the US government, or BMGF, and shall not be used for advertising purposes.
