Machine Learning for Plant Genomic Prediction

Charaya, N and Kalia, Sonika and Rautela, I and Yadav, P and Bhardwaj, P and Nautiyal, R and Kalia, M and Sharma, V (2025) Machine Learning for Plant Genomic Prediction. In: Machine Learning for Plant Biology. John Wiley & Sons, Inc., pp. 195-208. ISBN 9781394329618

Full text not available from this repository.

Abstract

Plant research and breeding have led to significant yield increases, supporting shifts in human lifestyles. However, climate change, associated with rising temperatures and industrial output, may significantly alter the agricultural landscape. Breeding new cultivars takes time, and it will take decades to close the cereal production gap. New breeding methods are therefore needed to address these challenges. Genomic prediction (GP) is a technique that uses genotypic data generated by high-throughput genotyping technologies, such as genotyping-by-sequencing, to predict phenotypic values. Single-nucleotide polymorphisms (SNPs) are commonly used to record the observed genotype relative to a reference genome. In addition to advancing our understanding of the basic genetic architecture of genomes, GP may help enhance crop genetic performance. GP is already having a major impact on plant breeding, as it can predict complex traits such as yield or disease and pest resistance directly from genotypes. The standard strategy in GP modeling is to apply and compare different computational methods, notably statistical, machine learning, and deep learning methods, to find the best solution for single- or multi-trait problems. The most popular methods in GP are genomic best linear unbiased prediction (GBLUP) and Bayesian approaches. Machine learning and deep learning techniques are now emerging as strong alternatives to these methods in terms of accuracy, computing time, and cost. Two widely used machine learning methods are gradient boosting (GB) and random forests (RF), while in deep learning, multilayer perceptrons (MLP) and convolutional neural networks are commonly employed. Data drive decision-making in plant breeding, and machine learning provides powerful tools for extracting valuable information from those data. The two primary subfields of machine learning are supervised and unsupervised learning.
In plant breeding, supervised learning is used for genomic prediction, modeling phenotypic traits as a function of molecular markers. The main supervised learning algorithms for genomic prediction are linear methods, kernel methods, tree ensembles, and neural networks. This work explains how these algorithms are implemented and how methods can be compared using cross-validation. We provide an overview for researchers wishing to use machine learning in their work by summarizing its benefits and drawbacks in the context of genomic prediction.
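
As a rough illustration of comparing methods by cross-validation, the following Python sketch simulates SNP genotypes and scores two predictors with 5-fold cross-validation. Nothing in it is taken from the chapter: the simulated data, the 0/1/2 marker coding, the penalty `lam`, and all function names are assumptions made for illustration. Ridge regression in its kernel (dual) form stands in for the linear family, which is closely related to GBLUP, and a training-mean predictor serves as a baseline.

```python
import random
import statistics


def simulate(n_lines=80, n_snps=100, n_causal=10, noise_sd=1.0, seed=1):
    """Simulate SNP genotypes coded 0/1/2 and a purely additive trait (toy data)."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_lines)]
    effects = [rng.gauss(0, 1) if j < n_causal else 0.0 for j in range(n_snps)]
    y = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_predict(X_tr, y_tr, X_te, lam=10.0):
    """Kernel ridge regression on markers centred with training-set means."""
    col_means = [statistics.fmean(col) for col in zip(*X_tr)]

    def centre(rows):
        return [[g - m for g, m in zip(row, col_means)] for row in rows]

    Z_tr, Z_te = centre(X_tr), centre(X_te)
    mu = statistics.fmean(y_tr)
    # Linear kernel between training lines, with the ridge penalty on the diagonal.
    K = [[dot(a, b) for b in Z_tr] for a in Z_tr]
    for i in range(len(K)):
        K[i][i] += lam
    alpha = solve(K, [v - mu for v in y_tr])
    return [mu + sum(dot(z, zt) * a for zt, a in zip(Z_tr, alpha)) for z in Z_te]


def mean_predict(X_tr, y_tr, X_te):
    """Baseline that ignores markers and predicts the training mean."""
    mu = statistics.fmean(y_tr)
    return [mu] * len(X_te)


def cross_validate(predict, X, y, k=5, seed=0):
    """Return the mean squared prediction error averaged over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred = predict([X[i] for i in train], [y[i] for i in train],
                       [X[i] for i in test])
        scores.append(statistics.fmean(
            (y[i] - yh) ** 2 for i, yh in zip(test, pred)))
    return statistics.fmean(scores)


X, y = simulate()
ridge_mse = cross_validate(ridge_predict, X, y)
baseline_mse = cross_validate(mean_predict, X, y)
print(f"ridge MSE: {ridge_mse:.3f}  baseline MSE: {baseline_mse:.3f}")
```

Both predictors are scored on the same folds, so the comparison is not confounded by the data split. In practice, predictive ability in genomic prediction is often also reported as the correlation between predicted and observed values, and tree ensembles or neural networks would be passed to the same `cross_validate` function as additional `predict` callables.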

Item Type:	Book Section
Divisions:	Global Research Program - Accelerated Crop Improvement
CRP:	UNSPECIFIED
Uncontrolled Keywords:	Genomic prediction (GP), high-throughput genotyping technologies, Single-nucleotide polymorphisms (SNPs), machine learning, plants
Subjects:	Others > Genetics and Genomics
Depositing User:	Mr Nagaraju T
Date Deposited:	19 May 2026 05:58
Last Modified:	19 May 2026 05:58
URI:	http://oar.icrisat.org/id/eprint/13630
Acknowledgement:	UNSPECIFIED
