Genetic Diversity Analysis in Tropical Maize Germplasm for Stem Borer and Storage Pest Resistance Using Molecular Markers and Phenotypic Traits

One hundred maize inbred lines and eighty-four hybrids were characterized for resistance to the maize stem borer and post-harvest insect pests, using genetic distance and population structure estimated from simple sequence repeat (SSR) markers together with biophysical traits. The test materials were evaluated for resistance to stem borer, maize weevil and larger grain borer (LGB). Leaf samples were harvested from 10 healthy plants per genotype and bulked. Genomic DNA was extracted using a modified mini-prep cetyl trimethyl ammonium bromide (CTAB) method, and the samples were genotyped with 55 SSR markers. Univariate analysis of variance was performed using the general linear model procedure of the SAS statistical package. Rogers' genetic distance was calculated for all data sets using NTSYS-pc for Windows, and the distance matrices were used to generate phenograms with the unweighted pair group method with arithmetic mean (UPGMA) in MEGA5. Genotypes were assigned to populations using the population structure software STRUCTURE, and the data were further subjected to discriminant and principal component analyses to group the genotypes. Analysis of molecular variance within and among populations was done using ARLEQUIN. There were significant differences (P ≤ 0.001) for all the biophysical traits evaluated. The SSR data successfully captured the close relationships among hybrids and inbred lines within clusters, and comparisons of the different multivariate analyses revealed high concordance among them. The results of this study can be used directly by breeding programs to develop resistant genotypes.


Introduction
Maize is a staple food for more than 300 million people in sub-Saharan Africa (SSA) and is commonly grown by small-scale, resource-poor farmers in rural areas (Shiferaw et al., 2011). However, the average maize yield in SSA was estimated at 1.4 t/ha, which is extremely low compared with 3.3 t/ha in developing countries in other parts of the world, 4.9 t/ha worldwide and 8.4 t/ha in industrialized countries. Several factors contribute to low productivity in SSA, including a wide range of pests and diseases, periodic drought, scarcity of irrigation water, low soil fertility and farmers' inability to use farm inputs. Insect pests in the field and in storage are among the factors that reduce yields and food availability in the region. Maize stem borers cause losses of up to 15% in susceptible germplasm in infested ecologies, while storage pests such as the maize weevil and the larger grain borer (LGB) cause 20-30% yield loss (http://www.syngentafoundation.org). Although several methods can help minimize yield loss caused by insect pests (e.g. chemical, biological and cultural control), host plant resistance developed through breeding and disseminated through improved maize varieties is a preferred approach because of its environmental and human safety, relatively low cost and ease of use by farmers. However, there has been very little effort in breeding for insect pest resistance in SSA, possibly because of the genetic and logistical challenges of screening and selecting for insect resistance. Nevertheless, CIMMYT and partners have developed various multiple borer resistance (MBR) lines and populations using conventional breeding under artificial infestation, and some of the MBR germplasm has been released and disseminated in several countries.
Assessment of genetic relationships and population structure is an important tool that underpins successful breeding programs (Mohammadi and Prasanna, 2003; Mukhtar et al., 2002). Genetic distance is a measure of genetic divergence between species or between populations within a species: smaller genetic distances indicate a close genetic relationship, whereas larger distances indicate a more distant one. In a breeding program, the genetic gain achieved through artificial selection is proportional to the extent of genetic differences present in the parental lines or populations, so the choice of parents can influence the outcome of selection (Bohn et al., 1999). Depending on the objectives of a breeding program, breeders use different methods to select the best parental combinations, including (a) pedigree relationships, (b) morphological and agronomic traits, (c) adaptability and yield stability, and (d) genetic distances estimated from morphological and molecular markers (Bohn et al., 1999; Maric et al., 2004; Bertan et al., 2007). Morphological and agronomic traits were the earliest genetic markers used to characterize germplasm and quantify genetic distance in crops, but they have a number of limitations, including low polymorphism, low heritability, late expression during development and strong environmental influence (Smith and Smith 1989).
In contrast, molecular markers are more effective than morphological and agronomic traits for germplasm characterization. Genetic distance and population structure can be estimated from various types of molecular markers, including restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs), random amplified polymorphic DNA (RAPD), microsatellites or simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). SSR markers are widely used by maize researchers because they are available in large numbers in the public domain (MaizeGDB: http://www.maizegdb.org), are co-dominant, multiallelic and highly polymorphic even among closely related individuals, can be exchanged between laboratories, and are uniformly distributed across the genome. Although marker technology has shifted toward SNPs, particularly for model organisms with substantial genomic resources, SSR markers perform better than SNPs at clustering germplasm into populations and provide more resolution in measuring genetic distance (Hamblin et al., 2007).
Genetic variability for resistance to field and postharvest insect pests has been reported using phenotypic data (Tefera, 2012). However, the extent of genetic differences and the patterns of relationship among this germplasm, and how they relate to resistance to stem borer, maize weevil and LGB, have not been well studied. The objective of this study was therefore to assess the extent of genetic differences, relationships and population structure in a subset of tropical maize germplasm bred for resistance to field and storage pests, using SSR markers and biophysical traits.

Phenotypic evaluation
There were significant differences (P ≤ 0.001) among the maize inbred lines and hybrids for all the biophysical and bioassay traits measured in the study. These traits were used to group the maize germplasm into resistant and susceptible classes.

Genetic distance and relationship
Rogers' genetic distance for pairwise comparisons among all 184 genotypes ranged from 0.004 to 0.467, with an overall average of 0.302. The vast majority of pairs (92.4%) fell between 0.200 and 0.400 (Figure 1).
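The distance summary above can be illustrated with a minimal Python sketch of Rogers' distance, in which each locus contributes the square root of half the summed squared allele-frequency differences and the per-locus values are averaged over loci. It assumes each genotype is a list of loci, each locus a dict mapping allele size to frequency (for bulked samples, one simple assumption is an equal share for every allele present at a locus). The study itself computed the matrix in NTSYS-pc from binary allele data, so the function names, data layout and example genotypes here are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def rogers_distance(p, q):
    """Rogers' distance between two genotypes.

    p and q are lists with one entry per SSR locus; each entry is a dict
    mapping allele size -> allele frequency (frequencies at a locus sum to 1).
    """
    per_locus = []
    for locus_p, locus_q in zip(p, q):
        alleles = set(locus_p) | set(locus_q)
        squared = sum((locus_p.get(a, 0.0) - locus_q.get(a, 0.0)) ** 2 for a in alleles)
        per_locus.append(sqrt(0.5 * squared))  # 0 = identical, 1 = no shared alleles
    return mean(per_locus)


def summarize(genotypes, low=0.2, high=0.4):
    """Minimum, maximum, mean and share of pairwise distances in [low, high]."""
    d = [rogers_distance(a, b) for a, b in combinations(genotypes, 2)]
    share = sum(low <= x <= high for x in d) / len(d)
    return min(d), max(d), mean(d), share


# Hypothetical two-locus genotypes (allele sizes in bp).
g1 = [{150: 1.0}, {200: 0.5, 204: 0.5}]
g2 = [{150: 1.0}, {200: 1.0}]
g3 = [{152: 1.0}, {204: 1.0}]
print(summarize([g1, g2, g3]))
```

Applied to the full 184-genotype data set, the same summary would give the range, mean and proportion of pairs between 0.200 and 0.400 reported above.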
The UPGMA tree generated from the Rogers' genetic distance matrix grouped the majority of the genotypes into two major groups, one of inbred lines and the other of hybrids (Figure 2). The first group had three sub-groups (NA, G1 and G2), and the second group also had three sub-groups (G3, G4 and G5). Sub-group 1 (G1) consisted of a total of 68 inbred lines, including … Sub-group 2 (G2) consisted of inbred lines bred for resistance to both stem borer and storage insect pests (9 lines), for stem borer resistance (15 lines) and for yield (2 lines).
In the second group, composed of hybrids, Sub-group 3 (G3) consisted of hybrids bred for storage pest resistance (23 hybrids), stem borer resistance (10 hybrids) and grain yield (5 hybrids).
Sub-group 4 (G4) was composed of 13 commercial hybrids from different seed companies, all of which were susceptible to the storage insect pests; only 4 of the 13 hybrids showed some level of resistance to the stem borer.
Sub-group 5 (G5) consisted of 25 hybrids that were resistant to stem borer and two hybrids resistant to both stem borer and the storage insect pests.
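The UPGMA clustering that produced these sub-groups can be sketched in a few lines of Python: repeatedly merge the closest pair of clusters, and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. This is an illustrative stand-in for the MEGA5 implementation used in the study; the input layout (a dict keyed by sorted label pairs) and the toy distances are hypothetical.

```python
def _key(x, y):
    """Distances are stored under the sorted label pair."""
    return tuple(sorted((x, y)))


def upgma(dist):
    """UPGMA clustering.

    dist maps a sorted pair (a, b) of labels to their distance; every pair
    must be present. Returns the tree as nested tuples and the list of
    merges as (left, right, node_height).
    """
    size = {}
    for a, b in dist:
        size[a] = size[b] = 1
    tree = {label: label for label in size}
    d = dict(dist)
    merges = []
    while len(size) > 1:
        # Closest pair; ties are broken by label order so the result is deterministic.
        (a, b), dab = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        new = f"({a},{b})"
        merges.append((tree[a], tree[b], dab / 2))  # ultrametric node height
        na, nb = size[a], size[b]
        for c in list(size):
            if c not in (a, b):
                # Size-weighted average of the distances from the two merged clusters.
                d[_key(new, c)] = (na * d[_key(a, c)] + nb * d[_key(b, c)]) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        tree[new] = (tree[a], tree[b])
        size[new] = na + nb
        del size[a], size[b]
    (root,) = size
    return tree[root], merges


# Hypothetical distances among four genotypes.
toy = {("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
       ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
print(upgma(toy))
```

The quadratic scan for the closest pair is fine for a few hundred genotypes; the averaging rule is what distinguishes UPGMA from single- or complete-linkage clustering.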
The first five principal components explained 25.7% of the total SSR variation among samples. A plot of PC1 (8.8%) against PC2 (7.4%) revealed three major groups (Figure 3), and the grouping pattern was the same as that of the model-based population partition at K = 3.
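The percentages of variation attached to PC1 and PC2 are each component's eigenvalue divided by the total variance of the allele matrix. A minimal pure-Python sketch, assuming a genotype × allele matrix of 0/1 scores, finds the leading components by power iteration with deflation. The study used JMP 7.0; this method choice, the function name and the toy matrix are illustrative assumptions, and the approach is only practical for small matrices.

```python
import random


def pca_power(data, n_components=2, iters=1000, seed=0):
    """Leading principal components by power iteration on the covariance matrix.

    Returns the PC scores for each row and the fraction of total variance
    explained by each component.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in data]  # centre each allele
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)] for a in range(p)]
    total = sum(C[j][j] for j in range(p))
    rng = random.Random(seed)
    components, explained = [], []
    for _ in range(n_components):
        v = [rng.random() for _ in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        explained.append(lam / total)
        # Deflate so the next pass finds the next-largest component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in components] for x in X]
    return scores, explained


# Hypothetical 0/1 allele profiles for six genotypes in two groups.
alleles = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
           [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
scores, explained = pca_power(alleles)
print(explained)
```

Plotting the first two columns of `scores` against each other gives the kind of PC1-PC2 scatter described above.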

The population structure based on SSRs
The estimated log probability of the data, LnP(D), increased sharply between K = 1 and K = 4 (Figure 4b) and was fairly stable between K = 5 and K = 6 (Figure 4a). The ad hoc statistic ΔK peaked at K = 3 (Figure 4b), with a sharp decrease as K increased from 3 to 6 (Figure 4a). Therefore, both LnP(D) and ΔK suggest the presence of three groups.
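The ΔK statistic used above can be computed from the replicate LnP(D) values of each STRUCTURE run. A minimal sketch, using the usual form ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| / sd L(K), assumes consecutive K values with at least two replicate runs each (the study used three). The LnP(D) numbers below are made up for illustration and are not the study's values.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Delta K for each interior K.

    lnp maps K -> list of LnP(D) values from replicate STRUCTURE runs;
    the K values must be consecutive integers.
    """
    ks = sorted(lnp)
    m = {k: mean(lnp[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:  # undefined for the smallest and largest K
        sd = stdev(lnp[k])
        if sd == 0:
            continue
        delta[k] = abs(m[k + 1] - 2 * m[k] + m[k - 1]) / sd
    return delta


# Hypothetical LnP(D) values, three runs per K.
runs = {
    1: [-10000, -10001, -9999],
    2: [-9000, -9010, -8990],
    3: [-8000, -8005, -7995],
    4: [-7950, -7960, -7940],
    5: [-7930, -7950, -7910],
    6: [-7920, -7950, -7890],
}
dk = evanno_delta_k(runs)
print(dk, max(dk, key=dk.get))
```

Because ΔK rewards a sharp change in the slope of LnP(D), it can favour a K smaller than the point where LnP(D) levels off, which is consistent with the pattern described above.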
Assignment of genotypes to specific groups was independent of germplasm type (inbred versus hybrid) and generally followed pedigree information and reaction to field and storage pests, with some overlap with other traits such as grain yield and drought tolerance. The majority of the genotypes were assigned to group 2, which included 23 hybrids (CKIR series) and 15 inbred lines (CKSB series) bred for stem borer resistance, 18 commercial hybrids and other inbred lines from the CIMMYT breeding programs. Groups 1 and 3 consisted of 41 inbred lines in the CKSP series and 28 hybrids in the CKPH series, respectively, that were bred for storage pest resistance within the CIMMYT breeding program. The mixed group was generally made up of CIMMYT inbred lines bred for yield and drought tolerance.
Each individual is represented by a single vertical line partitioned into K coloured segments, with lengths proportional to the estimated probability of membership in each of the K inferred clusters.

Discriminant analysis
The reliability of the groups obtained through the model-based population structure and cluster analyses was assessed through discriminant analysis, using group membership from each method as the categorical variable. The stepwise discrimination model identified 12 alleles from 11 SSRs as the best explanatory variables for the a priori groups defined at K = 3, and 22 alleles from 21 SSRs for the a priori groups obtained from cluster analysis (Tables 1 and 2 list the SSR alleles chosen by the stepwise discriminant analyses). The Mahalanobis distances from pairwise comparisons of the three groups obtained from STRUCTURE at K = 3 ranged from 4.0 to 37.0 and were all significant, with group 3 being 2 to 11 times more distant from the others.
The Mahalanobis distances between groups obtained using cluster analysis ranged from 9.84 to 83.4, and the commercial hybrids (CHS) were generally more distant from all the other genotypes. The grouping from population structure at K = 3 corresponded to the clustering based on Rogers' genetic distance: population 1 was equivalent to the storage pest resistant lines (SPRL); population 2 comprised the stem borer resistant lines (SBRL) and hybrids (SBRH), which were close to one another (distance 9.84), together with the commercial hybrids (G4 in the dendrogram); population 3 corresponded to the storage pest resistant hybrids (SPRH); and the mixed population comprised other CIMMYT lines bred for yield and drought tolerance. The phenotypic traits used to classify genotypes as resistant or susceptible were not good indicators for discriminating the genotypes, since the canonical correlation coefficients (CAN1) were only 0.13 and 0.26 for the stem borer and storage pest resistance indices, respectively.
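The between-group Mahalanobis distances above measure how far apart two group centroids are relative to the pooled within-group covariance. A minimal sketch computes the squared distance D² = (m1 − m2)ᵀ S⁻¹ (m1 − m2), assuming each group is a list of rows holding the selected allele scores. The study obtained these values from SAS discriminant procedures; the helper names and toy data are hypothetical, and with many 0/1 alleles the pooled matrix can become singular, which is one reason a small stepwise-selected subset of alleles is used.

```python
def _mean(rows):
    p = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]


def pooled_covariance(groups):
    """Pooled within-group covariance matrix (divisor N - number of groups)."""
    p = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    S = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = _mean(g)
        for r in g:
            dev = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    S[a][b] += dev[a] * dev[b]
    dof = n_total - len(groups)
    return [[s / dof for s in row] for row in S]


def _solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mahalanobis_sq(g1, g2, S):
    """Squared Mahalanobis distance between the centroids of g1 and g2."""
    diff = [a - b for a, b in zip(_mean(g1), _mean(g2))]
    x = _solve(S, diff)  # x = S^-1 (m1 - m2) without forming the inverse
    return sum(d * xi for d, xi in zip(diff, x))


# Hypothetical two-variable data for two groups.
grp1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
grp2 = [[4, 0], [6, 0], [4, 2], [6, 2]]
S = pooled_covariance([grp1, grp2])
print(mahalanobis_sq(grp1, grp2, S))
```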
Comparisons of the different multivariate analyses revealed high concordance among the PCA, model-based population partition, clustering based on genetic distance and discriminant analyses in terms of the number of groups and the members of each group. However, there was low concordance between groupings based on the phenotypic indices and the SSR-based population partition in assigning genotypes to groups or populations. Table 3 shows the partitioning of the overall SSR variance into hierarchical levels using AMOVA. When AMOVA was performed using the six possible groupings predicted from UPGMA cluster analysis and population structure, and the two groups based on storage pest resistance, the estimated fixation index (FST) ranged from 6.49% to 27.85%. When groups predefined from the model-based population partition at K = 2, 3, 4, 5 and 6 were used as categorical variables, FST was 15.3%, 23.8%, 25.86%, 26.56% and 27.85%, respectively. For the groups from cluster analysis and those based on storage pest resistance, FST was 24.26% and 6.49%, respectively. A random permutation test indicated that the variance components at all levels were highly significant (p < 0.0001).
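The FST values above come from AMOVA, which splits the total sum of squared deviations into among-group and within-group parts and converts them to variance components. A simplified single-level sketch, assuming a square matrix of squared distances between individuals and one group label per individual, computes ΦST as σ²among / (σ²among + σ²within) and a permutation p-value by shuffling labels. The study used ARLEQUIN 3.11 with its own distance definition and hierarchical design, so this is an illustration of the idea rather than a reproduction; the function names and the toy points are hypothetical.

```python
import random
from itertools import combinations


def phi_st(d2, labels):
    """Single-level AMOVA Phi_ST.

    d2[i][j] is the squared distance between individuals i and j;
    labels[i] is the group of individual i.
    """
    n = len(labels)
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)

    def ssd(idx):
        # Sum of squared deviations = sum of pairwise squared distances / group size.
        return sum(d2[a][b] for a, b in combinations(idx, 2)) / len(idx)

    n_groups = len(groups)
    ssd_total = ssd(range(n))
    ssd_within = sum(ssd(idx) for idx in groups.values())
    n0 = (n - sum(len(idx) ** 2 for idx in groups.values()) / n) / (n_groups - 1)
    var_within = ssd_within / (n - n_groups)
    var_among = ((ssd_total - ssd_within) / (n_groups - 1) - var_within) / n0
    return var_among / (var_among + var_within)


def permutation_test(d2, labels, n_perm=999, seed=1):
    """Observed Phi_ST and a one-sided p-value from shuffling group labels."""
    rng = random.Random(seed)
    observed = phi_st(d2, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(d2, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical one-dimensional example: two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
d2 = [[(x - y) ** 2 for y in points] for x in points]
print(permutation_test(d2, ["A", "A", "B", "B"], n_perm=99))
```

Multiplying ΦST by 100 gives a percentage comparable in form to the FST values reported above.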

Discussion
The significant differences and wide range in the means of the resistance-related phenotypic traits among the germplasm show that there is great potential for developing improved maize genotypes resistant to postharvest insect pests. The biophysical/bioassay and molecular data confirm the existence of genetic divergence in tropical maize germplasm in response to field and storage insect pests. This agrees with earlier studies that reported genetic variability for resistance to the maize weevil, larger grain borer and stem borers among tropical maize germplasm (Arnason et al. 1994; Mwololo et al., 2010; Tefera et al., 2011). This genetic diversity can be exploited in breeding programs to introgress resistance to field and postharvest insect pests into improved varieties using conventional and genetic engineering approaches (Dhliwayo and Pixley 2003).
An overall mean Rogers' genetic distance of 0.353 among pairwise comparisons of inbred lines, with the vast majority (94.2%) of distances between 0.300 and 0.400, has been reported previously (Semagn et al., 2012). This is slightly higher than the average distance (0.3012) obtained in the current study; the lower genetic distance observed here is likely due to the mixed origin of the inbred lines and hybrids. Clustering of individual genotypes within the wide germplasm evaluated was evident in relation to resistance to the maize stem borer and postharvest insect pests. Some of the genotypes bred for stem borer and storage insect pest resistance were resistant to both classes of insects and hence have potential for breeding multiple resistance. In addition, the clustering based on SSR markers conformed to the breeding history of the genotypes. Grouping based on the phenotypic traits did not show clear genetic differentiation, with regard to specific resistance traits, among the six groups from the SSR-based cluster analysis. This agrees with previous studies that found no clear clustering patterns based on phenotype, environmental adaptation or grain colour (Xia et al., 2005). This can be explained by the fact that the SSR markers used are selectively neutral and not subject to selection, whereas resistance is an adaptive trait, so the two had low correlation (Koebner et al., 2002). Molecular analysis samples the genome more widely than phenotypic analysis and therefore gives a clearer picture of genetic distance. The variation detected by the molecular markers is non-adaptive and hence not affected by natural or artificial selection. Most desirable phenotypic traits in plant breeding result from interactions among expressed genes; agronomic studies therefore remain essential for germplasm description, with molecular genetic distance serving as a complement (Donini et al., 2000).
Genetic distance estimates from markers and phenotypes would agree more closely if the markers used were associated with the loci controlling the phenotypic trait of interest (QTL), and if a larger number of traits relevant to a particular situation were evaluated (Roy et al., 2004; Lefebvre et al., 2001). Earlier studies have reported that molecular and phenotypic data need to be considered separately in studies of genotype divergence (Warburton et al., 2002). Phenotypic traits are therefore relatively less efficient than molecular markers for discriminating closely related genotypes and analysing their genetic relationships. Nevertheless, phenotypic traits serve as a general approach to classifying germplasm within a collection with respect to a particular trait.
The multivariate analyses revealed high concordance among the PCA, model-based population partitioning, clustering based on genetic distance and discriminant analyses in terms of the number of groups and the members of each group. Earlier studies have shown that principal component analysis and population structure analysis are good predictors of grouping patterns and can complement cluster analysis, since different combinations of genetic distance matrices and clustering algorithms can give rise to somewhat different groups (Semagn et al., 2012).
The FST values from the analysis of molecular variance indicate moderate genetic differentiation among groups and/or populations. This agrees with previous genetic diversity studies of maize populations (Semagn et al., 2012; Wen et al., 2012). In addition, it has been reported that most variation in maize is partitioned within rather than between populations, because maize is an outcrossing species, which reduces population differentiation (Hamrick and Godt 1997).
Genetic divergence for resistance to stem borer and postharvest insect pests exists in tropical maize germplasm. Using the adaptive biophysical/bioassay traits, it was possible to discriminate resistant from susceptible genotypes, but not according to their pedigree. The integrated analysis using SSR markers suggested that the maize germplasm was likely composed of three subpopulations (K = 3) plus a mixed group: one group of storage pest resistant lines; a second group of stem borer resistant lines, the related stem borer resistant hybrids and commercial hybrids from different seed companies within Kenya; a third group of storage pest resistant hybrids; and a mixed group formed by the remaining genotypes. The grouping based on the SSR markers was highly consistent with the pedigree data. The results of this study can be used directly by breeding programs to exploit the genetic variability within groups to develop new lines, and between groups to generate hybrids resistant to both field and postharvest insect pests.

Evaluation for maize stem borer
A total of 184 maize genotypes, comprising 100 inbred lines and 84 hybrids selected from the CIMMYT Kenya breeding program, were used in the study (Appendix 2). … (Tefera et al., 2011). At harvest, the number of exit holes on the stems was counted and the cumulative tunnel length was measured by splitting the stems. Ears from plots not infested with stem borers were harvested, sun-dried to a moisture content of 12-13% and used for maize weevil and larger grain borer evaluation at the KARI/CIMMYT Entomology Laboratory in Kiboko, as described below.

Evaluation for maize weevil and larger grain borer
The maize grain was disinfested by fumigation with Phostoxin tablets for seven days to eliminate field infestation. For each genotype, 100 g of grain from each plot per replication was placed in a 250 ml jar, infested with 50 unsexed, 7-10-day-old maize weevils or larger grain borers (each insect species separately), and stored for 90 days at 26-28 °C and 70-75% relative humidity. The insects used in the experiment were obtained from the KARI/CIMMYT Kiboko maize Entomology Laboratory, where they were reared on grain of maize cultivar PH3253 under controlled conditions (28 °C and 75% relative humidity). The evaluation used a completely randomized design with 3 replications.
Ninety days after infestation, the contents of each jar were sieved with a mesh (Endecotts Ltd, UK) to separate grain, insects and flour. The flour produced by the insects was weighed, and the numbers of damaged kernels and adult insect progeny were counted. Grain weight loss was computed by subtracting the final from the initial weight of the grain sample and expressed as a percentage of the initial weight (Tefera et al. 2011). Damaged kernels were separated from undamaged ones based on tunnelling and holes, and the percentage of damaged grain was computed. Finally, the weights of the damaged and undamaged grain were measured.
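The two grain-damage percentages described above reduce to simple ratios. A minimal sketch, assuming weights in grams and kernel counts per jar, follows the definitions in the text (initial minus final weight over initial weight, and damaged kernels over total kernels); the function names and example values are hypothetical.

```python
def percent_weight_loss(initial_g, final_g):
    """Grain weight loss as a percentage of the initial sample weight."""
    return 100.0 * (initial_g - final_g) / initial_g


def percent_damaged(damaged_count, total_count):
    """Share of kernels with insect tunnels or exit holes, in percent."""
    return 100.0 * damaged_count / total_count


# Hypothetical jar: 100 g at infestation, 92.5 g after 90 days;
# 30 of 200 kernels damaged.
print(percent_weight_loss(100.0, 92.5), percent_damaged(30, 200))
```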

DNA extraction and genotyping
Leaf samples were harvested from 10 healthy plants per genotype about 3 weeks after sowing at the Kiboko station. They were placed in perforated Ziploc bags, immediately transferred into a Styrofoam box containing dry ice and transported to the Biosciences eastern and central Africa (BecA) Hub in Nairobi. Approximately equal amounts of leaf tissue from each of the 10 plants per genotype were bulked, cut into pieces and transferred into 1.2 ml strip tubes containing two 4-mm stainless steel grinding balls (SPEX CertiPrep, USA). The leaf samples were freeze-dried for 4 days using a Labconco freeze dryer (http://www.labconco.com) following the user's manual. The lyophilized samples were ground into a fine powder at 1500 strokes per minute for 2 minutes using a GenoGrinder-2000, and genomic DNA was extracted using a modified version of the CIMMYT high-throughput mini-prep cetyl trimethyl ammonium bromide (CTAB) method (Semagn 2014). DNA quality was checked by running aliquots of the DNA samples on a 0.8% agarose gel containing 0.3 µg/mL GelRed (Biotium). DNA concentration was measured using a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Wilmington, DE 19810, USA).
The samples were genotyped with 56 fluorescently labelled SSRs (Appendix 1), selected from the markers used for the genetic characterization of CIMMYT maize inbred lines and OPVs (Warburton et al., 2002). Polymerase chain reaction (PCR), genotyping and data scoring were done as described in another paper. Both DNA extraction and genotyping were done at the BecA Hub.

Analysis of phenotypic data
The percentage weight loss, flour weight and grain damage data were arcsine-transformed to normalize their frequency distributions. A univariate analysis of variance using the general linear model (GLM) procedure of SAS version 9.3 (SAS Institute 2003) was performed on the grain biophysical and insect bioassay traits as well as the stem borer damage traits. A susceptibility index based on leaf damage score, number of borer exit holes and cumulative tunnel length was computed by summing, across these parameters, the ratio of each genotype's value to the overall mean, and dividing by the number of parameters evaluated. Germplasm with susceptibility index values less than 0.8 was regarded as resistant, and that with values greater than 0.8 as susceptible (Tefera et al. 2011).
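The arcsine transformation and the susceptibility index described above can be sketched as follows. The transform is shown in its common square-root form and expressed in degrees; whether the study used degrees or radians is not stated, so that is an assumption. The index averages, over the measured parameters, each genotype's value divided by the overall mean of that parameter, and a value below 0.8 is classed as resistant. A value of exactly 0.8 is not covered by the text, so treating it as susceptible is also an assumption. Genotype and trait names are hypothetical.

```python
from math import asin, degrees, sqrt


def arcsine_transform(percent):
    """Arcsine square-root transform of a percentage, in degrees."""
    return degrees(asin(sqrt(percent / 100.0)))


def susceptibility_index(records, traits):
    """Mean ratio of each genotype's trait values to the overall trait means."""
    overall = {t: sum(r[t] for r in records.values()) / len(records) for t in traits}
    return {
        g: sum(r[t] / overall[t] for t in traits) / len(traits)
        for g, r in records.items()
    }


def classify(index, threshold=0.8):
    """Below the threshold -> resistant; at or above it -> susceptible (assumed)."""
    return {g: "resistant" if v < threshold else "susceptible" for g, v in index.items()}


# Hypothetical stem borer data for two genotypes.
records = {
    "line_a": {"leaf_damage": 2.0, "exit_holes": 1.0, "tunnel_cm": 5.0},
    "line_b": {"leaf_damage": 4.0, "exit_holes": 3.0, "tunnel_cm": 15.0},
}
si = susceptibility_index(records, ["leaf_damage", "exit_holes", "tunnel_cm"])
print(si, classify(si))
```

Because each trait is divided by its own overall mean, traits measured on different scales (scores, counts, centimetres) contribute equally to the index.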

Analysis of molecular data
SSR data analyses were conducted as described by Semagn et al. (2014). Briefly, AlleloBin (http://www.icrisat.org/bt-software-downloads.htm) was used to adjust inconsistencies in allele calls obtained from GeneMapper software. The number of adjusted alleles per locus for each bulked genotype varied from 2 to 11. The adjusted allele sizes were then converted into binary format (present = 1, absent = 0) using ALS-Binary (http://www.icrisat.org/bt-software-downloads.htm). A Rogers' distance matrix was calculated between each pair of genotypes using NTSYS-pc for Windows, version 2.0, and used to generate phenograms with the unweighted pair group method with arithmetic mean (UPGMA) as implemented in MEGA 5.1. Principal component analysis (PCA) was performed using JMP version 7.0 (SAS Institute Inc., Cary, NC, USA), and the first two principal components were plotted to visualize patterns of relationship among genotypes. An admixture model-based clustering method implemented in STRUCTURE version 2.3.3 (Pritchard et al., 2000) was used to infer population structure among genotypes. STRUCTURE was run with the number of clusters (K) varying from 1 to 6, with three replicate runs per K, each using a burn-in period of 100,000 and 100,000 Markov chain Monte Carlo (MCMC) replications after burn-in. Genotypes with membership probabilities > 60% were assigned to the corresponding group, while those with < 60% membership probability in every group were assigned to a "mixed" group. A stepwise forward canonical discriminant analysis was run using SAS (SAS Institute 2003). Analysis of molecular variance (AMOVA) was used to partition the variation among and within groups using ARLEQUIN version 3.11. For both discriminant analysis and AMOVA, genotypes were assigned to groups or populations using the results from the phenotypic data, STRUCTURE and cluster analysis (Appendix 3).
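Two bookkeeping steps above can be sketched briefly: scoring binned allele sizes as a 0/1 row per genotype (the role ALS-Binary played in the study), and applying the 60% membership rule to a STRUCTURE membership (Q) matrix. The data layouts, locus names and membership values below are hypothetical, and a genotype whose highest membership is exactly 60% is placed in the mixed group here, which is an assumption because the text does not cover that case.

```python
def to_binary(genotype, allele_bins):
    """0/1 allele profile for one genotype.

    genotype: dict locus -> set of binned allele sizes observed (hypothetical layout).
    allele_bins: dict locus -> list of all binned allele sizes seen at that locus.
    """
    row = []
    for locus, sizes in allele_bins.items():
        observed = genotype.get(locus, set())
        row.extend(1 if size in observed else 0 for size in sizes)
    return row


def assign_groups(q_matrix, threshold=0.6):
    """Assign each genotype to its highest-membership cluster (numbered from 1)
    if that membership exceeds the threshold; otherwise label it "mixed"."""
    assignment = {}
    for name, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        assignment[name] = best + 1 if q[best] > threshold else "mixed"
    return assignment


# Hypothetical allele bins and membership probabilities at K = 3.
bins = {"locus1": [150, 152], "locus2": [200, 204, 208]}
print(to_binary({"locus1": {150}, "locus2": {200, 204}}, bins))
q = {"geno_a": [0.70, 0.20, 0.10], "geno_b": [0.40, 0.35, 0.25], "geno_c": [0.05, 0.05, 0.90]}
print(assign_groups(q))
```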