eprintid: 11757
rev_number: 9
eprint_status: archive
userid: 3170
dir: disk0/00/01/17/57
datestamp: 2021-03-23 04:11:25
lastmod: 2021-03-23 04:11:25
status_changed: 2021-03-23 04:11:25
type: article
metadata_visibility: show
creators_name: Kar, S
creators_name: Garin, V
creators_name: Kholová, J
creators_name: Vadez, V
creators_name: Durbha, S S
creators_name: Tanaka, R
creators_name: Iwata, H
creators_name: Urban, M O
creators_name: Adinarayana, J
creators_gender: Female
icrisatcreators_name: Garin, V
icrisatcreators_name: Kholová, J
affiliation: Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, Mumbai
affiliation: ICRISAT (Patancheru)
affiliation: Institut de Recherche pour le Développement (IRD) – Université de Montpellier – UMR DIADE, Montpellier
affiliation: Laboratory of Biometrics and Bioinformatics, University of Tokyo, Tokyo
affiliation: Bean Physiology - Agrobiodiversity, Alliance of Bioversity International and CIAT, Cali
country: India
country: France
country: Japan
country: Colombia
title: SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data
ispublished: pub
subjects: S2
divisions: CRPS4
full_text_status: public
keywords: High-throughput phenotyping, SpATS, Cross-validation, Simulation, Change point analysis, HTP-pipeline
abstract: The rapid development of phenotyping technologies in recent years has made it possible to study plant development over time. However, processing the massive amount of data collected by high-throughput phenotyping (HTP) platforms remains a major challenge for the plant science community. A key issue is accurately estimating the genotypic component of the plant phenotype over time. On outdoor and field-based HTP platforms, phenotype measurements can be substantially affected by data-generation inaccuracies or failures, leading to erroneous or missing data.
To address this problem, we developed an analytical pipeline composed of three modules: detection of outliers, imputation of missing values, and computation of genotype adjusted means using mixed models with spatial adjustment. The pipeline was tested on three traits (3D leaf area, projected leaf area, and plant height) in two crops (chickpea and sorghum), measured during two seasons. Using real-data analyses and simulations, we showed that applying the three pipeline steps sequentially was particularly useful for estimating smooth genotype growth curves from raw data containing a large amount of noise, a situation that may be frequent in data generated on outdoor HTP platforms. The proposed procedure can handle up to 50% missing values. It is also robust to contamination of 20 to 30% of the data. The pipeline was further extended to model the genotype time series data. A change-point analysis allowed the determination of growth phases and of the optimal timing at which genotypic differences were largest. The estimated genotypic values were used to cluster the genotypes during the optimal growth phase. A two-way analysis of variance (ANOVA) showed that the clusters were consistently defined throughout the growth period. We could therefore show, across a wide range of scenarios, that the pipeline facilitated efficient extraction of useful information from outdoor HTP platform data. High-quality plant growth time series data are also provided to support breeding decisions. The R code of the pipeline is available at https://github.com/ICRISAT-GEMS/SpaTemHTP.
date: 2020-11
date_type: published
publication: Frontiers in Plant Science (TSI)
volume: 11
number: 552509
publisher: Frontiers Media
pagerange: 1-16
id_number: doi:10.3389/fpls.2020.552509
refereed: TRUE
issn: 1664-462X
official_url: https://doi.org/10.3389/fpls.2020.552509
related_url_url: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=10.3389%2Ffpls.2020.552509&btnG=
related_url_type: pub
funders: Bill and Melinda Gates Foundation
citation: Kar, S and Garin, V and Kholová, J and Vadez, V and Durbha, S S and Tanaka, R and Iwata, H and Urban, M O and Adinarayana, J (2020) SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data. Frontiers in Plant Science (TSI), 11 (552509). pp. 1-16. ISSN 1664-462X
document_url: http://oar.icrisat.org/11757/1/fpls-11-552509.pdf