Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although it is grown in several temperate countries, its status as an orphan crop has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we constructed 454 expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility in diversity studies and their transferability to related species.
Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for the L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries, respectively. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. Of these, 38,200 isotigs were translated into proteins, 8,741 of which were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty-four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected several clusters among the L. luteus accessions, strongly suggesting the existence of population subdivisions. However, the clustering patterns did not clearly follow the accessions' origin.
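As an illustration of the kind of SSR screening and frequency calculation described above, the following is a minimal sketch and not the pipeline used in this study. The minimum repeat thresholds in `MIN_REPEATS`, the function names `find_ssrs` and `ssr_frequency_kbp`, and the example sequences are all assumptions introduced here for illustration only.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
# These values are assumptions for illustration, not the study's criteria.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """Return True if the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Group 2 captures one motif; the backreference requires at least
        # (min_rep - 1) further tandem copies of that same motif.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_reducible(motif):
                continue  # already reported under its shorter repeat unit
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return hits


def ssr_frequency_kbp(seqs):
    """Return kbp of sequence screened per SSR found, or None if no SSR is found."""
    total_bp = sum(len(s) for s in seqs.values())
    n_ssrs = sum(len(find_ssrs(s)) for s in seqs.values())
    return (total_bp / 1000) / n_ssrs if n_ssrs else None


if __name__ == "__main__":
    # Toy sequences standing in for isotigs (made up for this example).
    isotigs = {
        "isotig_a": "GGC" + "AT" * 8 + "CCG",
        "isotig_b": "ACGTACGATCGGATCCATGA",
    }
    for name, seq in isotigs.items():
        print(name, find_ssrs(seq))
    print("kbp per SSR:", ssr_frequency_kbp(isotigs))
```

Under this sketch, a figure such as "one SSR per 17.75 kbp" corresponds to the total length of screened sequence divided by the number of SSRs detected. Dedicated SSR-mining tools typically also handle compound and imperfect repeats, which this example ignores.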
L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.