Location: Children's Nutrition Research Center
2021 Annual Report
Objective 1: Select inbred mouse strains with phenotypic extremes in milk production will be used to: a) identify genomic variants, along with intestinal and mammary-expressed genes, that differentiate low and high milk production, and b) determine the extent to which genome-driven differences in milk production and mammary gene expression are directly mediated through host-dependent differences in the intestinal and/or mammary tissue microbiome.
Subobjective 1A: Sequence the genomes of additional unsequenced strains from our original milk yield cohort and then use this completed lactation phenome genotype data to identify strain-specific private alleles and predict the functional consequences of these variants on genes with the potential to regulate traits defined in the lactation phenome dataset.
Subobjective 1B: Combine the lactation phenome dataset with the expanded common variant data from Subobjective 1A to conduct an enhanced joint-GWAS of SNP, INDEL, and SV, and to subsequently predict the functional consequences of the newly identified variants for lactation.
Subobjective 1C: Using a complete 3x3 diallel cross of QSi3, QSi5, and PL/J, determine the contribution of strain-dosage, heterosis, parent-of-origin, and epistasis to milk production and composition, and to mammary gland development during early lactation, and identify mammary epithelial cell and intestinal eGenes on the basis of allelic imbalance.
Subobjective 1D: Integrate the set of eGenes discovered in 1C with the set of private and common variants discovered in 1A and 1B, and employ network modeling to predict and test those variant-eGene pairs that are most likely to cause the variation in the lactation phenome traits.
Subobjective 1E: Analyze the fecal microbiota along with prolactin and oxytocin in samples obtained from the diallel conducted under Subobjective 1C to determine the contribution of strain-dosage, heterosis, parent-of-origin, and epistasis to the diversity and richness of the intestinal microbiota, to the abundance of specific taxa, and to neuroendocrine function in mouse strains with a genetic propensity for high or low milk yield.
Genetic background is known to influence variation in milk production; however, environmental factors also play a role. Advances in high-throughput DNA sequencing technologies have revolutionized the way in which the microbial world is viewed and have led to the concept that the microbiome is a major regulator of normal development and health. The microbiome is regulated by diet, but is also under the control of the host genome. In this regard, the full number of host genetic variants associated with lactation-related traits remains to be determined. Differences in milk production are driven by changes in gene expression within organs important to milk synthesis. Additionally, the intestinal microbiome is controlled by the host genome, but can directly influence gene expression within the host. We aim to understand how variations in the maternal genome interact with the microbiome to determine lactation success. Whole genome sequence data from select mouse strains will be used to identify genetic variants that are unique to high or low milk production. These newly identified variants will be functionally linked to milk production and composition, and to lactation-induced intestinal and mammary gene expression, through a specific RNA sequencing test known as allelic imbalance. Strain- and allele-dependent differences in fecal 16S ribosomal sequencing reads will associate the variants with the intestinal microbiome. Lastly, maternal microbiome seeding through neonatal cross-fostering will establish the ability of the intestinal microbiome to override the effects of genetic background on lactation-dependent gene expression and milk production.
The objective for this project was to compare the genomes of mouse families that differ in their ability to produce milk or differ in the amounts of nutrients in their milk as a way to a) identify genomic variants and genes that differentiate low and high milk production or nutrient content, and b) determine the extent to which genome-driven differences are directly mediated through differences in the intestinal and/or mammary tissue microbiome. Under this objective, there are six sub-objectives, and our work during the past year focused on the first three. Sub-objective 1A was to sequence the genomes of additional mouse families from our original milk yield study and then use this completed genotype data to identify family-specific private variants and predict their functional consequences. Sub-objective 1B was to combine the lactation phenome dataset with the expanded common variant data from Sub-objective 1A to conduct an enhanced genome-wide association study (GWAS) and to subsequently predict the functional consequences of the newly identified variants for lactation. Sub-objective 1C was to use a complete 3x3 diallel cross of three inbred strains to determine the contribution of strain-dosage, heterosis, parent-of-origin, and epistasis to milk production and composition, and to mammary gland development during early lactation, and to identify mammary epithelial cell and intestinal eGenes on the basis of allelic imbalance. For Sub-objective 1A, we completed DNA sequencing of the genomes of 9 different inbred mouse families that were part of our original lactation trait study but had incomplete information on their genetics. The data from this effort allowed us to identify the genetic locations at which these mouse families differ from one another. These differences in the DNA sequence are generally responsible for the physical and/or biological characteristics that distinguish one family from another.
As a result of this work, the project has created a high-quality dataset containing 25,993,223 variants present in the 31 mouse families in our lactation study. Within this large set, we used frequency filtering to identify 1,701,355 variants that were each present in only one of the 31 families and absent from the rest. The "private" variants in this smaller set have been suggested to possess a greater ability to cause biological differences among families than would be expected for those in the large set. We further analyzed this smaller set by using internet-based biological knowledge databases to identify genes that are altered by these variants and to determine if these genes would be predicted to affect biological pathways relevant to lactation. Using a tool known as the "Variant Effect Predictor" (VEP), we discovered that there were about 50,000 genes potentially altered by private variants. There were 1,250 gene-associated variants found within the vitally important regions that carry the instructions for making the protein machinery of our cells. Importantly, 92 of these variants were predicted to have a high probability of inactivating the genes in which they were found. These will be the subject of further study. To gain insight into the broader biological functions of the above variants, gene lists were made from the filters and run through a second internet database known as "InnateDB". This database first identifies the cellular pathways that a particular gene list affects and then scores each pathway using a probability test called the "over-representation" test. For the complete list of 50,000 variant-associated genes, InnateDB revealed interactions in 1,725 distinct biological pathways. At this point, we also used filtering to create gene lists specific to different mouse families from our lactation study cohort. The gene lists were filtered to contain only genes from the families in the top or bottom 20% for each trait.
This approach allowed for the identification of genes and pathways that could be associated with either high or low performance within a particular lactation measurement. There were 119 different maternal traits measured in the lactation study cohort, and this part of the analysis is still ongoing. However, initial work has identified several interesting pathway associations. For example, when we examined the families with high lactation performance, as indicated by litter gain per gram of maternal body weight, there were 7 pathways with a high over-representation score. The most interesting of these was a pathway necessary for protein synthesis through cell structures known as ribosomes. In this ribosome pathway, there were 7 distinct genes that contained private variants in the families from the top 20%. All of these genes contribute to the assembly of the ribosome machinery.
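The three filtering and scoring steps described above (private-variant filtering, selection of the top or bottom 20% of families for a trait, and pathway over-representation) can be illustrated with a minimal Python sketch. It is not the project's pipeline: the function names, the toy genotype table, and the counts in the usage note are hypothetical, the rounding used for the 20% cutoff is an assumption, and the upper-tail hypergeometric test is only one common form of over-representation test, since the report does not state which InnateDB settings were used.

```python
from math import comb


def private_variants(carriers_by_variant):
    """Map each variant carried by exactly one family to that family.

    carriers_by_variant: dict of variant id -> set of family names that
    carry the alternate allele (hypothetical input layout).
    """
    return {
        variant: next(iter(carriers))
        for variant, carriers in carriers_by_variant.items()
        if len(carriers) == 1
    }


def extreme_families(trait_by_family, fraction=0.2):
    """Return (bottom, top) families ranked by a trait value.

    The group size is rounded to the nearest whole family (assumption),
    with a minimum of one family per group.
    """
    ranked = sorted(trait_by_family, key=trait_by_family.get)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


def over_representation_p(hits, list_size, pathway_size, background_size):
    """Upper-tail hypergeometric probability P(X >= hits).

    X is the number of pathway genes expected in a gene list of size
    list_size drawn at random from background_size genes, of which
    pathway_size belong to the pathway.
    """
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(background_size, list_size)


# Toy data, for illustration only.
toy_carriers = {
    "var1": {"QSi5"},
    "var2": {"QSi5", "PL/J"},
    "var3": {"PL/J"},
    "var4": set(),
}
toy_private = private_variants(toy_carriers)

toy_trait = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0}
toy_bottom, toy_top = extreme_families(toy_trait)

# 2 or more pathway genes in a 3-gene list, 4-gene pathway, 10-gene background.
toy_p = over_representation_p(2, 3, 4, 10)
```

In this toy example, only var1 and var3 are private, families A and E form the bottom and top groups, and the over-representation probability is 40/120, or one third. A smaller probability indicates that the overlap between a gene list and a pathway is less likely to arise by chance.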