One of these multi-tissue clusters (rat cluster 3) included probe sets representing annotated protein-coding genes and showed a striking enrichment for mitochondria-related genes, as well as enrichment for heart and lymphoblast cell types (Figure 5). Lastly, we identified two common clusters (rat cluster 4 and rat cluster 5) that were most highly enriched for immune response genes and specifically expressed in whole blood and myeloid cell types (Figure 5).
In particular, rat cluster 5 recapitulates a previously identified co-expression network detected in seven tissues (the Irf7-driven inflammatory gene network, or IDIN), which comprised genes directly and indirectly regulated by the Irf7 transcription factor (a master regulator of the type 1 interferon response). This co-expression network, which is highly expressed in immune cells, may represent a molecular signature of macrophages in complex tissues and is associated with risk of inflammatory diseases and autoimmune disease (type 1 diabetes) in humans, as previously demonstrated.
We highlight that this inflammatory network (IDIN) was previously identified by complex integration of genome-wide TFBS predictions, expression QTL mapping using genome-wide SNPs and co-expression network analysis in seven rat tissues, and was experimentally validated and translated to humans. For each rat cluster detected in all seven tissues we report the number of probe sets, the top five functional categories and their statistical significance (full list in Table S2), the summary of cell-type enrichment statistics (expressed as Benjamini and Hochberg (BH)-adjusted p-value, Cten analysis) and the graph with the significant protein-protein interactions (PPI), including the overall significance of the directed PPI network (DAPPLE analysis).
The colour scale on the right indicates the significance of the detected PPI. Similarly to the analysis of the rat microarray data, we used a two-step strategy to first prioritize candidate clusters and then validate the clusters by permutations and pinpoint the neocortical regions where these clusters are present. The clusters were annotated in detail and compared with the large catalogue of differentially expressed genes between fetal cortical zones that was previously reported.
In particular, human cluster 1 recapitulates the cell-to-extracellular matrix interaction processes which were previously found to be associated with up-regulation in either VZ, ISVZ or OSVZ neocortex regions. However, our multi-tissue network analysis and annotation of the results suggest further functional specialisation of the two clusters, which was previously unappreciated. Top left, each node in the network represents a gene and, in keeping with the previous report, for each gene we highlight significant up-regulation in VZ (red) or CP (green) as compared with the other neocortex regions.
Genes that were not differentially expressed between neocortex regions are coloured in grey. Bottom left, summary of cell-type enrichment analysis (expressed as Benjamini and Hochberg (BH)-adjusted p-value, Cten analysis). Top left, each node in the network represents a gene and, in keeping with the previous report, for each gene we highlight significant up-regulation in CP (red) or VZ (green) as compared with the other neocortex regions. Genes that were not differentially expressed between neocortex regions are coloured in grey. Genes present in KEGG pathways related to cognitive functions (MAPK signaling, axon guidance, calcium guidance and long-term potentiation) are extracted from the main network and highlighted.
Bottom left, summary of cell-type enrichment analysis (expressed as Benjamini and Hochberg (BH)-adjusted p-value, Cten analysis) showing the most significant enrichment for fetal brain, prefrontal cortex and amygdala tissues. In particular, for human cluster 1 we found strong co-expression between 1, of the differentially expressed genes, which are enriched for cell adhesion and cell-extracellular matrix (ECM) interaction processes during cortical development. This is in keeping with the notion that cell cycle progression in mammalian cells is strictly regulated both by integrin-mediated adhesion to the extracellular matrix and by binding of growth factors to their receptors.
Surprisingly, cell-type enrichment analysis suggested highly specific expression of human cluster 1 in MOLT-4 (a human T lymphoblast, acute lymphoblastic leukemia cell line), which constitutively does not express p53 (a key regulator of the cell cycle, DNA repair and cell death). However, since we found down-regulation of p53 signalling and other related pathways, the observed enrichment for the MOLT-4 cell type most likely reflected cell-type-specific depletion of p53 expression and of many target genes in the CP region. Taken together, these analyses of human cluster 1 suggest that differentially expressed genes related to cell-ECM interaction exert their function in a highly coordinated fashion, where multiple pathways are involved in cell proliferation and self-renewal of neural progenitors in the developing human neocortex.
Cell-type enrichment and protein-protein interaction analyses for human cluster 2 showed high specificity of this cluster in fetal brain, prefrontal cortex and amygdala tissues, and strong conservation of the network at the protein level (Figure 7). The original investigation of gene expression variation across human fetal neocortex regions suggested a role for the extracellular matrix in the self-renewal of neuronal progenitor cells.
Here, our C3D analysis was able to recapitulate these biological processes and furthermore highlight extensive co-expression between cell-cycle and ECM-interaction genes in proliferation and renewal of neuronal progenitors in specific neocortex regions (human cluster 1). In addition, our analysis revealed a distinct, functionally coherent network (human cluster 2) related to the development of later cognitive functions in the developing brain, which was not reported in the original study. These new findings are consistent with recent data on human-specific gene expression changes taking place during postnatal brain development in the prefrontal cortex.
Building on the HO GSVD framework, we have developed a new algorithm (C3D) for efficient, parameter-free and automatic detection of co-expression clusters and networks in multiple conditions. Our method is designed for the analysis of weighted and unweighted networks (input matrices) across conditions, enabling applications to diverse data types and structures. Although the original HO GSVD algorithm assumes the non-singularity of the co-expression matrix, our C3D algorithm uses the Moore-Penrose pseudo-inverse and can therefore be applied to the non-invertible case.
We show that when an exact HO GSVD of the input matrices exists (as defined in (4), see Methods), our HO GSVD is able to extract the right decomposition basis through the eigen-decomposition of , whereas it finds an approximate decomposition of the data in the absence of an exact solution (Figure S4). In particular, our empirical simulations and real-case applications reveal that our approximate decomposition is able to capture both common and differential co-expression structures for a wide range of noise levels, suggesting that our algorithm can be useful for practical applications to genomic data.
Here, through the HO GSVD of large-scale genomic datasets, we aimed to uncover the complex interactions between genes (networks) that can occur within or across multiple conditions. Selecting informative vectors of , we provide different orderings of to reveal candidate clusters that are important to all conditions or specific to a sub-set of conditions; then, we can distinguish the specific conditions where the clusters are present using a permutation-based approach.
This procedure automatically pinpoints the specific conditions where the sub-network structures are present and, at the same time, provides an empirical estimate of the statistical significance (empirical P-value) for each cluster identified. In simulation studies, we demonstrated how C3D outperforms competing approaches in accuracy and reliability while being computationally less demanding.
We highlight how our method allowed accurate detection of clusters within complex structures i. In contrast with other approaches, C3D does not need the user to specify ad-hoc parameters related to the expected number of clusters or cluster density, or parameters needed to determine the optimal height cut-off in the gene clustering tree. Since C3D utilised raw gene expression data matrices as input, the higher stability of C3D might be due to the reduced influence of the small number of observations on the stability of co-expression estimates, which can result in extreme patterns of correlation changes, corresponding to stable and fragile co-expression, as previously shown.
To demonstrate this point, we reported an application of C3D to two large transcriptional datasets: (i) microarray-based gene expression profiles in seven rat tissues and (ii) RNA-seq-based gene expression analysis of germinal zones from human fetal neocortex. In the rat analysis, we reported several functionally enriched co-expression clusters, including a previously identified inflammatory gene network driven by the IRF7 transcription factor that represents a gene expression signature of macrophages within complex tissues. In addition, our C3D analyses revealed novel gene co-expression networks in sub-sets of tissues.
For instance, we identified a network comprising Hsp and known cardiomyopathy genes, which suggested coordinated regulation of heat shock protein genes in multiple tissues and their potential functional role in cardiovascular disease. While this network was not recovered by either WGCNA or DiffCoEx analyses, we were able to replicate this new finding using separate cardiac and liver gene expression datasets in humans (Figure 4). In the study of human fetal neocortex, we demonstrated previously undescribed co-expression between cell cycle and ECM-receptor interaction pathways, supporting their role in the proliferation and self-renewal of neural progenitors.
In addition, our analyses highlighted that pathways central to later cognitive functions e. These studies illustrated how our method can be effectively applied to leverage the vast stream of genome-scale transcriptional data that has risen exponentially over the last years, promising to aid the fine-scale characterization of both context-specific and systems-level networks and pathways.
We describe a new computational method (Cross-Conditions Cluster Detection, or C3D) to detect both similarity and dissimilarity clustering patterns in weighted networks across multiple conditions. After a data initialization step, C3D employs an HO GSVD-based algorithm, followed by cluster node selection and validation procedures, to identify the clusters, the specific conditions where the clusters are detected and the statistical significance of the clusters, as summarized in Figure 1 and detailed below.
In this step we assume the input data are non-square matrices , where the rows represent the observations and the columns indicate genes. The number of genes must be the same across datasets, while the number of observations can differ. We first log-transform the data and, for each gene, subtract its average expression to avoid capturing differences in average gene expression across conditions. We then calculate the co-expression matrices corresponding to each condition.
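The initialization step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `coexpression_matrix`, the `pseudocount` added before the logarithm and the toy data are assumptions introduced here, and the optional `scale` flag rescales each gene to unit variance so that the result is a correlation matrix instead of a covariance matrix.

```python
import numpy as np


def coexpression_matrix(expr, scale=False, pseudocount=1.0):
    """Genes x genes co-expression matrix for one condition.

    expr: observations x genes array of non-negative expression values.
    pseudocount: added before the log to avoid log(0) (an assumption of this sketch).
    """
    logged = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    centred = logged - logged.mean(axis=0)  # remove each gene's average expression
    if scale:
        sd = centred.std(axis=0, ddof=1)
        centred = centred / np.where(sd > 0, sd, 1.0)  # unit variance gives correlations
    n_obs = centred.shape[0]
    return centred.T @ centred / (n_obs - 1)


# Toy data (hypothetical): three conditions, the same 6 genes,
# and a different number of observations in each condition.
rng = np.random.default_rng(0)
datasets = [rng.gamma(2.0, 50.0, size=(n_obs, 6)) for n_obs in (8, 12, 10)]
cov_matrices = [coexpression_matrix(d) for d in datasets]
cor_matrices = [coexpression_matrix(d, scale=True) for d in datasets]
```

Every matrix in `cov_matrices` is 6 x 6 regardless of the number of observations, which is what allows conditions with different sample sizes to be compared gene by gene.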
Each represents the covariance matrix of the data in condition. As in classic principal component analysis, the columns of can be scaled to unit variance to work on the correlation matrices rather than the covariance matrices. Alternatively, our algorithm can directly take any co-expression matrix as input. This feature allows common and differential clusters to be extracted from matrices based on different co-expression measures, including robust correlation (e.g. Spearman, Kendall) and non-linear metrics such as mutual information. Similarly to classic SVD, each observation from the input data can be characterized by its expression profile and represented by a data point in a dimensional space.
The observations from all datasets are contained in a subspace of dimension , which is hereafter referred to as the HO GSVD subspace. Here, we aim at finding directions in the HO GSVD subspace that capture the variability in gene expression that is either common to all conditions (common factors) or specific to a subset of conditions (differential factors). Inspired by previous work, we developed a general algorithm that allows computation of an approximate solution to the HO GSVD problem in the non-full-column-rank case. The right basis vectors allow the identification of sets of genes (clusters) with similar co-expression patterns that are either specific to a subset of conditions or common to all conditions.
The derivation and discussion of the special cases square, symmetric matrices with full rank and square, symmetric matrices with full rank is reported in Text S1. In the most general case, we define the right basis vectors as the solution of the eigen-decomposition problem of the matrix (3), where is the arithmetic mean of all the pairwise quotients and denotes the Moore-Penrose inverse of the co-expression matrix . Here the Moore-Penrose inverse is used as a substitute for since the invertibility of is not guaranteed when , which is the typical scenario in genomics.
In this case, for all we have (4), and its Moore-Penrose inverse is given by (5). Therefore we have (6), since is full row rank. Hence we can rewrite as follows (7). When there exists a common subspace of dimension , with basis vectors , for which the decomposition of the co-expression matrices (4) is exact, equation (7) becomes an equality and the eigenvectors of will lead to the exact basis of the common subspace. In this case the eigenvectors of do not provide an exact decomposition of the subspace.
Moreover, is not guaranteed to be non-defective and to have a full set of real eigenvalues and eigenvectors. However, even in the absence of an exact common decomposition, the real part of the complex eigenvectors can be used to derive a low-rank approximation of the common subspace and to extract common and differential covariance structures from the data.
Our simulations suggest that if a common subspace of dimension with basis vectors explains a significant fraction of the variance in the original datasets , the approximation (4) holds and the first eigenvectors of the matrix corresponding to the largest eigenvalues of will provide a good approximation of the basis vectors of the HO GSVD subspace (Figure S4). After we have identified using our approximate HO GSVD, the input datasets can be reordered by using the informative vectors of , so that nodes that share similar characteristics tend to cluster into the same diagonal block of the co-expression matrix or into the same block formed by reordered rows of the expression matrix.
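As a rough illustration of the decomposition described above, the sketch below averages the pairwise quotients of the co-expression matrices, using the Moore-Penrose inverse in place of the ordinary inverse, and keeps the real part of the eigenvectors. It assumes the usual HO GSVD normalisation (the mean over all ordered pairs of distinct conditions); the function names, the toy spectra and the reordering helper are hypothetical and are not taken from the C3D code.

```python
import numpy as np


def approximate_hogsvd_basis(coexp_matrices):
    """Eigenvalues and real eigenvector parts of the mean pairwise-quotient matrix."""
    n = len(coexp_matrices)
    p = coexp_matrices[0].shape[0]
    pinvs = [np.linalg.pinv(a) for a in coexp_matrices]  # Moore-Penrose inverses
    s = np.zeros((p, p))
    for i in range(n):
        for j in range(n):
            if i != j:
                s += coexp_matrices[i] @ pinvs[j]  # quotient of condition i by condition j
    s /= n * (n - 1)  # arithmetic mean over ordered pairs (assumed normalisation)
    eigvals, eigvecs = np.linalg.eig(s)  # may be complex when no exact solution exists
    order = np.argsort(-eigvals.real)
    return eigvals.real[order], eigvecs.real[:, order]


def reorder_by_vector(coexp, vector):
    """Reorder rows and columns of a co-expression matrix by one basis vector."""
    idx = np.argsort(vector)
    return coexp[np.ix_(idx, idx)], idx


# Exact toy case: three conditions share one orthonormal basis and differ
# only in the variance carried by each basis direction.
rng = np.random.default_rng(1)
basis, _ = np.linalg.qr(rng.normal(size=(4, 4)))
spectra = [[1.0, 4.0, 1.0, 2.0], [1.0, 1.0, 9.0, 1.0], [1.0, 1.0, 1.0, 3.0]]
matrices = [basis @ np.diag(sp) @ basis.T for sp in spectra]
eigvals, vectors = approximate_hogsvd_basis(matrices)
reordered, idx = reorder_by_vector(matrices[0], vectors[:, 0])
```

In this exact case the eigenvalues are at least 1, with a value of 1 for the direction that carries the same variance in every condition and larger values for directions whose variance differs between conditions, which is one way to separate common from differential structure.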
For each selected , the sub-set of nodes that have significantly larger similarity with each other than with the rest of the nodes is identified using a Gaussian Mixture Model (GMM). Similarly to previous work, here we assume that each informative can be decomposed into two components, since we are interested in learning how likely it is that the distribution of is unimodal (cannot be used for data clustering) or bimodal.
Moreover, we assume that the two components (groups) are not treated symmetrically, since the component with smaller weight identifies the cluster of nodes with high similarity. Conditionally on , the posterior probability that the th node belongs to the th component is calculated using the function fdrtool in the R package fdrtool with the normal mixture distribution option. Nodes are classified into the two components depending upon the local misclassification error rate (MER) where is the th ordered element of , and are the weight and the th component with smaller weight, respectively.
In contrast with alternative commonly used methods, our approach does not use arbitrary parameters external to the data (apart from the MER level), such as the size of the cluster or the cluster density, to select the significant nodes. The C3D method integrates an automatic permutation-based approach to assess the significance of clusters across multiple conditions.
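To make the two-component idea concrete, here is a small sketch that fits a two-component Gaussian mixture to the elements of one vector with a plain EM loop and assigns nodes to the component with the smaller weight. It is a stand-in for illustration only: the text uses the fdrtool R package and a local MER level, whereas the EM routine, the mid-range initialisation, the 0.5 posterior cut-off and the simulated loadings below are assumptions of this example.

```python
import numpy as np


def _responsibilities(v, w, mu, sd):
    """Posterior probability of each component for every element of v."""
    dens = w * np.exp(-0.5 * ((v[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)


def fit_two_gaussians(values, n_iter=200):
    """Plain EM for a two-component 1D Gaussian mixture (values must not be constant)."""
    v = np.asarray(values, dtype=float)
    split = 0.5 * (v.min() + v.max())  # crude start: split at the mid-range
    groups = [v[v <= split], v[v > split]]
    w = np.array([len(g) / len(v) for g in groups])
    mu = np.array([g.mean() for g in groups])
    sd = np.array([g.std() for g in groups]) + 1e-3
    for _ in range(n_iter):
        resp = _responsibilities(v, w, mu, sd)  # E-step
        nk = resp.sum(axis=0)
        w = nk / len(v)  # M-step: weights, means, standard deviations
        mu = (resp * v[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return w, mu, sd, _responsibilities(v, w, mu, sd)


# Hypothetical vector: 90 background nodes near 0 and 10 cluster nodes near 0.6.
rng = np.random.default_rng(2)
loadings = np.concatenate([rng.normal(0.0, 0.05, 90), rng.normal(0.6, 0.05, 10)])
w, mu, sd, resp = fit_two_gaussians(loadings)
small = int(np.argmin(w))  # the component with smaller weight marks the cluster
cluster_nodes = np.flatnonzero(resp[:, small] > 0.5)  # 0.5 cut-off is an assumption
```

When the vector is unimodal the two fitted components largely overlap and no clear minority component emerges, which corresponds to the case described above where the vector cannot be used for clustering.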
This allows us to (i) identify the specific conditions where each cluster is detected and (ii) assess an empirical measure of significance for each cluster. The first step is implemented to identify the subset of the input data with , which represents the conditions where the clusters are present. Likewise, the subset with indicates the conditions where the cluster is not present. For each dataset separately, is computed as the proportion of the cluster quality values calculated from random samples that exceed , where indicates the individual cluster quality in.
In the second step, we evaluate the overall significance (overall P-value or ) of the cluster present in conditions but not in. The overall P-value for the target cluster is computed as the proportion of cluster quality values of the random samples that exceed , where represents the overall cluster quality in all input datasets.
In both steps, we used incremental permutations to generate random samples in a computationally efficient way and regard a P-value and below 0. The cluster density for the weighted graphs was calculated as previously shown. More details are provided and discussed in Text S1.
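The per-condition test can be illustrated with a simple permutation sketch. Several choices here are assumptions made for the example and are not taken from the text: cluster quality is measured as the mean absolute off-diagonal co-expression among the cluster nodes, random samples are random gene sets of the same size, a fixed number of permutations replaces the incremental scheme, and ties are counted as exceeding the observed value.

```python
import numpy as np


def cluster_density(coexp, nodes):
    """Mean absolute off-diagonal co-expression among the given nodes (assumed quality measure)."""
    sub = np.abs(coexp[np.ix_(nodes, nodes)])
    k = len(nodes)
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))


def empirical_p_value(coexp, nodes, n_perm=1000, seed=0):
    """Proportion of random gene sets whose quality reaches the observed quality."""
    rng = np.random.default_rng(seed)
    observed = cluster_density(coexp, nodes)
    n_genes = coexp.shape[0]
    hits = 0
    for _ in range(n_perm):
        random_nodes = rng.choice(n_genes, size=len(nodes), replace=False)
        if cluster_density(coexp, random_nodes) >= observed:  # ties count (conservative)
            hits += 1
    return hits / n_perm


# Hypothetical example: 30 genes, a 5-gene cluster present in one condition only.
cluster = [0, 1, 2, 3, 4]
coexp_present = np.eye(30)
coexp_present[:5, :5] = 0.8
np.fill_diagonal(coexp_present, 1.0)
coexp_absent = np.eye(30)

p_present = empirical_p_value(coexp_present, cluster)
p_absent = empirical_p_value(coexp_absent, cluster)
```

Running the same test on each condition separately is what distinguishes the conditions in which the cluster is present (small empirical P-value) from those in which it is absent (large empirical P-value).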
The rat datasets consisted of microarray-based expression profiles for probe sets that were measured in adrenal, aorta, fat, kidney, left ventricle, liver and skeletal muscle tissues in a panel of recombinant inbred rat strains. RNA-seq data were expressed as fragments per kilobase of exon per million fragments mapped (FPKM) values and normalized on the log2 scale, yielding an expression matrix of in neocortex regions, which were analyzed by C3D.
SD, standard deviation measured over 20 replicated datasets; dashed line,. Top, computational time required by the C3D algorithm to analyze 1, genes in 25 conditions (top left) and 10, genes in 3 conditions (top right). We assessed whether rat cluster 1 genes were significantly co-expressed in human heart and liver tissues. We first selected the top 10, varying genes in each dataset using co-variance filtering and then calculated the partial correlation matrix.
We then tested whether the human-rat orthologous genes of the annotated rat cluster 1 genes had more significant partial correlations than expected in 10, randomly sampled networks. Out of genes in rat cluster 1, and had human-rat orthologous genes in the heart and liver expression datasets, and these included all Hsp and cardiomyopathy genes identified in the rat except for PLEC, which was not present in the human liver dataset. We report the density of the number of edges observed in 10, randomly sampled networks and the number of significant edges detected in each tissue (indicated by the red dot).
The dashed red line indicates the 95th percentile of the distribution. For each tissue, the P-values were calculated as follows:. Correlation between the solutions of the approximate HO GSVD (eigenvectors of ) and simulated cluster structures for different noise levels i. For each level of error variance (x-axes), independent replicates were generated and the absolute correlations between the first three eigenvectors of and the simulated patterns are reported as median and interquartile range (y-axes).
The quality of the pattern reconstruction decreases when the error variance increases for all cluster structures. As expected, the drop is higher for the cluster structure that is unique to one condition since it explains a lower amount of the total variance across the three conditions. Please refer to Text S1 for additional details on the simulated data.
Disease Enrichment for rat cluster 1. R: ratio of enrichment for disease-associated genes; rawP: enrichment p-value from hypergeometric test; adjP: enrichment p-value adjusted for multiple testing.
Functional annotation of co-expression clusters identified in human fetal neocortex.
Conceived and designed the experiments: XX EP. Developed the code for the C3D analyses: XX. Coordinated the study: LB EP.
Abstract Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions e.
Another four and seven putative TF coding genes were also found to be hubs in the up and down sets, respectively (Table S5). The search for their orthologous genes in other ascomycetes revealed that none had been characterized, and therefore, their function remains to be elucidated. The relevant role of the characterized ortholog hub genes identified in this study highlights the central position of these nodes in the up and down sets and supports the necessity of investigating their putative regulatory influence on the T.
Finally, other genes encoding proteins of different functions were also identified as hub nodes in the up and down sets (Table S5). In the up set, three CAZymes jgi , , , two SSCRPs jgi , , the translocon-associated protein (TRAP) jgi and the Sec61 beta subunit jgi from the endoplasmic reticulum (ER) translocon, ion and amino acid transporters, and various unknown proteins were found.
In the down set, other hubs were discovered, including seven CAZymes jgi , , , , , , , six proteases jgi , , , , , , a SWI-SNF chromatin-remodeling complex protein jgi and unknown proteins (Table 2, Table S5). This vast array of hub genes demonstrates that a great diversity of genes is central in the co-expressed modules.
However, most of them are still not characterized, and efforts are required to elucidate their function in fungal physiology. The prediction of XBS based on the promoter of cellulase and hemicellulase coding genes was used in this study to verify the genes possibly regulated by Xyr1 in the presence of a complex lignocellulosic biomass. To validate our pipeline of regulatory motif predictions, the XBS predicted in the cbh1 and xyn1 promoters were compared with those from other studies. According to Ries et al. However, it was not predicted in this study.
This was likely to be due to the differences between the strains and pipelines. In addition, Kiesenhofer et al. In addition to cbh1 , the XBS predicted in the xyn1 promoter were also compared with the ones previously reported Rauscher et al. For example, Rauscher et al. Point mutations in each of these two motifs caused a substantial decrease in the reporter activity of the gene glucose oxidase and showed that they are critical for the transcriptional activation of xyn1 in the presence of xylan, one of the primary sugar inducers of hemicellulases Rauscher et al.
Some years later, Furukawa et al. Finally, Kiesenhofer et al. In summary, the previous studies indicate that some of the XBS predicted in this study could be functional in T. In addition to the cbh1 and xyn1 , the promoter of the cbh2 gene also had two XBS predicted in this study that were shared with its homologs in other T. Based on the assumption that Xyr1 is essential to the induction of the CAZymes and sugar transporters, it was not surprising that genes related to carbohydrate transport and metabolism, such as CAZymes and putative sugar transporters, were the most abundant upregulated genes that had XBS in the KOG analyses.
Alternatively, the XBS were enriched in the genes of the secondary metabolites biosynthesis, transport, and catabolism class in the down set (Figure 3), which suggests that Xyr1 participates in the regulation of secondary metabolism. To verify the putative targets of Xyr1, the direct co-expressed neighbors of the xyr1 node were retrieved from module1, and we searched for the XBS in the gene promoters (Table S8).
NirA is a nitrate-specific transcription factor that modulates nitrogen metabolite repression (NMR), and this TF accumulates in the nucleus in the presence of nitrate or nitrite and a low concentration of assimilable nitrogen sources, such as ammonium. The nitrogen metabolism regulator AreA interacts physically with NirA, and the complex formed activates the genes for nitrate assimilation (Gallmetzer et al.
The T. Three genes jgi , , encoding putative transporters of non-sugar solutes were co-expressed with xyr1 and had XBS in their promoters Table S8. The gene jgi encodes a putative calcium transporter with 10 transmembrane domains, and it was found as a hub in the coral1 module Table S8. Therefore, it is worth investigating the role of this putative calcium transporter in the induction of the genes responsive to lignocellulose degradation, since it could transport cations that activate gene expression.
In addition to the non-sugar transporters, 15 putative sugar transporter coding genes were co-expressed with xyr1 and demonstrated at least one XBS predicted at their promoter region Table S7. One of them was considered to be a hub gene encoding a protein annotated as allantoate permease jgi , but its participation in the fungal response toward the biomass deconstruction remains unclear. Most of these putative transporters are members of the MFS superfamily, and therefore, could transport a variety of solutes, including sugar and amino acids.
Sloothaak et al. Several candidates were identified, and one RUT-C jgi ; QM6a ortholog: jgi was found upregulated and co-expressed with xyr1 in this study Table S7. Unexpectedly, this putative xylose transporter coding gene had one XBS predicted in the promoter and showed an increasing expression profile in bagasse. The prediction of an XBS in the promoter region of this gene suggests that it could be regulated by Xyr1 and could be involved in the xylose assimilation after the hemicellulose breakdown of bagasse.
A large number of interesting genes encoding CAZymes, putative sugar and ion transporters, as well as TFs and proteins with a putative regulatory role were highly co-expressed within some modules. The prediction of XBS in the promoters confirmed the influence of Xyr1 on the regulation of CAZyme coding genes and enabled the identification of new putative targets of this master regulator.
Hub nodes were also found within the modules, and many of them had not been characterized. Several CAZymes, accessory proteins and uncharacterized protein coding genes were co-expressed with xyr1. Finally, this study provided an extensive number of genes that were co-expressed in bagasse. These genes have the potential to contribute to the lignocellulose degradation and to the development of T.
GB performed the analyses, and RdS carried out the data processing. DR-P and MC supervised the study and performed the analyses. JO supervised the study and planned the analyses. All the authors wrote the draft and approved its final version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The handling editor declared a shared affiliation, though no other collaboration, with several of the authors, RS and DR, at the time of the review. Amore, A. Regulation of cellulase and hemicellulase gene expression in fungi. Genomics 14, — Bader, G. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform.
Bailey, T. Nucleic Acids Res. Benocci, T. ARA1 regulates not only l-arabinose but also d-galactose catabolism in Trichoderma reesei. FEBS Lett. Bi, D. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer. Bodenheimer, A. Crystal structures of wild-type Trichoderma reesei Cel7A catalytic domain in open and closed states. Borin, G. Comparative transcriptome analysis reveals different strategies for degradation of steam-exploded sugarcane bagasse by Aspergillus niger and Trichoderma reesei.
BMC Genomics Comparative secretome analysis of Trichoderma reesei and Aspergillus niger during growth on sugarcane biomass. PLoS One e Borisova, A. Correlation of structure, function and protein dynamics in GH7 cellobiohydrolases from Trichoderma atroviride, T. Biofuels Bornscheuer, U. Enzymatic degradation of ligno cellulose. Chemie Int. Castro, L. Expression pattern of cellulolytic and xylanolytic genes regulated by transcriptional factors XYR1 and CRE1 are affected by carbon source in Trichoderma reesei. Gene Expr.
Patterns 14, 88— Chen, L. Chen, Y. Chinnici, J. Neurospora crassa female development requires the PACC and other signal transduction pathways, transcription factors, chromatin remodeling, cell-to-cell fusion, and autophagy. Cologna, N. Exploring Trichoderma and Aspergillus secretomes: proteomics approaches for the identification of enzymes of biotechnological interest. Enzyme Microb. Criscuolo, A. AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads.
Genomics , — Daly, P. Transcriptomic responses of mixed cultures of ascomycete fungi to lignocellulose using dual RNA-seq reveal inter-species antagonism and limited beneficial effects on CAZyme expression. Fungal Genet. Dekhang, R. The Neurospora transcription factor ADV-1 transduces light signals and temporal information to control rhythmic expression of genes involved in cell fusion.
G3 7, — Derntl, C. Novel strategies for genomic manipulation of Trichoderma reesei with the purpose of strain engineering. Transcription factor Xpp1 is a switch between primary and secondary fungal metabolism. Identification of the main regulator responsible for synthesis of the typical yellow pigment produced by Trichoderma reesei. Diamond, A. The anaphase promoting complex targeting subunit Ama1 links meiotic exit to cytokinesis during sporulation in Saccharomyces cerevisiae. Cell 20, — Dos Santos Castro, L. Comparative metabolism of cellulose, sophorose and glucose in Trichoderma reesei using high-throughput genomic and proteomic analyses.
Druzhinina, I. A complete annotation of the chromosomes of the cellulase producer Trichoderma reesei provides insights in gene clusters, their expression and reveals genes required for fitness. Genetic engineering of Trichoderma reesei cellulases and their production. Eibinger, M. Functional characterization of the native swollenin from Trichoderma reesei : study of its possible role as C1 factor of enzymatic lignocellulose conversion.
Development of a low-cost cellulase production process using Trichoderma reesei for Brazilian biorefineries. Ene, I. Host carbon sources modulate cell wall architecture, drug resistance and virulence in a fungal pathogen. Free, S. Friedmann, J. Dunlap, S. Furukawa, T. Identification of specific binding sites for XYR1, a transcriptional activator of cellulolytic and xylanolytic genes in Trichoderma reesei. Gallmetzer, A. Reversible oxidation of a conserved methionine in the nuclear export sequence determines subcellular distribution and activity of the fungal nitrate regulator NirA.
PLOS Genet. Gonzalez-Valbuena, E. Metrics to estimate differential co-expression networks. BioData Min. Gourlay, K. Swollenin aids in the amorphogenesis step during the enzymatic hydrolysis of pretreated biomass. Gupta, A. Sustainable bio-ethanol production from agro-residues: A review. Energy Rev. Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates. Cell Fact. Han, J. Evidence for dynamically organized modularity in the yeast protein—protein interaction network. Nature , 88— He, R. Trpac1, a pH response transcription regulator, is involved in cellulase gene expression in Trichoderma reesei.
Horta, M. Network of proteins, enzymes and genes linked to biomass degradation shared by Trichoderma species. Huang, Z. A novel major facilitator transporter TrSTR1 is essential for pentose utilization and involved in xylanase induction in Trichoderma reesei. The glucose repressor gene cre1 of Trichoderma : isolation and expression of a full length and a truncated mutant form. Junqueira, T. Techno-economic analysis and climate change impacts of sugarcane biorefineries considering different time horizons. Kameshwar, A. Gene expression metadata analysis reveals molecular mechanisms employed by Phanerochaete chrysosporium during lignin degradation and detoxification of plant extractives.
Kiesenhofer, D. Influence of cis element arrangement on promoter strength in Trichoderma reesei. Kim, D. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. Korripally, P. Regulation of gene expression during the onset of ligninolytic oxidation by Phanerochaete chrysosporium on spruce wood. Langfelder, P.
This approach identified meaningful co-expression modules significantly related to tumor grade and stage, and revealed hub genes contributing to extracellular matrix interactions and mitosis in SOC. Our study provides a novel and broad application platform for the identification of SOC gene signatures, and may be useful to characterize new molecular targets and develop effective therapeutic strategies.
Gene co-expression networks were then built among these ten datasets d1 to d
Table 1: General information for the three modeling and five validation datasets.
Figure 1: Weighted gene co-expression network of SOC. (A) Network topology analysis was employed to choose a soft-thresholding power to achieve scale-free topology in all modeling sets. Consensus gene dendrogram and module colors denote correspondence. (C) Correlation values of blue and ivory module-trait relationships across ten random sampling datasets. (D) Correlation values of yellow and white module-trait relationships across ten random sampling datasets.
For each module, we calculated correlations between gene expression and clinical features such as tumor stage, grade, recurrence time, vital time, recurrence status, and vital status.
The last four features were regarded as prognostic traits. Multiple modules were associated with one or more traits. In particular, four modules showed consistent correlations across the ten sets; each is named after its representative color: blue, ivory, yellow, and white.
For instance, the blue and the ivory modules were related to tumor stage; the yellow module was related to grade; and the white module was related to grade in nine out of ten sets. By contrast, correlations between gene expression patterns and prognostic traits were found in only a minority of the ten sets. The correlation indexes are shown in Supplementary Figure 1, and the significance of module-trait relationships is shown in Figures 1C to 1E. As recommended by the WGCNA author, all uncharacterized genes were assigned to the gray module, which should have a Z score lower than that of most other modules [18].
The Z scores of the gray module and the four stage-associated or grade-associated modules were The blue module was regarded as the representative stage-associated module and the yellow module as the representative grade-associated module, because both showed higher conservation and a consistent association with stage or grade.
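The module-trait correlations described above can be illustrated with a minimal sketch. It assumes that a module is summarized by its eigengene (the first principal component of the standardized expression of its genes) and that the eigengene is compared with a numerically coded trait by Pearson correlation. The function names, the toy expression values, and the stage coding are hypothetical and are not taken from the study.

```python
import math

def standardize(values):
    """Scale one gene's expression profile to mean 0 and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def module_eigengene(expr, iterations=200):
    """First principal component (one value per sample) of a genes x samples matrix.

    Uses power iteration on the samples x samples Gram matrix of the
    standardized genes, so no external library is required.
    """
    z = [standardize(gene) for gene in expr]
    n = len(z[0])
    gram = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    vec = list(z[0])  # start from a profile that lies in the data's row space
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Orient the eigengene so that it follows the average expression profile.
    avg = [sum(g[i] for g in z) / len(z) for i in range(n)]
    if sum(a * v for a, v in zip(avg, vec)) < 0:
        vec = [-v for v in vec]
    return vec

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical module of three genes measured in six samples.
expr = [
    [1.0, 1.2, 2.1, 2.0, 3.1, 3.3],
    [0.5, 0.7, 1.4, 1.6, 2.2, 2.5],
    [2.0, 2.1, 2.9, 3.2, 4.1, 3.9],
]
stage = [1, 1, 2, 2, 3, 3]  # hypothetical numeric coding of tumor stage

eigengene = module_eigengene(expr)
r = pearson(eigengene, stage)
print(f"module-trait correlation: {r:.3f}")
```

Repeating this calculation for each module and each sampled dataset gives one correlation per module, trait, and dataset, which is the kind of table that can then be checked for consistency across the ten sets.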
Supplementary Table 2 lists the gene symbols in these four modules. The blue and yellow modules comprised and genes, respectively. The genes with the strongest connections within the blue and yellow modules from each set were extracted to show their connections and identify hub genes (Supplementary Figures 2 and 3). Within each network, node sizes, font sizes, and color depth are proportional to connectivity (the sum of in-module degrees). Shared hub genes were readily discernible in all ten sets. To compare and integrate our gene co-expression networks with protein interaction data, we extracted a high-quality protein interaction network from the Search Tool for the Retrieval of Interacting Genes (STRING), which only contains interactions with a combined score above Nodes were defined as individual genes in the network, and edges were defined as the interactions between genes.
Subsequently, we identified the genes shared between each module and the STRING network gene set and extracted the corresponding subnetworks. As shown in Figures 2A and 3A, the blue module subnetwork contained nodes and edges, while the yellow module subnetwork contained nodes and edges. Since the subnetworks were extracted from the high-quality portion of the STRING protein interaction database, which is derived from traceable interaction experiments, the data suggest that a tight regulatory relationship exists among these module genes in nature.
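The subnetwork extraction described above amounts to a filter on an edge list. The sketch below assumes STRING interactions are available as (gene, gene, combined score) triples with scores on the 0 to 1000 scale used in STRING download files. The gene names and the cutoff of 900 are placeholders chosen for illustration; the cutoff used in the study is not given here.

```python
def module_subnetwork(string_edges, module_genes, min_score):
    """Return (nodes, edges) of the STRING subnetwork restricted to one module.

    An edge is kept only if its combined score exceeds min_score and both of
    its endpoints are module genes. Edges are stored as sorted pairs so that
    A-B and B-A count once.
    """
    module = set(module_genes)
    edges = set()
    for gene_a, gene_b, score in string_edges:
        if score > min_score and gene_a in module and gene_b in module and gene_a != gene_b:
            edges.add(tuple(sorted((gene_a, gene_b))))
    nodes = {gene for edge in edges for gene in edge}
    return nodes, edges

# Hypothetical STRING-style edge list: (gene, gene, combined score).
string_edges = [
    ("GENE_A", "GENE_B", 950),
    ("GENE_B", "GENE_A", 950),  # same interaction listed in the other direction
    ("GENE_B", "GENE_C", 910),
    ("GENE_C", "GENE_D", 400),  # below the illustrative cutoff
    ("GENE_A", "GENE_X", 990),  # GENE_X is not a module gene
]
module_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

nodes, edges = module_subnetwork(string_edges, module_genes, min_score=900)
print(len(nodes), "nodes and", len(edges), "edges")
```

Module genes with no retained interaction (GENE_D in this toy example) drop out of the subnetwork, which is why the node count of a subnetwork can be smaller than the module size.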
Figure 2: Blue module gene network and enrichment analysis. (A) Top hub genes of the blue module are shown in blue; gene importance is indicated by circle diameter and color depth, in descending order. (B) Gene ontology and pathway enrichment analysis of blue module genes.
Figure 3: Yellow module gene network and enrichment analysis. (A) Top hub genes of the yellow module are shown in yellow; gene importance is indicated by circle diameter and color depth, in descending order. (B) Gene ontology and pathway enrichment analysis of yellow module genes.
A comparison of the top 25 hub genes throughout the co-expression network among the ten datasets, and of the mutual subnetwork genes, is summarized in Table 2 (blue module) and Table 3 (yellow module).
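Hub ranking of this kind can be sketched as follows, assuming connectivity is the sum of a gene's adjacency weights to the other genes in its module, as described for the network figures. The adjacency values, gene labels, and the choice of the top two genes are hypothetical and serve only to keep the example small.

```python
def intramodular_connectivity(adjacency, module_genes):
    """Sum of each gene's adjacency weights to the other genes in the module."""
    module = list(module_genes)
    return {
        gene: sum(adjacency[gene][other] for other in module if other != gene)
        for gene in module
    }

def top_hubs(connectivity, n):
    """The n genes with the highest connectivity (ties broken by gene name)."""
    return sorted(connectivity, key=lambda gene: (-connectivity[gene], gene))[:n]

def shared_hubs(hub_lists):
    """Genes that appear among the top hubs of every dataset."""
    shared = set(hub_lists[0])
    for hubs in hub_lists[1:]:
        shared &= set(hubs)
    return shared

# Hypothetical symmetric adjacency weights for one module in two sampled datasets.
adjacency_d1 = {
    "G1": {"G2": 0.9, "G3": 0.8, "G4": 0.1},
    "G2": {"G1": 0.9, "G3": 0.7, "G4": 0.2},
    "G3": {"G1": 0.8, "G2": 0.7, "G4": 0.1},
    "G4": {"G1": 0.1, "G2": 0.2, "G3": 0.1},
}
adjacency_d2 = {
    "G1": {"G2": 0.8, "G3": 0.9, "G4": 0.2},
    "G2": {"G1": 0.8, "G3": 0.5, "G4": 0.1},
    "G3": {"G1": 0.9, "G2": 0.5, "G4": 0.3},
    "G4": {"G1": 0.2, "G2": 0.1, "G3": 0.3},
}
genes = ["G1", "G2", "G3", "G4"]

hubs_d1 = top_hubs(intramodular_connectivity(adjacency_d1, genes), n=2)
hubs_d2 = top_hubs(intramodular_connectivity(adjacency_d2, genes), n=2)
common = shared_hubs([hubs_d1, hubs_d2])
print("shared hub genes:", sorted(common))
```

Intersecting the per-dataset hub lists is one simple way to express "shared hub genes"; a less strict variant would keep genes that appear in most, rather than all, of the sampled datasets.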
The regulatory networks among these hub genes, although complex, were organized in a similar topology. All significant terms enriched in the above annotation systems are represented as a word cloud to facilitate comparison of the relative significance of enriched terms, where the grayscale and font size of each term are proportional to the adjusted p value derived from the enrichment analysis. Thus, the enriched terms in the annotation systems were mostly related to mitosis. These findings corroborate previous research implicating extensive cell proliferation and accelerated DNA replication as fundamental characteristics of tumor cells.
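The adjusted p values that drive such a word cloud can be sketched under two assumptions that the text does not confirm: that enrichment is tested with a one-sided hypergeometric (over-representation) test, and that p values are adjusted with the Benjamini-Hochberg procedure. All counts and p values below are invented for illustration.

```python
from math import comb

def hypergeom_enrichment_p(universe, annotated, module_size, overlap):
    """P(X >= overlap) when module_size genes are drawn without replacement
    from a universe in which `annotated` genes carry the term."""
    upper = min(annotated, module_size)
    favourable = sum(
        comb(annotated, k) * comb(universe - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, module_size)

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy term: 4 of 10 genes are annotated and a 3-gene module contains 3 of them.
p_term = hypergeom_enrichment_p(universe=10, annotated=4, module_size=3, overlap=3)

# Hypothetical raw p values for four terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = bh_adjust(raw)
print(p_term, adjusted)
```

Terms would then be drawn with a size that grows as the adjusted p value shrinks, for example in proportion to its negative logarithm.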
For a more intuitive depiction of the expression distribution of module genes related to SOC stages, we calculated statistical significance via Kruskal-Wallis tests and plotted the module eigengene expression distribution for stages in each modeling dataset i. Meanwhile, positive correlations between eigengene expression and stages were universally demonstrated in all boxplots (Figure 4).
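As a minimal illustration of the Kruskal-Wallis comparison, the sketch below computes the H statistic with the usual tie correction in plain Python. The eigengene values and the three stage groups are hypothetical. The p value line relies on the chi-square approximation and is exact only for the three-group case shown (two degrees of freedom); with groups this small the approximation is rough.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for a list of samples."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    ranks = {}      # value -> average rank among the pooled observations
    tie_term = 0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    rank_sum_term = sum(
        sum(ranks[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))

# Hypothetical module eigengene values grouped by tumor stage.
stage_groups = [
    [-0.8, -0.6, -0.5],  # lowest stage
    [-0.1, 0.0, 0.2],    # intermediate stage
    [0.5, 0.7, 0.9],     # highest stage
]

h_stat = kruskal_wallis_h(stage_groups)
# Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
p_value = math.exp(-h_stat / 2)
print(f"H = {h_stat:.2f}, approximate p = {p_value:.4f}")
```

A significant H only says that at least one stage differs from the others; the pairwise p values shown alongside the overall p value in the boxplots address which stages differ.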
Figure 4: Distributions of blue module eigengene expression among traits in modeling and validation datasets. Overall p values and pairwise p values are shown. (A) genes; (B) top 7 genes.
Since this co-expression network was identified in three public datasets and the correlation of its eigengene expression with stage was validated in each of them, we tested whether this correlation holds across SOCs by examining the other five independent SOC datasets from the curatedOvarianData package (GSE, GSE, TCGA).
General information for the eight modeling or validation datasets examined is shown in Table 1. We calculated the eigengene expressions of module genes in these five validation datasets, and estimated the expression distribution among different stages using nonparametric tests. The distribution, mean value, and statistical results are shown in Figure 4A.
In the other three datasets, the p values were greater than 0. Because a smaller gene set is more practical for clinical translation, we attempted to use the top seven hub genes in place of the full set of blue module genes. Significant differences in yellow module eigengene expression were found between tumor grades in all modeling and validation datasets. Similarly, positive correlations between eigengene expression and tumor grade were demonstrated in all boxplots.
There were significant differences in the eigengene expression values of the top seven hub genes between tumor grades in the three modeling datasets and in four validation datasets (Figure 5B).
Figure 5: Distributions of yellow module eigengene expression among traits in modeling and validation datasets.
In this study we integrated large-scale transcriptional profiling, incorporating three modeling datasets with SOC samples, to identify robust co-expression modules associated with cancer characteristics. Our long-term goal was to provide insights into disease biology and diagnostic classification, which may compensate for the limited objectivity of postoperative pathological diagnosis and guide early-phase clinical therapeutic applications.
We also determined that co-expression networks reflect causative relationships between gene-gene interactions. First, this study constructed two SOC-stage-specific modules (blue and ivory) and two grade-specific modules (yellow and white) based on ten random datasets sampled from SOC samples. Second, we identified the shared hub genes in these ten datasets and found mutual subnetwork hub genes from the high-quality STRING protein interaction database for the blue and yellow modules.
Third, we illustrated hub gene interactions and performed gene enrichment analysis on GO and pathway terms. Extracellular matrix organization genes were enriched in the stage-related module (blue), while cell cycle genes were enriched in the grade-related module (yellow). Finally, we validated the correlations between module eigengene expression and tumor stage or grade in the modeling datasets and in other public validation datasets that were not used to build the co-expression networks; the correlations proved robust in these independent datasets.