goreparent clusters GO terms into groups of directly related terms that share a common parent. Two arguments control how far the search for common parents extends. max_parents (default=2) defines how many generations up from the initial ‘child’ GO terms the search is performed. max_from_top (default=3) defines how many levels down from the top-level GO terms (“biological_process”, “molecular_function”, “cellular_component”) are considered outside of the search frame; excluding these broad terms should enable more specific clusters to be identified.
library(goreparent)
enrich_file = system.file("extdata","go_enrich_results.txt", package = "goreparent")
go_input = read.table(enrich_file, sep="\t", header=TRUE)
go_parents = add_go_groups(go_input, descriptive_parent=FALSE)
## grouping 75 GO terms by common parents
## defined 13 parental groups
# plot with terms collapsed into groups
plot_go_parents(go_parents)
# plot with expanded GO terms for the top 2 groups
plot_go_parents(go_parents, collapse=FALSE, n_top=2)
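To make the roles of max_parents and max_from_top more concrete, here is a minimal base R sketch on a made-up toy ontology. It is not goreparent's implementation: the term names, the parent_of lookup, and the helpers depth_from_top(), ancestors_within() and common_parents() are all hypothetical. It also assumes that a term is "outside the search frame" when it sits fewer than max_from_top steps below the root, which is only our reading of the description above.

```r
# Toy ontology (hypothetical): each term points to its single parent.
parent_of <- c(
  broad    = "root",
  mid      = "broad",
  specific = "mid",
  child_a  = "specific",
  child_b  = "specific",
  child_c  = "mid"
)

# Number of steps from a term up to the root (the root itself is 0).
depth_from_top <- function(term) {
  depth <- 0
  while (term %in% names(parent_of)) {
    term <- parent_of[[term]]
    depth <- depth + 1
  }
  depth
}

# Ancestors reachable within `max_parents` generations of a term.
ancestors_within <- function(term, max_parents) {
  found <- character(0)
  for (i in seq_len(max_parents)) {
    if (!term %in% names(parent_of)) break
    term <- parent_of[[term]]
    found <- c(found, term)
  }
  found
}

# Ancestors shared by all terms, keeping only those that lie at least
# `max_from_top` steps below the root (assumed meaning of the search frame).
common_parents <- function(terms, max_parents = 2, max_from_top = 3) {
  shared <- Reduce(intersect,
                   lapply(terms, ancestors_within, max_parents = max_parents))
  depths <- vapply(shared, depth_from_top, numeric(1))
  shared[depths >= max_from_top]
}

common_parents(c("child_a", "child_b"))                    # "specific"
common_parents(c("child_a", "child_c"))                    # character(0)
common_parents(c("child_a", "child_c"), max_from_top = 2)  # "mid"
```

In this toy case, lowering max_from_top from 3 to 2 lets the broader term "mid" act as a common parent, so two terms that were previously ungrouped now share a group, at the cost of a less specific label.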
Sometimes, if max_from_top is set too low, or too many terms are grouped under a few overly broad parents, you may want to further refine or describe these groups. This can be achieved in one of three ways: altering max_from_top, adding terms to ignore_terms, or adding group descriptions with descriptive_parent=TRUE.
go_parents_l2 = add_go_groups(go_input, max_from_top = 2, descriptive_parent=FALSE)
## grouping 75 GO terms by common parents
## defined 13 parental groups
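library(dplyr)  # provides %>%, rename() and arrange()
# count how many GO terms fall under each parent group
table(go_parents_l2$parent_description) %>%
as.data.frame() %>%
rename(parent_description=Var1) %>%
arrange(desc(Freq)) %>%
knitr::kable()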
parent_description | Freq |
---|---|
regulation of developmental process | 15 |
cellular component organization | 13 |
cell differentiation | 11 |
cell-cell signaling | 6 |
nervous system process | 6 |
neuron death | 6 |
cell growth | 5 |
anatomical structure development | 4 |
regulation of biological quality | 3 |
regulation of cellular process | 3 |
intrinsic apoptotic signaling pathway | 1 |
locomotory behavior | 1 |
multicellular organismal signaling | 1 |
As we saw earlier, using max_from_top=3 (the default) we found terms associated with neural processes, whereas here, using max_from_top=2, we get broader categories such as "cell differentiation" and "cell-cell signaling". If we look at the GO terms contained within these clusters, we find that they are clearly related to neural processes.
go_parents_l2 %>%
filter(parent_description %in% c("cell differentiation", "cell-cell signaling")) %>%
select(parent_description, Description) %>% arrange(parent_description) %>%
knitr::kable()
parent_description | Description |
---|---|
cell differentiation | axonogenesis |
cell differentiation | axon development |
cell differentiation | dendrite morphogenesis |
cell differentiation | dendrite development |
cell differentiation | regulation of neuron projection development |
cell differentiation | dendritic spine morphogenesis |
cell differentiation | dendritic spine development |
cell differentiation | positive regulation of neuron projection development |
cell differentiation | neuron migration |
cell differentiation | neuron projection guidance |
cell differentiation | axon guidance |
cell-cell signaling | modulation of synaptic transmission |
cell-cell signaling | regulation of synaptic plasticity |
cell-cell signaling | long-term synaptic potentiation |
cell-cell signaling | Wnt signaling pathway |
cell-cell signaling | cell-cell signaling by wnt |
cell-cell signaling | canonical Wnt signaling pathway |
Setting max_from_top to a higher value
go_parents_l2 %>%
select(ID, Description, FDR) %>%
add_go_groups(max_from_top = 3, descriptive_parent = FALSE) %>%
left_join(go_parents_l2[,c("Description", "parent_description")], by = "Description", suffix=c(".level3", ".level2")) %>%
arrange(parent_description.level2, parent_description.level3) %>% select("Description", starts_with("parent_description")) %>%
filter(parent_description.level2 %in% c("cell differentiation","cell-cell signaling")) %>%
knitr::kable()
## grouping 75 GO terms by common parents
## defined 13 parental groups
Description | parent_description.level3 | parent_description.level2 |
---|---|---|
axonogenesis | nervous system development | cell differentiation |
axon development | nervous system development | cell differentiation |
dendrite morphogenesis | nervous system development | cell differentiation |
dendrite development | nervous system development | cell differentiation |
regulation of neuron projection development | nervous system development | cell differentiation |
dendritic spine morphogenesis | nervous system development | cell differentiation |
dendritic spine development | nervous system development | cell differentiation |
positive regulation of neuron projection development | nervous system development | cell differentiation |
neuron migration | nervous system development | cell differentiation |
neuron projection guidance | nervous system development | cell differentiation |
axon guidance | nervous system development | cell differentiation |
Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
cell-cell signaling by wnt | cell-cell signaling by wnt | cell-cell signaling |
canonical Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
modulation of synaptic transmission | chemical synaptic transmission | cell-cell signaling |
regulation of synaptic plasticity | chemical synaptic transmission | cell-cell signaling |
long-term synaptic potentiation | chemical synaptic transmission | cell-cell signaling |
Adding terms to ignore_terms can be helpful if most groups are descriptive of their child terms, but a few are not. Here, "cell-cell signaling" is too broad, but "neuron death" is not. To exclude the broad parent, specify its GO ID: add_go_groups(ignore_terms = "GO:0007267")
# GO description for cell-cell signaling
id2term("GO:0007267")
## [1] "cell-cell signaling"
go_parents_l2 %>%
filter(parent_description %in% c("cell-cell signaling", "neuron death")) %>%
select(ID, Description, FDR) %>%
add_go_groups(descriptive_parent = FALSE, ignore_terms = "GO:0007267") %>%
left_join(go_parents_l2[,c("Description", "parent_description")], by = "Description", suffix=c(".ignore_term", ".level2")) %>%
arrange(parent_description.level2, parent_description.ignore_term) %>% select("Description", starts_with("parent_description")) %>%
knitr::kable()
## grouping 12 GO terms by common parents
## defined 3 parental groups
Description | parent_description.ignore_term | parent_description.level2 |
---|---|---|
Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
cell-cell signaling by wnt | cell-cell signaling by wnt | cell-cell signaling |
canonical Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
modulation of synaptic transmission | modulation of chemical synaptic transmission | cell-cell signaling |
regulation of synaptic plasticity | modulation of chemical synaptic transmission | cell-cell signaling |
long-term synaptic potentiation | modulation of chemical synaptic transmission | cell-cell signaling |
neuron death | neuron death | neuron death |
neuron apoptotic process | neuron death | neuron death |
regulation of neuron death | neuron death | neuron death |
regulation of neuron apoptotic process | neuron death | neuron death |
negative regulation of neuron death | neuron death | neuron death |
negative regulation of neuron apoptotic process | neuron death | neuron death |
Setting max_children (here max_children = 100) is a more general way to remove parent terms that encompass too many child terms to be descriptive on their own.
go_parents_l2 %>%
select(ID, Description, FDR) %>%
add_go_groups(max_from_top = 2, descriptive_parent = FALSE, max_children = 100) %>%
left_join(go_parents_l2[,c("Description", "parent_description")], by = "Description", suffix=c(".max_child", ".level2")) %>%
arrange(parent_description.level2, parent_description.max_child) %>% select("Description", starts_with("parent_description")) %>%
filter(parent_description.level2 %in% c("cell differentiation","cell-cell signaling")) %>%
knitr::kable()
## grouping 75 GO terms by common parents
## defined 32 parental groups
Description | parent_description.max_child | parent_description.level2 |
---|---|---|
axonogenesis | axon development | cell differentiation |
axon development | axon development | cell differentiation |
dendrite morphogenesis | dendrite development | cell differentiation |
dendrite development | dendrite development | cell differentiation |
dendritic spine morphogenesis | dendrite development | cell differentiation |
dendritic spine development | dendrite development | cell differentiation |
neuron migration | neuron migration | cell differentiation |
neuron projection guidance | neuron projection guidance | cell differentiation |
axon guidance | neuron projection guidance | cell differentiation |
positive regulation of neuron projection development | positive regulation of cell projection organization | cell differentiation |
regulation of neuron projection development | regulation of neuron projection development | cell differentiation |
Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
cell-cell signaling by wnt | cell-cell signaling by wnt | cell-cell signaling |
canonical Wnt signaling pathway | cell-cell signaling by wnt | cell-cell signaling |
modulation of synaptic transmission | modulation of chemical synaptic transmission | cell-cell signaling |
regulation of synaptic plasticity | modulation of chemical synaptic transmission | cell-cell signaling |
long-term synaptic potentiation | modulation of chemical synaptic transmission | cell-cell signaling |
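The effect of a child-count cutoff can be pictured with a small base R sketch. This assumes that max_children is compared against the number of child terms each candidate parent has in the ontology, which is our reading of the description rather than something confirmed from the package source. The candidate names, the counts, and the helper keep_parents() are made up for illustration.

```r
# Hypothetical candidate parents with made-up child-term counts.
candidate_parents <- data.frame(
  parent     = c("broad parent", "intermediate parent", "specific parent"),
  n_children = c(2500, 80, 12),
  stringsAsFactors = FALSE
)

# Drop candidates that have more child terms than the cutoff allows.
keep_parents <- function(candidates, max_children) {
  candidates[candidates$n_children <= max_children, , drop = FALSE]
}

kept <- keep_parents(candidate_parents, max_children = 100)
kept$parent
# "intermediate parent" "specific parent"

# Among the survivors, the broadest remaining candidate.
kept$parent[which.max(kept$n_children)]
# "intermediate parent"
```

Unlike ignore_terms, a cutoff of this kind does not require naming each overly broad parent by its GO ID.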
The descriptive_parent=TRUE option works on clusters of GO terms that may not be closely related, and therefore also works on the output of other GO clustering algorithms. It is, however, currently very slow, so we recommend running it only on the most significant clusters by setting n_top to an integer.
The descriptive parent option uses make_descriptive_parent(), a function that looks for GO terms which are semi-related to all GO terms in a cluster. It searches for parent terms that are step-parents or uncles of the child terms, which allows one degree of separation when defining a parent. Candidate terms are then ranked, and the parent term with the highest relatedness is chosen. If the child GO terms cannot all be related to that one parent, even with this relaxed relationship, further terms are tried until all child terms are covered by one parent.
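The rank-then-check-coverage logic described above can be sketched in base R as follows. This is a toy illustration only, not the code inside make_descriptive_parent(): the relatedness scores are invented (1 for a direct parent, 0.5 for a relative one degree removed, 0 for unrelated), and summing them per candidate is an assumed ranking rule.

```r
# Made-up relatedness scores between candidate parents (rows) and the
# child terms of one cluster (columns).
relatedness <- matrix(
  c(1.0, 1.0, 0.0,
    0.5, 0.5, 0.5,
    1.0, 0.0, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(candidate = c("cand_1", "cand_2", "cand_3"),
                  child     = c("child_a", "child_b", "child_c"))
)

# Rank candidates by total relatedness (assumed scoring rule).
score  <- rowSums(relatedness)
ranked <- names(sort(score, decreasing = TRUE))

# A candidate covers the cluster only if it is related to every child.
covers_all <- apply(relatedness > 0, 1, all)

# Walk down the ranking until a candidate covers all child terms.
chosen <- ranked[covers_all[ranked]][1]
chosen
# "cand_2"
```

Here cand_1 has the highest total score but is unrelated to child_c, so the next candidate in the ranking, which is loosely related to all three children, is chosen instead.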
Using normal goreparent parental clustering + descriptive_parent=TRUE
go_parents_l2.descriptive = add_go_groups(go_input, descriptive_parent=TRUE, max_from_top=2, verbose=FALSE, n_top = 10)
plot_go_parents(go_parents_l2.descriptive)
Using make_descriptive_parent() separately, after add_go_groups()
This creates a table with new_parent_description in a separate column, which can be added to, or used to replace, the original ‘parent term’ in a format of your choosing.
go_parents_l2.add_desc = go_parents_l2 %>%
filter(parent_description %in% c("cellular component organization")) %>%
select(ID, Description, parent_description, FDR) %>%
make_descriptive_parent(replace_group_col = FALSE)
go_parents_l2.add_desc %>% head(2) %>% select(-FDR) %>% knitr::kable()
ID | Description | parent_description | new_parent_description |
---|---|---|---|
GO:0050808 | synapse organization | cellular component organization | synapse organization |
GO:0007416 | synapse assembly | cellular component organization | synapse organization |
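As a minimal base R sketch of the two uses mentioned above (replacing the original parent term, or adding the new description to it), the two rows shown in the table can be rebuilt by hand. The object names add_desc_example and replaced, and the combined_label column, are hypothetical; the label format is just one possible choice.

```r
# The two rows from the table above, rebuilt by hand for illustration.
add_desc_example <- data.frame(
  ID                     = c("GO:0050808", "GO:0007416"),
  Description            = c("synapse organization", "synapse assembly"),
  parent_description     = "cellular component organization",
  new_parent_description = "synapse organization",
  stringsAsFactors = FALSE
)

# Option 1: replace the original parent term with the new description.
replaced <- add_desc_example
replaced$parent_description <- replaced$new_parent_description

# Option 2: keep the original and append the new description in brackets.
add_desc_example$combined_label <- paste0(
  add_desc_example$parent_description,
  " (", add_desc_example$new_parent_description, ")"
)

unique(add_desc_example$combined_label)
# "cellular component organization (synapse organization)"
```

A combined label like this can be used as a single facet or axis label when plotting.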
library(ggplot2)
go_parents_l2.add_desc %>%
ggplot( aes(x=-log10(FDR), y=Description, col=-log10(FDR))) +
geom_point() +
facet_wrap(~parent_description+new_parent_description) +
theme_minimal()+
theme(panel.border = element_rect(fill=NA, linewidth=0.1)) + # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
theme(strip.text.x = element_text(face="bold", hjust=0))
We are aware of simplifyEnrichment and how it can cluster GO terms. However, we found that its clusters, while semantically related, do not come with a description that is easily converted to a string summarising which terms each cluster contains. Displaying them therefore still requires showing all the child terms and naming the clusters in a non-informative manner.
There is a built-in option to use simplifyEnrichment semantic similarity clustering within add_go_groups(), using the option method="cluster". For this method we HIGHLY recommend running with descriptive_parent=TRUE.
go_clusters = add_go_groups(go_input, method = "cluster", descriptive_parent=TRUE)
plot_go_parents(go_clusters)