Regulation of gene expression by alternative polyadenylation in health and disease.
Bin Tian, SHAN Yu, Qiang Zhang (2026). Regulation of gene expression by alternative polyadenylation in health and disease. Nature Reviews Genetics, 27(5), 385-404. https://doi.org/10.1038/s41576-025-00928-w
PMID: 41540274
Abstract
More than half of the human protein-coding genes display alternative polyadenylation (APA), whereby 3'-end processing of the nascent RNA takes place at different sites. APA leads to mRNA isoforms containing different 3' untranslated regions (3'UTRs), which generally modulate mRNA metabolism in cis but can also exert cellular functions in trans. In addition, intronic APA alters protein sequences at the carboxy-terminal region or inhibits gene expression through premature transcription termination. APA is increasingly recognized as a key layer of transcriptomic regulation that defines cell identity and proliferation and/or differentiation states, as well as controlling cellular responses to environmental cues. The relevance of APA for human health is highlighted by the many pathological conditions that are associated with APA dysregulation, including cancer, developmental disorders and neurodegeneration, as well as the disease risks associated with a growing number of genetic variations shown to affect APA. Here, we discuss physiological and pathological APA dynamics, the human mutations and genetic variants that are associated with changes in APA, and our current understanding of the functional effects and regulatory mechanisms of APA.
Introduction
Almost all eukaryotic mRNAs and long non-coding RNAs (lncRNAs; generally, longer than 200 nucleotides) that are transcribed by RNA polymerase II (RNAPII) use cleavage and polyadenylation (CPA) for 3′-end maturation (Fig. 1a). The CPA site (also known as the polyadenylation site (PAS)) is recognized by the CPA machinery through interactions between RNA motifs surrounding the PAS (known as PAS motifs) and the RNA-binding proteins (RBPs) in the CPA machinery (Box 1). CPA comprises two coupled enzymatic reactions (Fig. 1a) – cleavage of the nascent RNA and addition of a polyadenosine (poly(A)) tail (of up to 250 adenosines in human cells). Whereas the cleavage step affects transcription termination, which takes place within a few kilobases after the PAS1 (Fig. 1a), the polyadenylation step has important roles in mRNA metabolism, such as nuclear export, stability and translation2. CPA is therefore a nexus for gene expression, connecting nuclear events with cytoplasmic fates for gene transcripts. In accord with this, mutations of CPA factor genes as well as those of PAS motifs in genes with crucial cellular functions have been implicated in many human diseases (Supplementary Tables 1 and 2).
More than half of the human mRNA genes harbour multiple PASs, leading to the expression of alternative polyadenylation (APA) isoforms (Fig. 1b). Those genes that display APA tend to have ubiquitous expression across cell types3 and a long evolutionary history4. An APA event can be classified as one of two types depending on its functional consequences and relationship with splicing: APA in a terminal exon, typically the 3′-most one, alters the length of the 3′ untranslated region (3′UTR) of mRNA and is hence named 3′UTR-APA (also known as tandem APA); whereas APA taking place upstream of the 3′-most exon, mainly in introns, leads to changes in both the 3′UTR and the coding sequence, and has been referred to variously as coding region APA (CR-APA), alternative last exon (ALE), premature CPA (PCPA) or intronic polyadenylation (IPA; the term we use here) (Fig. 1b). Depending on the splicing context, an IPA transcript could end with either a composite internal–terminal exon or a skipped terminal exon (Fig. 1b); this difference has important mechanistic and functional implications (see below). Across mammals, 3′UTR-APA events are generally more conserved than IPA events5; about 50% and 20% of human genes that display APA have conserved 3′UTR-APA and IPA events, respectively, in rodents5.
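The two categories can be illustrated with a small sketch that places a PAS coordinate against a simplified exon model of one transcript. This is a minimal illustration under stated assumptions, not a published annotation tool. The `classify_pas` function, the plus-strand gene, the 1-based inclusive coordinates and the example exon structure are all hypothetical. The sketch also does not separate composite internal–terminal exons from skipped terminal exons, because that distinction needs splice-junction evidence.

```python
from typing import List, Tuple

# Hypothetical exon model: one plus-strand transcript, exons as (start, end)
# genomic coordinates (1-based, inclusive), ordered 5' to 3'.
Exon = Tuple[int, int]


def classify_pas(pas: int, exons: List[Exon]) -> str:
    """Assign a PAS position to an APA category relative to the exon model."""
    gene_start = exons[0][0]
    last_start, last_end = exons[-1]
    if pas < gene_start or pas > last_end:
        # Beyond the annotated gene body (e.g. an unannotated extension or a DoG-like site).
        return "outside annotated gene"
    if pas >= last_start:
        # Within the 3'-most exon: typically alters only 3'UTR length,
        # assuming the PAS lies downstream of the stop codon.
        return "3'UTR-APA"
    # Upstream of the 3'-most exon: can alter the coding sequence as well as the 3'UTR.
    in_upstream_exon = any(start <= pas <= end for start, end in exons[:-1])
    return "IPA (upstream exon)" if in_upstream_exon else "IPA (intronic)"


exons = [(100, 200), (500, 600), (1000, 1500)]
for pas in (1200, 300, 550):
    print(pas, classify_pas(pas, exons))
```

Real annotations have to handle both strands, several transcript models per gene, and PASs downstream of the annotated gene end. This sketch deliberately leaves all of those out.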
In this Review, we discuss the dynamics of APA in various physiological and pathological conditions, the human mutations and genetic variants that are associated with APA isoform changes, and our current understanding of the functional effects of APA and molecular mechanisms of its regulation. We also cover high-throughput methods and machine learning-based approaches to study APA. We focus on studies in metazoans, mostly humans. Readers are referred to other recent reviews for more in-depth coverage of the mechanism of CPA6,7 or of gene regulation by APA in plants8.
APA dynamics in physiological conditions
The widespread variability of APA isoforms was first revealed in the late 1990s by expressed sequence tag data9, which helped uncover distinct global trends of APA isoform expression in different human tissues, such as biased expression of distal PAS (dPAS; further from the 5′ end of the gene) isoforms in neuronal tissues and of proximal PAS (pPAS; closer to the 5′ end of the gene) isoforms in blood10. Subsequently, data from microarrays and serial analysis of gene expression provided evidence of APA isoform variations in cell proliferation, differentiation and development. Of note, it was reported that proliferating cells tend to have shortened mRNA 3′UTRs, as exemplified by activated T cells11, whereas progressive 3′UTR lengthening was shown to occur in mouse embryonic development and during differentiation of the myoblast C2C12 cell line12. Concurrently, general 3′UTR shortening was reported in cancer cells13. These early studies established APA as a prominent layer of transcriptomic regulation in different tissues and cell contexts (Fig. 2a). The advent of 3′-end RNA sequencing (RNA-seq) in the early 2010s enabled genome-wide profiling of APA isoforms with greater precision14–16. In addition, standard RNA-seq data have also been used to interrogate APA isoform expression when coupled with bioinformatics methods to demarcate PASs17. In the past few years, there has been a marked increase in the use of single-cell RNA sequencing (scRNA-seq) data to examine APA, owing to the fortuitous fact that most scRNA-seq methods give rise to reads biased to the 3′ end of RNAs (Box 2).
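Reads from 3′-end sequencing, and the 3′-biased reads from most scRNA-seq methods, end at or near cleavage sites. PASs are therefore commonly defined by grouping nearby read ends. The following is a minimal sketch of that idea. It assumes reads have already been mapped to one strand of one chromosome. The function name, the single-linkage grouping and the 24-nt `max_gap` default are illustrative assumptions, not the procedure of any particular study. The sketch also ignores internal priming at A-rich genomic stretches, a known source of false PASs that practical pipelines filter out.

```python
from collections import Counter
from typing import List, Tuple


def cluster_cleavage_sites(read_ends: List[int], max_gap: int = 24) -> List[Tuple[int, int]]:
    """Group read 3'-end positions (one strand, one chromosome) into putative PAS clusters.

    A position joins the current cluster if it lies within `max_gap` nt of the
    previous distinct position (single-linkage). Each cluster is reported as
    (representative position, total reads). The representative is the most
    frequently observed end, with ties resolved toward the smaller coordinate.
    """
    if not read_ends:
        return []
    counts = Counter(read_ends)
    positions = sorted(counts)
    clusters: List[List[int]] = [[positions[0]]]
    for pos in positions[1:]:
        if pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        representative = max(members, key=lambda p: (counts[p], -p))
        result.append((representative, total))
    return result


# Hypothetical read-end coordinates for one gene on the plus strand.
ends = [1000, 1002, 1002, 1010, 1500, 1501, 1501, 1501]
print(cluster_cleavage_sites(ends))  # [(1002, 4), (1501, 4)]
```

In this toy example the upstream cluster would correspond to a pPAS and the downstream one to a dPAS. Single-linkage grouping can chain closely spaced sites into one broad cluster, so capping cluster width is one possible refinement.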
Global trends in cell growth and differentiation
In line with the global 3′UTR lengthening in cell differentiation and development that was reported in early studies using bulk samples12, analysis of scRNA-seq data for approximately two million nuclei from mouse embryonic days 9.5 to 13.5 identified a progressive 3′UTR lengthening trend in all cell types across embryonic stages18. Nevertheless, the degree of APA regulation varies in different cell types, with neurogenesis lineages having more prominent APA than others18. This is in agreement with an early study in which APA events in myogenesis were compared with those in neurogenesis using bulk samples19. It was found that these two lineages share similar APA events but that the global shift in pPAS-to-dPAS use (resulting in 3′UTR lengthening) is executed to a much greater extent in neurogenesis than myogenesis19. One interpretation of these results is that there exists a common APA regulatory mechanism in cell differentiation that leads to transcript lengthening, and that neurogenesis has additional mechanism(s) that augment the common APA regulatory programme (such as Hu proteins, see ‘RNA-binding proteins’). Notably, the neurogenesis-associated APA regulatory scheme is well conserved across species, as shown in fly20,21 and worm22.
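A shift in pPAS-to-dPAS use, and whether it is larger in one lineage than in another, can be expressed as a change in the log ratio of distal to proximal reads. The sketch below assumes a gene with exactly two PASs, hypothetical read counts and a pseudocount of 1 to avoid division by zero. The function names are illustrative. Published analyses use related but not necessarily identical scores, handle genes with more than two PASs, and add statistical testing.

```python
import math
from statistics import median
from typing import Iterable, Tuple


def distal_usage_shift(p_a: int, d_a: int, p_b: int, d_b: int,
                       pseudocount: float = 1.0) -> float:
    """Change in log2(dPAS reads / pPAS reads) from condition A to condition B.

    Positive: relatively more dPAS use in B (3'UTR lengthening).
    Negative: relatively more pPAS use in B (3'UTR shortening).
    """
    ratio_a = (d_a + pseudocount) / (p_a + pseudocount)
    ratio_b = (d_b + pseudocount) / (p_b + pseudocount)
    return math.log2(ratio_b) - math.log2(ratio_a)


def median_shift(genes: Iterable[Tuple[int, int, int, int]]) -> float:
    """Median shift over genes, each given as (p_a, d_a, p_b, d_b) read counts."""
    return median(distal_usage_shift(*g) for g in genes)


# Hypothetical counts: (pPAS, dPAS) in progenitors, then (pPAS, dPAS) in neurons.
genes = [(79, 19, 19, 79), (9, 9, 9, 9), (3, 1, 1, 3)]
print(distal_usage_shift(*genes[0]))  # 4.0
print(median_shift(genes))            # 2.0
```

Take the first hypothetical gene: 79 pPAS and 19 dPAS reads in progenitors versus 19 and 79 in neurons. Its shift of +4.0 corresponds to a 16-fold relative gain in dPAS use. A larger median shift in one lineage than in another would be one way to express the stronger lengthening described for neurogenesis compared with myogenesis.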
In another single cell-based study of pre-implantation development, it was shown that whereas there is continuous 3′UTR lengthening from zygote to morula in mice, human cells exhibit initial lengthening of 3′UTRs (from zygote to four-cell stage) followed by 3′UTR shortening23. However, similar pathways, such as RNA processing, are enriched for APA-regulated genes in both species, indicating that APA has conserved biological consequences. The global trend of 3′UTR lengthening during cell differentiation is likely to be related to the 3′UTR shortening that occurs during cell proliferation11, as differentiated cells are generally quiescent. In line with this, several cell differentiation lineages that involve increased proliferation have a global trend of 3′UTR shortening, including the activation of haematopoietic stem cells to differentiated multipotent progenitors24 and the differentiation of lymphoid cells25.
A key question related to changes in 3′UTR length during proliferation and differentiation concerns the extent to which APA regulation can be attributed to certain phases of the cell cycle. Using scRNA-seq to examine APA events at different stages of the cell cycle — including G1/S, S, G2/M, M and M/G1, in mouse embryonic fibroblasts and two human cell lines26 — it was found that, during the cell cycle, genes that are involved in cell cycle control have greater APA regulation than other genes and, importantly, that changes in APA isoforms are more prominent than changes in mRNA abundance26. These findings highlight a key role of APA in cell cycle regulation. However, it remains unclear to what extent cell proliferation-associated 3′UTR shortening is governed by cell cycle phases. It is also worth noting that although most studies have focused on 3′UTR-APA, IPA is also globally regulated and follows the same switch in pPAS-to-dPAS usage during embryonic development and cell differentiation16, indicating that both types of APA are regulated by similar mechanisms.
Exceptions to general trends
The general APA trends associated with cell proliferation (3′UTR shortening) and differentiation (3′UTR lengthening) have several exceptions. Differentiation of spermatogonial stem cells to mature sperm cells involves stage-wise 3′UTR shortening and IPA activation27–29. This APA scheme, which culminates in spermatids after meiosis, seems to be coordinated with chromatin remodelling (the removal of histones and their replacement with protamines). It also seems to be coordinated with increased nonsense-mediated decay (NMD) activity that degrades mRNAs with long 3′UTRs30,31. As a result, mRNAs with shorter 3′UTRs can better escape the large-scale culling of mRNAs that takes place during spermatogenesis. They then remain available for translation and protein production at later stages of sperm development, after transcription has ceased. However, oogenesis (the differentiation of oogonia to oocytes) also involves meiosis but does not show a similar global dPAS-to-pPAS shift28. This indicates that meiosis per se is unlikely to be the cause of the unique APA programme in spermatogenesis.
Another exception to general APA trends is the differentiation of professional secretory cells. It was found that differentiation of human embryonic stem cells to trophoblast-like cells in vitro elicits global 3′UTR shortening32. Further analysis of scRNA-seq data from placenta, which is composed of several trophoblast types, revealed that the differentiation of syncytiotrophoblast, a trophoblast subtype specialized in hormone production and secretion during pregnancy, involves a global dPAS-to-pPAS shift that is coupled with the expression of genes in the protein secretion pathway. This secretion-coupled mechanism of APA can also explain APA-mediated 3′UTR shortening during the differentiation of B cells to antibody-secreting plasma cells33, as well as during T cell activation11. However, the latter process has a concomitant increase in cell proliferation, which makes it difficult to separate APA mechanisms coupled with secretion from those connected with proliferation.
Regulation by cell metabolism
As part of the transcriptomic programme, APA is highly dynamic upon changes to the metabolic state of cells. For example, activation of the mTOR pathway, which regulates cell metabolism in response to nutrient availability34, causes general 3′UTR shortening in multiple human and mouse cell types35. Interestingly, certain genes — such as those involved in protein processing in the endoplasmic reticulum and ubiquitin-mediated proteolysis — seem to be particularly affected by mTOR-mediated 3′UTR shortening35,36. Because mTOR pathway activation is an integral part of cell proliferation and growth, it is tempting to speculate that mTOR activation might have an important role in mediating the global APA changes that occur in these processes. This could also explain the general 3′UTR shortening that occurs in cardiac hypertrophy37, where cells grow in size but not in number. The catabolic process of autophagy38, which involves inhibition of mTOR under conditions of nutrient starvation, was found to be augmented by perturbations of multiple CPA factors in a Drosophila model39, further highlighting the cross-talk between cell metabolism and APA regulation.
Cellular response to stress
Stress conditions lead to global APA changes with variable consequences. Some types of stress cause 3′UTR lengthening, such as the ribotoxic stress elicited by anisomycin, which inhibits the peptidyl transferase activity of the ribosome40. However, some other stress conditions, especially those with genotoxic effects, have been reported to cause global 3′UTR shortening, including UV damage41,42, heat shock43,44 and oxidative stress45. In particular, 3′UTRs markedly shorten during recovery from arsenic stress45, involving both the selection of pPAS and the decay of long 3′UTR isoforms45. Of note, some stress conditions such as heat shock, osmotic stress and oxidative stress can also induce transcriptional read-through (Fig. 2b) — when RNA polymerase continues transcribing past the normal termination sites — leading to the expression of downstream of gene (DoG) transcripts46,47. It is notable that when transcriptional read-through is caused by suppressed CPA — a failure to cleave the pre-mRNA at its usual PAS — there is also 3′UTR lengthening through APA, as shown in cells treated with JTE-607, a small-molecule inhibitor of the CPA endonuclease CPSF73 (also known as CPSF3)48,49 (Box 1). By contrast, transcriptional read-through caused by defective termination per se does not seem to elicit substantial 3′UTR lengthening, as shown in cells with knockdown of the transcription termination factor XRN2 (ref. 50). Notably, because some DoG RNAs are poly(A)+ owing to the use of cryptic PAS in intergenic regions (Fig. 2b), poly(A)+ DoG RNA expression can manifest as an extreme case of APA, where PAS selection is beyond the normal gene boundary.
APA in pathological contexts
Dysregulated APA events are increasingly associated with human disease. Some pathological conditions involve global changes in APA profile related to cell identity or proliferation and/or differentiation state, as in cancer cells; others are associated with a large number of altered APA events owing to dysregulated core CPA factors or regulators, as in infection and some neurological conditions. In addition, mutations of CPA factors and APA regulators have been found to cause many human pathologies (Supplementary Table 1). Moreover, a growing number of mutations and genetic variants associated with PAS have been shown to alter APA with implications for pathological phenotypes and disease risks (Supplementary Tables 2 and 3).
Inflammation and infection
APA has long been known for its role in adaptive immunity, including B cell differentiation and T cell activation11,33; its dynamics are also increasingly appreciated in innate immunity. Human macrophages display 3′UTR shortening after bacterial infection51. In addition, scRNA-seq analysis of peripheral blood mononuclear cells from individuals with COVID-19 and healthy controls indicates general 3′UTR shortening in all patient cell types, including monocytes and natural killer cells52. In accordance with these observations, ablation of the CPA factor CFI68 (also known as CPSF6) (Box 1) in a human lung cell line and a mouse fibroblast cell line led to pre-emptive 3′UTR shortening and enhanced the antiviral capacity of these cells against vesicular stomatitis virus. The authors attributed this phenomenon to the stabilization of key mRNAs encoding antiviral proteins through 3′UTR shortening53. Along similar lines, the CPSF73 inhibitor JTE-607, which generally suppresses pPAS use, has an anti-inflammatory effect54. This was shown to result from the suppressed expression of mRNAs encoding inflammatory cytokines55, which supports the notion that proper CPA activity is crucial for the rapid increase of cytokine gene expression during inflammation.
Several viruses inhibit CPA through interactions between the CPA machinery and viral proteins, such as the NS1 protein of influenza A virus56 and the ICP27 protein of herpes simplex virus (HSV)57. CPA inhibition in cells infected by these viruses results in APA changes and increased expression of DoG transcripts58,59. Interestingly, it was found that HSV infection can activate both IPA (involving early transcription termination) and DoG RNA expression (involving transcriptional read-through) simultaneously in the same genes59,60, which suggests that the connection between PAS selection and transcription termination may depend on how much CPA activity is reduced. Therefore, although the virus-induced inhibition of CPA is thought to exert general suppression of host gene expression, it is tempting to speculate that mitigation of innate immunity or inflammation through CPA inhibition of specific antiviral genes may be a survival strategy used by viruses.
Cancer
In general, cancer cells express transcripts with shortened 3′UTRs, which could plausibly be attributed to their high proliferation rate13,17. Importantly, the APA profile provides additional information for cancer prognosis beyond that obtained by measurement of gene expression levels17. As is the case for different cell differentiation lineages, the extent of APA varies between cancer types13,17. For example, kidney renal clear cell carcinoma typically has less 3′UTR shortening than some other cancer types, such as lung squamous cell carcinoma17. Kidney renal clear cell carcinoma cells were also found to have increased expression of DoG transcripts61. Individuals with kidney renal clear cell carcinoma who display shortened 3′UTRs have worse prognosis62. In some other cancer types, such as skin cutaneous melanoma, poor prognosis is associated with 3′UTR lengthening62. This indicates that the relationship between APA regulation and oncogenic development or severity is likely to be cancer type-specific. Notably, a recent study found that DoG transcript abundance is associated with poor patient survival for breast, colon and liver cancers63. Because DoG expression can occur together with 3′UTR lengthening when CPA activity is decreased, it would be interesting to examine whether APA profiles could add power to survival analyses for patients with these cancers. Along these lines, inefficient CPA was found to induce replication stress64, manifested as malfunction of replication origins and transcription–replication conflicts (Fig. 2b). Such stress results in DNA damage and genome instability, which can further contribute to cancer development.
In keeping with the generally high expression of IPA isoforms in blood cells, IPA dysregulation has been reported in several types of leukaemia and lymphoma10. Widespread IPA activation in chronic lymphocytic leukaemia results in the expression of truncated proteins lacking C-terminal functional domains65. Importantly, owing to their relatively large size, some tumour-suppressor genes are particularly affected by this truncation mechanism in chronic lymphocytic leukaemia cells. By contrast, in multiple myeloma, a cancer that develops from antibody-secreting plasma cells, loss of IPA isoform expression predicts shorter progression-free survival66. Therefore, as with the solid tumours discussed above, the contribution of APA dysregulation to oncogenesis and cancer progression in haematopoietic cells appears highly dependent on the cancer type.
Neurological disorders
The highly prominent preference for using dPAS over pPAS in neurons, resulting in both longer 3′UTRs and the selection of more-distal terminal exons, underscores the importance of expression of long transcripts for neuronal functions67. Consistently, dysregulated APA is emerging as a hallmark of neurodevelopmental disorders and neurodegeneration. Germline mutations in genes encoding several core CPA factors and transcription termination factors lead to neurodevelopmental defects (Supplementary Table 1). In addition, APA isoforms have been identified as a transcriptomic signature in individuals with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD)68,69. A case in point is the activation of an IPA isoform of STMN2 (encoding the neuronal growth-associated factor stathmin 2) owing to nuclear loss of the RBP TDP-43 (encoded by TARDBP) in individuals with ALS or FTD who have TARDBP mutations70. Importantly, this IPA dysregulation event also takes place in individuals with familial or sporadic ALS or FTD who do not have TARDBP mutations, indicating that it is a convergent pathological event that drives neurodegeneration70.
Disease-causing genetic mutations
PAS mutations have long been found to cause human genetic diseases71 (Supplementary Table 2). For example, an A-to-G mutation that changes the strong PAS hexamer AATAAA in the HBA2 gene (encoding haemoglobin A2) to the weaker hexamer AATAAG in individuals with α-thalassaemia was the first example of a CPA-suppressing (loss-of-function) mutation to be reported72. Conversely, a G-to-A mutation at the cleavage site of the F2 gene (encoding coagulation factor II, also known as prothrombin), which contributes to thrombophilia, was the first example of a CPA-enhancing (gain-of-function) mutation73. In these scenarios, CPA suppression or enhancement results in downregulation or upregulation of mRNA expression levels, respectively, which indicates that, in the cells that manifest these disease phenotypes, CPA might be a rate-limiting step for mRNA expression of affected genes74 (Fig. 2c). In keeping with this, it is worth noting that somatic mutations of PAS hexamer are frequently associated with downregulated expression of tumour-suppressor genes in cancer75.
Whereas genes containing only one PAS may be more susceptible to PAS mutations because of the singular opportunity for CPA, PAS mutations in genes that undergo APA have also been found to alter gene expression (Supplementary Table 2). In accord with the notion that short 3′UTR mRNA isoforms are typically more stable than long 3′UTR isoforms5, mutations that create a PAS hexamer AATAAA in the 3′UTR of CCND1 (encoding cyclin D1) result in 3′UTR shortening, and hence increased mRNA stability and protein production, leading to an increased proliferation rate of mantle-cell lymphoma cells and shorter survival time for patients carrying mutations76. Conversely, mutations that weaken a pPAS in the 3′UTR of NAA10 lead to 3′UTR lengthening and downregulation of gene expression, causing Lenz microphthalmia syndrome77. In agreement with the general trend that IPA isoforms are unstable, a mutation changing AACAAA to AATAAA in intron 8 of ERCC4 (encoding a subunit of the ERCC1–ERCC4 nucleotide excision repair endonuclease) activates IPA isoform expression, and hence reduces ERCC4 expression, in a rare form of xeroderma pigmentosum78.
Genetic variants associated with disease risk
In contrast to the small number of diseases caused by APA-altering mutations, a large number of genetic variants, mostly single nucleotide polymorphisms (SNPs), have been associated with APA isoform changes in several recent studies (Box 3 and summarized in Supplementary Table 3). For simplicity, here we use apaQTL (APA isoform quantitative trait locus) to refer to a genetic variant that is associated with a change in APA isoform expression. As expected, apaQTLs related to 3′UTR-APA are mostly enriched near the 3′ end of the gene79,80. However, apaQTLs in other regions of the gene are also common79, in line with the notion that APA can be regulated by diverse mechanisms (see below).
It was estimated that 16–19% of the reported apaQTLs co-localize with genetic variants that are significantly associated with clinical phenotypes as identified by genome-wide association studies79,80. A few apaQTLs are of particular note because of their validated significance for association with risk of the autoimmune disease systemic lupus erythematosus (SLE) (Fig. 2d). An apaQTL (rs10954213) in the 3′UTR of the transcription factor IRF5 is associated with pPAS strength81, with the ‘A’ allele leading to a strong PAS hexamer (AATAAA) and the ‘G’ allele to a much weaker PAS hexamer (AATGAA). As such, the ‘A’ allele-containing IRF5 gene preferentially expresses a shortened 3′UTR isoform that is more stable and produces a greater amount of IRF5 protein, which is a risk factor for SLE81. In a similar manner, an apaQTL (rs6598) in the 3′UTR of GIMAP5 creates either an AATAAA (‘A’ allele) or AATAGA (‘G’ allele) PAS hexamer82, resulting in a stable, short 3′UTR isoform or an unstable, long 3′UTR isoform, respectively. Because the expression level of GIMAP5 negatively correlates with SLE risk, the ‘G’ allele, which generates a less stable, long 3′UTR isoform, is associated with increased disease risk. Moreover, two genetic variants of TNFSF13B (which encodes B cell survival and activation factor (BAFF)) – one involving deletion and the other an SNP – together create an AATAAA hexamer, leading to 3′UTR shortening and increased mRNA expression owing to removal of a microRNA (miRNA) target site83. Increased expression of BAFF increases the risk of SLE, as well as multiple sclerosis.
Dysregulated APA events are increasingly associated with human disease. Some pathological conditions, such as cancer, involve global changes in the APA profile related to cell identity or proliferation and/or differentiation state. Others, such as infection and some neurological conditions, are associated with large numbers of altered APA events owing to dysregulated core CPA factors or regulators. In addition, mutations of CPA factors and APA regulators have been found to cause many human pathologies (Supplementary Table 1). Moreover, a growing number of mutations and genetic variants associated with PASs have been shown to alter APA, with implications for pathological phenotypes and disease risks (Supplementary Tables 2 and 3).
Inflammation and infection
APA has long been known for its role in adaptive immunity, including B cell differentiation and T cell activation11,33, and its dynamics are increasingly appreciated in innate immunity. Human macrophages display 3′UTR shortening after bacterial infection51. In addition, scRNA-seq analysis of peripheral blood mononuclear cells from individuals with COVID-19 and healthy controls indicates general 3′UTR shortening in all patient cell types, including monocytes and natural killer cells52. In accordance with these observations, ablation of the CPA factor CFI68 (also known as CPSF6) (Box 1) in a human lung cell line and a mouse fibroblast cell line led to pre-emptive 3′UTR shortening and enhanced the antiviral capacity of these cells against vesicular stomatitis virus. The authors attributed this phenomenon to the stabilization, through 3′UTR shortening, of key mRNAs encoding antiviral proteins53. Along similar lines, the CPSF73 inhibitor JTE-607, which generally suppresses pPAS use, has an anti-inflammatory effect54. This effect was shown to result from suppressed expression of mRNAs encoding inflammatory cytokines55, which supports the notion that proper CPA activity is crucial for the rapid increase in cytokine gene expression during inflammation.
Several viruses inhibit CPA through interactions between viral proteins and the CPA machinery. Examples include the NS1 protein of influenza A virus56 and the ICP27 protein of herpes simplex virus (HSV)57. CPA inhibition in cells infected by these viruses results in APA changes and increased expression of DoG transcripts58,59. Interestingly, HSV infection can activate both IPA (involving early transcription termination) and DoG RNA expression (involving transcriptional read-through) simultaneously in the same genes59,60. This suggests that the connection between PAS selection and transcription termination may depend on the extent to which CPA activity is reduced. Virus-induced inhibition of CPA is thought to broadly suppress host gene expression. Even so, it is tempting to speculate that viruses may also use CPA inhibition of specific antiviral genes to dampen innate immunity or inflammation as a survival strategy.
Cancer
In general, cancer cells express transcripts with shortened 3′UTRs, which is plausibly attributable to their high proliferation rate13,17. Importantly, the APA profile provides prognostic information beyond that obtained by measuring gene expression levels17. As with different cell differentiation lineages, the extent of APA varies between cancer types13,17. For example, kidney renal clear cell carcinoma typically shows less 3′UTR shortening than some other cancer types, such as lung squamous cell carcinoma17. Kidney renal clear cell carcinoma cells were also found to have increased expression of DoG transcripts61. Individuals with kidney renal clear cell carcinoma whose tumours display shortened 3′UTRs have a worse prognosis62. In some other cancer types, such as skin cutaneous melanoma, poor prognosis is instead associated with 3′UTR lengthening62. This indicates that the relationship between APA regulation and oncogenic development or severity is likely to be cancer type-specific. Notably, a recent study found that DoG transcript abundance is associated with poor patient survival in breast, colon and liver cancers63. Decreased CPA activity can lead to both DoG expression and 3′UTR lengthening, so it would be interesting to examine whether APA profiles could add power to survival analyses for patients with these cancers. Along these lines, inefficient CPA was found to induce replication stress64, including malfunction of replication origins and transcription–replication conflicts (Fig. 2b). This results in DNA damage and genome instability that can further contribute to cancer development.
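3′UTR shortening or lengthening is often summarized from 3′-end sequencing data by comparing read counts at the proximal and distal PAS of each gene. The sketch below is a minimal, hypothetical version of such a metric: the fraction of reads at the distal PAS and a log2 distal-to-proximal ratio. It is similar in spirit to, but simpler than, published indices such as the percentage of distal poly(A) site usage index (PDUI) used by DaPars. It assumes that reads have already been assigned to exactly two sites. It also ignores normalization, replicates and genes with more than two PASs. The counts are made up.

```python
from math import log2


def distal_usage(proximal_reads: int, distal_reads: int) -> float:
    """Fraction of a gene's 3'-end reads assigned to its distal PAS (dPAS)."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads at either PAS")
    return distal_reads / total


def log2_distal_proximal(proximal_reads: int, distal_reads: int,
                         pseudocount: float = 1.0) -> float:
    """log2(dPAS reads / pPAS reads); the pseudocount avoids division by zero."""
    return log2((distal_reads + pseudocount) / (proximal_reads + pseudocount))


def apa_shift(reference: tuple[int, int], sample: tuple[int, int]) -> float:
    """Change in log2(dPAS/pPAS) from reference to sample, each given as (pPAS, dPAS) reads.

    Negative values indicate a shift towards the proximal PAS (3'UTR shortening);
    positive values indicate 3'UTR lengthening.
    """
    return log2_distal_proximal(*sample) - log2_distal_proximal(*reference)


# Hypothetical counts for one gene, given as (pPAS reads, dPAS reads)
normal, tumour = (20, 80), (60, 40)
print(distal_usage(*normal), distal_usage(*tumour))  # 0.8 0.4
print(round(apa_shift(normal, tumour), 2))  # negative, i.e. shortening in the tumour sample
```

A log ratio is used for the shift so that shortening and lengthening of the same magnitude give values of equal size and opposite sign.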
In keeping with the generally high expression of IPA isoforms in blood cells, IPA dysregulation has been reported in several types of leukaemia and lymphoma10. Widespread IPA activation in chronic lymphocytic leukaemia results in the expression of truncated proteins lacking C-terminal functional domains65. Importantly, owing to their relatively large size, some tumour-suppressor genes are particularly affected by this truncation mechanism in chronic lymphocytic leukaemia cells. By contrast, in multiple myeloma, a cancer that develops from antibody-secreting plasma cells, loss of IPA isoform expression predicts shorter progression-free survival66. Therefore, as with the solid tumours discussed above, the contribution of APA dysregulation to oncogenesis and cancer progression in haematopoietic cells appears highly dependent on the cancer type.
Neurological disorders
Neurons show a strong preference for dPAS over pPAS, resulting in both longer 3′UTRs and the selection of more-distal terminal exons. This underscores the importance of long transcripts for neuronal functions67. Consistent with this, dysregulated APA is emerging as a hallmark of neurodevelopmental disorders and neurodegeneration. Germline mutations in genes encoding several core CPA factors and transcription termination factors lead to neurodevelopmental defects (Supplementary Table 1). In addition, APA isoforms have been identified as a transcriptomic signature in individuals with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD)68,69. A case in point is the activation of an IPA isoform of STMN2 (encoding the neuronal growth-associated factor stathmin 2) in individuals with ALS or FTD who have TARDBP mutations. This activation results from nuclear loss of the RBP TDP-43 (encoded by TARDBP)70. Importantly, this IPA dysregulation event also occurs in individuals with familial or sporadic ALS or FTD who do not have TARDBP mutations, indicating that it is a convergent pathological event that drives neurodegeneration70.
Disease-causing genetic mutations
PAS mutations have long been known to cause human genetic diseases71 (Supplementary Table 2). For example, in individuals with α-thalassaemia, an A-to-G mutation changes the strong PAS hexamer AATAAA in the HBA2 gene (encoding haemoglobin subunit α2) to the weaker hexamer AATAAG. This was the first CPA-suppressing (loss-of-function) mutation to be reported72. Conversely, a G-to-A mutation at the cleavage site of the F2 gene (encoding coagulation factor II, also known as prothrombin), which contributes to thrombophilia, was the first CPA-enhancing (gain-of-function) mutation to be reported73. In these cases, CPA suppression or enhancement results in downregulation or upregulation of mRNA levels, respectively. This indicates that, in the cells that manifest these disease phenotypes, CPA might be a rate-limiting step for mRNA expression of the affected genes74 (Fig. 2c). In keeping with this, somatic mutations of PAS hexamers are frequently associated with downregulated expression of tumour-suppressor genes in cancer75.
Genes with only one PAS may be more susceptible to PAS mutations because they lack an alternative site for CPA. Even so, PAS mutations in genes that undergo APA have also been found to alter gene expression (Supplementary Table 2). Short 3′UTR mRNA isoforms are typically more stable than long 3′UTR isoforms5. In accord with this, mutations that create a PAS hexamer AATAAA in the 3′UTR of CCND1 (encoding cyclin D1) result in 3′UTR shortening, and hence increased mRNA stability and protein production. This leads to an increased proliferation rate of mantle-cell lymphoma cells and shorter survival of patients carrying these mutations76. Conversely, mutations that weaken a pPAS in the 3′UTR of NAA10 lead to 3′UTR lengthening and downregulation of gene expression, causing Lenz microphthalmia syndrome77. In agreement with the general trend that IPA isoforms are unstable, a mutation changing AACAAA to AATAAA in intron 8 of ERCC4 activates IPA isoform expression in a rare form of xeroderma pigmentosum. ERCC4 encodes a subunit of the ERCC1–ERCC4 nucleotide excision repair endonuclease, and this IPA activation reduces its expression78.
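The hexamer changes described above can be checked mechanically. Examples include AATAAA to AATAAG in HBA2 and AACAAA to AATAAA in ERCC4 intron 8. The sketch below asks whether a single-nucleotide substitution creates or destroys a canonical AATAAA hexamer in a sense-strand sequence window. It is a hypothetical helper that considers only the canonical hexamer. Real PAS strength also depends on weaker hexamer variants (such as ATTAAA) and on auxiliary upstream and downstream elements. The example sequences are made-up flanks around the reported hexamers, not the actual genomic context.

```python
CANONICAL_PAS = "AATAAA"


def canonical_pas_starts(seq: str) -> set[int]:
    """0-based start positions of every AATAAA hexamer in a sense-strand DNA sequence."""
    s = seq.upper()
    return {i for i in range(len(s) - 5) if s[i:i + 6] == CANONICAL_PAS}


def snv_effect(seq: str, pos: int, ref: str, alt: str) -> dict[str, list[int]]:
    """Report canonical hexamers created or destroyed by a single-nucleotide variant.

    `pos` is 0-based and `ref` must match the sequence at that position.
    """
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before, after = canonical_pas_starts(seq), canonical_pas_starts(mutant)
    return {"created": sorted(after - before), "destroyed": sorted(before - after)}


# HBA2-like change, AATAAA -> AATAAG (made-up flanks)
print(snv_effect("GCAATAAAGC", pos=7, ref="A", alt="G"))  # {'created': [], 'destroyed': [2]}
# ERCC4-like change, AACAAA -> AATAAA (made-up flanks)
print(snv_effect("TTAACAAATT", pos=4, ref="C", alt="T"))  # {'created': [2], 'destroyed': []}
```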
Genetic variants associated with disease risk
In contrast to the small number of diseases caused by APA-altering mutations, several recent studies have associated a large number of genetic variants with APA isoform changes. Most of these variants are single nucleotide polymorphisms (SNPs) (Box 3; summarized in Supplementary Table 3). For simplicity, we use apaQTL (APA isoform quantitative trait locus) to refer to a genetic variant that is associated with a change in APA isoform expression. As expected, apaQTLs related to 3′UTR-APA are enriched mainly near the 3′ end of the gene79,80. However, apaQTLs in other regions of the gene are also common79, in line with the notion that APA can be regulated by diverse mechanisms (see below).
It was estimated that 16–19% of reported apaQTLs co-localize with genetic variants that genome-wide association studies have significantly associated with clinical phenotypes79,80. A few apaQTLs are particularly notable because their association with risk of the autoimmune disease systemic lupus erythematosus (SLE) has been validated (Fig. 2d). An apaQTL (rs10954213) in the 3′UTR of the transcription factor gene IRF5 affects pPAS strength81. The ‘A’ allele creates a strong PAS hexamer (AATAAA), whereas the ‘G’ allele creates a much weaker one (AATGAA). As a result, the ‘A’ allele preferentially expresses a short 3′UTR isoform that is more stable and produces more IRF5 protein, which is a risk factor for SLE81. Similarly, an apaQTL (rs6598) in the 3′UTR of GIMAP5 creates either an AATAAA (‘A’ allele) or an AATAGA (‘G’ allele) PAS hexamer82. These alleles result in a stable, short 3′UTR isoform or an unstable, long 3′UTR isoform, respectively. GIMAP5 expression correlates negatively with SLE risk, so the ‘G’ allele, which generates the less stable, long 3′UTR isoform, is associated with increased disease risk. Moreover, two genetic variants of TNFSF13B, which encodes B cell-activating factor (BAFF), together create an AATAAA hexamer. One variant is a deletion and the other an SNP. The new hexamer leads to 3′UTR shortening and increased mRNA expression owing to removal of a microRNA (miRNA) target site83. Increased BAFF expression raises the risk of SLE, as well as multiple sclerosis.
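At its core, an apaQTL test asks whether allele dosage is associated with how often a gene's distal PAS is used. Allele dosage is the number of copies of an allele that an individual carries (0, 1 or 2). The following sketch fits an ordinary least-squares slope and a Pearson correlation for one variant–gene pair, using made-up data. Actual apaQTL studies also normalize isoform measurements and adjust for covariates such as ancestry and batch. They also correct for testing many variant–gene pairs. None of these steps is shown here.

```python
from math import sqrt
from statistics import mean


def dosage_association(dosage: list[int], distal_usage: list[float]) -> tuple[float, float]:
    """Return (OLS slope, Pearson r) of distal-PAS usage regressed on allele dosage (0/1/2)."""
    if len(dosage) != len(distal_usage) or len(dosage) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(dosage), mean(distal_usage)
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in distal_usage)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or phenotype is constant in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, distal_usage))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: each copy of the allele shifts usage towards the proximal PAS
genotypes = [0, 0, 1, 1, 2, 2]
usage = [0.80, 0.78, 0.61, 0.59, 0.42, 0.40]
slope, r = dosage_association(genotypes, usage)
print(f"slope per allele = {slope:.2f}, r = {r:.2f}")
```

A negative slope here would correspond to a PAS-strengthening allele like the IRF5 or GIMAP5 ‘A’ alleles, which favour the short 3′UTR isoform.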
Functional effects of 3′UTR-APA
APA has diverse effects on gene expression; here, we discuss the general principles and highlight some recent findings relating to the functional roles of 3′UTR-APA. Readers can find additional information on this subject in earlier reviews84–86.
By definition, 3′UTR-APA changes the size and content of the 3′UTR of an mRNA (Fig. 3a). For simplicity, the 3′UTR sequence that is subject to regulation by APA (between the first and last PAS) is named the alternative 3′UTR (aUTR). The sequence that is common to all isoforms is named the common 3′UTR (cUTR) (Fig. 3a). The median cUTR and aUTR sizes in human genes are ~280 nucleotides and ~900 nucleotides, respectively5. As such, 3′UTR-APA can generate mRNA isoforms whose 3′UTRs differ in length by more than fourfold.
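As a rough worked example, treat the two median sizes as the segments of a single typical gene. This is an approximation, because the median of a sum is not in general the sum of the medians. Here, $L$ denotes 3′UTR length in nucleotides:

$$
\frac{L_{\text{long}}}{L_{\text{short}}} \approx \frac{L_{\mathrm{cUTR}} + L_{\mathrm{aUTR}}}{L_{\mathrm{cUTR}}} \approx \frac{280 + 900}{280} = \frac{1180}{280} \approx 4.2
$$

The fold difference applies to the 3′UTRs. The full-length mRNAs differ proportionally less, because both isoforms share the same 5′UTR and coding sequence.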
mRNA subcellular localization
3′UTRs have long been known to influence the subcellular localization of mRNAs in polarized cells, such as neurons. For example, the short and long 3′UTR isoforms of BDNF (encoding brain-derived neurotrophic factor) are enriched in somata and dendrites, respectively87. In line with this, using micro-dissected rat hippocampal slices and 3′-end RNA-seq, it was shown that, in general, the neuropil area (a synaptically dense region containing a relatively low number of cell bodies) has mRNAs with longer 3′UTRs than those in somata88. However, the opposite trend – of long 3′UTR isoforms being more localized in somata – has also been reported for many genes88. This supports the notion that binding to cognate RBPs, such as Pumilio (in Drosophila) or PUM2 (in humans)89, as opposed to 3′UTR size per se, is the driving force for differential localization of 3′UTR isoforms90. In addition, the neuropil-enriched transcripts tend to have a higher GC content and are more likely to fold into structured RNAs, which implicates the binding of RBPs to structured RNAs in determining mRNA localization. Importantly, neural activity drives changes in 3′UTR isoform abundance between the two compartments, indicating that there is localized regulation of mRNA metabolism through APA isoforms88.
The functions of the aUTR in mRNA localization have also been studied in cells that are not overtly polarized. Using APEX2-based proximity labelling, it was shown that transcripts with short 3′UTRs are more likely to be associated with nuclear pores than those with long 3′UTRs91. Although this study does not compare 3′UTR isoforms directly, it implicates 3′UTR size as a parameter for nuclear export kinetics. At the steady state, long 3′UTR isoforms generally have a higher nuclear-to-cytoplasmic abundance ratio than do short 3′UTR isoforms92,93, which suggests that aUTRs may hinder nuclear export. However, owing to enriched UGUA motifs, aUTRs are also more likely to bind the cleavage factor I (CFI) complex component CFI68 (refs. 93–95), which also functions as a nuclear export adaptor. Therefore, it remains an open question as to how 3′UTR size versus CFI68 binding impacts the nuclear export kinetics of different 3′UTR isoforms.
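The proposed link between aUTR content and CFI binding rests on motif enrichment, which is straightforward to quantify. The sketch below splits a hypothetical longest 3′UTR into cUTR and aUTR at an assumed proximal cleavage position. It then compares UGUA occurrences per kilobase in the two regions. The sequence is made up, and motif counts alone say nothing about actual CFI occupancy.

```python
def count_motif(seq: str, motif: str = "UGUA") -> int:
    """Count (possibly overlapping) occurrences of `motif`; DNA input is read as RNA (T -> U)."""
    s = seq.upper().replace("T", "U")
    m = motif.upper().replace("T", "U")
    return sum(1 for i in range(len(s) - len(m) + 1) if s[i:i + len(m)] == m)


def density_per_kb(seq: str, motif: str = "UGUA") -> float:
    """Motif occurrences per 1,000 nucleotides."""
    if not seq:
        raise ValueError("empty sequence")
    return 1000 * count_motif(seq, motif) / len(seq)


def split_utr(longest_utr: str, proximal_cleavage: int) -> tuple[str, str]:
    """Split the longest 3'UTR at the proximal cleavage site (0-based offset) into (cUTR, aUTR)."""
    return longest_utr[:proximal_cleavage], longest_utr[proximal_cleavage:]


# Made-up 3'UTR: a UGUA-free common region followed by a UGUA-rich alternative region
utr = "ACGCAUCCAG" * 10 + "UGUAAUUGUA" * 30
cutr, autr = split_utr(utr, proximal_cleavage=100)
print(density_per_kb(cutr), density_per_kb(autr))  # 0.0 200.0
```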
The proximity labelling study also indicates a 3′UTR-mediated scheme for mRNA localization to certain organelles, such as mitochondria91. Along the same lines, using cell fractionation and 3′-end sequencing, another study found that aUTRs generally facilitate translation-independent endoplasmic reticulum association of mRNAs96. The authors also found that 3′UTR size, GC content and RNA structure are key features that can predict the potential of a given mRNA for translation-independent endoplasmic reticulum association. Although the underlying mechanism(s) are yet to be determined, some RBPs have previously been found to be endoplasmic reticulum bound97, which could potentially facilitate aUTR-mediated endoplasmic reticulum association. One ramification of translation-independent endoplasmic reticulum association is its influence on where in the cytoplasm an mRNA is translated, and hence the location of its newly made protein. Similarly, by analysing mRNAs that are enriched in the endoplasmic reticulum, cytosol and TIS granules (mesh-like condensate structures associated with the endoplasmic reticulum), it was found that most mRNAs have biased subcellular localization dependent on their sequence or functional features98. For example, mRNAs encoding transcription factors tend to be enriched in TIS granules, and endoplasmic reticulum-enriched mRNAs generally encode large proteins with high expression levels98.
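Two of the features named above, 3′UTR size and GC content, can be computed directly from sequence. RNA structure would require a folding tool such as ViennaRNA's RNAfold and is omitted here. A minimal sketch follows, assuming the cUTR and aUTR sequences are already known. It does not reproduce how the original study combined these features into a predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtrFeatures:
    length: int         # nucleotides
    gc_fraction: float  # (G + C) / length


def utr_features(seq: str) -> UtrFeatures:
    """Length and GC fraction of a 3'UTR sequence (DNA or RNA alphabet)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    gc = sum(1 for base in s if base in "GC")
    return UtrFeatures(length=len(s), gc_fraction=gc / len(s))


def isoform_features(cutr: str, autr: str) -> dict[str, UtrFeatures]:
    """Features of the short (cUTR only) and long (cUTR + aUTR) 3'UTR isoforms."""
    return {"short": utr_features(cutr), "long": utr_features(cutr + autr)}


# Made-up sequences: the aUTR raises both length and GC content of the long isoform
feats = isoform_features("AUAUAUAUGC", "GGCCGGCCGG")
print(feats["short"], feats["long"])
```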
mRNA stability and translation
The 3′UTR contains many mRNA stability elements, such as AU-rich elements (AREs), GU-rich elements (GREs) and miRNA target sites99. Interestingly, the AAUAAA motif in the 3′UTR can also serve as a decay signal for LC3B-mediated mRNA decay. This pathway involves binding of the mRNA by the lipidated autophagy regulator LC3B and subsequent deadenylation100. In addition, 3′UTR size has a negative effect on mRNA stability through the NMD pathway, owing to either length-dependent recruitment of UPF1 (ref. 101) or spurious splicing events in long 3′UTRs102. In several human cell lines103, 3′UTRs in the newly made mRNA pool were reported to be globally longer than those in the steady-state pool. This suggests that short 3′UTR isoforms are generally more stable than long 3′UTR isoforms. However, a less pronounced trend was reported in mouse NIH3T3 cells104. In addition, activated T cells display global 3′UTR shortening, yet they show no discernible changes in mRNA levels that could be attributed to changes in 3′UTR length105. Therefore, the effects of aUTRs on mRNA stability do not seem to be clear-cut. This is presumably because the fate of a given 3′UTR isoform is determined by the milieu of mRNA-stabilizing and mRNA-destabilizing RBPs, which varies between cells. A similarly complex relationship exists between 3′UTR isoforms and translation efficiency, as divergent results have been obtained in different cell types104–106. Of note, a recent analysis of a large number of ribosome profiling datasets reported a limited role for the 3′UTR in translation efficiency107.
Consistent with a cell type-specific influence on mRNA stability, long 3′UTR isoforms were found to be relatively more stable in the neuronal cell line model SH-SY5Y than in the non-neuronal cell lines HEK293T and HepG2 (ref. 103). In keeping with this, greater mRNA stability has been shown to facilitate localization to neurites108. In addition, in the neuropil compartment of the hippocampus, long 3′UTR isoforms are generally more stable than short 3′UTR isoforms88. In the latter study, the authors found that long 3′UTR isoforms of some genes can be shortened in a compartment-specific manner following neural activity, presumably through endonucleolytic cleavage or 3′-to-5′ degradation. Notably, this post-transcriptional remodelling of APA isoforms was also reported in the ageing brain, where translation-coupled mRNA decay was implicated as the mechanism109.
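The inference from newly made versus steady-state pools follows from simple kinetics. At steady state, each isoform's abundance equals its synthesis rate divided by its decay rate (ln 2 / half-life). A less stable isoform is therefore under-represented at steady state relative to its share of new synthesis. The sketch below makes this explicit for a two-isoform gene with hypothetical half-lives. It assumes first-order decay and no conversion of long into short isoforms after transcription.

```python
def steady_state_long_fraction(f_long_new: float, t_half_long: float,
                               t_half_short: float) -> float:
    """Fraction of the steady-state pool made up of the long 3'UTR isoform.

    f_long_new: share of newly made transcripts that use the distal PAS.
    With first-order decay, steady-state abundance = synthesis / (ln 2 / t_half),
    so it is proportional to synthesis share x half-life (the ln 2 cancels).
    """
    if not 0.0 <= f_long_new <= 1.0:
        raise ValueError("f_long_new must be between 0 and 1")
    if t_half_long <= 0 or t_half_short <= 0:
        raise ValueError("half-lives must be positive")
    long_pool = f_long_new * t_half_long
    short_pool = (1.0 - f_long_new) * t_half_short
    return long_pool / (long_pool + short_pool)


# Hypothetical gene: half of new transcripts are long, but the long isoform decays 3x faster
print(steady_state_long_fraction(0.5, t_half_long=2.0, t_half_short=6.0))  # 0.25
```

With equal half-lives the steady-state fraction equals the synthesis fraction. A shorter steady-state 3′UTR distribution therefore points to faster decay of the long isoforms under these assumptions.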
Formation of double-stranded RNA
The Alu sequence is the major type of short interspersed nuclear element in the human genome; it accounts for ~11% of the entire genomic sequence and is often found in 3′UTRs. Two Alu sequences oriented in opposite directions, known as inverted Alu (IRAlu), can form an extended double-stranded RNA (dsRNA) structure. In most cells, IRAlu-containing mRNAs are retained in paraspeckles in the nucleus110 – membraneless structures that are dynamically regulated under stress conditions – thereby preventing them from activating dsRNA-sensing mechanisms of the innate immunity system in the cytoplasm. It was recently shown that removal of IRAlu from the 3′UTR of MDM2 mRNA through APA is crucial for its protein expression and for MDM2-mediated inhibition of tumour-suppressor protein p53 during tumorigenesis111. The same study also reported that longer 3′UTRs in neural progenitor cells cause downregulation of genes containing IRAlus in their aUTRs111. Notably, owing to globally increased expression of long 3′UTR isoforms, neuronal cells were found to have a higher dsRNA content overall, which predisposes them to dsRNA-elicited inflammation112.
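Two repeat copies in a 3′UTR can pair only if one is the reverse complement of the other. The toy sketch below counts exact k-mer seeds shared between one segment and the reverse complement of another. It is a hypothetical illustration only. Real IRAlu analyses start from annotated Alu elements (for example, RepeatMasker calls) and use alignment that tolerates the mismatches typical of diverged Alu copies. The sequence and the choice of k = 12 are arbitrary.

```python
# Complement table for DNA; U (from RNA input) is also complemented to A.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleotide sequence, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def inverted_seed_count(left: str, right: str, k: int = 12) -> int:
    """Number of distinct k-mers in `left` whose reverse complement occurs in `right`.

    Many shared seeds suggest that `right` carries an inverted copy of `left`, so the
    two segments could base-pair into a dsRNA stem if they lie in the same transcript.
    """
    return len(kmers(left.upper(), k) & kmers(reverse_complement(right), k))


# Arbitrary segment compared with an inverted copy and with a direct (same-orientation) copy
segment = "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCA"
print(inverted_seed_count(segment, reverse_complement(segment)),
      inverted_seed_count(segment, segment))
```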
Formation of condensates
RNA–RBP condensates are membraneless structures formed by liquid–liquid phase separation113. Long 3′UTRs support more RNA–RBP interactions (higher valency) and therefore have a greater potential to form condensates114. For example, 3′UTR size is a key determinant of mRNA recruitment to stress granules, a type of condensate formed in the cytoplasm when mRNA translation is inhibited by stress115. On average, the shortest and longest 3′UTRs of a gene are comparable in size with its 5′UTR (~280 nucleotides) and coding sequence (~1.2 kb), respectively5. Therefore, in terms of size, aUTRs could have a substantial impact on the formation of RNA–RBP condensates. Neurons have longer 3′UTRs than other cell types, and stress granule formation is a contributing factor in neurodegeneration116. It would therefore be interesting to examine whether neurons are more prone to stress granule formation.
RNA-mediated protein scaffolding
Some 3′UTRs can have a structural and scaffolding role in facilitating protein–protein interactions99. For example, it was shown that the aUTR of CD47 facilitates the interaction between newly made CD47 protein and the adaptor protein SET, leading to increased plasma membrane localization of the ‘don’t eat me’ signal through CD47 and better protection of cells from phagocytosis by macrophages117. Similarly, it was found that, despite similar mRNA localization of long and short 3′UTR isoforms of BIRC3, the resulting proteins bind distinct sets of partners, leading to divergent functions in the regulation of cell death (aUTR-independent) versus B cell migration (aUTR-dependent)118.
3′UTR-mediated protein–protein interactions are reminiscent of the interactions mediated by some lncRNAs. A case in point is the lncRNA gene NEAT1, which generates two major APA isoforms: NEAT1_1, a short, 3.7 kb isoform; and NEAT1_2, a long, 22.7 kb isoform (Fig. 3b). The 3′ end of NEAT1_2 is generated by RNase P cleavage at a tRNA-like structure rather than by polyadenylation. NEAT1_2 is essential for scaffolding paraspeckles in the nucleus119,120. By contrast, NEAT1_1 has a poly(A) tail, can be exported into the cytoplasm and is dispensable for paraspeckle formation. However, NEAT1_1 can bind and hold together several glycolytic enzymes, including PGK1, PGAM1 and ENO1, thereby increasing the efficiency of glycolysis through substrate channelling across these enzymes121. As such, NEAT1_1 has been implicated in promoting aerobic glycolysis in cancer cells121. It remains to be seen whether the aUTRs of protein-coding genes can have similar trans-acting functions, independent of the protein encoded by the same mRNA.
Functional effects of intronic APA
IPA isoforms have traditionally been less well studied than 3′UTR-APA isoforms, owing largely to their low expression levels and the prevalence of A-rich sequences in introns that can complicate the identification of intronic PAS (Box 2). However, with the advent of 3′-end sequencing methods and better annotation of IPA sites in the genome, understanding of the mechanisms of gene regulation by IPA, particularly its role in suppressing gene expression, has grown rapidly in recent years.
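A-rich intronic sequences cause trouble because oligo(dT)-primed 3′-end sequencing can prime from genome-encoded A stretches downstream of a putative site, not only from a genuine poly(A) tail. Pipelines therefore commonly filter candidate sites using the downstream genomic sequence, sometimes rescuing sites that have a PAS hexamer upstream. The rule and thresholds below are illustrative assumptions, not values from this Review or any specific tool. A site is flagged if at least 6 of the 10 downstream nucleotides are A, and it is kept anyway if AATAAA or ATTAAA lies within 40 nt upstream.

```python
# The two most frequent PAS hexamers; a real filter would consider more variants.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def a_rich_downstream(downstream: str, window: int = 10, min_a: int = 6) -> bool:
    """True if the genomic sequence just downstream of the site is A-rich (possible internal priming)."""
    return downstream.upper()[:window].count("A") >= min_a


def has_upstream_pas(upstream: str, max_dist: int = 40) -> bool:
    """True if a listed PAS hexamer lies within `max_dist` nt upstream of the cleavage site.

    `upstream` is the sense-strand sequence ending at the cleavage site.
    """
    region = upstream.upper()[-max_dist:]
    return any(h in region for h in PAS_HEXAMERS)


def keep_site(upstream: str, downstream: str) -> bool:
    """Hypothetical rule: discard a site only if it looks internally primed and lacks a PAS."""
    return has_upstream_pas(upstream) or not a_rich_downstream(downstream)


print(keep_site("GCAATAAAGCTTTTCCGCGT", "AAAAAAAAAA"),  # PAS present: kept
      keep_site("GCGCGCGCGCGCGC", "AAAAAAAAAA"))        # A-rich, no PAS: discarded
```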
Protein isoform regulation
By default, IPA changes the C-terminal region of protein products (Fig. 4). One functional consequence is therefore altered protein localization, because localization can depend on the C-terminal region. For example, the IGHM gene (encoding the IgM antibody heavy chain)122,123, which was the first reported case of APA, produces a membrane-anchored protein isoform (CPA occurring in the last exon) in B cells and a secreted, soluble isoform (IPA isoform) in plasma cells. It has been estimated that more than 400 human genes could have similar IPA-mediated regulation of membrane-bound versus soluble isoforms124 (Fig. 4). Notably, splicing-suppressing antisense oligonucleotides were found to activate IPA in transmembrane receptor tyrosine kinases (RTKs), such as vascular endothelial growth factor receptor 2 (encoded by KDR). This switched the RTKs to their secreted, soluble isoforms, which have antagonistic functions125. This approach holds promise as a therapeutic modality for RTK-related diseases, such as cancer.
IPA-induced changes in the C-terminal region have also been shown to alter the location and/or activities of intracellular proteins (Fig. 4). For example, the CDC42 gene expresses a ubiquitous protein isoform CDC42u (using a PAS in the last exon) and a brain-specific isoform CDC42b (using an intronic PAS). These isoforms differ only in the last few amino acids at their C termini. Even so, in astrocytes and neural precursors, CDC42u is associated with the plasma membrane at the leading edge of migrating cells. By contrast, CDC42b is localized mainly to intracellular membrane compartments, where it has a function in endocytosis126 (Fig. 4). Another prominent example is the gene GLS (encoding glutaminase). Protein products of the IPA isoform of GLS (termed GAC) are localized to mitochondria, whereas those encoded by isoforms using the last exon PAS (termed KGA) are mostly cytosolic127 (Fig. 4). Importantly, GAC is enzymatically more active than KGA, which is crucial for cancer cells that preferentially express the IPA isoform and use glutamine as a fuel for growth128.
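The protein-level consequence depends on where a PAS lies, so analyses often start by classifying each site against a transcript model. A minimal sketch follows, assuming a single plus-strand transcript given as exon intervals with hypothetical coordinates. It ignores overlapping transcripts, minus-strand genes and PASs within the coding part of the last exon. It also ignores the distinction between composite and skipped terminal exons, so the labels are a simplification.

```python
from typing import Sequence


def classify_pas(pas: int, exons: Sequence[tuple[int, int]]) -> str:
    """Classify a PAS coordinate against a plus-strand transcript model.

    `exons` are non-overlapping (start, end) intervals, 1-based and inclusive.
    """
    if not exons:
        raise ValueError("transcript model has no exons")
    ordered = sorted(exons)
    last_start, last_end = ordered[-1]
    if last_start <= pas <= last_end:
        return "last exon (3'UTR-APA)"
    if any(start <= pas <= end for start, end in ordered[:-1]):
        return "upstream exon"
    if ordered[0][0] < pas < last_start:
        return "intron (IPA)"
    return "outside model"


# Hypothetical three-exon gene
gene = [(1, 100), (201, 300), (401, 900)]
for site in (850, 250, 150, 1000):
    print(site, classify_pas(site, gene))
```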
Premature transcription termination
An emerging regulatory scheme involving IPA is the suppression of gene expression by premature transcription termination129. For example, IPA is globally activated under genotoxic stress, resulting in widespread premature transcription termination. This may function as an adaptive mechanism to avoid transcribing damaged DNA in stressed cells, which could otherwise result in aberrant transcripts as well as genome instability. Notably, IPA isoforms containing a 5′ splice site (5′SS) are generally unstable103,130. These are isoforms with a composite internal–terminal exon (Fig. 1b). Their instability results from U1 small nuclear ribonucleoprotein (snRNP)-mediated nuclear retention of RNA131 and from RNA degradation through nuclear surveillance by the poly(A) tail exosome targeting (PAXT) complex130 (Fig. 4). In addition, IPA isoforms containing a composite internal–terminal exon often have a long 3′UTR that could make them susceptible to cytoplasmic degradation by NMD101,102. They may also encode suboptimal C-terminal protein sequences that are subject to rapid proteasomal degradation132 (Fig. 4).
Notably, genes encoding several core CPA factors harbour conserved intronic PAS, including RBBP6 (ref. 133), CSTF3 (refs. 134,135), PCF11 (refs. 136,137), PAPOLA138 and WDR33 (ref. 139). Whereas some protein products of IPA isoforms are non-functional (for example, PCF11), some others have dominant-negative functions (for example, RBBP6) (Fig. 4). Presumably, these conserved IPA events equip CPA factor genes with negative-feedback regulatory mechanisms to modulate their protein activities (often the full-length protein) and maintain the homeostasis of CPA activity in the cell.
APA has diverse effects on gene expression; here, we discuss the general principles and highlight some recent findings relating to the functional roles of 3′UTR-APA. Readers can find additional information on this subject in earlier reviews84–86.
By definition, 3′UTR-APA changes the size and content of the 3′UTR of an mRNA (Fig. 3a). For simplicity, the 3′UTR sequence that is subject to regulation by APA (between the first and last PAS) is named the alternative 3′UTR (aUTR), and the sequence that is common to all isoforms is named the common 3′UTR (cUTR) (Fig. 3a). The median cUTR and aUTR sizes in human genes are ~280 nucleotides and ~900 nucleotides, respectively5. As such, 3′UTR-APA can generate mRNA isoforms whose 3′UTRs differ in length by more than fourfold.
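The cUTR/aUTR split can be made concrete with a short calculation. The sketch below assumes a hypothetical gene with one stop codon and two PAS, with coordinates chosen to match the median sizes quoted above; the function name `utr_segments` is illustrative, and the cleavage site is approximated by the PAS coordinate.

```python
def utr_segments(stop_end: int, ppas: int, dpas: int) -> dict:
    """Split the 3'UTR of a hypothetical two-PAS gene into cUTR and aUTR.

    Positions are transcript coordinates (nt). The 3'UTR is taken to start
    right after the stop codon and to end at the cleavage site, which is
    approximated here by the PAS coordinate (a simplifying assumption).
    """
    if not stop_end < ppas < dpas:
        raise ValueError("expected stop codon end < pPAS < dPAS")
    cutr = ppas - stop_end        # common 3'UTR, present in both isoforms
    autr = dpas - ppas            # alternative 3'UTR, only in the long isoform
    return {
        "cUTR": cutr,
        "aUTR": autr,
        "short_3UTR": cutr,
        "long_3UTR": cutr + autr,
        "fold_difference": (cutr + autr) / cutr,
    }


# Hypothetical coordinates matching the median cUTR (~280 nt) and aUTR (~900 nt)
seg = utr_segments(stop_end=1500, ppas=1780, dpas=2680)
print(seg["long_3UTR"], round(seg["fold_difference"], 2))  # 1180 4.21
```

Because medians are not additive, the ~1,180-nucleotide long 3′UTR in this example is illustrative rather than a genome-wide statistic.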
mRNA subcellular localization
3′UTRs have long been known to influence the subcellular localization of mRNAs in polarized cells, such as neurons. For example, the short and long 3′UTR isoforms of BDNF (encoding brain-derived neurotrophic factor) are enriched in somata and dendrites, respectively87. In line with this, using micro-dissected rat hippocampal slices and 3′-end RNA-seq, it was shown that, in general, the neuropil area (a synaptically dense region containing a relatively low number of cell bodies) has mRNAs with longer 3′UTRs than those in somata88. However, the opposite trend – of long 3′UTR isoforms being more localized in somata – has also been reported for many genes88. This supports the notion that binding to cognate RBPs, such as Pumilio (in Drosophila) or PUM2 (in humans)89, as opposed to 3′UTR size per se, is the driving force for differential localization of 3′UTR isoforms90. In addition, the neuropil-enriched transcripts tend to have a higher GC content and are more likely to fold into structured RNAs, which implicates the binding of RBPs to structured RNAs in determining mRNA localization. Importantly, neural activity drives changes in 3′UTR isoform abundance between the two compartments, indicating that there is localized regulation of mRNA metabolism through APA isoforms88.
The functions of the aUTR in mRNA localization have also been studied in cells that are not overtly polarized. Using APEX2-based proximity labelling, it was shown that transcripts with short 3′UTRs are more likely to be associated with nuclear pores than those with long 3′UTRs91. Although this study does not compare 3′UTR isoforms directly, it implicates 3′UTR size as a parameter for nuclear export kinetics. At the steady state, long 3′UTR isoforms generally have a higher nuclear-to-cytoplasmic abundance ratio than do short 3′UTR isoforms92,93, which suggests that aUTRs may hinder nuclear export. However, owing to enriched UGUA motifs, aUTRs are also more likely to bind the cleavage factor I (CFI) complex component CFI68 (refs. 93–95), which also functions as a nuclear export adaptor. Therefore, it remains an open question as to how 3′UTR size versus CFI68 binding impacts the nuclear export kinetics of different 3′UTR isoforms.
The proximity labelling study also indicates a 3′UTR-mediated scheme for mRNA localization to certain organelles, such as mitochondria91. Along the same lines, using cell fractionation and 3′-end sequencing, another study found that aUTRs generally facilitate translation-independent endoplasmic reticulum association of mRNAs96. The authors also found that 3′UTR size, GC content and RNA structure are key features that can predict the potential of a given mRNA for translation-independent endoplasmic reticulum association. Although the underlying mechanism(s) are yet to be determined, some RBPs have previously been found to be endoplasmic reticulum bound97, which could potentially facilitate aUTR-mediated endoplasmic reticulum association. One ramification of translation-independent endoplasmic reticulum association is its influence on where in the cytoplasm an mRNA is translated, and hence the location of its newly made protein. Similarly, by analysing mRNAs that are enriched in the endoplasmic reticulum, cytosol and TIS granules (mesh-like condensate structures associated with the endoplasmic reticulum), it was found that most mRNAs have biased subcellular localization dependent on their sequence or functional features98. For example, mRNAs encoding transcription factors tend to be enriched in TIS granules, and endoplasmic reticulum-enriched mRNAs generally encode large proteins with high expression levels98.
mRNA stability and translation
The 3′UTR contains many mRNA stability elements, such as AU-rich elements (AREs), GU-rich elements (GREs) and miRNA target sites99. Interestingly, the AAUAAA motif in the 3′UTR can also serve as a decay signal for LC3B-mediated mRNA decay, involving binding of mRNA by the lipidated autophagy regulator LC3B and subsequent mRNA deadenylation100. In addition, the size of the 3′UTR has a negative effect on mRNA stability through the NMD pathway, owing to either the length-dependent recruitment of UPF1 (ref. 101) or spurious splicing events in long 3′UTRs102. In several human cell lines103, it was reported that 3′UTRs in the newly made mRNA pool are globally longer than in the steady-state pool, which suggests that short 3′UTR isoforms are generally more stable than long 3′UTR isoforms. However, a less pronounced trend was reported in mouse NIH3T3 cells104. In addition, in activated T cells, which display global 3′UTR shortening, there are no discernible changes in mRNA levels that could be attributed to changes in 3′UTR lengths105. Therefore, the effects of aUTR on mRNA stability do not seem to be clear-cut, presumably owing to the variable milieus of mRNA-stabilizing and mRNA-destabilizing RBPs that determine the final outcomes of a specific 3′UTR isoform in different cells. A similarly complex relationship exists between 3′UTR isoforms and translation efficiency, as divergent results have been obtained in different cell types104–106. Of note, a recent analysis using a large number of ribosome profiling data reported a limited role of the 3′UTR in translation efficiency107.
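The inference from newly made versus steady-state 3′UTR lengths rests on a simple relationship: at steady state, the level of each isoform is its synthesis rate divided by its decay rate. The sketch below assumes constant synthesis and first-order decay for each isoform, with hypothetical rates; it is an idealization, not the analysis used in the cited studies.

```python
def long_isoform_fraction(syn_short: float, syn_long: float,
                          decay_short: float, decay_long: float) -> tuple:
    """Long-3'UTR isoform fraction among newly made RNA and at steady state.

    Toy model: each isoform is synthesized at a constant rate and degraded by
    first-order decay, so its steady-state level equals synthesis / decay.
    """
    nascent = syn_long / (syn_short + syn_long)
    level_short = syn_short / decay_short
    level_long = syn_long / decay_long
    steady = level_long / (level_short + level_long)
    return nascent, steady


# Hypothetical rates: equal synthesis, long isoform decays twice as fast
nascent, steady = long_isoform_fraction(1.0, 1.0, decay_short=0.1, decay_long=0.2)
print(nascent, round(steady, 3))  # 0.5 0.333
```

In this model the nascent and steady-state fractions coincide only when both isoforms decay at the same rate, so a difference between the two pools points to unequal stability but says nothing about which RBPs or pathways are responsible.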
In line with the cell type-specific influence on mRNA stability, it was found that long 3′UTR isoforms are relatively more stable in the neuronal cell line model SH-SY5Y than in the non-neuronal cell lines HEK293T and HepG2 (ref. 103). In keeping with this, mRNAs with greater stability have been shown to facilitate their localization to neurites108 and, in the neuropil compartment of the hippocampus, long 3′UTR isoforms are generally more stable than short 3′UTR isoforms88. In the latter study, the authors found that long 3′UTR isoforms for some genes can be shortened in a compartment-specific manner following neural activity, presumably through endonuclease cleavage or 3′-to-5′ degradation. Notably, this post-transcriptional remodelling of APA isoforms was also reported in ageing brain, where translation-coupled mRNA decay was implicated as the mechanism109.
Formation of double-stranded RNA
The Alu sequence is the major type of short interspersed nuclear element in the human genome; it accounts for ~11% of the entire genomic sequence and is often found in 3′UTRs. Two Alu sequences oriented in opposite directions, known as inverted Alu (IRAlu), can form an extended double-stranded RNA (dsRNA) structure. In most cells, IRAlu-containing mRNAs are retained in paraspeckles in the nucleus110 – membraneless structures that are dynamically regulated under stress conditions – thereby preventing them from activating dsRNA-sensing mechanisms of the innate immunity system in the cytoplasm. It was recently shown that removal of IRAlu from the 3′UTR of MDM2 mRNA through APA is crucial for its protein expression and for MDM2-mediated inhibition of tumour-suppressor protein p53 during tumorigenesis111. The same study also reported that longer 3′UTRs in neural progenitor cells cause downregulation of genes containing IRAlus in their aUTRs111. Notably, owing to globally increased expression of long 3′UTR isoforms, neuronal cells were found to have a higher dsRNA content overall, which predisposes them to dsRNA-elicited inflammation112.
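The MDM2 example hinges on a simple geometric point: if both arms of an inverted repeat lie in the aUTR, pPAS usage removes the potential dsRNA stem from the mRNA. The sketch below illustrates this with an exact-match search on a hypothetical sequence and an assumed 8-nucleotide arm length; real IRAlu pairs are ~300-nucleotide, imperfectly matched elements identified from repeat annotations, so this is a toy rather than a detection method.

```python
# Watson-Crick complement for RNA (G-U wobble pairs are ignored in this toy)
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def has_inverted_repeat(rna: str, k: int) -> bool:
    """True if some k-mer is followed further downstream by its reverse complement.

    Such a pair of arms could base-pair to form a double-stranded stem.
    """
    for i in range(len(rna) - 2 * k + 1):
        arm = rna[i:i + k]
        if reverse_complement(arm) in rna[i + k:]:
            return True
    return False


# Hypothetical 3'UTR: cUTR without inverted arms, aUTR carrying both arms
cutr = "CAU" * 7
arm = "GGACUCGA"
autr = arm + "A" * 8 + reverse_complement(arm)
long_utr = cutr + autr
short_utr = long_utr[:len(cutr)]      # pPAS usage removes the whole aUTR

print(has_inverted_repeat(long_utr, 8), has_inverted_repeat(short_utr, 8))  # True False
```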
Formation of condensates
RNA–RBP condensates are membraneless structures formed by liquid–liquid phase separation113. Because of increased valency in RNA–RBP interactions, long 3′UTRs have greater potential to form condensates114. For example, 3′UTR size is a key determinant of mRNA recruitment to stress granules, a type of condensate formed in the cytoplasm when mRNA translation is inhibited by stress115. On average, the shortest and longest 3′UTRs of a gene are comparable with the sizes of its 5′UTR (~280 nucleotides) and coding sequence (~1.2 kb), respectively5. Therefore, with respect to size, aUTRs could have a substantial impact on the formation of RNA–RBP condensates. On this note, it would be interesting to examine whether neurons, which have longer 3′UTRs than other cell types, are more prone to stress granule formation, which is a contributing factor in neurodegeneration116.
RNA-mediated protein scaffolding
Some 3′UTRs can have a structural and scaffolding role in facilitating protein–protein interactions99. For example, it was shown that the aUTR of CD47 facilitates the interaction between newly made CD47 protein and the adaptor protein SET, leading to increased plasma membrane localization of the ‘don’t eat me’ signal through CD47 and better protection of cells from phagocytosis by macrophages117. Similarly, it was found that, despite similar mRNA localization of long and short 3′UTR isoforms of BIRC3, the resulting proteins bind distinct sets of partners, leading to divergent functions in the regulation of cell death (aUTR-independent) versus B cell migration (aUTR-dependent)118.
3′UTR-mediated protein–protein interactions are reminiscent of the interactions mediated by some lncRNAs. A case in point is the lncRNA gene NEAT1, which generates two major APA isoforms: NEAT1_1, a 3.7 kb, short isoform; and NEAT1_2, a 22.7 kb, long isoform (Fig. 3b). NEAT1_2 has a tRNA-like structure at its 3′ end, resulting from cleavage by RNase P, and is essential for scaffolding paraspeckles in the nucleus119,120. By contrast, NEAT1_1 has a poly(A) tail, can be exported into the cytoplasm and is dispensable for paraspeckle formation. However, NEAT1_1 can bind and hold together several glycolytic enzymes, including PGK1, PGAM1 and ENO1, thereby increasing the efficiency of glycolysis through substrate channelling across these enzymes121. As such, NEAT1_1 has been implicated in promoting aerobic glycolysis in cancer cells121. It is yet to be seen whether the aUTRs of protein-coding genes can have similar trans-acting functions, independent of the protein encoded by the same mRNA.
Functional effects of intronic APA
IPA isoforms have traditionally been less well studied than 3′UTR-APA isoforms, owing largely to their low expression levels and the prevalence of A-rich sequences in introns that can complicate the identification of intronic PAS (Box 2). However, with the advent of 3′-end sequencing methods and better annotation of IPA sites in the genome, understanding of the mechanisms of gene regulation by IPA, particularly its role in suppressing gene expression, has grown rapidly in recent years.
Protein isoform regulation
By default, IPA changes the C-terminal region of protein products (Fig. 4). As such, one functional consequence is variation of protein localization that is based on the C-terminal region. For example, the IGHM gene (encoding the IgM antibody heavy chain)122,123, which was the first reported case of APA, produces a membrane-anchored protein isoform (CPA occurring in the last exon) in B cells and a secreted, soluble isoform (IPA isoform) in plasma cells. It has been estimated that more than 400 human genes could have similar IPA-mediated regulation of membrane-bound versus soluble isoforms124 (Fig. 4). Notably, splicing-suppressing antisense oligonucleotides were found to activate IPA, thereby switching transmembrane receptor tyrosine kinases (RTKs), such as vascular endothelial growth factor receptor 2 (encoded by KDR), to their secreted, soluble isoforms with antagonistic functions125. This approach holds promise as a therapeutic modality for RTK-related diseases, such as cancer.
IPA-induced changes in the C-terminal region have also been shown to alter the location and/or activities of intracellular proteins (Fig. 4). For example, the CDC42 gene expresses a ubiquitous protein isoform CDC42u (using a PAS in the last exon) and a brain-specific isoform CDC42b (using an intronic PAS). Although these isoforms differ only in the last few amino acids at their C termini, it was found that in astrocytes and neural precursors CDC42u is associated with the plasma membrane at the leading edge of migrating cells, whereas CDC42b is localized mainly to intracellular membrane compartments where it has a function in endocytosis126 (Fig. 4). Another prominent example is the gene GLS (encoding glutaminase); protein products of the IPA isoform of GLS (termed GAC) are localized to mitochondria whereas those encoded by isoforms using the last exon PAS (termed KGA) are mostly cytosolic127 (Fig. 4). Importantly, GAC is enzymatically more active than KGA, which is crucial for cancer cells that preferentially express the IPA isoform and use glutamine as a fuel for growth128.
Premature transcription termination
An emerging regulatory scheme involving IPA is the suppression of gene expression by premature transcription termination129. For example, IPA is globally activated under conditions of genotoxic stress, resulting in widespread premature transcription termination. This may function as an adaptive mechanism to avoid the transcription of damaged DNA in stressed cells, which could otherwise result in aberrant transcripts as well as genome instability. Notably, IPA isoforms containing a 5′ splice site (5′SS) (composite internal–terminal exon (Fig. 1b)) are generally unstable103,130 owing to U1 small nuclear ribonucleoprotein (snRNP)-mediated nuclear retention of RNA131 and RNA degradation through nuclear surveillance by the poly(A) tail exosome targeting (PAXT) complex130 (Fig. 4). In addition, IPA isoforms containing a composite internal–terminal exon often have a long 3′UTR that could make them susceptible to cytoplasmic degradation by NMD101,102 and/or encode suboptimal C-terminal protein sequences that are subject to rapid proteasomal degradation132 (Fig. 4).
Notably, genes encoding several core CPA factors harbour conserved intronic PAS, including RBBP6 (ref. 133), CSTF3 (refs. 134,135), PCF11 (refs. 136,137), PAPOLA138 and WDR33 (ref. 139). Whereas some protein products of IPA isoforms are non-functional (for example, PCF11), some others have dominant-negative functions (for example, RBBP6) (Fig. 4). Presumably, these conserved IPA events equip CPA factor genes with negative-feedback regulatory mechanisms that modulate the activity of their protein products (often the full-length protein) and maintain the homeostasis of CPA activity in the cell.
Mechanisms for the regulation of APA
A growing number of APA regulatory mechanisms have been discovered in recent years, thanks to various sequencing techniques that interrogate nascent and mature RNAs as well as high-throughput and machine-learning approaches to uncover underlying rules for different regulators (Box 2).
Core CPA factors
Regulation of core CPA factors (Box 1) has been implicated in global APA changes in many physiological and pathological processes, such as cell proliferation (regulated by CSTF64 (ref. 140)), spermatogenesis (CSTF64τ (ref. 141), CFI25 and CFI68 (ref. 142), and PCF11 (ref. 143)), pre-implantation embryonic development (CFI25 and CFI68 (ref. 23)), haematopoiesis (CFI25 (ref. 24)), neurogenesis (PCF11 (ref. 144)), somatic cell reprogramming (CFI25 (ref. 145)), renewal of embryonic stem cells (FIP1 (ref. 146)), inflammation (CFI25 and CFI68 (ref. 147)), oncogenesis (FIP1 (ref. 148)) and tumour suppression (CFI25 (ref. 149)). Conversely, germline mutations of several CPA factor genes have been associated with human diseases (Supplementary Table 1).
Most core CPA factors promote pPAS usage in the last exon150, such as FIP1, RBBP6, CSTF64 (encoded by CSTF2) and its paralogue CSTF64τ (encoded by CSTF2T), and PCF11. Conversely, CPA inhibition by genetic knockdown of CPA factors or by the small-molecule CPSF73 inhibitor JTE-607 leads to a global pPAS-to-dPAS usage shift48,49. This regulatory mode is in line with the first-come-first-served model for PAS usage151 and is in accord with the fact that a pPAS is typically weaker than a dPAS as defined by surrounding RNA motifs5 (Fig. 5a), a configuration that would confer regulatability to the suboptimal pPAS as well as ensure proper transcription termination after a strong dPAS.
By contrast, ablations of CFI25 (also known as CPSF5; encoded by NUDT21) and CFI68, two components of the CFI complex, as well as of the poly(A) tail-binding proteins PABPN1 and PABPC1, lead to global dPAS-to-pPAS shifts150,152 (Fig. 5a). The mechanisms by which PABPN1 and PABPC1 regulate APA are not fully resolved, even though PABPN1 was shown to inhibit pPAS usage152. Much mechanistic insight, however, has been gained into APA regulation mediated by CFI25 and CFI68 (refs. 150,153,154), which involves binding of CFI25 to the UGUA motifs that are enriched in the upstream region of dPAS150,153,154 and recruitment of the CPA machinery by CFI68 through interaction with FIP1 (ref. 153) and/or the formation of nuclear condensates155. On this note, nuclear condensate formation has recently been shown for RBBP6, which may function as the effector of PAS usage through its function in activation of the CPA endonuclease CPSF73 (ref. 156).
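Because CFI25-dependent regulation depends on UGUA motifs upstream of the PAS, a first-pass question for any PAS is how many such motifs it carries. The sketch below counts them in a fixed window upstream of a PAS; the 100-nucleotide window, the sequences and the coordinates are hypothetical, and the function counts motifs only, without modelling CFI binding or its effect on PAS choice.

```python
import re


def count_motif_upstream(rna: str, pas_pos: int, motif: str = "UGUA",
                         window: int = 100) -> int:
    """Count occurrences of `motif` in the `window` nt upstream of a PAS.

    `pas_pos` is a 0-based coordinate in `rna`; the window size is an
    illustrative choice. A zero-width lookahead is used so that overlapping
    matches would also be counted.
    """
    region = rna[max(0, pas_pos - window):pas_pos]
    return len(re.findall(f"(?={re.escape(motif)})", region))


# Hypothetical transcript: no UGUA before the pPAS, three UGUA before the dPAS
ppas_region = "CA" * 50                         # 100 nt
dpas_region = "UGUACCCC" * 3 + "C" * 76         # 100 nt
rna = ppas_region + dpas_region
print(count_motif_upstream(rna, pas_pos=100), count_motif_upstream(rna, pas_pos=200))  # 0 3
```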
Notably, the distance between dPAS and pPAS is an important determinant of APA regulation for both the first-come-first-served scheme and the CFI complex-mediated mechanism150,151. Given that more than half of CPA events are estimated to occur near nuclear speckles156, it would be interesting to examine how these two modes of APA regulation are executed for genes located in different regions of the nucleus. On this note, it was found that increased dwell time in the nucleus could subject APA isoforms with long 3′UTRs to additional CPA, giving rise to short 3′UTR isoforms157,158. This mechanism, known as ‘sequential CPA’, is in line with the global trend of 3′UTR shortening in cells with suppressed expression of nuclear export factors, such as NXF1 (ref. 93), or export adaptor proteins SRSF3 and SRSF7 (ref. 94). Sequential CPA may function as an adaptive mechanism to facilitate localization of mRNAs to cytoplasm through 3′UTR shortening when nuclear export is attenuated, such as during cellular stress159.
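Why the pPAS-dPAS distance matters under a first-come-first-served scheme can be seen in a toy kinetic model: the pPAS has a time window, set by distance divided by elongation rate, in which to be processed before RNAPII reaches the dPAS. The sketch below assumes pPAS processing is a single first-order event and that the strong dPAS is always used otherwise; all parameter values are hypothetical, and this is not the published model.

```python
import math


def ppas_usage(k_cpa: float, distance_nt: float, elong_nt_per_s: float) -> float:
    """Probability of pPAS usage in a toy first-come-first-served model.

    Assumptions: once transcribed, the pPAS is processed as a single
    first-order event with rate k_cpa (1/s); it must be processed before
    RNAPII reaches the dPAS, after which the strong dPAS is assumed to win.
    """
    window_s = distance_nt / elong_nt_per_s     # head start the pPAS has over the dPAS
    return 1.0 - math.exp(-k_cpa * window_s)


# Hypothetical parameters: 900-nt pPAS-dPAS distance, elongation at 50 nt/s
base = ppas_usage(k_cpa=0.01, distance_nt=900, elong_nt_per_s=50)
weaker_cpa = ppas_usage(k_cpa=0.005, distance_nt=900, elong_nt_per_s=50)
print(round(base, 3), round(weaker_cpa, 3))  # 0.165 0.086
```

In this sketch, lowering `k_cpa` (mimicking CPA factor knockdown or CPSF73 inhibition) shifts usage towards the dPAS, whereas a longer pPAS-dPAS distance or slower elongation lengthens the window and favours the pPAS.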
RNAPII is generally considered to be an essential component of the CPA machinery (Box 1). Its C-terminal domain (CTD), which contains 52 repeats of the heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser in human cells, is subject to several types of post-translational modification160. In particular, phosphorylation at different positions of the heptapeptide is coordinated with different stages of the transcription cycle as well as with chromatin configuration. For example, the phosphorylation statuses of Ser5, Ser2 and Thr4 are closely connected, respectively, with transcription initiation, elongation and termination. As such, kinases and phosphatases of the RNAPII CTD, such as CDK9 (ref. 161), CDK12 and CDK13 (refs. 162–164), PP1 (refs. 161,165) and PP2A (ref. 166), have been reported to have substantial influences on PAS selection.
Splicing
Owing to the cooperation between splicing of the last intron and CPA in the last exon151,167, removal of the penultimate intron tends to promote the usage of pPAS in the last exon (Fig. 5a). However, for intronic PAS, splicing is in a kinetic competition with CPA (Fig. 5b). As such, intron features that could reduce the splicing speed, such as weak splice sites and large intron size, facilitate IPA151. Consistently, knockdown of core splicing factors, such as SF3B1 and U2AF2 (ref. 150), or functional inhibition of splicing by antisense morpholinos, such as those targeting U1 (ref. 168), U4 (ref. 169) and U6 (ref. 170) snRNPs, all lead to marked IPA activation. The U1 snRNP-based suppression of CPA168, also known as U1 telescripting171, is of particular note. First, U1 is much more abundant in the cell than other splicing snRNPs, which supports the notion that it has additional roles beyond splicing168. Second, U1 telescripting shows a very strong IPA suppression effect at the 5′ end of a gene, which gradually weakens towards the 3′ end150,171. This polarity of IPA suppression is distinct from the IPA suppression by U2 or U6 snRNPs170. Third, in accordance with its bias to the 5′ end of a gene, U1 snRNP has also been shown to control both promoter directionality172 and alternative promoter selection173. Mechanistically, in addition to the inhibition of IPA through splicing, U1 snRNP increases transcriptional elongation in the intron, which accentuates kinetic suppression of IPA174, and exerts CPA inhibition through physical interactions with the CPA machinery175.
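The kinetic competition between splicing and intronic CPA can be pictured as a race between two first-order processes, in which the probability of IPA is the CPA rate divided by the sum of both rates. The sketch below assumes constant, independent rates with hypothetical values; it ignores the time needed to transcribe the intronic PAS and position-dependent effects such as U1 telescripting.

```python
def ipa_probability(k_cpa: float, k_splice: float) -> float:
    """Probability that an intronic PAS is used before its intron is spliced out.

    Toy competing-rates model: CPA at the intronic PAS (rate k_cpa) and
    splicing of the host intron (rate k_splice) are independent first-order
    events, and whichever occurs first decides the isoform.
    """
    if k_cpa < 0 or k_splice < 0 or k_cpa + k_splice == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_cpa / (k_cpa + k_splice)


# Hypothetical rates (1/min): weakening the splice sites slows splicing
strong_ss = ipa_probability(k_cpa=0.02, k_splice=0.18)
weak_ss = ipa_probability(k_cpa=0.02, k_splice=0.08)
print(round(strong_ss, 2), round(weak_ss, 2))  # 0.1 0.2
```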
Transcription dynamics
A growing body of evidence indicates that transcription initiation affects CPA activity and, hence, PAS choice. For example, a strong promoter generally imparts higher CPA activity to the pre-mRNA, resulting in preferential usage of pPAS176–178. Using long-read RNA-seq, it was found that promoter choice can be directly coupled with PAS selection in a tissue-specific manner and, mechanistically, involves histone acetylation179. More recently, it was shown that selection of a distal promoter (far from the gene body) tends to favour IPA events that result in the use of proximal terminal exons and, conversely, selection of a proximal promoter is more likely to be coupled with CPA in the 3′-most exon180. Mechanistically, the transcriptional elongation rate was found to be a contributing factor for this promoter-PAS coupling. Another recent study found that the Mediator protein complex, which has a key role in transcription initiation, interacts with the CPA factor FIP1 and, hence, regulates APA181. This finding is in line with the long-standing observation that CPA factors are present at both ends of the gene182. Therefore, the transcriptional elongation rate and CPA factor recruitment are two mechanisms that can connect the PAS choice to promoter strength and identity. In this context, it is also worth noting that transcription by immature RNAPII owing to suppression of the integrator complex (INTc) was found to activate IPA and hence premature transcription termination at the 5′ end of genes183. Interestingly, the RNAs generated by this process contain dsRNA species that trigger the integrated stress response183.
In keeping with the kinetic nature of IPA regulation, the inhibition of transcription elongation factors, such as the polymerase-associated factor 1 (PAF1) complex and the cyclin-dependent kinase CDK12, activates IPA globally164,184. Of note, CDK12 mutations cause a ‘BRCAness’ phenotype that is typically associated with mutations of genes in the homologous recombination repair (HRR) pathway, such as BRCA1 and BRCA2. Suppression of CDK12 activity preferentially activates IPA events in HRR genes and leads to their downregulation, thereby phenocopying their mutations164. Although it was shown that HRR genes tend to be long and hence more likely to harbour IPA events, it is not entirely clear why mutations of other elongation factors, which also control IPA in long genes, do not display the same BRCAness phenotype. One possibility is that the expression patterns of CDK12 and HRR genes are generally correlated, leading to functional cooperation. In a similar vein, it is noteworthy that in Drosophila a mutant RNAPII with a slower elongation speed causes substantial IPA activation in the body but not in the brain185, indicating the importance of cell context in determining the outcome of transcription elongation-mediated IPA regulation.
Chromatin structure and modifications
Chromatin configuration and post-translational modifications are integral components of pre-mRNA processing (Fig. 5a). For example, genomic regions containing frequently used PAS are depleted of chromatin186 and the pattern of histone H3 trimethylation at Lys36 (H3K36me3) correlates with the selection of alternative PAS178. More recent studies have indicated the relevance of the 3D genome structure to transcriptional activity187. Genes located in different compartments of the nucleus, such as near the nuclear lamina versus in nuclear speckles, have distinct features, including the gene size, gene density and GC content188. Although it has not yet been systematically examined, it is conceivable that CPA activity and APA regulatory mechanisms may differ in different gene compartments. In line with this, short and long genes seem to have distinct sensitivities to knockdown of the CPA factor PCF11 (refs. 136,137) and to inhibition of CPSF73 by JTE-607 (ref. 48). Direct evidence of the impact of 3D genome structure on APA comes from the finding that CTCF and cohesin-mediated chromatin looping in the downstream region of a gene inhibits dPAS usage189. Consistently, DNA methylation of CTCF binding sites, which suppresses CTCF binding, promotes pPAS use189. The CTCF-mediated APA regulation is mechanistically similar to CPA modulation by blocking RNAPII elongation. Akin to this, it was reported that CRISPR–dCas9 targeting the non-template strand of DNA, which blocks RNAPII elongation, could promote the use of PAS situated before the blocking site190,191.
RNA modifications
The N6-methyladenosine (m6A) modification is ubiquitous in mRNAs192. Interestingly, m6A levels are biased towards short 3′UTR isoforms192. In a specific case, the m6A level of an IPA isoform of the HTT gene (encoding the pathogenic Huntingtin protein) was found to correlate with its expression level in individuals with Huntington disease, as well as in cell lines and mouse models of the disease193. These global and gene-specific observations are in line with the proposition that m6A deposition in transcripts could be attributed to the RNA dwell time in the nucleus158. Nevertheless, direct suppression of pPAS use by the m6A reader protein YTHDC1 was reported in MCF-7 and HEK293T cells, which involves sequestration of FIP1 through liquid–liquid phase separation and hence inhibition of FIP1–CPSF30 interaction194. In addition, m6A with 2′-O-methylation (m6Am), which is enriched towards the 5′ end of mRNA195, was found to sequester and hence functionally suppress the core CPA factor PCF11, leading to pPAS inhibition196. Similar to PCF11 knockdown197, this mechanism promotes the all-trans retinoic acid-based therapy of neuroblastoma, which involves induction of cancer cells to a less malignant state through cell differentiation.
RNA-binding proteins
The large repertoire of RBPs in the cell has been increasingly recognized for its functions in APA86,198. By binding to their cognate motifs near the PAS, RBPs mostly hinder, but can also facilitate, the recruitment of the CPA machinery199. In addition, although it has not yet been systematically examined, RBPs that regulate splicing could, in principle, alter IPA activity. Here, we discuss a few examples of biological conditions in which APA regulation by RBPs is widespread, to illustrate the principles of RBP-mediated APA control.
Hu proteins, including the ubiquitously expressed HuR (encoded by ELAVL1) and three neuron-specific paralogues HuB, HuC and HuD (encoded by ELAVL2, ELAVL3 and ELAVL4, respectively), are RBPs that bind to U-rich regions and have extensive roles in mRNA metabolism. HuB, HuC and HuD downregulate HuR expression through APA, by promoting the expression of a translationally suppressed 3′UTR-APA isoform of HuR200. pPAS suppression by Hu proteins was found to have a widespread role in 3′UTR lengthening in neurogenesis in Drosophila201,202, in particular at the onset of neuronal differentiation203. Similar in mechanism but opposite in the direction of regulation, by interacting with UGUA motifs, PQBP1 was found to suppress the use of UGUA-enriched dPAS in neural progenitor cells. This regulatory scheme is weakened during neurogenesis, when PQBP1 levels decrease204.
Mutations of the genes encoding TDP-43 and FUS are associated with familial forms of ALS and FTD. TDP-43 binds GU-rich sequences that resemble the binding sites of CPA factors CSTF64 and CSTF64τ (Box 1). Nuclear depletion of TDP-43 owing to its cytoplasmic aggregation is a hallmark of ALS and FTD205,206. Several recent studies have uncovered widespread APA changes in individuals with ALS or FTD who have cytoplasmic aggregation of TDP-43 (refs. 68,69,207,208). FUS preferentially binds GU-rich and G-rich sequences209,210, both of which are enriched in the downstream region of dPAS. Interestingly, APA regulation by FUS was found to involve both its RNA binding activity and its interaction with H3K36me3 (ref. 211).
APA regulation by RBPs is also common in cancer cells. For example, the U-rich motif-binding RBP HNRNPC is responsible for many APA events in metastases of colon cancer212 and breast cancer213. KHDRBS1 (also known as SAM68), which has specificity for binding U(A/U)AA, interacts with the transcription termination factor XRN2 and leads to altered APA in prostate cancer50. Loss-of-function mutations of APC, a GC-rich motif-binding RBP, lead to 3′UTR lengthening in colorectal adenocarcinoma, in which APC mutations are frequent214.
A growing number of APA regulatory mechanisms have been discovered in recent years, thanks to various sequencing techniques that interrogate nascent and mature RNAs as well as high-throughput and machine-learning approaches to uncover underlying rules for different regulators (Box 2).
Core CPA factors
Regulation of core CPA factors (Box 1) has been implicated in global APA changes in many physiological and pathological processes, such as cell proliferation (regulated by CSTF64 (ref. 140)), spermatogenesis (CSTF64τ (ref. 141), CFI25 and CFI68 (ref. 142), and PCF11 (ref. 143)), pre-implantation embryonic development (CFI25 and CFI68 (ref. 23)), haematopoiesis (CFI25 (ref. 24)), neurogenesis (PCF11 (ref. 144)), somatic cell reprogramming (CFI25 (ref. 145)), renewal of embryonic stem cells (FIP1 (ref. 146)), inflammation (CFI25 and CFI68 (ref. 147)), oncogenesis (FIP1 (ref. 148)) and tumour suppression (CFI25 (ref. 149)). Conversely, germline mutations of several CPA factor genes have been associated with human diseases (Supplementary Table 1).
Most core CPA factors promote pPAS usage in the last exon150, such as FIP1, RBBP6, CSTF64 (encoded by CSTF2) and its paralogue CSTF64τ (encoded by CSTF2T), and PCF11. Conversely, CPA inhibition by genetic knockdown of CPA factors or by the small-molecule CPSF73 inhibitor JTE-607 leads to a global pPAS-to-dPAS usage shift48,49. This regulatory mode is in line with the first-come-first-served model for PAS usage151 and is in accord with the fact that a pPAS is typically weaker than a dPAS as defined by surrounding RNA motifs5 (Fig. 5a), a configuration that would confer regulatability to the suboptimal pPAS as well as ensure proper transcription termination after a strong dPAS.
By contrast, ablation of CFI25 (also known as CPSF5; encoded by NUDT21) and CFI68, two components of the CFI complex, as well as of the poly(A) tail-binding proteins PABPN1 and PABPC1, leads to global dPAS-to-pPAS shifts150,152 (Fig. 5a). The mechanisms by which PABPN1 and PABPC1 regulate APA are not fully resolved, although PABPN1 was shown to inhibit pPAS usage152. Much more mechanistic insight has been gained into APA regulation mediated by CFI25 and CFI68 (refs. 150,153,154), which involves binding of CFI25 to the UGUA motifs that are enriched in the upstream region of dPAS150,153,154 and recruitment of the CPA machinery by CFI68 through interaction with FIP1 (ref. 153) and/or the formation of nuclear condensates155. Nuclear condensate formation has also recently been shown for RBBP6, which may act as an effector of PAS usage by activating the CPA endonuclease CPSF73 (ref. 156).
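Because CFI25 recognizes UGUA motifs enriched upstream of the dPAS, a simple first pass in sequence analysis is to count UGUA occurrences in a window upstream of each candidate PAS. The sketch below is illustrative: the function name and the 100-nt default window are assumptions, not parameters from the cited work, and motif counts alone do not predict CFI binding.

```python
def count_upstream_motif(seq: str, pas_pos: int, motif: str = "UGUA", window: int = 100) -> int:
    """Count motif occurrences (overlaps allowed) in the `window` nt upstream of `pas_pos`.

    seq: RNA or DNA sequence (T is read as U).
    pas_pos: 0-based index of the cleavage site within seq.
    window: upstream window size; the default is an illustrative assumption.
    """
    if not motif or window < 0 or not 0 <= pas_pos <= len(seq):
        raise ValueError("need a non-empty motif, a non-negative window and pas_pos within seq")
    rna = seq.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    region = rna[max(0, pas_pos - window):pas_pos]
    # Slide over every start position so that overlapping hits are also counted.
    return sum(
        1 for i in range(len(region) - len(motif) + 1)
        if region[i:i + len(motif)] == motif
    )


if __name__ == "__main__":
    utr = "GCAUGUAAAUUGUACCUGUAUAAAGCUUUGUUUCCA"
    print(count_upstream_motif(utr, pas_pos=len(utr)))
```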
Notably, the distance between dPAS and pPAS is an important determinant of APA regulation for both the first-come-first-served scheme and the CFI complex-mediated mechanism150,151. Given that more than half of CPA events are estimated to occur near nuclear speckles156, it would be interesting to examine how these two modes of APA regulation operate for genes at different locations in the nucleus. Relatedly, increased dwell time in the nucleus was found to subject APA isoforms with long 3′UTRs to additional CPA, giving rise to short 3′UTR isoforms157,158. This mechanism, known as ‘sequential CPA’, is in line with the global trend of 3′UTR shortening in cells with suppressed expression of nuclear export factors, such as NXF1 (ref. 93), or of the export adaptor proteins SRSF3 and SRSF7 (ref. 94). Sequential CPA may function as an adaptive mechanism that facilitates localization of mRNAs to the cytoplasm through 3′UTR shortening when nuclear export is attenuated, such as during cellular stress159.
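The idea that nuclear retention gives long-3′UTR isoforms more time to undergo additional CPA can be expressed as a competition between export and re-cleavage. In this hypothetical sketch, which is an illustration rather than a published model, a nuclear long isoform is either exported with rate `k_export` or cleaved again at the pPAS with rate `k_seq`, both treated as first-order; attenuating export (as with NXF1 depletion) then raises the fraction converted to the short isoform.

```python
def sequential_cpa(k_seq: float, k_export: float) -> tuple[float, float]:
    """Return (fraction re-cleaved to the short isoform, mean time the long isoform
    spends in the nucleus before either export or re-cleavage).

    k_seq: hypothetical rate of additional CPA at the pPAS (per minute).
    k_export: hypothetical nuclear export rate of the long isoform (per minute).
    """
    if k_seq < 0 or k_export <= 0:
        raise ValueError("k_seq must be non-negative and k_export positive")
    total = k_seq + k_export
    # Competing first-order events: the faster process wins more often.
    return k_seq / total, 1.0 / total


if __name__ == "__main__":
    for label, k_exp in [("normal export", 0.5), ("attenuated export", 0.05)]:
        frac, dwell = sequential_cpa(k_seq=0.05, k_export=k_exp)
        print(f"{label}: {frac:.0%} shortened, mean nuclear dwell {dwell:.1f} min")
```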
RNAPII is generally considered to be an essential component of the CPA machinery (Box 1). Its C-terminal domain (CTD), which contains 52 repeats of the heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser in human cells, is subject to several types of post-translational modification160. In particular, phosphorylation at different positions of the heptapeptide is coordinated with different stages of the transcription cycle as well as with chromatin configuration. For example, the phosphorylation statuses of Ser5, Ser2 and Thr4 are closely connected, respectively, with transcription initiation, elongation and termination. As such, kinases and phosphatases of the RNAPII CTD, such as CDK9 (ref. 161), CDK12 and CDK13 (refs. 162–164), PP1 (refs. 161,165) and PP2A (ref. 166), have been reported to have substantial influences on PAS selection.
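The residue names used above (Ser2, Thr4, Ser5) refer to positions within the consensus heptad. The short sketch below encodes that numbering and the mark-to-stage associations stated in the text. It builds an idealized 52-repeat consensus string for illustration only, because many repeats of the real human CTD, particularly the distal ones, deviate from the consensus; all names are hypothetical.

```python
HEPTAD = "YSPTSPS"  # consensus Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 (one-letter codes)
N_REPEATS = 52      # repeat number of the human RNAPII CTD
# Associations as stated in the text; real CTD modification patterns are combinatorial.
STAGE_BY_MARK = {"Ser5": "initiation", "Ser2": "elongation", "Thr4": "termination"}
THREE_TO_ONE = {"Tyr": "Y", "Ser": "S", "Pro": "P", "Thr": "T"}


def residue_at(mark: str) -> str:
    """Check a residue label such as 'Ser2' against the heptad; return its one-letter code."""
    name, pos = mark[:3], int(mark[3:])
    if not 1 <= pos <= len(HEPTAD):
        raise ValueError(f"heptad positions run from 1 to {len(HEPTAD)}")
    letter = HEPTAD[pos - 1]
    if THREE_TO_ONE.get(name) != letter:
        raise ValueError(f"{mark} does not match heptad position {pos} ({letter})")
    return letter


def idealized_ctd() -> str:
    """Consensus-only CTD sequence; real human repeats diverge, so illustration only."""
    return HEPTAD * N_REPEATS


if __name__ == "__main__":
    for mark, stage in STAGE_BY_MARK.items():
        print(f"{mark} ({residue_at(mark)}) phosphorylation: {stage}")
    print(len(idealized_ctd()), "residues in the idealized CTD")
```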
Splicing
Owing to the cooperation between splicing of the last intron and CPA in the last exon151,167, removal of the last intron tends to promote the usage of pPAS in the last exon (Fig. 5a). However, for intronic PAS, splicing is in kinetic competition with CPA (Fig. 5b). As such, intron features that could reduce the splicing speed, such as weak splice sites and large intron size, facilitate IPA151. Consistently, knockdown of core splicing factors, such as SF3B1 and U2AF2 (ref. 150), or functional inhibition of splicing by antisense morpholinos, such as those targeting U1 (ref. 168), U4 (ref. 169) and U6 (ref. 170) snRNPs, all lead to marked IPA activation. The U1 snRNP-based suppression of CPA168, also known as U1 telescripting171, is of particular note. First, U1 is much more abundant in the cell than other splicing snRNPs, which supports the notion that it has additional roles beyond splicing168. Second, U1 telescripting shows a very strong IPA suppression effect at the 5′ end of a gene, which gradually weakens towards the 3′ end150,171. This polarity of IPA suppression is distinct from the IPA suppression by U2 or U6 snRNPs170. Third, in accordance with its bias towards the 5′ end of a gene, U1 snRNP has also been shown to control both promoter directionality172 and alternative promoter selection173. Mechanistically, in addition to the inhibition of IPA through splicing, U1 snRNP increases transcriptional elongation in the intron, which accentuates kinetic suppression of IPA174, and exerts CPA inhibition through physical interactions with the CPA machinery175.
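The kinetic competition between splicing and intronic CPA can be sketched as two competing first-order reactions in which whichever occurs first decides the fate of the transcript. The rates below are hypothetical. In this simplified form, anything that slows splicing (weak splice sites, splicing factor knockdown) lowers `k_splice` and raises the IPA fraction; the additional effect of intron length, which delays transcription of the 3′ splice site, is deliberately left out.

```python
def ipa_fraction(k_cpa: float, k_splice: float) -> float:
    """Fraction of transcripts processed at an intronic PAS when intronic CPA
    (rate k_cpa) competes with removal of the intron by splicing (rate k_splice).

    Both rates are hypothetical first-order constants in the same time unit.
    """
    if k_cpa < 0 or k_splice < 0 or k_cpa + k_splice == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_cpa / (k_cpa + k_splice)


if __name__ == "__main__":
    normal = ipa_fraction(k_cpa=0.01, k_splice=0.2)
    splicing_impaired = ipa_fraction(k_cpa=0.01, k_splice=0.02)  # e.g. weakened splice sites
    print(f"IPA fraction: {normal:.1%} -> {splicing_impaired:.1%}")
```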
Transcription dynamics
A growing body of evidence indicates that transcription initiation affects CPA activity and, hence, PAS choice. For example, a strong promoter generally imparts higher CPA activity on the pre-mRNA, resulting in preferential usage of pPAS176–178. Using long-read RNA-seq, it was found that promoter choice can be directly coupled with PAS selection in a tissue-specific manner through a mechanism involving histone acetylation179. More recently, it was shown that selection of a distal promoter (far from the gene body) tends to favour IPA events that result in the use of proximal terminal exons and, conversely, selection of a proximal promoter is more likely to be coupled with CPA in the 3′-most exon180. Mechanistically, the transcriptional elongation rate was found to contribute to this promoter-PAS coupling. Another recent study found that the Mediator protein complex, which has a key role in transcription initiation, interacts with the CPA factor FIP1 and thereby regulates APA181. This finding is in line with the long-standing observation that CPA factors are present at both ends of the gene182. Therefore, the transcriptional elongation rate and CPA factor recruitment are two mechanisms that can connect PAS choice to promoter strength and identity. In this context, it is also worth noting that transcription by immature RNAPII owing to suppression of the integrator complex (INTc) was found to activate IPA and hence premature transcription termination at the 5′ end of genes183. Interestingly, the RNAs generated by this process contain dsRNA species that trigger the integrated stress response183.
In keeping with the kinetic nature of IPA regulation, the inhibition of transcription elongation factors, such as the polymerase-associated factor 1 (PAF1) complex and the cyclin-dependent kinase CDK12, activates IPA globally164,184. Of note, CDK12 mutations cause a ‘BRCAness’ phenotype that is typically associated with mutations of genes in the homologous recombination repair (HRR) pathway, such as BRCA1 and BRCA2. Suppression of CDK12 activity preferentially activates IPA events in HRR genes and leads to their downregulation, thereby phenocopying their mutations164. Although it was shown that HRR genes tend to be long and hence more likely to harbour IPA events, it is not entirely clear why mutations of other elongation factors, which also control IPA in long genes, do not display the same BRCAness phenotype. One possibility is that the expression patterns of CDK12 and HRR genes are generally correlated, leading to functional cooperation. In a similar vein, it is noteworthy that in Drosophila a mutant RNAPII with a slower elongation speed causes substantial IPA activation in the body but not in the brain185, indicating the importance of cell context in determining the outcome of transcription elongation-mediated IPA regulation.
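The argument that long genes are more exposed to IPA follows from simple probability: a transcript that must pass many intronic PAS accumulates small per-site risks. The sketch assumes, for illustration only, that sites act independently with hypothetical per-site usage probabilities; real intronic PAS are neither independent nor equally used.

```python
import math


def full_length_fraction(ipa_probs: list[float]) -> float:
    """Fraction of transcripts that escape every intronic PAS, assuming each site
    is used independently with the given (hypothetical) probability."""
    if any(not 0.0 <= p <= 1.0 for p in ipa_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return math.prod(1.0 - p for p in ipa_probs)


if __name__ == "__main__":
    per_site = 0.03  # hypothetical usage of each intronic PAS when elongation is impaired
    short_gene = full_length_fraction([per_site] * 3)
    long_gene = full_length_fraction([per_site] * 30)
    print(f"full-length output: short gene {short_gene:.2f}, long gene {long_gene:.2f}")
```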
Chromatin structure and modifications
Chromatin configuration and post-translational modifications are integral components of pre-mRNA processing (Fig. 5a). For example, genomic regions containing frequently used PAS are depleted of nucleosomes186, and the pattern of histone H3 trimethylation at Lys36 (H3K36me3) correlates with the selection of alternative PAS178. More recent studies have indicated the relevance of 3D genome structure to transcriptional activity187. Genes located in different compartments of the nucleus, such as near the nuclear lamina versus in nuclear speckles, have distinct features, including gene size, gene density and GC content188. Although it has not yet been systematically examined, it is conceivable that CPA activity and APA regulatory mechanisms differ between these nuclear compartments. In line with this, short and long genes seem to have distinct sensitivities to knockdown of the CPA factor PCF11 (refs. 136,137) and to inhibition of CPSF73 by JTE-607 (ref. 48). Direct evidence of the impact of 3D genome structure on APA comes from the finding that CTCF- and cohesin-mediated chromatin looping in the downstream region of a gene inhibits dPAS usage189. Consistently, DNA methylation of CTCF binding sites, which suppresses CTCF binding, promotes pPAS use189. CTCF-mediated APA regulation is mechanistically similar to CPA modulation by blocking RNAPII elongation. Akin to this, CRISPR–dCas9 targeting the non-template strand of DNA, which blocks RNAPII elongation, was reported to promote the use of PAS situated before the blocking site190,191.
RNA modifications
The N6-methyladenosine (m6A) modification is ubiquitous in mRNAs192. Interestingly, m6A levels are biased towards short 3′UTR isoforms192. In a gene-specific example, the m6A level of an IPA isoform of the HTT gene (encoding the pathogenic Huntingtin protein) was found to correlate with its expression level in individuals with Huntington disease, as well as in cell lines and mouse models of the disease193. These global and gene-specific observations are in line with the proposition that m6A deposition in transcripts could be attributed to RNA dwell time in the nucleus158. Nevertheless, direct suppression of pPAS use by the m6A reader protein YTHDC1 was reported in MCF-7 and HEK293T cells, which involves sequestration of FIP1 through liquid–liquid phase separation and hence inhibition of the FIP1–CPSF30 interaction194. In addition, m6A with 2′-O-methylation (m6Am), which is enriched towards the 5′ end of mRNA195, was found to sequester and hence functionally suppress the core CPA factor PCF11, leading to pPAS inhibition196. Similar to PCF11 knockdown197, this mechanism enhances all-trans retinoic acid-based therapy of neuroblastoma, which works by inducing cancer cells to differentiate into a less malignant state.
RNA-binding proteins
The large repertoire of RBPs in the cell has been increasingly recognized for its functions in APA86,198. By binding to their cognate motifs near the PAS, RBPs mostly hinder, but can also facilitate, the recruitment of the CPA machinery199. In addition, although it has not yet been systematically examined, RBPs that regulate splicing could, in principle, alter IPA activity. Here, we discuss a few examples of biological conditions in which APA regulation by RBPs is widespread, to illustrate the principles of RBP-mediated APA control.
Hu proteins, including the ubiquitously expressed HuR (encoded by ELAVL1) and the three neuron-specific paralogues HuB, HuC and HuD (encoded by ELAVL2, ELAVL3 and ELAVL4, respectively), are RBPs that bind to U-rich regions and have extensive roles in mRNA metabolism. HuB, HuC and HuD downregulate HuR expression through APA, by promoting the expression of a translationally suppressed 3′UTR-APA isoform of HuR200. pPAS suppression by Hu proteins was found to have a widespread role in 3′UTR lengthening during neurogenesis in Drosophila201,202, in particular at the onset of neuronal differentiation203. PQBP1 acts through a similar mechanism but in the opposite direction: by interacting with UGUA motifs, it was found to suppress the use of UGUA-enriched dPAS in neural progenitor cells. This regulatory scheme is weakened during neurogenesis, when PQBP1 levels decrease204.
Mutations of the genes encoding TDP-43 and FUS are associated with familial forms of ALS and FTD. TDP-43 binds GU-rich sequences that resemble the binding sites of CPA factors CSTF64 and CSTF64τ (Box 1). Nuclear depletion of TDP-43 owing to its cytoplasmic aggregation is a hallmark of ALS and FTD205,206. Several recent studies have uncovered widespread APA changes in individuals with ALS or FTD who have cytoplasmic aggregation of TDP-43 (refs. 68,69,207,208). FUS preferentially binds GU-rich and G-rich sequences209,210, both of which are enriched in the downstream region of dPAS. Interestingly, APA regulation by FUS was found to involve both its RNA binding activity and its interaction with H3K36me3 (ref. 211).
APA regulation by RBPs is also common in cancer cells. For example, the U-rich motif-binding RBP HNRNPC is responsible for many APA events in metastases of colon cancer212 and breast cancer213. KHDRBS1 (also known as SAM68), which has specificity for binding U(A/U)AA, interacts with the transcription termination factor XRN2 and leads to altered APA in prostate cancer50. Loss-of-function mutations of APC, a GC-rich motif-binding RBP, lead to 3′UTR lengthening in colorectal adenocarcinoma, in which APC mutations are frequent214.
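RBP binding preferences in this section are written as short, sometimes degenerate, motifs such as UGUA or U(A/U)AA. Scanning a sequence for such motifs, including overlapping hits, is straightforward with regular expressions. The motif table and names below are illustrative assumptions rather than a curated database, and motif presence is at best a weak proxy for actual RBP occupancy.

```python
import re

# Illustrative motif patterns transcribed from the text; not a curated database.
MOTIFS = {
    "KHDRBS1": "U[AU]AA",  # U(A/U)AA
    "CFI25": "UGUA",
}


def find_motif(seq: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) matches of `pattern`."""
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead lets consecutive matches overlap.
    return [m.start() for m in re.finditer(f"(?={pattern})", rna)]


if __name__ == "__main__":
    seq = "GGUUAAUAAAUGUAUCC"
    for rbp, pattern in MOTIFS.items():
        print(rbp, find_motif(seq, pattern))
```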
Conclusions and future perspectives
In the past few years, APA has emerged as a widespread gene regulatory mechanism with diverse biological and clinical consequences. Based on more than three decades of biochemical and molecular biology studies of CPA, and propelled more recently by advances in RNA-seq technologies and machine-learning methods, APA research has now come of age. Here, we outline some areas of APA research where outstanding questions remain to be addressed.
The APA isoform expression profile is closely connected with cell identity and is therefore informative as a diagnostic or prognostic tool. However, the accurate identification and quantification of APA isoforms remains challenging, especially with standard RNA-seq data. Analysing the use of alternative PAS in the context of transcription start sites and splicing events is important for understanding the coupling between transcriptional and co-transcriptional events. Long-read sequencing, especially direct RNA sequencing, could help address these issues215. In addition, metabolic labelling with 4-thiouridine and APEX2-based proximity labelling could further shed light on the life cycle of APA isoforms in the cell, including the interplay between isoform-specific mRNA decay and translation, as well as isoform-specific interactions with RBPs and subcellular localization.
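Part of the difficulty with standard RNA-seq is that short and long 3′UTR isoforms share most of their sequence. A common workaround, in the spirit of coverage-based tools such as DaPars (whose PDUI statistic is derived differently, from a regression fit to coverage), is to compare read coverage in the region unique to the long isoform with coverage in the shared region. The simplified ratio below is a sketch under that assumption, with hypothetical names; it ignores 3′ coverage bias and sampling noise.

```python
from statistics import mean


def distal_usage(common_cov: list[float], extended_cov: list[float]) -> float:
    """Rough fraction of transcripts using the dPAS, from per-base read coverage.

    common_cov: coverage upstream of the pPAS (shared by short and long isoforms).
    extended_cov: coverage between pPAS and dPAS (present only in the long isoform).
    """
    if not common_cov or not extended_cov:
        raise ValueError("both coverage tracks must be non-empty")
    total = mean(common_cov)
    if total <= 0:
        raise ValueError("shared region has no coverage")
    # Clip at 1 because noise can make the extended region look deeper than the shared one.
    return min(1.0, mean(extended_cov) / total)


if __name__ == "__main__":
    print(distal_usage([120, 110, 130], [30, 28, 35]))
```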
APA regulation often displays a global trend, such as a dPAS-to-pPAS shift or vice versa, involving many genes in a concerted manner. Identification of the most consequential APA event(s) in a given condition is still challenging. APA perturbation tools, such as genomic removal of alternative PAS by CRISPR–Cas9 (ref. 62) or modulation of their use by dCas9-based or dCas13-based methods190,191,216, could be used to address this challenge. Similarly, little is known about the significance of APA of lncRNAs. At least ~15% of lncRNA genes undergo APA5. Although some lncRNA APA events, such as that of NEAT1, have been found to have important functional consequences, our understanding of lncRNA APA is, for the most part, still in its infancy. PAS-based screening tools are needed to determine the biological relevance of APA events of lncRNA genes.
Although great strides have been made in understanding the regulatory mechanisms of APA, some important aspects remain unclear. For example, it is largely unexplored how APA is engaged in different signalling pathways and how chromatin structures and modifications, as well as DNA methylation patterns, might influence APA ‘memory’ and vice versa. How APA regulation is executed spatially in different compartments of the nucleus, such as nuclear speckles and the nuclear lamina, is only beginning to be unveiled. Moreover, the ever-expanding repertoire of APA-regulating RBPs needs to be systematically and holistically examined in different physiological and pathological contexts.
Genetic information on APA regulation is a rich source of data for APA research, especially in relation to clinically relevant apaQTLs. However, validating these variants is a daunting task. High-throughput methods, including single cell-based 3′-end sequencing and massively parallel reporter assays, coupled with machine-learning tools (Box 2), would be instrumental. Colocalization of apaQTLs with QTLs for other molecular features, such as gene expression, splicing, protein expression and DNA methylation, can shed light on the mechanisms and consequences of APA. In addition, large-scale validation combining data mining tools with high-throughput experimental analysis could offer a fruitful approach to delineate the relevance of genetically variable APA events in different cell contexts.
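At its simplest, an apaQTL test asks whether PAS usage changes with the number of alternative alleles an individual carries. The ordinary least-squares sketch below illustrates only that core idea; real apaQTL mapping also adjusts for covariates, population structure and multiple testing, and the function name and data are hypothetical.

```python
def apa_qtl_slope(dosage: list[int], pas_usage: list[float]) -> tuple[float, float]:
    """Least-squares fit of PAS usage (e.g. a distal-usage fraction) on allele
    dosage (0, 1 or 2 alternative alleles). Returns (slope, intercept)."""
    n = len(dosage)
    if n != len(pas_usage) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_g = sum(dosage) / n
    mean_u = sum(pas_usage) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((g - mean_g) * (u - mean_u) for g, u in zip(dosage, pas_usage))
    slope = sxy / sxx  # change in PAS usage per additional alternative allele
    return slope, mean_u - slope * mean_g


if __name__ == "__main__":
    genotypes = [0, 0, 1, 1, 1, 2, 2]
    usage = [0.31, 0.35, 0.46, 0.50, 0.44, 0.62, 0.58]
    slope, intercept = apa_qtl_slope(genotypes, usage)
    print(f"change in distal usage per allele: {slope:+.3f}")
```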
On the therapeutic front, antisense oligonucleotides have been successfully used to perturb APA217. IPA activation has been implicated in neoantigen production, which is potentially relevant to cancer immunotherapies218. Targeting APA events, either specifically on individual genes or globally across genes, could be an attractive therapeutic modality for various clinical conditions (such as those listed in Supplementary Tables 1–3) as we gain more knowledge about the mechanisms, consequences and genetics of APA-mediated gene regulation.
Supplementary Material
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1038/s41576-025-00928-w.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.