COSMIC v97 (Nov 2022): a focus on blood cancers, 4 Cancer Gene Census Tier 2 genes and 10 updated cancer hallmark genes, along with updated resistance data. This release of COSMIC contains 44,000 new genomic variants, 127,000 new coding mutations, 27,000 non-coding mutations, 6,000 new samples and 1,435 new whole genomes. We have also curated 20 new systematic screen papers.
As part of release v97 we have focused on updating the expert-curated mutation data for blood tumours. Blood tumours in COSMIC are classified under haematopoietic and lymphoid tissue as haematopoietic neoplasms or lymphoid neoplasms, which include cancer types such as leukaemias, lymphomas and myelomas as well as myeloproliferative neoplasms. Seventy-six additional publications with mutation screening data in these tumour types are included in this release. The data range from whole genome studies and studies using large next-generation sequencing panels to case reports with more unusual clinical details and novel treatments. Over 2,600 samples were curated and 24,356 new variants added from these samples. Release v97 also incorporates 9 new blood tumour types into COSMIC.
Gene drug pairs are added for website visualisation:
All the key statistics have also been updated; for more details, please check the Drug resistance page.
Cancer Mutation Census data has been updated for v97 release. These are the key updates:
COSMIC-3D data has been updated for v97 release. These are the key updates:
Actionability and CMC downloads are free for non-commercial use, files are available on the COSMIC download page. Please refer to our licensing page here to understand if you are a Non-Commercial or Commercial user and how to obtain a license.
Since the annotation system upgrade in v90, the Ensembl Variant Effect Predictor (VEP) has been used to standardise and normalise all variant annotations: https://www.ensembl.org/info/docs/tools/vep/index.html
One unintended consequence of using VEP is that it outputs genomic-level (g.) annotations for many non-coding mutations in the 5' UTRs of genes, as well as for all mutations in intergenic regions. Some of these mutations are associated with a named gene and are known or predicted to be functionally significant, with well-known CDS (c.) annotations reported in the scientific literature (e.g. TERT promoter mutations). Previously, these CDS (c.) annotations were shown in COSMIC, but since the v90 upgrade they have been overwritten by the standardised VEP genomic annotations, and for promoter mutations any link to the gene is lost.
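To make the difference between the two syntaxes concrete, here is a minimal Python sketch that parses simple single-nucleotide substitutions written in either c. or g. form. It is illustrative only and is not COSMIC or VEP code: the function and type names are hypothetical, it handles plain substitutions only (no indels, intronic offsets or ranges), and a production pipeline should use a full HGVS parser. The example uses TERT c.-124C>T, the promoter mutation often called C228T; a negative c. position lies upstream of the start codon. TERT is on the reverse strand, so the c. and g. forms of the same variant do not share reference and alternate bases, and converting between them is not simple string manipulation.

```python
import re
from typing import NamedTuple

# Simple single-nucleotide substitutions only, e.g. "c.-124C>T" or "g.1000A>G".
# Indels, intronic offsets (c.100+5G>A) and ranges are out of scope for this sketch.
_SUBSTITUTION = re.compile(r"^(?P<kind>[cg])\.(?P<pos>-?\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")


class Substitution(NamedTuple):
    kind: str  # "c" = position relative to the CDS, "g" = genomic coordinate
    pos: int   # for "c", negative values lie 5' (upstream) of the ATG start codon
    ref: str
    alt: str


def parse_substitution(hgvs: str) -> Substitution:
    """Parse a minimal, hypothetical subset of HGVS substitution syntax."""
    match = _SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS syntax: {hgvs!r}")
    return Substitution(match["kind"], int(match["pos"]), match["ref"], match["alt"])


def is_upstream_of_cds(variant: Substitution) -> bool:
    """True for c. positions before the start codon (5' UTR or promoter)."""
    return variant.kind == "c" and variant.pos < 0


tert = parse_substitution("c.-124C>T")  # TERT promoter variant often called C228T
print(tert, is_upstream_of_cds(tert))
```

A g. annotation on its own carries no gene or CDS position, which is why a separate mapping back to the c. syntax is needed.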
In order to maintain a standardised dataset, we will continue to show the VEP genomic annotations for all mutations, but we have now produced a mapping file to allow the non-coding variant (NCV) genomic annotations to be linked back to the CDS syntaxes.
The new mapping file NCV_CDS_syntax_mapping.tsv, released in v97, can be cross-referenced with the CosmicNonCodingVariants.vcf.gz or CosmicNCV.tsv.gz download files to link CDS syntaxes with LEGACY_ID or COSV identifiers.
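A minimal Python sketch of that cross-referencing for the TSV download, using only the standard library. The shared identifier column (GENOMIC_MUTATION_ID) and the CDS column (CDS_SYNTAX) are hypothetical placeholder names, not confirmed headers; check the actual files and pass the real names as arguments. The small mapping file is indexed in memory and the large NCV file is streamed past it. The VCF download would need a VCF reader instead and is not covered here.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def read_tsv(path: str) -> Iterator[Row]:
    """Yield rows of a plain or gzip-compressed TSV file as dicts keyed by header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def link_cds_syntax(
    mapping_rows: Iterable[Row],
    ncv_rows: Iterable[Row],
    key: str = "GENOMIC_MUTATION_ID",  # hypothetical shared identifier column
    cds_col: str = "CDS_SYNTAX",       # hypothetical CDS syntax column in the mapping
) -> List[Row]:
    """Inner join: attach CDS syntax(es) to each non-coding variant row that has one."""
    # Index the small mapping file by identifier (one id may map to several syntaxes).
    cds_by_id: Dict[str, List[str]] = {}
    for row in mapping_rows:
        cds_by_id.setdefault(row[key], []).append(row[cds_col])

    # Stream the large NCV file; rows without a mapping are skipped.
    linked: List[Row] = []
    for row in ncv_rows:
        for cds in cds_by_id.get(row[key], []):
            linked.append({**row, cds_col: cds})
    return linked
```

Usage would look like `link_cds_syntax(read_tsv("NCV_CDS_syntax_mapping.tsv"), read_tsv("CosmicNCV.tsv.gz"))`. Because this is an inner join, with the v97 mapping file the result would contain only TERT promoter rows.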
Generally, the website focuses on coding mutations, but non-coding variants are displayed on the Genome Browser and can also be viewed directly by searching for the COSN identifier, e.g. https://cancer.sanger.ac.uk/cosmic/search?q=COSN32285790
In v97, the new mapping file contains only TERT promoter mutations, but we plan to include non-coding mutation mapping for other genes in future releases.
This new file is available on the COSMIC download page.
Follow links below to the 20 papers which are new in v97, or view the full table of papers here.
A '+' at the end of a statistic denotes the increase since the last release.
COSMIC Actionability v7 includes 11 additional fully curated genes: CD274 (PD-L1), HRAS, MAP2K1 (MEK1), AR, GNA11, GNAQ, SMAD4, TSC1, DDR2, ETV6, FOXL2
This means we have a total of 72 fully curated genes: ABL1, AKT1, AKT2, AKT3, ALK, ASXL1, ATM, BCR, BRAF, BRCA1, BRCA2, BTK, CDK12, CDK4, CDK6, CEBPA, CTNNB1, DNMT3A, EGFR, ERBB2, ERBB3, EZH2, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, IDH1, IDH2, JAK1, JAK2, JAK3, KIT, KMT2A (MLL), KRAS, MDM2, MDM4, MET, MLH1, MPL, MSH2, MSH6, NF1, NF2, NPM1, NRAS, PDGFRA, PDGFRB, PIK3CA, PMS2, PTCH1, PTEN, RET, ROS1, RUNX1, SF3B1, SMO, STK11, TET2, TP53, WT1, CD274 (PD-L1), HRAS, MAP2K1 (MEK1), AR, GNA11, GNAQ, SMAD4, TSC1, DDR2, ETV6, FOXL2
To view the full list of curated genes visit the About page on the Actionability website.
All previously-recorded clinical trials have been checked for new or updated results.
Expressed/not category added to Patient Pre-screening: from v7 onwards, the download file contains a new category, 'Expressed/not'. This is used for trials that compare patients who express a protein with those who don't, or that compare patients with high expression with those with low expression. In practice, there is usually a threshold expression level and the comparison is between patients above and below it. If our curator is able to find the measure and threshold level that were used, they appear as part of the trial name. This new value is represented by the term Patient Pre-Screening in the column mutation_selected_dict.
There are several trials using this new category in v7.
Addition of Australian/New Zealand Clinical Trials Registry
Actionability v7 includes the addition of a new datasource: the Australian New Zealand Clinical Trials Registry (ANZCTR). This can be seen in the Source_Type column as a value of 9.
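The two v7 additions above can be used to subset the Actionability download. The Python sketch below filters rows on the values named in these notes: Source_Type 9 for ANZCTR, and Patient Pre-Screening in mutation_selected_dict (a term that may also cover pre-screening categories other than 'Expressed/not'). It assumes a tab-separated file with values spelled exactly as written here; both are assumptions to check against the actual download, and the file name in the usage comment is hypothetical. Values read with the csv module are strings, so the source type is compared as "9".

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, str]

ANZCTR_SOURCE_TYPE = "9"                 # Source_Type value for ANZCTR (as a string)
PRE_SCREENING = "Patient Pre-Screening"  # mutation_selected_dict value named in the notes


def select_trials(
    rows: Iterable[Row],
    source_type: Optional[str] = None,
    pre_screening_only: bool = False,
) -> List[Row]:
    """Keep rows matching every filter that is set; unset filters are ignored."""
    selected = []
    for row in rows:
        if source_type is not None and row.get("Source_Type", "").strip() != source_type:
            continue
        if pre_screening_only and row.get("mutation_selected_dict", "").strip() != PRE_SCREENING:
            continue
        selected.append(row)
    return selected


# Usage, assuming a tab-separated download (file name is hypothetical):
# import csv
# with open("actionability_v7.tsv", newline="") as fh:
#     anzctr = select_trials(csv.DictReader(fh, delimiter="\t"), source_type=ANZCTR_SOURCE_TYPE)
```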
COSMIC Mutational Signatures is a resource curated in partnership with COSMIC and Cancer Grand Challenges, and in close association with our collaborators at Wellcome Sanger Institute, the Pillay lab at University College London and the Alexandrov lab at University of California.
New for COSMIC Mutational Signatures release v3.3
We have added a novel collection of reference signatures to describe copy number variations; in total there are 24 CN signatures. Copy number signatures are defined using the 48-channel copy number classification scheme. The scheme incorporates loss-of-heterozygosity status, total copy number state and segment length to categorise segments from allele-specific copy number profiles (major and minor copy number, i.e. non-phased profiles). The signatures displayed here were identified from 9,873 tumour copy number profiles obtained from The Cancer Genome Atlas (TCGA) SNP6 array data spanning 33 cancer types.
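As an illustration of how a single segment could be assigned to one of the 48 channels, here is a Python sketch. It assumes the following bin definitions for the scheme: homozygous deletions (total copy number 0) in three length classes (0-100 kb, 100 kb-1 Mb, >1 Mb); LOH segments with total copy number 1, 2, 3-4, 5-8 or 9+; heterozygous segments with total copy number 2, 3-4, 5-8 or 9+; and five length classes (0-100 kb, 100 kb-1 Mb, 1-10 Mb, 10-40 Mb, >40 Mb) for the non-deleted states, giving 3 + 25 + 20 = 48 channels. The treatment of lengths falling exactly on a bin edge and the channel labels are assumptions of this sketch; the COSMIC Mutational Signatures documentation is the authoritative definition.

```python
from typing import List, Sequence

# Assumed bin edges in bases (inclusive upper bounds); verify against the official scheme.
LENGTH_EDGES = (100_000, 1_000_000, 10_000_000, 40_000_000)
LENGTH_LABELS = ("0-100kb", "100kb-1Mb", "1Mb-10Mb", "10Mb-40Mb", ">40Mb")
HOMDEL_EDGES = (100_000, 1_000_000)
HOMDEL_LABELS = ("0-100kb", "100kb-1Mb", ">1Mb")
LOH_TCN = ("1", "2", "3-4", "5-8", "9+")
HET_TCN = ("2", "3-4", "5-8", "9+")


def _length_class(length: int, edges: Sequence[int], labels: Sequence[str]) -> str:
    for edge, label in zip(edges, labels):
        if length <= edge:
            return label
    return labels[-1]  # longer than every edge


def _tcn_class(total: int) -> str:
    if total <= 2:
        return str(total)
    if total <= 4:
        return "3-4"
    if total <= 8:
        return "5-8"
    return "9+"


def classify_segment(major: int, minor: int, length: int) -> str:
    """Channel label for one non-phased allele-specific segment (major >= minor >= 0)."""
    if minor < 0 or major < minor or length <= 0:
        raise ValueError("expected major >= minor >= 0 and a positive length")
    total = major + minor
    if total == 0:  # homozygous deletion
        return f"homdel:0:{_length_class(length, HOMDEL_EDGES, HOMDEL_LABELS)}"
    zygosity = "LOH" if minor == 0 else "het"
    return f"{zygosity}:{_tcn_class(total)}:{_length_class(length, LENGTH_EDGES, LENGTH_LABELS)}"


def all_channels() -> List[str]:
    """Enumerate every channel label produced by classify_segment."""
    channels = [f"homdel:0:{label}" for label in HOMDEL_LABELS]
    for zygosity, tcn_classes in (("LOH", LOH_TCN), ("het", HET_TCN)):
        channels += [f"{zygosity}:{t}:{label}" for t in tcn_classes for label in LENGTH_LABELS]
    return channels
```

Counting segments per channel across one tumour's profile would give a 48-element count vector of the kind that signature extraction works on.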
The SBS and DBS signatures have been enriched with more topographical data and graphs, across 7 new features. These are:
In adding these new topographical features, we overhauled the existing transcriptional strand asymmetry feature and made it possible to view each feature's graph in a tissue-specific as well as an aggregated manner.
Other changes include:
Focused curation on rare head and neck cancers:
Gene focused curation:
Updates to Cancer Gene Census
Updates to Hallmark Genes
Whole Genome data
Head and neck (H&N) cancer is a relatively uncommon type of cancer. Around 12,400 new cases are diagnosed in the UK each year (NHS), and H&N cancer accounts for 3% and 4% of total cancer incidence in the US and Europe respectively. COSMIC v96 contains data from focused curation on less common H&N cancers, for example those that develop in the salivary glands, sinuses, or muscle and bone in the head and neck. Variant and patient data were curated from 56 publications. The focus of the papers ranged widely, including defining mutational profiles for the tumours, their aetiology, histopathology or treatment options, and finding actionable mutations for each tumour type. From this curation, 129 new site-histology pairs with sequencing data were added to COSMIC, and a new NCI Thesaurus code has been created for sinonasal low-grade Schneiderian papillary carcinoma in collaboration with the NCIT.
17 further publications were evaluated and are listed on the COSMIC website but data from these could not be curated for quality reasons or because they were review type publications that don't report novel variant data.
POLR2A, the gene encoding RNA polymerase II catalytic subunit A, is a key player in meningiomas (COSP 41827).
A subset of WHO grade I meningiomas is defined by the somatic hotspot mutations p.Q403K and p.L438_H439del. Germline mutations in POLR2A are associated with heterogeneous multi-system disorders, and p.L438_H439del is associated with the most severe phenotype. POLR2A's status as a cancer gene was reviewed, and it was added to the Cancer Gene Census as a Tier 2 gene for its role as a potential oncogene in meningioma. The gene seems to be commonly deleted in cancer; recurrent mutations cause widespread changes in gene expression, although no definitive evidence was found that they cause cancer. Differential expression is enriched for genes involved in the cell cycle, apoptosis and cancer-associated signalling pathways. The literature reporting POLR2A mutations across all cancers was comprehensively curated.
The Protein kinase D1 gene (PRKD1, 14q12) has been added to the Cancer Gene Census as a Tier 2 gene for its role in fusions found in cribriform adenocarcinomas of the salivary gland. It encodes a serine/threonine protein kinase involved in several signalling pathways and many cellular processes, including cell migration and differentiation, cell survival, and regulation of cell shape and adhesion. Our H&N curation focus included several papers reporting recurrent p.E710D mutations in the majority of polymorphous low-grade adenocarcinomas (PAC), the second most common malignant tumour of the minor salivary glands (COSP 46408, 49877, 49498, 49500, 49502). The p.E710D mutation is also found in a minority of cribriform adenocarcinomas of the salivary gland (CASG) (COSP 49877 & PMID 31492931), but not in the more aggressive head and neck adenoid cystic carcinomas or in pleomorphic adenomas (COSP 49498), nor in other solid tumours and leukaemias. A minority of PACs and a majority of CASGs carry fusions involving PRKD1, PRKD2 or PRKD3. Other PRKD1 mutations are found at lower frequencies in a variety of other tumours, including breast, leukaemia, lymphoma and gastric cancers.
Cancer Gene Census genes are partitioned into two tiers according to the strength of their association with cancer. Tier 1 genes must have a documented activity relevant to cancer, along with evidence of mutations in cancer that change the activity of the gene product in a way that promotes oncogenic transformation. For Tier 2 genes, there is a strong indication of a causal link, but the functional evidence is less definitive and/or the mutation patterns that would enable assignment of a role in cancer are less clear.
ACVR1B: Tumour suppressor gene
CTNNA1: Tumour suppressor gene, fusion gene
POLR2A:
PRKD1: fusion gene
COSMIC Hallmarks of Cancer annotations employ cancer phenotypic traits to describe how Tier 1 CGC genes functionally contribute to cancer development.
v96 has new Hallmarks of Cancer annotations for the following genes:
Follow links below to the 9 papers which are new in v96, or view the full table of papers here.
COSMIC welcomes author contributions of data, as they are invaluable in supporting us to identify new genes and trends in cancer research. We actively collaborate with authors whose publications are at the submission stage. Correctly formatted variant data ensures faster inclusion of the paper in COSMIC and dissemination of the data into the research community to further empower new research. An example of such collaboration was an author submission that highlighted POLR2A. As a result of this submission, POLR2A's status as a cancer gene was reviewed, it was added to the Cancer Gene Census as a Tier 2 gene, and the literature reporting POLR2A mutations across all cancers was comprehensively curated for this release. Whether submissions report previously undescribed cancer mutations or cancer genes, or well-known variants in new cancer types, all papers are triaged and prioritised. Some journals require proof of submission to COSMIC as a prerequisite for publication. However, papers are released in COSMIC only after peer review and publication, to guarantee the high quality and open-access status of the data.
Instructions on how to submit data to COSMIC can be found here: https://cancer.sanger.ac.uk/cosmic/submissions
As part of the V95 release we have focused on updating the expert-curated mutation data for rare cancers of the female genital tract and breast. This release has approximately 100 additional publications with mutation screening data in these diseases, including ovarian germ cell tumours, uterine Mullerian tumours and breast adenomyoepithelioma. We have also updated the classification of mucosal melanomas, including those associated with the female genital tract, to give details for the specific mucosal tissue.
V95 includes information about the resistance mutations in the FGFR2 and NT5C2 genes. We also have two new expert-curated genes, SDHA and TENT5C, which are associated with gastrointestinal stromal tumours and multiple myelomas respectively. Finally, we have focused on in-depth curation and updates of mutation data for the chromatin remodelling genes ARID1A, ARID1B, ARID2, PBRM1, SMARCA4, SMARCB1 and SMARCD1. More than 80 additional publications with mutation screening data in these genes are included in this release.
We've updated our Terms & Conditions for Non-Commercial use of COSMIC Core data (including the Cell Lines Project, COSMIC-3D, and Mutational Signatures). Whether you're thinking of registering or a current user, it's vital you read these thoroughly.
Unless stated, these apply to all releases of COSMIC, including previous versions that you may have downloaded.
As part of this change, the following statement has been removed: 'If I now need a licence for my use of COSMIC data, instead of licensing I can use/continue to use an old unsupported version of COSMIC'. This means that you are not permitted to use old and unsupported versions of COSMIC.
We don't have capacity to support older versions of COSMIC. Our database is designed as a 'living tool' that is constantly evolving in line with the latest research and information. It's also important to note that old versions of COSMIC aren't kept up to date. As a result, many of the links are broken and the data isn't accurate. With this in mind, we hope you will understand the necessity for this change to our T&Cs.
Read the full T&Cs here
Terminal Nucleotidyltransferase 5C (TENT5C), previously known as FAM46C, encodes a non-canonical poly(A) RNA polymerase. It is thought to enhance mRNA stability and gene expression; its main targets are mRNAs that encode ER-targeted proteins. TENT5C is commonly mutated in multiple myeloma, and evidence suggests that it is a B-cell lineage-specific growth suppressor. Somatic mutations in multiple myeloma samples are recorded across the gene; most of these are substitutions.
SDHA (Succinate dehydrogenase complex flavoprotein subunit A) encodes a catalytic subunit of succinate-ubiquinone oxidoreductase, a complex of the mitochondrial respiratory chain. Germline mutations associated with loss of heterozygosity in the tumour drive several cancer types. However, rarer second-hit somatic mutations, and occasionally double somatic mutations, are also reported, notably in SDHA expression-negative 'wild type' gastrointestinal stromal tumours (GISTs) lacking KIT or PDGFRA mutations. Somatic mutations are also seen in other tumours, such as pituitary adenomas and paragangliomas.
Extensive research has shown that targeting FGFRs with small molecule inhibitors halts receptor activation and downstream signalling, resulting in tumour shrinkage. However, the efficacy of these inhibitors can be limited by acquired drug resistance, which impedes treatment and leads to tumour relapse.
COSMIC V95 includes patient mutation data in which resistance to drug treatment is caused by point mutations in the FGFR2 gene. Cancers studied include intrahepatic cholangiocarcinoma (iCCA), breast cancer, lung cancer and gastric cancer.
Multiple alternatively spliced isoforms of FGFR2 are known to exist, and the mutations detailed here use amino acid numbering from the FGFR2-IIIb isoform, the FGFR2 transcript shown as canonical on the COSMIC website (ENST00000457416.6).
Genomic analysis shows alterations in targetable oncogenes in almost 50% of cholangiocarcinoma patients, with recurrent alterations in IDH1 and FGFR2. These occur almost exclusively in patients with iCCA rather than extrahepatic cholangiocarcinoma.
FGFR2 genomic alterations including activating point mutations, fusions, and rearrangements are known oncogenic drivers and provide a molecular signature to identify patients who may benefit from inhibition of FGFR2 tyrosine kinase activity.
Whilst second-generation selective (ATP-competitive) FGFR inhibitors such as BGJ398/infigratinib, Debio 1347 and pemigatinib/INCB054828 have been shown to increase the disease control rate, the rapid emergence of acquired drug resistance has frequently been observed. Goyal et al. (COSP42875) first described genetic mechanisms of clinical acquired resistance to FGFR inhibition in patients with FGFR2 fusion-positive iCCA. Through the analysis of pre- and post-progression ctDNA and tumour biopsies in three patients with FGFR2 fusion-positive iCCA treated with BGJ398, this study revealed the emergence of FGFR2 kinase domain mutations, including an FGFR2 V565F gatekeeper mutation in all three patients. Goyal et al. (COSP46683) followed this initial study with six FGFR2 fusion-positive iCCA patients treated with the FGFR kinase inhibitors BGJ398 and Debio 1347, and again found mutations in kinase domain residues (K660M, V565F/H, N550K/H/T and L618V), plus M372I in the transmembrane domain. Consistent with these findings, four further studies identified the emergence of similar FGFR2 kinase domain mutations in patients with FGFR2 fusion-positive cholangiocarcinoma who had initially responded to pemigatinib (Silverman et al., COSP49195 and Krook et al., COSP49205), BGJ398 (Krook et al., COSP47638) or an unspecified FGFR inhibitor (Kasi et al., COSP49199).
Mutations observed in these studies include M539L, N550K/H/T, V565F/H, E566A, L618V, K660M and K642R which result in increased receptor kinase activity. Structural modelling has suggested two ways in which these mutations confer resistance:
Similar kinase domain gain of function FGFR2 activating mutations (and FGFR amplifications) were shown to be apparent in post-resistance samples of ER+ metastatic breast cancer after treatment with ER-directed therapy (Mao et al, COSP48455) and ER therapy with CDK4/6 inhibitors (palbociclib) (Formisano et al, COSP46556).
Apart from the emergence of secondary FGFR alterations, another challenge to the effectiveness of FGFR-targeted therapies is intra-tumoural and temporal heterogeneity. This is a major obstacle in patients with liver cancers, as shown by Goyal et al. (COSP42875) and Kasi et al. (COSP49199).
Bypass mechanisms also contribute to the development of drug resistance. Min Lau et al. (COSP44344) demonstrated the emergence of a PKC-dependent rewiring mechanism that confers resistance to AZD4547 (a second-generation FGFR inhibitor) in FGFR2-amplified diffuse gastric cancer. The FGFR2 V565F gatekeeper mutation also emerged in a PDX model of this gastric cancer during ex vivo culture with AZD4547, and its overexpression caused resistance to AZD4547 and cross-resistance to infigratinib.
Goyal et al. (COSP46683) studied next-generation covalent (irreversible) inhibitors, such as futibatinib (TAS-120), as a possible means to overcome or suppress resistance mutations. They describe four patients with FGFR2 fusion-positive cholangiocarcinoma who developed acquired resistance to infigratinib or Debio 1347 and subsequently responded to TAS-120, although gatekeeper resistance mutations were later found. A similar response to TAS-120 was shown in in vitro assays by Krook et al. (COSP47638).
NT5C2 (5'-nucleotidase, cytosolic II) encodes a hydrolase that plays an important role in cellular purine metabolism, acting primarily on inosine 5'-monophosphate and other purine nucleotides. Gain-of-function mutations in NT5C2 alter its activating and autoregulatory switch-off mechanisms, producing a protein with increased nucleotidase activity. This drives resistance to thiopurine chemotherapy, such as 6-mercaptopurine, in relapsed acute lymphoblastic leukaemia (ALL). NT5C2 point mutations commonly occur at R39, R238, R367 and D407 and are frequently recurrent, with R367Q the most common relapse-associated NT5C2 mutation, accounting for 90% of mutant cases.
Please note that due to technical difficulties, resistance data for FGFR2 and NT5C2 are not showing on the website currently. All resistance mutations are available in the download files.
SWItch/Sucrose Non-Fermentable (SWI/SNF) is a chromatin remodelling complex that uses the energy of ATP hydrolysis to reposition nucleosomes, thereby regulating access to the DNA and modulating transcription and DNA replication/repair. Mutations involving subunits of the SWI/SNF complex are common in a wide range of cancers, occurring in approximately 20% of them, with ARID1A the most frequently mutated subunit. The cancers with the highest SWI/SNF mutation rates are ovarian clear cell carcinoma, clear cell renal cell carcinoma, hepatocellular carcinoma, gastric cancer, melanoma and pancreatic cancer.
Atypical teratoid/rhabdoid tumour (AT/RT), a rare and highly aggressive malignancy of the central nervous system (CNS), is usually diagnosed in infancy or childhood and is most often characterised by loss of expression of the SMARCB1 gene product (INI1). However, an unusual case with retained expression of INI1 and without identified SMARCB1 mutations is reported by Bookhout et al. (COSP49144) in an infant with thalamic AT/RT.
Johan et al. (COSP41712) report a series of cribriform neuroepithelial tumour (CRINET), a rare non-rhabdoid brain tumour showing cribriform growth pattern and SMARCB1 loss. They conclude that CRINET represents a SMARCB1-deficient non-rhabdoid tumour which shares molecular similarities with the AT/RT-TYR subgroup but has distinct histopathological features and favourable long-term outcome.
In renal medullary carcinoma, a highly aggressive type of renal cancer occurring in patients with sickle cell trait, loss of SMARCB1 expression has emerged as a key diagnostic feature and Jia et al. (COSP49139) demonstrate biallelic inactivation of SMARCB1 in the majority of their 20 cases.
In breast implant-associated anaplastic large cell lymphoma, a distinct entity which arises in the capsule surrounding textured saline or silicone breast implants, Quesada et al. (COSP49321) report a novel STAT3-JAK2 fusion as well as mutations or gene losses in several genes including SMARCB1.
Rooper et al. (COSP49118) find recurrent loss of SMARCA4 in sinonasal teratocarcinosarcoma (TCS), a rare and aggressive tumour with mixed teratomatous, carcinomatous and sarcomatous components. They suggest that SMARCA4 inactivation may be the dominant genetic event in TCS and that this lesion is on a diagnostic spectrum with SMARCA4-deficient sinonasal carcinoma.
ARID1A is a key non-catalytic component in the SWI/SNF complex. It acts primarily as a tumour suppressor and is emerging as a potential therapeutic target. Hung et al. (COSP49190) study the spectrum of ARID1A genetic alterations in non-small cell lung carcinoma and assess the clinicopathological significance of these mutations and expression loss in these tumours.
Wu et al. (COSP49189) perform comprehensive genomic profiling in ovarian seromucinous borderline tumours, an uncommon ovarian epithelial neoplasm characterised by association with endometriosis, and find frequent somatic mutations in KRAS, PIK3CA and ARID1A.
Das et al. (COSP49067) examine the mutation profile at ARID2 hotspots in oral squamous cell carcinoma patients from South India, and Bala et al. (COSP48911) identify ARID2 as a novel tumour suppressor in early-onset sporadic rectal cancer. Both ARID1A and ARID2 are among the genes identified by Varaljai et al. (COSP49256) as drivers of intracranial metastases in malignant melanoma and, as such, are potential therapeutic targets.
Female adnexal tumours of probable Wolffian origin (FATWO) are very rare gynaecological tumours of low malignant potential, thought to derive from the mesonephric (Wolffian) remnants in the upper female genital tract. Most frequently they occur in the paraovarian region, and occasionally within the ovary, fallopian tube or retroperitoneum. Mirkovic et al. (COSP45731) examine the molecular changes in FATWO to determine whether they are molecularly similar to mesonephric carcinoma. They find that FATWO lack the KRAS/NRAS mutations characteristic of mesonephric carcinoma. Bennett et al. (COSP47213) also perform a molecular analysis of FATWO, finding few pathogenic mutations and suggesting this could be useful in the differential diagnosis of difficult FATWO cases that resemble more common ovarian and broad ligament lesions.
Wang et al. (COSP47834) report mutations in primary vaginal malignant melanoma, an extremely rare mucosal melanoma. In their cohort of 36 patients, NRAS mutations and PD-L1 expression are most prevalent, whereas the detection rate of KIT and TERT mutations is low. Patients with NRAS mutations have a poorer survival outcome compared with those with wild-type NRAS. For invasive melanomas arising from different anatomical sites in the lower female genital tract, Zarei et al. (COSP47239) observe the most common genetic alterations in KIT, TP53 and NF1.
Jung et al. (COSP48959) present whole exome sequencing results for gestational choriocarcinoma, a unique cancer of pregnant tissues. Hodroj et al. (COSP49095) report the molecular characterisation of ovarian yolk sac tumour, a rare malignant germ cell tumour, with mutations in KRAS, KIT and ARID1A which may be used as therapeutic targets. Frumovitz et al. (COSP48770) investigate mutational hotspots in cancer-related genes in small cell neuroendocrine cervical cancer. Dundr et al. (COSP48465) highlight a case of ovarian mesonephric-like adenocarcinoma arising in serous borderline tumour.
Rare cancers of the breast include metaplastic breast cancer (MpBC), a predominantly triple negative breast cancer (TNBC) representing a histologically heterogeneous group of invasive carcinomas. MpBC is defined by differentiation of the neoplastic epithelium to a non-glandular component, such as squamous or mesenchymal e.g. spindle cell, osseous or chondroid. It is an aggressive form of breast cancer, with patients presenting at an advanced stage and it is often more resistant to conventional chemotherapy than other TNBC.
TP53 is the most frequently mutated gene in MpBC followed by PIK3CA, as shown by Afkhami et al. (COSP48792) who also report a PIK3CA-mutated case of MpBC with exceptional response to everolimus therapy. Vranic et al. (COSP48788) perform molecular profiling of spindle cell breast cancers and show they are characterised by targetable molecular alterations in the majority of cases. Reed et al. (COSP48795) report results of whole exome sequencing for MpBC, confirming previous reports of high frequency of TP53 mutations and presenting evidence for a significant enrichment of co-occurring mutations in PTEN, PIK3CA and TP53.
Breast adenomyoepithelioma is an uncommon, biphasic tumour ranging from benign through atypical and in situ to malignant, with the latter associated with carcinoma that can arise in the epithelial or myoepithelial component. Using whole exome and targeted massively parallel sequencing analysis, Geyer et al. (COSP48780) demonstrate that oestrogen receptor-positive adenomyoepitheliomas display mutually exclusive PIK3CA or AKT1 activating mutations, while oestrogen receptor-negative tumours harbour highly recurrent HRAS codon Q61 hotspot mutations, which co-occur with PIK3CA or PIK3R1 mutations. This update also includes case reports of adenomyoepithelioma from Watanabe et al. (COSP48777) and Han et al. (COSP48782).
You can see the links to specific publications within the main body of text, or view the full table of papers.
For transparency, we have recently changed our data definitions and created sub-categories to be clearer as to what the different mutation statistics mean for our users.
COSMIC v94 (May 2021): 2 new expertly curated genes, a focus on rare lung cancers, a focus on rare pancreatic cancers, and curation of somatic mutations in 12 hallmark apoptosis genes. Along with this, data for 9 cancer hallmark genes are also updated. In this release of COSMIC, we have 1 million new coding mutations, around half a million genomic mutations, a quarter of a million non-coding mutations, 31,606 new samples and 1,447 new whole genomes. We have also curated 47 new systematic screen papers. Our new products, Actionability and CMC, have also been updated with the latest datasets.
We have listened to user feedback and made changes to the Actionability download file: it now includes a new COSO column that links to our existing classification file. This will help users find the complete disease classification in COSMIC, along with the NCIT code.
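A minimal Python sketch of using the new column: a left join from Actionability rows to classification rows on the COSO identifier. The column names used here (COSO_ID, PRIMARY_SITE, PRIMARY_HISTOLOGY) are hypothetical placeholders, not confirmed headers of either file; adjust them to match the downloads. Rows with no matching classification are kept with empty values rather than dropped, so no Actionability records are lost.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, str]


def attach_classification(
    actionability_rows: Iterable[Row],
    classification_rows: Iterable[Row],
    key: str = "COSO_ID",  # hypothetical name of the shared COSO column
    fields: Sequence[str] = ("PRIMARY_SITE", "PRIMARY_HISTOLOGY"),  # hypothetical columns
) -> List[Row]:
    """Left join on the COSO identifier; unmatched rows get empty strings."""
    lookup: Dict[str, Row] = {}
    for row in classification_rows:
        lookup.setdefault(row[key], row)  # keep the first row seen per identifier
    joined = []
    for row in actionability_rows:
        match = lookup.get(row.get(key, ""), {})
        joined.append({**row, **{field: match.get(field, "") for field in fields}})
    return joined
```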
MYCN (MYCN proto-oncogene, bHLH transcription factor) encodes a member of the MYC family, a protein with a basic helix-loop-helix domain. Amplification of MYCN is associated with different tumours, particularly neuroblastoma, and both amplification and overexpression are associated with adverse prognosis in Wilms tumour (WT). Additionally, activating hotspot mutations in MYCN at P44, a highly conserved residue, have been identified in WT, suggesting a significant role for MYCN dysregulation in the molecular biology of WT. Mutations are also detected in childhood T-cell acute lymphoblastic leukaemia and in skin basal cell carcinoma, where most mutations cluster in the sequence encoding the Myc box 1 (MB1) region, particularly at P44.
MAPK1 (mitogen-activated protein kinase 1) encodes a member of the MAP kinase family. These extracellular signal-regulated kinases (ERKs) act as an integration point for multiple biochemical signals, and are involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation and development. Mutations in the MAPK signalling pathway have been associated with the development of several carcinomas. A high frequency of MAPK1 mutations occurs in primary cervical squamous cell carcinoma, with a hotspot mutation at p.E322K; in cervical intraepithelial neoplasm additional driver mutations in genes such as MAPK1 are required to achieve squamous cell progression. The MAPK1 hotspot mutation is also found in head and neck squamous cell carcinoma at a lower rate.
As part of release v94 we have focused on updating the expert-curated mutation data for pancreatic cancer. Over 90 additional publications with mutation screening data in this disease are included in the release. As well as data on the most common pancreatic cancer, pancreatic ductal adenocarcinoma (PDAC), publications on neuroendocrine tumours, intraductal papillary mucinous neoplasms and sclerosing epithelioid mesenchymal neoplasms have been curated.
Pancreatic cancer is one of the most fatal malignancies with a 5-year survival rate of <10%. Many patients present with lymph node or distant metastases at initial diagnosis and combined chemotherapies achieve only modest results. Targeted therapies and immunotherapies have so far achieved little efficacy.
PDAC is the most prevalent neoplastic disease of the pancreas, accounting for 95% of all pancreatic malignancies. KRAS, TP53, CDKN2A and SMAD4 are the most common drivers of PDAC, with codon 12 KRAS mutations found in the majority of cases. This update includes a publication by Vitellius et al. (COSP48266) who study the link between survival and mutations in driver genes in PDAC patients who have isolated pulmonary metastases. They find an absence of mutations in CDKN2A and SMAD4 in the primary tumours in these patients. Cheng et al. (COSP47781) find KRAS G12V mutation associated with high circulating regulatory T cell levels with both of these factors predicting worse prognosis in advanced PDAC patients.
PDAC can develop independently or arise from precancerous lesions such as pancreatic intraepithelial neoplasia (PanIN) or intraductal papillary mucinous neoplasm (IPMN), and these lesions are targets for early disease detection. Hosoda et al. (COSP43069) use whole exome sequencing and targeted sequencing to compare mutations in low- and high-grade PanIN, finding KRAS mutations in 94% of low-grade lesions and no TP53 or SMAD4 mutations, suggesting the latter genes are inactivated later in the neoplastic process. IPMN have high frequencies of mutations in KRAS, GNAS and RNF43 and include several distinct histopathological subtypes, one of which is studied by Omori et al. (COSP48022). Their results suggest that gastric-type epithelia which acquire GNAS mutations, together with induction of intrinsic CDX2 expression, may evolve with clonal selection and additional molecular aberrations into intestinal-type IPMNs, then to an invasive phenotype. Pancreatic mucinous cystic neoplasms, another precursor lesion of PDAC, have multiple KRAS mutations in the non-mucinous epithelial lining in the majority of lesions according to a study by An et al. (COSP47785).
Rare cancers include pancreatic neuroendocrine tumours (PNET), heterogeneous neoplasms accounting for <3% of all pancreatic malignancies. The main genetic alterations in PNET are mutations in MEN1, VHL, DAXX and/or ATRX, and in genes of the mTOR pathway. Kit et al. (COSP48169) describe well-differentiated PNET in a Russian cohort, demonstrating various molecular genetic features, including new genetic variants and potential driver genes.
Other rare pancreatic cancers include sclerosing epithelioid mesenchymal neoplasm, a proposed new entity. Basturk et al. (COSP47728) define the clinicopathological features but find no characteristic recurrent molecular signatures by whole exome sequencing. Acinar cell carcinoma (ACC) is a rare pancreatic neoplasm with dismal prognosis. Kryklyva et al. (COSP47731) report the occurrence of ACC in carriers of BRCA2 germline mutations, plus somatic BRCA1/2 alterations and a mutational signature associated with BRCA1/2 deficiency in a significant subset of sporadic ACC. A case of BRAF V600E-mutated ACC in a patient who responds well to combined BRAF/MEK inhibitor treatment is presented by Busch et al. (COSP48541). Three cases of rare pancreatic gastrointestinal stromal tumour, radiologically mimicking cystadenocarcinoma, are described by Ambrosio et al. (COSP36241).
As part of release v94 we have focused on updating the expert-curated mutation data for rare lung cancers. Approximately 100 additional publications with mutation screening data in these diseases, including sarcomatoid carcinoma, enteric adenocarcinoma and large cell neuroendocrine carcinoma, are in this release.
Pulmonary sarcomatoid carcinoma (PSC), a rare subtype of non-small cell lung carcinoma comprising poorly differentiated cells and sarcoma or sarcomatoid components, is highly aggressive and has a poor prognosis. It includes 5 pathological subtypes: pleomorphic carcinoma, spindle cell carcinoma, giant cell carcinoma, pulmonary blastoma and carcinosarcoma. This update includes a whole exome study of the mutation spectrum in PSC, with validation, by Liu et al. (COSP40313), which finds that MET mutational events leading to exon 14 skipping are frequent and potentially targetable in PSC. Nakagomi et al. (COSP48108) separately analyse the mutations in both components of PSC, and their results support the hypothesis that the components have a common origin. Comparison of the mutation profiles also shows that the sarcomatous component has a greater accumulation of mutations and a larger genetic distance from the common origin than the epithelial component. Currently, PSC is treated similarly to non-small cell lung cancer, but it responds poorly, is refractory to chemotherapy and radiation, and recurs rapidly after surgical resection. Lococo et al. (COSP48180) show that KRAS mutations are associated with local metastases at recurrence and with a significantly decreased probability of survival in a cohort of patients with surgically resected PSC. Developing effective treatment strategies for this disease is difficult owing to the tumour's rarity and its intratumoural heterogeneity. Sukrithan et al. (COSP4810) report improved survival with immune checkpoint inhibitor therapy in their cohort of 5 patients with advanced PSC.
Pulmonary enteric adenocarcinoma (PEA), a rare variant defined as adenocarcinoma with an enteric differentiation component exceeding 50%, shows some histological and immunohistochemical similarities with metastatic colorectal adenocarcinoma. A comparative analysis by Matsushima et al. (COSP48150) indicates that β-catenin and SATB2 are useful immunohistochemical markers for differentiating between the two diseases. Their molecular analysis reveals KRAS as the most frequently mutated gene in PEA, a result confirmed by Nottegar et al. (COSP48151), who also find very few cases harbouring abnormalities affecting EGFR, BRAF and ALK.
Large cell neuroendocrine carcinoma (LCNC) of the lung is a rare, molecularly and biologically heterogeneous disease that shares many molecular and histological features with both small cell and non-small cell lung cancer. It is an aggressive disease with a poor prognosis. Makino et al. (COSP48190) compare molecular analyses of pulmonary LCNC and adenocarcinoma, finding a lack of EGFR mutations in the former, and suggest that these tumours may respond favourably to adjuvant treatments that are not typically prescribed in non-small cell lung cancer. Both Kogo et al. (COSP48193) and Zhao et al. (COSP48187) report cases of LCNC transformed from EGFR-mutated adenocarcinoma, where the transformation acts as a mechanism of resistance to tyrosine kinase inhibitors; in this setting, transformation to small cell carcinoma is more usual. Zhou et al. (COSP48185) report a significant impact of primary tumour location on survival in a series of LCNC patients, with clinicopathological features and genomic characteristics differing between central and peripheral tumours, including a higher frequency of EGFR mutations in peripheral tumours.
Additionally, Zhao et al. (COSP47930) investigate the clinicopathological features and genomics of both the epithelial and mesenchymal components of pulmonary blastomatoid carcinosarcoma, where both retain high consistency in genetic abnormalities. Li et al. (COSP47640) report a case of hepatoid adenocarcinoma, a disease histologically resembling typical hepatocellular carcinoma metastatic to the lung, and its genomic profile including a FAT1 driver mutation.
As part of release v94 we have curated somatic mutations in 12 Hallmark genes involved in evading apoptosis, the programmed cell death that normally eliminates damaged cells. These genes include the tumour suppressor CHEK2 (checkpoint kinase 2), which encodes a cell cycle checkpoint regulator and member of the CDS1 subfamily of serine/threonine protein kinases; the protein contains a forkhead-associated protein interaction domain essential for its activation in response to DNA damage. Infrequent somatic mutations in CHEK2 have been found in myelodysplastic syndrome and in solid tumours such as prostate, lung and breast cancers and vulval squamous cell carcinoma. CHEK2 is also a novel target in diffuse large B cell lymphoma.
ERCC2 (ERCC excision repair 2, TFIIH core complex helicase subunit), a tumour suppressor gene, encodes a protein involved in transcription-coupled nucleotide excision repair and an integral member of the basal transcription factor BTF2/TFIIH complex. Somatic ERCC2 mutations are found in muscle-invasive bladder cancer, where these alterations, as well as those in other DNA repair pathway genes, can predict responses to neoadjuvant platinum-based chemotherapies and to targeted therapies on the basis of mutation status.
The following Hallmark genes, all involved in apoptosis evasion and many of which are DNA repair genes, have also been curated for somatic mutations: BLM, DDB2, ELF3, ERCC3, ERCC5, RECQL4, POLG, FEN, FLT4, MAPK1
Follow links below to the 47 papers which are new in v94, or view the full table of papers here.
COSMIC v93 Release includes a launch of our newest product in the COSMIC suite, Mutation Actionability in Precision Oncology (Actionability). Alongside this we are releasing new developments and updates for COSMIC Mutational Signatures in our v3.2 release.
Please note: all COSMIC data remain the same as in v92, apart from the addition of the new Actionability download file. The core COSMIC dataset will be updated in our next release, v94.
The aim of Actionability is to indicate the availability of drugs that target mutations in cancer and track the progress of clinical studies towards making new drugs available. Drugs that target somatic mutations are represented at all stages of drug development, through safety and clinical phases to market and repurposing, with additional case studies.
The principal units of actionability are mutation, disease, and drug. Capturing relations between these units allows identification of existing and upcoming drugs that target particular genetic variants in specific cancer types.
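To make the mutation-disease-drug model more concrete, here is a minimal Python sketch of how such relations could be stored and queried. The record fields, the `drugs_for` helper and the placeholder gene, disease and drug names are hypothetical illustrations only; they are not the structure of the Actionability download file.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ActionabilityRecord:
    """One hypothetical mutation-disease-drug relation (illustrative fields)."""
    mutation: str  # placeholder such as "GENE1 p.X1Y", not a real variant
    disease: str
    drug: str
    stage: str     # development stage, e.g. "Phase II" or "Approved"


def drugs_for(records: Iterable[ActionabilityRecord],
              mutation: str, disease: str) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (drug, stage) pairs that target
    `mutation` in `disease`."""
    return sorted({(r.drug, r.stage) for r in records
                   if r.mutation == mutation and r.disease == disease})
```

Keying the query on both mutation and disease reflects the point above: the same variant can be actionable in one cancer type and not in another.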
In our first release of Actionability, we have manually curated the following:
It is possible to view the list of genes on the Actionability about us page.
The Actionability data is available either as a complete download file containing all Actionability data or as a sample file.
To read more about Actionability, please read our blog.
We'd like to share our exciting new developments and updates for COSMIC Mutational Signatures in our v3.2 release.
COSMIC Mutational Signatures is a collaboration between Wellcome Sanger Institute, Cambridge, UK, COSMIC and the Alexandrov lab at the University of California, San Diego, USA, part of the wider Mutographs Cancer Grand Challenges Project.
In our latest release, you will be able to find the following:
New for Release v3.2
SBS10c and SBS10d are two new signatures linked with polymerase proofreading deficiency (similar to SBS10a, SBS10b and SBS28), but in this case they are related to mutations in a different polymerase, POLD1.
SBS91 was identified in normal cells from the cerebral cortex and the oesophageal squamous epithelium and its aetiology is unknown.
SBS92 is a new mutational signature related to tobacco smoking.
SBS92, SBS93 and SBS94 were extracted from the same dataset as the version 3 signatures (PCAWG Consortium data), but using our newly developed computational method, SigProfilerExtractor.
We have created a new unified download page, serving as a one stop shop for the mutational signature profiles, grouped per release for all past releases from v1 in 2013. This groups all reference signatures into a single file, organised by variant class (SBS, DBS and ID), genome (GRCh37, GRCh38, mm9, mm10, rn6) and the COSMIC Mutational Signature release version. Our new downloads page can be found here.
We have added additional supporting evidence for the proposed aetiology of the signatures.
This high quality manually curated data can be used alongside the comments section to provide transparency and justification for why a signature is considered to be a reference signature or artefact.
For example, the aetiology of signature SBS30 was unknown when it was identified. It has since been shown experimentally to be related to base excision repair deficiency due to NTHL1 mutations.
All SBS and DBS signatures can now be rendered with the newly added rn6 rat reference genome, alongside the pre-existing reference genomes GRCh37, GRCh38, mm9 and mm10.
We have increased the coverage and uniformity of our signatures and have corrected some errata in texts and images.
The URL used to access Mutational Signatures has changed from https://cancer.sanger.ac.uk/cosmic/signatures to https://cancer.sanger.ac.uk/signatures. There will be no interruption to service, as all links will be redirected to the new site.
We have added better responsive layouts.
We have also simplified and redesigned the website navigation.
COSMIC v92 (August 2020) includes 2 new fully curated genes, a substantial curation update on spliceosome genes (SF3B1, SRSF2, U2AF1, ZRSR2) and 2 new gene-drug pairs. Along with the data update, we are launching a brand new product, the Cancer Mutation Census (CMC). In this release of COSMIC, we have 2.5 million new coding mutations, around 1 million genomic mutations, half a million non-coding mutations, 16,000 new samples and 1147 new whole genomes. We have curated 18 new systematic screen papers. We are also including a new data download file for Hallmarks of Cancer and have updated the Fusion, VCF and Sample Features files.
The Cancer Mutation Census (CMC) is an undertaking to classify coding mutations in COSMIC and identify variants driving different types of cancer. We are often asked which are the mutations that matter the most - the CMC will help to answer this question! The CMC allows for the prioritisation of somatic mutations that introduce biologically relevant changes to protein function, and participate in the development of cancer.
Metrics including ClinVar significance, dN/dS ratios, and variant frequencies in normal populations (gnomAD) have been integrated into this resource. They have been used alongside COSMIC data on mutations' prevalence across 1,500 forms of human cancer. This helps to predict candidates for driver mutations in the coding portion of the genome.
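As a rough illustration of what a dN/dS ratio measures, the sketch below computes the naive ratio of the nonsynonymous substitution rate to the synonymous substitution rate from counts of observed mutations and available sites. This simple per-site formula is an assumption for illustration only and is not presented as the method behind the CMC metric; more sophisticated approaches also model sequence context and other mutational biases.

```python
def dn_ds(nonsyn_observed: int, nonsyn_sites: float,
          syn_observed: int, syn_sites: float) -> float:
    """Naive dN/dS: (nonsynonymous mutations per nonsynonymous site)
    divided by (synonymous mutations per synonymous site).

    Values above 1 suggest positive selection for protein-changing
    mutations, as expected for a driver gene; values near 1 suggest
    neutrality.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_observed == 0:
        raise ValueError("no synonymous mutations: ratio is undefined")
    dn = nonsyn_observed / nonsyn_sites  # nonsynonymous rate
    ds = syn_observed / syn_sites        # synonymous (background) rate
    return dn / ds
```

For example, 30 nonsynonymous mutations over 750 nonsynonymous sites against 5 synonymous mutations over 250 synonymous sites gives a ratio of 2.0, i.e. protein-changing mutations are seen twice as often as the synonymous background would predict.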
For further information, please read our blog.
Cancer_Gene_Census_Hallmarks_Of_Cancer.tsv.gz - We are providing a new Hallmark download file. This is a manually curated resource which will continue to expand and build upon the high-quality Cancer Gene Census project that COSMIC is well known for.
In our VCF files, we have updated the mutation syntax to the latest HGVS formats and fixed some existing annotation anomalies.
To the fusion file, we have added the gene IDs, gene names and exon numbers for the 3' and 5' fusion genes.
We have added phenotype ID for the primary tissue and histology in the Sample Features file to help uniquely identify samples.
For more information please have a look at the download page, here.
We have aligned the website with the downloads, by merging the SNPs and noSNPs databases as a response to user feedback. This is so that the download files will now mirror what is displayed on the website. The SNPs are still going to be identified in the download files as before. They remain on the website, with the full mutation information displayed.
Initially we excluded these SNPs from the website, but from v92 we include them with a flag. The Overview section of each mutation page shows 'SNP Yes' if a mutation has been flagged. SNP flagging is applied to whole genome screens, but not to targeted gene screens added through expert manual curation. Each mutation download file contains a column that allows SNPs to be identified and filtered out if desired.
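The SNP flag column in the mutation download files can be used to drop flagged variants with a few lines of standard-library Python. In this sketch the column name `SNP`, the flag value `y` and the file name are assumptions for illustration; check the header of the actual download file before relying on them.

```python
import csv
import gzip
from typing import Iterable, Iterator


def iter_tsv_gz(path: str) -> Iterator[str]:
    """Yield text lines from a gzip-compressed TSV download file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from handle


def drop_flagged_snps(lines: Iterable[str], flag_column: str = "SNP",
                      flag_value: str = "y") -> list[dict[str, str]]:
    """Keep only rows whose flag column does not equal flag_value.

    flag_column and flag_value are hypothetical defaults; adjust them
    to match the real file header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or flag_column not in reader.fieldnames:
        raise KeyError(f"column {flag_column!r} not found in header")
    return [row for row in reader
            if row[flag_column].strip().lower() != flag_value.lower()]
```

Usage, with a hypothetical file name: `rows = drop_flagged_snps(iter_tsv_gz("mutations.tsv.gz"))`. Streaming through `csv.DictReader` avoids loading the whole file into memory before filtering; only the kept rows are collected.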
FGFR4 (Fibroblast growth factor receptor 4) belongs to a gene family of receptor tyrosine kinases encoding cell surface receptors for fibroblast growth factors. It is mainly expressed during embryonic development and tissue repair following injury, with expression declining postnatally. Dysregulation of FGFR4 through overexpression, gene amplification, somatic mutation and germline mutation plays an important role in both cancer development and progression. Somatic mutations have been found in various primary and metastatic cancers, including breast (with enrichment in the invasive lobular type), prostate, lung and colorectal cancers, as well as hepatocellular carcinoma and the predominantly childhood cancer rhabdomyosarcoma (RMS), with enrichment in the embryonal RMS type. FGFR4 has been shown to be highly expressed and frequently mutated in RMS tissue. FGFR4 somatic mutations are often oncogenic missense mutations found across the gene, with a hotspot at p.V550 within the tyrosine kinase domain. FGFR4 is considered a druggable target, although several mutations have been shown to cause strong resistance to Type I and some Type II inhibitors.
CDK12 (cyclin dependent kinase 12) encodes a serine/threonine kinase involved in the regulation of RNA polymerase II and mRNA processing. The gene is recurrently mutated in high grade serous ovarian carcinoma and metastatic prostate carcinoma, where it acts as a tumour suppressor gene inactivated by loss-of-function mutations. These alterations are associated with a tandem-duplicator phenotype, a genomic signature characterized by focal tandem duplications. CDK12 mutations are also detected in many other cancers, including colorectal, gastric, breast, endometrial and bladder. In prostate cancer, CDK12 mutations show promise as potential predictive biomarkers for response to immune checkpoint inhibitors.
As part of release v92 we have focused on updating the expert-curated mutation data for 4 RNA splicing factor genes (SF3B1, SRSF2, U2AF1 and ZRSR2). Approximately 60 additional publications with mutation screening data in these genes are included in the release.
Pre-mRNA splicing, the post-transcriptional mechanism for regulating gene expression, is facilitated by a large group of ribonucleoprotein complexes known as the major and minor spliceosome. While mutations have been observed in the large number of genes encoding the splicing machinery, there are 4 which are most frequently mutated: SF3B1, SRSF2, U2AF1 and ZRSR2. The first three code for components of the major spliceosome and ZRSR2 is primarily a component of the minor spliceosome. Recurrent somatic mutations in these 4 spliceosome genes are frequent in haematological malignancies such as myelodysplastic syndrome (MDS), myeloproliferative disorders and acute myeloid leukaemia (AML), and are also found in solid tumours e.g. breast, pancreatic and lung cancers and uveal melanoma, but at lower rates. SF3B1, SRSF2 and U2AF1 mutations are generally mutually exclusive, are located in hotspots and always occur as heterozygous change-of-function mutations. ZRSR2 mutations do not always follow the same pattern; these mutations are usually seen as loss-of-function, occur throughout the gene and can, occasionally, co-occur with other splicing factor mutations.
SF3B1 (splicing factor 3b subunit 1) encodes subunit 1 of the splicing factor 3b protein complex. This gene is the most frequently mutated spliceosome component in MDS where it is altered in more than 80% of cases with refractory anaemia with ringed sideroblasts. This update includes a paper from Shingai et al. (COSP47580) who report mutations in SF3B1, as well as SRSF2 and U2AF1, in MDS and investigate their effect on diagnosis and prognosis. Berger et al. (COSP47520) find that, in contrast to ringed sideroblast (RS)-MDS, the incidence of SF3B1 mutations is low (8%) in RS-AML. SF3B1 is also one of the major molecular alterations underlying uveal melanoma pathogenesis and Sarubi et al. (COSP47578) genotype Brazilian uveal melanomas, finding a mutation rate of 30%. In both uveal melanoma and MDS, these mutations are associated with a better prognosis, whereas in chronic lymphocytic leukaemia SF3B1 mutations are correlated with a worse prognosis. There is also evidence for involvement of SF3B1 alterations in mucosal melanoma and Oiso et al. (COSP47466) report such a case.
SRSF2 (serine and arginine rich splicing factor 2) encodes a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. SRSF2 mutations consistently affect the P95 residue by missense mutation or, more rarely, in-frame insertion or deletion. The latter are reported at low incidence in uveal melanoma by van Poppelen et al. (COSP47526), while Yimpak et al. (COSP47539) study hotspot mutations in SRSF2 and SF3B1 in MDS patients in Upper Northern Thailand, relating their findings to diagnosis and prognosis. The mutational profiles of cases of AML co-mutated with ASXL1 and SRSF2, a prognostically distinct subgroup of AML with very poor outcome, are investigated by Johnson et al. (COSP47646). They find evidence of monocytic differentiation and genetic overlap with chronic myelomonocytic leukaemia (CMML), suggesting this subset of AML may often arise as a secondary AML from an occult CMML-like MDS or MDS/MPN.
U2AF1 (U2 small nuclear RNA auxiliary factor 1) encodes the small subunit of U2 auxiliary factor, which plays a critical role in both constitutive and enhancer-dependent RNA splicing. U2AF1 hotspot mutations occur at residue S34 or Q157, each of which is located in one of the zinc finger domains. Both of these mutations are observed almost equally in haematological malignancies, while in solid cancers, such as non small cell lung cancer, the S34 mutation is the more frequent hotspot. Carruale et al. (COSP47380) report a case of acute basophilic leukaemia, a very rare form of acute leukaemia, where a Q157 U2AF1 mutation was identified.
ZRSR2 (zinc finger CCCH-type, RNA binding motif and serine/arginine rich 2) encodes a protein that associates with the U2 auxiliary factor heterodimer, which is then required for the recognition of a functional 3' splice site in pre-mRNA splicing. ZRSR2 mutations are found in approximately 5-10% of MDS patients. Janusz et al. (COSP43153) combine conventional Sanger and next generation sequencing to identify mutations in ZRSR2 and the other spliceosome-related genes in more than 85% of the RS-MDS cases that they tested, suggesting this two-step approach could be useful in patients with an unclear diagnosis.
Oncogenic ALK fusions, most commonly EML4-ALK, occur in 3-5% of patients with non-small cell lung cancer (NSCLC). While crizotinib, a multitargeted tyrosine kinase inhibitor, is effective in treating these patients, most acquire resistance to ALK inhibitors through various molecular mechanisms, including on-target resistance mutations. This v92 release includes a case report by Sharma et al. (COSP47266) in which an NSCLC patient develops resistance to crizotinib and then to brigatinib. The latter was caused by a drug-resistant compound mutation, L1196M/G1202R, that also confers primary resistance to lorlatinib, a third generation ALK inhibitor. Gainor et al. (COSP41796) analyse repeat biopsies from ALK-positive lung cancer patients progressing on various ALK inhibitors and find that each ALK inhibitor is associated with a distinct spectrum of ALK resistance mutations, and that the frequency of G1202R increases significantly after treatment with second-generation agents, including brigatinib. Yoda et al. (COSP47670) perform whole exome sequencing in three lung cancer patients and confirm the stepwise accumulation of ALK mutations and the emergence of compound mutations during sequential treatments. Identifying these changes is critical to informing drug design and developing effective therapeutic strategies for these patients. In a case report from Takahashi et al. (COSP47645), treatment with lorlatinib, which is active against most known single ALK resistance mutations including the highly refractory G1202R, results in an ALK I1171S and G1269A compound mutation.
Follow links below to the 18 papers which are new in v92, or view the full table of papers here.
COSMIC v91 (April 2020) includes 4 new fully curated genes and a substantial curation update on the APC gene; we have also focused on testicular and other male cancers, as well as breast implant-related lymphoma. There are nearly 5 million new coding mutations, 2 million genomic mutations, 2 million non-coding mutations and 2900 new whole genomes. We have curated 48 new systematic screen papers and have also updated our ICGC dataset to v28, which includes 2 new studies, ICGC (BPLL-FR): B-Cell Prolymphocytic Leukemia - FR and ICGC (GACA-JP): Gastric Cancer - JP, with a complete re-annotation using the Ensembl Variant Effect Predictor (VEP). We are also including new download files to track mutations, and new normalised VCF files for COSMIC and the Cell Line Project.
COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.
CosmicMutationTracking.tsv.gz / CellLinesMutationTracking.tsv.gz - We are providing a new mutation tracking file for COSMIC and the Cell Line Project that maps the legacy COSM/COSN IDs to the new genomic IDs (COSV), along with the gene names, accession numbers and a new unique mutation identifier. The file also indicates whether each mutation is coding or non-coding, and includes a field indicating whether the annotation is on the canonical transcript.
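A tracking file of this kind lends itself to building a lookup from legacy IDs to COSV IDs. The sketch below is a minimal standard-library example; the column names `LEGACY_MUTATION_ID` and `GENOMIC_MUTATION_ID` are hypothetical, so check the header of the real file. It keeps a set of COSV IDs per legacy ID rather than assuming a strict one-to-one relationship.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Hypothetical column names: verify against the real tracking file header.
LEGACY_COLUMN = "LEGACY_MUTATION_ID"
GENOMIC_COLUMN = "GENOMIC_MUTATION_ID"


def legacy_to_cosv(lines: Iterable[str],
                   legacy_col: str = LEGACY_COLUMN,
                   genomic_col: str = GENOMIC_COLUMN) -> dict[str, set[str]]:
    """Map each legacy COSM/COSN ID to the set of COSV IDs it appears with.

    Rows with an empty legacy or genomic ID are skipped, and duplicate
    rows collapse into the same set entry.
    """
    mapping: defaultdict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        legacy = row[legacy_col].strip()
        genomic = row[genomic_col].strip()
        if legacy and genomic:
            mapping[legacy].add(genomic)
    return dict(mapping)
```

The input can be any iterable of text lines, for example the lines of the decompressed tracking file opened with `gzip.open(path, "rt")`.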
CosmicCodingMuts.normal.vcf.gz / CosmicNonCodingMuts.normal.vcf.gz / CellLinesCodingMuts.normal.vcf.gz / CellLinesNonCodingMuts.normal.vcf.gz - We have also improved our VCF files to include HGVS syntax at the genomic level (HGVSG), at the CDS level (HGVSC, including the transcript accession number with its version) and at the peptide level (HGVSP, with Ensembl's peptide accession).
We have added normalised versions of our VCF files, denoted with the suffix .normal, in which each variant is 5' shifted while the HGVS-compliant (3' shifted) syntaxes are retained in the INFO field. Where the two representations differ, the INFO field therefore reflects the non-normalised version. We have also compressed these files with bgzip following user feedback.
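Because VCF normalisation and HGVS place the same indel at opposite ends of a repeat, a short sketch of the shifting step may help. The Python below shifts a simple deletion fully 5' (left, as in the normalised VCF files) or fully 3' (right, as the HGVS 3' rule requires), assuming a plain forward-strand reference string and 0-based coordinates. It is an illustration only, not COSMIC's or VEP's code, and it ignores the padding base that VCF deletion records carry and the transcript strand, which matters for HGVS c. positions.

```python
def shift_deletion(seq: str, start: int, length: int, direction: str) -> int:
    """Return the 0-based start of an equivalent deletion shifted fully
    5' ("left") or 3' ("right") along the forward strand of `seq`.

    The deleted bases are seq[start:start + length]. Inside a repeat,
    removing one copy of the repeat unit gives the same resulting
    sequence wherever that copy is taken from, so the deletion can slide.
    """
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("deletion lies outside the sequence")
    if direction == "left":
        # Slide 5' while the base before the deletion equals its last base.
        while start > 0 and seq[start - 1] == seq[start + length - 1]:
            start -= 1
    elif direction == "right":
        # Slide 3' while the base after the deletion equals its first base.
        while start + length < len(seq) and seq[start] == seq[start + length]:
            start += 1
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return start


def apply_deletion(seq: str, start: int, length: int) -> str:
    """Return `seq` with seq[start:start + length] removed."""
    return seq[:start] + seq[start + length:]
```

For a "CA" deletion starting at 0-based position 4 of `GGCACACAT`, the left-shifted start is 2 and the right-shifted start is 6 (1-based positions 3 and 7), yet all three placements produce the same sequence, `GGCACAT`. This is why the 5'-shifted VCF record and the 3'-shifted HGVS syntax in the INFO field can differ in position while describing the same variant.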
All our files containing mutation syntax now include the additional columns HGVSG, HGVSC and HGVSP. For more information please have a look at the download page, here.
KMT2A (lysine methyltransferase 2A, formerly MLL) encodes a protein containing multiple conserved functional domains, including the SET domain, which is responsible for histone H3 lysine 4 (H3K4) methyltransferase activity, mediating chromatin modifications associated with epigenetic transcriptional activation. Recurrent KMT2A mutations, mostly nonsense, frameshift and missense, have been found in peripheral T-cell lymphoma, not otherwise specified. Additionally, components of the histone methyltransferase complex, including KMT2A, show a high frequency of alterations in pancreatic ductal adenocarcinoma and in oesophageal sarcomatoid carcinoma. KMT2A point mutations are relatively rare and often form part of the "long tail" of possible driver mutations seen in many cancer types, e.g. breast cancer and colorectal cancer.
CDKN1B (cyclin dependent kinase inhibitor 1B) belongs to a family of CDK inhibitor genes that includes CDKN1A (encoding p21/WAF1) and CDKN1C (encoding p57/KIP2). CDKN1B encodes the protein p27/KIP1, which binds to and prevents the activation of cyclin E-CDK2 or cyclin D-CDK4 complexes, controlling cell cycle progression at G1. Evidence suggests CDKN1B is a haploinsufficient tumour suppressor gene. In large sequencing studies, driver mutations in CDKN1B have been detected in selected cancer types, predominantly truncating frameshift or nonsense mutations. Recurrent somatic mutations and deletions in CDKN1B are also found in small intestinal neuroendocrine tumours, affecting approximately 8% of tumours, and primary luminal breast cancer carries recurrent CDKN1B mutations, mostly truncating mutations in the C-terminal region. Novel driver CDKN1B mutations have been identified in classical hairy cell leukaemia (HCL), where the gene is mutated in up to 16% of patients, suggesting a role for cell cycle deregulation in the pathogenesis of HCL. Additionally, studies have shown CDKN1B to be significantly mutated in prostate cancer, commonly through deletions.
MYC (MYC proto-oncogene) encodes a transcription factor that activates transcription of growth-related genes. Recent cancer genome sequencing efforts affirm that MYC is one of the most frequently amplified genes across many cancer types. In Burkitt's lymphoma (BL), MYC translocations are the hallmark of the cancer. Although MYC is generally considered an under-mutated gene, the majority of BLs also acquire somatic MYC mutations, which can have increased oncogenic potency. Translocation of MYC into one of the immunoglobulin loci may drive the hypermutation phenotype often observed in BL. Most BL cells express only the translocated allele, whereas the normal allele is transcriptionally silent. Clustered somatic mutations located in the transcriptional activation domain are found in aggressive lymphomas arising in acquired immunodeficiency syndrome (AIDS), and the presence of mutations is correlated with rearrangement of the oncogene. Mutations have also been found in other de novo non-AIDS, non-Burkitt's aggressive lymphomas with MYC rearrangements.
TRRAP (transformation/transcription domain-associated protein, located at 7q22.1) encodes a large multidomain protein of the phosphoinositide 3-kinase-related kinase (PIKK) family that functions as part of a multiprotein coactivator complex. It is associated with histone acetyltransferase activity and is involved in chromatin remodelling and Wnt signalling; it also acts as a positive regulator of both wild-type and mutant TP53 transcription levels and is central to MYC transcriptional activation. TRRAP appears to act as an oncogene. Missense mutations have been observed in a variety of cancers, notably a recurrent mutation at p.S722F in melanomas as well as other cancer types. Codon p.S722 is highly conserved evolutionarily, and knockdown studies have suggested that mutant p.S722F TRRAP is necessary for melanoma cell survival. Other missense mutations have been seen in cancers including Waldenström macroglobulinaemia, sebaceous carcinoma, appendiceal goblet cell carcinoid, bladder cancer, lymphomas, urinary tract and colorectal cancer, as well as in high-risk ulcerative colitis.
As part of the v91 release we have focused on updating the expert-curated mutation data for the gene encoding APC, Adenomatous Polyposis Coli. Over 202 additional publications screening the APC gene (amongst others) have been surveyed with the addition of 736 new APC mutations and over 5000 samples.
APC is a large protein (2843 amino acids) encoded by a gene on chromosome 5q21-22. As a multi-domain protein, APC serves multiple functions through different binding partners. It is involved in cellular processes relating to cell migration, cell adhesion, proliferation, differentiation and chromosome segregation. APC is a tumour suppressor gene, dysregulated at both the germline and somatic level. Germline mutations result in Familial Adenomatous Polyposis, the major hereditary predisposition leading to colorectal cancer (CRC) development. Somatic APC mutations are found in approximately 80% of all sporadic non-hypermutated CRC patients. More than 90% of APC mutations generate premature stop codons, resulting in stable truncated gene products; most of these mutations (~60%) occur within a region referred to as the mutation cluster region (MCR). The C-terminally truncated proteins present in CRC lack the domains required for binding to microtubules, end-binding protein 1 (EB1) and β-catenin, potentially leading to chromosome instability, activation of proliferation and inhibition of differentiation. Hence, loss of APC tumour suppressor function caused by bi-allelic mutations and/or LOH leads to constitutive activation of the Wnt/β-catenin pathway, which is considered one of the driving forces of the initiation and development of colorectal tumours. Additionally, APC-driven CRC tumorigenesis can occur independently of Wnt signalling, through loss of the protein's roles in chromosome segregation, cellular polarity and migration, and DNA replication.
The papers in this update follow the evolution of research into the involvement of this gene in sporadic human cancers, particularly colorectal carcinoma (CRC). From early single-strand conformation polymorphism (SSCP) papers on a single gene to next generation sequencing (NGS) studies of whole exomes and targeted panels, researchers have looked at the incidence of APC mutations in a huge array of cancer types, including those that show a high rate of mutation (e.g. subtypes of biliary tract carcinoma, reviewed by Roos et al. in COSP46736), those with a low frequency of mutations (e.g. primary multiple melanoma, COSP46996) and cancers with no observed APC mutations (e.g. cemento-ossifying fibromas, COSP44725). Others have studied the incidence of APC mutations in a variety of populations, e.g. Chinese (COSP45556), Romanian (COSP46499) and Thai (COSP46949).
After the initial findings of mutation incidence in CRC, many researchers have looked at the role of APC (and TP53) in tumour development as part of the adenoma-adenocarcinoma-carcinoma pathway (e.g. COSP46469, COSP46949) and in the occurrence of metastases (e.g. COSP43720). Mutations in APC also have a role in the development of CRC through the inflammation pathway, as evidenced in patients with ulcerative colitis (COSP44212) and Crohn's disease (COSP45588). Additionally, APC mutations are thought to be a universal initiating event in gastric carcinogenesis (COSP42555).
Other recent advances in the use of plasma and serum to detect APC mutations in cell-free DNA could have a role in screening and early diagnosis of CRC (e.g. COSP46679, COSP45523). Likewise, stool DNA has been used as a source for screening for both CRC (e.g. COSP46463) and gastric carcinoma (COSP44856).
Finally, with regard to treatment of CRC, APC mutations have been studied in a variety of contexts. For example, one study showed that they are involved in resistance to preoperative chemoradiotherapy (COSP47277). For targeted drugs, such as the tankyrase inhibitor (TI) G007-LK, it was demonstrated (COSP434130) that TI-responsive cells harbour the short form APC mutation.
As part of release v91 we have focused on updating the expert-curated mutation data for testicular and other male cancers. Over 40 additional publications with mutation screening data in these diseases are included in the release.
Testicular germ cell tumours (TGCT) are the most common testicular cancers and, although relatively rare, are the most frequent cancer type in younger men aged 15-49. They progress from a precursor lesion, germ cell neoplasia in situ, and show a heterogeneous clinical and pathological range. TGCT are broadly classified as seminomatous or non-seminomatous, and the latter are further characterized by different histological subtypes, such as embryonal carcinoma, yolk sac tumour, teratoma and choriocarcinoma. These tumours can be pure or comprise more than one histological component. Recurrent somatic mutations in KIT, KRAS, BRAF and NRAS have been reported in TGCT, but point mutations are generally uncommon, and TP53, frequently mutated in many cancer types, is rarely mutated. Advances in next generation sequencing have now enabled the genomic landscape of TGCT to be studied in more detail. This update includes a paper by Boublikova et al. (COSP46711), who confirm the frequency of RAS/BRAF mutations and identify WT1 as a novel factor involved in TGCT pathogenesis, with potential as a prognostic marker. Outcome for TGCT is often good, with many patients responding to combination cisplatin- and etoposide-based therapies, but approximately 20% will progress or relapse after first-line chemotherapy. Necchi et al. (COSP46693) study a chemorefractory subset of TGCT and find different alterations in seminomas and non-seminomas. They suggest targeted therapy for KRAS alterations and immunotherapy for a subset of non-seminomas.
Testicular sex cord-stromal tumours (TSCST) are uncommon tumours, also with diverse histology, e.g. Leydig cell tumours, Sertoli cell tumours and granulosa cell tumours. While the majority of these are clinically benign, 5-10% are malignant and present with metastatic lesions or relapse with metastases. Systemic treatment of patients with malignant disease is not standardized. Necchi et al. (COSP46698) perform comprehensive genomic profiling of malignant TSCST to identify potential therapy targets. They find targetable alterations to be uncommon in all types of malignant TSCST, although some tumours show potential for mTOR inhibitors (PTEN-mutated) and hedgehog inhibitors (PTCH1-mutated). Tatsi et al. (COSP46876) report a rare case of testicular large cell calcifying Sertoli cell tumour with a somatic PRKAR1A mutation and no association with Carney complex, a hereditary disorder characterized by multiple benign tumours and often by germline inactivating PRKAR1A mutations.
Male breast cancer (MBC) is very rare and accounts for less than 1% of all breast neoplasms. Moelans (COSP46952) study the landscape of MBC by targeting all exons of 1943 cancer-related genes in more than 135 cases. They find recurrent PIK3CA and GATA3 mutations, with results mirroring those in female breast cancer to some extent, but TP53 mutations are significantly less frequent in MBC whereas mutations in genes regulating chromatin function, such as PBRM1 and KMT2C, are more prevalent. These differences provide additional evidence that MBC is its own entity, requiring a different clinical approach.
Penile squamous cell carcinoma (PSCC) is also rare in the developed world, and advanced PSCC is associated with poor survival, with many tumours showing chemo-/radio-resistance. Huang et al. (COSP46841) evaluate salvage therapy with the anti-EGFR monoclonal antibody nimotuzumab in chemorefractory advanced PSCC with mutations in TP53, CDKN2A and PIK3CA, while Trafalis et al. (COSP46719) report successful treatment with the human programmed death receptor-1 (PD-1) blocking antibody nivolumab in a case of radio- and chemorefractory advanced PSCC with a CDKN2A mutation.
Additionally, Frick et al. (COSP46705) screen diffuse large B cell lymphomas and find primary testicular lymphomas to be significantly associated with mutations in CD79B and MYD88, and Michalova et al. (COSP46703) report a testicular analogue of the pancreatic solid pseudopapillary neoplasm (SPN). A comparison of the mutational profiles of testicular and pancreatic SPN showed oncogenic mutations in exon 3 of CTNNB1 in both.
Reports of lymphomas associated with cosmetic and reconstructive breast implants appeared over 20 years ago, and the 2017 WHO classification update of lymphoid neoplasia recognised these ALK-ve CD30+ve tumours as a distinct subtype of non-Hodgkin T-cell lymphoma: breast implant-associated anaplastic large cell lymphoma (BIA-ALCL). The morphological features of these rare tumours are similar to those of other ALK-ve ALCL tumours, but their location adjacent to implants, their molecular landscape and their generally favourable outcome are distinct (reviewed in Oishi et al. COSP46322).
BIA-ALCLs usually present as a periprosthetic effusion some years after placement of a textured implant, having arisen in the seroma cavity surrounding the implant (in situ lymphoma). Without invasion of surrounding tissues, surgical removal of the implant and total capsulectomy are associated with an excellent outlook. However, BIA-ALCLs can also present with lymph node involvement or as a mass, and these are adverse prognostic factors.
Molecular analysis of BIA-ALCLs in eight publications demonstrates frequent mutations in JAK/STAT pathway genes, in particular recurrent gain-of-function activating mutations in the JAK1 kinase domain (JAK1 p.G1097D/C/V) and the STAT3 SH2 domain (p.S614R, p.Y640F, p.D661Y, p.G618R). Mutations are also present in other genes involved in the JAK/STAT pathway, such as STAT5A/5B, SOCS1, SOCS3 and PTPN1. In addition, mutations occur in TP53 and several epigenetic genes such as KMT2C, KMT2D, CHD1 and CREBBP. Whilst the JAK/STAT pathway mutations are usually activating, those observed in epigenetic regulators are often potentially inactivating nonsense or frameshift mutations. In one study (COSP47421), over 70% of tumours had mutations in an epigenetic regulator or histone modifier, compared with 59% in JAK/STAT pathway genes, suggesting an important role for epigenetics. Fusion genes found in other ALCL subtypes (ALK, DUSP22, TP63) are absent from BIA-ALCL.
The single most frequent mutation reported, STAT3 p.S614R, results in increased transcriptional activity of STAT3, whilst JAK1 p.G1097 mutations constitutively activate STAT3; it is a feature of BIA-ALCL that STAT3 is activated regardless of the mutation status of JAK/STAT pathway genes (COSP47077). Over 50% of BIA-ALCL cases carry mutations in JAK/STAT cascade genes, and it is thought likely that chronic inflammation and interference in cytokine receptor signalling play a role in BIA-ALCL.
Co-occurrence of two or more somatic mutations in genes involved in JAK/STAT pathway signalling or regulation, or in combination with an epigenetic regulator/histone modifier gene or TP53, is not unusual (COSP42682, COSP45295, COSP46938, COSP47421), and chromosomal deletions in regions containing these genes are also reported. Several publications report additional germline mutations in JAK/STAT pathway genes and TP53 (COSP42530, COSP46172), and it has been suggested that such double hits enhance JAK/STAT signalling in BIA-ALCL, acting together to facilitate growth.
Whilst most BIA-ALCL cases are limited to the effusion in the seroma cavity, some present as masses, and current data suggest differences between them in mutation patterns and rates. Letourneau et al. (COSP45295) reported a solid tumour positive for recurrent STAT3/JAK1 mutations, a second JAK1 nonsense mutation and a TRG-TRB rearrangement; an in situ tumour identified during subsequent implant-related surgery lacked the second JAK1 nonsense mutation. Furthermore, of the 15 solid and 19 in situ tumours studied by Laurent et al. (COSP47421), 80% and 42%, respectively, had mutations in the JAK/STAT pathway; the solid tumours had a significantly higher STAT3 mutation rate and were more likely to contain mutations in cell cycle control genes.
Mukhtar et al. (COSP46938) reported a case of synchronous breast tumours 18 years after an implant; the stage 2 BIA-ALCL was positive for a recurrent STAT3 mutation as well as several others, but the second tumour, a breast invasive carcinoma, had no mutations in common with it.
Chen et al. (COSP43377) looked at the sensitivity of ALK-ve ALCL cell lines (including two BIA-ALCL cell lines) to JAK inhibitors and found them to be sensitive in all cases, regardless of the presence of STAT3 and/or JAK1 mutations.
Follow links below to the 48 papers which are new in v91, or view the full table of papers here.
COSP44549, COSP47139, COSP47421, COSP43982, COSP46568, COSP45556, COSP45498, COSP47277, COSP47224, COSP39957, COSP47104, COSP42128, COSP46181, COSP44881, COSP46970, COSP43252, COSP46949, COSP47107, COSP47161, COSP47075, COSP46968, COSP43281, COSP43720, COSP45187, COSP44800, COSP43830, COSP39315, COSP40528, COSP46756, COSP46552, COSP33607, COSP46701, COSP45470, COSP45825, COSP46696, COSP44727, COSP39585, COSP40773, COSP40827, COSP45984, COSP39390, COSP45540, COSP45457, COSP33421, COSP45959, COSP42555, COSP42149, COSP44543
Follow links below to the 2 studies which are new in v91, or view the full table of studies here.
COSU683 GACA-JP, COSU693 BPLL-FR
In COSMIC v90 (September 2019), the COSMIC database has undergone an extensive update and reannotation to ensure standardisation and modernisation across COSMIC data. This has substantially improved the identification of unique variants that may have been described at the genome, transcript and/or protein level. The introduction of a Genomic Identifier (COSV), along with complete annotation across multiple high-quality Ensembl transcripts and improved compliance with current HGVS syntax, will enable variant matching both within COSMIC and across other bioinformatic datasets.
For a more detailed explanation of the changes, please have a look at the variant updates page.
The first stage of this work was the introduction of improved HGVS syntax compliance in our May release. The majority of the changes are reflected in this release; this is a work in progress, and the remaining changes will be introduced over the next few releases.
Genomic mutation identifier (COSV) indicates the definitive position of the variant on the genome. This identifier is trackable and stable between releases, and it remains the same between assemblies (GRCh37 and GRCh38). This will be our preferred way to identify mutations.
Legacy mutation identifier (COSM) represents existing COSM mutation identifiers. This identifier remains the same between different assemblies (GRCh37 and GRCh38). Previously, each mutation at a specific genomic coordinate but on different transcripts had a unique COSM identifier. Now, all the COSM identifiers at the same genomic location have been collapsed into one representative COSM identifier. These ids are maintained to help track existing mutations.
Legacy mutations can either be searched from the search bar by typing in an identifier, eg COSM7088790, or accessed via the mutation page URL: https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=7088790
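The URL pattern above can be built programmatically. The following is a minimal sketch, assuming (from the examples in this section) that a legacy identifier is always "COSM" followed by digits and that the overview URL takes the numeric part as its `id` parameter. The helper name `legacy_overview_url` is hypothetical and not part of any COSMIC tool.

```python
import re

# Base URL of the mutation overview page, taken from the examples in the text.
OVERVIEW_URL = "https://cancer.sanger.ac.uk/cosmic/mutation/overview?id="

# Assumed format of a legacy identifier: "COSM" followed by one or more digits.
LEGACY_ID = re.compile(r"^COSM(\d+)$")


def legacy_overview_url(cosm_id: str) -> str:
    """Return the overview URL for a legacy COSM identifier (hypothetical helper)."""
    match = LEGACY_ID.match(cosm_id.strip())
    if match is None:
        raise ValueError(f"not a legacy COSM identifier: {cosm_id!r}")
    # The numeric part of the identifier becomes the id query parameter.
    return OVERVIEW_URL + match.group(1)


print(legacy_overview_url("COSM7088790"))
```

The sketch only builds the URL; any redirect from a legacy identifier to its current mutation page is handled by the website itself and is not modelled here.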
Internal mutation identifiers are unique to a mutation on a particular transcript and are displayed in the URL of the mutation pages. Several internal identifiers can therefore be associated with a single genomic COSV identifier, where the mutation has been mapped to all overlapping genes and transcripts.
Similarly, since every COSM identifier is mapped to one COSV identifier (where genomic coordinates are known), each COSM identifier can also be associated with several alternative (internal) identifiers. These internal identifiers are expected to change between assemblies (GRCh37 and GRCh38) and between releases.
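The relationships described above (one COSV identifier to many internal identifiers, and each COSM identifier to one COSV identifier) can be modelled with two lookup tables. This is a minimal sketch only: every identifier value below is a made-up placeholder rather than a real COSMIC record, and the table and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (internal id, transcript, COSV id). Internal ids are unique
# per mutation on a transcript, so one COSV id can appear with several internal ids.
internal_records = [
    (1001, "TRANSCRIPT_A", "COSV00000001"),
    (1002, "TRANSCRIPT_B", "COSV00000001"),
    (1003, "TRANSCRIPT_C", "COSV00000002"),
]

# Hypothetical legacy mapping: each COSM id points to exactly one COSV id.
cosm_to_cosv = {"COSM0000001": "COSV00000001", "COSM0000002": "COSV00000002"}

# Group internal ids by genomic identifier (one-to-many).
internal_by_cosv = defaultdict(list)
for internal_id, _transcript, cosv in internal_records:
    internal_by_cosv[cosv].append(internal_id)


def internal_ids_for_legacy(cosm_id: str) -> list:
    """Follow COSM -> COSV -> internal ids; return [] for unknown identifiers."""
    cosv = cosm_to_cosv.get(cosm_id)
    if cosv is None:
        return []
    # .get() avoids inserting an empty entry into the defaultdict.
    return list(internal_by_cosv.get(cosv, []))


print(internal_ids_for_legacy("COSM0000001"))  # [1001, 1002]
```

Because internal identifiers may change between assemblies and releases, one reasonable design is to store only COSV and COSM values long-term and rebuild the internal table from each release.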
We have made substantial updates on the website to continue supporting the existing COSM identifiers. The legacy identifiers can either be searched using the search bar with the identifier eg COSM580 or by their previous URLs eg https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=580
Once the website finds the corresponding new mutation identifier for the legacy identifier, it will automatically redirect the user to the new mutation page. eg COSM580 redirects to https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=97107326 (GRCh38)
Similarly, to handle the de-duplication of data, where duplicate variants have been merged into one representative variant, the merged COSM identifiers are still accessible by the search function and the website redirects to the newly merged identifier seamlessly. In the case of a merged mutation, the mutation overview webpage will display this message.
The legacy mutation COSM5846084 has now been merged into the following mutation
eg COSM5846084 and COSM5846086 are merged into COSM5846085; see https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=5846084. For more details on the mutations, please have a look at the variant updates page.
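The merge handling described above behaves like a lookup from each merged legacy identifier to its representative. A minimal sketch, assuming a hypothetical in-memory table built from the example given here (a real table would have to come from COSMIC data). Following chains of merges is a defensive assumption, since the text does not say whether a representative identifier can itself be merged again.

```python
# Merge table from the example in the text: legacy id -> representative id.
MERGED_INTO = {
    "COSM5846084": "COSM5846085",
    "COSM5846086": "COSM5846085",
}


def resolve_legacy(cosm_id: str) -> str:
    """Return the representative identifier, following merges until none remain."""
    seen = set()
    current = cosm_id
    while current in MERGED_INTO:
        if current in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"merge cycle at {current}")
        seen.add(current)
        current = MERGED_INTO[current]
    return current


for legacy in ("COSM5846084", "COSM5846086", "COSM5846085"):
    print(legacy, "->", resolve_legacy(legacy))
```

An identifier that was never merged, such as the representative COSM5846085, simply resolves to itself.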
With the mapping of COSMIC gene fusions onto the new, updated transcripts, we have now included a new section with the genomic coordinates of the known fusion breakpoints, eg: https://cancer.sanger.ac.uk/cosmic/fusion/summary?id=1013
As each variant has been mapped on all relevant Ensembl transcripts, the number of rows in the majority of variant download files has increased significantly. In the download files, additional columns are provided including the legacy identifier (COSM) and the new genomic identifier (COSV). An internal mutation identifier is also provided to uniquely represent each mutation on a specific transcript on a given assembly build. The accession and version number for transcripts are included. File descriptions for each of the download files will be available from the downloads page for clarity.
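Because each variant now appears once per transcript, counting rows in a download file overstates the number of distinct variants. A minimal sketch of counting rows per genomic identifier with Python's standard `csv` module. The column names (`MUTATION_ID`, `LEGACY_MUTATION_ID`, `GENOMIC_MUTATION_ID`, `ACCESSION_NUMBER`) and the data are assumptions for illustration; check the file descriptions on the downloads page for the actual layout of each file.

```python
import csv
import io

# Illustrative tab-separated extract. Column names are assumed, and the values
# are placeholders rather than real COSMIC records.
SAMPLE_TSV = (
    "MUTATION_ID\tLEGACY_MUTATION_ID\tGENOMIC_MUTATION_ID\tACCESSION_NUMBER\n"
    "1001\tCOSM0000001\tCOSV00000001\tTRANSCRIPT_A.1\n"
    "1002\tCOSM0000001\tCOSV00000001\tTRANSCRIPT_B.2\n"
    "1003\tCOSM0000002\tCOSV00000002\tTRANSCRIPT_C.1\n"
)


def rows_per_variant(handle) -> dict:
    """Count rows (one per transcript) for each genomic identifier (COSV)."""
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        cosv = row["GENOMIC_MUTATION_ID"]
        counts[cosv] = counts.get(cosv, 0) + 1
    return counts


counts = rows_per_variant(io.StringIO(SAMPLE_TSV))
print(len(counts), "unique variants from", sum(counts.values()), "rows")
```

Counting distinct COSV values rather than rows is one way to avoid the row inflation that multi-transcript annotation introduces.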
Please note: due to the various major updates made to the data, the statistics describing the release, such as mutation count, are no longer directly comparable to those in previous releases. These significant changes include annotation on multiple transcripts and deduplication of variants; please have a look at the variant updates page for more details.
COSMIC v89 (May 2019) includes 4 new fully curated genes, a substantial curation update on bile duct cancer (cholangiocarcinoma), and 231 new whole genomes from 10 new systematic screen papers. We have also included mutational signatures V3, characterising 49 single base substitution signatures, 11 doublet base substitution signatures, four clustered base substitution signatures and 17 small insertion and deletion signatures.
DDX3X (DEAD-box helicase 3 X-linked) encodes a member of the large DEAD-box protein family which has ATP-dependent RNA helicase activity and RNA-independent ATPase activity. The protein has multiple conserved domains and plays roles in both the nucleus and cytoplasm. Recurrent mutations in DDX3X, a tumour suppressor gene, are found in medulloblastoma, where the WNT subgroup is enriched for these mutations, with missense mutations clustering in the DEAD-box helicase domain. In chronic lymphocytic leukaemia, inactivating DDX3X mutations are also detected, especially in cases with unfavourable clinical markers. Next generation sequencing studies have shown a high frequency of truncating and missense DDX3X mutations in extranodal natural killer/T cell lymphoma, where most of the mutations affect the two highly conserved domains: the ATP-binding helicase domain and the C-terminal helicase domain. Mutations are also present in the closely related aggressive natural killer-cell leukaemia, in a small subset of pleural and peritoneal mesotheliomas, and in human papillomavirus-positive oral squamous cell carcinoma.
LATS1 (large tumour suppressor kinase 1), found at the genomic locus 6q25.1, encodes a serine/threonine kinase which acts in the Hippo signalling pathway via YAP/TAZ proteins by restricting proliferation and promoting apoptosis. Genomic alterations in Hippo pathway components have been found in many tumour types. These include malignant mesothelioma, where LATS1 fusion genes and deletions can result in inactivation of its tumour suppressor function, and ER+ breast cancer, where these alterations have also been found to promote cyclin dependent kinase 4/6 inhibitor resistance. Abnormal Hippo pathway signalling is involved in the pathogenesis of most sporadic schwannomas. Oh et al. (COSP46088) found LATS1 mutations in 2% of schwannoma cases and LATS1 promoter methylation in 17% of cases. In COSMIC, the expert-curated mutations are primarily annotated on our canonical transcripts, but for this gene they have been temporarily annotated across two transcripts, LATS1 and LATS1_ENST00000543571.
NCOR2 (nuclear receptor corepressor 2), also called SMRT, mediates the transcriptional repression activity of some nuclear receptors by promoting chromatin condensation. NCOR2 has been shown to repress androgen receptor activity thus providing a rationale for the role of NCOR2 inactivating mutations in prostate cancer initiation and progression. NCOR2 acts as a haploinsufficient tumour suppressor gene. It is known to be involved in several repression pathways; therefore, its down-regulation can potentially activate numerous downstream genes that are normally repressed. Accordingly, NCOR2 mutations have been observed in many cancer types such as colorectal, gastric and breast cancers, non-Hodgkin and marginal zone lymphomas and acute lymphoblastic leukaemia. In COSMIC the expert curated mutations are primarily annotated on our canonical transcripts but for this gene they have been temporarily annotated across two transcripts, NCOR2 and NCOR2_ENST00000405201.
PIK3CB encodes the p110β isoform of the catalytic subunit of the class I phosphoinositide 3-kinases (PI3Ks). The PI3K signalling pathway has been shown to be important across cancers, with another class I catalytic subunit isoform, PIK3CA, already well established as a key gene and drug target in many different cancers. PIK3CB mutations are seen far less frequently than those in PIK3CA, but there is now growing evidence that PIK3CB also plays a significant role, in particular in PTEN-deficient tumours, with mutations in the kinase domain (D1067, E1051) having been shown to have oncogenic potential. These mutations are seen at low levels in a wide range of cancers, including large intestine, skin, lung, endometrium and prostate.
As part of release v89 we have focused on updating the expert-curated mutation data for bile duct cancer (cholangiocarcinoma). Approximately 60 additional publications that include mutation screening data in this disease are included in the release.
Bile duct cancer (BDC) is the second most common adult primary liver cancer after hepatocellular carcinoma, and the majority are adenocarcinomas arising from the epithelial lining of the bile duct. They are classified as intrahepatic, perihilar or distal extrahepatic based on the predominant location, and each subtype has a distinct epidemiology, biology, prognosis and disease management. There is a range of associated risk factors for BDC, including primary sclerosing cholangitis, hepatitis B and C, and in South East Asia, particularly northern Thailand, liver fluke infection. Worldwide the incidence of intrahepatic BDC (iBDC) is increasing and its prognosis is dismal; patients with metastatic disease have a life expectancy of less than one year. Since iBDC remains asymptomatic until reaching an advanced stage, most patients present with advanced, unresectable or metastatic disease, limiting the number of available treatment options. Surgical resection remains the mainstay of potentially curative treatment for localised BDC, but even after surgery the prognosis is poor. Many patients are not suitable for resection and these cases may receive palliative treatment with gemcitabine plus cisplatin.
The genomic landscape of BDC differs depending on disease location. IDH1 and IDH2 mutations, mostly at their hotspots, are almost exclusive to iBDC, while ERBB2 and BAP1 mutations are common in extrahepatic BDC (eBDC) and less common in iBDC. Recently, FGFR2 alterations, mostly fusions, have been identified in iBDC. This update includes a paper by Churi et al. (COSP38231) who perform next generation sequencing on both iBDC and eBDC, demonstrating their significant molecular differences. In iBDC, genetic alterations in KRAS, TP53 or MAPK/mTOR are significantly associated with a worse prognosis while FGFR alterations correlate with a relatively indolent disease course. In a publication by Trachu et al. (COSP44522) molecular alterations are also explored, as well as clinical prognostic factors, in Thai patients with BDC. Goeppert et al. (COSP46197) use integrative genomic and epigenomic analysis to identify four major iBDC subgroups with genomic and epigenomic differences and prognostic implications.
Genomic studies of BDC have shown that they have many potentially actionable targets. Zou et al. (COSP38150) identify 8 potentially targetable driver genes, including IDH1, TP53, KRAS and ARID1A, in a Chinese population. In cases with FGFR2 fusions, promising therapeutic activity has been demonstrated with tyrosine kinase inhibitors such as BGJ398 that inhibit the FGFR2 pathway, but such success is tempered by follow-up reports of resistance. Goyal et al. (COSP42875) describe genetic mechanisms of clinical acquired resistance to FGFR inhibition in patients with FGFR2 fusion-positive iBDC.
Publications in this update also cover the molecular characterisation of rare variants of BDC, including clear cell intrahepatic cholangiocarcinoma (COSP45941) and mucinous cholangiocarcinoma (COSP45954), as well as the combined tumour hepatocellular-cholangiocarcinoma (COSP42266).
Follow links below to the 10 papers which are new in v89, or view the full table of papers here.
COSMIC v88 (March 2019) includes 4 new fully curated genes, substantial curation updates for TSC1 and TSC2 genes, and 205 new whole genomes from 10 new systematic screen papers. We have also added 17 new gene descriptions to the Hallmarks of Cancer.
Importantly, we have also updated the HGVS syntax on some of the manually curated mutations in COSMIC. This is one of the steps we will be taking towards the extensive re-annotation of the entire COSMIC database. Further significant changes will be coming in future releases. For details of the HGVS changes you can expect, please see the help documentation.
MAP3K1 (mitogen-activated protein kinase kinase kinase 1) encodes a serine/threonine kinase that is part of some signal transduction cascades, including the ERK and JNK kinase pathways as well as the NF-kappa-B pathway. Recurrent loss-of-function MAP3K1 mutations, including frameshift, nonsense and splicing mutations, have been found in breast cancer, suggesting a tumour suppressor role. The mutations are most frequent in the luminal A subtype and are associated with indolent disease and a favourable prognosis. Genome studies have also identified low to moderate frequency MAP3K1 deletions and mutations in other tumour types, such as endometrial, colorectal and lung cancer.
LATS2 (large tumour suppressor kinase 2), found at the genomic locus 13q12.11, encodes a serine/threonine kinase which acts in the Hippo signalling pathway via YAP/TAZ proteins by restricting proliferation and promoting apoptosis. Its role in carcinogenesis is mediated in part via somatic mutations in many tumour types. Mutations such as nonsense, frameshift and missense mutations are likely to affect LATS2 functionality, possibly by negatively affecting interactions with positive regulators such as NF2 or MOB1, by damaging the catalytic domain of LATS2, by interfering with activating phosphorylations through upstream kinases, or through other mechanisms. Data from malignant mesothelioma (MM) indicate that Hippo pathway dysregulation is frequent in MM cells, with inactivation of LATS2 or of an upstream regulator of this pathway, such as NF2. In a series of malignant pleural mesothelioma (MPM) patients, the LATS2 gene was altered in 11% of MPM by point mutations and large exon deletions. Genetic data coupled with transcriptomic data allowed the identification of a new MPM molecular subgroup, C2LN, characterized by co-occurring mutations in the LATS2 and NF2 genes in the same MPM. MPM patients in this subgroup had a poor prognosis.
LEF1 (Lymphoid enhancer factor 1), located at 4q25, encodes a member of the lymphoid enhancer factor/T-cell factor (LEF/TCF) family of DNA binding transcription factors and shares homology with high mobility group protein-1. LEF1 interacts with nuclear β-catenin in the Wnt signalling pathway and can act as either a tumour suppressor gene or an oncogene. Two major isoforms are found, including a short form which lacks the β-catenin binding domain at the N terminus and shows dominant negative activity, suppressing the Wnt pathway by preventing β-catenin recruitment. LEF1 plays an important role in early lymphoid development, and increased expression is associated with poor prognosis in several leukaemias. Somatic mutations and deletions/microdeletions have been found in a variety of childhood and adult T and B cell acute lymphoblastic leukaemias (ALL). These mutations include truncating frameshift and nonsense mutations as well as gain-of-function missense mutations in the β-catenin binding domain, and these missense mutations have been shown to be associated with proliferation of ALL cells. Expression of N terminus-truncated LEF1 in mice results in tumours with sebaceous differentiation, and in humans, nonsense and missense mutations located in the β-catenin binding domain which result in inactivation of Wnt signalling have been found in sebaceous carcinomas and adenomas as well as benign sebaceomas.
PTPRT (protein tyrosine phosphatase receptor-type, T) encodes a protein tyrosine phosphatase (PTP) which is a member of the R2B subfamily of the PTP family of proteins. Located on chromosome 20 (20q13.11), PTPRT is the most frequently mutated PTP gene in human cancer, and genetic changes have been observed in a variety of cancer types including colon, endometrial, bladder, oesophageal, head and neck, lung, skin and stomach. These mutations are mainly missense single nucleotide variants, distributed throughout the gene, although nonsense, insertion and deletion mutations that result in premature truncation of the protein have also been found. This distribution of mutation types and sites is suggestive of a tumour suppressor gene, an idea that is further supported by studies in knock-out mice and by the fact that loss of PTPRT function also occurs in human cancers through methylation of promoter DNA. PTPRT is thought to act as a tumour suppressor by dephosphorylating activated STAT3 and paxillin, two substrates which have an oncogenic role in tumour development when phosphorylated. PTPRT also has a role in cell-cell adhesion, which is negatively affected by loss-of-function mutations. In addition, deleterious mutations in PTPRT (predominantly those occurring in the phosphatase domain) have been implicated in the drug resistance mechanism of metastatic colorectal carcinomas treated with bevacizumab together with chemotherapy.
As part of release v88 we have focused on updating the expert-curated mutation data for the TSC1 and TSC2 genes. Approximately 60 additional publications that include mutation screening data for these genes are included in the release.
Tuberous sclerosis complex (TSC) is an autosomal dominant disorder characterized by skin manifestations and the formation of multiple benign and/or malignant tumours in different organs. In addition, TSC causes disabling neurologic disorders, including autism, intellectual impairment and seizures. Postzygotic somatic mutations in TSC1 or TSC2 result in mosaic forms of tuberous sclerosis complex, which may present as disseminated or segmental disease, or with milder disease or later onset than the inherited disease.
TSC1 and TSC2 encode the proteins hamartin and tuberin, respectively, which interact to form a heterodimer that inhibits the activation of mammalian target of rapamycin (mTOR) complex 1, a master regulator of nutrient- and growth factor-induced signalling. As is common for tumour suppressors, tumorigenesis involves inactivation of the second allele, typically via a second-hit somatic mutation in the wild-type allele of the TSC gene. Sporadic tumours involving somatic mutations in TSC1 or TSC2 with no apparent germline mutation are also found. The current COSMIC release reports somatic mutations in renal cell carcinoma (RCC), subependymal giant cell astrocytomas and other gliomas, bladder cancer, hepatocellular carcinoma, melanoma, angiomyolipoma and lymphangioleiomyomatosis, amongst many other tumour types.
In a study of 91 mucosal melanoma patients, the overall somatic mutation frequency of TSC1 was 17.6%. The TSC1 mutations were mainly missense mutations and were found across 11 different exons. The mutations were more likely to occur in advanced mucosal melanoma, and these patients had a worse outcome than patients without TSC1 mutations (COSP44621).
In a case of a young adult with renal epithelioid angiomyolipoma (EAML), a rare tumour type with aggressive behaviour, a complete and durable response to sirolimus was achieved, the patient being disease-free after 36 months of treatment (COSP45400). Targeted next generation sequencing (NGS) of the MTOR, TSC1 and TSC2 genes revealed one inactivating TSC2 mutation (c.2739dup; p.K914*) in the tumour cells. The authors identified mTOR pathway activation and TSC2 inactivation as the mechanism underlying the response.
Similar examples of significant clinical response to mTOR inhibitors in other patients with sporadic or inherited disease, such as RCC and Hodgkin's lymphoma, have also been described, and some of these publications are included in the current release. These studies support sequencing as a useful tool to identify patients sensitive to mTOR inhibitors, and mTOR inhibition as an important therapeutic approach in these malignancies.
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on the Hallmarks of Cancer into the Cancer Gene Census (CGC). The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.
New Hallmark Genes in v88
Follow links below to the 10 papers which are new in v88, or view the full table of papers here.
COSMIC v87 (November 2018) includes 4 new fully curated genes, substantial curation updates for KRAS and Mesothelioma, 1 newly curated fusion pair, and mutation data from 9 new systematic screen papers. We have also added 4 new genes to the Cancer Gene Census and 8 to the Hallmarks of Cancer.
Files for v87 will not be available on our SFTP service, which is now closed. For more information about how to download COSMIC data and the files available, please visit the COSMIC download help pages.
ARID1B (AT-rich interaction domain 1B), like ARID1A, encodes a protein which is a component of the SWI/SNF chromatin remodelling complex and may play a role in cell-cycle activation. Somatic mutations in both genes are very frequent in gynaecological and several other solid tumours. ARID1B mutations are found in approximately 20% of ovarian clear cell carcinomas and are also detected in dedifferentiated ovarian and endometrial carcinomas, where concurrent ARID1A and ARID1B inactivating mutations result in loss of protein expression in 25% of the tumours. Recurrent mutations in both ARID1A and ARID1B have also been identified in acute promyelocytic leukaemia. The majority of these mutations are loss-of-function alterations, similar to the truncating mutations seen in solid tumours. Additionally, ARID1B is a potential biomarker for neuroblastoma patients with poor prognosis, and ARID1B mutations have been found in gastrointestinal stromal tumours lacking alterations in the canonical KIT/PDGFRA/RAS pathways. Mutations are also detected in breast carcinoma, hepatocellular carcinoma, mesonephric carcinoma, schwannomas, microsatellite unstable colorectal cancer and diffuse large B cell lymphoma. ARID1B may be targetable with FDA-approved HDAC inhibitors, including vorinostat and panobinostat.
RBM10 encodes a spliceosomal RNA binding motif protein involved in the regulation of gene expression, predominantly through the regulation of alternative splicing. The gene is located on the X chromosome (Xp11.3) and has been shown to be mutated in a variety of cancer types, including breast, colon, thyroid, ovary, pancreas, prostate and lung. These genetic changes are mainly missense single nucleotide variants, although nonsense mutations and frameshift insertions predicted to generate truncated proteins have also been found. RBM10 acts as a tumour suppressor gene, and these loss-of-function mutations disrupt its repression of Notch signalling and cell proliferation, which it mediates through the regulation of NUMB alternative splicing. As RBM10 regulates the alternative splicing of hundreds of target genes, the expression of RBM10 itself needs to be tightly regulated, which occurs through auto-regulatory processes: RBM10 negatively regulates its own mRNA and protein expression by exon skipping and the promotion of alternative splicing-coupled nonsense-mediated mRNA decay. In lung adenocarcinoma samples, mutations that affect the splice sites of the skipped exons (exons 6 or 12) have been shown to lead to reduced RBM10 expression, consistent with the tumour suppressive role of RBM10. Mutations in RBM10 have also been implicated in the drug resistance mechanism of thyroid carcinoma harbouring BRAF V600E.
KRAS (Kirsten rat sarcoma viral oncogene homolog) was one of the first 4 genes curated for COSMIC when the database was first released to the public 14 years ago. Over the last decade, KRAS has become one of the most clinically important and most frequently sequenced oncogenes in cancer. Accordingly, the related scientific literature in PubMed has grown to a volume that is impossible to curate exhaustively by hand. With the help of the machine learning-powered tools PubTator and LitVar (Wei et al., 2013; Allot et al., 2018), we have scanned the literature from the last 5 years and curated 21 new mutations for this COSMIC release: 15 new substitution mutations, 1 nonsense mutation and 5 new deletions/insertions from 17 publications. In total, 70 new KRAS mutations have been added to COSMIC during 2018. Most of these mutations are found outside the well-known oncogenic hotspots of exon 1 codons 12 and 13 and exon 2 codon 61, expanding the number of potentially relevant mutations in oncology.
Wei CH, Kao HY, Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013;41:W518-W522.
Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res. 2018;46(W1):W530-W536.
B2M (beta-2-microglobulin) encodes the constant light chain of the classical major histocompatibility complex (MHC) class I molecule. Specific recognition of this trimeric complex by the T-cell receptor triggers cytotoxic T lymphocyte activity, which is exploited in current cancer immunotherapies whose efficacy depends on the level of MHC class I expression in the tumour. Somatic alterations in B2M, which include substitutions, deletions and LOH of chromosome 15q21, inhibit transcription of B2M or affect translation of the mRNA, resulting in lowered expression of the protein or in synthesis of a non-functional protein. Mutations are spread across B2M; however, mutations at position M1 appear to be recurrent in lymphomas, and a two-nucleotide deletion in a CT repeat is recurrent in colorectal carcinomas. Decreased expression of B2M has been reported to be associated with worse prognosis in non-Hodgkin lymphoma patients and with favourable prognosis in Hodgkin lymphoma patients. Truncating B2M mutations may be associated with acquired resistance to PD-1 blockade in metastatic melanoma.
BCORL1 (BCL6 corepressor like 1, Xq25-26.1) is homologous to the tumour suppressor gene BCOR and is ubiquitously expressed in human tissues. BCORL1 encodes a transcriptional co-repressor which is tethered to promoter regions by DNA binding proteins, interacts with histone deacetylases, CtBP and PCGF1, and represses E-cadherin expression via its interaction with CtBP. Somatic mutations have been found in a variety of myeloid tumours including acute myeloid leukaemia, myelodysplastic syndrome and chronic myelomonocytic leukaemia. These include missense mutations but are most often frameshift, splice site or nonsense mutations across the gene, predicted to result in severely truncated proteins lacking the LXXLL nuclear receptor recruitment motif and the C-terminus. Similar somatic mutations have also been reported in solid tumours such as MSI-H gastric adenocarcinoma, melanoma, Wilms tumour, intracranial germ cell tumours, gliomas and head and neck squamous cell carcinoma.
As part of release v87 we have focused on updating the expert-curated mutation data for mesothelioma. Approximately 30 additional publications that include mutation screening data in this disease are included in the release.
Malignant mesothelioma is a rare and aggressive tumour, mostly occurring in the pleural mesothelial cells, but also arising in the peritoneal or pericardial lining and tunica vaginalis. Histologically, malignant pleural mesothelioma (MPM) is classified into 3 main subtypes: epithelial, mesenchymal/sarcomatous, and mixed/biphasic. The disease is associated with occupational and, more rarely, environmental or domestic exposure to asbestos and to other mineral fibres. Since asbestos exposure is most common in industries with a male workforce, MPM is seen predominantly in men. A cancer syndrome with germline BAP1 mutations also predisposes carriers to mesothelioma. There is a long latency period in MPM, with up to 50 years between exposure and tumour development, and patients have a poor prognosis, rarely responding to conventional cytotoxic drugs. Although surgery combined with radio-chemotherapy can be beneficial in patients who present with early-stage disease, most patients are at an advanced stage at diagnosis. A greater understanding of the underlying genetics of MPM and the development of novel targeted therapies are needed to improve the outcome for MPM patients. The disease will remain a global health issue while asbestos continues to be mined and used, especially in developing countries.
The genomic landscape of MPM includes recurrent somatic mutations in the tumour suppressor genes CDKN2A, NF2 and BAP1. TP53 mutations are also found, at a lower frequency, as well as hotspot TERT promoter mutations. This mesothelioma update includes a paper by Ugurluer et al. (COSP45544), who perform exome-based next-generation sequencing on pleural and peritoneal mesotheliomas. They find tumour-related mutations in 73% of their mesothelioma patients and confirm BAP1, CDKN2A/B and NF2 as the most frequently mutated genes. In a publication by Kang et al. (COSP45546), SETDB1 is identified as a frequently mutated gene in MPM. Tranchant et al. (COSP45541) find an MPM molecular subgroup characterised by co-occurring mutations in the LATS2 and NF2 genes. Furthermore, by defining the specific deregulated signalling pathways, they identify PF-04691502, an inhibitor of the PI3K/AKT/mTOR pathway already in clinical trials for other cancer types, as potentially useful for this MPM subgroup.
Lai et al. (COSP45543) report oncogene-targeted deep sequencing of a case of malignant peritoneal mesothelioma, identifying a novel somatic BAP1 frameshift insertion and suggesting the resultant tumour-specific neo-antigen as a diagnostic marker. Monch et al. (COSP45532) identify a group of MPM characterised by overexpression of ALK and present preclinical data showing that a combination of crizotinib and rapamycin may be a suitable targeted therapy in MPM.
Unlike malignant mesothelioma, well differentiated papillary mesothelioma (WDPM) of the peritoneum shows indolent behaviour and is not associated with asbestos exposure. Stevers et al. (COSP45755) perform genomic profiling on WDPM, finding that these tumours are defined by mutually exclusive mutations in TRAF7 and CDC42 and lack the genetic alterations common to malignant mesothelioma.
The rare fusion ETV6 (TEL)-PDGFRB, the molecular consequence of the t(5;12)(q33;p13) translocation, is found in some patients with chronic myelomonocytic leukaemia and other myeloproliferative disorders with eosinophilia. ETV6 encodes an ETS family transcription factor containing a Helix-Loop-Helix (HLH) and an ETS DNA binding domain. PDGFRB encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. In the fusion, the N-terminal HLH domain of ETV6 is fused to the transmembrane and the tyrosine kinase domains of PDGFRB. The breakpoints detected in ETV6-PDGFRB transcripts are consistently at exon 4 in ETV6 and at exon 11 in PDGFRB. Imatinib is effective therapy in patients with ETV6-PDGFRB-positive chronic myeloproliferative diseases.
New Census genes (tier 1):
New Census genes (tier 2):
Hallmarks of Cancer
New Hallmark Genes in v87
Follow links below to the 9 papers which are new in v87, or view the full table of papers here.
COSMIC v86 (August 2018) includes 3 new fully curated genes, substantial curation update for Glioblastoma, 1 newly curated fusion pair, and 476 whole genome screened samples from 8 new systematic screen papers. We have also integrated ICGC release 27 and added 7 new genes to the Cancer Gene Census Hallmarks of Cancer.
Files for v86 are available on our SFTP site, but this is now a legacy service which is no longer supported. For more information about the files available, how to download from the command line, and help with automating the download process, please visit the COSMIC download help pages.
CHD4 (chromodomain helicase DNA-binding protein 4) encodes a member of the SNF2/RAD54 helicase family which is the main component of the nucleosome remodelling and deacetylase complex and has an important role in epigenetic transcriptional repression. A high frequency of CHD4 mutations (17%) has been found in serous endometrial carcinoma, where most are missense mutations and half affect the ATPase/helicase and helicase domains. CHD4 is also mutated in clear cell, endometrioid and mixed-histology endometrial tumours, and in uterine carcinosarcoma.
IRS4 (insulin receptor substrate 4) is a cytoplasmic scaffold protein that is phosphorylated by the insulin receptor tyrosine kinase in response to receptor stimulation by insulin. Tyrosine-phosphorylated IRS4 protein has been shown to associate with cytoplasmic signalling molecules that contain SH2 domains, leading to the activation of the PI3K/AKT and MAPK/ERK signalling pathways. The gene for IRS4 is located on chromosome Xq22.3, and somatic mutations, including point mutations and deletions, have been found in a variety of tumour types including metastatic melanoma, multiple meningioma, paediatric T-cell acute lymphoblastic leukaemia and other haematological malignancies. An insertional mutagenesis study in mouse has shown that IRS4 is a driver in mammary oncogenesis, working synergistically with ERBB2/HER2. Furthermore, analysis of IRS4 expression in human breast carcinomas has shown it to be a putative biomarker for HER2-targeted therapy resistance.
CTCF (CCCTC-binding factor) encodes a transcriptional regulator that affects chromatin structure and organisation by binding different DNA target sequences and proteins, thus playing a vital role in transcription by controlling promoter-enhancer interactions. Mutations have been observed in a variety of cancers including endometrial carcinoma (and the potentially precancerous lesion endometriosis), bladder cancer, Wilms tumour, several haematological malignancies (including acute megakaryoblastic leukaemia (AMKL), transient abnormal myelopoiesis preceding AMKL in Down syndrome patients, and acute lymphoblastic leukaemia), head and neck cancers and some breast cancers. CTCF generally functions as a tumour suppressor in cancer. Most mutations are frameshift, nonsense or splice mutations resulting in haploinsufficiency following nonsense-mediated decay, or are loss-of-function missense mutations, often in the zinc finger domains. However, some missense mutations alter DNA binding residues and act as gain-of-function mutations enhancing cell survival, for example p.K365T in endometrial carcinoma.
We have focused on updating the expert-curated mutation data for the brain cancer glioblastoma multiforme (GBM). Approximately 70 new publications which include mutation screening data in this disease are included in this release. GBM, a WHO grade IV astrocytoma, is the most common malignant primary tumour of the adult brain and has a poor prognosis. It may occur de novo or develop as a secondary tumour from diffuse astrocytoma (WHO grade II) or anaplastic astrocytoma (WHO grade III). Primary and secondary GBMs have different genetic profiles, with IDH1/2 mutations being characteristic of secondary GBM. Similarly, TP53 mutations are more common in secondary than in primary GBM. The most common mutations in primary GBM are TERT promoter mutations (especially C228T and C250T), occurring in 70-80% of tumours, and these mutations are indicative of poorer outcome. Alterations in EGFR are also frequent in primary GBM, with EGFR amplification present in approximately 40% of cases and about half of these also carrying the EGFR vIII variant, an in-frame deletion of exons 2-7. CDKN2A deletions and PTEN mutations are also common in primary GBM. By contrast, paediatric GBM has a genomic landscape different from that of adult GBM, with infrequent changes in CDKN2A, PTEN and EGFR but frequent mutations at hotspot positions in H3F3A and H3.1. GBM has some rare variants, including gliosarcoma, giant cell GBM and epithelioid GBM, the latter often harbouring BRAF V600E mutations.
The MN1-ETV6 fusion, resulting from t(12;22)(p13;q12), is a recurrent but infrequent abnormality in haematological malignancies. MN1 encodes a transcription co-factor while ETV6 encodes a protein of the ETS transcription factor family. Two fusion transcript types have been reported and in both most of MN1, including the glutamine/proline rich domain, is fused to the DNA binding domain of ETV6. In type I, exon 1 of MN1 is fused to exon 3 of ETV6 and in type II the same MN1 exon is fused to ETV6 exon 4. MN1-ETV6 fusions are found in myeloid leukaemia and myelodysplastic syndromes.
New Hallmark Genes in v86
1 New ICGC Study (SSM)
4 Updated CNV Studies
CNVs not reported in 1 Study
There has been a cleanup of duplicated records which had arisen for two reasons:
Both of these issues caused duplicated samples and mutations in the COSMIC database and these were removed in v86.
Follow links below to the 8 papers which are new in v86, or view the full table of papers here.
COSMIC v85 (May 2018) includes 3 new fully curated genes, substantial curation updates for TERT, 1 newly curated fusion pair, and 162 whole genome screened samples from 8 new systematic screen papers. We have also added 25 new genes to the Cancer Gene Census Hallmarks of Cancer and added a new track to the main Genome Browser which displays mutation recurrence. The new download service has been extended to make earlier versions of COSMIC available and to support users who automate downloads.
We have extended the new '1-click' download service released in February to include previous versions of COSMIC (from v81) and added functionality to support users who use command line tools and automate downloads. The SFTP site will cease to be supported from our next release (v86, scheduled for August). For more information about the files available, how to download from the command line, and help with automating the download process, please visit the COSMIC download help pages.
A new 'Mutation Recurrence' track has been added to the main Genome Browser. This is a colour density track (pale yellow=low score, red=high score) across the whole reference sequence. Mouseover any nucleotide position on the track to see the score, which is the number of whole genome screened sample IDs with a coding or non-coding mutation at that position.
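The recurrence score described above is a count of distinct sample IDs per nucleotide position. The sketch below illustrates that counting rule only; the record layout (sample ID, chromosome, position, whole-genome flag) and the function name `recurrence_scores` are hypothetical and are not how COSMIC computes the track internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mutation record: (sample_id, chromosome, position, is_whole_genome_screened)
Mutation = Tuple[str, str, int, bool]


def recurrence_scores(mutations: Iterable[Mutation]) -> Dict[Tuple[str, int], int]:
    """Return, for each (chromosome, position), the number of distinct
    whole-genome-screened sample IDs with a mutation at that position."""
    samples_at: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for sample_id, chrom, pos, whole_genome in mutations:
        if whole_genome:  # only whole genome screened samples contribute
            samples_at[(chrom, pos)].add(sample_id)
    # Using a set means a sample with several records at one position counts once.
    return {key: len(ids) for key, ids in samples_at.items()}


if __name__ == "__main__":
    example = [
        ("S1", "1", 100, True),
        ("S2", "1", 100, True),
        ("S2", "1", 100, True),   # duplicate record for S2, counted once
        ("S3", "1", 100, False),  # not whole genome screened, excluded
        ("S1", "1", 200, True),
    ]
    print(recurrence_scores(example))
```

Counting sample IDs rather than mutation records keeps duplicated records from inflating the score; a colour scale such as the pale yellow to red density track would then be applied to these counts.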
We have added the option to automatically translate the COSMIC website using the Google Translate plugin. This option is available from every page footer, where a language can be selected from the drop down menu.
There has been no new gene expression data added in v85 but due to the removal of some duplicates the overall number of variants has decreased from 9,176,464 to 9,147,788 (-0.31%).
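The quoted percentage follows directly from the two variant counts, where N_prev is the count before v85:

```latex
\Delta N = 9{,}147{,}788 - 9{,}176{,}464 = -28{,}676,
\qquad
\frac{\Delta N}{N_{\mathrm{prev}}} = \frac{-28{,}676}{9{,}176{,}464} \approx -0.003125 \approx -0.31\%
```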
As a result of the new EU GDPR (General Data Protection Regulation) legislation, which comes into effect on 25th May, we will be making some changes to our terms and conditions and privacy policy for the COSMIC website. In the near future we will also be sending registered users an email with instructions for managing mailing preferences and the steps needed to keep your account active.
ERBB3 (erb-b2 receptor tyrosine kinase 3), which encodes HER3 (Human Epidermal growth factor Receptor 3), is a member of the epidermal growth factor receptor family, which consists of four closely related type 1 transmembrane receptors. ERBB3 is the last of the four family genes to be expert curated by COSMIC. Unlike other members of the family, HER3 has impaired tyrosine kinase activity, but it can function via ligand binding and heterodimerisation with other family members to influence cell proliferation. Increased expression of HER3 has been observed in a number of cancers, where it has been linked with therapeutic resistance. Somatic mutations in ERBB3 are found across a wide range of cancer types, including colon and gastric cancer, with recurrent hotspots in the extracellular domain.
DNA polymerase delta 1 (POLD1) mutations from targeted screening studies have now been curated in COSMIC. Polymerase delta plays an essential role in the replication and repair of chromosomal DNA. Recent studies have shown that germline mutations in the proofreading domain of POLD1 predispose to cancer; they are present in 0.5-2% of patients in intestinal polyposis and colorectal cancer (CRC) cohorts enriched for familial disease. Somatic POLD1 mutations also occur at low frequency in multiple sporadic tumours, such as colorectal, gastric and endometrial carcinomas, melanomas and childhood brain tumours, where they often underlie an ultramutated phenotype and potentially a favourable prognosis. POLD1 is a large gene and is likely to acquire somatic mutations secondary to other causes of increased mutation burden, such as MMR deficiency; it is therefore important to distinguish pathogenic variants from passenger mutations of no functional consequence.
LRP1B (Low-density lipoprotein (LDL) receptor-related protein 1B) is a member of the LDL receptor family of lipoprotein receptors, which have many functions in the human body including cholesterol metabolism and atherosclerotic lesion formation. The gene encoding LRP1B is very large (1.9Mb, 91 exons) and is situated on the long arm of chromosome 2 (2q21.2), in the FRA2F fragile site. The LRP1B gene was first discovered during the study of cancer cell lines harbouring homozygous deletions in this region; alterations of the gene in small cell lung cancer cell lines suggested a tumour suppressor role. Further studies of the gene in different cancers have shown a high frequency of genetic changes, including homozygous deletions (glioblastoma multiforme, and cancers of the oesophagus, nasopharynx, bladder and lung), point mutations (chronic lymphocytic leukaemia and cancers of the lung, nasopharynx, oesophagus, ovary and stomach) and promoter methylation resulting in transcriptional silencing. Later experiments, in which overexpression of a recombinant gene in lung cancer cell lines with little or no endogenous LRP1B expression resulted in significantly reduced cellular proliferation, confirmed the postulated growth-suppressing function of the protein and its role as a tumour suppressor.
The curated data for TERT (telomerase reverse transcriptase) have been updated. More than 60 publications which include screening of TERT, sometimes alongside other genes, are included in this release. TERT encodes the reverse transcriptase component of telomerase, which adds telomere repeats to chromosome ends, enabling cell replication. Maintenance of telomere length is a key process in malignant progression. As well as the 2 common hotspot mutations in the TERT core promoter at positions c.1-124 (C228T) and c.1-146 (C250T), many other promoter variants have also been identified. TERT promoter mutations are frequent in melanoma and glioma, particularly glioblastoma. They are also found in numerous other solid tumours including hepatocellular carcinoma, urothelial bladder carcinoma and papillary thyroid carcinoma, as well as in malignant phyllodes tumour of the breast and higher grade meningioma.
The RUNX1-RUNX1T1 (AML1-ETO) fusion is now represented in COSMIC. A proportion of the literature reporting on this fusion pair has been curated. The RUNX1-RUNX1T1 fusion results from the translocation t(8;21)(q22;q22), one of the most common cytogenetic abnormalities in acute myeloid leukaemia (AML). Together with inv(16) AML, it defines the core binding factor leukaemias, both characterised by the disruption and transcriptional deregulation of genes encoding the subunits of the core binding factor, a transcription factor that functions as an essential regulator of normal haematopoiesis. RUNX1-RUNX1T1 is found in approximately 5-10% of all AML cases and is most common in AML with maturation (FAB M2). The fusion structure is consistent: the amino-terminal portion of RUNX1, including the runt homology domain, is joined to almost the entire RUNX1T1 gene. The genomic breakpoints occur in RUNX1 intron 5 and RUNX1T1 intron 1. Evidence suggests that the fusion alone is insufficient to induce leukaemogenesis and that additional cooperating mutations, such as point mutations in KIT or NRAS, are required. The prognosis is generally favourable for patients with RUNX1-RUNX1T1 AML; complete remission with relatively long disease-free survival can be achieved when patients are treated with high dose chemotherapy, but additional activating mutations can confer a poorer prognosis.
New Hallmark Genes in v85
Follow links below to the 8 papers which are new in v85, or view the full table of papers here.
COSP44684*
Zehir A, et al. (Nat Med. 2017 Jun;23(6):703-713., PMID:28481359) describe the compiled tumour and matched normal sequence data from a unique cohort of more than 10,000 patients with advanced cancer. The MSK-IMPACT study from the Memorial Sloan Kettering Cancer Center, New York, New York, USA, is a clinical sequencing initiative that has identified 78,240 clinically relevant somatic mutations from various tumour types.
COSMIC v84 (February 2018) includes 8 new fully curated genes, substantial curation updates for POLE and PIK3CA, 1 new fusion pair, 337 genomes from 11 new systematic screen papers and updates from ICGC release 26. We have also added 20 new genes to the Cancer Gene Census (2 added to tier 1 and 18 to tier 2). In this release we launch a new download service which allows users to download complete data files directly from the website. We have also substantially updated the 'About' pages to better describe the COSMIC project.
We have launched a new download service which allows users to download all complete data files from the website, avoiding the need to connect to our SFTP server. To use this service you will need to log in and visit the download page. For each downloadable file there are now two or three download buttons: 'Download Whole File', 'Download Filtered File' and 'Access via SFTP Server'. Every available file can be downloaded from the website or SFTP server, but the option to download filtered data is not available for all files.
The new About page describes the COSMIC project, core data and resources. It is a useful source of information for new users and those wishing to explore how COSMIC can support their research.
The ZFHX3 (Zinc Finger Homeobox protein-3) gene encodes the transcription factor ATBF1 (AT-motif binding factor 1). The gene is situated on the long arm of chromosome 16 (16q22), a region which frequently exhibits loss of heterozygosity in solid tumours. Functionally, ATBF1 inhibits cell proliferation through transcriptional negative regulation of c-Myb and transactivation of the cell-cycle inhibitor cyclin-dependent kinase inhibitor 1A (CDKN1A). It also downregulates the alpha-fetoprotein (AFP) oncoprotein, a plasma protein which is not usually present in normal adult organs but can be found in some adult cancer cells (such as hepatocellular, yolk sac and gastric). Consequently, ZFHX3 acts as a tumour suppressor gene, and mutations have been reported in several cancer types: first in prostate cancer, and then in breast, colorectal, endometrial, gastric, lung and salivary gland tumours, and neuroblastoma. Mutation types cover the spectrum of changes: missense, small insertions and deletions (both frameshift and non-frameshift), nonsense mutations causing truncation of the encoded protein, and intronic changes affecting splicing. Mutations in ZFHX3 are also a risk factor for atrial fibrillation, a cardiac arrhythmia strongly correlated with cancer incidence.
The miRNA processing gene DiGeorge Syndrome Critical Region 8 (DGCR8) and the renal development genes SIX homeobox 1 and SIX homeobox 2 (SIX1 and SIX2) are somatically mutated in the embryonal kidney neoplasm Wilms tumour (WT). DGCR8 is part of the DROSHA microprocessor complex, which recognises and cleaves a pri-miRNA to release a pre-miRNA. Several DGCR8 mutations have been reported in WT and are often associated with chr22 loss of heterozygosity. The recurrent p.E518K mutation, located in the first dsRNA binding domain, has been shown to cause a reduction in critical mature miRNAs in tumours. The highly homologous SIX1 and SIX2 genes are essential for progenitor renewal and early renal development. Loss of SIX2 has been shown to result in epithelial differentiation and loss of nephron progenitors. A recurrent mutation in the homeodomain, p.Q177R, is found in both SIX1 and SIX2 in WTs and is thought to act dominantly, altering DNA binding properties and thus upregulating cell cycle genes involved in kidney development. SIX1, SIX2 and DGCR8 mutations can arise early in tumour development or appear at later stages, and show evidence of association with poor outcome and disease progression, often being observed in chemotherapy-resistant tumours and/or at recurrence. Tumours in which SIX1/2 mutations occur together with DGCR8 or other miRNA processing gene mutations show evidence of RAS activation and a higher rate of relapse and death.
The nuclear receptor coactivator 2 (NCOA2) gene encodes a transcriptional coactivator (SRC-2) that modulates gene expression by hormone receptors. In prostate cancer, NCOA2 is found to be both amplified and mutated. The genomic and functional data suggest that NCOA2 functions as a driver oncogene in primary tumours by increasing AR signalling, which is known to play a critical role in early and late stage prostate cancer. However, NCOA2 has many additional targets, including genes involved in cell-cycle regulation, signal transduction, apoptosis, immunity and transport, which may also contribute to tumorigenesis. In liver cancer, NCOA2 has been proposed to act as a tumour suppressor. Deletion of NCOA2 in mice promotes diethylnitrosamine (DEN)-induced liver tumorigenesis, and low levels of NCOA2 and its target glucose-6-phosphatase (G6pc) in HCC patients are associated with poor survival. NCOA2 may promote liver tumorigenesis in cooperation with Myc. NCOA2 mutations have also been reported in melanoma and lung cancer, where they cluster in two highly conserved regions of the gene, and in several other cancers.
Large scale exome sequencing studies have identified mutations in genes involved in the differentiation programme of squamous epithelium and the Notch/p63 axis, including TP63, as drivers of squamous cell carcinoma of the head and neck. Recurrent missense and nonsense mutations in TP63 have been found.
ACVR2A, activin A receptor type 2A, encodes a transmembrane serine-threonine kinase receptor that mediates the functions of activins, members of the transforming growth factor-beta superfamily. ACVR2A acts as a tumour suppressor gene with a hotspot at an 8-base pair polyadenine tract in exon 10 where truncating frameshift mutations occur in gastrointestinal cancers with microsatellite instability.
LZTR1 (Leucine Zipper Like Transcription Regulator 1) encodes a BTB-Kelch protein that localises to the Golgi and acts as a tumour suppressor. Somatic mutations in LZTR1 have been observed across a number of different cancer types, including endometrial, skin and colorectal cancers. They are also seen in glioblastoma, where they have been demonstrated to co-occur with copy number loss. Predisposing germline mutations and loss of heterozygosity are frequently seen in schwannomatosis.
PIK3CA encodes a key component of the PI3K pathway, which plays an important role in many different cancers and is a recognised drug target. Somatic mutations in PIK3CA occur with high frequency, in particular in colorectal, breast and endometrial cancers. In the current release we have updated PIK3CA, focusing on adding novel mutations and papers describing cancers in which its role is less well described, for example salivary duct carcinoma, vulval carcinoma and overgrowth syndromes.
Over the last few COSMIC releases we have significantly updated the database with the latest POLE (DNA polymerase epsilon) related literature. Hotspot mutations in the POLE exonuclease domain, such as p.P286R, are associated with an ultramutated tumour phenotype which often includes elevated levels of other driver gene mutations. The mutational signature can be used to subclassify endometrial and colorectal cancers, guide treatment and act as a prognostic marker. POLE ultramutated tumours are likely to be sensitive to immune checkpoint inhibitors, and there are several ongoing trials investigating these agents alone or in combination with chemotherapy or other biological agents.
TBL1XR1-TP63 has been identified as a recurrent fusion in diffuse large B cell lymphoma, where it is exclusive to the germinal centre B cell-like subtype. TP63 encodes a member of the p53 family of transcription factors with functional domains including an N-terminal transactivation domain, a central DNA-binding domain and an oligomerization domain. TBL1XR1, transducin beta-like 1 X-linked receptor 1, is a member of the WD40 repeat-containing gene family and encodes a component of both the nuclear receptor corepressor and histone deacetylase 3 complexes. In all reported fusion transcripts the TP63 breakpoint lies at exon 4, so the N-terminal transactivation domain is lost while the downstream reading frame is preserved. TBL1XR1-TP63 has also been found in peripheral T cell lymphoma, where this fusion and ALK rearrangements were mutually exclusive.
New CGC genes (Tier 1)
New CGC genes (Tier 2)
Please note that RAD17 is only available on the GRCh37 genome version.
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 227 census genes and this will continue to be expanded.
All CGC genes have been re-evaluated and classified according to their role in cancer as oncogenes or tumour suppressor genes and, where applicable, as genes participating in fusions.
To be able to provide high-confidence and comprehensive data, the CGC has been divided into two tiers.
To be classified in Tier 1 of the CGC, a gene must have a documented activity that may drive or suppress cancer, and there must be evidence of mutations in this gene in cancer that change the activity of the protein in a way that promotes oncogenic transformation. We also take into account the existence of somatic mutation patterns in cancer samples that are typical of tumour suppressor genes (a broad range of inactivating mutations) or oncogenes (well-defined hotspots of missense mutations).
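To illustrate the difference between the two somatic mutation patterns described above, here is a minimal Python sketch. It is hypothetical and is not the procedure COSMIC curators use to assign CGC tiers: the `Mutation` type, the consequence labels, the recurrence cut-off for a "hotspot" and the 20% threshold (borrowed loosely from the "20/20 rule" proposed by Vogelstein and colleagues) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical consequence labels treated as inactivating (not COSMIC's vocabulary).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Mutation:
    consequence: str       # e.g. "missense", "nonsense", "frameshift"
    protein_position: int  # affected residue


def mutation_pattern(mutations, hotspot_min_count=3, threshold=0.2):
    """Label a gene's somatic mutation pattern.

    Returns 'tumour suppressor-like' (many inactivating mutations spread
    across the gene), 'oncogene-like' (missense mutations concentrated at
    recurrent positions) or 'unclear'. Both cut-offs are illustrative
    assumptions, not CGC curation criteria.
    """
    total = len(mutations)
    if total == 0:
        return "unclear"

    # Fraction of mutations predicted to inactivate the protein.
    inactivating = sum(m.consequence in INACTIVATING for m in mutations)

    # Missense mutations at positions hit at least `hotspot_min_count` times.
    missense_positions = Counter(
        m.protein_position for m in mutations if m.consequence == "missense"
    )
    hotspot_missense = sum(
        n for n in missense_positions.values() if n >= hotspot_min_count
    )

    inactivating_frac = inactivating / total
    hotspot_frac = hotspot_missense / total

    if inactivating_frac >= threshold and inactivating_frac >= hotspot_frac:
        return "tumour suppressor-like"
    if hotspot_frac >= threshold:
        return "oncogene-like"
    return "unclear"


# Usage: a gene with one dominant missense hotspot (made-up example data).
hotspot_gene = [Mutation("missense", 600)] * 8 + [
    Mutation("missense", 12),
    Mutation("nonsense", 50),
]
print(mutation_pattern(hotspot_gene))  # oncogene-like
```

In practice curators weigh functional evidence and literature alongside any such pattern, so a sketch like this only captures the "mutation pattern" part of the Tier 1 criteria.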
Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less extensive supporting evidence than Tier 1 genes. It currently contains 127 genes and is being expanded.
The complete CGC list (Tier 1 and 2) is available here, but please note that any reference to the CGC (or 'Census') across the website that does not specify a tier refers to the Tier 1 list.
Follow links below to the 11 papers which are new in v84, or view the full table of papers here.
COSMIC v83 (November 2017) includes 3 new fully curated genes, a substantial curation update for VHL, 1 new fusion pair, 1,138 genomes from 13 new systematic screen papers, updates from ICGC release 25, and updated resistance mutation data; 8 new samples and 11 new resistance mutations. We have also added 5 new genes to the Cancer Gene Census (Tier 1) and expanded the census table functionality to display both tiers. In this release we have retired the legacy website but in response to user feedback we have added full support for the GRCh37 coordinate system across the main website.
We have added a new menu to the main navigation bar on the COSMIC website called 'Genome Version'. The default is set to GRCh38 but GRCh37 can be selected from this menu. When set to GRCh37 the 'GRCh37 Archive' logo will appear at the top of each web page. Please note that you will need to enable cookies in order for this to work.
TGFBR2, transforming growth factor beta receptor 2, encodes a transmembrane member of the Ser/Thr protein kinase family which forms a heterodimeric complex with TGF-beta receptor type-1 and binds TGF-beta. TGFBR2 acts as a tumour suppressor gene in colorectal cancer, where its mutational inactivation is the most common genetic event affecting the TGF-beta signalling pathway, occurring in approximately 30% of these cancers. A 10-base pair polyadenine tract in the extracellular domain is a hotspot, where insertions and deletions result in a frameshift and a non-functioning protein lacking the receptor's transmembrane domain and intracellular kinase domain. These mutations are common in cancers displaying microsatellite instability, with unique clinicopathological features, including an increased incidence in the proximal colon, presentation at an early stage and better prognosis than microsatellite stable (MSS) colon cancer. The frameshift mutations also occur in gastric cancer and missense mutations are found in the kinase domain in MSS colon cancer. Genome studies have identified TGFBR2 as a significantly mutated gene in cervical cancer, and head and neck squamous cell carcinoma.
ERBB4 is a member of the Epidermal Growth Factor Receptor (EGFR) subfamily of receptor tyrosine kinases, along with EGFR, ERBB2 and ERBB3. Ligands include neuregulins and several EGF family members. Activated ERBB4 can function both as a homodimer and as a heterodimer with other EGFR family members, resulting in a range of cellular responses. ERBB4 is comparatively less well understood than other members of the EGFR family. Somatic mutations in ERBB4 are seen across various cancers (including breast, lung and melanoma) and in different regions of the gene, and no hotspot mutations have been identified. ERBB4 has been proposed to act as both an oncogene and a tumour suppressor and is being investigated as a potential drug target.
BCL9L (B-cell CLL/lymphoma 9 like) is a co-activator of Wnt/beta-catenin signalling. It increases the expression of a subset of Wnt target genes but also regulates genes that are required for early stages of intestinal tumour progression. Somatic loss-of-function alterations in BCL9L are frequent in aneuploid colorectal carcinoma but are also found in other tumour types at lower frequency. BCL9L has been proposed to function as an oncogene or as a tumour suppressor depending on the cellular context.
VHL is a tumour suppressor gene that plays a role in a rare inherited disorder called Von Hippel-Lindau syndrome, but also in sporadic forms of cancer. The current update in COSMIC brings together the historic collection and the latest published data on somatic mutations in the VHL gene, including novel mutations and VHL mutations in new histological entities and ethnic groups. Early inactivation of VHL is commonly seen in clear cell renal cell carcinoma (ccRCC), the most common form of renal cancer. A recent publication by Corrò et al. (28214514) explores the feasibility of using circulating tumour DNA as a biomarker in this disease. Cho et al. (27994516) sequenced pancreatic neuroendocrine tumours (pNETs) from Taiwanese patients using a large customised gene panel. They observed that Asian patients with pNETs more frequently had mutations in the mTOR and angiogenesis (including VHL) pathways than Caucasian patients, which could partially explain the better outcome of targeted therapy in Asian patients with pNETs. Other reports analysed VHL mutations in tumour types such as parotid mucoepidermoid carcinoma, glioblastoma, breast cancer, colorectal cancer and clear cell microcystic adenoma.
ETV6-ABL1, resulting from t(9;12)(q34;p13) or a complex rearrangement, is a rare but recurrent fusion in a wide range of haematological malignancies including myelodysplastic neoplasm, acute lymphoblastic leukaemia, acute myeloid leukaemia and Philadelphia chromosome-negative chronic myeloid leukaemia. ETV6 encodes an ETS family transcription factor which contains two functional domains, an N-terminal pointed domain that is involved in protein-protein interactions with itself and other proteins, and a C-terminal DNA-binding domain. Two types of ETV6-ABL1 transcript are detected: type A has an ETV6 breakpoint at exon 4 and type B at exon 5. The ABL1 breakpoint is consistent at exon 2. Both types result in constitutive tyrosine kinase activity similar to that seen with the BCR-ABL1 fusion. Eosinophilia is a common characteristic of patients with ETV6-ABL1 fusion.
Follow links below to the 13 papers which are new in v83, or view the full table of papers here.
COSMIC v82 (August 2017) includes 4 new fully curated genes, a substantial curation update for SMAD4, 1 new fusion pair, 342 genomes from 11 new systematic screen papers, updates from ICGC release 24, and updated resistance mutation data; 1 new drug and 4 updated. We also launch the new COSMIC website featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.
The new COSMIC website has now been launched. We welcome your feedback; please email cosmic@sanger.ac.uk with any issues or suggestions for improvement.
The old websites have been updated to v82 and will continue as the legacy website and GRCh37 (archive) legacy website. These will be available until the next release in November 2017, but we do not plan to maintain them beyond that date. However, we will continue to provide our download files as both GRCh38 and GRCh37 versions for the foreseeable future.
New features include -
For users who download the COSMIC Oracle database dumps, please note that we now only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.
Kelch-like ECH-associated protein 1 (KEAP1) is a component of the Cullin 3-based E3 ubiquitin ligase complex and controls the stability and accumulation of NRF2 protein. When cells are exposed to oxidative damage, KEAP1 releases NRF2, which translocates into the nucleus where it specifically recognises an enhancer sequence known as the Antioxidant Response Element (ARE), resulting in the activation of redox-balancing genes. Several studies have reported somatic mutations in the KEAP1-NRF2 interaction domains that lead to constitutive NRF2 activation. Somatic mutations of the KEAP1 gene are found in non-small cell lung cancer, hepatocellular carcinoma, endometrial cancer, melanoma and many other cancer types, and have been associated with poor outcome and resistance to chemotherapy. The mutations are generally widely distributed across the KEAP1 gene, and their frequency depends on the cancer type and origin.
MicroRNAs (miRNAs) are vital regulators of gene expression. Together with its co-factor DGCR8, the miRNA processing gene DROSHA (drosha ribonuclease III) is involved in the early stages of miRNA processing and is essential for the biogenesis of most miRNAs. Low DROSHA expression levels are observed in several cancer types, including neuroblastoma, endometrial and ovarian cancer, and are associated with advanced stages of several cancer types. In contrast, copy number increases (seen in advanced cervical squamous cell carcinoma) and over-expression are observed in other cancer types, including serous ovarian carcinoma, gastric and non-small cell lung cancers, often associated with prognosis or disease progression. DROSHA is frequently mutated in Wilms tumour, with most mutations found in the RNase IIIb domain at p.E1147. The recurrent mutation p.E1147K impairs miRNA processing through a dominant-negative mechanism, resulting in downregulation of miRNAs.
BTK encodes Bruton tyrosine kinase, a TEC family cytoplasmic tyrosine kinase required for the development, activation and differentiation of B cells, and an early component of the B-cell receptor signalling pathway. Recurrent mutations at BTK C481 have been identified in patients with chronic lymphocytic leukaemia (CLL) whose disease progressed after an initial response to ibrutinib. Ibrutinib is a highly specific BTK inhibitor that inactivates the kinase by binding irreversibly to C481 within its ATP-binding domain. While C481 mutations are the most common among CLL patients who progress on ibrutinib, mutations at T316 in the non-kinase SH2 domain have also been reported. Progression of mantle cell lymphoma after a durable response to ibrutinib may also be due to BTK C481 mutation, and the same mutation has been detected in Waldenstrom macroglobulinaemia patients progressing on ibrutinib.
Hypoxia-inducible factors (HIFs) are transcription factors that respond to changes in tissue oxygen concentration. One of these, Hypoxia-inducible factor 2-alpha (HIF-2-alpha), is encoded by EPAS1. Somatic mutations in EPAS1 occur recurrently in sporadic pheochromocytomas and paragangliomas, as well as in somatostatinomas as part of Pacak-Zhuang syndrome (multiple paragangliomas and somatostatinomas associated with polycythaemia). In some patients with multiple tumours, these somatic EPAS1 mutations are mosaic, having arisen post-zygotically. The majority of somatic EPAS1 mutations are found in exon 12, and gain of function mutations in this region have been shown to cause stabilisation of the HIF2A protein, resulting in transcription of genes involved in the hypoxia response and promotion of angiogenesis and proliferation.
The expertly curated data for SMAD4 have been updated. Over 40 publications which include screening of SMAD4, often alongside other genes, are included in this release. SMAD4 encodes a member of the Smad family of signal transduction proteins which plays a pivotal role in signal transduction of the transforming growth factor beta superfamily cytokines by mediating transcriptional activation of target genes. SMAD4, a tumour suppressor gene, is one of the major driver genes in pancreatic cancer. A lack of SMAD4 mutations in high-grade pancreatic intraepithelial neoplasia, the major precursor of pancreatic ductal adenocarcinoma, indicates these are late genetic alterations in pancreatic carcinoma. SMAD4 mutations are also found in colorectal carcinoma (CRC), where they have a prognostic role in metastatic CRC cases, and less frequently in other tumours, including lung cancer.
The SET-NUP214 fusion results from a recurrent genetic abnormality at 9q34 and is found predominantly in T-cell acute lymphoblastic leukaemia (T-ALL), with a reported frequency of up to 10%. The fusion is rarely detected in acute myeloid leukaemia, acute undifferentiated leukaemia and B-cell acute lymphoblastic leukaemia. In T-ALL, the SET-NUP214 fusion is associated with elevated expression of HOXA cluster genes and with corticosteroid/chemotherapy resistance. SET encodes a protein with a critical role in chromatin binding and remodelling, while NUP214 encodes an FG-repeat-containing nucleoporin involved in the cell cycle and in the transport of material between the nucleus and cytoplasm. Most commonly, the breakpoints in the SET-NUP214 transcript are at exon 7 of SET and exon 18 of NUP214.
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 226 census genes and will be expanded on a regular basis.
All Cancer Gene Census (CGC) genes have been re-evaluated and classified according to their role in cancer as oncogenes or tumour suppressor genes and, where applicable, as genes participating in fusions.
To be able to provide high-confidence and comprehensive data, we have divided the CGC into two tiers. Currently, only the Tier 1 genes are shown on the website and in the download files.
Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less extensive supporting evidence than Tier 1 genes. It currently contains 41 genes from the previous release of the Cancer Gene Census and is being expanded, with an initial release of about 200 genes planned for November 2017, along with COSMIC v83.
The complete census list (Tier 1) is available here
CGC Genes moved to Tier 2 of the Cancer Gene Census
1. PMS1 - PMS1 is a component of the DNA damage response (DDR) with only one recurrent frameshift mutation (K164fs*6, in four samples). Newer papers about mismatch repair (MMR) genes in cancer do not mention this gene, PMS1-deficient mice do not develop tumours, and there is no evidence of significant MMR activity in vitro [PMID: 10542278]
2. Fusion genes with only one case (or rare partners of potent oncogenes known to be fused to multiple partners and able to drive the transformation on their own):
3. Fusion genes transcribed with a shifted reading frame or untranscribed upon fusion, for which there is insufficient evidence of tumour-suppressing activity:
4. Non-coding genes and pseudogenes do not fit the current schema of Tier 1 of the Cancer Gene Census. We are working on better characterisation of the role of such genes in cancer. In the meantime they are classified as Tier 2 CGC genes:
5. Genes known to be involved in cancer only through fusions, where the oncogenic mechanisms depend on disruption of the structure of their fusion partner and there is no evidence of their other cancer-promoting activity so far:
6. Genes known to be involved in cancer only through fusions, for which there is not enough data describing their participation in oncogenic transformation:
Genes removed from the Cancer Gene Census (Tier 1 and 2)
In total, the following 49 genes have been removed from Tier 1 of the CGC:
New ICGC Studies:
New Copy number data:
Follow links below to the 11 papers which are new in v82, or view the full table of papers here.
COSMIC v81 (May 2017) includes 6 new fully curated genes, a substantial curation update for TET2, 1 new fusion pair, 220 genomes from 9 new systematic screen papers and updated resistance mutation data; 1 new drug and 5 updated. We also announce the launch of a new COSMIC beta site featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.
The new COSMIC Beta site http://cancer-beta.sanger.ac.uk has now been launched. This site will be updated regularly over the next 3 months. We welcome your feedback; please email cosmic@sanger.ac.uk with any issues or suggestions for improvement.
For users who download the COSMIC Oracle database dumps, please note that from v82 we will only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.
Oncogenic gain-of-function mutations in DDR2 have been identified in squamous cell carcinoma (SqCC) of the lung. DDR2 encodes the discoidin domain receptor 2, a collagen-stimulated receptor tyrosine kinase. These kinases are involved in the regulation of cell differentiation, cell migration and cell proliferation. DDR2 mutations are present in 4% of lung SqCC, where they are associated with sensitivity to dasatinib. Low-frequency DDR2 mutations have been found in other cancer types, such as endometrial, kidney, brain, breast and colorectal cancers, and in recurrent/metastatic head and neck SqCC.
Mutations in SMAD2 and SMAD3 occur at very low frequency in various cancer types. SMAD2 mutations have been found in cervical and colorectal cancer, hepatocellular carcinoma and non-small cell lung cancer. SMAD3 mutations have been detected in colorectal cancer and in oral squamous cell carcinoma. Most of the mutations observed are missense mutations. Both SMAD2 and SMAD3 encode proteins that are major signalling molecules acting downstream of the serine/threonine kinase receptors.
NCOR1 (nuclear receptor corepressor 1) plays a part in maintaining genomic integrity. It has been reported to be among the most frequently mutated drivers in breast cancer. Downregulation of NCOR1 expression abrogates HDAC3 function and results in genomic instability. Breast cancer patients with high NCOR1 expression levels have been found to have a better prognosis than those with low expression (Zhang et al., 2005). NCOR1 mutations also play a role in skin cancer, colorectal carcinoma and many other cancer types. Predicted damaging and somatic mutations in epigenetic regulators were detected in one third of high hyperdiploid acute lymphoblastic leukaemia (HD-ALL) patients (de Smith et al., 2016).
Protein phosphatase, Mg2+/Mn2+-dependent, 1D (PPM1D) encodes WIP1, a member of the PP2C family of serine/threonine protein phosphatases. PPM1D dephosphorylates DNA damage response mediators such as CHEK2 and p53, antagonising their function and promoting re-entry into the cell cycle. Recurrent PPM1D mutations have been observed in brainstem gliomas, many of which truncate the C-terminal regulatory domain while leaving the phosphatase domain intact.
Mutations across the PREX2 gene, including numerous truncating mutations, have been found in metastatic melanoma, including desmoplastic melanoma, and in other cancers such as basal cell carcinoma, pancreatic ductal and lung adenocarcinomas, and Merkel cell carcinoma. PREX2 has been recognised as playing a role in melanoma for some years, although the precise mechanisms of its involvement remain uncertain. Some in vivo studies, including mouse studies, have demonstrated that cancer-associated PREX2 mutations promote the growth of human melanoma cells. PREX2 is a GTP/GDP exchange factor, and both mutated and wild-type PREX2 inhibit the tumour suppressor PTEN; however, PTEN can no longer inhibit mutated PREX2, so mutual inhibition is disrupted, promoting tumour growth through activation of the PI3K signalling pathway. Increased RAC-dependent invasiveness is also associated with mutated PREX2.
TET2 (ten-eleven translocation 2) is an epigenetic regulator responsible for converting 5-methylcytosine to 5-hydroxymethylcytosine in DNA, a process disrupted by mutations that are known to be associated with myeloproliferative neoplasms (MPN), leukaemias and mastocytosis. An update covering 46 publications that screened TET2, often alongside other genes or gene panels, has been carried out. Overall, 2,027 new samples were curated, identifying 277 new mutations of all types located across the gene. The publications include reports of many haematopoietic and lymphoid disorders, as well as 2 in which solid cancers progressed following hormone or tyrosine kinase therapy: one reported TET2 mutations associated with metastatic prostate cancer after hormone therapy, and the other reported TET2 mutations in 12% of non-small cell lung cancer samples that progressed following tyrosine kinase therapy. Curated MPN publications include reports in which TET2 mutation was associated with progression; chronic myelomonocytic leukaemia, where mutated TET2 was predictive of inferior prognosis when co-occurring with ASXL1 mutation; and myelodysplastic syndrome (MDS) and chronic eosinophilic leukaemia (CEL), including a report in which mutated TET2 could help distinguish MDS and CEL from reactive disorders and hypereosinophilic syndrome, respectively. Leukaemia and lymphoma publications include HTLV-1-associated adult T-cell leukaemia/lymphoma (with TET2 as the most commonly mutated gene); angioimmunoblastic T-cell lymphoma and peripheral T-cell lymphoma, where TET2 mutations are associated with shorter progression-free survival (PFS); and somatic TET2 mutation associated with AML in a family with familial platelet disorder.
Complete census list available here
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is initially available for 116 census genes and will be expanded on a regular basis.
Follow links below to the 9 papers which are new in v81, or view the full table of papers here.
SNVs and indels have also been uploaded from a Colorectal Cancer Organoids study from the suppresSTEM consortium: COSU670
COSMIC v80 (Feb 2017) includes a major new tool "COSMIC-3D" supporting target characterisation and pharmaceutical design alongside significant updates to our cancer genome and key cancer gene curations.
We have a new interface to explore cancer mutations on 3D protein structures, "COSMIC-3D", now available for public evaluation. Produced in partnership with Astex Pharmaceuticals (Cambridge, UK), it shows interactive 3D visualisations of over 8000 human proteins (using PDB structures), with COSMIC mutations mapped, and options to see frequency and effect. Putative small-molecule drug pockets are identified and can be explored alongside cancer mutations to identify, characterise and design molecules against new targets across oncology. All the information is correct, but as this is a beta evaluation release we would value your feedback on the web interface so we can make it as useful as possible.
As usual, full and exhaustive literature curations are now provided across cancer genes USP8, FAT1, FAT4, CXCR4 and fusion pair PML-RARA; substantial curation updates are made to AR and CTNNB1, and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 23, Oct 7th 2016) and 421 new genomes have been added by curation of 18 systematic screen publications. For full details of the new content in v80 please see the Datasheet.
We use recommendations from the HGVS for syntax when annotating the data within COSMIC. As part of our ongoing commitment to data quality we are currently in the process of ensuring all our mutation data are described in the most modern ways, including the latest HGVS nomenclature and gene structures. Over the last 6 months we have been working on a new system to continually annotate COSMIC data to the latest standards. Of course, to ensure the new annotations are exactly correct, we are including expert manual oversight, so it takes a little time to completely validate our huge dataset. Once we have verified the precision of our system, it will be deployed in forthcoming releases.
For more information about release v80 and other news please see the first issue of our Newsletter. We will be using this to communicate with you more frequently about the project and the exciting developments we have in the pipeline. This issue includes details about the COSMIC Workshop on March 6th and the beta release of COSMIC-3D.
COSMIC v79 (Nov 2016) includes substantial updates to our cancer genome and key cancer gene curations. Full literature curations are now provided across cancer genes PRKACA and AR, and fusion pair CBFA2T3-GLIS2; substantial curation updates are made, especially to GNAS, GNAQ, and GNA11, and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 22, Aug 2016) and 265 new genomes have been added by curation of 9 systematic screen publications. A new drug, Vismodegib, has been added to our Genetics of Drug Resistance, describing 19 therapy-resistance variants in the gene SMO.
Data Updates in brief (for full details of this latest release, please see the v79 Datasheet).
We now include drug resistance data for the gene SMO (Vismodegib) as well as updates for EGFR (Gefitinib, Erlotinib and Afatinib), ESR1 (Endocrine therapy) and ALK (Alectinib).
All drug resistance data is detailed here, describing our curations across 11 genes and 21 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.
7 genes have been added to the Cancer Gene Census: EPAS1, PTPRT, PPM1D, BTK, PREX2, TP63, QKI
The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 244 census genes. This content, as well as additional functional annotation is being substantially expanded for future releases.
COSMIC data have been combined with the ProteinPaint data mining and visualization system at St. Jude Children's Research Hospital in Memphis TN, to support the discovery and understanding of genetic mutations in paediatric cancers.
On Monday 6th March 2017 we are holding a workshop titled 'An introduction to COSMIC' at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The course will begin with a presentation overviewing the COSMIC project, followed by a hands-on tutorial introducing the COSMIC website and strategies for exploring cancer variation data and investigating the genetic causes of human cancers. In addition, there will be short presentations describing exciting new developments scheduled for future release, and opportunities to engage the team in a group Q&A session and informal discussions about the COSMIC website and future plans.
Registration will open in January, but please email cosmic@sanger.ac.uk if you would like more information or wish to express an interest in attending.
If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact the COSMIC helpdesk (cosmic@sanger.ac.uk).
We are planning to merge the functionality of the COSMIC and Whole Genomes websites in February 2017 (v80). We will be introducing a new 'whole genomes' filter on the gene and cancer browser pages, and as a consequence the Whole Genomes site will become redundant and will be retired.
An API and new web interfaces for downloading COSMIC data will also be developed and rolled out in 2017. As part of these developments, and due to incompatibility between BioMart (0.7) and the latest version of our Oracle databases, we are discontinuing support for the COSMICMart in this release.
If you have any questions about these changes please email the COSMIC helpdesk (cosmic@sanger.ac.uk).
COSMIC has been updated significantly in v78 (Sept 2016). This major data release includes new full literature curations of cancer genes HIF1A, MTOR and PTPN13, drug resistance profiles across Sorafenib & Quizartinib, and a complete update of genome-wide analysis from the ICGC (release 21, May 2016). We have also added 9 new genes to the Cancer Gene Census, and fully re-analysed the copy number data across all TCGA samples using the ASCAT2 algorithm.
Data Updates in brief (for full details of this latest release, please see the v78 Datasheet).
New in v78: FLT3 with the drugs Quizartinib and Sorafenib, detailing a total of 76 new unique resistance mutations.
All drug resistance data is now detailed here, describing our curations across 10 genes and 20 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.
9 genes have been added to the Cancer Gene Census: DDR2, MAPK1, BCORL1, KEAP1, LRP1B, DROSHA, B2M, DDX3X, APOBEC3B
The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 237 census genes. This content, as well as additional functional annotation is being substantially expanded for future releases.
Over time we have added filters aimed at selecting those variants within the cell lines that are more likely to contribute to carcinogenesis. These have included the ability to select variants in genes known to contribute to cancer (Cancer Gene Census), as well as an estimation of the mutation impact on the protein as determined by FATHMM. We have now extended this list of filters to include a filter that identifies variants within the cell lines that are similar to variants seen recurrently in whole genome screened tumour samples. The criteria for calling a variant as recurrent differs based on mutation type. For further details please see the Genome Annotation page.
We welcome two new starters to the COSMIC team, Dr. John Tate and Ms. Bhavana Harsha. John is our new web design and visualisation specialist who will be driving new developments and improving the design of the website. Bhavana, our new bioinformatic specialist, is developing a new annotation system to handle the ever-increasing volume and complexity of genomic variation data.
Thank you for your continued support.
On Monday 26th September 2016 we are holding a workshop at the University of Cambridge, UK, titled 'COSMIC: Exploring cancer genetics at high resolution'.
During the course we will use the live COSMIC website and genome browser to show you how to access and explore cancer variation data, seeking to identify genetic causes and targets in all human cancers.
For more details please see the course timetable.
If you wish to attend the workshop, please visit the registration page.
If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact cosmic@sanger.ac.uk.
We are considering changing the compatibility of the Oracle data pump export files from supporting Oracle 10g to 11g (11.2). If this change will cause problems for you, please let us know by emailing the COSMIC helpdesk (cosmic@sanger.ac.uk).
COSMIC now encompasses the Genetics of Drug Resistance across 9 therapeutic target genes and 18 drugs (release v77). This release also includes full mutation profiles across ATR, TBX3 and NFKBIE, the STIL-TAL1 and DNAJB1-PRKACA gene fusions, and over 700 new cancer genomes.
Data Updates in brief (for full details of this latest release, please see the v77 Datasheet).
In this COSMIC release, we now encompass the genetics of drug resistance, somatic mutations that allow a tumour to continue growing despite targeted therapeutics. Initial curations cover 9 genes and 18 pharmaceutical therapies (listed below), detailing 226 resistance-driving mutations.
Genes: ABL1, ALK, BRAF, EGFR, ESR1, KIT, MAP2K1, MAP2K2, PDGFRA
Drugs: Vemurafenib, AZD9291, Ceritinib, Erlotinib, Gefitinib, Imatinib, Nilotinib, Tyrosine kinase inhibitor - NS, Afatinib, Endocrine therapy, Alectinib, PD0325901, Dasatinib, Crizotinib, Selumetinib, Sunitinib, Dabrafenib, Bosutinib
This information is available in the 'Drug Resistance' tab of the gene analysis pages, where a table and charts show the landscape of resistance to drugs targeting mutations in the gene of interest. For example, please look at the Tyrosine Kinase Inhibitors associated with EGFR.
23 genes have been added to the Cancer Gene Census: AR, CHD4, CTCF, CXCR4, ERBB4, FAT1, FAT4, HIF1A, LEF1, LZTR1, MTOR, NCOR2, PRKACA, PTK6, PTPN13, RBM10, SDHA, SMAD2, SMAD3, TGFBR2, USP8, ZFHX3
New information has been added to the census table, describing the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 156 census genes. This content, as well as additional functional annotation, is being substantially expanded for future releases.
This spring, we welcome three new additions to the COSMIC team. Dr Laura Ponting and Dr Raymund Stefancsik join us from Cambridge University (UK) as curator scientists. They are now enhancing our team of expert manual curators, aiming to comprehensively describe the range of cancer-causing mutations across all cancer genes (driven by the Cancer Gene Census, describing 595 genes).
In addition, Charalambos (Harry) Boutselakis joins us from London's Farr Institute, bringing substantial informatic expertise across databases and data analytics. He will be expanding the ways in which COSMIC can be used while ensuring its immediate responsiveness as the database increases in size and scope.
Thank you for continuing to support us.
Please ensure you are registered (here) to access data downloads and to receive future communications.
COSMIC v76 includes full curations across cancer genes PPP6C and SPOP, genomic content from 17 systematic screen publications, and a complete update from ICGC release 20. We welcome two new scientists to the COSMIC team who will be focused on identifying targets and biomarkers across the expanding COSMIC dataset. The streamlining of our website also continues, improving the layout and design of many large-data webpages, and we have improved our Download files to simplify frequency calculations across COSMIC datasets.
For full details of this latest release, please see the v76 Datasheet; in brief:
We welcome two new scientists who will be investigating the curated data and annotating the most promising target and biomarker opportunities across this enormous database.
Dr. Sam Thompson is a medical statistician with expertise in clinical trials. In collaboration with Bayer Pharmaceuticals, she will be exploring correlations across the different types of variant annotation in COSMIC, aiming to systematically identify novel markers for disease.
Dr. Harry Jubb brings a proteomic perspective to COSMIC. Working together with Astex Pharmaceuticals, Harry will spend the next three years enhancing our visualisation of coding mutations, and investigating which mutated peptide domains are tractable for pharmaceutical design.
Thank you for your support, allowing us to enhance the utility of the curations in COSMIC.
We have extended the layout and design used on the Gene page to the Cancer Browser, Sample, Study, and Mutations pages. Tabulations showing variant annotations from multiple datatypes have been combined into a 'Variants' tab on these pages.
On the Overview tab of the Gene page, icons indicate whether the selected gene is part of a significant dataset: separate icons mark a cancer census gene, an expert curated gene, and a gene with a significant role in oncogenesis as evidenced by mouse insertional mutagenesis experiments.
Substantial changes have been made to the Genome Browser home page, including a new smart search feature with the option to select any of the specific datasets: COSMIC, Whole Genomes or the Cell Lines Project.
We have updated the structure of the mutation files in our Download site to simplify the calculation of mutation frequencies. Data has been separated according to the type of screening method used; targeted gene screen and whole genome screen. We have also enhanced the information available from the sample details file so that whole genome samples can be extracted for use in whole genome screen mutation frequency calculations. Please see our FAQ for details.
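To illustrate why separating the screen types matters, here is a minimal Python sketch of a genome-wide mutation frequency: every whole genome screened sample counts in the denominator for every gene, whether or not it carries a mutation, and a sample mutated more than once in a gene counts only once. The field names (`sample_id`, `whole_genome_screen`) and the in-memory records are hypothetical stand-ins, not the actual columns of the COSMIC download files; please see the FAQ for the real file layout.

```python
# Hypothetical stand-ins for the sample details file and the mutation file.
samples = [
    {"sample_id": "S1", "whole_genome_screen": True},
    {"sample_id": "S2", "whole_genome_screen": True},
    {"sample_id": "S3", "whole_genome_screen": True},
    {"sample_id": "S4", "whole_genome_screen": False},  # targeted screen only
]
mutations = [
    ("S1", "TP53"), ("S1", "TP53"),  # two hits in one sample count once
    ("S2", "TP53"),
    ("S4", "TP53"),  # targeted-screen sample: excluded from this calculation
    ("S3", "KRAS"),
]


def genome_wide_frequency(gene, samples, mutations):
    """Fraction of whole genome screened samples with >= 1 mutation in `gene`."""
    wgs = {s["sample_id"] for s in samples if s["whole_genome_screen"]}
    if not wgs:
        return 0.0
    mutated = {sid for sid, g in mutations if g == gene and sid in wgs}
    return len(mutated) / len(wgs)


tp53_freq = genome_wide_frequency("TP53", samples, mutations)
```

Targeted screens need a different denominator (only the samples actually screened for that gene), which is why mixing the two sources would distort the frequencies.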
We are changing the way we communicate release updates to COSMIC users. Please register to ensure you receive future communications.
COSMIC v75 includes curations across GRIN2A, fusion pair TCF3-PBX1, and genomic data from 17 systematic screen publications. We are also beginning a reannotation of TCGA exome datasets using Sanger's Cancer Genome Project analysis pipeline to ensure consistency; four studies are included in this release, to be expanded across the next few releases. The Cancer Gene Census now has a dedicated curator, Dr. Zbyslaw Sondka, who will be focused on expanding the Census, enhancing the evidence underpinning it, and developing improved expert-curated detail describing each gene's impact in cancer. Finally, as we begin to streamline our ever-growing website, we have combined all information for each gene onto one page and simplified the layout and design to improve navigation.
For full details of this latest release, please see the v75 Datasheet; in brief:
We welcome Dr. Zbyslaw Sondka to the COSMIC team. Working in collaboration with The Centre for Therapeutic Target Validation (CTTV) he will be curating the Cancer Gene Census; building the evidence behind existing genes as well as extending the census list.
Overview information has been merged into the Gene Analysis page. This page also has a full-featured Genome Browser which responds to filters. The page layout has also been redesigned, with tabulations organised under a single 'Data' tab and studies and publications combined under the 'References' tab.
The GA4GH (Global Alliance for Genomics and Health) Beacon Project encourages international sites to share genetic data in the simplest of all technical contexts. The service is designed merely to accept a query of the form "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?" (or similar data) and to respond with either "Yes" or "No".
The Beacon Network lists all the known beacons, including the newly released COSMIC Beacon.
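The Beacon query semantics can be illustrated with a toy Python sketch: the service records which alleles are present and answers a single presence question with "Yes" or "No", revealing nothing else. The `ToyBeacon` class, its `query` method and the stored variants are hypothetical; this does not reproduce the GA4GH Beacon API, its request format or its coordinate conventions.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    allele: str


class ToyBeacon:
    """Hypothetical in-memory beacon: stores allele presence, answers Yes/No only."""

    def __init__(self, variants: Iterable[Variant]):
        # Normalise to (chromosome, position, upper-case allele) keys.
        self._present = {(str(v.chrom), int(v.pos), v.allele.upper()) for v in variants}

    def query(self, chrom, pos, allele) -> str:
        """Answer 'Do you have any genomes with `allele` at `pos` on `chrom`?'"""
        key = (str(chrom), int(pos), str(allele).upper())
        # Only a bare Yes/No is returned; no sample IDs or counts are exposed.
        return "Yes" if key in self._present else "No"


beacon = ToyBeacon([Variant("3", 100735, "A"), Variant("X", 12345, "C")])
answer = beacon.query("3", 100735, "A")
```

Restricting the response to a single bit is what lets sites participate without exposing individual-level data.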
A new miRNA track has been added across all browsers, with the data sourced from miRBase.
We are changing the way we communicate website updates to COSMIC users. As from this release all our registered users will receive email notification of updates to the website. We would encourage all those who have subscribed to the mailing list cosmic-announce@sanger.ac.uk to register as communication via this list is being phased out. If you are registered but prefer not to receive emails you can opt out by logging in and going to the Account Settings page.
We have also introduced a new 'non-affiliated' category to allow users who do not belong to a recognised academic or corporate organisation to register for email updates using their personal email address.
COSMIC v74 brings a new focus on curating blood cancer fusion genes, starting with BCR/ABL and KMT2A (MLL) fusions. We are also beginning to capture much greater clinical details on the samples we curate, now available for download. More traditionally, somatic mutations are curated from three new cancer genes, POLE, AXIN2 and KDM6A. Substantial new genomic data are included from 17 systematic screen publications, and a full update to the latest ICGC release (v19).
For full details of this latest release, please see the v74 Datasheet; in brief:
"Mutation Impact" scores (via FATHMM-MKL) are now available for non-coding variants. These values can be viewed on the NCV, Study and Sample overview pages, and the COSMIC Genome Browser (functionally significant variants are coloured blue). They are also included in the download files on the SFTP site. There are 422,212 functionally significant variants (scores ≥ 0.7). Please see the Mutation Impact section of Cancer Genome Annotation for help interpreting the scores.
We are now capturing substantially more clinical feature annotations on the samples we curate. Across 24 new columns we are capturing, where available, annotations such as therapeutic regimes and responses, mutation allele specification, tumour stage/grade/cytogenetics, patient age/ethnicity/gender. This full information is available via COSMIC Downloads, and is also displayed on the website on each individual Sample Overview page. For full details of these rich expanded clinical annotations, please see the 'Cosmic sample features' section (describing the CosmicSample.tsv.gz file) here.
COSMIC v73 contains full expert curation across 9 cancer genes, 26 systematic screen publications and ICGC release 18. 'Mutation impact' filters across the website now estimate pathogenic functional consequences, based on the new FATHMM-MKL algorithm. Substantial new information is now present in the COSMIC Genome Browser: regulatory features from ENCODE are now available, particularly enhancing the utility of the differential methylation and non-coding variant data; human SNPs are now shown alongside COSMIC somatic mutations, and genome browsing is now navigable via our Cancer Browser.
Below is a summary of new data in v73, please see the v73 Datasheet for further description.
We have upgraded our 'Mutation Impact' filters to use scores generated by a new version of FATHMM (FATHMM-MKL). See the v73 Datasheet for more information.
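As a minimal sketch of how a score cut-off turns FATHMM-MKL scores into a "functionally significant" filter, the Python below applies the ≥ 0.7 threshold quoted in the v74 notes for non-coding variants. The variant identifiers and scores are made up for illustration, and treating a missing score as not significant is an assumption of this sketch.

```python
# Cut-off quoted in the v74 notes: FATHMM-MKL scores >= 0.7 are
# reported as functionally significant.
SIGNIFICANCE_THRESHOLD = 0.7


def functionally_significant(scored, threshold=SIGNIFICANCE_THRESHOLD):
    """Return IDs of variants whose score meets the threshold.

    scored: iterable of (variant_id, score) pairs; score may be None when no
        prediction is available (treated as not significant, an assumption).
    """
    return [vid for vid, score in scored if score is not None and score >= threshold]


# Illustrative, made-up identifiers and scores.
example = [("var1", 0.92), ("var2", 0.70), ("var3", 0.41), ("var4", None)]
significant = functionally_significant(example)
```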
COSMIC v72 is our largest release ever, containing new annotations across 5466 cancer genomes and full literature curation across 22 new cancer genes, 28 fusion pairs; 26 genes have been added to the Cancer Gene Census. We provide our first integration of differential methylation data and many additional mutations, copy number aberrations and expression variants from recent ICGC & TCGA releases. All genomic events in COSMIC have been upgraded to GRCh38 (with a GRCh37 archive available). Finally, we present a new curated resource, to be regularly updated, describing the characterisation of 30 mutation signatures across human cancer.
COSMIC is adopting a new licensing strategy for v72, to grow the scope of our literature curations, enhance the analytics available across our data, and support the capacity to sustain this ever-growing database into the future. Key changes are -
All licensing payments are used to grow COSMIC, its coverage and analytic usefulness for oncology insight. We will also be inviting licensees to tell us which priorities we might best pursue, to ensure the direction of COSMIC best supports these industries' commercial oncology research. Please see our Licensing page for more details.
This v72 release is too large to describe here in detail. Here is a summary; please see the v72 Datasheet for further description.
Our curations are generated by expert postdoctoral scientists, described here.
We have updated the genomic coordinates in COSMIC to GRCh38. However, we are also hosting a parallel website to display the data on the GRCh37 reference. This GRCh37 site will be maintained and updated throughout 2015 with any new source data where the original coordinates are on GRCh37. However, it will not be updated with any new data where the original coordinates are on GRCh38.
Different mutational processes generate unique combinations of mutation types, termed "Mutational Signatures". Based on an analysis of 10,952 exomes and 1,048 whole-genomes across 40 distinct types of human cancer we have added a Mutation Signatures page on the website; a curated census of signatures providing the profiles of, and additional information about, known mutational signatures.
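Mutational signature analyses commonly describe each single base substitution by one of 96 channels: the substituted base is expressed as a pyrimidine (C or T), giving six substitution classes, each combined with the 16 possible pairs of 5' and 3' flanking bases. The Python sketch below maps a substitution onto that widely used convention; it is an illustration only, not COSMIC's signature extraction pipeline, and the function names are hypothetical.

```python
# Translation table for base complements (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context, alt):
    """Return the 96-channel label, e.g. 'A[C>T]G', for a substitution.

    context: reference trinucleotide on the forward strand, with the
        mutated base in the middle (5' base, reference base, 3' base).
    alt: the alternate base at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a trinucleotide and one base, all from ACGT")
    if context[1] == alt:
        raise ValueError("reference and alternate bases are identical")
    # Purine reference (A or G): report the event on the opposite strand so
    # that the reference base is a pyrimidine (C or T).
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# All 96 channels: 6 substitution classes x 16 flanking-base combinations.
ALL_CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in "ACGT"
    if alt != ref
    for five in "ACGT"
    for three in "ACGT"
]
```

Folding purine-reference events onto the pyrimidine strand means that, for example, a G>A change in `CGT` and a C>T change in `ACG` land in the same channel, since they describe the same double-stranded event.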
We have integrated methylation data across the COSMIC website. The Gene Analysis page has been extended to show a methylation track on the mutation histogram, differential methylation counts in the tissue tab and a new 'Methylation' tab to display a table of variants. The Cancer Browser, Study and Sample Overview pages have also been updated to integrate methylation data. Because the majority of methylation annotations lie outside gene footprints, COSMIC's Genome Browser is the best way to explore this information.
The COSMIC Genome Browser is a valuable tool for exploring COSMIC data in its genomic context. This browser can be used to explore the data in COSMIC, COSMIC genomes (WGS) and the COSMIC Cell Lines Project, on either the GRCh37 or GRCh38 reference sequence. It can also be used to view the data for an individual sample if selected. Please see the COSMIC Genome Browser homepage for more details.
COSMIC v71 includes full literature curation of PTPRB, PLCG1, POT1 and STAG2, the addition of 25 new census genes and an update of gene expression and copy number data from ICGC release 17 (Sept 2014).
The Cancer Gene Census has been updated with 25 new genes, bringing the total number of known cancer genes substantiated by the scientific literature to 547. The new genes are:
We have added an additional 16 cell lines to the Cell Lines Project. The lines are:
We have included an initial integration of mouse insertional mutagenesis data for 851 COSMIC genes from the CCGD (Candidate Cancer Gene Database) adding supporting evidence for cancer driver genes. These data are integrated in the Gene Overview page, more details can be found here.
A mutation matrix plot has been added to the Study Overview page, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific study or publication.
For whole genome analysed samples the Sample Overview page now includes a Genome Browser (JBrowse), allowing all mutation types for a sample (including coding and non-coding mutations, and aberrant copy number and gene expression) to be viewed in genomic context with COSMIC and Ensembl gene annotations (GRCh37).
There is a new tutorial section in the help pages including 4 new tutorials demonstrating the Sample, Gene, Fusion and Cancer Browser pages.
We have added 7,148 new copy number variants from 8 new TCGA studies (source ICGC release 17, re-analysed with ASCAT).
We have nearly doubled the gene expression data in COSMIC by adding data from 10 new studies from TCGA (source ICGC release 17). The platforms supported are: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeq, and IlluminaGA_RNASeq. Please note that from this release we no longer show results from the array platforms AgilentG4502A_07_2 and AgilentG4502A_07_3. For more information please visit the gene expression help page.
PTPRB, encoding a tyrosine phosphatase specific to the vascular endothelium that inhibits angiogenesis, has been identified as a tumour suppressor gene in angiosarcoma. Mutations were found in secondary tumours or those with MYC amplification, a biomarker of radiation-associated secondary angiosarcoma. PLCG1, encoding phospholipase C gamma 1, a signal transducer downstream of receptor tyrosine kinases in the phosphoinositide pathway, also has recurrent, likely activating mutations in angiosarcoma. PLCG1 gain-of-function mutations have previously been identified in cutaneous T-cell lymphoma.
POT1 encodes a single-stranded telomere-binding component of the shelterin complex. It is the only shelterin component that contains two N-terminal oligonucleotide/oligosaccharide-binding (OB) domains. Recurrent mutations in POT1 have been found in chronic lymphocytic leukaemia, where they occur in the clinically aggressive subtype with wild-type IGHV@. The POT1 mutations are most often found in the gene regions encoding the two OB folds.
Stromal antigen 2 (STAG2) is a subunit of the cohesin complex and has a role in chromatid separation during cell division. Genetic disruption of this process can lead to aneuploidy in cancer. A number of tumour types have been found to harbour somatic mutations in STAG2, including bladder cancer, myeloid neoplasms and glioblastoma. The gene maps to the X chromosome (Xq25) and is present as a single copy in males; in females the other X chromosome is inactivated. Hence, complete genetic inactivation of STAG2 requires only a single mutational event. STAG2 has also been suggested to act as a tumour suppressor via other mechanisms distinct from its role in cohesion.
We have added mutation data for 841 tumour samples from publications where genome-wide analyses have been used. More details can be found here.
As from this release we no longer support Internet Explorer version 8. This allows us to develop tools for the latest browsers and provide a richer user experience. We apologise for the inconvenience caused to IE 8 users.
COSMIC v70 includes an initial integration of gene expression data from TCGA, full literature curation of CALR, CD79A and CD79B, 12 whole-genome sequencing publications, and extensive updates to point mutation and structural variant data from ICGC (release 16, May 2014) and TCGA.
Gene expression level 3 data have been integrated into COSMIC from 10 publicly accessible TCGA studies. The platform codes currently used to produce the COSMIC gene expression values are: IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeqV2, AgilentG4502A_07_2, AgilentG4502A_07_3. COSMIC now includes gene expression alongside coding mutations and copy number aberrations on the cancer browser, sample overview, gene analysis and study/paper overview pages. We have also added a gene expression track to the histogram on the gene analysis page and to the circos diagram on the sample overview page; more details can be found here.
A mutation matrix has been added to the cancer browser, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific cancer.
The mutation matrix chart shows 20 x 175 boxes, with each box representing a gene-sample combination. Genes are ranked by the number of samples with variations (depending on the selected data type) and the samples are sorted using a clustering algorithm to group them in relation to the ranked genes; more details can be found here.
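The gene-ranking step can be sketched in Python. Genes are ranked by the number of samples carrying a variant of the selected type and truncated to 20 rows, and samples to 175 columns. For ordering the samples, this sketch substitutes a simple "waterfall" sort (samples mutated in the top-ranked gene first, then the next gene, and so on) for COSMIC's clustering algorithm, which is not specified here; the `mutation_matrix` function and its input structure are hypothetical.

```python
from collections import Counter


def mutation_matrix(sample_genes, n_genes=20, n_samples=175):
    """Build a gene x sample presence matrix (1 = variant, 0 = none).

    sample_genes: hypothetical input mapping sample ID -> set of genes that
        carry a variant of the selected data type in that sample.
    """
    # Each sample contributes at most once per gene because its genes are a set.
    counts = Counter(g for genes in sample_genes.values() for g in genes)
    # Rank genes by number of affected samples (ties broken alphabetically).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    genes = [g for g, _ in ranked[:n_genes]]

    # Stand-in for COSMIC's clustering: a "waterfall" sort in which samples
    # mutated in higher-ranked genes come first; sample ID breaks ties.
    def order_key(sample_id):
        pattern = tuple(0 if g in sample_genes[sample_id] else 1 for g in genes)
        return pattern + (sample_id,)

    samples = sorted(sample_genes, key=order_key)[:n_samples]
    matrix = [[int(g in sample_genes[s]) for s in samples] for g in genes]
    return genes, samples, matrix


example = {
    "S1": {"TP53", "KRAS"},
    "S2": {"TP53"},
    "S3": {"PIK3CA"},
    "S4": {"KRAS", "TP53"},
    "S5": set(),
}
genes, samples, matrix = mutation_matrix(example)
```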
To improve the value of COSMIC data we have tried to identify the most significant high-value data within cancer genomes using the following filtering strategies -
We have excluded data from any sample with over 15,000 mutations. In addition, we have flagged all known SNPs as defined by the 1000 genomes project, dbSNP and a panel of 378 normal (non-cancer) samples from Sanger CGP sequencing. Using this approach 812,136 mutations have been flagged. Although all data are included in our download files, we have excluded flagged mutations from the website.
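The two filtering rules described above can be sketched in Python. The 15,000-mutation threshold comes from this release note; the record fields and the `known_snps` set (standing in for the 1000 Genomes, dbSNP and normal-panel lookups) are hypothetical. Following the text, flagged mutations are returned separately rather than discarded, since they remain in the download files.

```python
from collections import Counter

# Threshold stated in the release note: samples with more than
# 15,000 mutations are excluded entirely.
MAX_MUTATIONS_PER_SAMPLE = 15_000


def filter_genome_data(mutations, known_snps, max_per_sample=MAX_MUTATIONS_PER_SAMPLE):
    """Split mutations into (kept, flagged) after dropping hypermutated samples.

    mutations: list of dicts with hypothetical keys
        'sample', 'chrom', 'pos', 'ref', 'alt'.
    known_snps: set of (chrom, pos, ref, alt) tuples, a hypothetical stand-in
        for lookups against 1000 Genomes, dbSNP and a normal-sample panel.
    kept: shown on the website; flagged: known SNPs, download files only.
    """
    per_sample = Counter(m["sample"] for m in mutations)
    kept, flagged = [], []
    for m in mutations:
        if per_sample[m["sample"]] > max_per_sample:
            continue  # sample exceeds the threshold: exclude all its data
        key = (m["chrom"], m["pos"], m["ref"], m["alt"])
        if key in known_snps:
            flagged.append(m)
        else:
            kept.append(m)
    return kept, flagged
```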
Although no CNV data has been excluded from the website, we have applied filtering so that by default only the most significant variants are shown. For these CNVs the minor allele and total copy number values are known and gain/loss has been defined using stringent criteria [ see the Copy Number Variants section in the help pages ]. However, at the head of every table showing CNVs there is an option to switch off the filter and view all the data.
In order to make it easier to examine each sample, analysis filters have been introduced on the sample overview page. These filters allow you to specify that the mutations viewed should be likely pathogenic (as defined by FATHMM analysis), in the cancer census genes, or of a particular mutation type. In future releases, we will be developing further filters across these data to enhance their analysis.
We have started to upgrade our help pages and have introduced two new tutorials to help users navigate the COSMIC website. These tutorials cover the components of the website [ Site Tour ] and searching COSMIC [ Search ].
The recently identified oncogene calreticulin (CALR) encodes a multi-functional Ca2+-binding protein chaperone localised in the endoplasmic reticulum. CALR somatic mutations are now the second most prevalent mutations seen in patients with myeloproliferative neoplasms; they have been found in the majority of JAK2/MPL mutation-negative essential thrombocythaemia (ET) and primary myelofibrosis (PMF) patients, in addition to a small number of myelodysplastic patients (RARS, RARS-T, CMML and aCML). Almost all the reported mutations are insertion, deletion or complex mutations generating a +1 bp frameshift and an extended novel CALR C-terminal domain. CALR mutations appear to be associated with a more benign clinical course, younger age and male sex.
The Ig-alpha and Ig-beta proteins encoded by CD79A and CD79B are necessary for expression and function of the B-cell antigen receptor. Recurrent activating mutations in CD79A and CD79B have been identified in diffuse large B cell lymphoma, where they occur more frequently in the activated B-cell-like subtype. The ITAM (immunoreceptor tyrosine-based activation motif) domain is targeted, with a hot spot at Y196 in CD79B. Mutations in both genes have also been found in Waldenström's macroglobulinaemia.
In this release 12 systematic screen publications have been curated in COSMIC; more details can be found here.
We have decided to drop support for Internet Explorer version 8 from November 2014. This allows us to develop tools for the latest browsers and provide a richer user experience for our users. We apologise in advance for the inconvenience caused to IE 8 users.