SHLD1

Identifiers
Aliases	SHLD1, chromosome 20 open reading frame 196, shieldin complex subunit 1, RINN3, C20orf196
External IDs	OMIM: 618028 MGI: 1920997 HomoloGene: 51865 GeneCards: SHLD1
Gene location (Human)
Chr.	Chromosome 20 (human)
End	5,864,395 bp
Gene location (Mouse)
Chr.	Chromosome 2 (mouse)
End	132,592,975 bp
RNA expression pattern
	Top expressed in
	monocyte; oocyte; gastrocnemius muscle; right adrenal gland; blood; lymph node; spleen; islet of Langerhans; ganglionic eminence; appendix
	Top expressed in
	interventricular septum; myocardium of ventricle; otolith organ; utricle; hand; blood; facial motor nucleus; internal carotid artery; external carotid artery; morula
Gene ontology
Molecular function	protein binding
Cellular component	chromosome; site of double-strand break
Biological process	positive regulation of isotype switching; negative regulation of double-strand break repair via homologous recombination; positive regulation of double-strand break repair via nonhomologous end joining; DNA repair; cellular response to DNA damage stimulus; regulation of double-strand break repair via nonhomologous end joining
	Sources: AmiGO / QuickGO
Orthologs
Species	Human	Mouse
Entrez	149840	73747
Ensembl	ENSG00000171984	ENSMUSG00000044991
UniProt	Q8IYI0	Q9D112
RefSeq (mRNA)	NM_001303477; NM_001303478; NM_001303479; NM_152504	NM_028637; NM_001358260; NM_001358261
RefSeq (protein)	NP_001290406; NP_001290407; NP_001290408; NP_689717	NP_082913; NP_001345189; NP_001345190

SHLD1 (shieldin complex subunit 1), also known as C20orf196 (chromosome 20 open reading frame 196), is a human gene on chromosome 20.^[5] It encodes an mRNA that is 1,763 nucleotides long and a protein that is 205 amino acids long.^[5]

Function

C20orf196 is involved in the DNA repair network. Gupta et al. identified C20orf196 as a component of shieldin, a vertebrate-specific protein complex.^[6] Shieldin is recruited to DNA double-strand breaks (DSBs), where it promotes repair by non-homologous end joining (NHEJ), immunoglobulin class-switch recombination (CSR), and fusion of unprotected telomeres.^[6] Their analysis suggests that SHLD1 associates with the shieldin complex sub-stoichiometrically or with weaker affinity.^[6]

Gene

Locus

C20orf196 is located on the short arm of chromosome 20 at 20p12.3, from base pairs 5,750,286 to 5,864,407 on the direct strand.^[5] It contains 11 exons.^[7]

Aliases

C20orf196 is also known as RINN3^[6] and SHLD1.

Expression

mRNA

Alternative Splicing

C20orf196 produces 9 different mRNAs, with 7 alternatively spliced variants and 2 unspliced forms.^[7] There are 3 probable alternative promoters, 3 non-overlapping alternative last exons, and 2 alternative polyadenylation sites.^[7] The mRNAs differ by the truncation of the 5' end, truncation of the 3' end, presence or absence of 2 cassette exons, and overlapping exons with different boundaries.^[7]

Isoforms

C20orf196 has six splice isoforms.^[7]

Promoter

The promoter region spans bases 5,749,286 to 5,750,555, a total of 1,270 base pairs.^[5] The transcription start site lies between bases 5,750,382 and 5,750,409, a window of 28 base pairs.^[5]
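The lengths quoted for the promoter and the transcription start site window follow from treating the coordinates as 1-based and inclusive, which is an assumption here (the source does not state its convention, although the quoted totals only work out that way). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in base pairs of a 1-based interval that includes both endpoints."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates as quoted in the text (chromosome 20).
promoter = (5_749_286, 5_750_555)
tss_window = (5_750_382, 5_750_409)

promoter_len = inclusive_length(*promoter)   # 1270
tss_len = inclusive_length(*tss_window)      # 28

# Check that the TSS window sits inside the promoter region.
tss_inside_promoter = promoter[0] <= tss_window[0] and tss_window[1] <= promoter[1]

print(promoter_len, tss_len, tss_inside_promoter)
```

Under a 0-based, half-open convention (as used by BED files) the same intervals would be one base shorter, so the inclusive reading is the one consistent with the figures above.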

Expression

RNA-Seq analysis has shown ubiquitous expression of C20orf196 across 27 human tissues: adrenal, appendix, bone marrow, brain, colon, duodenum, endometrium, esophagus, fat, gall bladder, heart, kidney, liver, lung, lymph node, ovary, pancreas, placenta, prostate, salivary gland, skin, small intestine, spleen, stomach, testis, thyroid, and urinary bladder.^[5] The highest C20orf196 mRNA levels were found in the lymph node, tonsil, thyroid, adrenal gland, prostate, pharynx, parathyroid, connective tissue, and bone marrow.^[8]

C20orf196 was found to be expressed in soft tissue/muscle tissue tumors, lymphoma tumors, and pancreatic tumors.^[9] C20orf196 representation was biased toward the fetal developmental stage.^[9] EBI expression data showed high expression of C20orf196 in the diencephalon and cerebral cortex in the developing brain.^[9]

Protein

General Features

The most common transcript encodes a protein that is 205 amino acids long with a molecular mass of 23 kDa.^[10] It has a predicted isoelectric point of 4.72.^[11] It is predicted to have a half-life around 30 hours.^[12] C20orf196 contains 19 positive residues (9.3%), 32 negative residues (15.6%), and 46 hydrophobic residues (22.4%).^[13]
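The residue percentages above are simple fractions of the 205-residue protein. The sketch below reproduces them from the quoted counts; the variable names are illustrative, and which amino acids the cited tool counts as positive, negative, or hydrophobic is not stated in the source, so only the totals are used.

```python
PROTEIN_LENGTH = 205  # amino acids, as quoted in the text

# Residue counts as quoted in the text.
residue_counts = {"positive": 19, "negative": 32, "hydrophobic": 46}

# Percentage of the full-length protein, rounded to one decimal place.
percentages = {
    category: round(100 * count / PROTEIN_LENGTH, 1)
    for category, count in residue_counts.items()
}

# Rough charge balance: positive minus negative residue counts.
charge_balance = residue_counts["positive"] - residue_counts["negative"]

print(percentages)
print(charge_balance)
```

The excess of negative over positive residues is qualitatively in line with the acidic predicted isoelectric point, although a count difference is only a crude proxy for net charge.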

Cellular Localization

C20orf196 is predicted to localize in the nucleus.^[7]

Domains

C20orf196 contains one domain, DUF4521, which arose in Amniota.^[5] DUF4521 spans amino acids 3 to 201.^[5] Several regions of this domain are conserved in C20orf196 orthologs found in mammals, amphibians, and fish. Proteins in this family are functionally uncharacterized.

Post-Translational Modifications

Many phosphorylation sites are predicted, targeted by unspecified serine kinases.^[14] C20orf196 is predicted to have one SUMOylation site at amino acid 203 and one N-glycosylation site at amino acid 69.^[15]^[16] It is also predicted to have two ubiquitination sites, at amino acids 84 and 139.^[17]

Secondary Structure

Several modeling programs predict a secondary structure containing alpha-helix, beta-sheet, and coil regions.^[18]^[19] CFSSP predicts that the C20orf196 secondary structure is 57.1% alpha helix, 48.8% beta strand, and 16.6% beta turn.^[20] These percentages sum to more than 100%, so the predicted categories evidently overlap.

Protein Interactions

Several databases citing yeast two-hybrid screens report that C20orf196 interacts with PRMT1, QARS, MAD2L2, and CUL3.^[21]^[22]^[23]^[24] Within the DNA repair network, C20orf196 functionally interacts with REV7 (also known as MAD2L2), SHLD2, and SHLD3 as part of the shieldin complex.^[6]

Homology and Evolution

Orthologs

C20orf196 gene orthologs are found in species including mammals, birds, reptiles, and amphibians.^[6]^[25] C20orf196 has distant orthologs in bony fish and cartilaginous fish.^[6]^[25] There are no invertebrate orthologs.^[6] Orthologs are found in 163 organisms.^[5]

Table of Orthologs for C20orf196
Class	Species	Common Name	Date of Divergence (MYA)	Accession Number	Sequence Identity (%)	Sequence Similarity (%)
Mammalia (Marsupialia)	Sarcophilus harrisii	Tasmanian devil	159	XP_012395605.1	55	68
Mammalia (Marsupialia)	Phascolarctos cinereus	Koala	159	XP_020841153.1	54	67
Aves	Gallus gallus	Red junglefowl	312	XP_015139412.1	33	49
Aves	Aptenodytes forsteri	Emperor penguin	312	XP_009280865.1	35	47
Reptilia	Crocodylus porosus	Saltwater crocodile	312	XP_019404613.1	36	50
Reptilia	Pogona vitticeps	Central bearded dragon	312	XP_020649300.1	30	46
Reptilia	Thamnophis sirtalis	Common garter snake	312	XP_013911941.1	33	51
Amphibia	Nanorana parkeri	High Himalaya frog	352	XP_018422019.1	39	57
Osteichthyes	Monopterus albus	Asian swamp eel	435	XP_020455013.1	46	73
Chondrichthyes	Rhincodon typus	Whale shark	473	XP_020391945.1	30	55

Paralogs

There are no paralogs in humans.^[5]

Rate of evolution

C20orf196 is a fast-evolving protein with a high rate of sequence divergence. It evolves faster than fibrinogen, as seen in the figure to the right.
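One way to get a rough feel for this divergence is to convert the sequence identities in the ortholog table into distances. The sketch below is illustrative only and is not necessarily how the original figure was produced: it assumes the divergence dates in the table are measured from human, and it applies the standard Poisson correction, d = -ln(q), where q is the fraction of identical residues. The fibrinogen comparison cannot be reproduced here because the source gives no fibrinogen data.

```python
import math

# (common name, divergence date in MYA, sequence identity in %),
# copied from the ortholog table.
ORTHOLOGS = [
    ("Tasmanian devil", 159, 55),
    ("Koala", 159, 54),
    ("Red junglefowl", 312, 33),
    ("Emperor penguin", 312, 35),
    ("Saltwater crocodile", 312, 36),
    ("Central bearded dragon", 312, 30),
    ("Common garter snake", 312, 33),
    ("High Himalaya frog", 352, 39),
    ("Asian swamp eel", 435, 46),
    ("Whale shark", 473, 30),
]


def observed_divergence(identity_pct: float) -> float:
    """Uncorrected divergence: percentage of aligned residues that differ."""
    return 100.0 - identity_pct


def poisson_distance(identity_pct: float) -> float:
    """Poisson-corrected distance (substitutions per site), d = -ln(q)."""
    q = identity_pct / 100.0
    if not 0.0 < q <= 1.0:
        raise ValueError("identity must be in the range (0, 100]")
    return -math.log(q)


for name, mya, identity in ORTHOLOGS:
    d = poisson_distance(identity)
    print(f"{name:24s} {mya:4d} MYA  "
          f"{observed_divergence(identity):5.1f}% diverged  d = {d:.2f}")
```

The correction matters most for distant orthologs, where repeated substitutions at the same site make the raw percentage understate the true amount of change.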

Phenotype

Genome-wide association studies have identified SNPs in the C20orf196 gene that are associated with parental longevity, information processing speed, and breast carcinoma occurrence.^[26]

References

[1] GRCh38: Ensembl release 89: ENSG00000171984 – Ensembl, May 2017
[2] GRCm38: Ensembl release 89: ENSMUSG00000044991 – Ensembl, May 2017
[3] "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
[4] "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
[5] "C20orf196 chromosome 20 open reading frame 196 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2018-02-05.
[6] Gupta R, Somyajit K, Narita T, Maskey E, Stanlie A, Kremer M, Typas D, Lammers M, Mailand N, Nussenzweig A, Lukas J, Choudhary C (May 2018). "DNA Repair Network Analysis Reveals Shieldin as a Key Regulator of NHEJ and PARP Inhibitor Sensitivity". Cell. 173 (4): 972–988.e23. doi:10.1016/j.cell.2018.03.050. PMC 8108093. PMID 29656893. S2CID 4886733.
[7] Thierry-Mieg, Danielle; Thierry-Mieg, Jean. "AceView: Gene:C20orf196, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2018-02-05.
[8] Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson Å, Kampf C, Sjöstedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Pontén F (January 2015). "Proteomics. Tissue-based map of the human proteome". Science. 347 (6220): 1260419. doi:10.1126/science.1260419. PMID 25613900. S2CID 802377.
[9] "The European Bioinformatics Institute < EMBL-EBI". 2018.
[10] GeneCards Human Gene Database. "C20orf196 Gene - GeneCards | CT196 Protein | CT196 Antibody". www.genecards.org. Retrieved 2018-02-20.
[11] "Compute pI/Mw". ExPASy. 2018.
[12] Bachmair A, Finley D, Varshavsky A (October 1986). "In vivo half-life of a protein is a function of its amino-terminal residue". Science. 234 (4773): 179–86. Bibcode:1986Sci...234..179B. doi:10.1126/science.3018930. PMID 3018930.
[13] "Statistical Analysis of Protein Sequences". EMBL-EBI. 2018.
[14] Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
[15] Zhao Q, Xie Y, Zheng Y, Jiang S, Liu W, Mu W, Liu Z, Zhao Y, Xue Y, Ren J (July 2014). "GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs". Nucleic Acids Research. 42 (Web Server issue): W325-30. doi:10.1093/nar/gku383. PMC 4086084. PMID 24880689.
[16] Gupta R, Jung E, Brunak S. "Prediction of N-glycosylation sites in human proteins". DTU Bioinformatics. 46: 203–206.
[17] Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY (January 2016). "UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines". BMC Systems Biology. 10 Suppl 1 (1): 6. doi:10.1186/s12918-015-0246-z. PMC 4895383. PMID 26818456.
[18] Zhang Y (January 2008). "I-TASSER server for protein 3D structure prediction". BMC Bioinformatics. 9: 40. doi:10.1186/1471-2105-9-40. PMC 2245901. PMID 18215316.
[19] Raghava, G. P. S. (2000). "APSSP: Advanced Protein Secondary Structure Prediction Server".
[20] T, Ashok Kumar (2013-04-01). "CFSSP: Chou and Fasman Secondary Structure Prediction server". Zenodo. 1 (9): 15–19. doi:10.5281/zenodo.50733.
[21] Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C (January 2015). "STRING v10: protein-protein interaction networks, integrated over the tree of life". Nucleic Acids Research. 43 (Database issue): D447-52. doi:10.1093/nar/gku1003. PMC 4383874. PMID 25352553.
[22] Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, Castagnoli L, Cesareni G (January 2012). "MINT, the molecular interaction database: 2012 update". Nucleic Acids Research. 40 (Database issue): D857-61. doi:10.1093/nar/gkr930. PMC 3244991. PMID 22096227.
[23] Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R (January 2004). "IntAct: an open source molecular interaction database". Nucleic Acids Research. 32 (Database issue): D452-5. doi:10.1093/nar/gkh052. PMC 308786. PMID 14681455.
[24] Calderone A, Castagnoli L, Cesareni G (August 2013). "mentha: a resource for browsing integrated protein-interaction networks". Nature Methods. 10 (8): 690–1. doi:10.1038/nmeth.2561. PMID 23900247. S2CID 9733108.
[25] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/s0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
[26] "GWAS Catalog". 2018.
