The refTSS is an annotated reference dataset of transcription start sites (TSSs) in human and mouse. The dataset is generated by collecting, reprocessing, and assembling various public resources.

For questions and inquiries about the data, please contact reftss-help@riken.jp

[Files]

Under the directory of each organism (human and mouse), we store the following files.

* refTSS_v1.0_{human,mouse}_coordinate.{hg38,mm10}.bed
  Coordinates of the TSSs in the refTSS dataset. This file is in BED9 format. The columns are:
    1. chromosome
    2. start of TSS region
    3. end of TSS region
    4. name (ID) of the TSS (refTSS ID)
    5. cumulative score across all samples
    6. strand of the TSS
    7. start of the representative TSS position
    8. end of the representative TSS position (Note: end is always start+1)
    9. rgb string for color coding (plus or minus strand only)

* refTSS_v1.0_{human,mouse}_annotation.txt
  Genes, transcripts, and proteins associated with each TSS (annotation). The columns are:
    1. refTSS ID
    2. Transcript name (accession number)
    3. Distance between the TSS and the 5'-end of the transcript
    4. Entrez Gene ID
    5. HGNC/MGI ID
    6. UniProt ID
    7. Gene name
    8. Gene symbol
    9. Gene synonyms
    10. Source of the gene annotation

* refTSS_v1.0_{human,mouse}_transcript.txt
  All (candidate) transcripts located around each TSS. The columns are:
    1. refTSS ID
    2. Transcript name (accession number)
    3. Distance between the TSS and the 5'-end of the transcript
    4. The number of transcripts located around the TSS
    5. All accession numbers of the transcripts

* sources/*.bed
  Sources of the refTSS data, which were assembled to construct the representative TSS set. The files are in BED9 format.

[Data sources]

In the current release of refTSS, we assembled the following TSS sets.
* FANTOM5 promoter atlas (CAGE mapping of TSS)
  http://fantom.gsc.riken.jp/5/
* dbTSS (TSS-Seq mapping of TSS)
  http://dbtss.hgc.jp/
* EPD (The Eukaryotic Promoter Database) (manually created promoter database)
  http://epd.vital-it.ch/

Genes, transcripts, and proteins associated with each TSS were assigned using the following databases (as of March 31, 2016): GENCODE (v24 and vM9); Entrez Gene; RefSeq; the HUGO Gene Nomenclature Committee (HGNC) database; the Mouse Genome Database (MGD); the UCSC Genome Browser; and UniProt.

[Future release plans]

We are planning further releases that will include:

* Assembly of more TSS data sources
* Integration of various resources, including TSS activities, transcriptional regulation, and epigenetic information, into refTSS
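
[Example]

The nine-column layout of the coordinate file described under [Files] can be read with a few lines of Python. The following is a minimal sketch, not an official parser. It assumes the file is tab-separated (as is usual for BED), that optional "track", "browser", and "#" lines may be present and can be skipped, and that the score may be parsed as a floating-point number. The names TssRecord and parse_coordinate_bed, and the sample rows, are hypothetical and do not come from the refTSS release.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TssRecord:
    """One row of a refTSS coordinate file (BED9), in column order."""
    chrom: str
    region_start: int
    region_end: int
    tss_id: str
    score: float
    strand: str
    rep_start: int
    rep_end: int
    rgb: str


def parse_coordinate_bed(lines: Iterable[str]) -> Iterator[TssRecord]:
    """Yield one TssRecord per data line.

    Assumption: fields are tab-separated; header-like lines are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, score, strand, rep_start, rep_end, rgb = fields
        record = TssRecord(chrom, int(start), int(end), name, float(score),
                           strand, int(rep_start), int(rep_end), rgb)
        # The README states that the representative end is always start+1.
        if record.rep_end != record.rep_start + 1:
            raise ValueError(f"representative TSS is not 1 bp wide: {line!r}")
        yield record


# Hypothetical rows for illustration only; these are not real refTSS entries.
EXAMPLE_ROWS = [
    "chr1\t1000\t1050\texampleTSS_1\t250\t+\t1020\t1021\t255,0,0",
    "chr2\t5000\t5030\texampleTSS_2\t40\t-\t5010\t5011\t0,0,255",
]

records = list(parse_coordinate_bed(EXAMPLE_ROWS))
for rec in records:
    print(rec.tss_id, rec.chrom, rec.strand, rec.rep_start)
```

To read a real file, pass an open file handle instead of the sample list, for example open("refTSS_v1.0_human_coordinate.hg38.bed") for the human coordinates. Because BED start positions are 0-based and end positions are exclusive, rep_start is the 0-based coordinate of the representative TSS.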