refTSS is an annotated reference dataset for transcription start sites (TSS) in human and mouse. The dataset is generated by collecting, reprocessing and assembling various public resources.

For questions and inquiries about the data, please contact reftss-help@riken.jp

Last update: 2025.4.4
- New contents: refTSS-FANTOM5 expression table in human

[Files]
The directory of each organism (human and mouse) contains the following files. Please note that mm10_lifted was generated from the mm39 file using the liftOver tool.

* refTSS_v4.1_{human,mouse}_coordinate.{genome version}.bed
  Coordinates of the TSSs in the refTSS dataset, in BED6 format. The columns in this file are:
    1. chromosome
    2. start of TSS region
    3. end of TSS region
    4. name (ID) of the TSS (refTSS ID)
    5. score (=1)
    6. strand of the TSS

* refTSS_v4.1_{human,mouse}_{genome version}_id_change_list.txt.gz
  Mapping between TSS IDs in refTSS4.1 and the corresponding IDs in refTSS3.3.

* refTSS_v4.1.{human,mouse}_coordinate.{genome version}.rds
  refTSS coordinates and annotation data in R Data Format (RDS). See details in refTSS_gr_object_Instruction.210616.md

* refTSS_v4.1_{human,mouse}_{genome version}_annotation.txt
  Genes, transcripts and proteins associated with each TSS (annotation). The columns are organized as follows.
    1. refTSS ID
    2. Transcript name (accession number)
    3. Distance between the TSS and the 5'-end of the transcript
    4. Entrez Gene ID
    5. HGNC/MGI ID
    6. UniProt ID
    7. Gene name
    8. Gene symbol
    9. Gene synonyms
    10. Source of the gene annotation

* refTSS_v4.1_{human,mouse}_{genome version}_transcript.txt
  All (candidate) transcripts located around each TSS. The columns are as follows.
    1. refTSS ID
    2. Transcript name(s) (accession number(s)) whose 5'-ends are located nearest to the TSS
    3. Distance between the TSS and the 5'-ends of the above transcripts
    4. The number of transcripts located around the TSS (<=500bp)
    5. All accession numbers of the transcripts (<=500bp) with the distance between their 5'-end and the TSS (numbers after ':')

* refTSS_v4.1_{human,mouse}_{genome version}_tata_annotation.txt
  TATA-box annotation of refTSS, generated with Homer, a software package for motif discovery and next-gen sequencing analysis (http://homer.ucsd.edu/homer/ngs/annotation.html). The resulting text file contains the following columns:
    1: refTSS_ID
    2: Detailed Annotation
    3: Distance to annotation
    4: CpG%
    5: GC%
    6: TATA-Box(TBP)/Promoter/Homer Distance From Peak(sequence,strand,conservation)

* refTSS4.1.F5_phase1and2_tpm.osc.hg38.txt.gz
  Human refTSS4.1 linked to the FANTOM5 expression table by overlapping genomic coordinates. The FANTOM5 expression table was downloaded from:
  https://fantom.gsc.riken.jp/5/datafiles/reprocessed/hg38_latest/extra/CAGE_peaks_expression/
  Note that, to keep the header consistent, we added the extra words "refTSS4ID" and "FANTOMID" to the MAPPED and NORM_FACTOR columns. For details of the samples and headers, please refer to the FANTOM5 website and SSTAR (https://fantom.gsc.riken.jp/5/sstar/Main_Page). The resulting tab-delimited text file contains the following columns:
    1: refTSS4ID
    2: FANTOMID
    3: FANTOM5 expression table header
    4-1832: TSS expressions (TPM values)

* refTSS4.1.F5_phase1and2.nonoverlap.txt.gz
  Human refTSS4.1 IDs that did not overlap any FANTOM5 coordinates when refTSS4.1.F5_phase1and2_tpm.osc.hg38.txt.gz was generated.

* refTSS_v4.1_{human,mouse}_{genome version}.RegBuild.gz
  Regulatory annotations from the Ensembl Regulatory Build (ERB) in which each TSS peak is located.

* liftovered_mm39_to_hg38_peaks_overlapped_reftss_hg38_500bp.bed
  A list of mouse refTSS peaks that, after liftOver to hg38, fall within 500bp of any human refTSS peak. The file shows the locations of the lifted-over mouse refTSS peaks in hg38 genomic coordinates.

* reftss_hg38_peaks_overlapped_liftovered_mm39_to_hg38_500bp.bed
  A list of human refTSS peaks that are located within 500bp of any lifted-over mouse refTSS peak.

* liftovered_hg38_to_mm39_peaks_overlapped_reftss_mm39_500bp.bed
  A list of human refTSS peaks that, after liftOver to mm39, fall within 500bp of any mouse refTSS peak. The file shows the locations of the lifted-over human refTSS peaks in mm39 genomic coordinates.

* reftss_mm39_peaks_overlapped_liftovered_hg38_to_mm39_500bp.bed
  A list of mouse refTSS peaks that are located within 500bp of any lifted-over human refTSS peak.

* refTSS_v4.1_{human,mouse}_coordinate.ann.{genome version}.rds
  The refTSS GRanges object for loading refTSS annotations in the R environment. More details are given in refTSS_gr_object_Instruction.md.

* refTSS.{genome version}.4.1_cCREs_annotation.txt.gz
  The list of cCREs overlapping with refTSS.

* refTSS_v4.1_{human,mouse}_{genome version}_studies.count.gz
  The list of source datasets and their protocols.
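
[Example]
The six BED6 columns listed for the coordinate file can be read with a few lines of Python. The following is only a minimal sketch, not an official refTSS tool: the names TssRegion and read_bed6 are hypothetical, and the two records are made-up placeholders rather than real refTSS entries. It assumes the file is tab-separated and follows the usual BED convention of a 0-based start and an exclusive end, so the region width is end - start.

```python
import csv
import io
from typing import NamedTuple


class TssRegion(NamedTuple):
    """One BED6 record, in the column order described for the coordinate file."""
    chrom: str
    start: int   # 0-based start (standard BED convention, assumed here)
    end: int     # exclusive end (standard BED convention, assumed here)
    name: str    # refTSS ID
    score: int
    strand: str


def read_bed6(handle):
    """Yield a TssRegion per data line, skipping blank, comment, track and browser lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = row[:6]
        yield TssRegion(chrom, int(start), int(end), name, int(score), strand)


# Hypothetical records for illustration only; they are not taken from refTSS.
example = (
    "chr1\t100\t120\texample_tss_1\t1\t+\n"
    "chr2\t500\t510\texample_tss_2\t1\t-\n"
)
regions = list(read_bed6(io.StringIO(example)))

# Width of each TSS region in base pairs.
widths = {r.name: r.end - r.start for r in regions}
print(widths)
```

To read an actual coordinate file, pass an opened file object (for example, open(path, newline="")) to read_bed6 in place of the in-memory example.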