refTSS is an annotated reference dataset for transcriptional start sites (TSS) in human and mouse.
The dataset is generated by collecting, reprocessing and assembling various public resources.

For questions and inquiries about the data, please contact reftss-help@riken.jp

Last update: 2024.1.31

[Files]
Under the directory of each organism (human and mouse), the following files are stored.

* refTSS_v4.1_{human,mouse}_coordinate.{hg38,mm39}.bed
    Coordinates of the TSSs in the refTSS dataset, in BED6 format.
    The columns in this file are:
    1. chromosome
    2. start of TSS region
    3. end of TSS region
    4. name (ID) of the TSS (refTSS ID)
    5. score (=1)
    6. strand of the TSS

* refTSS_v4.1_{human,mouse}_{genome version}_id_change_list.txt.gz
    Relationship between TSS IDs in refTSS v4.1 and the corresponding IDs in refTSS v3.3.

* refTSS_v4.1.{human,mouse}_coordinate.{genome version}.rds
    refTSS coordinate and annotation data in R Data Format (RDS).
    See details in refTSS_gr_object_Instruction.210616.md

* refTSS_v4.1_{human,mouse}_{genome version}_annotation.txt
    Genes, transcripts and proteins associated with each TSS (annotation).
    The columns are organized as follows:
    1. refTSS ID
    2. Transcript name (accession number)
    3. Distance between the TSS and the 5'-end of the transcript
    4. Entrez Gene ID
    5. HGNC/MGI ID
    6. UniProt ID
    7. Gene name
    8. Gene symbol
    9. Gene synonyms
    10. Source of the gene annotation

* refTSS_v4.1_{human,mouse}_{genome version}_transcript.txt
    All (candidate) transcripts located around each TSS.
    The columns are as follows:
    1. refTSS ID
    2. Transcript name(s) (accession number(s)) whose 5'-ends are located nearest to the TSS
    3. Distance between the TSS and the 5'-ends of the above transcripts
    4. Number of transcripts located around the TSS (<=500bp)
    5. Accession numbers of all transcripts around the TSS (<=500bp), each with the distance
       between its 5'-end and the TSS (the number after ':')

* refTSS_v4.1_{human,mouse}_{genome version}_tata_annotation.txt
    TATA-box annotation of refTSS, generated with the Homer software for motif discovery
    and next-gen sequencing analysis (http://homer.ucsd.edu/homer/ngs/annotation.html).
    The resulting text file contains the following attributes:
    1. refTSS_ID
    2. Detailed Annotation
    3. Distance to annotation
    4. CpG%
    5. GC%
    6. TATA-Box(TBP)/Promoter/Homer Distance From Peak(sequence,strand,conservation)

* refTSS_v4.1_{human,mouse}_{genome version}.RegBuild.gz
    Regulatory annotations from the Ensembl Regulatory Build (ERB) for the regions in which
    each TSS peak is located.

* liftovered_mm39_to_hg38_peaks_overlapped_reftss_hg38_500bp.bed
    A list of mouse refTSS peaks that, after liftOver, fall within 500bp of any human refTSS peak.
    The file shows the locations of the lifted-over mouse refTSS peaks in hg38 genomic coordinates.

* reftss_hg38_peaks_overlapped_liftovered_mm39_to_hg38_500bp.bed
    A list of human refTSS peaks that are located within 500bp of any lifted-over mouse refTSS peak.

* liftovered_hg38_to_mm39_peaks_overlapped_reftss_mm39_500bp.bed
    A list of human refTSS peaks that, after liftOver, fall within 500bp of any mouse refTSS peak.
    The file shows the locations of the lifted-over human refTSS peaks in mm39 genomic coordinates.

* reftss_mm39_peaks_overlapped_liftovered_hg38_to_mm39_500bp.bed
    A list of mouse refTSS peaks that are located within 500bp of any lifted-over human refTSS peak.

* refTSS_v4.1_{human,mouse}_coordinate.ann.{genome version}.rds
    The refTSS GRanges object for loading refTSS annotations in the R environment.
    More details are given in refTSS_gr_object_Instruction.md.

* refTSS.{genome version}.4.1_cCREs_annotation.txt.gz
    The list of cCREs overlapping with refTSS.

* refTSS_v4.1_{human,mouse}_{genome version}_studies.count.gz
    The list of source datasets and their protocols.
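
[Example]
The column layouts described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the refTSS distribution. It assumes that the BED file is tab-separated and that the entries in column 5 of the transcript file are separated by commas; the separator is not stated in this README, so check the actual file and adjust `sep` if needed. The function names and the sample values are hypothetical and made up for illustration.

```python
from typing import NamedTuple


class TssRecord(NamedTuple):
    """One row of the BED6 coordinate file (columns 1 to 6)."""
    chrom: str
    start: int
    end: int
    reftss_id: str
    score: int
    strand: str


def parse_bed6_line(line: str) -> TssRecord:
    """Parse one BED6 line, assuming tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 BED columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields
    return TssRecord(chrom, int(start), int(end), name, int(score), strand)


def parse_nearby_transcripts(field: str, sep: str = ",") -> dict[str, int]:
    """Split 'ACCESSION:DISTANCE' entries into {accession: distance}.

    The separator between entries is an assumption; the README only
    states that the distance follows ':'.
    """
    result = {}
    for entry in field.split(sep):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ':' so accessions containing ':' stay intact.
        accession, distance = entry.rsplit(":", 1)
        result[accession] = int(distance)
    return result


# Made-up sample values, not taken from the actual dataset.
example_bed_line = "chr1\t11868\t11874\texample_tss_1\t1\t+\n"
example_transcripts = "TX_A.1:0,TX_B.2:-35"

record = parse_bed6_line(example_bed_line)
nearby = parse_nearby_transcripts(example_transcripts)
print(record)
print(nearby)
```

For whole files, the same functions can be applied line by line; gzip-compressed files (.gz) would first need to be opened with Python's gzip module.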