Frequently asked questions
Frequently asked questions and their answers about the refTSS resource
- A1. Currently, there is no dataset that contains only the transcription start sites of non-coding RNAs. It is possible to extract non-coding RNAs by combining the GENCODE annotation file with the following files:
https://reftss.riken.jp/datafiles/3.0/human/refTSS_v3.0_human_annotation.txt.gz
https://reftss.riken.jp/datafiles/3.0/mouse/refTSS_v3.0_mouse_annotation.txt.gz
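As a rough illustration of the approach in A1, the sketch below reads gene types from GENCODE GTF lines and keeps the refTSS annotation rows whose gene symbol maps to a non-coding gene type. The column name `Gene_symbol`, the function names, and the rule "everything that is not `protein_coding` counts as non-coding" are assumptions made for illustration, not part of refTSS; check the header of the annotation file and adjust accordingly.

```python
import csv
import re

# Assumption: gene types treated as coding; every other type counts as non-coding.
CODING_TYPES = {"protein_coding"}

def gencode_gene_types(gtf_lines):
    """Map gene_name -> gene_type using the 'gene' records of a GENCODE GTF."""
    types = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        name = re.search(r'gene_name "([^"]+)"', fields[8])
        gtype = re.search(r'gene_type "([^"]+)"', fields[8])
        if name and gtype:
            types[name.group(1)] = gtype.group(1)
    return types

def noncoding_tss_rows(annotation_lines, gene_types, symbol_column="Gene_symbol"):
    """Yield annotation rows (as dicts) whose gene symbol has a non-coding type.

    symbol_column is a hypothetical column name; replace it with the header
    actually used in the refTSS annotation file.
    """
    reader = csv.DictReader(annotation_lines, delimiter="\t")
    for row in reader:
        gtype = gene_types.get(row.get(symbol_column, ""))
        if gtype is not None and gtype not in CODING_TYPES:
            yield row
```

Both functions accept any iterable of text lines, so the gzipped downloads can be passed in directly via `gzip.open(path, "rt")`.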
- A2. For some TSSs the region spans about 8-24 bp, while for others it can exceed 100 bp. We identify TSS regions computationally, based on the mapping results of 5'-end sequences.
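To check the width of individual TSS regions in a given release, the width can be computed directly from the BED coordinates. This is a minimal sketch that assumes standard tab-separated BED columns (0-based start, exclusive end, name in column 4); the function name is illustrative.

```python
def tss_widths(bed_lines):
    """Return {name: width_in_bp} for BED lines (0-based start, exclusive end)."""
    widths = {}
    for line in bed_lines:
        # Skip blank lines and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        # Fall back to a coordinate-based key when the name column is missing.
        name = fields[3] if len(fields) > 3 else f"{fields[0]}:{start}-{end}"
        widths[name] = end - start
    return widths
```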
- A3. We search for potentially associated genes using our annotation pipeline. In this pipeline, we search for annotated TSSs within 500 bp and choose the nearest gene as the associated one. In this case, we consider the TSS to be an unannotated (undetected) first exon. Since this is a prediction, it can be incorrect. We have no clear guidelines for confidence, but the distance can serve as a rough measure of the strength of the prediction.
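The nearest-gene rule described in A3 can be sketched as follows. This is not the refTSS annotation pipeline itself, only an illustration of the idea; it assumes each TSS is represented by a single position and that the candidate annotated TSSs are already restricted to the same chromosome and strand. The function name and the `max_dist` parameter are hypothetical.

```python
def nearest_gene(tss_pos, annotated, max_dist=500):
    """Return (gene, distance) for the closest annotated TSS within max_dist bp.

    annotated: iterable of (gene, position) pairs on the same chromosome and
    strand as tss_pos. Returns None when no annotated TSS is close enough.
    """
    best = None
    for gene, pos in annotated:
        dist = abs(pos - tss_pos)
        if dist <= max_dist and (best is None or dist < best[1]):
            best = (gene, dist)
    return best
```

The returned distance can be kept alongside the gene name, since a smaller distance suggests a stronger association.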
- A4. Usually, the coordinates do not change even if the patch number is different. You can use the current TSS coordinates provided in refTSS regardless of the patch number.
- A5. There are many ways to convert a BED file to GTF. For example, the following command uses the UCSC tools bedToGenePred and genePredToGtf:
> bedToGenePred in.bed /dev/stdout | genePredToGtf file /dev/stdin out.gtf
If you are using Galaxy, it also provides a tool for converting BED to GTF.
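If neither the UCSC tools nor Galaxy is available, the coordinate conversion can also be done by hand. The sketch below turns one BED6 line into one GTF line; it is a simplified illustration and does not reproduce the output of genePredToGtf. The `source` and `feature` values and the reuse of the BED name as both `gene_id` and `transcript_id` are assumptions.

```python
def bed6_to_gtf_line(bed_line, source="refTSS", feature="exon"):
    """Convert one tab-separated BED6 line into a single GTF line."""
    chrom, start, end, name, score, strand = bed_line.rstrip("\n").split("\t")[:6]
    # BED starts are 0-based and ends are exclusive; GTF is 1-based and inclusive,
    # so only the start coordinate shifts by one.
    gtf_start = int(start) + 1
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join(
        [chrom, source, feature, str(gtf_start), end, score, strand, ".", attrs]
    )
```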
- A6. In the refTSS UCSC configuration of the bigBed 6 tracks, we set the option:
colorByStrand 255,0,0 0,0,255
UCSC applies the first color to the forward strand and the second to the reverse strand, so with this setting TSSs on the forward strand are shown in red and TSSs on the reverse strand in blue. In the same configuration, we also set the options:
exonArrows on
exonArrowsDense on
These options display the exon arrows as white arrowheads within the TSS.
- A7. The current release of refTSS provides only BED and text formats, not bigWig files. We plan to offer bigWig in a future release. In the meantime, the BED files from refTSS can be converted to bigWig format using the UCSC tools (http://hgdownload.soe.ucsc.edu/admin/exe/) that support this kind of conversion.
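One possible route is to write a bedGraph file first and then pass it to the UCSC tool bedGraphToBigWig together with a chromosome sizes file. The sketch below prepares the bedGraph lines: it merges overlapping regions (bedGraph intervals must not overlap) and sorts them by chromosome and start. Because the refTSS BED files carry no tag counts, every region gets the same placeholder value; the value of 1 and the function name are assumptions for illustration.

```python
from collections import defaultdict

def bed_to_bedgraph(bed_lines, value=1):
    """Merge BED regions per chromosome and return sorted bedGraph lines."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    out = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        out.extend(f"{chrom}\t{s}\t{e}\t{value}" for s, e in merged)
    return out
```

The resulting track marks only where TSS regions are located; it does not represent expression levels.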
- A8. refTSS integrates only TSS regions, not the TSS tag counts (scores), because different data sources use different sequence specifications. Instead, we recommend taking the expression tables from the FANTOM5 web site (https://fantom.gsc.riken.jp/5/datafiles/reprocessed/mm10_latest/extra/CAGE_peaks_expression/) and retrieving the count data from them.
- A9. The current version of refTSS does not give the locations of the start and stop codons, but you can predict them using the transcript and protein annotation information provided by refTSS at https://reftss.riken.jp/datafiles/current/mouse/gene_annotation/
- A10. The current release of refTSS does not provide sample information (e.g. cell or tissue). If you need this information, consult the sources of the TSSs at https://reftss.riken.jp/datafiles/current/human/sources/