Data formats allowed by psRNATarget:

Multi-FASTA format (for small RNAs and target transcripts):
>AT1G27360.1
AAGGTATCTATTTGCCTAGCCAGAGTTATATATAGGATTGATTGTCTAGTCTTTTCTTAT
ATGATTTTTGTTCTCATTTACTAATCAAAGTTCTGCAAACTTGTAGTTGTTGTAGGATTT
GTTGCTCTGGCTCTGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAAC
ATGGTATCTTCGTCCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACT
GAAGATGACAGCAAACAG
>AT1G27360.4 | Symbols:  | squamosa promoter-binding protein
CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA
Short Tags: for small RNA sequences, one sequence per line
UGACAGAAGAGAGUGAGCAC
UUGACAGAAGAUAGAGAGCAC
UCCCAAAUGUAGACAAAGCA
UGUGUUCUCAGGUCACCCCUU
UGUGUUCUCAGGUCACCCCUG
UGUGUUCUCAGGUCACCCCUG
UGGUAGCAGUAGCGGUGGUAA
AAGCUCAGGAGGGAUAGCGCC
AAGCUCAGGAGGGAUAGCGCC
Pure Sequence: a single target transcript sequence without a FASTA header (may span multiple lines)
CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA
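
As a rough illustration of how the three formats differ, the following Python sketch guesses the format of a submission. The function name guess_format and the 30-character threshold for short tags are assumptions made for this example; they are not the server's actual detection logic.

```python
def guess_format(text: str) -> str:
    """Guess which of the three input formats a submission uses.

    Hypothetical helper for illustration only.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    # FASTA records start with a '>' header line.
    if lines[0].startswith(">"):
        return "fasta"
    # Short tags: one small RNA per line. Assumption: no tag exceeds 30 nt.
    if all(len(ln) <= 30 for ln in lines):
        return "short_tags"
    # Otherwise treat the text as one transcript wrapped over several lines.
    return "pure_sequence"
```

A pure sequence is recognized here only because at least one of its lines is longer than a small RNA, so a very short single-line transcript would be misread as a short tag.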

User-submitted small RNA sequence(s):

Prior to analysis, the back-end pipeline checks the submitted small RNA (sRNA) sequences, mainly miRNA and phasiRNA sequences, against the following standards:

  • A valid submission is in either FASTA or short tag format (see the examples above);
  • At most 50M sRNA sequences can be analyzed in one run, and the maximum submission size is 200 MiB;
  • The minimum sequence length equals the HSP value, which is 19 for scoring schema V2, 20 for scoring schema V1, or your own choice in a customized schema;
  • The maximum sequence length is 25 or HSP+5, whichever is greater, so it is 25 for both predefined scoring schemas (V1: 20+5 = 25; V2: 19+5 = 24, which is less than 25);
  • Unqualified small RNAs are ignored by the pipeline;
  • Only 'ATCGUN' are valid sequence letters; a sequence containing any other letter is ignored.
For FASTA format, we suggest using only letters, digits, and minus/underscore characters in sequence IDs. In addition, please avoid long sequence IDs (for example, an ID longer than 50 characters), because long IDs may disrupt the web display.
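
The length and letter rules above can be sketched in Python as follows. The function names are hypothetical, and treating lowercase input as valid is an assumption of this sketch rather than documented server behavior.

```python
VALID_LETTERS = set("ATCGUN")


def length_limits(hsp):
    """Return (min_len, max_len) for a small RNA given the HSP size."""
    return hsp, max(25, hsp + 5)


def is_valid_srna(seq, hsp=19):
    """Check one small RNA against the letter and length rules listed above.

    Assumption: input is upper-cased before checking.
    """
    seq = seq.strip().upper()
    min_len, max_len = length_limits(hsp)
    return min_len <= len(seq) <= max_len and set(seq) <= VALID_LETTERS
```

For example, length_limits(19) and length_limits(20) both give an upper limit of 25, while a customized HSP of 22 would raise it to 27.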

User-submitted target candidate sequence(s):

Users can submit target candidate sequences of interest in this section. A typical target transcript sequence can be a cDNA, EST, Unigene, mRNA, or genomic segment, etc. The server searches these submitted target candidates for possible target sites of (submitted or preloaded) small RNA sequences (mainly miRNA and ta-siRNA, here and throughout). Prior to analysis, the back-end pipeline checks the submitted sequences against the following standards:

  • A valid submission is either in FASTA format or a single sequence without a FASTA header (Pure Sequence; see the examples above);
  • At most 5M target candidate sequences can be analyzed in one run, and the maximum submission size is 1000 MiB;
  • A target candidate sequence should be between 50 and 5,000,000 nucleotides in length; the pipeline ignores sequences outside this range;
  • Only 'ATCGUN' are valid sequence letters; other characters are deleted or changed to 'N'.
For FASTA format, we suggest using only letters, digits, and minus/underscore characters in sequence IDs. In addition, please avoid long sequence IDs (for example, an ID longer than 50 characters), because long IDs may disrupt the web display.
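
A minimal Python sketch of this clean-up is shown below. The text above does not say which characters are deleted and which become 'N', so the split used here (whitespace and digits deleted, everything else replaced by 'N') is purely an assumption, as is the function name.

```python
import re

MIN_TARGET_LEN = 50
MAX_TARGET_LEN = 5_000_000


def clean_target(seq):
    """Return the cleaned sequence, or None if its length is out of range.

    Assumption for this sketch: whitespace and digits are deleted, and any
    other character outside ATCGUN becomes 'N'.
    """
    seq = re.sub(r"[\s\d]", "", seq.upper())
    seq = re.sub(r"[^ATCGUN]", "N", seq)
    if MIN_TARGET_LEN <= len(seq) <= MAX_TARGET_LEN:
        return seq
    return None
```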

Preprocess of Next-Generation-Sequencing (NGS) Data:

Raw NGS data need to be preprocessed prior to submission. For miRNAs sequenced by NGS, users should first convert them into either FASTA format or short tags (see the examples above). To reduce data size, users need to filter sequences by length and keep only those of 19-25 nt. Redundant sequences can be removed to further reduce data size. For mRNA transcripts (target candidates) sequenced by NGS, we recommend de novo transcriptome assembly, which generates longer contigs and improves prediction quality. It also reduces the workload of the analysis server.
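
The small RNA part of this preprocessing (length filtering and removal of redundant reads) could look like the Python sketch below. It assumes adapter trimming has already been done; the function names and the generated "sRNA_1"-style IDs are arbitrary choices for this example.

```python
def preprocess_reads(reads, min_len=19, max_len=25):
    """Keep reads of min_len..max_len nt and drop duplicates, preserving order."""
    seen = set()
    kept = []
    for read in reads:
        read = read.strip().upper()
        if min_len <= len(read) <= max_len and read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def to_fasta(reads, prefix="sRNA"):
    """Format reads as FASTA text with short generated IDs."""
    return "".join(
        f">{prefix}_{i}\n{read}\n" for i, read in enumerate(reads, start=1)
    )
```

The output of preprocess_reads can be submitted directly as short tags (one per line), or passed through to_fasta when sequence IDs are needed.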

Scoring schema V1 (2011):

The V1 scoring schema [PMID:21622958] was developed at an early stage with reference to the animal model described in a series of research papers. One of its major features is that the seed region covers only positions 2-8, and there is no limit on the number of mismatches occurring in the seed region. In our early study, the V1 schema identified all validated miRNA-target pairs (usually validated by 5'-RACE) in our curated dataset when the maximum expectation was set to 5.0. In psRNATarget, the default maximum expectation for V1 is set to 3.0 for compatibility reasons.

Scoring schema V2 (2017):

After the V1 schema was published, we improved the default scoring schema using a curated dataset that includes newly validated miRNA-target pairs. The improved schema (V2, 2017 release) finds more curated miRNA-target pairs from the updated dataset without a significant increase in total output. In the V2 schema, the seed region has been extended to positions 2-13, and the maximum number of mismatches (excluding G-U pairs) allowed in the seed region is restricted to two. In addition, the analysis of target accessibility has been disabled, since its value did not change the final output. The default maximum expectation is set to 5.0, which recalls 93% of validated miRNA-target pairs, compared with the 86% recall rate reached by the V1 schema at the same cutoff.
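
To make the seed-region rule concrete, the Python sketch below counts mismatches in the seed while ignoring G-U pairs. It is not the server's scoring algorithm: it assumes an ungapped alignment, positions numbered from the 5' end of the small RNA, and a target site written 3'->5' so that equal indices pair up. The function name is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_mismatches(srna, target_3to5, seed_start=2, seed_end=13):
    """Count seed-region mismatches, not counting G-U wobble pairs.

    srna is written 5'->3'; target_3to5 is the target site written 3'->5',
    so index i of one string pairs with index i of the other.
    Gaps are not handled in this sketch.
    """
    srna = srna.upper().replace("T", "U")
    target = target_3to5.upper().replace("T", "U")
    count = 0
    for pos in range(seed_start, seed_end + 1):  # 1-based positions
        pair = (srna[pos - 1], target[pos - 1])
        if pair not in WATSON_CRICK and pair not in WOBBLE:
            count += 1
    return count
```

Under these assumptions, a site would satisfy the V2 seed rule when seed_mismatches(srna, site) <= 2, and the V1 seed region corresponds to seed_end=8.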

User-customized Schema:

Users may change the settings to handle special cases of target recognition. For example, some miRNA-target interactions may accommodate long indels, so "Penalty for opening gap" can be reduced to display more interactions of this kind. "Extra weight in seed region" can be increased to give more weight to seed region recognition. "Calculate target accessibility" can be enabled to consider the effect of mRNA secondary structure on target recognition. Please refer to the help information below when adjusting the schema.

Maximum expectation:

Expectation value is the penalty for the mismatches between the mature small RNA and the target sequence. A higher value indicates less similarity (and a lower likelihood of interaction) between the small RNA and the target candidate. The default penalty rules are defined by the scoring schema. Maximum expectation is the cutoff: any small RNA-target pair with an expectation greater than the cutoff is discarded from the final result. The recommended values are 3.0-5.0, depending on the scoring schema.
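
Because a higher expectation means less similarity, the cutoff acts as an upper bound. The Python sketch below applies it to a list of scored pairs; the tuple layout, the function name, and the choice to keep pairs exactly at the cutoff are assumptions of this example.

```python
def filter_by_expectation(pairs, max_expectation=5.0):
    """Keep (srna_id, target_id, expectation) tuples at or below the cutoff."""
    return [pair for pair in pairs if pair[2] <= max_expectation]
```

With a cutoff of 3.0, a pair scored 2.5 is kept and a pair scored 4.0 is dropped; raising the cutoff to 5.0 keeps both.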

Length for complementarity scoring (hspsize):

The length of the region in which the server scores complementarity between the small RNA and the target transcript. The recommended range for hspsize is 19-20. Be aware that the scoring algorithm only penalizes mismatches within this region (from nt 1 to nt hspsize); subsequent mismatches are ignored. In addition, submitted small RNAs shorter than the HSP value are removed.

Number of top target genes for each small RNA (top):

The number of top (the best) target gene candidates that will be listed for each submitted small RNA.
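
One way to picture this setting is the Python sketch below, which groups scored pairs by small RNA and keeps the best ones. Ranking "best" by lowest expectation is an assumption of this sketch, as are the function name and the tuple layout.

```python
from collections import defaultdict


def top_targets(pairs, top):
    """Return, for each small RNA, its `top` targets with the lowest expectation.

    pairs is an iterable of (srna_id, target_id, expectation) tuples.
    """
    by_srna = defaultdict(list)
    for srna_id, target_id, expectation in pairs:
        by_srna[srna_id].append((expectation, target_id))
    return {
        srna_id: [target for _, target in sorted(hits)[:top]]
        for srna_id, hits in by_srna.items()
    }
```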

Target accessibility - maximum energy to unpair the target site (UPE):

The accessibility of the mRNA target site to the small RNA has been identified as one of the important factors in target recognition, because secondary structure (stems, etc.) around the target site prevents the small RNA (including miRNA and ta-siRNA) and the mRNA target from making contact. The psRNATarget server employs RNAup to calculate target accessibility, which is represented by the energy required to open (unpair) the secondary structure around the target site (usually the region complementary to the small RNA plus its upstream and downstream flanks) on the target mRNA (see figure below). The lower the energy, the more likely it is that the small RNA can contact (and cleave) the target mRNA.

The figure above illustrates the energy required to open the secondary structure around the target site. This value, denoted UPE, is calculated with the software RNAup, described by Muckstein et al. (2005, PMID: 16446276).

Flanking length around target site for target accessibility analysis:

Besides the target site itself (the region complementary to the small RNA), its two flanks on the mRNA must also be free of secondary structure for the small RNA (including miRNA and ta-siRNA) to bind and cleave (see the two red up-arrows in the following figure). The reason is that the small RNA binds the target mRNA in the groove of the RISC complex, which needs extra space on both sides of the target site. Kertesz et al. (2007; PMID:17893677) suggested that 17 upstream and 13 downstream nucleotides of the target site should be considered in target accessibility analysis.
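
The resulting window can be computed as in the Python sketch below. It assumes 1-based inclusive coordinates on the mRNA (5'->3') and clips the window at the transcript ends; the function name and the clipping behavior are illustrative choices, not documented server behavior.

```python
def accessibility_window(site_start, site_end, target_len, upstream=17, downstream=13):
    """Return 1-based inclusive (start, end) of the region to unpair.

    site_start and site_end are the 1-based inclusive target-site
    coordinates on the mRNA; the window is clipped to the transcript ends.
    """
    start = max(1, site_start - upstream)
    end = min(target_len, site_end + downstream)
    return start, end
```

For a target site at positions 100-120 of a 1000-nt transcript, this gives the window 83-133.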

Translation inhibition range:

In addition to cleaving mRNA, plant miRNAs reportedly also inhibit the translation of target genes. This often happens when a mismatch occurs around the center of the complementary region, because the central region is essential for cleavage (Brodersen et al. 2008, PMID: 18483398). This mechanism differs from translational inhibition by animal miRNAs, although the latter also inhibit gene expression at the translational level.

Users can set the coordinates of the central region; any mismatch within this region is reported as a trigger of translational inhibition.
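
The rule can be sketched in Python as follows. The central-region coordinates are passed in explicitly because no default is stated here; the function name and the "Translation"/"Cleavage" labels are assumptions of this example.

```python
def predicted_inhibition(mismatch_positions, central_start, central_end):
    """Label a pair by whether any mismatch falls in the central region.

    Positions are 1-based on the small RNA. Returns "Translation" if a
    mismatch lies within central_start..central_end, otherwise "Cleavage".
    """
    for pos in mismatch_positions:
        if central_start <= pos <= central_end:
            return "Translation"
    return "Cleavage"
```

For instance, with a hypothetical central region of positions 10-11, mismatches at positions 3 and 10 would be labeled "Translation", while mismatches at positions 3 and 15 would be labeled "Cleavage".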

Multiplicity of target site:

The two-hit model (Axtell et al., 2005; PMID:17081978) suggests that a miRNA or ta-siRNA may have multiple target sites (i.e. complementary regions) on a specific target transcript, which increases the recognition activity of the miRNA/ta-siRNA toward the mRNA target. The server reports the number of target sites for each small RNA/target pair. Users are advised to give preference to sRNA/target pairs with more target sites.
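
Counting target sites per pair is straightforward; a minimal Python sketch is given below. The tuple layout (small RNA ID, target ID, site start) and the function name are assumptions made for illustration.

```python
from collections import Counter


def site_multiplicity(sites):
    """Count target sites per (srna_id, target_id) pair.

    sites is an iterable of (srna_id, target_id, site_start) tuples,
    one tuple per predicted target site.
    """
    return Counter((srna_id, target_id) for srna_id, target_id, _ in sites)
```

Calling most_common() on the returned Counter lists the pairs with the most target sites first.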