-
Notifications
You must be signed in to change notification settings - Fork 30
Description
Hi rdp staff,
I am trying to retrain RDP classifier using NCBI 16s database, however, when I looked into the example taxonomy file and the fasta file, I am a bit confused how should I even generate that file.
0*Root*-1*0*rootrank
1*Bacteria*0*1*domain
2*"Actinobacteria"*1*2*phylum
3*Actinobacteria*2*3*class
4*Acidimicrobidae*3*4*subclass
5*Acidimicrobiales*4*5*order
6*"Acidimicrobineae"*5*6*suborder
7*Acidimicrobiaceae*6*7*family
8*Acidimicrobium*7*8*genus
9*Ferrimicrobium*7*8*genus
10*Ferrithrix*7*8*genus
11*Ilumatobacter*7*8*genus
12*Iamiaceae*6*7*family
3102*Aquihabitans*12*8*genus
13*Iamia*12*8*genus
Could you please explain how each line is constructed? Allow me to take a line as an example,
6*"Acidimicrobineae"*5*6*suborder
I could guess that the first number is the taxonomy id for Acidimicrobineae, which is 6, and its parent taxonomy is 5, Acidimicrobiales. I assume that the suborder at the end of the line indicates that Acidimicrobineae is at the taxonomy rank of suborder, right? Then what is the 6 before suborder mean? when I look at 12*Iamiaceae*6*7*family, I can say Iamiaceae is a family level taxonomy, which has the parent of 6 (Acidimicrobineae) and 7 (Acidimicrobiaceae)? I am not sure I am getting what's the rule of constructing the taxonomy file here. Could you please explain how this is done?
Thanks in advance,
Eddi