add: gotree reformat #7052

Open: wants to merge 9 commits into main

13 changes: 13 additions & 0 deletions tools/gotree/.shed.yml
@@ -0,0 +1,13 @@
---
description: gotree toolkit
categories:
Comment on lines +1 to +3
Contributor comment:

Suggested change
- ---
- description: gotree toolkit
- categories:
+ description: gotree toolkit
+ categories:

  - Phylogenetics
owner: iuc
homepage_url: https://github.com/evolbioinfo/gotree
long_description: |
  Gotree is a set of command line tools to manipulate phylogenetic trees
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/gotree
type: unrestricted
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for gotree {{ tool_name }}."
95 changes: 95 additions & 0 deletions tools/gotree/gotree-reformat.xml
@@ -0,0 +1,95 @@
<tool id="gotree_reformat" name="gotree reformat" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0">
<description>Convert phylogenetic trees between Nexus/NHX, Newick and phyloxml formats.</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="xrefs"/>
<expand macro="requirements"/>
<expand macro="version_command"/>

<command detect_errors="exit_code"><![CDATA[
#set format=$input_tree.file_ext
case "$format" in
newick) TREEFORMAT="newick" ;;
nex) TREEFORMAT="nexus" ;;
nexus) TREEFORMAT="nexus" ;;
phyloxml) TREEFORMAT="phyloxml" ;;
json) TREEFORMAT="nextstrain" ;;
*) echo "Unknown format: \$TREEFORMAT" && exit 1 ;;
Contributor comment:

This should be echo "Unknown format: $format", because at this point TREEFORMAT has not been set yet.
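A minimal sketch of the command with this fix applied (plus the '$input_tree' quoting suggested in the next comment), assuming nothing else in the submitted command changes. $format is the Cheetah variable assigned by #set, so the message names the unrecognised datatype extension. Redirecting the message to stderr with >&2 is an extra assumption, not part of the review:

```xml
<command detect_errors="exit_code"><![CDATA[
## Map the Galaxy datatype extension to a gotree input format name
#set format = $input_tree.file_ext
case "$format" in
    newick) TREEFORMAT="newick" ;;
    nex) TREEFORMAT="nexus" ;;
    nexus) TREEFORMAT="nexus" ;;
    phyloxml) TREEFORMAT="phyloxml" ;;
    json) TREEFORMAT="nextstrain" ;;
    ## TREEFORMAT is still unset here, so report the Cheetah-level extension instead
    *) echo "Unknown format: $format" >&2 && exit 1 ;;
esac &&

gotree reformat $output_format
    --input '$input_tree'
    --input-format \$TREEFORMAT
    --output ./output.tree
    --threads \${GALAXY_SLOTS:-1} &&

mv ./output.tree '$output_tree'
]]></command>
```

Galaxy joins the command lines before running them, which is why each step is chained with && rather than relying on line breaks.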

esac &&

gotree reformat $output_format
--input $input_tree
Member comment:

Suggested change
- --input $input_tree
+ --input '$input_tree'

--input-format \$TREEFORMAT
--output ./output.tree
Member comment:

Couldn't you write to '$output_tree' directly and save the mv?

Member comment:

Or, in a more modern style: keep writing to ./output.tree, but let Galaxy handle the file transfer by using from_work_dir in <outputs><data>.

Member comment:

See @bgruening's original suggestion.
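A minimal sketch of the from_work_dir variant, assuming the command is otherwise unchanged: gotree still writes ./output.tree, the final mv step disappears, and Galaxy collects the file from the job working directory. Only the output definition differs from the submitted version:

```xml
<!-- Command tail: drop the final "&&" and the "mv ./output.tree '$output_tree'" line -->
<outputs>
    <!-- from_work_dir: Galaxy picks up output.tree from the job working directory -->
    <data name="output_tree" format="txt" from_work_dir="output.tree" label="${tool.name} on ${on_string}">
        <change_format>
            <when input="output_format" value="newick" format="newick" />
            <when input="output_format" value="nexus" format="nex" />
            <when input="output_format" value="phyloxml" format="phyloxml" />
        </change_format>
    </data>
</outputs>
```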

--threads \${GALAXY_SLOTS:-1} &&

mv ./output.tree '$output_tree'

]]></command>

<inputs>
<param name="input_tree"
type="data"
format="newick,nex,phyloxml,json"
label="Input tree"
help="Accepted formats: newick, nexus, phyloxml, nextstrain (json)" />

<param name="output_format"
type="select"
label="Select output tree format">
<option value="newick">Newick</option>
<option value="nexus">Nexus</option>
<option value="phyloxml">Phyloxml</option>
</param>
</inputs>

<outputs>
<data name="output_tree" format="txt" label="${tool.name} on ${on_string}">
<change_format>
<when input="output_format" value="newick" format="newick" />
<when input="output_format" value="nexus" format="nex" />
<when input="output_format" value="phyloxml" format="phyloxml" />
</change_format>
</data>
</outputs>

<tests>
<!-- Test 1: Convert from Newick to Nexus -->
<test expect_num_outputs="1">
<param name="input_tree" value="test.newick" ftype="newick"/>
<param name="output_format" value="nexus" />
<output name="output_tree" file="newick-to.nex">
<assert_contents> <has_text text="#NEXUS"/> </assert_contents>
</output>
</test>

<!-- Test 2: Convert from Nexus to PhyloXML -->
<test expect_num_outputs="1">
<param name="input_tree" value="test.nex" ftype="nex"/>
<param name="output_format" value="phyloxml" />
<output name="output_tree" file="newick-to.xml">
<assert_contents> <has_text text="phylogeny rooted="/> </assert_contents>
Contributor comment:

Suggested change
- <assert_contents> <has_text text="phylogeny rooted="/> </assert_contents>
+ <assert_contents>
+     <has_text text="phylogeny rooted="/>
+ </assert_contents>

Contributor comment:

Please check the indentation here.

</output>
</test>

<!-- Test 3: Convert from PhyloXML to Newick -->
<test expect_num_outputs="1">
<param name="input_tree" value="test.xml" ftype="phyloxml"/>
<param name="output_format" value="newick" />
<output name="output_tree" file="phylo-to.newick" >
Member comment:

From https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-tests-test-output:

"The functional test framework will execute the tool using the parameters defined in the tag sets and generate a temporary file, which will either be compared with the file named in the file attribute value or checked against assertions made by a child assert_contents tag"

Since you're already doing a file-based exact comparison in this and all other tests, you shouldn't use any <assert_contents>.

Member comment:

What you could add to <output>, however, is the ftype attribute, to check not only that the right content is produced but also that Galaxy sets the expected output format.

Member comment:

Suggested change
- <output name="output_tree" file="phylo-to.newick" >
+ <output name="output_tree" file="phylo-to.newick" ftype="newick" />

Member comment:

Just saw that you've been using sim_size before. So if an exact match is not what you want, you can remove the file check and keep only the <assert_contents> (just don't use both at the same time).

You should then also remove the test files from the PR altogether, since they won't be used anyway.

Depending on why exactly you needed sim_size, a check with re_match_multiline as the comparison method may also help.
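A sketch of test 3 in the assertion-only form described in this comment, assuming an exact file match is not required: there is no file attribute, ftype checks the datatype set by change_format, and the asserted strings come from the existing Newick test data. The second has_text check is an illustrative addition, not something the reviewer asked for:

```xml
<!-- Test 3, assertion-only variant: no expected-output file is compared -->
<test expect_num_outputs="1">
    <param name="input_tree" value="test.xml" ftype="phyloxml"/>
    <param name="output_format" value="newick" />
    <!-- ftype verifies that change_format assigned the Newick datatype -->
    <output name="output_tree" ftype="newick">
        <assert_contents>
            <has_text text="(((hCoV-19"/>
            <has_text text="hCoV-19/Sweden/Y-22_SE100_25CS500200/2025"/>
        </assert_contents>
    </output>
</test>
```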

<assert_contents> <has_text text="(((hCoV-19"/> </assert_contents>
</output>
Comment on lines +80 to +82
Contributor comment:

Suggested change
- <output name="output_tree" file="phylo-to.newick" >
- <assert_contents> <has_text text="(((hCoV-19"/> </assert_contents>
- </output>
+ <output name="output_tree" file="phylo-to.newick" >
+     <assert_contents>
+         <has_text text="(((hCoV-19"/>
+     </assert_contents>
+ </output>

</test>
</tests>

<help><![CDATA[
**GoTree Reformat**
Reformats an input tree file into different formats.

Input formats: Nexus, Newick, Phyloxml or Nextstrain
Output formats: Newick, Nexus or Phyloxml
]]></help>

<expand macro="citations" />
</tool>
25 changes: 25 additions & 0 deletions tools/gotree/macros.xml
@@ -0,0 +1,25 @@
<?xml version="1.0"?>
<macros>
<token name="@TOOL_VERSION@">0.4.5</token>
<token name="@VERSION_SUFFIX@">0</token>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">gotree</requirement>
</requirements>
</xml>
<xml name="version_command">
<version_command>gotree version</version_command>
</xml>
<xml name="xrefs">
<xrefs>
<xref type="bio.tools">Gotree</xref>
Member comment:

Suggested change
- <xref type="bio.tools">Gotree</xref>
+ <xref type="bio.tools">gotree</xref>

It's the unique bio.tools ID that you need to refer to, not the displayed name. I don't know whether it is case-sensitive, but it's better to play it safe.

</xrefs>
</xml>
<xml name="citations">
<citations>
<citation type="doi">10.1093/nargab/lqab075</citation>
<yield />
Contributor comment:

Suggested change
- <yield />

</citations>
</xml>
</macros>

8 changes: 8 additions & 0 deletions tools/gotree/test-data/newick-to.nex
@@ -0,0 +1,8 @@
#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX=12;
TAXLABELS hCoV-19/Australia/QLD0x01518C/2025 hCoV-19/Denmark/DCGC-691554/2025 hCoV-19/Denmark/DCGC-691606/2025 hCoV-19/Finland/THL-02373/2025 hCoV-19/Germany/NW-RKI-I-1147997/2025 hCoV-19/Malaysia/MKAI-6710920/2025 hCoV-19/Netherlands/OV-RIVM-145269/2025 hCoV-19/Scotland/CLIMB-CM7YJPPK/2024 hCoV-19/Singapore/Y25R9MSC35/2025 hCoV-19/Sweden/T-60300744/2025 hCoV-19/Sweden/Y-22_SE100_25CS500200/2025 hCoV-19/Thailand/NIC_BKK_84/2025;
END;
BEGIN TREES;
TREE tree0 = (((hCoV-19/Australia/QLD0x01518C/2025:0.07699999999999818,(hCoV-19/Thailand/NIC_BKK_84/2025:0.04399999999986903,hCoV-19/Singapore/Y25R9MSC35/2025:0.021999999999934516):0.14500000000020918):0.06599999999980355,(hCoV-19/Malaysia/MKAI-6710920/2025:0.20199999999999818,(hCoV-19/Germany/NW-RKI-I-1147997/2025:0.24299999999993815,hCoV-19/Finland/THL-02373/2025:0.2899999999999636):0.02200000000016189):0.027999999999792635):0.026000000000067303,((hCoV-19/Netherlands/OV-RIVM-145269/2025:0.30599999999981264,(hCoV-19/Denmark/DCGC-691554/2025:0.038000000000010914,hCoV-19/Denmark/DCGC-691606/2025:0.038000000000010914):0.04899999999997817):0.018000000000029104,(hCoV-19/Scotland/CLIMB-CM7YJPPK/2024:0.04500000000007276,(hCoV-19/Sweden/T-60300744/2025:0.05499999999983629,hCoV-19/Sweden/Y-22_SE100_25CS500200/2025:0.3869999999999436):0.02500000000009095):0.021999999999934516):0.01700000000005275);
END;
116 changes: 116 additions & 0 deletions tools/gotree/test-data/newick-to.xml
@@ -0,0 +1,116 @@
<?xml version="1.0" encoding="UTF-8"?>
<phyloxml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd"
xmlns="http://www.phyloxml.org">
<phylogeny rooted="true">
<clade>
<clade>
<branch_length>0.000249121</branch_length>
<clade>
<branch_length>0.0000829749</branch_length>
<clade>
<branch_length>0.0000531102</branch_length>
<clade>
<branch_length>0.000152922</branch_length>
<clade>
<branch_length>0.0000107679</branch_length>
<clade>
<branch_length>0.0000700188</branch_length>
<clade>
<name>'MN346698.1|chimp|Cote_dIvoire||2017'</name>
<branch_length>0.0000540728</branch_length>
</clade>
<clade>
<name>'KJ136820.1|chimp|Cote_dIvoire||2012'</name>
<branch_length>0.0000211467</branch_length>
</clade>
</clade>
<clade>
<name>'DQ011156.1|monkey|Liberia||1970'</name>
<branch_length>0.0000105738</branch_length>
</clade>
</clade>
<clade>
<name>'KP849470.1|human|Cote_dIvoire||1971'</name>
<branch_length>0.0000845594</branch_length>
</clade>
</clade>
<clade>
<name>'AY741551.1|human|Sierra_Leone||'</name>
<branch_length>0.000183429</branch_length>
</clade>
</clade>
<clade>
<name>'MT724769.1|swamp_rat|DRC||2012'</name>
<branch_length>0.000194718</branch_length>
</clade>
</clade>
<clade>
<name>'MT903346.1|pouched_rat|USA||2003'</name>
<branch_length>0.000384278</branch_length>
</clade>
</clade>
<clade>
<branch_length>0.000414804</branch_length>
<clade>
<name>'KJ642615|human|Nigeria||1978'</name>
<branch_length>0.000271011</branch_length>
</clade>
<clade>
<branch_length>0.000690131</branch_length>
<clade>
<name>'KJ642617|human|Nigeria|Abia|1971-04-14'</name>
<branch_length>0.0000368402</branch_length>
</clade>
<clade>
<branch_length>0.000142185</branch_length>
<clade>
<name>'MK783027|human|Nigeria|Rivers|2017-11-09'</name>
<branch_length>0.0000211831</branch_length>
</clade>
<clade>
<branch_length>0.0000631634</branch_length>
<clade>
<name>'MK783033|human|Nigeria|Rivers|2017-10-09'</name>
<branch_length>0.000047712</branch_length>
</clade>
<clade>
<name>'OP413718|human|United_Kingdom||2022-08'</name>
<branch_length>0.000189545</branch_length>
</clade>
</clade>
<clade>
<branch_length>0.0000684319</branch_length>
<clade>
<name>'MN648051|human|Israel|ex_Nigeria|2018-10-04'</name>
<branch_length>0.0000210763</branch_length>
</clade>
<clade>
<branch_length>0.000231667</branch_length>
<clade>
<name>'ON563414|human|USA||2022-05-19'</name>
<branch_length>0.0000000001</branch_length>
</clade>
<clade>
<name>'OP415257|human|United_Kingdom|ex_Nigeria|2022-08'</name>
<branch_length>0.0000210588</branch_length>
</clade>
<clade>
<branch_length>0.000259498</branch_length>
<clade>
<name>'B1.fastq.gz'</name>
<branch_length>0.0000377656</branch_length>
</clade>
<clade>
<name>'B2.fastq.gz'</name>
<branch_length>0.0000466743</branch_length>
</clade>
</clade>
</clade>
</clade>
</clade>
</clade>
</clade>
</clade>
</phylogeny>
</phyloxml>
1 change: 1 addition & 0 deletions tools/gotree/test-data/phylo-to.newick
@@ -0,0 +1 @@
(((hCoV-19/Australia/QLD0x01518C/2025:0.07699999999999818,(hCoV-19/Thailand/NIC_BKK_84/2025:0.04399999999986903,hCoV-19/Singapore/Y25R9MSC35/2025:0.021999999999934516):0.14500000000020918):0.06599999999980355,(hCoV-19/Malaysia/MKAI-6710920/2025:0.20199999999999818,(hCoV-19/Germany/NW-RKI-I-1147997/2025:0.24299999999993815,hCoV-19/Finland/THL-02373/2025:0.2899999999999636):0.02200000000016189):0.027999999999792635):0.026000000000067303,((hCoV-19/Netherlands/OV-RIVM-145269/2025:0.30599999999981264,(hCoV-19/Denmark/DCGC-691554/2025:0.038000000000010914,hCoV-19/Denmark/DCGC-691606/2025:0.038000000000010914):0.04899999999997817):0.018000000000029104,(hCoV-19/Scotland/CLIMB-CM7YJPPK/2024:0.04500000000007276,(hCoV-19/Sweden/T-60300744/2025:0.05499999999983629,hCoV-19/Sweden/Y-22_SE100_25CS500200/2025:0.3869999999999436):0.02500000000009095):0.021999999999934516):0.01700000000005275);
1 change: 1 addition & 0 deletions tools/gotree/test-data/test.newick
@@ -0,0 +1 @@
(((hCoV-19/Australia/QLD0x01518C/2025:0.07699999999999818,(hCoV-19/Thailand/NIC_BKK_84/2025:0.04399999999986903,hCoV-19/Singapore/Y25R9MSC35/2025:0.021999999999934516):0.14500000000020918):0.06599999999980355,(hCoV-19/Malaysia/MKAI-6710920/2025:0.20199999999999818,(hCoV-19/Germany/NW-RKI-I-1147997/2025:0.24299999999993815,hCoV-19/Finland/THL-02373/2025:0.2899999999999636):0.02200000000016189):0.027999999999792635):0.026000000000067303,((hCoV-19/Netherlands/OV-RIVM-145269/2025:0.30599999999981264,(hCoV-19/Denmark/DCGC-691554/2025:0.038000000000010914,hCoV-19/Denmark/DCGC-691606/2025:0.038000000000010914):0.04899999999997817):0.018000000000029104,(hCoV-19/Scotland/CLIMB-CM7YJPPK/2024:0.04500000000007276,(hCoV-19/Sweden/T-60300744/2025:0.05499999999983629,hCoV-19/Sweden/Y-22_SE100_25CS500200/2025:0.3869999999999436):0.02500000000009095):0.021999999999934516):0.01700000000005275):0;
4 changes: 4 additions & 0 deletions tools/gotree/test-data/test.nex
@@ -0,0 +1,4 @@
#NEXUS
begin trees;
tree tree_1 = [&R] ((((((('MN346698.1|chimp|Cote_dIvoire||2017':5.40728e-05,'KJ136820.1|chimp|Cote_dIvoire||2012':2.11467e-05)[&label="Node7"]:7.00188e-05,'DQ011156.1|monkey|Liberia||1970':1.05738e-05)[&label="Node6"]:1.07679e-05,'KP849470.1|human|Cote_dIvoire||1971':8.45594e-05)[&label="Node5"]:0.000152922,'AY741551.1|human|Sierra_Leone||':0.000183429)[&label="Node4"]:5.31102e-05,'MT724769.1|swamp_rat|DRC||2012':0.000194718)[&label="Node3"]:8.29749e-05,'MT903346.1|pouched_rat|USA||2003':0.000384278)[&label="Node2"]:0.000249121,('KJ642615|human|Nigeria||1978':0.000271011,('KJ642617|human|Nigeria|Abia|1971-04-14':3.68402e-05,('MK783027|human|Nigeria|Rivers|2017-11-09':2.11831e-05,('MK783033|human|Nigeria|Rivers|2017-10-09':4.77120e-05,'OP413718|human|United_Kingdom||2022-08':0.000189545)[&label="Node12"]:6.31634e-05,('MN648051|human|Israel|ex_Nigeria|2018-10-04':2.10763e-05,('ON563414|human|USA||2022-05-19':1.00000e-10,'OP415257|human|United_Kingdom|ex_Nigeria|2022-08':2.10588e-05,('B1.fastq.gz':3.77656e-05,'B2.fastq.gz':4.66743e-05)[&label="Node16"]:0.000259498)[&label="Node14"]:0.000231667)[&label="Node13"]:6.84319e-05)[&label="Node10"]:0.000142185)[&label="Node9"]:0.000690131)[&label="Node8"]:0.000414804)[&label="Node1"];
end;
87 changes: 87 additions & 0 deletions tools/gotree/test-data/test.xml
@@ -0,0 +1,87 @@
<?xml version="1.0" encoding="UTF-8"?>
<phyloxml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd"
xmlns="http://www.phyloxml.org">
<phylogeny rooted="true">
<clade>
<clade>
<branch_length>0.026000000000067303</branch_length>
<clade>
<branch_length>0.06599999999980355</branch_length>
<clade>
<name>hCoV-19/Australia/QLD0x01518C/2025</name>
<branch_length>0.07699999999999818</branch_length>
</clade>
<clade>
<branch_length>0.14500000000020918</branch_length>
<clade>
<name>hCoV-19/Thailand/NIC_BKK_84/2025</name>
<branch_length>0.04399999999986903</branch_length>
</clade>
<clade>
<name>hCoV-19/Singapore/Y25R9MSC35/2025</name>
<branch_length>0.021999999999934516</branch_length>
</clade>
</clade>
</clade>
<clade>
<branch_length>0.027999999999792635</branch_length>
<clade>
<name>hCoV-19/Malaysia/MKAI-6710920/2025</name>
<branch_length>0.20199999999999818</branch_length>
</clade>
<clade>
<branch_length>0.02200000000016189</branch_length>
<clade>
<name>hCoV-19/Germany/NW-RKI-I-1147997/2025</name>
<branch_length>0.24299999999993815</branch_length>
</clade>
<clade>
<name>hCoV-19/Finland/THL-02373/2025</name>
<branch_length>0.2899999999999636</branch_length>
</clade>
</clade>
</clade>
</clade>
<clade>
<branch_length>0.01700000000005275</branch_length>
<clade>
<branch_length>0.018000000000029104</branch_length>
<clade>
<name>hCoV-19/Netherlands/OV-RIVM-145269/2025</name>
<branch_length>0.30599999999981264</branch_length>
</clade>
<clade>
<branch_length>0.04899999999997817</branch_length>
<clade>
<name>hCoV-19/Denmark/DCGC-691554/2025</name>
<branch_length>0.038000000000010914</branch_length>
</clade>
<clade>
<name>hCoV-19/Denmark/DCGC-691606/2025</name>
<branch_length>0.038000000000010914</branch_length>
</clade>
</clade>
</clade>
<clade>
<branch_length>0.021999999999934516</branch_length>
<clade>
<name>hCoV-19/Scotland/CLIMB-CM7YJPPK/2024</name>
<branch_length>0.04500000000007276</branch_length>
</clade>
<clade>
<branch_length>0.02500000000009095</branch_length>
<clade>
<name>hCoV-19/Sweden/T-60300744/2025</name>
<branch_length>0.05499999999983629</branch_length>
</clade>
<clade>
<name>hCoV-19/Sweden/Y-22_SE100_25CS500200/2025</name>
<branch_length>0.3869999999999436</branch_length>
</clade>
</clade>
</clade>
</clade>
</clade>
</phylogeny>
</phyloxml>