Skip to content

Potential optimization in the RISC-V example #16

@mbty

Description

@mbty

I noticed that the getFields function of the RISC-V example often gets called just to get the value of a single field (with calls along the line of let funct3 := get(getFields(inst), funct3)). Since fetching the values of all the fields just to extract the value of one of them looked a bit overkill, I tried replacing getFields with individual getXxx functions (e.g. getFunct3). This provides a decent boost to simulation performance:

-- Running tests with Cuttlesim --
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_qsort.rv32 -1
[before] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
[after ] real: 0m0.013s	user: 0m0.013s	sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_median.rv32 -1
[before] real: 0m0.012s	user: 0m0.012s	sys: 0m0.000s
[after ] real: 0m0.008s	user: 0m0.007s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/img.rv32 -1
[before] real: 0m0.216s	user: 0m0.195s	sys: 0m0.020s
[after ] real: 0m0.160s	user: 0m0.142s	sys: 0m0.018s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/primes.rv32 -1
[before] real: 0m4.421s	user: 0m4.417s	sys: 0m0.001s
[after ] real: 0m3.090s	user: 0m3.084s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/morse.rv32 -1
[before] real: 0m1.584s	user: 0m1.583s	sys: 0m0.000s
[after ] real: 0m1.086s	user: 0m1.085s	sys: 0m0.000s
-- Running tests with Verilator --
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_qsort.vmh -1
[before] real: 0m0.033s	user: 0m0.032s	sys: 0m0.001s
[after ] real: 0m0.030s	user: 0m0.030s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/morse.vmh -1
[before] real: 0m2.689s	user: 0m2.687s	sys: 0m0.000s
[after ] real: 0m2.186s	user: 0m2.184s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_median.vmh -1
[before] real: 0m0.022s	user: 0m0.022s	sys: 0m0.000s
[after ] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/primes.vmh -1
[before] real: 0m8.358s	user: 0m8.349s	sys: 0m0.001s
[after ] real: 0m6.778s	user: 0m6.767s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/img.vmh -1
[before] real: 0m0.399s	user: 0m0.385s	sys: 0m0.014s
[after ] real: 0m0.327s	user: 0m0.310s	sys: 0m0.017s

This results in a speedup of roughly 1.43 for Cuttlesim and of 1.23 for Verilator compared to master. I did not yet check the effects on synthesis but I might do it soon. I assume that this kind of optimization is applied automatically when synthesizing (isn't it?), so I don't expect changes on this side (alas). See commit 04cfdf8 for a demonstration.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions