
Optimizing your simulation for vcomp

Overview

vcomp is a Verilog® compiler, not an interpreter - this means that things that can be determined at compile time work especially well while things that require runtime resolution tend to be much slower.

This section is intended to give you an overview of what performs well and what does not, so that you can get the fastest possible simulations.

Most of these issues also apply to other similar Verilog® compilers.

Bad Things

  • scalared multi-bit nets - see the note below under 'good things'
  • expressions other than a simple variable name in an event expression ( @(...) ) sensitivity list require the generation of extra code, approximately equivalent to creating a wire and an 'assign' to evaluate the expression.
  • force/assign - forcing or assigning to a register-type variable is rather expensive - it takes a lot more time than a simple assignment
  • <= vs = - use of <= means that the compiler cannot perform inter-assignment optimizations past the point where the non-blocking assignment is performed, so, if possible, you should group any non-blocking assignments together. On the other hand, '<= #N' is only slightly more expensive than '= #N'. '<= #N' can require the creation of arbitrary amounts of storage if the non-blocking assignments are performed at a higher rate than the delay, but vcomp optimizes for the much more common case of a single outstanding non-blocking store.
  • disable is expensive if you attempt to disable something that's not above you in your hierarchy. Using disable to quit a loop is cheap, disabling something in another always block is much more expensive, and disabling a task is even more expensive (and, if it's a possibility, requires extra book-keeping on the part of the compiler every time the task is called).
  • recursive task/function calls - Verilog® is blessed with a memory model that is mostly static, i.e. at the time a simulation is created you can mostly figure out how much storage will be required. vcomp works really hard to keep memory footprints as small as possible - thread stack spaces are usually in the 10s of bytes - which helps keep simulation sizes down as well as giving better cache/TLB performance. However, one of the few places in the language where stack space can be of arbitrary size is when you have recursive task or function calls. Fortunately these are seldom used in Verilog®, nor are they of much use (due to Verilog®'s curious globalized parameters). When vcomp detects a recursive subroutine call it allocates a 4k stack segment. If your program crashes and you need more, you can use the -s compile-time flag to allocate more stack space.
  • vcomp highly optimizes the value-change mechanism that's used for PLI VCDs as well as internally for event expressions etc. If a register variable appears in an event sensitivity list or equivalent (for example the RHS of an assign statement) then extra memory must be allocated for it and assignment to it will be much more expensive - many compiler optimizations will also be disabled across the assignment. Things like $dumpvars or equivalent routines, which require waveform viewers/dumpers to be able to attach VCD callbacks to every object in a simulation, are VERY expensive when running, and still somewhat expensive if compiled in and not called. For example, if you include $dumpvars in your simulation and don't call it, vcomp must still allocate all the VCD overhead for every variable, disable all inter-assignment optimizations and add the VCD checks to every assignment. This can slow things down a lot, so if you want to have $dumpvars, `ifdef it out and recompile when you need it (perhaps build 2 binaries every time you do a compile). Wires don't suffer from this VCD overhead as much as register-type variables do.
  • wires are more expensive than regs - this is true in general - wires need more support (storage etc) than register variables and are not able to take part in as many optimizations as register variables can
  • hierarchical references - assignments to things outside the scope of the current module are somewhat more expensive - mostly due to the overhead of SMP synchronization
  • tran/tranif0/tranif1 etc - frankly trans are evil - they slow down the evaluation of the nets they are attached to by a factor of 2-3
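
Two of the points above, grouping non-blocking assignments and keeping $dumpvars out of the normal build, can be sketched as follows. This is only an illustration: the module names, port names and the DUMP_WAVES macro are made up for the example, and how you define the macro depends on your own build setup.

```verilog
// Sketch only: add_stage, testbench and DUMP_WAVES are hypothetical names.
module add_stage(clk, a, b, sum, carry);
    input        clk;
    input  [7:0] a, b;
    output [7:0] sum;
    output       carry;

    reg  [7:0] sum;
    reg        carry;
    reg  [8:0] t;      // scratch value, only used inside the always block

    always @(posedge clk) begin
        // Blocking work first ...
        t = a + b;
        // ... then the non-blocking assignments grouped together at the end.
        sum   <= t[7:0];
        carry <= t[8];
    end
endmodule

module testbench;
    reg        clk;
    reg  [7:0] a, b;
    wire [7:0] sum;
    wire       carry;

    add_stage dut(clk, a, b, sum, carry);

    always #5 clk = ~clk;

`ifdef DUMP_WAVES
    // Only compiled in for the debug build, so the normal build
    // carries none of the VCD overhead described above.
    initial $dumpvars;
`endif

    initial begin
        clk = 0; a = 8'd200; b = 8'd100;
        #20 $display("sum=%d carry=%b", sum, carry);
        $finish;
    end
endmodule
```

Building once with the macro defined and once without gives you the "2 binaries" arrangement mentioned above: a fast one for regressions and a slower one for waveform debugging.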

Good Things

  • Wires are optimized for the single driver case
  • We attempt to keep vectored (multi-bit) wires vectored wherever possible, because they simulate much faster. If you do not use the 'vectored' or 'scalared' keywords then vcomp will attempt to keep a net vectored. If you pass a selection of a wire (i.e. one bit or a range of bits) to a module, primitive or gate then vcomp is forced to split the vectored net into a bunch of wires - an N-bit scalared wire will take almost N times the effort an N-bit vectored wire will take.
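
As a rough illustration of the vectored/scalared point, here are two ways of writing the same 4-bit AND. The module and port names are invented for the example, and whether a given net actually stays vectored also depends on how the rest of your design uses it.

```verilog
// Sketch only: and4_gates and and4_vector are hypothetical names.

// Bit-selects of a, b and y are passed to gate primitives, which,
// per the note above, forces each vector to be split into single wires.
module and4_gates(y, a, b);
    output [3:0] y;
    input  [3:0] a, b;

    and g0(y[0], a[0], b[0]);
    and g1(y[1], a[1], b[1]);
    and g2(y[2], a[2], b[2]);
    and g3(y[3], a[3], b[3]);
endmodule

// Same function written on whole vectors: no bit-selects are handed
// to gates or modules, so nothing here requires splitting the nets.
module and4_vector(y, a, b);
    output [3:0] y;
    input  [3:0] a, b;

    assign y = a & b;
endmodule
```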

Traps ...

The main problem you have to look out for is code that depends on event ordering. This is almost always somewhat different between Verilog® implementations, and because of our SMP implementation it can be even more unpredictable (even from simulation run to simulation run, because of the randomness of who wins during SMP locking). If you think you are suffering from such an event ordering problem, try running your simulation with the -Do flag and see if the randomness goes away (i.e. it always succeeds or always breaks the same way rather than each simulation working differently). Note: such a simulation problem is almost always an indication of a 0-time race in your design - you're probably not pipelining something correctly or something similar. This can be a genuine bug, and it's something that should be tracked down and fixed, not ignored. The fact that SMP operation allows you to find these bugs is a good thing, even though they are often really hard to find and fix.
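
For readers who have not met one, a typical 0-time race looks something like the sketch below. The names are invented for the example; it is a generic Verilog illustration, not something taken from vcomp itself.

```verilog
// Sketch only: race_demo and its signal names are hypothetical.
module race_demo(clk, d, q_racy, q_safe);
    input  clk, d;
    output q_racy, q_safe;

    reg q_racy, q_safe;
    reg stage_blk, stage_nb;

    // Racy: both blocks wake on the same clock edge. The blocking write
    // to stage_blk may happen before or after the other block reads it,
    // so q_racy can pick up either the old or the new value.
    always @(posedge clk) stage_blk = d;
    always @(posedge clk) q_racy <= stage_blk;

    // Ordered: with non-blocking writes the read of stage_nb always sees
    // the value from before the clock edge, whichever block runs first.
    always @(posedge clk) stage_nb <= d;
    always @(posedge clk) q_safe   <= stage_nb;
endmodule
```

The first pair is the kind of code whose result can change between simulators, or between runs under SMP. The second pair gives the same answer regardless of which always block is scheduled first.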

The one situation where you might find a genuine event ordering difference in a design is a place where signals are crossing a clock boundary where the clock periods are not synchronized with respect to each other. In this situation in the real world the order in which signals are resolved is undefined anyway. Since this is the simulator equivalent of metastability, you probably want some randomness to check your own synchronizer circuits (you don't have any? time to raise a red flag!).
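
If you are wondering what a synchronizer circuit looks like, the common two-flop form is sketched below. It is a generic textbook arrangement with invented names, shown only as an example of the kind of circuit meant here, and it assumes a single-bit signal crossing into the clk_dst domain.

```verilog
// Sketch only: sync2 and its port names are hypothetical.
module sync2(clk_dst, d_async, q_sync);
    input  clk_dst;     // clock of the receiving domain
    input  d_async;     // signal arriving from the other clock domain
    output q_sync;      // version of d_async that is safe to use in clk_dst

    reg meta;           // first stage, may sample d_async mid-transition
    reg q_int;          // second stage, gives the first a cycle to settle

    assign q_sync = q_int;

    always @(posedge clk_dst) begin
        meta  <= d_async;
        q_int <= meta;
    end
endmodule
```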

Finally we've also seen people have problems with the following sort of construct:


	output	x;
	reg x;

	always @(...) begin

		x = ....;
		....
		if (x)
			....
	end

This happens to work as expected with some simulators and not with others; it probably doesn't with ours (at least in the first release). It's actually buggy code and not portable. The problem is that when an output of a module is also declared as a 'reg' the language really defines it to be something like:

	output	x_ext;
	reg x_int;
	assign x_ext = x_int;

	always @(...) begin

		x_int = ....;
		....
		if (x_ext)
			....
	end

All writes to the 'reg' go to the 'x_int' value and are propagated to the external net, though not always immediately. Reads of 'x', however, always go to the net. If you write to a registered output and read it again in the same always statement, the value may not yet have been propagated and the result you read may be stale.

It's easy to miss this sort of bug. It's better to avoid registered outputs if you want to read them in the same block, and instead explicitly make a 'reg' and an 'output' and assign them together - that way you can read the internal registered value directly, which is probably what you intended.
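
A minimal sketch of that workaround, with the elided parts of the earlier fragments filled in by made-up logic so that it is complete (the module name, ports and the comparison are all invented for the example):

```verilog
// Sketch only: match_flag, a, b and y are hypothetical.
module match_flag(a, b, x, y);
    input  [3:0] a, b;
    output       x;
    output [3:0] y;

    reg       x_int;    // internal registers, written and read in the block
    reg [3:0] y_int;

    // The outputs are plain nets driven from the internal registers.
    assign x = x_int;
    assign y = y_int;

    always @(a or b) begin
        x_int = (a == b);
        if (x_int)          // reads the reg directly, never the output net
            y_int = a;
        else
            y_int = 4'b0;
    end
endmodule
```

Because the block only ever reads 'x_int', it no longer matters when the value reaches the external net 'x'.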

