VCOMP - a Verilog Compiler

11. SMP Support

NOTE: at this early stage we don't consider our SMP runtime ready for prime time. It is disabled in the current binaries; this will change.

11.1 Is SMP a good idea for your simulation?

vcomp was engineered from scratch to support simulation on multi-CPU systems. This can shorten the run time of a simulation.

However, no matter how hard we try to keep the CPUs away from each other, synchronization overheads always cost more than you expect. A good rule of thumb is that an SMP program running on a dual-processor system probably gets between 1.5 and 1.7 times the performance of the same program running on a single processor. So even though we're proud of our SMP implementation, we recommend that for the best bang for the buck, if you have a relatively small simulation, you run 2 copies at once rather than one running SMP. You can ensure this by passing the '-Do' runtime flag to the runtime binary.
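The rule of thumb above can be turned into a quick throughput comparison. This is only back-of-the-envelope arithmetic: the function name is made up for illustration, and it assumes two independent copies do not slow each other down (no contention for memory or disk), which real machines will not fully honor.

```python
def jobs_per_hour(single_cpu_hours, smp_speedup):
    """Return (smp_throughput, two_copy_throughput) for a dual-processor machine.

    single_cpu_hours: run time of one simulation on one processor.
    smp_speedup: assumed SMP speedup on two processors (rule of thumb: 1.5 to 1.7).
    """
    smp = smp_speedup / single_cpu_hours   # one job at a time, each finishing faster
    two_copies = 2.0 / single_cpu_hours    # two single-CPU jobs running side by side
    return smp, two_copies


for speedup in (1.5, 1.7):
    smp, two = jobs_per_hour(1.0, speedup)
    print(f"speedup {speedup}: SMP {smp:.1f} jobs/hour, two copies {two:.1f} jobs/hour")
```

Under these assumptions two copies always finish more simulations per hour, while a single SMP run finishes any one simulation sooner. Which matters more depends on whether you are waiting on one long run or a batch of short ones.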

If you have a self-checking testbed where the second part is a separate program (perhaps Perl or C), and the runtime of the checking portion of the testbed is comparable to that of the simulation, you may be able to get a similar speedup by checking on the fly, perhaps by piping the simulation's output to the checker.
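A minimal sketch of such an on-the-fly checker follows. The output format (one text line per event) and the idea of comparing against a stream of expected lines are assumptions made for illustration; a real checker would parse whatever your testbed prints.

```python
def check_stream(lines, expected):
    """Compare simulation output with expected output, one line at a time.

    `lines` may be any iterable of strings, including sys.stdin, so each
    line is checked as soon as the simulation writes it instead of after
    the whole run has finished. Returns a list of
    (line_number, got, want) tuples for the lines that differ.
    Extra lines on either side are ignored in this sketch.
    """
    mismatches = []
    for number, (got, want) in enumerate(zip(lines, expected), start=1):
        got = got.rstrip("\n")
        if got != want:
            mismatches.append((number, got, want))
    return mismatches


# Example with in-memory data standing in for the pipe:
sample_output = ["0 q=0\n", "10 q=1\n"]
sample_expected = ["0 q=0", "10 q=0"]
print(check_stream(sample_output, sample_expected))
```

In a script fed by a pipe you would pass sys.stdin as `lines`, so that the checker runs on the second processor while the simulation is still producing output on the first.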

11.2 Making your simulation suitable for SMP speedups

A thread is a flow of control - a bunch of things waiting to be executed. Threads are mapped to physical processors when they are executed. In an SMP system, several threads may be mapped to different physical processors at the same time.

In our implementation there are a LOT of threads. For example, every always statement in every instance of a module becomes a thread, and many wires have one or more threads embedded in them, as do assign statements. vcomp makes very light-weight threads, light-weight even by traditional Unix standards.

vcomp will never map 2 threads within the same module instance to different physical processors at the same time. This means that if you only have one module in your simulation you will never see an SMP speedup.
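The effect of this rule can be seen in a toy model. The function below is not vcomp's scheduler; it is a hypothetical sketch that only encodes the one constraint stated above (at most one running thread per module instance), and the instance and thread names are invented.

```python
def pick_parallel_batch(runnable, cpus):
    """Pick threads that may run at the same time.

    runnable: list of (instance_name, thread_name) pairs waiting to run.
    cpus: number of physical processors.
    At most one thread per module instance is chosen, and at most
    `cpus` threads in total.
    """
    batch = []
    busy_instances = set()
    for instance, thread in runnable:
        if len(batch) == cpus:
            break
        if instance not in busy_instances:
            busy_instances.add(instance)
            batch.append((instance, thread))
    return batch


# One instance: the second processor has nothing it is allowed to run.
print(pick_parallel_batch([("top", "always_1"), ("top", "assign_1")], cpus=2))

# Two instances: both processors can be kept busy.
print(pick_parallel_batch([("top.u1", "always_1"), ("top.u2", "always_1")], cpus=2))
```

In this model a design with many similarly sized instances gives the scheduler the most freedom, while a design that is one large flat module leaves all but one processor idle.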

11.3 Events

In order to minimize synchronization overhead, instead of maintaining a single global event (or thread) queue, vcomp maintains many event queues (worst case one per instanced module - but usually groups of modules close to each other in the instance hierarchy are collected into gangs that share an event queue). This may mean that the ordering of events within a particular time slice may be different from what you have seen on other simulators.

Differences between simulators in the order in which events are scheduled are normal - the Verilog® standard is careful not to specify 0-time event ordering other than around constructs like non-blocking assignment (<=), 0 delays (#0), $strobe etc.

Although vcomp implements many parallel event queues, it is careful to unify them into a global stratified queue as per the standard Verilog® model, so that the above constructs, and all well-constructed Verilog®, simulate correctly.
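For readers unfamiliar with the stratified queue, here is a small model of a single time slot. It follows the region ordering described in the Verilog standard (active, then inactive for #0, then non-blocking assignment updates, then monitor events such as $strobe). It says nothing about how vcomp merges its parallel queues internally; the region names and helper function are chosen for illustration.

```python
from collections import deque

# Regions of one time slot, in the order they are drained after "active".
LATER_REGIONS = ("inactive", "nba", "monitor")


def run_time_slot(initial_active):
    """Run every event of one simulation time slot.

    Each event is a callable taking `schedule(region, event)`, which it
    may use to add more events to the same time slot.
    """
    queues = {"active": deque(initial_active)}
    for region in LATER_REGIONS:
        queues[region] = deque()

    def schedule(region, event):
        queues[region].append(event)

    while True:
        if not queues["active"]:
            # Active region is empty: promote the next non-empty region.
            for region in LATER_REGIONS:
                if queues[region]:
                    queues["active"].extend(queues[region])
                    queues[region].clear()
                    break
            else:
                return  # nothing left in this time slot
        queues["active"].popleft()(schedule)


log = []


def process(schedule):
    log.append("active: evaluate q <= d")
    schedule("monitor", lambda s: log.append("monitor: $strobe sees new q"))
    schedule("nba", lambda s: log.append("nba: update q"))
    schedule("inactive", lambda s: log.append("inactive: #0 continuation"))


run_time_slot([process])
print("\n".join(log))
```

The events are logged in region order even though the process scheduled them in a different order. Only the order of events within the same region is left to the simulator.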

If you do see problems moving from simulator to simulator it often indicates as-yet-undetected race conditions in your simulation that have been masked by a particular simulator's event ordering. Sometimes these sorts of problems indicate a real race condition in your simulation that may also occur in real logic - it's best to find and fix them rather than avoid them!
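The following sketch models the kind of race that event ordering can hide. Two processes wake up in the same time slot: one writes a variable with a blocking assignment and the other reads it. The process names and the Python model are illustrative only; in Verilog this would typically be two always blocks triggered by the same clock edge.

```python
from itertools import permutations


def simulate(order):
    """Run the two processes in the given order and return what the reader saw."""
    state = {"a": 0, "b": None}
    processes = {
        "writer": lambda s: s.update(a=1),       # like: a = 1;
        "reader": lambda s: s.update(b=s["a"]),  # like: b = a;
    }
    for name in order:
        processes[name](state)
    return state["b"]


results = {order: simulate(order) for order in permutations(("writer", "reader"))}
for order, value in results.items():
    print(" then ".join(order), "-> b =", value)
```

Both orders are legal within one time slot, so a simulator that happens to run the writer first gives b = 1 and one that runs the reader first gives b = 0. Neither simulator is wrong; the design is ambiguous.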

11.4 PLIs

Our PLI subsystem is single-threaded: only one thread is allowed into it at any one time. It was designed this way because we didn't want to break existing PLI programs or require the addition of synchronization primitives.

If your simulations spend lots of time in PLI calls, the overhead of waiting for other PLI calls to complete may be so great that you see little or no SMP speedup.

When a thread stalls within a PLI call (because the PLI routine makes an IO call that waits, does a sleep(), etc.), the other threads may continue until they also need to make a PLI call, or until there are no more events to be scheduled. Simulation time will not advance while a thread is stalled in a PLI call.
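The cost of a single-threaded PLI can be illustrated with ordinary Python threads and one lock. This is a model of the behavior described above, not vcomp code: the lock, the `pli_call` function, and the sleep standing in for a slow PLI routine are all assumptions.

```python
import threading
import time

pli_lock = threading.Lock()            # only one thread may be inside the PLI
stats = {"inside": 0, "max_inside": 0}


def pli_call(work_seconds):
    """Stand-in for a PLI routine that blocks for `work_seconds`."""
    with pli_lock:
        stats["inside"] += 1
        stats["max_inside"] = max(stats["max_inside"], stats["inside"])
        time.sleep(work_seconds)       # e.g. waiting on IO inside the PLI routine
        stats["inside"] -= 1


def run_threads(count, work_seconds):
    """Start `count` threads that each make one PLI call; return elapsed seconds."""
    threads = [threading.Thread(target=pli_call, args=(work_seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.01)
print(f"4 calls of 0.01s took {elapsed:.3f}s, "
      f"max threads inside PLI: {stats['max_inside']}")
```

Because the calls queue up behind the lock, the elapsed time is close to the sum of the individual call times rather than the longest one, no matter how many processors are available.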

If your simulation stalls out listening to sockets and you depend on it to become idle you may wish to disable SMP simulation.
