An explanation of Metastability for logic designers
By Paul Campbell
Metastability seems to be one of the most misunderstood - and often simply ignored -
issues in modern-day logic design. This short article is intended as an
introduction to the subject.
NOTE: all the following is written with the assumption that we are talking
about flops that latch on the rising edge of their clocks - just swap everything
about in your mind for negative edge flops ...
What is Metastability?
In essence it's something that can happen inside a flop when you fail to meet its setup
and/or hold times.
In a well designed synchronous system, with well characterized components this should
never happen.
Consider how a simple flop works internally:
This is a positive-edge flop. While CLK is low (just prior to latching) the D input
drives through the open transmission gate TM into the start of the master portion, overwhelming the weak driver from
M2; M1 inverts the signal, which in turn drives M2, removing any conflict at the input to M1.
When the clock edge rises, transmission gate TM closes and TS opens - this isolates
the value circulating through M1/M2 from any changes at D and allows the value at
the output of M1 to overpower the weak output of S2, driving the latched value into the slave
side of the flop, where it circulates through S1/S2.
When the clock goes low again TM opens up and TS closes; the slave side keeps the data
on the output Q and the master side starts to circulate new values from D through
the M1/M2 loop.
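For readers who prefer RTL to schematics, here is a minimal switch-level Verilog sketch
of that structure. The instance names TM, TS, M1, M2, S1 and S2 match the description
above; everything else is illustrative, and a digital simulation model like this of course
won't reproduce the analog metastable behaviour described below:

module tg_dff (input wire clk, input wire d, output wire q);
    wire clk_n;
    wire m_in, m_out;   // master loop nodes
    wire s_in, s_out;   // slave loop nodes

    not INV (clk_n, clk);

    // TM conducts while clk is low: D overpowers the weak keeper M2
    cmos TM (m_in, d, clk_n, clk);
    not  M1 (m_out, m_in);
    not (weak1, weak0) M2 (m_in, m_out);    // weak feedback driver

    // TS conducts while clk is high: M1 overpowers the weak keeper S2
    cmos TS (s_in, m_out, clk, clk_n);
    not  S1 (s_out, s_in);
    not (weak1, weak0) S2 (s_in, s_out);    // weak feedback driver

    assign q = s_out;
endmodule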
So - what happens if you don't meet setup on the rising edge of the clock? Imagine that
the input D suddenly rises from 0 to 1 just at the rising edge of CLK - just as
transmission gate TM is closing. There's a delay through M1 and M2, so it's quite possible
that the new value has only partly propagated around the loop when TM closes: the output of M1
is caught somewhere between its old and new values, M2 is being driven by that half-formed level,
and the loop is now held only by its own feedback. In effect you have a sort of rather unstable
ring oscillator - these inconsistent, intermediate values may circulate around the loop for an undefined
time depending on the initial conditions and the gain of the various components.
On the falling edge of the clock there's a small (statistical)
chance that the slave latch will be left in
a similar state - and even if it isn't, the clk-to-Q settling time of the whole flop has been
compromised - this may lead to following flops suffering similar problems. It can also
lead to noise at unexpected frequencies in your logic.
Actually it's worse than this, since it all rapidly becomes an analog rather than
a digital sort of process: transistors are switching faster than they were designed to and
signals are not meeting threshold voltages - and because of process variations it's not possible to
predict how any particular flop in a particular chip will behave - all SPICE can tell
you is how bad the worst case might be and maybe how often it will happen.
How do you stop Metastability?
Well for synchronous logic the answer is simple - always meet your setup and hold
times.
But there is a class of designs where this is impossible - those where signals must cross clock domain boundaries
and the frequency relationship between the clocks is such that your logic cannot
determine 'safe' times for signals to be sampled (determining safe sampling times is usually
possible, with care, when the clocks are well-defined multiples of each other). In these cases
metastability is not only possible but is in fact unavoidable - all you can do is try to
reduce the chance of it happening.
The first thing to do is to make sure that the signal you are synchronizing is glitch-free - this
reduces the window in which it can be mis-sampled in the second clock domain - the usual way
to do this is to drive it directly out of a flop in the first clock domain, not through combinatorial
logic.
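As a sketch (the signal names here are made up for illustration), launching the crossing
signal from its own flop looks like:

reg req_a;                       // lives in, and is clocked by, the source clock domain
always @(posedge clka)
    req_a <= request_condition;  // whatever combinatorial condition generates the request
// req_a - not request_condition - is what gets synchronized into the other domain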
Next make sure it has a fast rise/fall time: drive it from a high-drive flop and don't run
it on a long wire right across the die. Again this reduces the sampling window.
Run the signal directly into a flop in the second clock domain - if possible use a
special 'synchronizer flop'. This is a flop designed to reduce the chance of metastability
problems - often it will be marked in the library as having a very long clk-to-Q, at least
1/2 a clock period, because they want to wait for the slave side of the flop to latch
before presenting a 'valid' output.
Because of the small statistical possibility of the slave side of a flop also staying in
a metastable state, it's statistically possible that the output of the flop will itself be in
an unknown metastable state - to guard against this you can run the signal through a
chain of synchronizer flops - each stage reduces the statistical likelihood of
the metastability being passed on (but does NOT reduce it to 0). Make sure that these flops
are close together and that their output wires have fast rise times. The downside of
this is that you add latency to the path you are synchronizing.
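For reference, here is a minimal sketch of such a chain in Verilog. The module name and
ports (synchronizer_flops, clk, in, out) are chosen to match the examples later in this
article, but the name, depth and coding are just illustrative - in a real flow you'd map
the stages to the library's synchronizer cells and constrain them to stay adjacent:

module synchronizer_flops (
    input  wire clk,    // destination domain clock
    input  wire in,     // signal launched from a flop in the source domain
    output wire out     // synchronized copy, two clocks of latency
);
    reg stage1, stage2;

    always @(posedge clk) begin
        stage1 <= in;       // this flop is the one that may go metastable
        stage2 <= stage1;   // the extra stage gives it time to resolve
    end

    assign out = stage2;
endmodule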
Calculating MTBF
Eventually you will need to pass your synchronized signal into the logic in your second clock
domain - there's a small, but calculable, statistical chance that this signal will pass
the metastable state into the flops in your second clock domain - there's NOTHING you can do
about this - if you move data between unrelated clock domains there will eventually be a synchronizer failure
and your logic will fail - the questions are how often will it fail, and how can you reduce the
chance of failure?
First you can use special synchronizer flops - they get into metastable states less often.
Next you can use chains of synchronizer flops as described above to further reduce the
chance of failure.
You can also gate the outputs of synchronizer flops until you need to sample them,
if your design allows you to predict when this will be (this further reduces the window
where a metastable signal can propagate into the rest of your logic), as sketched below.
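For example (a sketch only - sample_en, clkb and sync_out are made-up names for a
design-specific enable, the destination clock and the synchronizer chain's output):

reg captured;
always @(posedge clkb)
    if (sample_en)              // only open the window when the value is actually needed
        captured <= sync_out;   // downstream logic uses 'captured', never sync_out directly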
Your library designer should be able to calculate the chance of synchronizer failure
for a particular flop clocked at a particular frequency, sampling another signal
at another frequency with a particular rise time (it's usually a pain and they often
don't like to do it) - the result will be a failure rate for a flop - say 0.01%;
a chain of 3 flops might then get a failure rate of roughly 0.01% x 0.01% x 0.01% (it's not really
quite that simple - probably it should be SPICEd too).
The result of this is the
statistical chance of a particular path propagating metastability into your design.
You can change this by changing the number of synchronizers, the frequency ratios etc.
If you know a signal that's being sampled isn't changing on every clock edge (say only on every
16th clock edge) you can use that effective frequency instead, even though it's being
driven from a faster clock.
Now suppose you have a 0.001% (1 in 100,000) chance of a path failing on a particular clock
edge: your MTBF for a 1MHz clock is going to be 100,000 clocks, or about 1/10 of
a second - obviously you are probably going to need a much smaller chance of failure.
Now you know the chance of a particular path failing - count the number of
paths and add up all their chances of failure to get a reasonable approximation
of the failure rate of your whole chip (and from that its MTBF).
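For reference, the per-flop numbers your library designer hands you usually come from the
standard synchronizer failure model, which has roughly this form (the constants here are
generic, not specific to any particular library):

    MTBF ~= e^(tr/tau) / (T0 * fclk * fdata)

where tr is the time allowed for the metastable state to resolve (extra synchronizer stages
buy you more of it), tau and T0 are constants characterized for the flop, fclk is the
sampling clock frequency and fdata is the rate at which the sampled signal actually
changes - which is why knowing the real toggle rate of the signal, as above, matters.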
Finally think about the consequences of a chip failure - is there a person on a hospital bed
connected to it? Is it an IO device on a PC where the user will just swear once a year
at a wedged system and reboot? Is your IO device a disk drive that's quietly mangling
a customer's data? Is the MTBF of your device significantly higher than that of other portions of the system
(for example, if Windows crashes every day or so would anyone even notice one more failure)? Are lawyers
likely to get involved? etc. etc. This is an area where you have to make some bets on your
company's future (best to have your boss sign off on this one :-).
Synchronizing multiple bit values
This is really a whole topic in its own right - but it is related to the issue
of synchronizer failure.
Imagine you're building a FIFO that's moving data between clock domains: you're
writing data in one clock domain and reading it in another. Somewhere deep
inside there are two pointers - one shows where data is written in one
domain, the other shows where it
is read in the other - in essence they are a bunch of flops, each clocked in its respective
domain.
Now imagine that the FIFO needs an indicator to the write port as to whether there is room
in the FIFO for more data to be written - this can be figured out by comparing the values of the two pointers
(the read side has a similar issue in determining if data is available to be read).
The problem with comparing the two pointers is that they are in different clock domains:
if you sample the read pointer in the write clock domain you might sample it at just
the point that it's changing, causing metastability to be injected into your logic.
So the obvious first thing you must do is to synchronize the bits of the read pointer through a
set of synchronizer flops into the write clock domain before you compare them with
the write pointer.
The problem is that this doesn't work - even if you reduce the chance of metastability to acceptable
levels you still may sample the data wrongly. Consider the case where the read pointer has the binary
value 0111 and is in the process of changing to 1000 at the time it's sampled:
if each bit is separately synchronized, some bits may be sampled just before the change while others
might be sampled after the change - because all the bits are changing in this case you might
sample the read pointer as 0000 or 1111 or in fact as ANY possible combination of bits.
So how do you do it? The most important thing to realize is that you must synchronize
only 1 bit at a time in order to pass the information safely to the other side. A common way to do this
is to have logic that, when the read pointer changes, takes a copy that remains stable
until the logic in the other clock domain has sampled it. For example, in Verilog:
reg wr_ack; // write side copy of the ack state
wire sync_wr_ack; // wr_ack synchronized into the read domain
synchronizer_flops s1(.clk(rdclk), .in(wr_ack), .out(sync_wr_ack));
reg rd_wr_ack; // read side copy of the ack state
reg rd_send; // read side copy of the send state
wire sync_rd_send; // rd_send synchronized into the write domain
synchronizer_flops s2(.clk(wrclk), .in(rd_send), .out(sync_rd_send));
reg wr_rd_send; // write side copy of the send state
reg rd_pending; // read side flag - a pointer transfer is in flight
reg [ 3: 0]rd_pnt; // read pointer
reg [ 3: 0]rd_pnt_copy; // read pointer copy - remains stable during synchronization
reg [ 3: 0]wr_rd_pnt; // read pointer copy - synchronized to the write domain

always @(posedge rdclk)
    if (rd_reset) begin
        rd_pending  <= 0;
        rd_pnt_copy <= rd_pnt;
        rd_wr_ack   <= sync_wr_ack;
        rd_send     <= 0;
    end else // if we detect we should send a new pointer
    if (rd_pnt != rd_pnt_copy && (!rd_pending || sync_wr_ack != rd_wr_ack)) begin
        rd_pnt_copy <= rd_pnt;        // copy the pointer
        rd_wr_ack   <= sync_wr_ack;   // remember the ack state
        rd_pending  <= 1;             // remember we're sending
        rd_send     <= ~rd_send;      // signal to the other side
    end

always @(posedge wrclk)
    if (wr_reset) begin
        wr_rd_pnt  <= rd_pnt_copy;
        wr_ack     <= 0;
        wr_rd_send <= sync_rd_send;
    end else
    if (wr_rd_send != sync_rd_send) begin // if we detect a send transition
        wr_rd_pnt  <= rd_pnt_copy;        // copy the pointer (stable until the ack gets back)
        wr_rd_send <= sync_rd_send;       // remember the send state
        wr_ack     <= ~wr_ack;            // send an ack
    end
Notice that we're signalling with EDGEs (toggles) here - not levels - this is important:
not only does it make for simpler logic, but it halves the number of transitions on the
signal being synchronized, reducing the chances of metastability.
The above is not the only way to perform this operation - just an example. The main downside
of this sort of mechanism is the latency (you must synchronize signals in both directions
before you can pass more data onward). Reducing the chances of metastability almost always
comes at the expense of latency. One result in the FIFO example is that some FIFO
entries may be unusable at some times - effectively making the FIFO shorter than you
had expected.
One solution to this that I like is to decode the FIFO pointers to a one-hot state:
reg [15:0]onehot;
always @(posedge rdclk) begin
    onehot = 0;
    onehot[rd_pnt] = 1;    // one-hot copy of the read pointer
end
wire [15:0]sync_onehot;
synchronizer_flops s00(.clk(wrclk), .in(onehot[0]), .out(sync_onehot[0]));
....
synchronizer_flops s15(.clk(wrclk), .in(onehot[15]), .out(sync_onehot[15]));
always @(posedge wrclk)
    casez (sync_onehot)
    16'b????_????_????_??01: wr_rd_pnt = 0;
    16'b????_????_????_?01?: wr_rd_pnt = 1;
    16'b????_????_????_01??: wr_rd_pnt = 2;
    ....
    16'b1???_????_????_???0: wr_rd_pnt = 15;
    default:                 wr_rd_pnt = wr_rd_pnt; // ambiguous sample - keep the old value
    endcase
This can use a lot more gates, but it greatly reduces the signalling rate on each
individual line - each bit only toggles as the pointer passes its slot - so the overall
failure rate stays low even though the number of synchronizers has increased.
It works by being careful about what happens with failures. Consider what happens when
rd_pnt goes from 0 to 1: bits 1:0 of onehot go from 01 to 10 (11 and 00 are illegal
combinations). When sampled on the write side, however, all 4 combinations of bits 1:0
might be seen - for 10 and 01 the right thing happens, for 11 the case statement above
reads it as if 10 had been detected, and if 00 is sampled nothing happens - the output
retains its old value (presumably the value decoded earlier from 01). Care must be taken here as
it's possible that with a fast enough read clock you might see multiple 00s in a row - but this is OK because
eventually the FIFO will empty and the read pointer will stop changing. If this is a problem
you can instead stuff 'onehot' with a value where half the bits are 1 and half are 0 and
they chase themselves around.
Another possible solution to the multi-bit synchronizer problem is to make sure that only one bit
in the source changes per source clock - and make sure the destination can handle the possible
combinations it might see with arbitrary synchronization. For the FIFO example above this might
involve using a gray-code counter (which only changes 1 bit per clock, so any mis-synchronized
sample is either the old or the new pointer value and the counter always appears to advance
correctly) - in my experience the downside is that if you want to calculate the difference
between pointers you have to convert back to a more normal binary coding.
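As a sketch of that approach (the names, reset and 4-bit width here are illustrative), keep a
binary counter for the arithmetic in its own domain and register a gray-coded copy that is the
only thing allowed to cross the boundary:

reg  [ 3: 0]rd_bin;     // binary read pointer - used for arithmetic in its own domain
reg  [ 3: 0]rd_gray;    // gray-coded copy - the only version that crosses the boundary
wire [ 3: 0]rd_bin_next = rd_bin + 1;

always @(posedge rdclk)
    if (rd_reset) begin
        rd_bin  <= 0;
        rd_gray <= 0;
    end else if (rd_advance) begin                    // rd_advance: assumed 'pointer moves' strobe
        rd_bin  <= rd_bin_next;
        rd_gray <= rd_bin_next ^ (rd_bin_next >> 1);  // binary to gray - exactly one bit changes
    end

Each bit of rd_gray then goes through synchronizer flops into the write domain; the xor tree
needed to turn the synchronized gray value back into binary for the pointer subtraction is the
conversion cost mentioned above.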
But wait .... I use asynchronous resets ...
And I bet they work MOST of the time too .....
These can fail - the reset signal can
go away just before a clock edge, causing one of the internal
feedback loops to enter a metastable state (flops with asynchronous resets are
often implemented with a circuit that is roughly the equivalent of
the one shown above, but with one of the inverters in each half
replaced with a NAND or NOR gate).
"But everyone uses asynchronous resets right?" Well a lot of people
still do - and they mostly seem to work - the reason for this
is that they are usually driven from an external reset
signal - which has a VERY low frequency - like in the order of hours, days
or years so while the chances of metastability in the flop may be low
the chances of it actually happening are really low (like longer
than the chip's expected lifetime). This is an example of
where figuring out the frequency of the thing you are synchronizing
when calculating MTBF can be really useful.
But - if you are using asynchronous reset for something other than a power-on
reset, from a signal whose frequency isn't glacial - you may have a real
potential problem - time to do some math, or redesign.
There's a second problem with async resets that's akin to the multiple-bit
synchronization problem described above - if you have multiple flops with asynchronous
resets it's quite possible that the removal of reset (you probably don't care about
the application of reset) may not be recognized by all the flops on the same
clock edge - in this case it may be worthwhile having a circuit that keeps
things idle for a couple of clocks after reset (of course that would be
a synchronous reset circuit .... so why not just synchronize the reset
signal through a flop (or a chain of them) just like any other signal and use that to drive
synchronous resets to the rest of the flops).
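A minimal sketch of that last idea (clk, ext_reset and the register names are illustrative;
an active-high external reset is assumed):

reg rst_meta, rst_sync;

always @(posedge clk) begin
    rst_meta <= ext_reset;    // this flop may go metastable as the external reset is removed
    rst_sync <= rst_meta;     // second stage gives it time to resolve
end
// rst_sync now changes cleanly relative to clk and can drive ordinary
// synchronous resets throughout the rest of the design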
Even though I've described the evils of asynchronous resets there are still
places where they are genuinely useful - provided you take great care
about how they are used. Often these situations are where a chip must do
something before a clock is reliably available - for example
if you are using 1-hot transmission-gate muxes and don't want them to get
into a bad, high-current state after power on.