An explanation of Metastability for logic designers

By Paul Campbell

Metastability seems to be one of the most misunderstood, and even ignored, issues in modern-day logic design - this short article is intended as an introduction to the subject.

NOTE: all the following is written with the assumption that we are talking about flops that latch on the rising edge of their clocks - just swap everything about in your mind for negative edge flops ...

What is Metastability?

In essence it's something that can happen inside a flop when you miss meeting its setup and/or hold times.

In a well-designed synchronous system with well-characterized components, this should never happen.

Consider how a simple flop works internally:

This is a positive-edge flop. While CLK is low (just prior to latching), the D input drives through the open transmission gate TM into the start of the master portion, overwhelming the weak drive from M2; M1 inverts the signal, which in turn drives M2, removing any conflict at the input to M1.

When the clock edge rises, transmission gate TM closes and TS opens - this isolates the value circulating through M1/M2 from any changes at D, and allows the value at the output of M1 to overpower the weak output of S2, driving the latched value into the slave side of the flop, where it circulates through S1/S2.

When the clock goes low again TM opens and TS closes; the slave side keeps the data on the output Q, and the master side starts to recirculate new values from D through the M1/M2 loop.
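If it helps to see the same master/slave structure in RTL terms, here is a purely behavioural sketch of the two back-to-back latches (my own illustration, not part of the original description - and note that an RTL model like this only shows the ideal digital behaviour; it cannot go metastable, which is exactly the analog effect the real transistor loops can exhibit):

	module dff_behavioural(input wire clk, input wire d, output wire q);
		reg	master, slave;

		always @(*) if (!clk) master = d;	// TM conducting: the master follows D while CLK is low
		always @(*) if (clk)  slave = master;	// TS conducting: the slave captures the master while CLK is high

		assign q = slave;
	endmodule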

So - what happens if you don't meet setup on the rising edge of the clock? Imagine that the input D suddenly rises from 0 to 1 right at the rising edge of CLK, just as transmission gate TM is closing. There's a delay through M1 and M2, and it's possible that the output of M1 reaches a 1 just as TM closes while M2 immediately pulls its output to 0. Now you have a 1 at the output of M1 and a 0 at the output of M2 - in effect a rather unstable ring oscillator - and the mismatched values may circulate around the loop for an undefined time, depending on the initial conditions and the gain of the various components.

On the falling edge of the clock there's a small (statistical) chance that the slave latch will be left in a similar state - and even if it isn't, the clk-to-Q settling time of the whole flop has been compromised. This may lead to following flops suffering similar problems, and can also lead to noise at unexpected frequencies in your logic.

Actually it's worse than this, since everything rapidly becomes an analog rather than a digital process: transistors are switching faster than they were designed to and not meeting threshold voltages. Because of process variations it's not possible to predict how any particular flop in a particular chip will behave - all Spice can tell you is what the worst case might be and maybe how often it will happen.

How do you stop Metastability?

Well for synchronous logic the answer is simple - always meet your setup and hold times.

But there is a class of designs where this is impossible - those where signals must cross clock domain boundaries and the frequency relationship between the clocks is such that your logic cannot determine 'safe' times for signals to be sampled (with care, determining safe times is usually possible when the clocks are well-defined multiples of each other). In these cases metastability is not only possible but unavoidable - all you can do is try to reduce the chance of it happening.

The first thing to do is to make sure that the signal you are synchronizing is glitch-free - this reduces the window in which it can be mis-sampled in the second clock domain. The usual way to do this is to drive it directly out of a flop in the first clock domain, not through combinatorial logic.

Next, make sure it has a fast rise/fall time: drive it from a high-drive flop and don't run it on a long wire across the die. Again this reduces the sampling window.

Run the signal directly into a flop in the second clock domain - if possible use a special 'synchronizer flop', a flop designed to reduce the chance of metastability problems. Often it will be marked in the library as having a very long clk-to-Q, at least 1/2 a clock period, because it waits for the slave side of the flop to latch before presenting a 'valid' output.

Because of the small statistical possibility of the slave side of a flop also staying in a metastable state, it's statistically possible that the output of the flop will itself be in an unknown metastable state. To guard against this you can run your signal through a chain of synchronizer flops - each stage decreases the statistical likelihood of the metastability being passed on (but does NOT reduce it to 0). Make sure that these flops are close together and that their output wires have fast rise times. The downside of this is that you add latency to the path you are synchronizing.
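Putting the last few paragraphs together, a minimal sketch might look like this (clka/clkb and the signal names are illustrative, not from a real library; in a real flow the two clkb flops would be synchronizer cells from your library, placed close together):

	// source clock domain: launch the signal from a flop so it is glitch-free
	reg	flag_a;
	always @(posedge clka)
		flag_a <= some_condition;	// never send raw combinatorial logic across the boundary

	// destination clock domain: a 2-deep chain of synchronizer flops
	reg	flag_meta, flag_b;
	always @(posedge clkb) begin
		flag_meta <= flag_a;		// this is the flop that may go metastable
		flag_b    <= flag_meta;		// the extra stage gives it a full clkb period to resolve
	end

	// only flag_b (never flag_meta or flag_a) should feed logic in the clkb domain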

Calculating MTBF

Eventually you will need to pass your synchronized signal into the logic in your second clock domain. There's a small - but calculable - statistical chance that this signal will pass the metastable state into the flops in your second clock domain, and there's NOTHING you can do about this: if you move data from one random clock domain to another there will be synchronizer failures and your logic will fail. The question is how often it will fail - and how you can reduce the chance of failure.

First you can use special synchronizer flops - they get into metastable states less often. Next you can use chains of synchronizer flops as described above to further reduce the chance of failure.

You can also gate the outputs of synchronizer flops until you need to sample them, if your design allows you to predict when this will be (this further reduces the window in which a metastable signal can propagate into the rest of your logic).
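As a minimal sketch of that gating idea (signal names here are illustrative):

	reg	captured;

	always @(posedge clk2)
	if (sample_now)			// design-specific: the cycle in which the value is actually needed
		captured <= sync_out;	// sync_out is the output of the synchronizer chain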

Your library designer should be able to calculate the chance of synchronizer failure for a particular flop, clocked at a particular frequency, sampling another signal of a particular rise time toggling at another frequency (it's usually a pain, and they often don't like to do it). The result will be a failure rate for a flop - say 0.01% - and a chain of 3 flops might then get a failure rate of roughly 0.01% x 0.01% x 0.01% (it's not really quite that simple - probably it should be spiced too).
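For reference, the standard first-order model for this sort of calculation (the well-known textbook form, not anything specific to this article) is:

	MTBF = e^(t/tau) / (T0 * Fclk * Fdata)

where t is the settling time you allow before the output is used (roughly one destination clock period per extra synchronizer stage), tau and T0 are characterization constants for the flop (its resolution time constant and metastability window), Fclk is the sampling clock frequency, and Fdata is the average toggle rate of the signal being sampled. The exponential is why each extra synchronizer stage buys orders of magnitude of MTBF.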

The result of this is the statistical chance of a particular path propagating metastability into your design. You can change this by changing the number of synchronizers, the frequency ratios, etc. If you know that a signal being sampled isn't changing on every clock edge (say only every 16th clock edge) you can use that lower frequency in the calculation, even though it's being driven from a faster clock.

Now suppose you have a 0.001% chance of a path failing on a particular clock edge: your MTBF for a 1MHz clock is going to be 100,000 clocks, or about 1/10 of a second - obviously you are probably going to need a much smaller chance of failure.

Now you know the chance of a particular path failing: count the number of paths and add up all their chances of failure to get a reasonable approximation of the chance of your whole chip failing (and from that its MTBF).
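In MTBF terms, failure rates simply add, so the chip-level figure is roughly:

	1/MTBF(chip) = 1/MTBF(path 1) + 1/MTBF(path 2) + ...

which is just another way of saying "add up the per-path chances of failure".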

Finally, think about the consequences of a chip failure. Is there a person in a hospital bed connected to it? Is it an IO device on a PC where the user will just swear once a year at a wedged system and reboot? Is your IO device a disk drive that's quietly mangling a customer's data? How does the MTBF of your device compare with the rest of the system (for example, if Windows crashes every day or so, would anyone even notice)? Are lawyers likely to get involved? This is an area where you have to make some bets on your company's future (best to have your boss sign off on this one :-).

Synchronizing multiple bit values

This is really a whole topic in its own right - but it is related to the issue of synchronizer failure.

Imagine you're building a FIFO that moves data between clock domains: you're writing data in one clock domain and reading it in another. Somewhere deep inside there are two pointers - one shows where data is written in one domain, the other shows where it is read in the other - in essence each is a bunch of flops clocked in its respective domain.

Now imagine that the FIFO needs an indicator at the write port as to whether there is room in the FIFO for more data to be written - this can be figured out by comparing the values of the two pointers (the read side has a similar issue in determining whether data is available to be read).

The problem with comparing the two pointers is that they are in different clock domains. If you sample the read pointer in the write clock domain you might sample it at just the point where it's changing, injecting metastability into your logic - so the obvious first thing to do is to synchronize the bits of the read pointer through a set of synchronizer flops into the write clock domain before comparing them with the write pointer.

The problem is that this doesn't work - even if you reduce the chance of metastability to acceptable levels you may still sample the data wrongly. Consider the case where the read pointer has the binary value 0111 and is in the process of changing to 1000 at the time it's sampled: if each bit is separately synchronized, some bits may be sampled just before the change while others are sampled just after it. Because all the bits are changing in this case, you might sample the read pointer as 0000, 1111, or in fact ANY possible combination of bits.

So how do you do it? The most important thing to realize is that you must synchronize only 1 bit in order to pass the information safely to the other side. A common way to do this is to have logic that, when the read pointer changes, takes a copy that remains stable until the logic in the other clock domain has sampled it. For example, in Verilog:


	reg	wr_ack;		// write side copy of the ack state
	wire	sync_wr_ack;	// wr_ack synchronized into the read domain
	synchronizer_flops	s1(.clk(rdclk), .in(wr_ack), .out(sync_wr_ack));
	reg	rd_wr_ack;	// read side copy of the ack state

	reg	rd_send;	// read side copy of the send state
	wire	sync_rd_send;	// rd_send synchronized into the write domain
	synchronizer_flops	s2(.clk(wrclk), .in(rd_send), .out(sync_rd_send));
	reg	wr_rd_send;	// write side copy of the send state

	reg	rd_pending;	// read side flag - a pointer update is in flight

	reg	 [ 3: 0]rd_pnt;		// read pointer
	reg	 [ 3: 0]rd_pnt_copy;	// read pointer copy - remains stable during synchronization
	reg	 [ 3: 0]wr_rd_pnt;	// read pointer copy - synchronized to the write domain

	always @(posedge rdclk)
	if (rd_reset) begin
		rd_pending <= 0;
		rd_pnt_copy <= rd_pnt;
		rd_wr_ack <= sync_wr_ack;
		rd_send <= 0;
	end else				// if we detect we should send a new pointer
	if (rd_pnt != rd_pnt_copy && (!rd_pending || sync_wr_ack != rd_wr_ack)) begin
		rd_pnt_copy <= rd_pnt;		// copy the pointer
		rd_wr_ack <= sync_wr_ack;	// remember the ack state
		rd_pending <= 1;		// remember we're sending
		rd_send <= ~rd_send;		// signal to the other side
	end

	always @(posedge wrclk)
	if (wr_reset) begin
		wr_rd_pnt <= rd_pnt_copy;
		wr_ack <= 0;
		wr_rd_send <= sync_rd_send;
	end else
	if (wr_rd_send != sync_rd_send) begin	// if we detect a send transition
		wr_rd_pnt <= rd_pnt_copy;	// copy the pointer
		wr_rd_send <= sync_rd_send;	// remember the send state
		wr_ack <= ~wr_ack;		// send an ack
	end

Notice that we're signalling with EDGEs (transitions) here, not levels - this is important: not only does it make for simpler logic, but it halves the number of transitions on the signal being synchronized, reducing the chances of metastability.

The above is not the only way to perform this operation - it's just an example. The main downside of this sort of mechanism is the latency (you must synchronize in both directions before you can pass more data onward). Reducing the chances of metastability almost always comes at the expense of latency. One result in the FIFO example is that some FIFO entries may be unusable at times - effectively making the FIFO shorter than you had expected.

One solution to this that I like is to decode the FIFO pointers to a one-hot encoding:


	reg	[15:0]onehot;		// one-hot decode of the read pointer

	always @(posedge rdclk)
		onehot <= 16'd1 << rd_pnt;

	wire	[15:0]sync_onehot;
	synchronizer_flops	s00(.clk(wrclk), .in(onehot[0]),  .out(sync_onehot[0]));
	....
	synchronizer_flops	s15(.clk(wrclk), .in(onehot[15]), .out(sync_onehot[15]));
		
	always @(posedge wrclk)
	casez(sync_onehot)
	16'b????_????_????_??01:	wr_rd_pnt <= 0;
	16'b????_????_????_?01?:	wr_rd_pnt <= 1;
	16'b????_????_????_01??:	wr_rd_pnt <= 2;
	....
	16'b1???_????_????_???0:	wr_rd_pnt <= 15;
	default:			wr_rd_pnt <= wr_rd_pnt;
	endcase

This can use a lot more gates, but it greatly reduces the signalling rate on each line - each bit only toggles as the pointer passes its value - which offsets the larger number of synchronizers and keeps the overall failure rate down.

It works by being careful about what happens on a failure. Consider what happens when rd_pnt goes from 0 to 1: bits 1:0 of onehot go from 01 to 10 (11 and 00 are illegal combinations). When sampled on the write side, however, all 4 combinations of bits 1:0 might be seen. For 10 and 01 the right thing happens; for 11 the case statement above reads it as if 10 had been detected; and if 00 is sampled nothing happens - the output retains its old value (presumably the value decoded before the transition). Care must be taken here, as with a fast enough read clock you might see multiple 00s - but this is OK, because eventually the FIFO will empty and the read pointer will stop changing. If this is a problem you can instead stuff 'onehot' with a value where half the bits are 1 and half are 0 and let them chase each other around.

Another possible solution to the multi-bit synchronizer problem is to make sure that only one bit of the source value changes per source clock - and to make sure the destination can handle every combination it might see with arbitrary synchronization. For the FIFO example above this might involve using a Gray-code counter (which changes only 1 bit per clock, so every possible mis-synchronization still yields either the old or the new pointer value). In my experience the downside is that if you want to calculate the difference between pointers you have to convert back to a more conventional encoding.
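As a minimal sketch of the Gray-code approach (names like rd_advance are illustrative, and this shows only the source-domain side - the Gray-coded value would then go through per-bit synchronizer flops like any other signal):

	reg	[ 3: 0]rd_pnt_bin;	// binary read pointer - handy for arithmetic
	reg	[ 3: 0]rd_pnt_gray;	// Gray-coded copy - only one bit changes per increment

	wire	[ 3: 0]rd_pnt_next = rd_pnt_bin + 1;

	always @(posedge rdclk)
	if (rd_reset) begin
		rd_pnt_bin  <= 0;
		rd_pnt_gray <= 0;
	end else
	if (rd_advance) begin			// hypothetical 'advance the pointer' strobe
		rd_pnt_bin  <= rd_pnt_next;
		rd_pnt_gray <= rd_pnt_next ^ (rd_pnt_next >> 1);	// binary to Gray
	end

On the far side, bit i of the binary value is the XOR of all the Gray bits from i upward - that is the "convert back to a more conventional encoding" step mentioned above.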

But wait .... I use asynchronous resets ...

And I bet they work MOST of the time too .....

These can fail too - the reset signal can go away just before a clock edge, causing one of the internal feedback loops to enter a metastable state (flops with asynchronous resets are often implemented with a circuit roughly equivalent to the one shown above, but with one of the inverters in each half replaced by a nand or nor gate).

"But everyone uses asynchronous resets, right?" Well, a lot of people still do - and they mostly seem to work. The reason is that they are usually driven from an external reset signal, which has a VERY low frequency - on the order of hours, days or years - so even though the chance of metastability on any given reset removal isn't zero, reset removal happens so rarely that the resulting MTBF is extremely long (longer than the chip's expected lifetime). This is an example of where figuring out the real frequency of the thing you are synchronizing, when calculating MTBF, can be really useful.

But if you are using asynchronous reset for something other than a power-on reset - from a signal whose frequency isn't glacial - you may have a real problem: time to do some math, or to redesign.

There's a second problem with async resets that's akin to the multiple-bit synchronization problem described above: if you have multiple flops with asynchronous resets it's quite possible that the removal of reset (you probably don't care about the application of reset) will not be recognized by all the flops in the same clock cycle. In this case it may be worthwhile having a circuit that keeps things idle for a couple of clocks after reset (of course that would be a synchronous reset circuit ... so why not just synchronize the reset signal through a flop like any other signal and use that to drive synchronous resets to the rest of the flops?).
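A minimal sketch of that last suggestion (module and signal names are illustrative; this is the common 'assert asynchronously, remove synchronously' arrangement, so reset still takes effect even if the clock isn't running):

	module reset_sync(
		input	wire	clk,
		input	wire	async_rst_n,	// external active-low reset
		output	wire	rst_n		// distribute this as a synchronous reset
	);
		reg	r1, r2;

		always @(posedge clk or negedge async_rst_n)
		if (!async_rst_n) begin
			r1 <= 1'b0;		// reset asserts immediately, clock or no clock
			r2 <= 1'b0;
		end else begin
			r1 <= 1'b1;		// removal ripples through two flops, giving
			r2 <= r1;		// any metastability a full cycle to resolve
		end

		assign rst_n = r2;
	endmodule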

Even though I've described the evils of asynchronous resets, there are still places where they are genuinely useful - provided you take great care about how they are used. Often these are situations where a chip must do something before a clock is reliably available - for example, if you are using 1-hot transmission-gate muxes and don't want them to get into a bad high-current state after power on.