# Hardware and Petri nets

Alex Yakovlev Univ. Newcastle upon Tyne

> Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

#### Contents and schedule of lectures

- Introduction (Wednesday, 10:30-10:45)
- Hardware modelling with Petri nets (Wednesday, 10:45-12:00)
- Circuit Synthesis (Thursday 10:30-12:00):
   Direct synthesis of Petri nets (Thursday, 10:30-11:15)
   Logic synthesis from STGs (Thursday, 11:15-12:00)
- Analysis and verification (Friday, 10:30-11:15)
- Performance analysis (Friday, 11:15-12:00)

## Main bib references

- A.V. Yakovlev, A.M.Koelmans. Petri nets and digital hardware design, Lectures on Petri nets II: Applications, Advances in Petri Nets, LNCS vol. 1492, Springer 1998, pp. 154-236
- A. Kondratyev, M. Kishinevsky, A. Taubin, J. Cortadella, L. Lavagno. The use of Petri nets for the design and verification of asynchronous circuits and systems, Jour. Cir., Syst. And Comp., vol.8, no.1, Feb 1998, pp. 67-118.
- Hardware Design and Petri Nets (Editors: A. Yakovlev, L. Gomes, L. Lavagno), Kluwer Academic Publishers, March 2000, ISBN 0-7923-7791-5, 344 pp
- J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno and A. Yakovlev, Logic Synthesis of Asynchronous Controllers and Interfaces, Springer, March 2002, ISBN3-540-43152-7
- Hardware Design and Concurrency, Advances in Petri nets, LNCS vol. 2549, Springer, ISBN 3-540-00199-9, 345pp.

## Hardware and Petri Nets: Introduction

Alex Yakovlev Univ. Newcastle upon Tyne

> Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

## Introduction. Outline

- Role of Hardware in modern systems
- Role of Hardware design tools
- Role of a modeling language
- Why Petri nets are good for Hardware
   Design
- History of "relationship" between Hardware Design and Petri nets
- Asynchronous Circuit Design

## Role of Hardware in modern systems

- Technology allows putting 1 billion transistors on a chip
- System on Chip is a reality 1 billion operations per second
- Hardware and software designs are no longer separate
- Hardware becomes distributed, asynchronous and concurrent

#### Role of Hardware design tools

- Design productivity is a problem due to chip complexity and time to market demands
- Need for well-integrated CAD with simulation, synthesis, verification and testing tools
- Modelling of system behaviour at all levels of abstraction with feedback to the designer
- Design re-use is a must but with max technology independence

#### Int. Technology Roadmap for Semiconductors says:

- 2010 will bring a system-on-a-chip with:
  - 4 billion 50-nanometer transistors, run at 10GHz
     Moore's law: steady growth at 60% in the number of transistors per chip per year as the functionality of a chip doubles every 1.5-2 years.
- Technology troubles: process parameter variation, power dissipation (IBM S/390 chip operation PICA video), clock distribution etc. present new challenges for Design and Test
- · But the biggest threat of all is design cost



Role of Modelling Language

- Design methods and tools require good modelling and specification techniques
- Those must be formal and rigorous and easy to comprehend (cf. timing diagrams, waveforms, traditionally used by logic designers)
- Today's hardware description languages allow high level of abstraction
- Models must allow for equivalence-preserving refinements
- They must allow for non-functional qualities such as speed, size and power

## Why Petri nets are good for hardware design

- Finite State Machine is still the main formal tool in hardware design but it may be inadequate for distributed, concurrent and asynchronous hardware
- Petri nets
- simple and easy to understand graphical capture
- modelling power adjustable to various types of behaviour at different abstraction levels
- formal operational semantics and verification of correctness (safety and liveness) properties
- possibility of mechanical synthesis of circuits from net models

#### A bit of history of their "marriage"

- 1950's and 60's: Foundations (Muller & Bartky, Petri, Karp & Miller, ...)
- 1970's: Toward Parellel Computations (MIT, Toulouse, St. Petersburg, Manchester ...)
- 1980's: First progress in VLSI and CAD, Concurrency theory, Signal Transition Graphs (STGs)
- 1990's: First asynchronous design (verification and synthesis) tools: SIS, Forcage, Petrify
- 2000's: Powerful asynchronous design flow (incl. hardware-software codesign and system-on-chip design)

#### Introduction to Asynchronous Circuits

- · What is an asynchronous circuit?
  - Physical (analogue) level
  - Logical level
- Speed-independent and delay-insensitive circuits
- Why go asynchronous?
- · Why control logic?
- · Role of Petri nets
- · Asynchronous circuit design based on Petri nets

#### What is an asynchronous circuit

- · No global clock; circuits are self-timed or self-clocked
- Can be viewed as hardwired versions of parallel and distributed programs statements are activated when their guards are true
- No special run-time mechanism the "program statements" are physical components: logic gates, memory latches, or hierarchical modules
- Interconnections are also physical components: wires, busses





## Physical (Analogue) level

- Strict view: an asynchronous circuit is a (analogue) dynamical system – e.g. to be described by differential equations
- In most cases can be safely approximated by logic level (0-to-1 and 1-to-0 transitions) abstraction; even hazards can be captured
- For some anomalous effects, such as metastability and oscillations, absolute need for analogue models
- · Analogue aspects are not considered in this tutorial

#### Logical Level

- Circuit behaviour is described by sequences of up (0-to-1) and down (1-to-0) transitions on inputs and outputs
- The order of transitions is defined by causal relationship, not by clock (*a causes b*, directly or transitively)
- The order is partial if concurrency is present
- Two prominent classes of async circuits: speedindependent (work for any gate delay variations) and delay-insensitive (for both gate and wire delays)
- A class of async timed (not clocked!) circuits allows special timing order relations (*a occurs before b*, due to delay assumptions)









#### Why asynchronous is good

- Performance (work on actual, not max delays)
- Robustness (operationally scalable; no clock distribution; important when gate-to-wire delay ratio changes)
- Low Power ('change-based' computing fewer signal transitions)
- Low Electromagnetic Emission (more even power/frequency spectrum)
- Modularity and re-use (parts designed independently; well-defined interfaces)
- · Testability (inherent self-checking via ack signals)

#### Obstacles to Async Design

- Design tool support commercial design tools are aimed at clocked systems
- Difficulty of production testing production testing is heavily committed to use of clock
- Aversion of majority of designers, trained 'with clock' biggest obstacle
- Overbalancing effect of periodic (every 10 years) 'asynchronous euphoria'

#### Why control logic

- Customary in hardware design to separate control logic from datapath logic due to different design techniques
  Control logic implements the control flow of a (possibly concurrent)
- algorithm • Datapath logic deals with operational part of the algorithms
- Datapath operations may have their (lower level) control flow elements, so the distinction is relative
- Examples of *control-dominated* logic: a bus interface adapter, an arbiter, or a modulo-N counter
- Their behaviour is a combination of partial orders of signal events
  Examples of *data-dominated* logic are: a register bank or an arithmetic-logic unit (ALU)

#### Role of Petri Nets

- · We concentrate here on control logic
- Control logic is behaviourally more diverse than data path
- Petri nets capture causality and concurrency between signalling events, deterministic and non-deterministic choice in the circuit and its environment
- They allow:
  - composition of labelled PNs (transition or place sync/tion)
     refinement of event annotation (from abstract operations down to signal transitions)
  - use of observational equivalence (lambda-events)
  - clear link with state-transition models in both directions



## Hardware and Petri Nets: Modelling

Alex Yakovlev Univ. Newcastle upon Tyne

> Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

#### **Modelling.Outline**

- High level modelling and abstract refinement; processor example
- Low level modelling and logic synthesis; interface controller example
- Modelling of logic circuits: event-driven and level-driven parts
- · Properties analysed



## High-level modelling:Processor Example

- Details of further refinement, circuit implementation (by direct translation) and performance estimation (using UltraSan) are in:
- A. Semenov, A.M. Koelmans, L.Lloyd and A. Yakovlev. Designing an asynchronous processor using Petri Nets, IEEE Micro, 17(2):54-64, March 1997
- For use of Coloured Petri net models and use of Design/CPN in processor modelling: F.Burns, A.M. Koelmans and A. Yakovlev. Analysing superscalar processor architectures with coloured Petri nets, Int. Journal on Software Tools for Technology Transfer, vol.2, no.2, Dec. 1998, pp. 182-191.



























































### Modelling. Conclusions

- · Choosing the right level of modelling is crucial
- Refinement of Petri net models and interpretation can be used in hardware design
- Petri nets are too abstract to capture analogue phenomena in circuits
- However, non-persistence or non-safeness can (conservatively) approximate the possibility of hazards

## Hardware and Petri Nets: Synthesis

Alex Yakovlev Univ. Newcastle upon Tyne

> Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

## **Tutorial Outline**

- Introduction
- Modeling Hardware with PNs
- Synthesis of Circuits from PN specifications
- Circuit verification with PNs
- Performance analysis using PNs

Hardware Design and Petri Nets – Adv. Tutorial

## Synthesis.Outline

- Abstract synthesis of Labelled PNs (LPNs) from causality constraints and transition systems
- Handshake and signal refinement (LPN-to-STG)
- Direct translation of LPNs and STGs to circuits
- Logic synthesis from STGs

## Synthesis from Causality Constraints (compositional approach)

- Behaviour defined in terms of *Causality Constraints* - characteristic predicates defined on traces
- These constraints produce LPN "snippets"
- Construction of LPNs as compositions of snippets
- Examples: n-place buffer, 2-way merge























#### Synthesis from transition systems

- Modelling behaviour in terms of a sequential capture *Transition System*
- Synthesis of LPN (distributed and concurrent object) from TS (using *theory of regions*)
- Examples: one place buffer, counterflow pp





































## Synthesis from process-based languages

- Modelling behaviour in terms of a process (-algebraic) specifications (CSP, ...)
- Synthesis of LPN (concurrent object with explicit causality) from process-based model (concurrency is explicit but causality implicit)
- Examples: modulo-N counter

#### Refinement at the LPN level

- Examples of refinements:
  - Introduction of "silent" events
  - Handshake refinement
  - Signalling protocol refinement (return-to-zero versus non-returnto-zero)
  - Arbitration refinement

All these refinements must preserve behavioural equivalence (discussed below) and some other properties at the STG level (discussed later)

• What is implemented in Petrify and what isn't (yet)



































- After appropriate refinements have been made one can translate Labelled Petri nets (or Signal Transition Graphs) into circuits
- Either by syntax-direct translation (discussed below)
- Or by using Logic Synthesis (discussed later)

#### Why direct translation?

- Direct translation has linear complexity but can be area inefficient (inherent one-hot encoding)
- Logic synthesis has problems with state space explosion, repetitive and regular structures (logbased encoding approach)

#### Direct Translation of Petri Nets

- Previous work dates back to 70s
- Synthesis into event-based (two-phase) circuits (similar to Sutherland's micropipeline control)

   S.Patil, F.Furtek (MIT)
- S.Palli, F.Furlek (IVIT)
- Synthesis into *level-based (4-phase)* circuits (similar to synthesis from one-hot encoded FSMs)
  - R. David ('69, translation FSM graphs to CUSA cells)
  - L. Hollaar ('82, translation from parallel flowcharts)
     V. Varshavsky et al. ('90,'96, translation from PN into
  - an interconnection of David Cells)

#### Synthesis into event-based circuits

- · Patil's translation method for simple PNs
- Furtek's extension to 1-safe net
- "Pragmatic" extensions to Patil's set (for nonsimple PNs)
- Examples: modulo-N counter, Lazy ring adapter

































#### Synthesis into level-based circuits

- David's method for asynchronous Finite State Machines
- Holaar's extensions to parallel flow charts
- Varshavsky's method for 1-safe Petri nets: based on associating *places with latches*
- Examples: counter, VME bus, butterfly circuit











## Varshavsky's Approach

- This method associates *places* with *latches* (flip-flops) so the state memory (marking) of PN is directly mimicked in the circuit's state memory
- Transitions are associated with controlled actions (e.g. activations of data path units or lower level control blocks by using handshake protocols)
- Modelling discrepancy (be careful!):
  - in Petri nets removal of a token from pre-places and adding tokens in post-places is *instantaneous* (i.e. *no intermediate states*)
  - in circuits the "move of a token" has a *duration* and *there is an* intermediate state



## Direct translation examples

In this work we tried direct translation:

- From STG-refined specification (VME bus controller)

   Worse than logic synthesis
- From a largish abstract specification with high degree of repetition (mod-6 counter)
- Considerable gain to logic synthesis
- From a small concurrent specification with dense coding space ("butterfly" circuit)
  - Similar or better than logic synthesis



















"Butterfly" circuit

a+ (1



"Butterfly" circuit

DC-circuit with aggressive use of timing assumptions:

٦XZ



## Hardware and Petri Nets: Verification of Asynchronous Circuits using Partial Order Techniques

Alex Yakovlev Univ. Newcastle upon Tyne

> Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

## Outline

- Representing Petri net semantics with occurrence nets (unfoldings)
- Unfolding (finite) prefix construction
- Analysis of asynchronous circuits
- Problems with efficient unfolding

## Approaches to PN analysis

- Reachable state space:
  - Direct or symbolic representation
  - Full or reduced state space (e.g. stubborn set method)

in both cases knowledge of Petri net structural relations (e.g. conflicts) helps efficiency

- Unfolding the Petri net graph into an acyclic branching graph (occurrence net), with partial ordering between events and conditions and:
  - Considering a finite prefix of the unfolding which covers all reachable states and contains enough information for properties to be verified

























## Truncation of unfolding

- At some point of unfolding the process begins to repeat parts of the net that have already been instantiated
- In many cases this also repeats the markings in the form of cuts
- The process can be stopped in every such situation
- Transitions which generate repeated cuts are called cut-off points or simply *cut-offs*
- The unfolding truncated by cut-off is called prefix





| Prefix Construction Algorithm                                  |  |
|----------------------------------------------------------------|--|
| <b>Proc</b> Build prefix ( $N = \langle P, T, F, M0 \rangle$ ) |  |
| Initialise N' with instances of places in MO                   |  |
| Initialise Queue with instances of t enabled at M0             |  |
| while Queue is not empty do                                    |  |
| Pull t' from Queue                                             |  |
| if t' is not cutoff then do                                    |  |
| Add t' and $succ(t')$ to N'                                    |  |
| for each $t$ in $T$ do                                         |  |
| Find unused set of mutually concurrent<br>instances of pred(t) |  |
| if such set exists then do                                     |  |
| Add t' to Queue in order of its prehistory size                |  |
| end do                                                         |  |
| end do                                                         |  |
| end do                                                         |  |
| return N'                                                      |  |
| end proc                                                       |  |

## Cut-off definition

- A newly built transition instance *t1*' in the unfolding is a cut-off point if there exists another instance *t2*' (of possibly another transition) whose:
  - Final cut maps to the same marking is the final cut of  $t1^\prime\!,$  and
  - The size of prehistory (local configuration) of t2' is strictly greater than that of t1'

[McMillan, 1992]

 Initial marking and its min-cut are associated with an imaginary "bottom" instance (so we can cut-off on t7 in our example)



## Complexity issues

- The prefix covers all reachable markings of the original net but the process of prefix construction does not visit all these markings
- Only those markings (sometimes called *Basic Markings*) are visited that are associated with the final cuts of the local configurations of the transition instances
- These markings are analogous to primes in an algebraic lattice
- The (time) complexity of the algorithm is therefore proportional to the size of the unfolding prefix
- For highly concurrent nets this gives a significant gain in efficiency compared to methods based on the reachability graph









## Cut-off Criteria

- McMillan's cutoff criterion, based on the size of pre-history, can be too strong
- A weaker criterion, based only on the matching of the final cuts, was proposed by Esparza, Vogler, and Römer
  - It uses a *total* (lexicographical) *order* on the transition set (when putting them into *Queue*)
  - It can be *only* applied to 1-safe nets because for non-1-safe nets such a total order cannot be established (main reason auto-concurrency of instances of the same transition!)
- Unfolding k-safe nets can produce a lot of redundancy



- A model-checker to verify a CTL formula (defined on place literals) has been built (Esparza) within the PEP tool (Hildesheim/Oldenburg)
- Various standard properties, such as k-boundedness, 1safeness, persistency, liveness, deadlock freedom have special algorithms, e.g.:
  - Check for 1-safeness is a special case of auto-concurrency (whether a pair of place instances exist that are mutually concurrent – can be done in polynomial time)
  - Similar is a check for persistency of some transition (analysis of whether it is in immediate conflict with another transition)
  - Check for deadlock is exponential (McMillan) involves enumeration of configurations (non-basic markings), however efficient linear-algebraic techniques have recently been found by Khomenko and Koutny (CONCUR'2000)



- Unfolding an interpreted Petri net, such as a Signal Transition Graph, requires keeping track of the interpretation – each transition is a change of state of a signal, hence each marking is associated with a *binary state*
- The prefix of an STG must not only "cover" the STG in the Petri net (reachable markings) sense but must also be complete for analysing the implementability of the STG, namely: consistency, output-persistency and Complete State Coding









## Verifying STG implementability

- Consistency by detecting signal deadlock via *autoconcurrency* between transitions labelled with the same signal (a\* || a\*, where a\* is a+ or a-)
- Output persistency by detecting *conflict* relation between output signal transition a\* and another signal transition b\*
- Complete State Coding is less trivial requires special theory of binary covers on unfolding segments (Kondratyev et.al.)





## Analysis of Circuit Petri Nets

- Petri net models built for event-based and level-based elements, together with the models of the environment can be analysed using the STG unfolding prefix
- The possibility of hazards is verified by checking either 1-safeness (for eventbased) or persistency (for level-based) violations

| example     | #stages | #places | #trans | #states  | BDD       | final cizo | timo  | Prefix<br>#places | iltrane | fimo |   |
|-------------|---------|---------|--------|----------|-----------|------------|-------|-------------------|---------|------|---|
|             |         |         |        |          | peak size | lindi Size | ume   | #places           | #lidits | ume  |   |
| philosoph   | 20      | 140     | 100    | 2.20E+13 | none      | 3091       | 10    | 140               | 100     |      | 1 |
| philosoph   | 40      | 280     | 200    | 2.90E+19 | none      | 251839     | 455   | 280               | 200     |      | 1 |
| philosoph   | 50      | 350     | 250    | too many | none      | 1870847    | >4hrs | 350               | 250     |      | 1 |
| philosoph   | 60      | 420     | 300    | too many | none      | none       | none  | 420               | 300     |      | 1 |
| muller-pipe | 30      | 120     | 60     | 6.00E+07 | 7897      | 4784       | 132   | 490               | 240     |      | 1 |
| muller-pipe | 45      | 180     | 90     | 6.90E+11 | 23590     | 10634      | 740   | 1035              | 510     |      | 2 |
| muller-pipe | 60      | 240     | 120    | 8.40E+15 | 53446     | 18788      | 3210  | 1780              | 880     |      | 4 |
| dme-arbiter | 20      | 81      | 80     | 2.20E+07 | 1688      | 1688       | 11    | 81                | 80      |      | 1 |
| dme-arbiter | 40      | 161     | 160    | 4.50E+13 | 6568      | 6568       | 101   | 161               | 160     |      | 1 |
| dme-arbiter | 60      | 241     | 240    | 7.00E+19 | 14648     | 14648      | 342   | 241               | 240     |      | 1 |
|             |         |         |        |          |           |            |       |                   |         |      |   |





















| Buffer | k-bound  | ed net with Mcr<br>unfolding | nillan's    | safe nets using ERV's algorithm |                    |        |  |
|--------|----------|------------------------------|-------------|---------------------------------|--------------------|--------|--|
| size   | Original | Unfolding<br>(t/p)           | time(<br>s) | Original                        | Unfolding<br>(t/p) | time(s |  |
| 2      | 6/8      | 184/317                      | 0.05        | 8/14                            | 29/68              | 0.03   |  |
| 3      | 6/8      | 1098/1896                    | 0.84        | 10/18                           | 46/105             | 0.10   |  |
| 4      | 6/8      | 6944/11911                   | 21.46       | 12/22                           | 67/150             | 0.28   |  |
| 5      | 6/8      | -                            | -           | 14/26                           | 92/203             | 0.7    |  |
| 6      | 6/8      | -                            | -           | 16/30                           | 121/264            | 1.8    |  |
| 7      | 6/8      | -                            | -           | 18/34                           | 154/333            | 4.2    |  |
| 8      | 6/8      | -                            | -           | 20/38                           | 191/410            | 8.74   |  |

I Infolding k-safe Nets

#### Unfolding k-safe Nets

| Buffer | safe nets | using ERV's alg    | gorithm     | FIFO-unfolding with McMillan's |                    |             |  |
|--------|-----------|--------------------|-------------|--------------------------------|--------------------|-------------|--|
| size   | Original  | Unfolding<br>(t/p) | time(<br>s) | Original                       | Unfolding<br>(t/p) | time(s<br>) |  |
| 2      | 8/14      | 29/68              | 0.03        | 6/8                            | 41/69              | 0.010       |  |
| 3      | 10/18     | 46/105             | 0.10        | 6/8                            | 41/68              | 0.010       |  |
| 4      | 12/22     | 67/150             | 0.28        | 6/8                            | 51/84              | 0.020       |  |
| 5      | 14/26     | 92/203             | 0.74        | 6/8                            | 61/100             | 0.020       |  |
| 6      | 16/30     | 121/264            | 1.84        | 6/8                            | 71/116             | 0.040       |  |
| 7      | 18/34     | 154/333            | 4.25        | 6/8                            | 81/132             | 0.050       |  |
| 8      | 20/38     | 191/410            | 8.74        | 6/8                            | 91/148             | 0.060       |  |
|        |           |                    |             |                                |                    |             |  |

## Conclusion

- Unfolding can be very efficient where a lot of concurrency and little choice involved
- However unfolding may be very inefficient can be "excessively resolving" (e.g. individualise tokens in ksafe nets or split self-loops) and thus spawn too many branches in history
- Other forms of unfolding can be studied (e.g. nonaggressive unfolding of places – building DAGs instead of branching processes)
- Unfoldings have also been used to analyse nets with time annotation and for synthesis of circuits but these are hot research topics – Haway the lads!

## Hardware and Petri Nets: Performance Analysis

#### Alex Yakovlev Univ. Newcastle upon Tyne

Advanced Course on Petri nets, Eichstätt, 24-26 Sept, 2003

### Outline

- Performance analysis of asynchronous circuits: a motivating example
- · Delay types in asynchronous designs
- Main approaches: Deterministic vs Probablistic
- Generalised Timed PNs and Stochastic PNs
- Application examples
- · Open problems

## Performance issues in async design

- No global clocking does not mean async designers needn't care about timing!
- Knowledge of timing in async design helps to construct circuits with higher performance and smaller size
- Performance of async circuits depends on:
- 1) delay distribution of datapath components
- 2) overhead of completion detection
- 3) its micro-architecture and control flow
- Our focus is on 3) , where behavioural modelling with Petri nets can be applied
- Important tradeoff: degree of concurrency (adds speed) vs control complexity (reduces speed and increases size)





























#### Main approaches to perf. analysis

Two methodologically different approaches:

- · Deterministic (delay information known in advance), sometimes the element of unknown is represented by delay intervals. Performance values are computed precisely (even if within lower/upper bounds or by average values). Good for hard-real time systems or for detailed, low level circuit designs where absolute performance parameters are important
- Probabilistic (delay information defined by distribution functions, standard or arbitrary pmf). Performance is estimated only approximately, mostly to assess and compare alternative design solutions at early stages of system design, where relative performance factors are needed. They may also be useful for guiding synthesis



Timed Petri nets - early models by Ramchandani (MIT-TR, 1974) and Ramamoorthy&Ho (IEEE Trans SE1980) Key result (for marked graphs): Proof based on



C - average cycle time (lower) bound

Nk - total number of tokens in cycle k

q-number of cycles in the net

 $T_k$  – sum of execution times of transitions in cycle k

 $= \max \{$ N

where

- (1) No. of tokens in every cycle of an MG is constant (Commoner et al)
- All transitions in an (2)MG have the same cycle time

A polynomial algorithm for verification of  $CN_k - T_k \ge 0$  condition (based on Floyd algorithm); see also Nielsen&Kishinevsky(DAC'94)

Method can also be used for safe persistent nets but proved NP-complete for general nets











### Probabilistic approach

Sources of non-determinism:

- Environment may offer choice (e.g. Read/Write modes in VME bus interface, instruction decoding in a CPU) => probabilistic choice b/w transitions (cf. frequencies in TPNs)
- Data path or environment delays may have stochastic nature (e.g. delay distribution in carrychain, or user think time distribution)
- Gate delays may be modelled using specific pdf/pmf's to allow for uncertainty in low-level implementation (layout and technology parameter variations)
- 2) and 3) => firing time distributions in Stochastic Petri nets (SPNs)





## Generalised Stochastic PNs

- Transitions with probabilistic (continuous) firing time were introduced in Stochastic Petri nets (SPNs)by Molloy (IEEE TC-31,82) and in GSPN by Marsan, Balbo&Conte (ACM TCS-2,84)
- Fining time can either be zero (immediate transitions) or exponential distributed (for Markovian properties of the reachability graph); Immed. transitions have higher priority
- More extensions have been introduced later leading to Generally Distributed Timed Transitions SPNs (GDTT-SPN) – see Marsan, Bobbio&Donatelli's tutorial in Adv.Lectures 98
- Analysis of GSPN based on:
   (1) constructing a reachability grade
  - (1) constructing a reachability graph with transition rates, thus generating a continnuous time Markov Chain, and
  - (2) computing performance measures from CTMC analysis



#### Comparison b/w GTPN and GSPN

| Modelling<br>/Analysis<br>Feature  | GTPN                                                                                                                   | GSPN                                                                                                                                         |
|------------------------------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| Conflict resolution                | Static (conflicts resolved before<br>firing – using firing frequencies).<br>Good to model free<br>(environment) choice | Dynamic (competing transition<br>delays – transition which fires first<br>wins conflict). Good to model races<br>and arbitration in hardware |
| Probability/<br>rate<br>assignment | Probabilities are assigned<br>according to transitions<br>frequencies independent of<br>transition delays              | Probabilities/rates are assigned<br>according to competing transition<br>delays for non-immediate<br>transitions                             |
| Firing<br>durations                | Arbitrary non-negative reals plus geometric holding times                                                              | Extended SPN (ESPN) can have<br>arb. holding times but at the cost of<br>restricted reach graphs                                             |
| Complexity                         | Due to inherent complexity of deterministic delays, their state space is larger                                        | Discrete-time SPNs can have large<br>state spaces when deterministic<br>holding times are used                                               |

What is needed for async hardware?

Asynchronous circuit modelling requires:

- both deterministic and stochastic delay modelling,
- stochastic static ("free-choice") and dynamic (with races) conflict resolution
- competing (with races) transitions with deterministic timing
- Any idea of a tractable model with these features?

### Recent application examples

These are examples of using PNs in analytic and simulation environments:

- Use of unfoldings (tool PUNT) and SPNs (tool UltraSan) for performance estimation of a CPU designed with PNs (Semenov,etal, IEEEMicro,1997)
- Multi-processor, multi-threaded architecture modelling using TPNs (Zuberek, HWPN'99)
- Response time (average bounds) analysis using STPNs and Monte-Carlo, for Instruction length Decoder; developed tool PET (Xie&Beerel, HWPN'99)
- Analysis of data flow architectures using tool ExSpect (Witlox etal, HWPN'99)
- Modelling and analysis of memory systems using tool CodeSign (Gries, HWPN'99)
- Superscalar processor modelling and analysis using tool Design/CPN (Burns,etal,J.ofRT,2000)
- SPN modelling and quantification of fairness in arbiter analysis using tool GreatSPN (Madalinski,etal,UKPEW'00)

#### Conclusions

- Asynchronous circuits, whether speed-independent or with timing assumptions/constraints, require flexible and efficient techniques for performance analysis
- The delay models cover both main types: deterministic, stochastic (with different pdf/pmf's) and must allow for races; conflicts both static and dynamic
- Clearly two different levels of abstraction need to be covered – logic circuit (STG) level and abstract behaviour (LPN) level; those often have different types of properties to analyse
- The number of async IP cores (for Systems-on-Chip) are on the increase in the near future, so big help from performance analysis is urgently needed to evaluate these new core developments

## References(1)

Asynchronous Hardware - Performance Analysis

- S.M. Burns, Performance analysis and optimisation of asynchronous circuits, PhD thesis, Caltech, Dec. 1990.
- M.R. Greenstreet, and K. Steiglitz, Bubbles can make self-timed pipelines fast,
- Journal of signal processing, 2(3), pp. 139-148. J. Gunawardena, Timing analysis of digital circuits and the theory of min-max functions, Proc. ACM Int. Symp. On Timing Issues in the Spec. and Synth. of
- H. Hulgaard and S.M Burns Bounded delay timing analysis of a class of CSP programs with choice, Proc. Int. Symp. On Adv. Res. In Async. Cir. and Syst, (ASYNC'94), pp. 2-11.
- C.Nielsen and M. Kishinevsky, Performance analysis based on timing simulation, Proc. Design Automation Conference (DAC'94).
- T. Lee, A general approach to performance analysis and optimization of asynchronous circuits, PhD thesis, Caltech, 1995.
- J. Ebergen and R. Berks, Response time of asynchronous linear pipelines, Proc. Of IEEE, 87(2), pp. 308-318.

## References(2)

Timed and Generalised Timed Petri nets:

- C. Ramchandani, Analysis of asynchronous concurrent systems by Petri nets, MAC TR-120, MIT, Feb. 1974 C.V. Ramamoorthy and G.S. Ho, Performance evaluation of asynchronous concurrent
- systems using Petri nets, IEEE Trans. Soft. Eng., SE-6(5), Sept. 1980, pp. 440-449.
- systems using Petri nets, IEEE Irans. Soft. Eng., SE-6(b), Sept. 1980, pp. 440-449. W.M. Zuberek, Timed Petri nets and preliminary performance evaluation, 7<sup>th</sup> Ann. Symp. On Comput. Architecture, 1980, pp. 88-96. W.M. Zuberek, Timed Petri nets definitions, properties and applications, Microelectronics and Reliability (Special Issue on Petri nets and Related Graph Models), 31(4), pp. 627-644, 1991.
- R.R. Razouk and C.V. Phelps, Performance analysis using timed Petri nets, Proc. 1984
   Int. Conf. Parallel Processing, Aug. 1984, pp. 126-129.
   M.A. Holliday and M. K. Vernon, A generalised timed Petri net model for performance analysis, IEEE Trans. Soft. Eng., SE-13(12), Dec. 1987, pp. 1297-1310. .
- •
- M. K. Molloy, Performance analysis using stochastic Petri nets, IEEE Trans. Comp., C-31(9), Sep. 1982, pp.913-917. S
- M.A. Marsan, G. Babo, and G. Conte, A class of generalized stochastic Petri nets, ACM Trans. Comput. Syst. Vol. 2, pp. 93-122, May 1984.
  M. A. Marsan, A. Bobbio, and S. Donatelli, Petri nets in performance analysis: an introduction, In: Lectures on Petri nets I: Basic Models, LNCS 1491, Springer Verlag, 1998. .

## References(3)

R. R. Razouk, The use of Petri nets for modelling pipelined processors, Proc. 25<sup>th</sup> ACM/IEEE Design Automation Conference (DAC'88), pp. 548-553.
A. Semenov, A.M. Koelmans, L. Lloyd, and A. Yakovlev, Designing an asynchronous processor using Petri nets, IEEE Micro, March/April 1997, pp. 54-64.

- processor using Petri nets, IEEE Micro, March/April 1997, pp. 54-64.
   A. Yakovlev, L. Gomes and L. Lavagno, editors: Hardware Design and Petri nets, Kluwer AP,Boston-Dordrecht, 2000, part V, Architecture Modelling and Petromance Analysis:
   A. Xie and P. A. Beerel, Performance analysis of asynchronous circuits and systems using Stochastic Timed Petri nets, pp. 239-268
   B.R.T.M. Wittox, P. van der Wolf, E.H.L. Aarts and W.M.P van der Aalst, Performance analysis of dataflow architectures using Timed Coloured Petri nets, pp. 269-290.
   M. Gries, Modeling a memory subsystem with Petri nets: a case study, pp. 291-310.
   W. M. Zubare, Performance

- W. M. Zuberek, Performance modelling of multithreaded distributed memory architectures, pp. 311- 331.
   F.Burns, A.M. Koelmans, and A. Yakovlev, WCET analysis of superscalar processors using simulation with Coloured Petri nets, Real-Time Syst., Int. J. of Time-Crit. Comp. Syst., 18(2), May 2000, Kluwer AP,pp.275-288
- A. Madalinski, A. Bystrov and A. Yakovlev, Statistical fairness of ordered arbiters, accepted for UKPEW, Durham, U.K., July 2000