James SW Song is a senior member of the technical staff, OMAP platform architect, and design manager in Texas Instruments' Wireless Terminal Business Unit. He has a BSEE and can be reached at s-song@ti.com.

Besides data paths, acceleration units also contain control functions. With RTL accelerators, control functions are implemented as hardware state machines. By contrast, configurable processors perform control functions in software. Control functions built in hardware run faster than those implemented in software running on conventional processor architectures. However, designers can construct new instructions for a configurable processor that accelerate control functions, so software-based control functions can often approach the performance of hardware state machines.
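
As a rough illustration of what a software-based control function looks like, the sketch below models a three-state machine that scans a byte stream for the 00 00 01 start-code prefix used in MPEG bit streams. The choice of this particular control function, and the names `next_state` and `find_start_codes`, are assumptions made for the example rather than anything taken from a specific accelerator design.

```python
# States record how many bytes of the 00 00 01 prefix have been matched so far.
IDLE, ONE_ZERO, TWO_ZEROS = range(3)


def next_state(state, byte):
    """Advance the state machine by one input byte; return (new_state, found)."""
    if byte == 0x00:
        # A zero extends the run; any run longer than two stays in TWO_ZEROS.
        return (ONE_ZERO if state == IDLE else TWO_ZEROS), False
    if byte == 0x01 and state == TWO_ZEROS:
        # Prefix complete: report it and start over.
        return IDLE, True
    # Any other byte breaks the pattern.
    return IDLE, False


def find_start_codes(data):
    """Return the offset of the byte that follows each 00 00 01 prefix."""
    state = IDLE
    offsets = []
    for i, byte in enumerate(data):
        state, found = next_state(state, byte)
        if found:
            offsets.append(i + 1)
    return offsets
```

A hardware state machine would typically evaluate the equivalent of `next_state` once per clock, whereas a conventional processor spends several compare and branch instructions on each byte. That per-step overhead is the kind of cost a custom instruction on a configurable processor is meant to reduce.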

Many applications also exhibit macroscopic parallelism: they contain independent processes that can execute concurrently, and this parallelism can be exploited by running multiple machines independently. For example, a designer can create multiple hardware accelerators with independent state machines to accelerate MPEG-4 bit-stream coding, allowing the SoC to process multiple video objects concurrently. However, a designer can also exploit the same macroscopic parallelism by using multiple configurable processors to process the video objects with similar performance.
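
The structure of this kind of parallelism can be sketched in a few lines. In the example below, each video object is handed to an independent worker that shares no state with the others. The workers stand in for separate accelerators or configurable processors; `encode_object` is a hypothetical placeholder that computes a simple checksum instead of performing real MPEG-4 coding, and `encode_scene` and its `engines` parameter are likewise assumptions for illustration. Python threads model only the organization of the work, not the speedup that real parallel hardware would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_object(video_object):
    """Hypothetical stand-in for per-object coding: returns (name, checksum)."""
    name, samples = video_object
    checksum = 0
    for sample in samples:
        # Simple rolling checksum; a real engine would run the codec here.
        checksum = (checksum * 31 + sample) & 0xFFFF
    return name, checksum


def encode_scene(video_objects, engines=2):
    """Process each video object on its own worker; results keep input order."""
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return list(pool.map(encode_object, video_objects))
```

Because the objects are independent, the same `encode_object` routine could run unchanged on one engine or on several, which is what makes the parallelism equally accessible to multiple accelerators and to multiple processors.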

If performance is the primary concern, which approach should be used? The RTL accelerator approach may have a performance edge because of its more cycle-efficient control implementation, but not by a huge margin, especially considering the limitations of conventional processor bus interfaces. Configurable processors may achieve comparable performance. Designers must therefore weigh criteria other than performance before selecting an approach.

Considering area

The silicon real estate occupied by logic directly affects chip cost and overall system cost. Area is therefore a major criterion when choosing between a fixed processor with custom RTL hardware accelerators and a configurable processor. By definition, the SoC already includes a processor, so the incremental number of gates required for the acceleration logic is roughly equivalent whether that logic resides inside or outside the processor.


On an FPGA prototype, cycle-accurate code runs orders of magnitude faster than on an ISS. When manufactured even in small quantities, which amortizes the engineering cost, FPGA prototypes are fairly cheap to build, making them ideal vehicles for adoption by large software development teams.

