TDK-Lambda, Inc.

Lane Mason, Principal Memory Analyst at Denali, believes that the trend toward embedded memories, driven by consumer applications such as cell phones and PDAs, will continue. He also expects the explosion of the flash memory market to continue, with flash displacing other forms of memory, as well as magnetic disk media, in a variety of applications.

The total coil losses can be combined into a single loss resistance (Rs) connected in series with the ideal inductance (Ls). This results in the simplified equivalent circuit shown in Figure 2.
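The series model reduces the coil to two numbers, so its impedance and quality factor follow directly: Z = Rs + jωLs and Q = ωLs / Rs, with ω = 2πf. The Python sketch below evaluates both. The 10 µH / 0.5 Ω / 100 kHz values are hypothetical, and the model assumes Rs is a single value valid at the frequency of interest (in a real coil the winding and core losses vary with frequency).

```python
import math


def series_impedance(rs_ohm: float, ls_henry: float, freq_hz: float) -> complex:
    """Impedance of the series equivalent circuit: Z = Rs + j*omega*Ls."""
    omega = 2.0 * math.pi * freq_hz
    return complex(rs_ohm, omega * ls_henry)


def quality_factor(rs_ohm: float, ls_henry: float, freq_hz: float) -> float:
    """Q = omega*Ls / Rs: inductive reactance divided by the lumped loss resistance."""
    omega = 2.0 * math.pi * freq_hz
    return omega * ls_henry / rs_ohm


# Hypothetical coil: 10 uH with 0.5 ohm total loss resistance, evaluated at 100 kHz.
z = series_impedance(0.5, 10e-6, 100e3)
print(f"Z = {z.real:.3f} + j{z.imag:.3f} ohm, |Z| = {abs(z):.3f} ohm")
print(f"Q = {quality_factor(0.5, 10e-6, 100e3):.2f}")
```

A lower Rs at the same frequency raises Q, which is why the lumped loss resistance is the figure of merit the simplified circuit exposes.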


Dedicated Multimedia DSP Capabilities Speed Building Blocks

FST can be applied to most multimedia building blocks and can accelerate them significantly. However, considerable DSP multimedia capabilities are still required to complement FST's acceleration. These capabilities are used by conventional software implementations in the following two scenarios:

1. The remaining 10% of cases, in which a known shape is not recognized and a conventional software implementation has to take over.
2. Multimedia building blocks that FST does not accelerate at all.

The dedicated multimedia instructions significantly accelerate these pure software implementations.


Below is a partial list of different instructions and mechanisms, embedded into the CEVA-X1620 DSP, specifically built to accelerate multimedia functions:

The following example illustrates the significance of these DSP multimedia instructions, in the software implementation of the Motion Compensation component of the H.264 decoding process.


The H.264 standard defines data blocks of 4×4 pixels and a motion compensation process based on a quarter-pixel interpolation algorithm. In short, H.264 allows very fine motion vectors, which can shift a block not only by whole pixels in any direction but also by half and quarter pixels. For that purpose, the motion compensation process of the decoder first has to generate 15 interpolated pixels within each square formed by 4 existing pixels, as shown in the diagram (G, H, M, N: 4 full pixels; b, h, j: 3 half pixels; a, c, d, e, f, g, i, k, n, p, q, r: 12 quarter pixels). Calculating these interpolated pixels requires heavy filtering and averaging: the half pixels are produced by a 6-tap filter (applied in both dimensions for j), and the quarter pixels are simple averages of neighboring full and half pixels.
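To make the per-pixel work concrete, here is a minimal Python sketch of the horizontal case for 8-bit luma: the half pixel b between G and H uses the 6-tap filter (1, −5, 20, 20, −5, 1) over the row neighbours E, F, G, H, I, J (the standard's labels for the full pixels on either side), followed by rounding, a 5-bit shift and clipping; the quarter pixels a and c are rounded averages. The sample row values are hypothetical, and the 2D case for j (filtering the unclipped intermediate results a second time) is deliberately omitted.

```python
def clip1(x: int, bit_depth: int = 8) -> int:
    """Clamp a sample to the valid range [0, 2**bit_depth - 1]."""
    return max(0, min(x, (1 << bit_depth) - 1))


def six_tap(p0: int, p1: int, p2: int, p3: int, p4: int, p5: int) -> int:
    """Unscaled 6-tap filter (1, -5, 20, 20, -5, 1); p0..p5 are E, F, G, H, I, J."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_pixel(row: list, k: int) -> int:
    """Half pixel between row[k] and row[k + 1] (e.g. b between G and H).

    Uses row[k - 2] .. row[k + 3]; rounds, divides by 32 and clips to 8 bits.
    """
    b1 = six_tap(*row[k - 2:k + 4])
    return clip1((b1 + 16) >> 5)


def quarter_pixel(p: int, q: int) -> int:
    """Quarter pixel as the rounded-up average of two neighbours, e.g. a = (G + b + 1) >> 1."""
    return (p + q + 1) >> 1


# Hypothetical row of full pixels; G = row[3], H = row[4].
row = [10, 20, 40, 80, 120, 140, 150, 155]
k = 3
b = half_pixel(row, k)
a = quarter_pixel(row[k], b)      # between G and b
c = quarter_pixel(b, row[k + 1])  # between b and H
print(b, a, c)
```

Each half pixel costs six multiply-accumulates plus a shift and a clip, and each quarter pixel an add, an increment and a shift, repeated for every one of the 15 positions; this is the load the dedicated averaging and saturation instructions target.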


The CEVA-X1620 is equipped with the appropriate instructions and mechanisms to handle these processing demands. Among other features embedded in the DSP are an “average” instruction that lets the DSP calculate up to 8 byte averages per cycle, and sophisticated byte additions and subtractions that treat different data sections as signed or unsigned values. The results of these arithmetic operations can then be saturated to different dynamic ranges, as specified in the H.264 standard.
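The text does not document the exact semantics of the CEVA-X1620 “average” instruction, so the following is only an illustration of what an 8-lane byte average computes, assuming round-up averaging ((a + b + 1) >> 1, the rounding H.264 uses for quarter pixels). It packs eight bytes into one 64-bit word and uses the bit identity (a | b) − ((a ^ b) >> 1), masking the shift so no bit leaks between lanes; a hardware instruction would do this in a single operation. The saturating helpers show the signed and unsigned clamping the text mentions. All function names are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # clears the bit each lane receives from its neighbour after >> 1


def pack8(values: list) -> int:
    """Pack eight unsigned bytes into one 64-bit word, element 0 in the low byte."""
    assert len(values) == 8 and all(0 <= v <= 255 for v in values)
    return sum(v << (8 * n) for n, v in enumerate(values))


def unpack8(word: int) -> list:
    """Split a 64-bit word back into eight unsigned bytes."""
    return [(word >> (8 * n)) & 0xFF for n in range(8)]


def packed_avg_round_up(x: int, y: int) -> int:
    """Per-lane (a + b + 1) >> 1 on eight bytes at once.

    Per lane, (a | b) >= ((a ^ b) >> 1), so the subtraction never borrows
    across a lane boundary.
    """
    return (x | y) - (((x ^ y) >> 1) & LOW7)


def sat_add_u8(a: int, b: int) -> int:
    """Unsigned byte addition saturated to [0, 255]."""
    return min(a + b, 255)


def sat_add_s8(a: int, b: int) -> int:
    """Signed byte addition saturated to [-128, 127]."""
    return max(-128, min(a + b, 127))


a = [0, 1, 2, 3, 254, 255, 100, 7]
b = [1, 1, 3, 3, 255, 255, 200, 8]
print(unpack8(packed_avg_round_up(pack8(a), pack8(b))))  # [1, 1, 3, 3, 255, 255, 150, 8]
```

Note that 254 and 255 average to 255 without overflowing the lane, which is the property a byte-wise average needs when computing quarter pixels.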

DMA Coprocessor Offloads Data Arrangement Completely

A DMA engine exists in every SoC design for multimedia applications. The DMA's most important responsibility is to execute most data transfers, on- and off-chip, accessing any available resource, including memories, I/Os, peripherals and bus bridges. In doing so, the DMA engine offloads part of the data rearrangement work from the DSP, enabling the DSP to focus mainly on media processing functions.
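A typical data rearrangement task in video is fetching a small block, such as a 4×4 pixel block, out of a larger frame stored row by row. The Python sketch below models that as a hypothetical 2D DMA descriptor (start offset, row stride, row width and row count for source and destination); the descriptor layout and field names are illustrative assumptions, not the CEVA-X1620 DMA programming model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dma2DDescriptor:
    """Hypothetical 2D transfer descriptor; field names are illustrative only."""
    src_offset: int  # byte offset of the first source element
    src_stride: int  # bytes between the starts of consecutive source rows
    dst_offset: int  # byte offset of the first destination element
    dst_stride: int  # bytes between the starts of consecutive destination rows
    width: int       # bytes copied per row
    height: int      # number of rows


def run_descriptor(src: bytes, dst: bytearray, d: Dma2DDescriptor) -> None:
    """Software model of the row-by-row copy a DMA engine would perform for one descriptor."""
    for row in range(d.height):
        s = d.src_offset + row * d.src_stride
        t = d.dst_offset + row * d.dst_stride
        dst[t:t + d.width] = src[s:s + d.width]


FRAME_W, FRAME_H = 16, 8
frame = bytes(range(FRAME_W * FRAME_H))  # toy 8-bit frame: pixel value = linear index
block = bytearray(16)                    # packed 4x4 destination buffer

x, y = 4, 2  # top-left corner of the 4x4 block to fetch
run_descriptor(frame, block, Dma2DDescriptor(
    src_offset=y * FRAME_W + x, src_stride=FRAME_W,
    dst_offset=0, dst_stride=4, width=4, height=4))
print(list(block))
```

Once the DSP has written the descriptor, the strided copy needs no further DSP instructions, which is the offloading the paragraph describes.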

The principal issue when connecting an emulation system is that the connection length is limited by the high system clock speed. For example, with a 150 MHz microcontroller, the propagation delay of a 50 cm connection would be around 2.0 ns. Because the clock period is only 6.67 ns, a 2.0 ns one-way delay is very significant. It virtually precludes placing any control functions remote from the target device, because at these frequencies the connection acts as a transmission line and proper termination cannot be guaranteed. In this example, the maximum length at which transmission line effects can be ignored would be only 16 cm, so the ICE (in-circuit emulator) would have to meet the same environmental requirements as, say, the ECU (engine control unit) under test.
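The arithmetic behind these figures can be reproduced in a few lines of Python. Two inputs are assumptions rather than statements from the text: a propagation delay of 4 ns/m (implied by the quoted 2.0 ns for 50 cm) and a rule of thumb that a connection can be treated as a simple wire while its one-way delay stays below one tenth of the clock period. The text does not say which criterion produced its 16 cm figure; the 1/10 rule gives about 16.7 cm, which is consistent with it.

```python
def clock_period_ns(f_hz: float) -> float:
    """Clock period in nanoseconds."""
    return 1e9 / f_hz


def one_way_delay_ns(length_m: float, ns_per_m: float) -> float:
    """Propagation delay along a connection of the given length."""
    return length_m * ns_per_m


def max_lumped_length_m(f_hz: float, ns_per_m: float, fraction: float = 0.1) -> float:
    """Longest connection whose one-way delay stays below `fraction` of the clock period.

    The default fraction of 0.1 is an assumed rule of thumb, not a value from the text.
    """
    return fraction * clock_period_ns(f_hz) / ns_per_m


F_CLK = 150e6   # 150 MHz, from the text
NS_PER_M = 4.0  # assumption: implied by "around 2.0 ns" for 50 cm

print(f"clock period      : {clock_period_ns(F_CLK):.2f} ns")
print(f"delay over 50 cm  : {one_way_delay_ns(0.5, NS_PER_M):.2f} ns")
print(f"max lumped length : {100 * max_lumped_length_m(F_CLK, NS_PER_M):.1f} cm")
```

Doubling the clock frequency halves the allowed length, which is why faster microcontrollers push the emulation hardware ever closer to, and eventually inside, the target device.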

An example of such a system-on-chip implementation is the Infineon TC1796. Its 32-bit TriCore CPU has separate private buses for code and data and is bridged (via the LFI) to the system buses to allow data access to the peripheral subsystem. A further bridge, via the DMA (direct memory access) unit, allows access to the remote peripheral bus.

The peripheral control processor (PCP2) is another 32-bit CPU that, again, has private data and program buses that are not normally visible. The device runs at a maximum of 150 MHz in the processor subsystem and 75 MHz in the peripheral subsystems, so there are two clock domains. It is packaged in a 416-pin ball grid array package and provides a standard JTAG debug interface for debugger support. However, full emulation of such a microcontroller requires the ability to inspect transactions on many different internal buses that have no connection to the external pins. The large embedded memories (2M-byte flash) have wide internal fetch paths (128-bit) and local caches, so execution from internal memory is significantly faster than from external (32-bit accessed) memories (see below for the TC1796 block diagram).

Such complexity, with multiple buses and multiple cores, means that only a bond-out device or an FPGA (field programmable gate array) can give the level of visibility required to fully debug the system. However, as discussed above, the bus cycle time at 150 MHz is only 6.67 ns, which is too short for an external bond-out controller to receive the bus information, decode it, decide that a break needs to be triggered, and halt the processor. The solution adopted in this case is to embed the emulator inside the bond-out device, creating a so-called Emulation Device (see below for a comparison of a mass production device (left) with an emulation device with emulation extension chip (right)).
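The “decode it and decide a break needs to be triggered” step can be pictured as a bank of comparators applied to every observed bus cycle. The Python model below is a hypothetical sketch of that decision using address-range and access-type triggers; the `BusCycle` and `Trigger` types, the bus labels and the addresses are all illustrative assumptions and do not describe the TC1796 emulation extension chip's actual trigger logic. In hardware the comparisons would run in parallel; the text's argument is that completing them off-chip within a 6.67 ns bus cycle is not feasible.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    READ = "read"
    WRITE = "write"
    FETCH = "fetch"


@dataclass(frozen=True)
class BusCycle:
    """One observed transaction; bus names are hypothetical labels."""
    bus: str
    address: int
    access: Access


@dataclass(frozen=True)
class Trigger:
    """Hypothetical break condition: an inclusive address range plus an access type on one bus."""
    bus: str
    lo: int
    hi: int
    access: Access

    def matches(self, c: BusCycle) -> bool:
        return c.bus == self.bus and c.access == self.access and self.lo <= c.address <= self.hi


def first_break(cycles: list, triggers: list) -> Optional[int]:
    """Index of the first observed cycle that fires any trigger, or None if none fires."""
    for i, c in enumerate(cycles):
        if any(t.matches(c) for t in triggers):
            return i
    return None


# Illustrative addresses only: break on any write into 0x1000..0x10FF on the data bus.
triggers = [Trigger("data", 0x1000, 0x10FF, Access.WRITE)]
cycles = [
    BusCycle("code", 0x0200, Access.FETCH),
    BusCycle("data", 0x1004, Access.READ),
    BusCycle("data", 0x1004, Access.WRITE),
    BusCycle("data", 0x1200, Access.WRITE),
]
print(first_break(cycles, triggers))
```

Because every internal bus needs its own set of comparators, visibility into many buses at once is only practical when this logic sits on the same silicon as the buses it observes.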

The Emulation Device uses a macro of the original production device, complete with all the normal functionality, peripherals and ports, and then adds around the edges a 512K-byte SRAM, several bus observation blocks (BOBs), and a local CPU with some local memory to control the emulator (in this case, another PCP2). Several high-speed serial interfaces, including USB, JTAG and Micro-Link interface ports, provide the external connection. This additional circuitry, known as the EEC (emulation extension chip), allows easy redesign of the emulator when the mass production device changes, because the interconnection points remain unchanged.

The volume associated with a single application is rarely enough to ensure security of supply for that memory at a competitive price. Embedded products need to be able to survive across several generations of memory technology without a system redesign. Simple component substitution should be the extent of the change required to migrate to the next low cost memory device.
