Multicore technology requires both hardware and software expertise. Many companies are announcing collaborations to deliver the required multicore systems without losing their own focus on hardware or software development. One example of this joining of forces, bringing together expertise from either side of the design divide, is embedded-hardware and -software company GE Fanuc Intelligent Platforms. It has collaborated with networking and communications semiconductor company Cavium Networks to develop multicore-processor architectures for military and aerospace applications. Paul Cavill, general manager, military and aerospace products, of GE Fanuc, explained that today's military computing and deployment are based on communications and that consequently, "defence becomes information-centric rather than weapons-centric". The Octeon processor-based products, Cavill believes, will allow high-reliability applications to use the low-power and secure technology of multicore processing.
The collaboration will centre on Cavium's Octeon processors, already used for commercial networking, wireless, storage and security in data-centre, enterprise and service-provider industries. The Octeon II series has up to 32 cores, each running at up to 1.5GHz. The processors will be combined with GE Fanuc's military and aerospace experience for wire-speed deep-packet inspection and intelligent-networking applications such as network monitoring, intrusion detection and firewalling.
Packet processing
The processors provide a scalable range of up to 32 cnMIPS cores. The cnMIPS core is the company's implementation of MIPS64 release 2, optimised for networking and services with very low power consumption. The processors also have, according to the company, the most advanced networking security and application-hardware acceleration. The hardware-acceleration engines cover quality of service, packet processing, TCP, compression, encryption, RAID, de-duplication and pattern matching, for scalability and integration. There is a choice of standards-based I/Os (gigabit Ethernet, 10Gbit Ethernet, PCI Express Gen 1 and Gen 2, USB2.0, Serial I/O and Interlaken) for high-bandwidth networks and system connectivity.
The multicore processors are expected to be used in GE Fanuc's WANic 56511 and WANic 56512 (Figures 1 and 2) 10Gbit Ethernet packet processors. These were introduced to manage large, complex networks by taking the processing away from the server to the Ethernet interface to expand the network.
The packet processors provide fast, high-bandwidth networks by virtue of being able to use the Octeon processors to offload network-processing tasks from the server. They feature the Octeon Plus CN5650 12-core 750MHz processor from Cavium and are already used to handle the increased network traffic generated by search engines, cloud computing, mobile devices, Web 2.0 and online gaming. The packet processors are designed for networked communications systems based on the PCI Express architecture, where secure IP communications and 10Gbit Ethernet packet processing are required. In military and aerospace applications, these criteria are also essential.
The hardware is tailored for packet movement. The packet processors execute security algorithms—compression and encryption, for example—before the network traffic reaches the server. Already, packet processing—where data is identified, inspected, extracted and/or manipulated—is used in networks that demand speed and security. It is found today in border-control and secure-access applications or in firewalls to gather statistics, intercept data and route traffic.
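The identify, inspect and route steps described above can be sketched in a few lines of Python. This is only an illustration of the idea in software, not how the Octeon hardware engines or the WANic firmware work; the `Packet` fields, the `BLOCKED_PATTERNS` signatures and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    port: int
    payload: bytes


# Hypothetical signatures that a deep-packet inspector might look for.
BLOCKED_PATTERNS = [b"DROP TABLE", b"/etc/passwd"]


def inspect(packet):
    """Return 'drop' if the payload contains a blocked pattern, else 'forward'."""
    for pattern in BLOCKED_PATTERNS:
        if pattern in packet.payload:
            return "drop"
    return "forward"


def process(packets):
    """Inspect each packet, gather statistics and keep only the forwarded ones."""
    stats = {"forward": 0, "drop": 0}
    forwarded = []
    for packet in packets:
        verdict = inspect(packet)
        stats[verdict] += 1
        if verdict == "forward":
            forwarded.append(packet)
    return forwarded, stats


traffic = [
    Packet("10.0.0.1", "10.0.0.9", 80, b"GET /index.html"),
    Packet("10.0.0.2", "10.0.0.9", 80, b"GET /etc/passwd"),
]
forwarded, stats = process(traffic)
```

On an offload card, work of this kind runs before the traffic reaches the server, so the host only sees packets that have already been checked.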
The WANic 56511 has one 10Gbit Ethernet interface, an eight-lane PCI Express host interface, and a 12-core, 750MHz Octeon Plus processor. The WANic 56512 has two 10Gbit Ethernet interfaces, four lanes of PCI Express to the host, and a 12-core, 750MHz processor. Both packet processors have up to 4Gbytes of DDR2 SDRAM, up to 4Gbytes of flash disk, and a software-development package that includes universal boot and power-on self-test firmware embedded in the hardware, as well as a Linux support package and sample application code.
Server culture
Another company to use multicore processing is Radisys, which has used the Intel Core 2 Duo and Core 2 Quad processors in its RMS420-Q35JD (Figure 3) embedded server for general-purpose x-ray or machine-vision systems. The scalable, embedded server uses the Intel Graphics Media Accelerator 3100 to deliver high-resolution video and audio, and has a PCI Express slot for an additional graphics card or a co-processor. Two other PCI Express slots take newer add-on cards, and four PCI slots allow legacy PCI boards to be used. The server has a 667, 800 or 1333MHz front-side bus and dual-channel memory of up to 8Gbytes of DDR2 operating at 667 or 800MHz.
The company has also used Intel's multicore technology in an embedded system for industrial-automation applications. The Procelerant IS1000 is claimed to be the industry's first integrated, ready-to-use embedded system based on the technology. It was developed specifically for industrial applications and is powered by Microware Real-time Hypervisor virtualisation software. The software enables multiple OSs, for example Windows XP, Linux and OS-9, to run concurrently, at full speed and without interruption, to maximise performance levels. Legacy computing hardware, real-time control software and general-purpose applications can be used on a single, industrial-grade computer.
Virtualisation in operation
Mark Hermeling, senior product manager, multicore and virtualisation at Wind River, has some tips. Virtualisation, he says, provides the opportunity to abstract the operating-system layer from the processor. "Virtualisation is typically achieved by introducing a layer, the virtual machine monitor or hypervisor, directly on top of the processor. The layer then creates the virtual machines in which systems can be executed." The hypervisor can be used to time-share a single core between multiple OSs running in virtual machines, or to partition a four-core chip into two two-way SMP (symmetrical-multiprocessing) partitions. It can also be used, he says, to supervise an AMP (asymmetrical-multiprocessing) system. In SMP, one OS controls multiple cores, but this brings scheduling overheads and complexities. AMP is better suited to embedded applications as it uses a separate OS on each core, which allows the system developer to combine real-time and open-source OSs in the same design.
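The partitioning and time-sharing options Hermeling describes can be pictured with a small Python model. It is a sketch of the bookkeeping only and does not represent any real hypervisor's interface; `partition_cores`, `time_share` and the guest names are hypothetical.

```python
from itertools import cycle, islice


def partition_cores(num_cores, partition_sizes):
    """Split core IDs 0..num_cores-1 into consecutive partitions of the given sizes."""
    if sum(partition_sizes) != num_cores:
        raise ValueError("partition sizes must add up to the number of cores")
    partitions = []
    next_core = 0
    for size in partition_sizes:
        partitions.append(list(range(next_core, next_core + size)))
        next_core += size
    return partitions


def time_share(guests, num_slices):
    """Round-robin schedule: which guest OS owns the single core in each time slice."""
    return list(islice(cycle(guests), num_slices))


# A four-core chip as two two-way SMP partitions.
smp_pairs = partition_cores(4, [2, 2])      # [[0, 1], [2, 3]]

# AMP: a separate OS on each core.
amp_layout = partition_cores(4, [1, 1, 1, 1])  # [[0], [1], [2], [3]]

# One core time-shared between two guest OSs.
schedule = time_share(["RTOS", "Linux"], 4)  # ['RTOS', 'Linux', 'RTOS', 'Linux']
```

A real hypervisor would also assign memory regions and I/O devices to each partition; the model above covers only the allocation of cores and time slices.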
Multicore and virtualisation provide the opportunity to combine different OSs in partitions into a single product. "For example," says Hermeling, "you can use an RTOS for those time-critical tasks that need sub-microsecond latencies and use a general-purpose OS, such as Linux, for situations where open-source solutions are available and where time is less important. Or use an RTOS with a GUI, strongly separated so that you can update the GUI functionality frequently without impacting the operation, and certification, of the RTOS."
Virtualisation makes systems more secure. The approach of running applications such as access control or resource management in a virtual machine is increasingly being used with industrial and embedded systems that have many network links, which could otherwise leave the system vulnerable to attack.
In developing a virtualised system, attention must be paid to the way the cores access system memory, as this can be the root of bottlenecks, along with caches and I/O. A hypervisor can handle this automatically, although for debugging it helps to map resources explicitly, especially where resources are constrained.
Existing multiprocessor designs can be consolidated into a single multicore one. With multicore technology, says Hermeling, the different security and certification levels need not be separated on different processors. The virtualisation layer maintains the separation, he explains, and ensures that the system is certifiable. The reduction in the number of processors lowers the bill of materials and power consumption.
Hermeling is confident that processors will get more cores and sees this period as a great opportunity to look at the architecture used. Virtualisation can add new functionality to a single-core product, and architectures should be prepared to be flexible.
Security and debugging
Finding the best multicore configuration requires careful investigation, counsels Hermeling. Establishing the configuration and executing benchmarks is really the only way to ensure proper performance, he says. Generic benchmarks are only a guide; the most accurate figures come from a developer's own tests, with a test suite that measures the performance of the application on the configuration and reports numbers that are relevant to the developer.
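A developer's own test suite of the kind described above can start as a very small timing harness. The Python sketch below times a workload under several named configurations and keeps the best run of each. The `sample_workload` function and the `work_items` parameter are hypothetical stand-ins: they do not exercise real cores, and a real suite would call the application itself on each SMP, AMP or virtualised configuration.

```python
import time


def benchmark(workload, configurations, repeats=3):
    """Run workload(config) several times per configuration; return best time in seconds."""
    results = {}
    for name, config in configurations.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(config)
            timings.append(time.perf_counter() - start)
        # The minimum is the run least disturbed by other activity on the machine.
        results[name] = min(timings)
    return results


def sample_workload(config):
    """Hypothetical stand-in for the application: a fixed amount of arithmetic."""
    total = 0
    for i in range(config["work_items"]):
        total += i * i
    return total


configurations = {
    "small": {"work_items": 1_000},
    "large": {"work_items": 100_000},
}
results = benchmark(sample_workload, configurations)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Reporting the figure that matters to the application (latency per packet, frames per second and so on) is more useful than a raw elapsed time, so the harness would normally be extended to convert the timings accordingly.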
Debugging a multicore system is, unsurprisingly, more complex than debugging a single-core one. Hermeling advises that engineers should think about what happens when an application is stopped on one core. "The other core in the system is still making progress; therefore any communication buffers between the applications on different cores will quickly fill up and overflow." On-chip debugging through JTAG interfaces has the capability to stop all cores on a chip near-enough at once, to assess the global system state.
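The buffer overflow Hermeling warns about is easy to reproduce in a toy simulation. In the Python sketch below, a producer core writes one message per tick into a bounded buffer and a consumer core removes one per tick until a debugger halts it. The function name, the one-message-per-tick rates and the buffer size are assumptions chosen for illustration.

```python
from collections import deque


def simulate(capacity, ticks, consumer_halted_at):
    """Return (messages left in buffer, messages dropped) after the given number of ticks."""
    buffer = deque()
    dropped = 0
    for tick in range(ticks):
        # The producer core keeps running and emits one message per tick.
        if len(buffer) < capacity:
            buffer.append(tick)
        else:
            dropped += 1
        # The consumer core drains one message per tick until it is halted.
        if tick < consumer_halted_at and buffer:
            buffer.popleft()
    return len(buffer), dropped


# Consumer never halted: the buffer stays empty and nothing is lost.
healthy = simulate(capacity=4, ticks=10, consumer_halted_at=10)   # (0, 0)

# Consumer halted at tick 3: the buffer fills, then messages are dropped.
halted = simulate(capacity=4, ticks=10, consumer_halted_at=3)     # (4, 3)
```

Halting both cores at the same tick would freeze the buffer in a consistent state, which is the reason for stopping all cores near-enough at once.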
Stopping all cores simultaneously is helpful during system start-up. Start-up for multicore systems requires attention, too, as the software has to recognise the multicore system and bring up one core to run the initial tests on memory and I/O. Once the initial tests have been completed, additional cores can be addressed. "This is especially important on a system reset or a power failure because a board may power up differently in a debug session," he says.
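The start-up order described above (one core first, initial tests next, then the remaining cores) can be written down as a short Python sketch. It models the sequence only; the `boot` function, the log messages and the test callables are hypothetical and do not correspond to any particular boot firmware.

```python
def boot(num_cores, self_tests):
    """Bring up core 0, run the initial tests, then release the remaining cores.

    self_tests is a list of (name, callable) pairs; each callable returns True on success.
    Returns a log of the steps taken.
    """
    log = ["core 0: started"]
    for name, test in self_tests:
        if not test():
            # Do not release further cores if memory or I/O is not trustworthy.
            log.append(f"core 0: {name} test failed, halting")
            return log
        log.append(f"core 0: {name} test passed")
    for core in range(1, num_cores):
        log.append(f"core {core}: released")
    return log


good_boot = boot(4, [("memory", lambda: True), ("io", lambda: True)])
bad_boot = boot(2, [("memory", lambda: False), ("io", lambda: True)])
```

In `good_boot` the three secondary cores are released only after both tests pass; in `bad_boot` the sequence stops at the failed memory test and core 1 is never started.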
Hermeling concludes that multicore systems can provide significant performance benefits and cost savings, but these are highly application-dependent. Finding the best multicore configuration—SMP, AMP, sAMP (supervised AMP) or virtualised—requires careful investigation and holds the key to successful development.
Paul Cavill: ready for info-centric battlefields.
Mark Hermeling: consider options for virtualisation.
Figure 1: Tailored packages: GE Fanuc's WANic 56511.
Figure 2: The WANic 56512 has a 12-core, 750MHz processor and a software-development package embedded in the hardware.
Figure 3: The RMS420-Q35JD uses Intel dual- and quad-core processors.