Comparison of the AMD64 and Intel IA-32e Extensions and the Itanium Architecture

By:

Mathew Makai

Michael Parrill

Elizabeth Rommel

David Winfield

Table of Contents

Introduction

AMD64

Itanium

IA-32e 64-bit Extensions

Comparison

Conclusion

Bibliography

Introduction

For years the computer world has relied on 32-bit processors, but as technology advances they are quickly becoming inadequate. As consumers demand more and more memory, the number of addresses available with 32 bits is no longer sufficient. For this reason, among others, many companies have begun producing and marketing 64-bit processors or 64-bit extensions to 32-bit processors. One example of these extensions is AMD64, which expands AMD’s current 32-bit processors while following the x86 architecture. The IA-32e 64-bit extension is a similar extension to Intel’s 32-bit processors that aims to solve the same problem. The Itanium processor by Intel is a 64-bit processor sold in various models. Some of these models have been released and are on sale, while others are still in the prototype and development stages.

AMD64

AMD’s approach to the 64-bit computing market is the AMD64 technology. The AMD64 technology doubles the number of processor registers and increases the addressable memory space well beyond the 4GB limit. The AMD64 technology offers leading-edge performance on current software applications while providing a seamless migration to future 64-bit computing.

AMD64 extends the industry-standard x86 instruction set architecture into the AMD 64-bit platform. The platform is unique because it is the first to be fully backwards compatible with existing x86 solutions while still delivering 64-bit performance. In April 2003, AMD released the AMD Opteron processor, a 64-bit processor for servers and workstations. The Opteron uses the AMD64 architecture and opened the door for AMD’s new class of computing. In September 2003, AMD then released the world’s first and only Windows-compatible 64-bit desktop and mobile processor (Bennett, 2003). AMD64 processors are available for servers, workstations, desktops, and mobile PCs, which makes the technology available to anyone.

The AMD64 Instruction Set Architecture (ISA) was created by AMD to add support for 64-bit registers to the x86 architecture, which has been the foundation of all PCs since the 8086 processor was developed in 1978. The ISA enables 64-bit computing while remaining compatible with 32-bit x86 applications. This lets consumers keep using 32-bit applications and operating systems while making the transition to 64-bit computing at their own pace. AMD achieves this by letting AMD64 processors run in two modes, Legacy mode and Long mode. Legacy mode removes all 64-bit support and lets the processor run in 32-bit mode. Long mode consists of two sub-modes, compatibility mode and 64-bit mode. Compatibility mode is designed for running 32-bit applications under 64-bit operating systems, such as the upcoming Microsoft Windows XP 64-bit Edition and Windows Server 2003 64-bit Edition. The advantage of compatibility mode is that even though 32-bit applications are still limited to 4 gigabytes of memory, each program can have all 4 gigabytes to itself, since 64-bit addressing lets the computer address additional memory. 64-bit mode is used only in a pure 64-bit environment and offers the advantage of eight extra general purpose registers, which are available only in this mode. The AMD64 ISA improves many aspects of the x86 instruction set. AMD64 differs from Intel’s approach to 64-bit processors as seen in the Itanium line, which uses a completely different architecture (Bennett, 2003).
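
The operating modes described above can be summarised as a small lookup table. The following is only an illustrative sketch: the class and field names are hypothetical, the values restate this section, and the model is simplified (for example, it ignores 16-bit code).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One AMD64 operating mode, as described in the text (simplified)."""
    name: str
    needs_64bit_os: bool
    app_address_bits: int  # address width seen by an application
    gprs: int              # general purpose registers visible to code


# Hypothetical table; the keys and field names are illustrative only.
MODES = {
    "legacy": Mode("Legacy mode", False, 32, 8),
    "compatibility": Mode("Long mode, compatibility", True, 32, 8),
    "64-bit": Mode("Long mode, 64-bit", True, 64, 16),
}


def app_limit_gib(mode_key):
    """Addressable memory per application in GiB for the given mode."""
    return 2 ** MODES[mode_key].app_address_bits / 2 ** 30
```

Calling app_limit_gib("compatibility") gives the 4-gigabyte per-application limit mentioned above, while the "64-bit" entry shows the eight extra general purpose registers.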

Internal DDR memory controllers are built into each AMD64 processor to reduce the time it takes the CPU to access memory. Data still needs to travel between the processor and memory, but communication with the memory controller no longer has to pass outside the processor. The Opteron and Athlon 64 FX processors incorporate a dual-channel memory controller instead of the single-channel controller present in the other AMD64 models. The memory path in this controller is doubled from 64 to 128 bits. Even though it is technically not a dual-channel controller, data still flows along two distinct paths, which doubles the data pathways and makes a true 128-bit connection to system memory. In theory, a 64-bit processor could address over 18 exabytes of physical memory, but the AMD64 physical address space currently supports up to one terabyte of installed RAM. The upcoming release of Microsoft Windows XP 64-Bit Edition for 64-Bit Extended Systems supports up to 32 gigabytes of RAM and up to 16 terabytes of virtual memory. The current four gigabyte limitation applies only in Compatibility mode and Legacy mode of the AMD64 ISA (Shimpi, 2003).
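
The memory figures in this paragraph follow from powers of two. The sketch below shows the arithmetic; the 40-bit physical and 44-bit virtual widths are assumptions inferred from the one terabyte and 16 terabyte figures, not values stated in the sources cited here.

```python
def bytes_addressable(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


# Full 64-bit address space, expressed in decimal exabytes (10**18 bytes).
full_64_in_exabytes = bytes_addressable(64) / 10 ** 18

# Assumed 40-bit physical address width, matching the one-terabyte figure.
physical_in_tib = bytes_addressable(40) // 2 ** 40

# Assumed 44-bit virtual address width, matching the 16-terabyte figure.
virtual_in_tib = bytes_addressable(44) // 2 ** 40
```

Here full_64_in_exabytes comes out slightly above 18, which is where the "over 18 exabytes" figure comes from.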

The AMD64 architecture widens the general purpose registers to 64 bits and adds eight more, for a total of 16, which quadruples the general purpose register space currently available. Sixteen 128-bit XMM registers for enhanced multimedia performance double the space currently provided for SSE/SSE2 implementations. The Level 1 cache has been increased to 64 KB and is organized as 2-way set associative, meaning that each set of cache locations contains two cache locations. The Level 1 cache supports two simultaneous 64-bit operations. The Level 2 cache has been increased to 1 MB and is organized as 16-way set associative. When a given cache line in the L2 cache contains instruction stream information, the ECC bits associated with that line are used to store pre-decode and branch prediction information. Support for the SSE2 instruction set has also been introduced, alongside 3DNow! Professional, which includes SSE and 3DNow! Enhanced. Since AMD chose to extend the x86 architecture, software can continue to build on x86 technology while receiving all of the benefits of 64-bit computing (Welker, 2004).
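
The register and cache figures above can be checked with a little arithmetic. This is a minimal sketch, not a description of AMD’s hardware: the function names are hypothetical, and the 64-byte cache line size is an assumption that the text does not state.

```python
def register_bits(count, width_bits):
    """Total storage of a register file in bits."""
    return count * width_bits


# Eight 32-bit GPRs before, sixteen 64-bit GPRs after.
gpr_ratio = register_bits(16, 64) / register_bits(8, 32)
# Eight 128-bit XMM registers before, sixteen after.
xmm_ratio = register_bits(16, 128) / register_bits(8, 128)


def num_sets(cache_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return cache_bytes // (ways * line_bytes)


def set_index(address, cache_bytes, ways, line_bytes):
    """Set that an address maps to; it may occupy any of the `ways` slots there."""
    return (address // line_bytes) % num_sets(cache_bytes, ways, line_bytes)


# L1 from the text: 64 KB, 2-way. The line size of 64 bytes is an assumption.
L1_SETS = num_sets(64 * 1024, 2, 64)
```

Two addresses that are exactly L1_SETS lines apart map to the same set, and a 2-way cache can hold both of them at once before a third such address forces an eviction.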

One of the most important features of the AMD64 processors is HyperTransport technology, a high-performance system bus. HyperTransport supports an overall bus speed of 1.6 GHz for up to 6.4 GB/sec of bandwidth by using a series of point-to-point links as data paths. HyperTransport links the CPU to the Northbridge chipset, Southbridge chipset, system memory, and even other CPUs in a multiprocessor Opteron system. This technology is common to the whole AMD64 processor line, but some processors have fewer HyperTransport links (Bennett, 2003). Coherent HyperTransport links are used to communicate cache-coherency information between processors that share data in a multiprocessor system. Opterons have three HyperTransport links, allowing a total system bandwidth of 19.2 GB/sec. The 800 Series Opteron has three coherent links, which allow up to four CPUs, while the 200 Series combines one coherent link, allowing up to two CPUs, with two non-coherent links. The Opteron 100 Series has three non-coherent HyperTransport links, because with just one processor there is no coherency issue to worry about. The Athlon 64 and Athlon 64 FX are made for single-processor use, so they have no need for coherent links. Their system bandwidth is limited to 6.4 GB/sec with one HyperTransport link (Shimpi, 2003). The HyperTransport specifications allow plenty of bandwidth for current hardware and applications and will support upcoming technologies such as PCI Express, which is said to be the next standard for graphics card interfaces, replacing the current AGP standard.
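
The bandwidth totals quoted above are simple multiples of the per-link figure. The sketch below reproduces them; the derivation of 6.4 GB/sec from a 16-bit link carrying 1600 million transfers per second in each direction is an assumption added for illustration and is not stated in the cited sources.

```python
LINK_GB_PER_SEC = 6.4  # per-link figure quoted in the text


def total_bandwidth(links, per_link=LINK_GB_PER_SEC):
    """Aggregate bandwidth in GB/sec across a number of HyperTransport links."""
    return links * per_link


def link_bandwidth(bytes_wide=2, mega_transfers_per_sec=1600, directions=2):
    """Assumed derivation of the per-link figure: width x transfer rate x directions."""
    return bytes_wide * mega_transfers_per_sec * directions / 1000


opteron_total = total_bandwidth(3)   # three links, as on the Opteron
athlon64_total = total_bandwidth(1)  # one link, as on the Athlon 64
```

With three links the total matches the 19.2 GB/sec quoted for the Opteron, up to floating-point rounding.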

The AMD64 processors released so far include the Opteron, Athlon 64, Athlon 64 FX, and Mobile Athlon 64. The Opteron series supports multiprocessing and is marketed towards servers and workstations. The Athlon 64 FX is a port of the Opteron to the single-processor line and is marketed towards gamers. The Athlon 64 and Mobile Athlon 64 are marketed as high-end processors for desktops and notebooks (Welker, 2004). AMD is making the transition from 32-bit to 64-bit computing easy by making the technology available to everyone.

With all of the improvements made on the x86 instruction set, AMD is paving the way for the future of computing. Introducing the first Windows compatible 64-bit processor and making sure that it was backwards compatible with current and older instructions lets people adapt to the new technology at their own pace. Therefore, they aren’t throwing everything out the window and starting from scratch as with the Itanium architecture. Until now, AMD has always been seen as the second best to Intel, but the AMD64 technology has shown that AMD has what it takes to at least compete with Intel, if not perform better than Intel.

Itanium

The Itanium Processor is the result of a partnership between Hewlett Packard (HP) and Intel Corporation. While many other companies have created 64-bit and even 128-bit processors, Intel and HP hope to successfully introduce a 64-bit processor into an environment which is dominated by 32-bit (IA-32) processors.

Throughout the years, computer scientists have underestimated how fast computers will need to be in the future, and the move to 64-bit processors continues this pattern. One reason to move past 32-bit processors is their limitations, among them memory addressing. The IA-32 processors can, without special help, address only 4 GB of memory at one time. Special memory paging application program interfaces (APIs) can work around this limit, but they slow down performance because address translation takes longer (Intel Corporation, 2002a). In addition, under Microsoft Windows, half of the addressable memory is allocated to the operating system. The Itanium processors can address as many as 2^64 memory locations, which should be sufficient for some time.

Initially, software written for the IA-32 processors did not run as efficiently on the Itanium processors as it did on IA-32, because it had to be run using an emulator (Intel Corporation, 2002a). The only other option was to port the software to the new 64-bit environment. Due to improvements introduced with the Itanium 2 processors, this is no longer the case. The Itanium 2 processor can now run both 32-bit and 64-bit software.

The Itanium 2 has a top clock speed of 1.5 GHz with a 400 MHz system bus. It has three caches, all on-die, meaning that they are all on the processor chip. The L1 cache is not unified: the instruction cache (L1I) and the data cache (L1D) are separate. The L1D caches only integer data; another cache must be used for floating-point data. The L1I and the L1D are 16 KB each, for a combined total of 32 KB. The L2 is a unified cache and can take floating-point instructions (Intel Corporation, 2002a). The L2 cache size is 256 KB. The L3 cache, while still on-die, is separate from the regular system bus and must be accessed through a 128-bit back-side bus. Of the three caches, the L3 is the largest at 6 MB. The L2 and L3 caches can be accessed at the full clock speed of the processor, minimizing overall latency (Intel Corporation, 2002a).

There are three Translation Lookaside Buffers in the Itanium processors: the first-level Data Translation Lookaside Buffer (DTLB1), the second-level Data Translation Lookaside Buffer (DTLB2) and the Instruction Translation Lookaside Buffer (ITLB). The DTLB1 has only 32 entries, and its main job is to keep a cached copy of the main table in the DTLB2. The DTLB2, on the other hand, has 96 entries and keeps a record of the defined page sizes, the data Translation Registers (TR) and the entries in the data Translation Cache (TC) (Intel Corporation, 2002a). The ITLB is very similar to the DTLB2, except that it holds only 64 entries and holds instruction TRs and instruction TCs instead of data ones.
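
The relationship between the two data TLB levels can be sketched as a two-level lookup. This is a simplified model under stated assumptions, not Intel’s design: the least-recently-used replacement policy, the dictionary standing in for the page table, and all class and function names are hypothetical. Only the entry counts (32 and 96) come from the text.

```python
from collections import OrderedDict


class Tlb:
    """Fixed-size translation cache with assumed LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def insert(self, vpn, pfn):
        self.map[vpn] = pfn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry


def translate(vpn, dtlb1, dtlb2, page_table):
    """Return (frame, source): try DTLB1, then DTLB2, then the page table."""
    pfn = dtlb1.lookup(vpn)
    if pfn is not None:
        return pfn, "DTLB1"
    pfn = dtlb2.lookup(vpn)
    source = "DTLB2"
    if pfn is None:
        pfn = page_table[vpn]  # stand-in for a full page-table walk
        source = "page table"
        dtlb2.insert(vpn, pfn)
    dtlb1.insert(vpn, pfn)  # DTLB1 keeps a copy of what DTLB2 holds
    return pfn, source
```

In this model a translation that has dropped out of the small first level can still be served from the larger second level without a page-table walk.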

The Itanium processor pipeline is based on EPIC (Explicitly Parallel Instruction Computing) when executing its “fetch, decode and execute” cycle. The pipeline is broken up into 10 stages, though not all of them can run in parallel with each other. The first three stages fetch the instructions and deliver them in such a way that the front end of the machine can work independently of the back end. The next two stages involve dispersing instructions and renaming registers. The sixth and seventh stages involve reading the register file and dispersing the data. The last three stages involve parallel execution, exception handling, and retirement of the instruction. The pipeline also provides multiple execution units: six integer ALUs, six multimedia ALUs, two Extended Precision Floating Point Units, two Single Precision Floating Point Units, and two Load/Store Units. With this combination, even though not all of these units are used in parallel, the processor can fetch, decode and execute six instructions per clock cycle (Intel Corporation, 2002a).

Working alongside the pipeline is a structure that deals with prefetching. Prefetching acts as a bridge between the L1I and L2 caches. Instructions are prefetched from the L2 cache in the hope of preventing misses in the L1I. Instructions are prefetched on a speculative basis, and the choice of which instructions to prefetch is handled by branch prediction. The processor holds up to four different predictions, all of which are made in parallel with the instruction prefetch from the L2 to the L1I.

The Floating Point Units listed above have a four stage pipeline. This pipeline can allow either two Floating Point (FP) operations or two Integer multiplications; along with two FP load and two FP store commands. Two FP Multiply Accumulate (FMAC) hardware devices are also supported. These FMACs can execute single, double or mixed FP operations.

The Itanium architecture implements a very large number of registers, which minimizes reads from and writes to memory. There are 128 general registers (GR), 128 floating point registers, 64 predicate registers and 8 branch registers. Each general register is 64 bits wide and provides the processor with integer and multimedia integer computation. All general registers are available to all programs. The general registers are broken up into two sets: GR0 through GR31 are static general registers, and GR0 always reads 0 when it is an operand; GR32 through GR127 are stacked registers and are made available to programs in groups. All floating point registers (FR) are 82 bits wide and are also broken up into two sets. Registers FR0 through FR31 are static, and FR0 and FR1 read as +0.0 and +1.0 respectively when sourced as operands. Registers FR32 through FR127 may be renamed in order to accelerate loops.

Predicate registers (PR) are used when a comparison takes place. There are 64 PRs, each only 1 bit in length. PR0 through PR15 are static, and PR0 always reads 1 when used as an operand. The rest, PR16 through PR63, are rotating registers and are used to support efficient pipelined loops. Branch registers (BR) are used to hold the branching information discussed above. There are only 8 of these registers, and each is 64 bits wide. Unlike the other register files, they are not subdivided, and the branching information consists of the address of each predicted branch.
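
To illustrate how one-bit predicate registers are used, the sketch below models a compare that sets a pair of predicates and two operations that each take effect only if their predicate is 1. It is a simplified software model: all names are hypothetical, and ignoring writes to PR0 is a simplification made for this sketch.

```python
class PredicateFile:
    """64 one-bit predicate registers; PR0 always reads 1."""

    def __init__(self):
        self.bits = [0] * 64
        self.bits[0] = 1

    def read(self, n):
        return 1 if n == 0 else self.bits[n]

    def write(self, n, value):
        if n != 0:  # writes to PR0 are ignored in this sketch
            self.bits[n] = 1 if value else 0


def cmp_lt(pr, p_true, p_false, a, b):
    """Compare a < b and set a complementary pair of predicates."""
    pr.write(p_true, a < b)
    pr.write(p_false, not (a < b))


def predicated(pr, qp, operation, current):
    """Apply the operation only if the qualifying predicate qp is 1."""
    return operation() if pr.read(qp) else current


def abs_diff(a, b):
    """|a - b| without a branch: both operations are issued, one takes effect."""
    pr = PredicateFile()
    cmp_lt(pr, 6, 7, a, b)
    result = 0
    result = predicated(pr, 6, lambda: b - a, result)
    result = predicated(pr, 7, lambda: a - b, result)
    return result
```

The point of the design is that a short if/else can be executed as straight-line code, with the comparison result selecting which instruction has any effect.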

The Register Stack Engine (RSE) is used to maintain the registers. This way, a register is not overwritten if there is an empty register of the same type available. It also handles spilling of registers. If, for instance, all the registers hold information, the RSE will save the registers that are about to be overwritten into memory. This way, the content of each register may be recalled easily. Such an implementation can give the illusion of infinite registers (provided the computer’s memory does not fill up).
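
The spill-and-fill behaviour described above can be sketched as follows. This is a toy model under stated assumptions, not the actual RSE: frames are allocated on procedure calls, the oldest registers are spilled first, and the class and method names are hypothetical. The default of 96 physical registers corresponds to the stacked registers GR32 through GR127.

```python
class RegisterStack:
    """Toy model of a register stack that spills to memory when it is full."""

    def __init__(self, physical=96):
        self.physical = physical
        self.in_registers = []   # values of live frames, oldest first
        self.backing_store = []  # values spilled to memory, oldest first
        self.frames = []         # sizes of the active frames

    def call(self, values):
        """Allocate a new frame, spilling the oldest registers if needed."""
        assert len(values) <= self.physical
        overflow = len(self.in_registers) + len(values) - self.physical
        if overflow > 0:
            self.backing_store.extend(self.in_registers[:overflow])
            del self.in_registers[:overflow]
        self.in_registers.extend(values)
        self.frames.append(len(values))

    def ret(self):
        """Drop the current frame and refill the caller's frame from memory."""
        size = self.frames.pop()
        del self.in_registers[len(self.in_registers) - size:]
        if self.frames:
            missing = self.frames[-1] - len(self.in_registers)
            if missing > 0:
                restored = self.backing_store[-missing:]
                del self.backing_store[-missing:]
                self.in_registers = restored + self.in_registers
```

From the program’s point of view each procedure always sees its own registers intact, which is the illusion of an unlimited register file mentioned above.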

Switching between the IA-32 and the regular Itanium instruction sets requires only three special instructions, together with interruptions (Intel Corporation, 2002b). The first is jmpe, an IA-32 instruction that jumps to a targeted Itanium instruction and tells the processor to use the Itanium instruction set. The next is br.ia, an Itanium instruction that branches to an IA-32 instruction while also making IA-32 the active instruction set (Intel Corporation, 2002b). The last is rfi, which stands for “Return from Interrupt.” This is another Itanium instruction that returns to either a targeted IA-32 or Itanium instruction.

IA-32e 64-bit Extensions

The IA-32e 64-bit extension set is Intel’s response to the AMD64 architecture used with Athlon64 and Opteron processors. Although Intel already sold the 64-bit Itanium processor, widespread adoption of the architecture remains elusive in both the desktop and server markets. The central problem with the Itanium is that 32-bit applications must run in an emulation mode which is vastly inferior to current 32-bit-specific processors. The new IA-32e extensions allow users to run current 32-bit programs and also retain support for future 64-bit applications. However, the specifications of the new IA-32e operating modes are different from previous chips, and recompilation of applications is required to take advantage of the 64-bit extensions. Both the Xeon server-line and the Pentium desktop-line of processors will support the new IA-32e extensions when they are implemented in mid-2004 (Turner 2004).

The two main reasons Intel was forced to implement 64-bit extensions were the limit of 32-bit virtual memory addressing and pressure from tier one vendors to provide a counter to the AMD64 extensions. Although 32-bit memory addressing allows for 2^32 (four gigabytes) unique unsigned address locations, Intel uses a two’s complement scheme that limits the actual number of locations to 2^31-1 addresses, or roughly two gigabytes. With 64 bits, the total number of memory addresses increases to 2^63-1 (eight exabytes) with a two’s complement representation. Although desktop computers in the first half of 2004 usually have between one half gigabyte and one gigabyte of main memory, 64-bit addressing must become established before the 32-bit virtual memory address limit is reached. This will allow time for widespread adoption of the new 64-bit architecture.
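
The address counts in this paragraph can be reproduced directly. The sketch below only restates the arithmetic; the function names are hypothetical.

```python
GIB = 2 ** 30  # one gigabyte, binary
EIB = 2 ** 60  # one exabyte, binary


def unsigned_count(bits):
    """Number of distinct values of an unsigned field that is `bits` wide."""
    return 2 ** bits


def max_signed(bits):
    """Largest positive value in a two's complement field that is `bits` wide."""
    return 2 ** (bits - 1) - 1


unsigned_32_in_gib = unsigned_count(32) / GIB  # four gigabytes
signed_32_in_gib = max_signed(32) / GIB        # just under two gigabytes
signed_64_in_eib = max_signed(64) / EIB        # just under eight exabytes
```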

Another factor that pushed Intel to announce the IA-32e extensions was pressure from tier one computer vendors to counter the Athlon64 in the desktop market and the Opteron in the server market. Tier one vendors are large corporations such as Dell Inc., Hewlett-Packard, and IBM, which have the largest market share among Intel’s original equipment manufacturers (OEMs). Dell in particular put pressure on Intel because it does not sell AMD processors in its computer systems and it “remains the world’s #1 direct-sale computer vendor” (Yahoo! 2004). Without a moderately priced 64-bit processor to offer in their desktop and server systems, Dell and other tier one vendors could lose market share to smaller computer vendors. Intel was therefore pressured to introduce a 64-bit extended architecture that costs much less than its expensive Itanium platform. These two main factors led Intel to create the IA-32e extensions for the x86 architecture.

One of the main goals of extending the x86 microprocessor architecture is to retain legacy support for current software. The new extensions are only used for 64-bit functions; otherwise chips function as 32-bit microprocessors (Intel 2004b). The IA-32e extensions add 64-bit capabilities to software compiled with 64-bit addressing in mind, without compromising the compatibility of applications that have not been recompiled. In fact, legacy 32-bit applications can run at the same time as 64-bit applications because compatibility mode is determined by the operating system for individual code segments.

The capabilities of the IA-32e extensions rely on a 64-bit operating system (OS) and on 64-bit applications running under that OS. Numerous new features are available to programmers in the IA-32e operating mode, among them 64-bit linear addressing, register extensions requiring new opcode prefixes, additional 64-bit general purpose registers, and a 64-bit instruction pointer.

64-bit linear addressing has already been discussed, but its importance cannot be overstated. In 1981, Bill Gates reportedly claimed, “640K ought to be enough for anybody,” in reference to the maximum amount of memory an IBM-compatible personal computer could address at that time (Antioffline, 2004). In computer science, progress is inevitable, and the best way to deal with the changes it brings is adequate preparation. 64-bit linear addressing prepares the desktop and server markets for new applications that will emerge in the future (Intel 2004b).

The 64-bit addresses will require new opcode prefixes to take advantage of larger memory capacities. Intel technical specifications for the IA-32e extensions designate the default sizes for addresses and operands at 64-bits and 32-bits, respectively (Intel 2004b). The address and operand size prefixes can specify both 64-bit and 32-bit addresses in 64-bit mode for each separate instruction. Compatibility mode for legacy applications allows for 32-bit and 16-bit opcode prefixes that specify both addresses. However, 16-bit addresses are not supported in 64-bit mode. The new extensions allow programmers flexibility when designing applications.

Programmers also gain flexibility with additional 64-bit general purpose registers (GPRs). There are eight new GPRs under the IA-32e extension mode. When these new GPRs are combined with the previous eight registers, a total of sixteen GPRs are available to programmers in the 64-bit operating mode. One issue with maintaining backwards compatibility is that the upper 32 bits of the 64-bit registers are not preserved when 32-bit data is loaded into them. Programmers must not rely on the preservation of the upper bits when switching between legacy applications and recompiled 64-bit programs.
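
The loss of the upper 32 bits can be illustrated with a small model of a 64-bit register. This is only a sketch with hypothetical function names. It assumes the zero-extension behaviour of 64-bit mode, in which a 32-bit write clears the upper half of the destination register.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def write64(register, value):
    """A full 64-bit write replaces the whole register."""
    return value & MASK64


def write32(register, value):
    """A 32-bit write: the old upper 32 bits are not kept.

    In this model the result is zero-extended, so the previous register
    contents have no influence on the new value.
    """
    return value & MASK32


rax = write64(0, 0x1122334455667788)
rax = write32(rax, 0xDEADBEEF)  # upper half 0x11223344 is lost
```

After the second write the register holds only the 32-bit value, so code that stored something in the upper half beforehand cannot expect to find it there.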

The 64-bit instruction pointer is required for the new architecture to take advantage of addressing more than four gigabytes of main memory. In legacy mode, the instruction pointer retains 32-bit support.

The Intel IA-32e extensions provide support for all legacy applications that have been developed for the x86 architecture. By extending the architecture to 64 bits, they allow future applications to take advantage of more than four gigabytes of memory. The new features included with the extensions are 64-bit linear addressing, register extensions that require new opcode prefixes, additional 64-bit general purpose registers, and a 64-bit instruction pointer.

Comparison

Each of these 64-bit architectures, and the processors built on them, has something to offer, but which one is really the best? The answer will be different for each consumer, depending on what they want to use their computer for and how much they are willing to spend on a new processor. When comparing the three 64-bit processors detailed earlier, one must consider budget, intended use, and ease of installation, among other things.

Although the Itanium is typically the most costly, ranging anywhere up to $5,200, it also far outranks its competitors in the number of general purpose registers. The Itanium, with its 128 general purpose registers would dramatically increase the speed of programs written to utilize this feature. Unfortunately, there are currently not many programs that are written to make use of this many registers, simply because there has never before been the option of doing so. This will eventually change, but most likely only a few select games would show the difference as is. As the technology market continues to evolve, the cost of the Itanium, and the number of programs written to make use of the large number of registers will continue to become more appealing to potential buyers. The Itanium comes in various makes; varying clock speed, bus speed, cache size and more importantly to many buyers – price. While some models have begun to appear on the market, others have not yet made their appearances in stores.

The less costly, but very similar IA-32e extension also ranges in cost, but can usually be found for under $1000. This processor has a superior clock speed to many others and a competitive bus speed as well. The IA-32e and Itanium are very similar to each other, both having a very rapid integer processing speed as well as many other shared features. The IA-32 applications are supported by the Itanium, and will soon be enhanced with the introduction of the IA-32 execution layer.

The AMD64 is much less costly than either the IA-32e or the Itanium, which makes it much more appealing to many buyers. The AMD64, like its competition, ranges in cost, but can be found for under $800. The AMD64 was one of the first 64-bit x86 processors to hit the market, and Intel quickly responded with the IA-32e extensions. With the release of the AMD64, AMD showed the computing world that it was unafraid of competing with Intel, and in fact could possibly prove to have better products than Intel. The AMD64 was built specifically to enhance the x86 instruction set, which was the most widely used architecture when the AMD64 was released. The Itanium was initially not backwards compatible, meaning it could not initially work with 32-bit programs, whereas the AMD64 was backwards compatible upon its release. Although Intel has now fixed its compatibility issue, that mistake has made many people feel the less costly AMD64 is a much better buy. The AMD64 also has a far superior bus speed, twice that of the IA-32e and four times that of the Itanium. Although this may not matter to some buyers, it is one of the limiting factors on the overall speed of the computer, and a faster bus speed could be very appealing depending on the type of work intended for the computer.

As far as price goes, there is no good reason not to purchase the AMD64; in fact, many would argue for it regardless of the other considerations. With a bus speed four times as fast as the Itanium’s, a clock speed comparable to the Itanium and IA-32e, and the same number of general purpose registers as the IA-32e (nowhere near the Itanium’s), some would say the only thing the AMD64 has against it is the lack of an Intel patent. Others, however, could see this as a positive aspect, and would see no reason not to go with the AMD64.

Another factor to consider when looking at the three architectures is the intended use. The Itanium offers faster transaction processing, making it useful for database and other high-end applications. It performs floating-point operations very quickly and is therefore better able to complete the tasks needed in this environment. A typical user would be a knowledge management business person looking to store massive quantities of data and to access this data quickly. The IA-32e, however, is better for workgroups, workstations, and the Web. Since it is easily compatible with both 32-bit and 64-bit machines, it can adapt to fit the other computers on the network. The IA-32e allows more memory access to applications coded for 64-bit processing, but also provides full native 32-bit performance. Although it can also be used for servers or workstations, the AMD64 provides the best results when the user is looking for excellence in cinematic applications. The AMD64, with its very impressive bus speed of 1600 MHz, also has the speed to work as a server or workstation, but the IA-32e would perform better.

Conclusion

64-bit architectures are slowly becoming more and more prevalent in the processor market. As this technology continues to be developed and continues to decrease in price, these architectures will begin to displace the older 32-bit processors. The AMD64, the IA-32e 64-bit extension, and the Intel Itanium are only three of many architectures that will be developed as consumers become more accustomed to the idea of 64-bit processors. Although each has its own marketable features, the two Intel architectures are, as expected, very similar. The AMD64, coming from a generally less known and less trusted company, also has a lot to offer, but it remains to be seen whether the cheaper AMD processors will push AMD into threatening competition with Intel.

Bibliography

Antioffline (2004). “Bill Gates Speaks.” URL:

Bennett, Kyle (2003). “Athlon 64 Vs. Pentium 4.” URL:

Intel Corporation (2002a). “Intel Itanium 2 Processor: Hardware Developer’s Manual.” URL:

Intel Corporation (2002b). “Intel Itanium Architecture Software Developer’s Manual.” URL:

Intel Corporation (2004a). “Intel Itanium 2 Processor.” URL:

Intel Corporation (2004b). “64-Bit Extension Technology Software Developer’s Guide.” URL:

Shimpi, Anand (2003). “AMD Athlon 64 and AMD Athlon 64 FX – It’s Judgment Day.” URL:

Turner, Vernon (2004). “Intel Announces Xeon Processor with 64-Bit Extensions.” URL: IDC_Intel_Xeon_Whitepaper.pdf

Welker, Mark (2004). “AMD Processor Performance Evaluation Guide.” URL:

Yahoo! Finance (2004). “DELL: Profile for DELL INC.” URL:


