


Nvidia's next-gen Compute architecture

G300-Fermi: Nvidia focuses on GPU Computing - Impressive Raytracing demo shots

At the GTC keynote, Nvidia announced its next-generation GPU and Cuda architecture - Fermi (G300). The Californians concentrated on flexible usability and high utilization of the 512 shader ALUs - DirectX 11 was only an aside.
[Figure: High-Level Diagram of the G300/Fermi]
The architecture of the G300 is code-named Fermi and features about 3 billion transistors, 512 ALUs, up to 6 GiByte of GDDR5 RAM and a 384-bit memory interface. Nvidia has not revealed anything about clock rates yet - therefore all performance figures are given per clock cycle, which does not necessarily reflect how the final products will perform relative to their predecessors.

With the Fermi architecture Nvidia focuses more and more on GPU computing and also uses the corresponding terms in its presentation. The former texture units have turned into load/store units, and the shader ALUs (which Nvidia formerly called stream processors) are now Cuda cores or Cuda processors. Chips based on the Fermi architecture will certainly be DirectX 11-compatible, but Nvidia doesn't talk about that much.



[Figure: Fermi Streaming Multiprocessor or SIMD]
Specifications: G300 Fermi
A total of 512 Cuda cores will be on the G300 chip, organized in 16 SIMD units. So every SIMD has 32 ALUs, which share 16 load/store units (LS units, ex-TMUs). So far Nvidia has not revealed details about the capabilities of the LS units, and the presented specifications give no hint about texture filtering power yet. Values between 4 and 16 texture filters per SIMD seem possible.
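The per-SIMD figures follow from simple division. A minimal Python sketch, using only the numbers quoted above (the variable names are ours, chosen for illustration):

```python
# Figures quoted in the article for G300/Fermi.
TOTAL_CUDA_CORES = 512
SIMD_UNITS = 16
LS_UNITS_PER_SIMD = 16

# Cuda cores (ALUs) per SIMD / Streaming Multiprocessor.
cores_per_simd = TOTAL_CUDA_CORES // SIMD_UNITS

# How many cores share one load/store unit inside a SIMD.
cores_per_ls_unit = cores_per_simd // LS_UNITS_PER_SIMD

# Load/store units across the whole chip.
total_ls_units = SIMD_UNITS * LS_UNITS_PER_SIMD

print(cores_per_simd)     # 32
print(cores_per_ls_unit)  # 2
print(total_ls_units)     # 256
```

This only restates the published unit counts. It says nothing about texture filtering, which Nvidia has not detailed.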

Special Function Units (SFUs) execute transcendental instructions such as sine, cosine, reciprocal and square root. Each SFU executes one instruction per thread, per clock; with four SFUs per SIMD, a warp of 32 threads executes over eight clocks. The SFU pipeline works independently from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied.

Double-precision arithmetic has been improved as well. Fermi not only complies with the IEEE 754-2008 floating-point standard (formerly IEEE 754-1985 was used), including the more precise FMA (Fused Multiply-Add, which AMD offers with the HD 5800 series and which Nvidia offered only for DP on the GT200), but also raises DP throughput by a factor of about 8 compared with the GT200 (per clock cycle!). Every SIMD (called Streaming Multiprocessor) can execute 16 DP FMA operations per clock, or 256 per chip - the GT200 was only capable of 30 DP-MADs.
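The factor of roughly 8 can be checked from the quoted per-clock figures. The same Python sketch also shows why a fused multiply-add can be more precise than a separate multiply and add. It emulates the single rounding of an FMA with exact rational arithmetic, which is an illustration on the CPU and not how the hardware works. The input values are our own choice.

```python
from fractions import Fraction

# Per-clock double-precision figures quoted in the article.
FERMI_SIMD_UNITS = 16
FERMI_DP_FMA_PER_SIMD = 16
GT200_DP_MAD_PER_CLOCK = 30

fermi_dp_fma_per_clock = FERMI_SIMD_UNITS * FERMI_DP_FMA_PER_SIMD
ratio_per_clock = fermi_dp_fma_per_clock / GT200_DP_MAD_PER_CLOCK
print(fermi_dp_fma_per_clock)     # 256
print(round(ratio_per_clock, 2))  # 8.53


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Illustrative inputs (our choice): the exact product is 1 - 2**-60,
# which is not representable as a double and rounds to 1.0.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0

unfused = a * b + c            # product is rounded first, then the sum
fused = fma_emulated(a, b, c)  # one rounding at the very end

print(unfused)  # 0.0
print(fused)    # -2**-60, a tiny but nonzero value
```

The unfused result loses the small difference entirely, while the fused result keeps it. Note that the ratio is per clock only; without clock rates it says nothing about absolute throughput.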

The SIMDs schedule threads in groups of 32 parallel threads called warps. Each SIMD has two warp schedulers and two instruction dispatch units, allowing two warps to be issued and executed concurrently. Every warp scheduler issues to either a group of sixteen cores, sixteen load/store units or four SFUs. Since the warps execute independently, Fermi's scheduler does not need to check for dependencies within the instruction stream.
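As a rough illustration of these unit widths, the following Python sketch estimates how many clocks one warp occupies each unit group. It assumes each unit handles one thread per clock; the article states this only for the SFUs (eight clocks per warp), so the other two results are derived, not quoted. The names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated in the article

# Units a warp scheduler can issue to at once, per the article.
UNIT_WIDTH = {"cuda_cores": 16, "load_store": 16, "sfu": 4}


def clocks_per_warp(unit: str) -> int:
    """Clocks to push all threads of one warp through a unit group,
    assuming one thread per unit per clock (our assumption)."""
    width = UNIT_WIDTH[unit]
    return -(-WARP_SIZE // width)  # ceiling division


for unit in UNIT_WIDTH:
    print(unit, clocks_per_warp(unit))
# cuda_cores 2
# load_store 2
# sfu 8
```

The SFU result matches the eight clocks quoted for a warp on the Special Function Units.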

[Figure: Speeds & Feeds]


Fermi Cache and RAM
A clever trick is the way the cache of the SIMDs is divided. Physically there are 64 KiByte per SIMD/Streaming Multiprocessor. 16 KiByte are fixed as Shared Memory (as was the case with G80 and GT200) and 16 KiByte are configured as Level 1 cache - the remaining 32 KiByte can be used freely.
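The split described above can be modelled in a few lines of Python. This is a sketch under assumptions: the article does not say at what granularity the flexible 32 KiByte can be assigned, so the function simply accepts any whole number of KiByte. The function name is hypothetical and not part of any Nvidia API.

```python
TOTAL_KIB = 64         # on-chip memory per SIMD, per the article
FIXED_SHARED_KIB = 16  # always Shared Memory
FIXED_L1_KIB = 16      # always Level 1 cache
FLEXIBLE_KIB = TOTAL_KIB - FIXED_SHARED_KIB - FIXED_L1_KIB  # 32


def split(flexible_to_shared_kib: int):
    """Return (shared_kib, l1_kib) for a given share of the flexible part.

    The granularity is an assumption; the article only says the
    remaining 32 KiByte can be used freely.
    """
    if not 0 <= flexible_to_shared_kib <= FLEXIBLE_KIB:
        raise ValueError("flexible part is limited to 0..32 KiByte")
    shared = FIXED_SHARED_KIB + flexible_to_shared_kib
    l1 = FIXED_L1_KIB + (FLEXIBLE_KIB - flexible_to_shared_kib)
    return shared, l1


print(split(32))  # (48, 16): everything flexible goes to Shared Memory
print(split(0))   # (16, 48): everything flexible goes to L1 cache
```

Whatever the split, the two parts always add up to the 64 KiByte that are physically present.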

Furthermore, the G300 with Fermi architecture offers a unified Level 2 cache with a capacity of 768 KiByte.

In our gallery you will also see brand-new screenshots from physics and raytracing tech demos. At the GTC, the already known GPU raytracing demo with a Bugatti Veyron was shown in an updated version that uses much more realistic lighting based on Global Illumination. Other demos show a physically correct water simulation of the kind used in films, as well as particle effects and a destruction simulation.

Fermi will probably not launch until 2010. Nvidia is still working on details, so it will take another couple of months before the graphics cards hit the retail market.




--
Author: Carsten Spille (Oct 01, 2009)







Comments (16)

Comments 13 to 16
connos Re: G300-Fermi: Nvidia focuses on GPU Computing - Impressive Raytracing demo shots
Senior Member
02.10.2009 22:34
Some more on the "fake" Fermi.

www.semiaccurate.com/2009...
chizow Re: G300-Fermi: Nvidia focuses on GPU Computing - Impressive Raytracing demo shots
Senior Member
02.10.2009 19:37
Quote: (Originally Posted by natr0n)
The latest word on the e-street is that the actual Fermi card was a prop/fake card, meaning that the ray-traced car wasn't rendered on Fermi.

oops

Fuad is confirming that the card shown on stage was a mock-up, but the ASIC running the N-body simulation was Fermi. The reasons for not showing the working unit were obvious: it's apparently a mess of wires and PCB. He was promised a picture of the working GPU later. Also, the Bugatti ray-tracing demo was done on a GT200-based card; that was stated during the conference. The most logical reason why: they don't have working drivers for Fermi yet beyond the very basics.

www.fudzilla.com/content/...
connos Re: G300-Fermi: Nvidia focuses on GPU Computing - Impressive Raytracing demo shots
Senior Member
02.10.2009 19:11
Nvidia does things like that, but who knows; it certainly wasn't real time. They were surely trying in a hurry to stall ATI sales. If it had been possible to render it in real time and demonstrate it, they probably would have done so.

You might also think that ATI spread the rumors because Nvidia didn't demonstrate it in real time.

Copyright © 2014 by Computec Media GmbH