ac6-formation, a department of Ac6 SAS
 


PC2 PPC970FX implementation

This course covers the IBM PowerPC 970FX (G5) CPU.

Objectives
  • The course details the pipeline operation in order to determine code optimization guidelines.
  • Data and instruction paths between SDRAM, L1 caches and L2 cache are highlighted.
  • MERSI cache coherency protocol is introduced in increasing depth.
  • The operation of the elastic bus is described.
  • Through an FFT algorithm, the instructor shows how to vectorize processing and reduce execution time using data streaming.
  • The performance monitor is used to optimize the performance of the FFT.
A more detailed course description is available on request at formation@ac6-formation.com
  • Theoretical course
    • PDF course material (in English) supplemented by a printed version for face-to-face courses.
    • Online courses are delivered using the Teams video-conferencing system.
    • The trainer answers trainees' questions during the training and provides technical and pedagogical assistance.
  • At the start of each session, the trainer interacts with the trainees to ensure the course fits their expectations and adjusts it if needed.
  • Any embedded systems engineer or technician with the above prerequisites.
  • The prerequisites indicated above are assessed before the training by the trainee's technical supervisor in their company, or by the trainee themselves in the exceptional case of an individual trainee.
  • Trainee progress is assessed by quizzes offered at the end of various sections to verify that the trainees have assimilated the points presented.
  • At the end of the training, each trainee receives a certificate attesting that they have successfully completed the course.
    • If a problem due to a trainee's lack of prerequisites is discovered during the course, a different or additional training is offered to them, generally to reinforce their prerequisites, in agreement with their company manager if applicable.

Course Outline

  • Functional units
  • Key features
  • Pipeline basics
  • Deeply pipelined design, superscalar implementation, register renaming
  • Branch prediction mechanism
  • Instruction decode and preprocessing
  • Instruction dispatch, sequencing and completion control, register renaming
  • Dispatch group organization
  • Synchronization-based instruction grouping
  • Instruction latencies and throughputs
  • Software optimisation guidelines
  • MMU goals
  • Data address translation, 128-entry Data ERAT, ERAT Miss Queue
  • Second-level Memory Management Unit consisting of SLB and TLB
  • 1024-entry 4-way set associative TLB, 64-entry fully associative SLB
  • Large page support
  • Real memory limit register
  • Hypervisor vs supervisor
  • Support for 32-bit operating systems
  • Data paths between load / store units, instruction queue, L2 and external bus
  • Out-of-order and speculative issue of load operations
  • 32-entry real address based store queues
  • 32-entry load re-order queue, tracking of the order of loads
  • 8-entry load miss queue
  • GUS subsystem
  • Core Interface Unit
  • L2 cache controller
  • Non Cacheable Unit
  • Storage access ordering
  • Hardware controlled data prefetch
  • Prefetch startup sequence, stream detection
  • Synchronization instructions sync, lwsync, ptesync
  • Cache basics
  • 64 kB direct-mapped instruction cache
  • 32 kB 2-way set associative data cache, FIFO replacement policy, Store-through policy
  • 512 kB L2 cache, fully inclusive of L1 data caches, MERSI coherency protocol
  • Cache coherency, MERSI cache line state, cache state transition tables
  • Branch instructions
  • The system call communication path between applications and RTOS
  • Integer load / store instructions
  • Integer arithmetic and logic instructions
  • IEEE754 basics
  • FPU operation: FPSCR register
  • Float load / store instructions, floating point exceptions
  • Float arithmetic instructions
  • The EABI
  • Code and data sections, benefits of small data areas
  • 970FX specific registers
  • Objectives
  • Event selection
  • Configuring the performance monitor bus
  • Instruction matching and sampling, the 3 stages of eligibility
  • Exception recognition and priorities
  • Focus on soft patch and maintenance exceptions
  • Register updates according to the exception cause
  • Requirements to support exception nesting
  • Precise processing of machine check exceptions
  • VMX introduction, SIMD processing
  • Intra vs inter element instructions
  • VMX registers, VSCR initialization
  • ANSI C extension to support vector operators, new C types, new casts, vector declaration and initialization
  • VMX implementation on the PPC970FX
  • Data streams management
  • EABI extension to support VMX
  • Clocking, PLL design
  • Time Base and decrementer
  • Frequency and voltage scaling
  • Additional dynamic power management
  • Unidirectional point-to-point bus segments, source synchronized transfers
  • Packet protocols
  • Snoop response
  • Pipelined transactions
  • Power-on procedure
  • Electrical interface
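
As a small illustration of the cache geometries listed in the outline (64 kB direct-mapped instruction cache, 32 kB 2-way set associative data cache), the following C sketch computes how many sets each cache has and which set a given address maps to. It is not taken from the course material. The `cache_geom` type and the two helper functions are hypothetical names introduced here, and the 128-byte line size is an assumption that the outline does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: geometry of a set-associative cache. */
typedef struct {
    uint32_t size_bytes;  /* total capacity in bytes */
    uint32_t ways;        /* associativity; 1 means direct-mapped */
    uint32_t line_bytes;  /* ASSUMPTION: 128-byte lines, not given in the outline */
} cache_geom;

/* Number of sets = capacity / (ways * line size). */
static uint32_t cache_sets(cache_geom g)
{
    assert(g.ways != 0 && g.line_bytes != 0);
    return g.size_bytes / (g.ways * g.line_bytes);
}

/* Set selected by an address: line number modulo the number of sets. */
static uint32_t cache_set_index(cache_geom g, uint64_t addr)
{
    return (uint32_t)((addr / g.line_bytes) % cache_sets(g));
}
```

Under these assumptions, the instruction cache has 512 sets and the data cache has 128 sets. Two data addresses 16 kB apart therefore fall in the same set, which is the kind of conflict that software optimisation guidelines typically try to avoid.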