HBeongpgpu

Parallel Programming for Everyone

Training: HBCS102 - Introduction to GPGPU using CUDA            

A new batch of online CUDA training starts on 1st September. Enroll now: info@hbeongpgpu.com

The training is also offered online in real time.

 

This is how we provide on-site, project-based training:

 

Course Overview:

This course (HBCS102) is divided into two modules, Level "A" and Level "B", with Level "B" being the more advanced.

Level "A" is an introductory course on parallel programming, with about 40% of the time devoted to CUDA programming. It does not require any prior knowledge of parallel computing; a data-structures-level course is the only requirement. The course starts with the C programming language and covers graphics card hardware in detail (GPU architecture, DRAM, PCIe, etc.). It also covers elementary concepts in parallel programming and CUDA, including "computational thinking", algorithms, and some discussion of shared memory usage. Programming in both Windows and Linux environments is taught. An introduction to Linux is included so that students and professionals can get the full benefit of working on an open-source OS.
 
Learning Outcome: The trainee learns to write simple CUDA programs, such as squaring the first 10000 integers, and to compile them on Linux and Windows. In short, the candidate learns how to write simple CUDA programs and understands the basic hardware and software details, without worrying about performance. The discussion of algorithms also builds a solid base for getting ahead in the world of parallel programming.
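
As an illustration of the kind of program described above, here is a minimal sketch that squares the first 10000 integers on the GPU. It is not taken from the course material: the names squareKernel and squareFirstN, the block size of 256 threads, and the file name square.cu are assumptions made for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread squares one element. The bounds check matters because the
// grid is rounded up to a whole number of blocks.
__global__ void squareKernel(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * in[i];
    }
}

// Hypothetical helper: squares 1..n on the GPU and returns the number of
// wrong results, or -1 if an allocation or CUDA runtime call fails.
int squareFirstN(int n)
{
    size_t bytes = (size_t)n * sizeof(int);
    int *h_in = (int *)malloc(bytes);
    int *h_out = (int *)malloc(bytes);
    int *d_in = NULL;
    int *d_out = NULL;
    int result = -1;

    bool ok = (h_in != NULL) && (h_out != NULL);
    if (ok) {
        for (int i = 0; i < n; ++i) {
            h_in[i] = i + 1;  // 10000 * 10000 still fits in a 32-bit int
        }
    }

    // Allocate device memory and copy the input over PCIe.
    ok = ok && cudaMalloc((void **)&d_in, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void **)&d_out, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // round up
        squareKernel<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;    // catches launch errors
    }

    // This copy waits for the kernel to finish before reading the results.
    ok = ok && cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    if (ok) {
        result = 0;
        for (int i = 0; i < n; ++i) {
            if (h_out[i] != (i + 1) * (i + 1)) {
                ++result;
            }
        }
    }

    // cudaFree(NULL) and free(NULL) are harmless no-ops.
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return result;
}

int main(void)
{
    int errors = squareFirstN(10000);
    printf("mismatches: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

Assuming the CUDA toolkit is installed and the file is saved as square.cu, it could be built on Linux with `nvcc square.cu -o square`. The sketch makes no attempt at optimization, in keeping with the Level "A" focus on correctness rather than performance.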
 
Level "B" is an advanced course on CUDA programming. The training covers most of the features of CUDA 2.3 and 3.0.
Learning Outcome: The trainee learns to implement and optimize complex algorithms such as reduction, prefix sum (scan), and matrix transpose. This level aims to prepare candidates to develop their own applications. Please refer to the detailed syllabus below.
 
SYLLABUS

LEVEL A

Contents

1. Important C Concepts

 C Basic
    * Token
    * First Program
    * Variable(Data Type)
    * Operator

Building C Program on Linux
    * Linux Basics
    * Compiling
    * Running a Program
   
Control Flow
    * if else
    * Switch Case
    * How to Use Loops

User Defined Data Type
    * Array
    * Pointer
    * Dynamic Memory Allocation: malloc(),free()

Function
    * Type of Function
    * Structure of Function
    * Calling Techniques of Function
 
2. Parallel Algorithms and Computational Thinking
  
3. Introduction to GPU Hardware
    * Modern GPU Architecture
    * Types of Memory
    * Difference between CPU and GPU
    * PCI Express vs. PCI
 
4. Getting Started with CUDA
Installation, Driver, SDK, Toolkit, Basic Programming Concepts, Modes of Parallel Programming
CUDA Programming Model, Kernels, Calling a Kernel on the Device, Compiling and Running a CUDA Program
 
5. Shared Memory Usage in CUDA
 
 
 
LEVEL B

Level "B" discusses parallel programming concepts in detail, with a specific focus on CUDA programming. In particular, it covers the following topics:

 

    * Performance metrics: speed-up, utilization, efficiency

    * Transparent scalability

    * Memory organization in CUDA; discussion of pinned memory, texture memory and constant memory usage

    * Error handling

    * CUDA events

    * Models of parallel computation: SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction Multiple Data)

    * GPU compute architecture

    * Memory optimization (removing bank conflicts in shared memory, partition camping, global memory optimization)

    * Using streams: overlapping GPU and CPU tasks, overlapping computation with memory copy

    * Atomic operations and their limitations

    * Using the occupancy calculator, CUDA profiler and debugger

    * Performance guidelines

    * CUBLAS and CUFFT usage

    * Thrust library and CUDA Data Parallel Primitives Library (CUDPP)

    * Implementation of fast matrix multiplication, scan (prefix sum), reduction and matrix transpose algorithms. These algorithms are basic building blocks of many of the complex applications being developed today.
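
The performance metrics named in the first item of the list can be made concrete with the standard textbook definitions of speed-up and efficiency. The symbols T_s, T_p and p are notation assumed here, and the numbers in the worked line are purely illustrative, not measurements.

```latex
% T_s: run time of the best serial program
% T_p: run time of the parallel program on p processing elements
\[
  S = \frac{T_s}{T_p}, \qquad E = \frac{S}{p}
\]
% Illustrative numbers only: T_s = 8 s, T_p = 2 s, p = 16
\[
  S = \frac{8}{2} = 4, \qquad E = \frac{4}{16} = 0.25
\]
```

An efficiency well below 1, as in this made-up case, suggests that much of the available parallel hardware is idle or spent on overhead.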

Each concept is explained with practical examples. In short, the complete course exposes you to almost all of the features of CUDA 2.3 and 3.0, preparing you to write and optimise your own applications. The course concludes with a project.

Trainees who clear the test at the end of the training will also be given a certificate.
_____________________________________________________________________________

Target Audience:

Professionals, researchers and students with a background in Mathematics, Computer Science, IT, Electrical, Electronics & Communications, and similar fields can enrol for this course.

___________________________________________________________________

Prerequisites: 

For Level "A", the trainee should be familiar with the concepts of the C programming language. Parallel programming is taught from the beginning in Level "A", but some prior exposure to it will help you grasp the concepts more quickly.

___________________________________________________________________

Reference Books

Introduction to Algorithms, Third Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein

Introduction to Parallel Computing by Ananth Grama, George Karypis, Vipin Kumar and Anshul Gupta (Pearson)

CUDA Programming Guide and CUDA Best Practices Guide (download from nvidia.com)

GPU Gems 3, edited by Hubert Nguyen

Reading material from the Internet

(CUDA Drivers / CUDA Download / CUDA SDK / CUDA VS Wizard)
_____________________________________________________________________________

For any specific information or queries, contact us at info@hbeongpgpu.com

