Modern X86 SIMD Programming - Outline
D. Kusswurm - F:\\ModX86SIMD\\Outline\\ModernX86SIMD_Outline (v1).docx

Introduction
The Introduction presents an overview of the book and includes concise descriptions of each chapter. It also summarizes the hardware and software tools required to use the book's source code.
  Overview
  Target Audience
  Chapter Descriptions
  Source Code
  Additional Resources

Chapter 1 - SIMD Fundamentals
Chapter 1 discusses SIMD fundamentals including data types, basic arithmetic, and common data manipulation operations. An understanding of this material is necessary for the reader to comprehend the book's subsequent chapters.
  What is SIMD?
    Simple C++ example (Ch01_01)
  Brief History of x86 SIMD Instruction Set Extensions
    MMX
    SSE - SSE4.2
    AVX, AVX2, and AVX-512
  SIMD Data Types
    Fundamental types
      128-bit, 256-bit, 512-bit
    Integer types
      Packed i8, i16, i32, i64 (signed and unsigned)
    Floating-point types
      Packed f16/b16, f32, and f64
    Little-endian storage
  SIMD Arithmetic
    Integer
      Addition and subtraction
        Wraparound vs. saturated
      Multiplication
      Bitwise logical
    Floating-point
      Addition, subtraction, multiplication, division, sqrt
      Horizontal addition and subtraction
      Fused multiply-add (FMA)
  SIMD Operations
    Integer
      Min & max
      Compares
      Shuffles, permutations, and blends
      Size promotions and reductions
    Floating-point
      Min & max
      Compares
      Shuffles, permutations, and blends
      Size promotions and reductions
    Masked moves
    Conditional execution and merging (AVX-512)
  SIMD Programming Overview
    C++ compiler options
    C++ SIMD intrinsic functions
    Assembly language functions
    Testing for AVX, AVX2, and AVX-512

Chapter 2 - AVX C++ Programming - Part 1
Chapter 2 teaches AVX integer arithmetic and other operations using C++ intrinsic functions.
It also discusses how to code a few simple image processing algorithms using C++ intrinsic functions and AVX instructions.
  Basic Integer Arithmetic
    Addition (Ch02_01)
    Subtraction (Ch02_02)
    Multiplication (Ch02_03)
  Common Integer Operations
    Bitwise logical operations (Ch02_04)
    Arithmetic and logical shifts (Ch02_05)
  Image Processing Algorithms
    Pixel minimum and maximum (Ch02_06)
    Pixel mean (Ch02_07)

Chapter 3 - AVX C++ Programming - Part 2
Chapter 3 is similar to the previous chapter but emphasizes floating-point instead of integer values. This chapter also explains how to employ C++ intrinsic functions to perform SIMD arithmetic operations using floating-point arrays and matrices.
  Basic Floating-Point Arithmetic
    Addition, subtraction, etc. (Ch03_01)
    Compares (Ch03_02)
    Conversions (Ch03_03)
  Floating-Point Arrays
    Array mean and standard deviation (Ch03_04, Ch03_05)
    Array square roots and compares (Ch03_06, Ch03_07)
  Floating-Point Matrices
    Matrix column means (Ch03_08, Ch03_09)

Chapter 4 - AVX2 C++ Programming - Part 1
Chapter 4 describes AVX2 integer programming using C++ intrinsic functions. This chapter also highlights the coding of more sophisticated image processing functions using the AVX2 instruction set.
  Basic Integer Arithmetic
    Addition and subtraction (Ch04_01)
    Pack and unpack operations (Ch04_02)
    Size promotions (Ch04_03)
  Image Processing Algorithms
    Pixel clipping (Ch04_04)
    RGB to grayscale (Ch04_05)
    Thresholding (Ch04_06)
    Pixel conversions (Ch04_07)

Chapter 5 - AVX2 C++ Programming - Part 2
Chapter 5 explains how to accelerate the performance of commonly used floating-point algorithms using C++ intrinsic functions and the AVX2 instruction set.
The source code examples in this chapter also demonstrate the use of FMA (fused multiply-add) arithmetic.
  Floating-Point Arrays
    Least squares with FMA (Ch05_01)
  Floating-Point Matrices
    Matrix multiplication (Ch05_02, Ch05_03)
    Matrix (4x4) multiplication (Ch05_04, Ch05_05)
    Matrix (4x4) vector multiplication (Ch05_06)
    Matrix inversion (Ch05_07, Ch05_08)

Chapter 6 - AVX2 C++ Programming - Part 3
Chapter 6 is a continuation of the previous chapter. It focuses on more advanced algorithms and SIMD programming techniques.
  Signal Processing
    Brief overview of convolution arithmetic
    1D Convolutions
      Variable and fixed width kernels (Ch06_01, Ch06_02)
    2D Convolutions
      Non-separable kernel (Ch06_03)
      Separable kernel (Ch06_04)

Chapter 7 - AVX-512 C++ Programming - Part 1
Chapter 7 explains AVX-512 integer arithmetic and other operations using C++ intrinsic functions. It also discusses how to code a few basic image processing algorithms using the AVX-512 instruction set.
  Integer Arithmetic
    Addition and subtraction (Ch07_01)
    Masked arithmetic (Ch07_02)
  Image Processing
    RGB to grayscale (Ch07_03)
    Image thresholding (Ch07_04)
    Image statistics (Ch07_05)

Chapter 8 - AVX-512 C++ Programming - Part 2
Chapter 8 describes how to code common and advanced floating-point algorithms using C++ intrinsic functions and the AVX-512 instruction set.
  Floating-Point Arithmetic
    Addition, subtraction, etc. (Ch08_01)
    Masked operations (Ch08_02)
  Floating-Point Arrays
    Array mean and standard deviation (Ch08_03)
  Floating-Point Matrices
    Covariance matrix (Ch08_04)
    Matrix multiplication (Ch08_05, Ch08_06)
    Matrix (4x4) vector multiplication (Ch08_07)
  Signal Processing
    1D convolutions using variable and fixed width kernels (Ch08_08)
    2D convolutions using separable kernel (Ch08_09)

Chapter 9 - Supplemental C++ SIMD Programming
Chapter 9 examines supplemental x86 SIMD programming topics including instruction set detection, how to use SIMD math library functions, and SIMD operations using text strings.
  Instruction set detection (Ch09_01)
  SIMD Math Library Functions
    Rectangular to polar coordinate conversions (Ch09_02)
    Body surface area calculations (Ch09_03)
  SIMD String Operations
    String length (Ch09_04)

Chapter 10 - X86 Processor Architecture
Chapter 10 explains x86 processor architecture including data types, register sets, memory addressing modes, and condition codes. Knowledge of this material is necessary for the reader to understand the subsequent x86 assembly language programming chapters.
  Data types
    Fundamental data types
    Numerical data types
    SIMD data types
    Strings
  Internal architecture
    General-purpose registers
    RFLAGS register
    MXCSR register
    Scalar FP and SIMD registers
  Memory addressing
  Condition codes

Chapter 11 - Core Assembly Language Programming - Part 1
Chapter 11 teaches fundamental x86-64 assembly language programming and basic instruction use. An understanding of this material is required to comprehend the source code examples in subsequent chapters.
  Integer Arithmetic
    Addition and subtraction (Ch11_01)
    Multiplication (Ch11_02)
    Division (Ch11_03)
    Mixed integer types and stack arguments (Ch11_04)
  Integer Operations
    Memory addressing modes (Ch11_05)
    Simple for-loops (Ch11_06)
    Compares (Ch11_07)
  Text Strings
    String instructions (Ch11_08)

Chapter 12 - Core Assembly Language Programming - Part 2
Chapter 12 is a continuation of the previous chapter. Topics discussed include scalar floating-point arithmetic, floating-point arrays, and function calling conventions.
  Scalar Floating-Point Arithmetic
    Single-precision arithmetic (Ch12_01)
    Double-precision arithmetic (Ch12_02)
    Compares (Ch12_03)
    Conversions (Ch12_04)
  Scalar Floating-Point Arrays
    Mean and standard deviation (Ch12_05)
  Function Calling Convention
    Stack frames (Ch12_06)
    Using non-volatile general-purpose registers (Ch12_07)
    Using non-volatile SIMD registers (Ch12_08)
    Macros for function prologues and epilogues (Ch12_09)

Chapter 13 - AVX Assembly Language Programming - Part 1
Chapter 13 explains AVX integer arithmetic and other operations using x86-64 assembly language. It also describes how to code a few simple image processing algorithms using assembly language.
  Integer Arithmetic
    Addition and subtraction (Ch13_01)
    Multiplication (Ch13_02)
  Common Integer Operations
    Bitwise logical operations (Ch13_03)
    Arithmetic and logical shifts (Ch13_04)
  Image Processing Algorithms
    Pixel minimum and maximum (Ch13_05)
    Pixel mean (Ch13_06)

Chapter 14 - AVX Assembly Language Programming - Part 2
Chapter 14 is similar to the previous chapter but uses floating-point instead of integer values. This chapter also illustrates how to employ x86-64 assembly language to perform SIMD arithmetic operations using arrays and matrices.
  Basic Floating-Point Arithmetic
    Addition, subtraction, etc. (Ch14_01)
    Compares and size conversions (Ch14_02)
  Floating-Point Arrays
    Array mean and standard deviation (Ch14_03)
    Array square roots and compares (Ch14_04)
  Floating-Point Matrices
    Matrix column means (Ch14_05)

Chapter 15 - AVX2 Assembly Language Programming - Part 1
Chapter 15 describes AVX2 integer programming using x86-64 assembly language. This chapter also highlights the coding of more sophisticated image processing functions using the AVX2 instruction set.
  Integer Arithmetic
    Addition and subtraction (Ch15_01)
  Image Processing
    Pixel clipping (Ch15_02)
    RGB to grayscale (Ch15_03)
    Thresholding (Ch15_04)
    Pixel conversions (Ch15_05)

Chapter 16 - AVX2 Assembly Language Programming - Part 2
Chapter 16 explains how to enhance the performance of frequently used floating-point algorithms using x86-64 assembly language and the AVX2 instruction set.
  Floating-Point Arrays
    Least squares with FMA (Ch16_01)
  Floating-Point Matrices
    Matrix multiplication (Ch16_02)
    Matrix (4x4) multiplication (Ch16_03)
    Matrix (4x4) vector multiplication (Ch16_04)
  Signal Processing
    1D convolutions using fixed and variable width kernels (Ch16_05)

Chapter 17 - AVX-512 Assembly Language Programming - Part 1
Chapter 17 highlights AVX-512 integer arithmetic and other operations using x86-64 assembly language. It also discusses how to code a few simple image processing algorithms using the AVX-512 instruction set.
  Integer Arithmetic
    Addition and subtraction (Ch17_01)
    Compares, merge masking, and zero masking (Ch17_02)
  Image Processing
    Pixel clipping (Ch17_03)
    Image statistics (Ch17_04)

Chapter 18 - AVX-512 Assembly Language Programming - Part 2
Chapter 18 explains how to code common and advanced floating-point algorithms using x86-64 assembly language and the AVX-512 instruction set.
  Floating-Point Arrays
    Correlation coefficient (Ch18_01)
    Merge and zero masking (Ch18_02)
    Embedded rounding and broadcasts (Ch18_03)
  Floating-Point Matrices
    Matrix (4x4) vector multiplication (Ch18_04)
  Signal Processing
    1D convolutions using fixed and variable width kernels (Ch18_05)

Appendix A - Source Code and Development Tools
Appendix A describes how to download, install, and execute the source code. It also includes some brief usage notes regarding Visual Studio and the GNU C++ compiler.
  Source Code Download Information
  Software Development Tools
    Microsoft Visual Studio
    GNU C++ compiler

Appendix B - References and Additional Resources
Appendix B contains a list of references that were consulted during the writing of this book. It also lists supplemental resources that the reader can consult for additional x86 SIMD programming information.
  X86 SIMD Programming References
  Algorithm References
  C++ References
  Additional Resources