High-Level Shader Language Specification
Working Draft

Introduction

The High-Level Shader Language (HLSL) is the GPU programming language provided in conjunction with the DirectX runtime. Over many years its use has expanded to cover every major rendering API across all major development platforms. Despite its popularity and long history, HLSL has never had a formal language specification. This document seeks to change that.

HLSL originally drew heavy inspiration from ISO C and later from ISO C++, with additions specific to graphics and parallel computation programming. The language is also influenced to a lesser degree by other popular graphics and parallel programming languages.

HLSL has two reference implementations, from which this specification draws heavily. The original reference implementation, FXC, has been in use since DirectX 9. The more recent reference implementation, DXC, has been the primary shader compiler since DirectX 12.

Where the two differ, this specification generally favors the language behavior of DXC over that of FXC, although this can vary by context.

In very rare instances this specification is aspirational and may diverge from the behavior of both reference implementations. This is done only where there is an intent to change implementation behavior in the future. Because this document and the implementations are both living sources, either one may be ahead of the other in different respects at any point in time.

Scope

This document specifies the requirements for implementations of HLSL. The HLSL specification is based on and highly influenced by the specifications for the C and C++ programming languages.

This document describes both the grammar and semantics of HLSL and, in later sections, the standard library of data types used in shader programming.

Normative References

The following referenced documents significantly influence this document and should be consulted when interpreting this standard.

Terms and definitions

This document aims to use terms consistent with their definitions in ISO C and ISO C++. In cases where those definitions are unclear, or where this document diverges from them, the definitions in this section, the remaining sections in this chapter, and the attached glossaries take precedence.

Runtime Targeting

HLSL evolved out of DirectX and gained popularity because it targeted a common hardware description that all conforming drivers were required to support. This common hardware description, called a Shader Model (SM), is an integral part of the description of HLSL. Some HLSL features require specific Shader Model features and are only supported by compilers when targeting those Shader Model versions or later.

SPMD Programming Model

HLSL uses a Single Program Multiple Data (SPMD) programming model, in which a program describes operations on a single element of data, but when the program runs it executes across more than one element at a time. This programming model suits GPUs because they are largely Single Instruction Multiple Data (SIMD) hardware architectures, where each instruction natively executes across multiple data elements at the same time.
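To make the model concrete, the following C++ sketch emulates how a simple per-element program might execute across a group of elements. The width `kWaveSize` and the names `WaveReg` and `runProgram` are hypothetical and chosen only for illustration. Each loop stands in for what the hardware would perform as a single SIMD instruction, and every element finishes one step before any element begins the next.

```cpp
#include <array>
#include <cstddef>

// Hypothetical width chosen for illustration; real widths vary by hardware.
constexpr std::size_t kWaveSize = 4;

// One value per element for each "register" of the program.
using WaveReg = std::array<float, kWaveSize>;

// The program as written for a single element: out = in * 2 + 1.
// In SPMD execution each statement becomes one SIMD step applied to every
// element before the next statement begins.
WaveReg runProgram(const WaveReg& in) {
    WaveReg tmp{};
    WaveReg out{};
    // Step 1: tmp = in * 2, computed for all elements.
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        tmp[lane] = in[lane] * 2.0f;
    }
    // Step 2: out = tmp + 1, computed for all elements.
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        out[lane] = tmp[lane] + 1.0f;
    }
    return out;
}
```

The shader author writes only the single-element expression; the replication across elements is the job of the compiler and hardware.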

There are many different terms of art for describing the elements of a GPU architecture and how they relate to the SPMD programming model. This document uses the terms as defined in the following subsections.

Lane

A lane represents a single computed element in an SPMD program. In a traditional programming model it would be analogous to a thread of execution; however, it differs in one key way. In multi-threaded programming, threads advance independently of each other. In SPMD programs, a group of lanes executes instructions in lockstep, because each instruction is a SIMD instruction computing the results for multiple lanes simultaneously.

Wave

A grouping of lanes for execution is called a wave. Wave sizes vary by hardware architecture, and some hardware implementations support multiple wave sizes. Wave sizes are generally powers of two, but there is no requirement that this be the case. HLSL is explicitly designed to run on hardware with arbitrary wave sizes.

Quad

A quad is a subdivision of four lanes in a wave that compute adjacent values. In pixel shaders a quad may represent four adjacent pixels, and quad operations allow passing data between adjacent lanes. In compute shaders, quads may be one- or two-dimensional depending on the workload dimensionality described in the numthreads attribute on the entry function. (FIXME: Add reference to attribute)
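As an illustration of how a pixel-shader quad might be indexed, the C++ sketch below maps a pixel coordinate to a lane index within its 2x2 quad and locates the horizontally, vertically, and diagonally adjacent lanes. The lane ordering (top-left, top-right, bottom-left, bottom-right) and the helper names are assumptions made for this example rather than requirements stated here, and the sketch covers only the two-dimensional pixel-shader case.

```cpp
#include <cstdint>

// Hypothetical helper: index (0-3) of a pixel within its 2x2 quad, assuming
// lanes are ordered top-left, top-right, bottom-left, bottom-right.
constexpr std::uint32_t quadLaneIndex(std::uint32_t x, std::uint32_t y) {
    return (x & 1u) + 2u * (y & 1u);
}

// Adjacent lanes within the same quad under that assumed ordering:
// flipping bit 0 moves horizontally, flipping bit 1 moves vertically,
// and flipping both moves diagonally.
constexpr std::uint32_t acrossX(std::uint32_t lane) { return lane ^ 1u; }
constexpr std::uint32_t acrossY(std::uint32_t lane) { return lane ^ 2u; }
constexpr std::uint32_t acrossDiagonal(std::uint32_t lane) { return lane ^ 3u; }
```

With this encoding, exchanging data between neighbors reduces to reading the value held by the lane whose index differs in one or both low bits.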

Threadgroup

A grouping of waves executing the same shader to produce a combined result is called a threadgroup. Threadgroups are executed on separate SIMD hardware and are not instruction-locked with other threadgroups.
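Because a threadgroup is made up of waves and wave sizes vary, the number of waves a threadgroup occupies depends on the hardware. The C++ sketch below computes this under the assumption that lanes are packed into waves contiguously. That assumption is an illustrative simplification, since how an implementation actually assigns lanes to waves is not specified in this section, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: number of waves needed to hold a threadgroup,
// assuming lanes are packed into waves contiguously.
// Preconditions: threadCount > 0 and waveSize > 0. Works for wave sizes
// that are not powers of two.
constexpr std::uint32_t wavesPerThreadgroup(std::uint32_t threadCount,
                                            std::uint32_t waveSize) {
    return (threadCount + waveSize - 1u) / waveSize; // ceiling division
}

// Number of active lanes in the final wave under the same assumption;
// any remaining lanes in that wave are idle.
constexpr std::uint32_t lanesInLastWave(std::uint32_t threadCount,
                                        std::uint32_t waveSize) {
    const std::uint32_t rem = threadCount % waveSize;
    return rem == 0u ? waveSize : rem;
}
```

For example, under this model a 100-thread group needs four waves of 32 lanes, with only four lanes active in the last one; on hardware with 48-lane waves it needs three.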

Dispatch

A grouping of threadgroups that represents the full execution of an HLSL program and produces a completed result for all input data elements is called a dispatch.

HLSL Memory Models

Memory accesses for Shader Model 5.0 and earlier operate on 128-bit slots aligned on 128-bit boundaries. This optimized for the common case in early shaders, where the data processed on the GPU was usually 4-element vectors of 32-bit data types.
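The slot arithmetic this layout implies can be sketched in C++: a byte offset maps to the 128-bit (16-byte) slot that contains it, and an access stays within one slot only if its first and last bytes map to the same slot. The helper names are hypothetical, and the sketch models only the slot layout described above, not any particular resource type or packing rule.

```cpp
#include <cstdint>

// A 128-bit slot is 16 bytes wide and aligned on a 16-byte boundary.
constexpr std::uint32_t kSlotBytes = 128u / 8u;

// Index of the 128-bit slot containing the given byte offset.
constexpr std::uint32_t slotIndex(std::uint32_t byteOffset) {
    return byteOffset / kSlotBytes;
}

// True if an access of sizeBytes (must be nonzero) starting at byteOffset
// lies entirely within a single 128-bit slot.
constexpr bool fitsInOneSlot(std::uint32_t byteOffset, std::uint32_t sizeBytes) {
    return slotIndex(byteOffset) == slotIndex(byteOffset + sizeBytes - 1u);
}
```

A 4-element vector of 32-bit values (16 bytes) starting on a slot boundary fills exactly one slot, which is the common case the slot design was optimized for.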

On modern hardware, memory access restrictions are loosened: reads of 32-bit multiples are supported starting with Shader Model 5.1, and reads of 16-bit multiples are supported starting with Shader Model 6.0. Shader Model features are fully documented in the DirectX Specifications, and this document does not attempt to elaborate further.

Common Definitions

The following definitions are consistent between HLSL and the ISO C and ISO C++ specifications, but are included here for reader convenience.

Diagnostic Message

An implementation-defined message belonging to a subset of the implementation’s output messages, which communicates diagnostic information to the user.

Ill-formed Program

A program that is not well-formed, for which the implementation is expected to return unsuccessfully and produce one or more diagnostic messages.

Implementation-defined Behavior

Behavior of a well-formed program and correct data that may vary by implementation, and that the implementation is expected to document.

Implementation Limits

Restrictions imposed upon programs by the implementation.

Undefined Behavior

Behavior of invalid program constructs or incorrect data for which this standard imposes no requirements, or which it does not sufficiently detail.

Unspecified Behavior

Behavior of a well-formed program and correct data that may vary by implementation, and that the implementation is not expected to document.

Well-formed Program

An HLSL program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.

[glossaries]