Name
    
    NV_fragment_program

Name Strings

    GL_NV_fragment_program

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
    Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2001-2002.

IP Status

    NVIDIA Proprietary.

Status

    Implemented in CineFX (NV30) Emulation driver, August 2002.
    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

    Last Modified Date:  2005/05/24
    NVIDIA Revision:     73

Number

    282

Dependencies

    Written based on the wording of the OpenGL 1.2.1 specification and
    requires OpenGL 1.2.1.

    Requires support for the ARB_multitexture extension with at least
    two texture units.

    NV_vertex_program affects the definition of this extension.  The only
    dependency is that both extensions use the same mechanisms for defining
    and binding programs.

    NV_texture_shader trivially affects the definition of this extension.

    NV_texture_rectangle trivially affects the definition of this extension.

    ARB_texture_cube_map trivially affects the definition of this extension.

    EXT_fog_coord trivially affects the definition of this extension.

    NV_depth_clamp affects the definition of this extension.

    ARB_depth_texture and SGIX_depth_texture affect the definition of this
    extension.

    NV_float_buffer affects the definition of this extension.

    ARB_vertex_program affects the definition of this extension.

    ARB_fragment_program affects the definition of this extension.

Overview

    OpenGL mandates a certain set of configurable per-fragment computations
    defining texture lookup, texture environment, color sum, and fog
    operations.  Each of these areas provides a useful but limited set of fixed
    operations.  For example, unextended OpenGL 1.2.1 provides only four
    texture environment modes, color sum, and three fog modes.  Many OpenGL
    extensions have either improved existing functionality or introduced new
    configurable fragment operations.  While these extensions have enabled new
    and interesting rendering effects, the set of effects is limited by the
    set of special modes introduced by the extension.  This lack of
    flexibility is in contrast to the high level of programmability of
    general-purpose CPUs and other (frequently software-based) shading
    languages.  The purpose of this extension is to expose to the OpenGL
    application writer an unprecedented degree of programmability in the
    computation of final fragment colors and depth values.

    This extension provides a mechanism for defining fragment program
    instruction sequences for application-defined fragment programs.  When in
    fragment program mode, a program is executed each time a fragment is
    produced by rasterization.  The inputs for the program are the attributes
    (position, colors, texture coordinates) associated with the fragment and a
    set of constant registers.  A fragment program can perform mathematical
    computations and texture lookups using arbitrary texture coordinates.  The
    results of a fragment program are new color and depth values for the
    fragment.

    This extension defines a programming model including a 4-component vector
    instruction set, 16- and 32-bit floating-point data types, and a
    relatively large set of temporary registers.  The programming model also
    includes a condition code vector which can be used to mask register writes
    at run-time or kill fragments altogether.  The syntax, program
    instructions, and general semantics are similar to those in the
    NV_vertex_program and NV_vertex_program2 extensions, which provide for the
    execution of an arbitrary program each time the GL receives a vertex.

    The fragment program execution environment is designed for efficient
    hardware implementation and to support a wide variety of programs.  By
    design, the entire set of existing fragment programs defined by existing
    OpenGL per-fragment computation extensions can be implemented using the
    extension's programming model.

    The fragment program execution environment accesses textures via
    arbitrarily computed texture coordinates.  As such, there is no necessary
    correspondence between the texture coordinates and texture maps previously
    lumped into a single "texture unit".  This extension separates the notion
    of "texture coordinate sets" and "texture image units" (texture maps and
    associated parameters), allowing implementations with a different number
    of each.  The initial implementation of this extension will support 8
    texture coordinate sets and 16 texture image units.

Issues

    What limitations exist in this extension?

        RESOLVED:  Very few.  Programs cannot exceed a maximum program length
        (which is no less than 1024 instructions), and can use no more than
        32 to 64 temporary registers, depending on register precision.
        Programs cannot access more than one fragment attribute or program
        parameter (constant) per instruction, but can work around this
        restriction using temporaries.  The number of textures that can be
        used by a program is limited to the number of texture image units
        provided by the implementation (16 in the initial implementation of
        this extension).

        These limits are fairly high.  Additionally, there is no limit on the
        total number of texture lookups that can be performed by a program.
        There is no limit on the length of a texture dependency chain -- one
        can write a program that performs over 1000 consecutive dependent
        texture lookups.  There are no restrictions on dependencies between
        texture mapping instructions and arithmetic instructions.  Texture
        lookups can be performed using arbitrarily computed texture
        coordinates.  Applications can carry out their calculations with full
        32-bit single precision, although two lower-precision modes are also
        available.

    How does texture mapping work with fragment programs?

        RESOLVED:  This extension provides three instructions used to perform
        texture lookups.

        The "TEX" instruction performs a lookup with the (s,t,r) values taken
        from an interpolated texture coordinate, an arbitrarily computed
        vector, or even a program constant.  The "TXP" instruction performs a
        similar lookup, except that it uses the fourth component of the source
        vector to perform a perspective divide, using (s/q, t/q, r/q).  In
        both cases, the GL will automatically compute partial derivatives used
        for filter and LOD selection.

        The "TXD" instruction operates like "TEX", except that it allows the
        program to explicitly specify two additional vectors containing the
        partial derivatives of the texture coordinate with respect to x and y
        window coordinates.

        All three instructions write a filtered texel value to a temporary or
        output register.  Other than the computation of texture coordinates
        and partial derivatives, texture lookups are not performed any
        differently in fragment program mode.  In particular, any applicable
        LOD biases, wrap modes, minification and magnification filters, and
        anisotropic filtering controls are still applied in fragment program
        mode.

        The results of the texture lookup are available to be used arbitrarily
        by subsequent fragment program instructions.  Fragment programs are
        allowed to access any texture map arbitrarily many times.

    Can fragment programs be used to compute depth values?

        RESOLVED:  Yes.  A fragment program can perform arbitrary
        computations to compute a final depth value for the fragment, which
        it should write to the "z" component of the o[DEPR] register.  The
        "z" value written should be in the range [0,1], regardless of the
        size of the depth buffer.

        To assist in the computation of the final Z value, a fragment program
        can access the interpolated depth of the fragment (prior to any
        displacement) by reading the "z" component of the f[WPOS] attribute
        register.

    How should near and far plane clipping work in fragment program mode if
    the current fragment program computes a depth value?

        RESOLVED:  Geometric clipping to the near and far clip plane should be
        disabled.  Clipping should be done based on the depth values computed
        per-fragment.  The rationale is that per-fragment depth displacement
        operations may effectively move portions of a primitive initially
        outside the clip volume inside, and vice versa.

        Note that under the NV_depth_clamp extension, geometric clipping to
        the near and far clip planes is also disabled, and the fragment depth
        values are clamped to the depth range.  If depth clamp mode is enabled
        when using a fragment program that computes a depth value, the
        computed depth value will be clamped to the depth range.

    Should fragment programs be allowed to use multiple precisions for
    operands and operations?

        RESOLVED:  Yes.  Low-precision operands are generally adequate for
        representing colors.  Allowing low-precision registers also allows for
        a larger number of temporary registers (at lower precision).
        Low-precision operations also provide the opportunity for a higher
        level of performance.  

        Applications are free to use only high-precision operations or mix
        high- and low-precision operations as necessary.

    What levels of precision are supported in arithmetic operations?

        RESOLVED:  Arithmetic operations can be performed at three different
        precisions.  32-bit floating point precision (fp32) uses the IEEE
        single-precision standard with a sign bit, 8 exponent bits, and 23
        mantissa bits.  16-bit floating-point precision (fp16) uses a similar
        floating-point representation, but with 5 exponent bits and 10
        mantissa bits.  Additionally, many arithmetic operations can also be
        carried out at 12-bit fixed point precision (fx12), where values in
        the range [-2,+2) are represented as signed values with 10 fraction
        bits.
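
        As an illustration of the fx12 format described above, the following
        C sketch quantizes a value to 10 fraction bits and clamps it to the
        representable range [-2,+2).  The helper name fx12_quantize, the
        two's complement code range, and the use of round-to-nearest are
        assumptions made for illustration; this text does not fix a rounding
        mode.

```c
/* Assumed layout: signed code k in [-2048, 2047], value = k / 1024. */
#define FX12_SCALE 1024.0f     /* 2^10: ten fraction bits            */
#define FX12_MIN   (-2048.0f)  /* code for -2.0                      */
#define FX12_MAX   2047.0f     /* largest code, just below +2.0      */

/* Quantize x to an fx12-representable value, clamping on overflow.
 * Round-to-nearest is an assumption; NaN inputs are not handled. */
static float fx12_quantize(float x)
{
    float scaled = x * FX12_SCALE;
    if (scaled < FX12_MIN) scaled = FX12_MIN;
    if (scaled > FX12_MAX) scaled = FX12_MAX;
    /* Round half away from zero, then truncate to an integer code. */
    long code = (long)(scaled + (scaled < 0.0f ? -0.5f : 0.5f));
    return (float)code / FX12_SCALE;
}
```

        Under these assumptions, fx12_quantize(3.0f) yields 2047/1024, the
        largest representable value, and fx12_quantize(-5.0f) yields -2.0.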

    How should the precision with which operations are carried out be
    specified?  Should we infer the precision from the types of the operands
    or result vectors?  Or should it be an attribute of the instruction?

        RESOLVED:  Applications can optionally specify the precision of
        individual instructions by adding a suffix of "R", "H", and "X" to
        instruction names to select fp32, fp16, and fx12 precision,
        respectively.  

        By default, instructions will be carried out using the precision of
        the destination register.  Always inferring the precision from the
        operands has a number of issues.  First, there are a number of
        operations (e.g., TEX/TXP/TXD) where the result type has little to no
        correspondence to the type of the operands.  In these cases, precision
        suffixes are not supported.  Second, one could have instructions
        automatically cast operands and compute results using the type of the
        highest precision operand or result.  This behavior would be
        problematic since all fragment attribute registers and program
        parameters are kept at full precision, but full precision may not be
        needed by the operation.

        The choice of precision level allows programs to trade off precision
        for potentially higher performance.  Giving the program explicit
        control over the precision also allows it to dictate precision
        explicitly and eliminate any uncertainty over type casting.

    For instructions whose specified precision is different than the precision
    of the operands or the result registers, how are the operations performed?
    How are the condition codes updated?

        RESOLVED:  Operations are performed with operands and results at the
        precision specified by the instruction.  After the operation is
        complete, the result is converted to the precision of the destination
        register, after which the condition code is generated.

        In an alternate approach, the condition code could be generated from
        the result.  However, in some cases, the register contents would not
        match the condition code.  In such cases, it may not be reliable to
        use the condition code to prevent division by zero or other special
        cases.

    How does this extension interact with the ARB_multisample extension?  In
    the ARB_multisample extension, each fragment has multiple depth values.
    In this extension, a single interpolated depth value may be modified by a
    fragment program.

        RESOLVED:  The depth values for the extra samples are generated by
        computing partials of the computed depth value and using these
        partials to derive the depth values for each of the extra samples.

    How does this extension interact with polygon offset?  Both extensions
    modify fragment depth values.

        RESOLVED:  As in the base OpenGL spec, the depth offset generated by
        polygon offset is added during polygon rasterization.  The depth value
        provided to programs in f[WPOS].z already includes polygon offset, if
        enabled.  If the depth value is replaced by a fragment program, the
        polygon offset value will NOT be recomputed and added back after
        program execution.
  
        This is probably not desirable for fragment programs that modify depth
        values since the partials used to generate the offset may not match
        the partials of the computed depth value.  Polygon offset for filled
        polygons can be approximated in a fragment program using the depth
        partials obtained by the DDX and DDY instructions.  This will not work
        properly for line- and point-mode polygons, since the partials used
        for offset are computed over the polygon, while the partials resulting
        from the DDX and DDY instructions are computed along the line (or are
        zero for point-mode polygons).  In addition, separate treatment of
        points, line segments, and polygons is not possible in a fragment
        program.
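
        The approximation mentioned above can be sketched in C as follows.
        The function mirrors the core OpenGL offset formula (factor times the
        maximum depth slope, plus units times the minimum resolvable depth
        difference r).  The function name and the idea of passing r in as a
        parameter are assumptions for illustration; in a fragment program the
        two partials would come from the DDX and DDY instructions.

```c
static float absf(float v) { return v < 0.0f ? -v : v; }

/* dzdx, dzdy: partials of depth with respect to window x and y.
 * factor, units: the PolygonOffset parameters.
 * r: smallest resolvable depth difference (implementation-dependent;
 *    supplied by the caller in this sketch). */
static float approx_polygon_offset(float dzdx, float dzdy,
                                   float factor, float units, float r)
{
    float ax = absf(dzdx);
    float ay = absf(dzdy);
    float m = ax > ay ? ax : ay;   /* max depth slope approximation */
    return m * factor + r * units;
}
```

        The returned offset would be added to the computed depth before it is
        written to the depth output.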

    Should depth component replacement be a property of the fragment program
    or a separate enable?

        RESOLVED:  It should be a program property.  Using the output register
        notation simplifies matters:  depth components are replaced if and
        only if the DEPR register is written to.  This alleviates the
        application and driver burden of maintaining separate state.

    How does this extension affect the handling of q texture coordinates in
    the OpenGL spec?
       
        RESOLVED:  Fragment programs are allowed to access an associated q
        texture coordinate, so this attribute must be produced by
        rasterization.  In unextended OpenGL 1.2, the q coordinate is
        eliminated in the rasterization portions of the spec after dividing
        each of s, t, and r by it.  This extension updates the specification
        to pass q coordinates through at least to conventional texture
        mapping.  When fragment program mode is disabled, q coordinates will
        be eliminated there in an identical manner.  This modification has the
        added benefit of simplifying the equations used for attribute
        interpolation.

    How should clip w coordinates be handled by this extension?

        RESOLVED:  Fragment programs are allowed to access the reciprocal of
        the clip w coordinate, so this attribute must be produced by
        rasterization.  The OpenGL 1.2 spec doesn't explicitly enumerate the
        attributes associated with the fragment, but we add treatment of the w
        clip coordinate in the appropriate locations.  

        The reciprocal of the clip w coordinate in traditional graphics
        hardware is produced by screen-space linear interpolation of the
        reciprocals of the clip w coordinates of the vertices.  However, this
        spec says the clip w coordinate is produced by perspective-correct
        interpolation of the (non-reciprocated) clip w vertex coordinates.
        These two formulations turn out to be equivalent, and the latter is
        more convenient since the core OpenGL spec already contains formulas
        for perspective-correct interpolation of vertex attributes.
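
        The equivalence can be checked numerically along a line segment with
        endpoints a and b.  The sketch below uses the perspective-correct
        form f = (t_a*f_a/w_a + t_b*f_b/w_b) / (t_a/w_a + t_b/w_b) with
        t_a + t_b = 1; substituting f = w makes the numerator equal to 1, so
        1/w reduces to the screen-space linear blend of 1/w_a and 1/w_b.
        Function names are illustrative only.

```c
static float absf(float v) { return v < 0.0f ? -v : v; }

/* Screen-space linear interpolation of the reciprocals 1/wa and 1/wb.
 * ta is the screen-space weight of endpoint a; endpoint b gets 1 - ta. */
static float recip_w_linear(float ta, float wa, float wb)
{
    float tb = 1.0f - ta;
    return ta * (1.0f / wa) + tb * (1.0f / wb);
}

/* Perspective-correct interpolation of an attribute with endpoint
 * values fa and fb and clip w values wa and wb. */
static float persp_interp(float ta, float fa, float fb, float wa, float wb)
{
    float tb = 1.0f - ta;
    return (ta * fa / wa + tb * fb / wb) / (ta / wa + tb / wb);
}
```

        Interpolating w itself with persp_interp and taking the reciprocal
        should agree with recip_w_linear up to floating-point rounding.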

    What is produced by the TEX/TXP/TXD instructions if the requested texture
    image is inconsistent?

        RESOLVED:  The result vector is specified to be (0,0,0,0).  This
        behavior is consistent with the NV_texture_shader extension.  Note
        that, as in NV_texture_shader, these instructions ignore the standard
        hierarchy of texture enables and programs can access textures that are
        not specifically "enabled".

    Should a minimum precision be specified for certain fragment attribute
    registers (in particular COL0, COL1) that may not be generated with full
    fp32 precision?

        RESOLVED:  No.  It is expected that the precision of COL0/COL1 should
        generally be at least as high as that of the frame buffer.

    Fragment color components (f[COL0] and f[COL1]) are generally
    low-precision fixed-point values in the range [0,1].  Is it possible to
    pass unclamped or high-precision color components to fragment programs?

        RESOLVED:  Yes, although you can't exactly call them "colors".
        High-precision per-vertex color values can be written into any unused
        texture coordinate set, either via a MultiTexCoord call or using a
        vertex program.  These "texture coordinates" will be interpolated
        during rasterization, and can be used arbitrarily by a fragment
        program.

        In particular, there is no requirement that per-fragment attributes
        called "texture coordinates" be used for texture mapping.

    Should this specification guarantee that temporary registers are
    initialized to zero?

        RESOLVED:  Yes.  This will allow for the modular construction of
        programs that accumulate results in registers.  For example,
        per-fragment lighting may use MAD instructions to accumulate color
        contributions at each light.  Without zero-initialization, the program
        would require an explicit MOV instruction to load 0 or the use of the
        MUL instruction for the first light.

    Should this specification support Unicode program strings?

        RESOLVED:  Not necessary.

    Programs defined by NV_vertex_program begin with "!!VP1.0".  Should
    fragment programs have a similar identifier?

        RESOLVED:  Yes, "!!FP1.0", identifying the first revision of this
        fragment program language.

    Should per-fragment attributes have equivalent integer names in the
    program language, as per-vertex attributes do in NV_vertex_program?

        RESOLVED:  No.  In NV_vertex_program, "generic" vertex attributes
        could be specified directly by an application using only an attribute
        number.  Those numbers may have no necessary correlation with the
        conventional attribute names, although conventional vertex attributes
        are mapped to attribute numbers.  However, conventional attributes are
        the only outputs of vertex programs and of rasterization.  Therefore,
        there is no need for a similar input-by-number functionality for
        fragment programs.

    Should we provide the ability to issue instructions that do not update
    temporary or output registers?

        RESOLVED:  Yes.  Programs may issue instructions whose only purpose is
        to update the condition code register, and requiring such instructions
        to write to a temporary may require the use of an additional temporary
        and/or defeat possible program optimizations.  We accomplish this by
        adding two write-only temporary pseudo-registers ("RC" and "HC") that
        can be specified as destination registers.

    Do the packing and unpacking instructions in this extension make any
    sense?

        RESOLVED:  Yes.  They are useful for packing and unpacking multiple
        components in a single channel of a floating-point frame buffer.  For
        example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities
        or 8 16-bit quantities, all of which could be used in later
        rasterization passes.  See the NV_float_buffer extension for more
        information.
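
        To make the packing idea concrete, the following C sketch packs four
        [0,1] values into one 32-bit word as unsigned bytes and extracts them
        again; four such words fill a 128-bit RGBA pixel with 16 8-bit
        quantities.  The byte order (x in the low byte) and the rounding used
        here are assumptions for illustration; the instruction descriptions
        in this specification are the authority on the actual bit layout.

```c
#include <stdint.h>

/* Clamp c to [0,1] and convert to an unsigned byte (round to nearest). */
static uint32_t to_ub(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint32_t)(c * 255.0f + 0.5f);
}

/* Pack four components into one 32-bit word, x in the low byte
 * (assumed order). */
static uint32_t pack4ub(float x, float y, float z, float w)
{
    return to_ub(x) | (to_ub(y) << 8) | (to_ub(z) << 16) | (to_ub(w) << 24);
}

/* Extract component i (0 = x ... 3 = w) and rescale it to [0,1]. */
static float unpack_ub(uint32_t bits, int i)
{
    return (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
}
```

        A later pass would read the 32-bit channel back and call the unpack
        step once per packed component.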

    Should we provide a method for specifying an fp16 depth component output
    value?

        RESOLVED:  No.  There is no good reason for supporting half-precision
        Z outputs.  Even with 16-bit Z buffers, the 10-bit mantissa of the
        half-precision float is rather limiting.  There would effectively be
        only 11 good bits in the back half of the Z buffer.
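
        The "11 good bits" figure follows from the spacing of fp16 values:
        with 10 mantissa bits, values in [0.5, 1.0) are 2^-11 apart.  The C
        sketch below computes that spacing for a normalized fp16 magnitude.
        The function name is illustrative, and fp16 denorms (below 2^-14) are
        reported with their fixed spacing of 2^-24.

```c
/* Spacing between adjacent fp16 values near x (10 mantissa bits). */
static float fp16_spacing(float x)
{
    float a = x < 0.0f ? -x : x;
    if (a > 65504.0f) a = 65504.0f;             /* largest finite fp16  */
    if (a < 6.103515625e-05f)                   /* below 2^-14: denorm  */
        return 5.9604644775390625e-08f;         /* 2^-24                */

    /* Find the power of two with binade <= a < 2 * binade. */
    float binade = 1.0f;
    while (binade > a)         binade *= 0.5f;
    while (binade * 2.0f <= a) binade *= 2.0f;

    return binade / 1024.0f;                    /* binade * 2^-10       */
}
```

        For depth values in the back half of the range, fp16_spacing(0.75f)
        is 2^-11, far coarser than the 2^-16 step of a 16-bit Z buffer.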

    Should RequestResidentProgramsNV (or a new equivalent function) take a
    target?  Dealing with working sets of different program types is a bit
    messy.  Should we document some limitation if we get programs of different
    types?
          
        RESOLVED:  In retrospect, it may have been a good idea to attach a
        target to this command, but there isn't a good reason to mess with
        something that already works for vertex programs.  The driver is
        responsible for ensuring consistent results when the program types
        specified are mixed.
    
    What happens on data type conversions where the original value is not
    exactly representable in the new data type, either due to overflow or
    insufficient precision in the destination type?

        RESOLVED:  In case of overflow, the original value is clamped to
        +/-INF (fp16 or fp32) or to the nearest representable value (fx12).
        In case of imprecision, the value is either rounded or truncated to a
        representable value.

    Should this extension support IEEE-style denorms?  For 32-bit IEEE
    floating point, denorms are numbers smaller in absolute value than 2^-126.
    For 16-bit floats used by this extension, denorms are numbers smaller in
    absolute value than 2^-14.

        RESOLVED:  For 32-bit data types, hardware support for denorms was
        considered too expensive relative to the benefit provided.
        Computational results that would otherwise produce denorms are flushed
        to zero.  For 16-bit data types, hardware denorm support will be
        present.  The expense of hardware denorm support is lower and the
        potential precision benefit is greater for 16-bit data types.
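
        The thresholds above can be expressed in C as follows.  The sketch
        models the fp32 flush-to-zero behavior and classifies magnitudes that
        fall in the fp16 denorm range.  Function names are illustrative, and
        returning +0 for flushed negative values is an assumption, since this
        text does not say which zero is produced.

```c
#include <float.h>

#define FP16_MIN_NORMAL 6.103515625e-05f   /* 2^-14 */

/* Flush an fp32 denorm to zero; FLT_MIN is 2^-126, the smallest
 * normalized single-precision value. */
static float flush_fp32_denorm(float x)
{
    float a = x < 0.0f ? -x : x;
    return (a < FLT_MIN) ? 0.0f : x;
}

/* Nonzero if the magnitude of x lies in the fp16 denorm range, where
 * fp16 hardware denorm support preserves the value instead of flushing. */
static int in_fp16_denorm_range(float x)
{
    float a = x < 0.0f ? -x : x;
    return a > 0.0f && a < FP16_MIN_NORMAL;
}
```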

    OpenGL provides a hierarchy of texture enables.  The texture lookup
    operations in NV_texture_shader effectively override the texture enable
    hierarchy and select a specific texture to enable.  What should be done by
    this extension?

        RESOLVED:  This extension will build upon NV_texture_shader and reduce
        the driver overhead of validating the texture enables.  Texture
        lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,
        3D", which would indicate to use texture coordinate set number 2 to do
        a lookup in the texture object bound to the TEXTURE_3D target in
        texture image unit 2.

        Each texture unit can have only one "active" target.  Programs are not
        allowed to reference different texture targets in the same texture
        image unit.  In the example above, any other texture instructions
        using texture image unit 2 must specify the 3D texture target.
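
        As a sketch of how such an instruction might sit in a complete
        program string held by a C application, consider the fragment below.
        The "!!FP1.0" header and the TEX instruction are taken from this
        text; the terminating "END" keyword, the trailing semicolons, and the
        o[COLH] output register name are assumptions here and should be
        checked against the grammar section of this specification.

```c
#include <string.h>

/* Minimal fragment program: look up the 3D texture bound to texture
 * image unit 2 using coordinate set 2, and output the texel as color.
 * "END" and o[COLH] are assumed, not quoted from the text above. */
static const char fp_source[] =
    "!!FP1.0\n"
    "TEX H0, f[TEX2], TEX2, 3D;\n"
    "MOV o[COLH], H0;\n"
    "END\n";
```

        Any further texture instruction in this program that names TEX2 would
        also have to use the 3D target.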

    What is the interaction with NV_register_combiners?

        RESOLVED:  Register combiners are not available when fragment programs
        are enabled.

        A previous version of this specification supported the notion of
        combiner programs, where the result of fragment program execution was
        a set of four "texture lookup" values that fed the register combiners.

    For convenience, should we include pseudo-instructions not present in the
    hardware instruction set that are trivially implementable?  For example,
    absolute value and subtract instructions could fall in this category.  An
    "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB
    R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

        RESOLVED:  In general, yes.  A SUB instruction is provided for
        convenience.  This extension does not provide a separate ABS
        instruction because it supports absolute value operations of each
        operand.

    Should there be a '+' in the <optionalSign> portion of the grammar?  There
    isn't one in the GL_NV_vertex_program spec.

        RESOLVED:  Yes, for orthogonality/readability.  A '+' obviously adds
        no functionality.  In NV_vertex_program, an <optionalSign> of "-" was
        always a negation operator.  However, in fragment programs, it can
        also be used as a sign for a constant value.

    Can the same fragment attribute register, program parameter register, or
    constants be used for multiple operands in the same instruction?  If so,
    can it be used with different swizzle patterns?

        RESOLVED:  Yes and yes.

    This extension allows different limits for the number of texture
    coordinate sets and the number of texture image units (i.e., texture maps
    and associated data).  The state in ActiveTextureARB affects both
    coordinate sets (TexGen, matrix operations) and image units (TexParameter,
    TexEnv).  How should we deal with this?

        RESOLVED:  Continue to use ActiveTextureARB and emit an
        INVALID_OPERATION if the active texture refers to an unsupported
        coordinate set/image unit.  Other options included creating dummy
        (unusable) state for unsupported coordinate sets/image units and
        continue to use ActiveTextureARB normally, or creating separate state
        and state-setting commands for coordinate sets and image units.
        Separate state is the cleanest solution, but would add more calls and
        potentially cause more programmer confusion.  Dummy state would avoid
        additional error checks, but the demands of dummy state could grow if
        the number of texture image units and texture coordinate sets
        increases.

        The current OpenGL spec is vague as to what state is affected by the
        active texture selector and has no distinction between
        coordinate-related and image-related state.  The state tables could
        use a good clean-up in this area.

    The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"
    is R0*R1+(1-R0)*R2.  There are conflicting precedents here.  The
    definition here matches the "lrp" instruction in the DirectX 8.0 pixel
    shader language.  However, an equivalent RenderMan lerp operation would
    yield a result of (1-R0)*R1+R0*R2.  Which ordering should be implemented?

        RESOLVED:  NVIDIA hardware implements the former operand ordering, and
        there is no good reason to specify a different ordering.  To convert a
        "LRP" using the latter ordering to NV_fragment_program, swap the third
        and fourth arguments.
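
        The two orderings, and the conversion by swapping operands, can be
        written out in scalar C as a quick check.  The function names are
        illustrative; the real instruction operates per component on
        4-component vectors.

```c
/* Ordering used by this extension (and the DirectX 8.0 "lrp"):
 * LRP dst, a, b, c  computes  a*b + (1-a)*c. */
static float lrp_nv(float a, float b, float c)
{
    return a * b + (1.0f - a) * c;
}

/* The alternate ordering described above: (1-a)*b + a*c. */
static float lrp_alt(float a, float b, float c)
{
    return (1.0f - a) * b + a * c;
}
```

        For any a, b, and c, lrp_alt(a, b, c) equals lrp_nv(a, c, b), which
        is the operand swap described in the resolution.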

    Should this extension provide tracking of matrices or any other state,
    similar to that provided in NV_vertex_program?

        RESOLVED:  No.

    Should this extension provide global program parameters -- values shared
    between multiple fragment programs?

        RESOLVED:  No.

    Should this extension provide program parameters specific to a program?
    If so, how?

        RESOLVED:  Yes.  These parameters will be called "local parameters".
        This extension will provide both named and numbered local parameters.
        Local parameters can be managed by the driver and eliminate the need
        for applications to manage a global name space.  

        Named local parameters work much like standard variable names in most
        programming languages.  They are created using the "DECLARE"
        instruction within the fragment program itself.  For example:

            DECLARE color = {1,0,0,1};

        Named local parameters are used simply by referencing the variable
        name.  They do not require the array syntax like the global parameters
        in the NV_vertex_program extension.  They can be updated using the
        commands ProgramNamedParameter4[f,fv]NV.

        Numbered local parameters are not declared.  They are used by simply
        referencing an element of an array called "p".  For example,

            MOV R0, p[12];

        loads the value of numbered local parameter 12 into register R0.
        Numbered local parameters can be updated using the commands
        ProgramLocalParameter4[d,dv,f,fv]ARB.

        The numbered local parameter APIs were added to this extension late in
        its development, and are provided for compatibility with the
        ARB_vertex_program extension, and what will likely be supported in
        ARB_fragment_program as well.  Providing this mechanism allows
        programs to use the same mechanisms to set local parameters in both
        extensions.

    Why are the APIs for setting named and numbered local parameters
    different?

        RESOLVED:  The named parameter API was created prior to
        ARB_vertex_program (and the possible future ARB_fragment_program) and
        uses conventions borrowed from NV_vertex_program.  A slightly
        different API was chosen during the ARB standardization process; see
        the ARB_vertex_program specification for more details.

        The named parameter API takes a program ID and a parameter name, and
        sets the parameter for the program with the specified ID.  The
        specified program does not need to be bound (via BindProgramNV) in
        order to modify the values of its named parameters.  The numbered
        parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a
        parameter number and modifies the corresponding numbered parameter of
        the currently bound program.

    What should be the initial value of uninitialized local parameters?

        RESOLVED:  (0,0,0,0).  This choice is somewhat arbitrary, but matches
        previous extensions (e.g., NV_vertex_program).

    Should this extension support program parameter arrays?

        RESOLVED:  No hardware support is present.  Note that from the point
        of view of a fragment program, a texture map can be used as a 1-, 2-,
        or 3-dimensional array of constants.
        
    Should this extension support constants in fragment programs?  If
    so, how?

        RESOLVED:  Yes.  Scalar or vector constants can be defined inline
        (e.g., "1.0" or "{1,2,3,4}").  In addition, named constants are
        supported using the "DEFINE" instruction, which allows programmers to
        change the values of constants used in multiple instructions simply by
        changing the value assigned to the named constant.

        Note that because this extension uses program strings, the
        floating-point value of any constants generated on the fly must be
        printed to the program string.  An alternate method that avoids the
        need to print constants is to declare a named local program parameter
        and initialize it with the ProgramNamedParameter4[f,fv]NV calls.

    Should named constants be allowed to be redefined?

        RESOLVED:  No.  If you want to redefine the values of constants, you
        can create an equivalent named program parameter by changing the
        "DEFINE" keyword to "DECLARE".

    Should functions used to update or query named local parameters take a
    zero-terminated string (as with most strings in the C programming
    language), or should they require an explicit string length?  If the
    former, should we create a version of LoadProgramNV that does not require
    a string length?

        RESOLVED:  Stick with explicit string length.  Strings that are
        defined as constants can have the length computed at compile-time.
        Strings read from files will have the length known in advance.
        Programs that build strings at run-time also likely keep the length
        up-to-date.  Passing an explicit length saves time, since the driver
        doesn't have to do a strlen().
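
        A minimal sketch of the compile-time case, assuming the name is a
        string literal.  The macro and structure names below are
        hypothetical and exist only for illustration; the stored length is
        what an application would pass as <len> alongside <name>.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a string literal, computed at compile time (excludes the
 * terminating NUL).  Valid for literals and char arrays, not pointers. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Hypothetical name/length pair kept by the application. */
struct param_name {
    const unsigned char *name;
    size_t               len;
};

#define PARAM_NAME(s) { (const unsigned char *)(s), LITERAL_LEN(s) }

static const struct param_name fog_color1 = PARAM_NAME("fog_color1");

/* Returns the stored length after checking that it matches the literal. */
static size_t param_name_len(const struct param_name *p)
{
    assert(p != NULL && p->name[p->len] == '\0');
    return p->len;
}
```

        No strlen() call is needed at run time; fog_color1.len is a
        constant fixed by the compiler.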

    What is the deal with the alpha of the secondary color?

        RESOLVED:  In unextended OpenGL 1.2, the alpha component of the
        secondary color is forced to 0.0.  In the EXT_secondary_color
        extension, the alpha of the per-vertex secondary colors is defined to
        be 0.0.  NV_vertex_program allows vertex programs to produce a
        per-vertex alpha component, but it is forced to zero for the purposes
        of the color sum.  In the NV_register_combiners extension, the alpha
        component of the secondary color is undefined.  What a mess.

        In this extension, the alpha of the secondary color is well-defined
        and can be used normally.  When in vertex program mode

    Why are fragment program instructions involving f[FOGC] or f[TEX0] through
    f[TEX7] automatically carried out at full precision?

        RESOLVED:  This is an artifact of the method by which these
        interpolants are generated in the NVIDIA graphics hardware.  If such
        instructions absolutely must be carried out at lower precision, the
        requirement can be met by first loading the interpolants into a
        temporary register.

    With a different number of texture coordinate sets and texture image
    units, how many copies of each kind of texture state are there?

        RESOLVED:  The intention is that texture state be broken into three
        groups.  (1) There are MAX_TEXTURE_COORDS_NV copies of texture
        coordinate set state, which includes current texture coordinates,
        TexGen state, and texture matrices.  (2) There are
        MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which
        includes texture maps, texture parameters, and LOD bias parameters.  (3)
        There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit
        state (e.g., texture enables, TexEnv blending state), all of which are
        unused when in fragment program mode.

        It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum
        of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --
        implementations may choose not to extend fixed-function OpenGL texture
        mapping modes beyond a certain point.

    The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end
    up with programs >64KB.  This will overflow the limits of the GLX Render
    protocol, resulting in the need to use RenderLarge path.  This is an issue
    with vertex programs, also.

        RESOLVED:  Yes, it is.

    Should textures used by fragment programs be declared?  For example,
    "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all
    accesses to texture unit 3.  The dimension could be dropped from the TEX
    family of instructions, and some of the compile-time error checking could
    be dropped.

        RESOLVED:  Maybe it should be, but for better or worse, it isn't.

    It is not all that uncommon to have negative q values with projective
    texture mapping, but results are undefined if any q values are negative in
    this specification.  Why?

        RESOLVED:  This restriction carries on a similar one in the initial
        OpenGL specification.  The motivation for this restriction is that
        when interpolating, it is possible for a fragment to have an
        interpolated q coordinate at or near 0.0.  Since the texture
        coordinates used for projective texture mapping are s/q, t/q, and r/q,
        this will result in a divide-by-zero error or suffer from significant
        numerical instability.  Results will be inaccurate for such fragments.

        Other than the numerical stability issue above, NVIDIA hardware should
        have no problems with negative q coordinates.

    Should programs that replace depth have their own special program type,
    such as "!!FPD1.0" and "!!FPDC1.0"?

        RESOLVED:  No.  If a program has an instruction that writes to
        o[DEPR], the final fragment depth value is taken from o[DEPR].z.
        Otherwise, the fragment's original depth value is used.

    What fx12 value should NaN map to?

        RESOLVED:  For the lack of any better choice, 0.0.

    How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for
    arithmetic and comparison operations?

        RESOLVED:  The special cases for all floating-point operations are
        designed to match the IEEE specification for floating-point numbers as
        closely as possible.  The results produced by special cases should be
        enumerated in the sections of this spec describing the operations.
        There are some cases where the implemented fragment program behavior
        does not match IEEE conventions, and these cases should be noted in
        this specification.

    How can condition codes be used to mask out register writes?  How about
    killing fragments?  What other things can you do?

        RESOLVED:  The following example computes a component-wise |R1-R2|:

          SUBC R0, R1, R2;      # "C" suffix means update condition code
          MOV  R0 (LT), -R0;    # Conditional write mask in parentheses

        The first instruction computes a component-wise difference between R1
        and R2, storing R1-R2 in register R0.  The "C" suffix in the
        instruction means to update the condition code based on the sign of
        the result vector components.  The second instruction inverts the sign
        of the components of R0.  However the "(LT)" portion says that the
        destination register should be updated only if the corresponding
        condition code component is LT (negative).  This means that only those
        components of R0 that are negative are negated, leaving |R1-R2| in R0.

        To kill a fragment if the red (x) component of a texture lookup
        returns zero:

          TEXC R0, f[TEX0], TEX0, 2D;
          KIL EQ.x;

        To kill based on the green (y) component, use "EQ.y" instead.  To kill
        if any of the four components is zero, use "EQ.xyzw" or just "EQ".
        
        Fragment programs do not support boolean expressions directly.  These
        can generally be emulated using conditional write masks.

        To evaluate the expression "(R0.x == 0) && (R1.x == 0)":

          MOVC RC.x, R0.x;
          MOVC RC.x (EQ), R1.x;

        To evaluate the expression "(R0.x == 0) || (R1.x == 0)":

          MOVC RC.x, R0.x;
          MOVC RC.x (NE), R1.x;

        In both cases, the x component of the condition code will contain "EQ"
        if and only if the condition is TRUE.
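
        The masked-write examples above can be modeled on the CPU as
        follows.  This C fragment is an illustrative sketch only: the
        function and enum names are hypothetical, and the "NE" test is
        assumed to be true for every condition code other than EQ.

```c
#include <assert.h>

enum cc { CC_GT, CC_EQ, CC_LT, CC_UN };

/* Condition code for one result component. */
static enum cc cc_of(float v)
{
    if (v != v)   return CC_UN;    /* NaN compares unequal to itself */
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;
}

/* Models  SUBC R0, R1, R2;  MOV R0 (LT), -R0;  */
static void abs_diff(float r0[4], const float r1[4], const float r2[4])
{
    enum cc cc[4];
    int i;

    assert(r0 != 0 && r1 != 0 && r2 != 0);
    for (i = 0; i < 4; i++) {      /* SUBC: subtract and update CC */
        r0[i] = r1[i] - r2[i];
        cc[i] = cc_of(r0[i]);
    }
    for (i = 0; i < 4; i++) {      /* MOV (LT): write only where CC is LT */
        if (cc[i] == CC_LT)
            r0[i] = -r0[i];
    }
}

/* Models  MOVC RC.x, R0.x;  MOVC RC.x (EQ), R1.x;  (EQ iff both are zero) */
static enum cc and_zero(float r0x, float r1x)
{
    enum cc cc = cc_of(r0x);
    if (cc == CC_EQ)
        cc = cc_of(r1x);           /* second MOVC applies only where EQ */
    return cc;
}

/* Models  MOVC RC.x, R0.x;  MOVC RC.x (NE), R1.x;  (EQ iff either is zero) */
static enum cc or_zero(float r0x, float r1x)
{
    enum cc cc = cc_of(r0x);
    if (cc != CC_EQ)
        cc = cc_of(r1x);           /* second MOVC applies only where NE */
    return cc;
}
```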

    How can fragment programs be used to implement non-standard texture
    filtering modes?

        RESOLVED:  As one example, consider a case where you want to do linear
        filtering in a 2D texture map, but only horizontally.  To achieve
        this, first set the texture filtering mode to NEAREST.  For a 16 x n
        texture, you might do something like:

          DEFINE halfTexel = { 0.03125, 0 };   # 1/32 (1/2 a texel)
          ADD R2, f[TEX0], -halfTexel;         # coords of left sample
          ADD R1, f[TEX0], +halfTexel;         # coords of right sample
          TEX R0, R2, TEX0, 2D;                # lookup left sample
          TEX R1, R1, TEX0, 2D;                # lookup right sample
          MUL R2.x, R2.x, 16;                  # scale X coords to texels
          FRC R2.x, R2.x;                      # get fraction, filter weight
          LRP R0, R2.x, R1, R0;                # blend samples based on weight

        There are plenty of other interesting things that can be done.
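
        To check the arithmetic of the program above, the following C
        fragment models it on the CPU for one row of a single-channel
        16-texel texture.  It is a sketch under stated assumptions: the
        function names are hypothetical, NEAREST lookups are assumed to
        clamp to the edge texels, and only moderate coordinate values are
        handled.

```c
#include <assert.h>

#define TEX_WIDTH 16   /* texels per row, matching the 16 x n example */

/* FRC: fractional part x - floor(x), computed without <math.h>. */
static float frc(float x)
{
    float t = (float)(int)x;       /* truncates toward zero */
    if (t > x)
        t -= 1.0f;                 /* adjust for negative inputs */
    return x - t;
}

/* NEAREST lookup along s.  Clamping to the edge texels is an assumption. */
static float tex_nearest(const float row[TEX_WIDTH], float s)
{
    assert(row != 0);
    if (s <= 0.0f)
        return row[0];
    if (s >= 1.0f)
        return row[TEX_WIDTH - 1];
    return row[(int)(s * TEX_WIDTH)];
}

/* Mirrors the eight-instruction program, one instruction per line. */
static float filter_horizontal(const float row[TEX_WIDTH], float s)
{
    const float half_texel = 0.03125f;            /* DEFINE halfTexel      */
    float left_s  = s - half_texel;               /* ADD R2, f[TEX0], -h   */
    float right_s = s + half_texel;               /* ADD R1, f[TEX0], +h   */
    float left    = tex_nearest(row, left_s);     /* TEX R0, R2, ...       */
    float right   = tex_nearest(row, right_s);    /* TEX R1, R1, ...       */
    float weight  = frc(left_s * TEX_WIDTH);      /* MUL, FRC              */
    return weight * right + (1.0f - weight) * left;  /* LRP R0, R2.x, R1, R0 */
}
```

        With row[i] = i, a coordinate at the center of texel 4 (s = 4.5/16)
        returns 4, and a coordinate on the boundary between texels 3 and 4
        (s = 4/16) returns 3.5, as expected of a linear filter.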

    Should this specification provide more examples?

        RESOLVED:  Yes, it should.

    Is the OpenGL ARB working on a multi-vendor standard for fragment
    programmability?  Will there be an ARB_fragment_program extension?  If so,
    how will this extension interact with the ARB standard?

        RESOLVED:  Yes, as of July 2002, there was a multi-vendor working
        group and a draft specification.  The ARB extension is expected to
        have several features not present in this extension, such as state
        tracking and global parameters (called "program environment
        parameters").  It will also likely lack certain features found in this
        extension.

    Why does the HEMI mapping apply to the third component of signed HILO
    textures, but not to unsigned HILO textures?

        RESOLVED:  This behavior matches the behavior of NV_texture_shader
        (e.g., the DOT_PRODUCT_NV mode).  The HEMI mapping will construct the
        third component of a unit vector whose first two components are
        encoded in the HILO texture.


New Procedures and Functions

    void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                   float x, float y, float z, float w);
    void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                   double x, double y, double z, double w);
    void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                    const float v[]);
    void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                    const double v[]);
    void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,
                                      float *params);
    void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,
                                      double *params);

    void ProgramLocalParameter4dARB(enum target, uint index,
                                    double x, double y, double z, double w);
    void ProgramLocalParameter4dvARB(enum target, uint index,
                                     const double *params);
    void ProgramLocalParameter4fARB(enum target, uint index,
                                    float x, float y, float z, float w);
    void ProgramLocalParameter4fvARB(enum target, uint index,
                                     const float *params);
    void GetProgramLocalParameterdvARB(enum target, uint index,
                                       double *params);
    void GetProgramLocalParameterfvARB(enum target, uint index, 
                                       float *params);


New Tokens

    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the
    <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev,
    and by the <target> parameter of BindProgramNV, LoadProgramNV,
    ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB,
    ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB,
    GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:

        FRAGMENT_PROGRAM_NV                            0x8870

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
    and GetDoublev:

        MAX_TEXTURE_COORDS_NV                          0x8871
        MAX_TEXTURE_IMAGE_UNITS_NV                     0x8872
        FRAGMENT_PROGRAM_BINDING_NV                    0x8873
        MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV       0x8868

    Accepted by the <name> parameter of GetString:

        PROGRAM_ERROR_STRING_NV                        0x8874


Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)

    Modify Section 2.11, Clipping (p.39)

    (replace the first paragraph of the section, p. 39)  Primitives are clipped
    to the clip volume.  In clip coordinates, the view volume is defined by
    
        -w_c <= x_c <= w_c,
        -w_c <= y_c <= w_c, and
        -w_c <= z_c <= w_c.

    Clipping to the near and far clip planes is ignored if fragment program
    mode (section 3.11) or texture shaders (see NV_texture_shader
    specification) are enabled and the current fragment program or texture
    shader computes per-fragment depth values.  In this case, the view volume
    is defined by:
    
        -w_c <= x_c <= w_c and
        -w_c <= y_c <= w_c.
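
    The two view volume definitions can be summarized by a point-containment
    test.  The C fragment below is an illustrative sketch only; the function
    name and the depth_replaced flag are hypothetical, and actual clipping
    operates on primitives rather than on individual points.

```c
#include <assert.h>

/* Returns 1 if clip-space point c = (x_c, y_c, z_c, w_c) lies inside the
 * view volume.  depth_replaced is nonzero when the current fragment program
 * or texture shader computes per-fragment depth values, in which case the
 * near and far planes are ignored. */
static int in_view_volume(const float c[4], int depth_replaced)
{
    int inside;

    assert(c != 0);
    inside = (-c[3] <= c[0] && c[0] <= c[3]) &&
             (-c[3] <= c[1] && c[1] <= c[3]);
    if (!depth_replaced)
        inside = inside && (-c[3] <= c[2] && c[2] <= c[3]);
    return inside;
}
```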


Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization) 

    Modify Chapter 3 introduction (p. 57)

    (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization
    process.  The color value assigned to a fragment is initially determined
    by the rasterization operations (Sections 3.3 through 3.7) and modified by
    either the execution of the texturing, color sum, and fog operations as
    defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined
    in Section 3.11.  The final depth value is initially determined by the
    rasterization operations and may be modified by a fragment program.
  
    Note:  Antialiasing Application is renumbered from Section 3.11 to Section
    3.12.

    Modify Figure 3.1 (p.58)

                             Primitive Assembly
                                      |
              +-----------+-----------+-----------+-----------+
              |           |           |           |           |
              |           |           |        Pixel          |
            Point       Line       Polygon     Rectangle   Bitmap
           Raster-     Raster-     Raster-     Raster-     Raster-
           ization     ization     ization     ization     ization   
              |           |           |           |           |
              +-----------+-----------+-----------+-----------+
                                      |
                                      |
                    +-----------------+-----------------+
                    |                 |                 |
              Conventional         Texture          Fragment
              Texture Fetch        Shaders          Programs
                    |                 |                 |
                    |  +--------------+                 |
                    |  |                                |
        TEXTURE_    o  o                                |
        SHADER_NV                                       |
        enable      o                                   | 
                    |                                   |
                    +-------------+                     |
                    |             |                     |
               Conventional   Register                  |
                  TexEnv      Combiners                 |
                    |             |                     |
                Color Sum         |                     |
                    |             |                     |
                   Fog            |                     |
                    |             |                     |
                    |  +----------+                     |
                    |  |                                | 
        REGISTER_   o  o                                |
        COMBINERS_                                      |
        NV enable   o                                   |
                    |                                   |
                    +-----------------+  +--------------+
                                      |  |
                           FRAGMENT_  o  o
                           PROGRAM_
                           NV enable  o
                                      |
                                      |
                                   Coverage 
                                  Application
                                      |
                                      v
                            to fragment processing


    Modify Section 3.3, Points (p.61)

    All fragments produced in rasterizing a non-antialiased point are assigned
    the same associated data, which are those of the vertex corresponding to
    the point.  (delete reference to divide by q).

    If antialiasing is enabled, then ...  The data associated with each
    fragment are otherwise the data associated with the point being
    rasterized.  (delete reference to divide by q)

    Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)

    (Note that t=0 at p_a and t=1 at p_b).  The value of an associated datum f
    for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color
    index (in color index mode), the s, t, r, or q texture coordinate, or the
    clip w coordinate (the depth value, window z, must be found using equation
    3.3, below), is found as

      f = (1-t) * f_a / w_a + t * f_b / w_b                     (3.2)
          ---------------------------------
                (1-t) / w_a + t / w_b

    where f_a and f_b are the data associated with the starting and ending
    endpoints of the segment, respectively; w_a and w_b are the clip
    w coordinates of the starting and ending endpoints of the segment,
    respectively.  Note that linear interpolation would use

      f = (1-t) * f_a + t * f_b.                                (3.3)

    ... A GL implementation may choose to approximate equation 3.2 with 3.3,
    but this will normally lead to unacceptable distortion effects when
    interpolating texture coordinates or clip w coordinates.
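
    As an illustrative sketch (not part of this specification), equations
    3.2 and 3.3 can be written in C as follows.  The function names are
    hypothetical.

```c
#include <assert.h>

/* Equation 3.2: perspective-correct interpolation along a line segment.
 * fa, fb are the endpoint data; wa, wb are the endpoint clip w values. */
static float interp_perspective(float t, float fa, float wa,
                                float fb, float wb)
{
    float num, den;

    assert(wa != 0.0f && wb != 0.0f);
    num = (1.0f - t) * fa / wa + t * fb / wb;
    den = (1.0f - t) / wa + t / wb;
    return num / den;
}

/* Equation 3.3: plain linear interpolation. */
static float interp_linear(float t, float fa, float fb)
{
    return (1.0f - t) * fa + t * fb;
}
```

    When w_a equals w_b the two functions agree.  With f_a = 0, f_b = 1,
    w_a = 1, w_b = 3, and t = 0.5, equation 3.2 gives 0.25 where equation
    3.3 gives 0.5, which illustrates the distortion mentioned above.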

    Modify Section 3.5.1, Basic Polygon Rasterization (p.71)

    Denote a datum at p_a, p_b, or p_c ... is given by

      f = a * f_a / w_a + b * f_b / w_b + c * f_c / w_c         (3.4)
          ---------------------------------------------
                  a / w_a + b / w_b + c / w_c

    where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c,
    respectively.  a, b, and c are the barycentric coordinates of the fragment
    for which the data are produced. a, b, and c must correspond precisely to
    the exact coordinates ... at the fragment's center.
    
    Just as with line segment rasterization, equation 3.4 may be approximated
    by
    
      f = a * f_a + b * f_b + c * f_c;                          (3.5)

    this may yield ... for texture coordinates or clip w coordinates.
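
    A corresponding sketch of equation 3.4 in C, again for illustration
    only; the function name is hypothetical and the barycentric coordinates
    are assumed to sum to one.

```c
#include <assert.h>

/* Equation 3.4: perspective-correct interpolation over a triangle.
 * bary holds (a, b, c); f holds (f_a, f_b, f_c); w holds the clip w
 * coordinates (w_a, w_b, w_c). */
static float interp_barycentric(const float bary[3], const float f[3],
                                const float w[3])
{
    float num = 0.0f, den = 0.0f;
    int i;

    for (i = 0; i < 3; i++) {
        assert(w[i] != 0.0f);
        num += bary[i] * f[i] / w[i];
        den += bary[i] / w[i];
    }
    return num / den;
}
```

    Setting all three w values equal reduces the result to equation 3.5.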

    Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)

    A fragment arising from a group ... are given by those associated with the
    current raster position.  (delete reference to divide by q)
      
    Modify Section 3.7, Bitmaps (p.111)

    Otherwise, a rectangular array ... The associated data for each fragment
    are those associated with the current raster position.  (delete reference
    to divide by q)  Once the fragments have been produced ...

    Modify Section 3.8, Texturing (p.112)

    ... an image at the location indicated by a fragment's texture coordinates
    to modify the fragment's primary RGBA color.  Texturing does not affect the
    secondary color.  

    Texturing is specified only for RGBA mode; its use in color index mode is
    undefined.

    Except when in fragment program mode (Section 3.11), the (s,t,r) texture
    coordinates used for texturing are the values s/q, t/q, and r/q,
    respectively, where s, t, r, and q are the texture coordinates associated
    with the fragment.  When in fragment program mode, the (s,t,r) texture
    coordinates are specified by the program.  If q is less than or equal to
    zero, the results of texturing are undefined.
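
    The projective divide used outside of fragment program mode can be
    sketched in C as follows.  This is illustrative only: the function name
    is hypothetical, and reporting failure for q <= 0 is one possible way
    for an emulation to flag the undefined case.

```c
#include <assert.h>

/* Computes (s/q, t/q, r/q) from a fragment's (s, t, r, q).  Returns 1 on
 * success, or 0 (leaving out untouched) when q is not greater than zero,
 * where the results of texturing are undefined. */
static int project_texcoord(const float strq[4], float out[3])
{
    float q;

    assert(strq != 0 && out != 0);
    q = strq[3];
    if (!(q > 0.0f))
        return 0;
    out[0] = strq[0] / q;
    out[1] = strq[1] / q;
    out[2] = strq[2] / q;
    return 1;
}
```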

    Add new Section 3.11, Fragment Programs (p.140)  

    Fragment program mode is enabled and disabled with the Enable and Disable
    commands using the symbolic constant FRAGMENT_PROGRAM_NV.  When fragment
    program mode is enabled, standard and extended texturing, color sum, and
    fog application stages are ignored and a general purpose program is
    executed instead.  

    A fragment program is a sequence of instructions that execute on a
    per-fragment basis.  In fragment program mode, the currently bound
    fragment program is executed as each fragment is generated by the
    rasterization operations.  Fragment programs execute a finite fixed
    sequence of instructions with no branching or looping, and operate
    independently from the processing of other fragments.  Fragment programs
    are used to compute new color values to be associated with each fragment,
    and can optionally compute a new depth value for each fragment as well.

    Fragment program mode is not available in color index mode and is
    considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV.  When
    fragment program mode is enabled, texture shaders and register combiners
    (NV_texture_shader and NV_register_combiners extensions) are disabled,
    regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.

    Section 3.11.1, Fragment Program Registers

    Fragment programs operate on a set of program registers.  Each program
    register is a 4-component vector, whose components are referred to as "x",
    "y", "z", and "w" respectively.  The components of a fragment register are
    always referred to in this manner, regardless of the meaning of their
    contents.

    The four components of each fragment program register have one of two
    different representations:  32-bit floating-point (fp32) or 16-bit
    floating-point (fp16).  More details on these representations can be found
    in Section 3.11.4.1.

    There are several different classes of program registers.  Attribute
    registers (Table X.1) correspond to the fragment's associated data
    produced by rasterization.  Temporary registers (Table X.2) hold
    intermediate results generated by the fragment program.  Output registers
    (Table X.3) hold the final results of a fragment program.  The single
    condition code register is used to mask writes to other registers or to
    determine if a fragment should be discarded.


    Section 3.11.1.1, Fragment Program Attribute Registers

    The fragment program attribute registers (Table X.1) hold the location of
    the fragment and the data associated with the fragment produced by
    rasterization.

    Fragment Attribute                                    Component
    Register Name    Description                          Interpretation
    --------------   -----------------------------------  --------------
       f[WPOS]       Position of the fragment center.     (x,y,z,1/w)
       f[COL0]       Interpolated primary color           (r,g,b,a)
       f[COL1]       Interpolated secondary color         (r,g,b,a)
       f[FOGC]       Interpolated fog distance/coord      (z,0,0,0)
       f[TEX0]       Texture coordinate (unit 0)          (s,t,r,q)
       f[TEX1]       Texture coordinate (unit 1)          (s,t,r,q)
       f[TEX2]       Texture coordinate (unit 2)          (s,t,r,q)
       f[TEX3]       Texture coordinate (unit 3)          (s,t,r,q)
       f[TEX4]       Texture coordinate (unit 4)          (s,t,r,q)
       f[TEX5]       Texture coordinate (unit 5)          (s,t,r,q)
       f[TEX6]       Texture coordinate (unit 6)          (s,t,r,q)
       f[TEX7]       Texture coordinate (unit 7)          (s,t,r,q)

    Table X.1:  Fragment Attribute Registers.  The component interpretation
    column describes the mapping of attribute values to register components.
    For example, the "x" component of f[COL0] holds the red color component,
    and the "x" component of f[TEX0] holds the "s" texture coordinate for
    texture unit 0.  The entries "0" and "1" indicate that the attribute
    register components hold the constants 0 and 1, respectively.

    f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment
    center, relative to the lower left corner of the window.  f[WPOS].z
    holds the associated z window coordinate, normally in the range [0,1].
    f[WPOS].w holds the reciprocal of the associated clip w coordinate.

    f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors
    of the fragment, respectively.  

    f[FOGC] holds the associated eye distance or fog coordinate normally used
    for fog computations.

    f[TEX0] through f[TEX7] hold the associated texture coordinates for
    texture coordinate sets 0 through 7, respectively.

    All attribute register components are treated as 32-bit floats.  However,
    the components of primary and secondary colors (f[COL0] and f[COL1]) may
    be generated with reduced precision.

    The contents of the fragment attribute registers may not be modified by a
    fragment program.  In addition, each fragment program instruction can use
    at most one unique attribute register.


    Section 3.11.1.2, Fragment Program Temporary Registers

    The fragment temporary registers (Table X.2) hold intermediate values used
    during the execution of a fragment program.  There are 96 temporary
    register names, but not all can be used simultaneously.

    Fragment Temporary
    Register Name       Description
    ------------------  -----------------------------------------------------
        R0-R31          Four 32-bit (fp32) floating point values (s.e8.m23)
        H0-H63          Four 16-bit (fp16) floating point values (s.e5.m10)

    Table X.2:  Fragment Temporary Registers.

    In addition to the normal temporary registers, there are two temporary
    pseudo-registers, "RC" and "HC".  RC and HC are treated as unnumbered,
    write-only temporary registers.  The components of RC have an fp32 data
    type; the components of HC have an fp16 data type.  The sole purpose of
    these registers is to permit instructions to modify the condition code
    register (section 3.11.1.4) without overwriting the values in any
    temporary register.

    Fragment program instructions can read and write temporary registers.
    There is no restriction on the number of temporary registers that can be
    accessed by any given instruction.

    All temporary registers are initialized to (0,0,0,0) each time a fragment
    program executes.


    Section 3.11.1.3, Fragment Program Output Registers

    The fragment program output registers hold the final results of the
    fragment program.  The possible final results of a fragment program are a
    high- or low-precision RGBA fragment color, and a fragment depth value.

       Output
    Register Name      Description
    -------------      -------------------------------------------------------
       o[COLR]         Final RGBA fragment color, fp32 format
       o[COLH]         Final RGBA fragment color, fp16 format
       o[DEPR]         Final fragment depth value, fp32 format

    Table X.3:  Fragment Program Output Registers.

    o[COLR] and o[COLH] specify the color of a fragment.  These two registers
    are identical, except for the associated data type of the components.  The
    R, G, B, and A components of the fragment color are taken from the x, y,
    z, and w components respectively of o[COLR] or o[COLH].  A fragment
    program will fail to load if it writes to both o[COLR] and o[COLH].

    o[DEPR] can be used to replace the associated depth value of a fragment.
    The new depth value is taken from the z component of o[DEPR].  If a
    fragment program does not write to o[DEPR], the associated depth value is
    unmodified.

    A fragment program will fail to load if it does not write to at least one
    output register.
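
    The two load-time rules for output registers can be summarized as a
    check on the set of registers a program writes.  The C fragment below is
    an illustrative sketch; the flag and function names are hypothetical and
    do not correspond to any API in this extension.

```c
#include <assert.h>

enum {
    WRITES_COLR = 1 << 0,   /* program writes o[COLR] */
    WRITES_COLH = 1 << 1,   /* program writes o[COLH] */
    WRITES_DEPR = 1 << 2    /* program writes o[DEPR] */
};

/* Returns 1 if the set of written output registers satisfies both rules:
 * at least one output is written, and o[COLR] and o[COLH] are not both
 * written. */
static int outputs_loadable(unsigned writes)
{
    const unsigned all = WRITES_COLR | WRITES_COLH | WRITES_DEPR;

    assert((writes & ~all) == 0);
    if ((writes & WRITES_COLR) && (writes & WRITES_COLH))
        return 0;
    return (writes & all) != 0;
}
```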

    The fragment program output registers may not be read by a fragment
    program, but may be written to multiple times.  

    The values of all fragment program output registers are initially
    undefined.


    Section 3.11.1.4, Fragment Program Condition Code Register

    The condition code register (CC) is a single four-component vector.  Each
    component of this register is one of four enumerated values:  GT (greater
    than), EQ (equal), LT (less than), or UN (unordered).  The condition code
    register can be used to mask writes to fragment data register components
    or to terminate processing of a fragment altogether (via the KIL
    instruction).

    Most fragment program instructions can optionally update the condition
    code register.  When a fragment program instruction updates the condition
    code register, a condition code component is set to LT if the
    corresponding component of the result vector is less than zero, EQ if it
    is equal to zero, GT if it is greater than zero, and UN if it is NaN (not
    a number).

    The condition code register is initialized to a vector of EQ values each
    time a fragment program executes.


    Section 3.11.2, Fragment Program Parameters

    In addition to using the registers defined in Section 3.11.1, fragment
    programs may also use fragment program parameters in their computation.
    Fragment program parameters are constant during the execution of fragment
    programs, but some parameters may be modified outside the execution of a
    fragment program.

    There are five different types of program parameters:  embedded scalar
    constants, embedded vector constants, named constants, named local
    parameters, and numbered local parameters.

    Embedded scalar constants are written as standard floating-point numbers
    with an optional sign designator ("+" or "-") and optional scientific
    notation (e.g., "E+06", meaning "times 10^6").
 
    Embedded vector constants are written as a comma-separated array of one to
    four scalar constants, surrounded by braces (like a C/C++ array
    initializer).  Vector constants are always treated as 4-component vectors:
    constants with fewer than four components are expanded to four components by
    filling missing y and z components with 0.0 and missing w components with
    1.0.  Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
    "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to
    "{5,6,7,1}".
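
    The expansion rule for embedded vector constants can be sketched in C
    as follows.  The function name is hypothetical and the fragment is
    illustrative only.

```c
#include <assert.h>

/* Expands an embedded vector constant with n components (1 <= n <= 4)
 * to four components: missing y and z become 0.0, missing w becomes 1.0. */
static void expand_vector_constant(const float *in, int n, float out[4])
{
    static const float fill[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    int i;

    assert(in != 0 && n >= 1 && n <= 4);
    for (i = 0; i < 4; i++)
        out[i] = (i < n) ? in[i] : fill[i];
}
```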

    Named constants allow fragment program instructions to define scalar or
    vector constants that can be referenced by name.  Named constants are
    created using the DEFINE instruction:

        DEFINE pi = 3.1415926535;
        DEFINE color = {0.2, 0.5, 0.8, 1.0};

    The DEFINE instruction associates a constant name with a scalar or vector
    constant value.  Subsequent fragment program instructions that use the
    constant name are equivalent to those using the corresponding constant
    value.

    Named local parameters are similar to named vector constants, but their
    values can be modified after the program is loaded.  Local parameters are
    created using the DECLARE instruction:

        DECLARE fog_color1;
        DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};

    The DECLARE instruction creates a 4-component vector associated with the
    local parameter name.  Subsequent fragment program instructions
    referencing the local parameter name are processed as though the current
    value of the local parameter vector were specified instead of the
    parameter name.  A DECLARE instruction can optionally specify an initial
    value for the local parameter, which can be either a scalar or vector
    constant.  Scalar constants are expanded to 4-component vectors by
    replicating the scalar value in each component.  The initial value of
    local parameters not initialized by the program is (0,0,0,0).

    A named local parameter for a specific program can be updated using the
    calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section
    5.7).  Named local parameters are accessible only by the program in which
    they are defined.  Modifying a local parameter affects only the
    associated program and does not affect local parameters with the same name
    that are found in any other fragment program.

    Numbered local parameters are similar to named local parameters, except
    that they are referred to by number and are not declared in fragment
    programs.  Each fragment program object has an array of four-component
    floating-point vectors that can be used by the program.  The number of
    vectors is given by the implementation-dependent constant
    MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
    Numbered local parameters are accessed by a fragment program as members
    of an array called "p".  For example, the instruction

        MOV R0, p[31];

    copies the contents of numbered local parameter 31 into temporary register
    R0.

    Constant and local parameter names can be arbitrary strings consisting of
    letters (upper or lower-case), numbers, underscores ("_"), and dollar
    signs ("$").  Keywords defined in the grammar (including instruction
    names) cannot be used as constant names, nor can strings that start with
    numbers, or strings that specify valid temporary register or texture
    numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15").  A fragment
    program will fail to load if a DEFINE or DECLARE instruction specifies an
    invalid constant or local parameter name.
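
    The naming rules above can be checked with a sketch like the following.
    The helper name is_valid_parameter_name is hypothetical, and the keyword
    set is a deliberately incomplete sample; a real loader would list every
    keyword and instruction name in the grammar.

```python
import re

# Letters, digits, "_" and "$"; the first character may not be a digit.
_NAME_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")

# Temporary register and texture names that may not be used as names.
_RESERVED = (
    {"R%d" % i for i in range(32)}
    | {"H%d" % i for i in range(64)}
    | {"TEX%d" % i for i in range(16)}
)

# Assumption: an incomplete sample of grammar keywords, for illustration.
_KEYWORDS = {"DEFINE", "DECLARE", "END", "ADD", "MOV", "MAD", "KIL", "TEX"}


def is_valid_parameter_name(name):
    """Return True if name may be used in a DEFINE or DECLARE instruction."""
    if not _NAME_RE.match(name):
        return False
    return name not in _RESERVED and name not in _KEYWORDS
```

    With this sketch, "fog_color1" and "$scale" are accepted, while "1abc",
    "R0", and "MOV" are rejected.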

    A fragment program will fail to load if an instruction contains a named
    parameter not specified in a previous DEFINE or DECLARE instruction.  A
    fragment program will also fail to load if a DEFINE or DECLARE instruction
    attempts to re-define a named parameter specified in a previous DEFINE or
    DECLARE instruction.
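
    The ordering rules for named parameters amount to a simple symbol table
    check.  The sketch below is a simplified model: the function name
    check_named_parameters and the (kind, names) statement encoding are
    hypothetical and are used only for illustration.

```python
def check_named_parameters(statements):
    """Check definition order for named constants and local parameters.

    statements is a list of (kind, names) pairs in program order, where
    kind is "DEFINE", "DECLARE", or "USE" and names lists the parameter
    names the statement defines or references.
    """
    defined = set()
    for kind, names in statements:
        for name in names:
            if kind in ("DEFINE", "DECLARE"):
                if name in defined:
                    raise ValueError("re-definition of " + name)
                defined.add(name)
            elif name not in defined:
                raise ValueError("undefined parameter " + name)
    return defined
```

    A program that uses a name before its DEFINE or DECLARE, or that defines
    the same name twice, raises ValueError in this model, which corresponds
    to the program failing to load.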

    The contents of the fragment program parameters may not be modified by a
    fragment program.  In addition, each fragment program instruction can
    normally use at most one unique program parameter.  The only exception to
    this rule is if all program parameter references specify named or embedded
    constants that taken together contain no more than four unique scalar
    values.  For such instructions, the GL will automatically generate an
    equivalent instruction that references a single merged vector constant.
    This merging allows programs to specify instructions like the following:

        Instruction              Equivalent Instruction
        ---------------------    ---------------------------------------
        MAD R0, R1, 2, -1;       MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
        ADD R0, {1,2,3,4}, 4;    ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;

    Before counting the number of unique values, any named constants are first
    converted to the equivalent embedded constants.  When generating a
    combined vector constant, the GL does not perform swizzling, component
    selection, negation, or absolute value operations.  The following
    instructions are invalid, as they contain more than four unique scalar
    values.

        Invalid Instructions
        -----------------------------------
        ADD R0, {1,2,3,4}, -4;
        ADD R0, {1,2,3,4}, |-4|;
        ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};
        ADD R0, {1,2,3,4}, {4,5,6,7}.x;
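
    The merging rule can be sketched in Python as follows.  The function
    name merge_constants is hypothetical.  Constants are passed exactly as
    written in the instruction, before any negation, absolute value,
    swizzle, or component selection.  Padding unused slots with 0 follows
    the first example in the table above; the order in which values are
    placed in the merged constant is an assumption.

```python
def merge_constants(operands):
    """Merge the constant operands of one instruction into one vector.

    operands is a list of constants as written (a scalar is a one-element
    list).  Returns the merged 4-component constant and, for each operand,
    the component letters that select its values from the merged constant.
    """
    unique = []
    for operand in operands:
        for value in operand:
            if value not in unique:
                unique.append(value)
    if len(unique) > 4:
        raise ValueError("more than four unique scalar values")
    merged = tuple(unique + [0] * (4 - len(unique)))   # pad unused slots
    selects = ["".join("xyzw"[unique.index(v)] for v in operand)
               for operand in operands]
    return merged, selects
```

    For "MAD R0, R1, 2, -1;" the call merge_constants([[2], [-1]]) returns
    the constant (2, -1, 0, 0) with selections "x" and "y".  For
    "ADD R0, {1,2,3,4}, -4;" the values 1, 2, 3, 4, and -4 give five unique
    scalars, so the sketch raises ValueError.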


    Section 3.11.3, Fragment Program Specification

    Fragment programs are specified as an array of ubytes.  The array is a
    string of ASCII characters encoding the program.  The command
    LoadProgramNV loads a fragment program when the target parameter is
    FRAGMENT_PROGRAM_NV.  The command BindProgramNV enables a fragment program
    for execution.

    At program load time, the program is parsed into a set of tokens possibly
    separated by whitespace.  Spaces, tabs, newlines, carriage returns, and
    comments are considered whitespace.  Comments begin with the character "#"
    and are terminated by a newline, a carriage return, or the end of the
    program array.  Fragment programs are case-sensitive -- upper and lower
    case letters are treated differently.  The proper choice of case can be
    inferred from the grammar.
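
    The comment rule can be illustrated with a short Python sketch.  The
    function name strip_comments is hypothetical.  The sketch only removes
    comments; it is not a tokenizer for the grammar.

```python
import re


def strip_comments(program_text):
    """Replace each comment with a single space.

    A comment starts at "#" and runs up to, but not including, the next
    newline or carriage return, or to the end of the program string.
    """
    return re.sub(r"#[^\r\n]*", " ", program_text)
```

    Replacing a comment with a space rather than deleting it keeps the
    tokens on either side separated, consistent with comments being treated
    as whitespace.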

    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
    sequences for fragment programs.  The set of valid tokens can be inferred
    from the grammar.  The token "" represents an empty string and is used to
    indicate optional rules.  A program is invalid if it contains any
    undefined tokens or characters.

    <program>              ::= <progPrefix> <instructionSequence> "END"

    <progPrefix>           ::= "!!FP1.0"

    <instructionSequence>  ::= <instructionSequence> <instructionStatement>
                             | <instructionStatement>

    <instructionStatement> ::= <instruction> ";" 
                             | <constantDefinition> ";"
                             | <localDeclaration> ";"

    <instruction>          ::= <VECTORop-instruction>
                             | <SCALARop-instruction>
                             | <BINSCop-instruction>
                             | <BINop-instruction>
                             | <TRIop-instruction>
                             | <KILop-instruction>
                             | <TEXop-instruction>
                             | <TXDop-instruction>

    <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> "," 
                               <vectorSrc>

    <VECTORop>             ::= "DDX"   | "DDX_SAT"
                             | "DDXR"  | "DDXR_SAT"
                             | "DDXH"  | "DDXH_SAT"
                             | "DDXC"  | "DDXC_SAT"
                             | "DDXRC" | "DDXRC_SAT"
                             | "DDXHC" | "DDXHC_SAT"
                             | "DDY"   | "DDY_SAT"
                             | "DDYR"  | "DDYR_SAT"
                             | "DDYH"  | "DDYH_SAT"
                             | "DDYC"  | "DDYC_SAT"
                             | "DDYRC" | "DDYRC_SAT"
                             | "DDYHC" | "DDYHC_SAT"
                             | "FLR"   | "FLR_SAT"
                             | "FLRR"  | "FLRR_SAT"
                             | "FLRH"  | "FLRH_SAT"
                             | "FLRX"  | "FLRX_SAT"
                             | "FLRC"  | "FLRC_SAT"
                             | "FLRRC" | "FLRRC_SAT"
                             | "FLRHC" | "FLRHC_SAT"
                             | "FLRXC" | "FLRXC_SAT"
                             | "FRC"   | "FRC_SAT"
                             | "FRCR"  | "FRCR_SAT"
                             | "FRCH"  | "FRCH_SAT"
                             | "FRCX"  | "FRCX_SAT"
                             | "FRCC"  | "FRCC_SAT"
                             | "FRCRC" | "FRCRC_SAT"
                             | "FRCHC" | "FRCHC_SAT"
                             | "FRCXC" | "FRCXC_SAT"
                             | "LIT"   | "LIT_SAT"
                             | "LITR"  | "LITR_SAT"
                             | "LITH"  | "LITH_SAT"
                             | "LITC"  | "LITC_SAT"
                             | "LITRC" | "LITRC_SAT"
                             | "LITHC" | "LITHC_SAT"
                             | "MOV"   | "MOV_SAT"
                             | "MOVR"  | "MOVR_SAT"
                             | "MOVH"  | "MOVH_SAT"
                             | "MOVX"  | "MOVX_SAT"
                             | "MOVC"  | "MOVC_SAT"
                             | "MOVRC" | "MOVRC_SAT"
                             | "MOVHC" | "MOVHC_SAT"
                             | "MOVXC" | "MOVXC_SAT"
                             | "PK2H"
                             | "PK2US"  
                             | "PK4B"  
                             | "PK4UB"

    <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> "," 
                               <scalarSrc>

    <SCALARop>             ::= "COS"     | "COS_SAT"
                             | "COSR"    | "COSR_SAT"
                             | "COSH"    | "COSH_SAT"
                             | "COSC"    | "COSC_SAT"
                             | "COSRC"   | "COSRC_SAT"
                             | "COSHC"   | "COSHC_SAT"
                             | "EX2"     | "EX2_SAT"
                             | "EX2R"    | "EX2R_SAT"
                             | "EX2H"    | "EX2H_SAT"
                             | "EX2C"    | "EX2C_SAT"
                             | "EX2RC"   | "EX2RC_SAT"
                             | "EX2HC"   | "EX2HC_SAT"
                             | "LG2"     | "LG2_SAT"
                             | "LG2R"    | "LG2R_SAT"
                             | "LG2H"    | "LG2H_SAT"
                             | "LG2C"    | "LG2C_SAT"
                             | "LG2RC"   | "LG2RC_SAT"
                             | "LG2HC"   | "LG2HC_SAT"
                             | "RCP"     | "RCP_SAT"
                             | "RCPR"    | "RCPR_SAT"
                             | "RCPH"    | "RCPH_SAT"
                             | "RCPC"    | "RCPC_SAT"
                             | "RCPRC"   | "RCPRC_SAT"
                             | "RCPHC"   | "RCPHC_SAT"
                             | "RSQ"     | "RSQ_SAT"
                             | "RSQR"    | "RSQR_SAT"
                             | "RSQH"    | "RSQH_SAT"
                             | "RSQC"    | "RSQC_SAT"
                             | "RSQRC"   | "RSQRC_SAT"
                             | "RSQHC"   | "RSQHC_SAT"
                             | "SIN"     | "SIN_SAT"
                             | "SINR"    | "SINR_SAT"
                             | "SINH"    | "SINH_SAT"
                             | "SINC"    | "SINC_SAT"
                             | "SINRC"   | "SINRC_SAT"
                             | "SINHC"   | "SINHC_SAT"
                             | "UP2H"    | "UP2H_SAT"
                             | "UP2HC"   | "UP2HC_SAT"
                             | "UP2US"   | "UP2US_SAT"
                             | "UP2USC"  | "UP2USC_SAT"
                             | "UP4B"    | "UP4B_SAT"
                             | "UP4BC"   | "UP4BC_SAT"
                             | "UP4UB"   | "UP4UB_SAT"
                             | "UP4UBC"  | "UP4UBC_SAT"

    <BINSCop-instruction> ::=  <BINSCop> <maskedDstReg> "," 
                               <scalarSrc> "," <scalarSrc>

    <BINSCop>              ::= "POW"   | "POW_SAT"
                             | "POWR"  | "POWR_SAT"
                             | "POWH"  | "POWH_SAT"
                             | "POWC"  | "POWC_SAT"
                             | "POWRC" | "POWRC_SAT"
                             | "POWHC" | "POWHC_SAT"

    <BINop-instruction>    ::= <BINop> <maskedDstReg> ","
                               <vectorSrc> "," <vectorSrc>

    <BINop>                ::= "ADD"   | "ADD_SAT"
                             | "ADDR"  | "ADDR_SAT"
                             | "ADDH"  | "ADDH_SAT"
                             | "ADDX"  | "ADDX_SAT"
                             | "ADDC"  | "ADDC_SAT"
                             | "ADDRC" | "ADDRC_SAT"
                             | "ADDHC" | "ADDHC_SAT"
                             | "ADDXC" | "ADDXC_SAT"
                             | "DP3"   | "DP3_SAT"
                             | "DP3R"  | "DP3R_SAT"
                             | "DP3H"  | "DP3H_SAT"
                             | "DP3X"  | "DP3X_SAT"
                             | "DP3C"  | "DP3C_SAT"
                             | "DP3RC" | "DP3RC_SAT"
                             | "DP3HC" | "DP3HC_SAT"
                             | "DP3XC" | "DP3XC_SAT"
                             | "DP4"   | "DP4_SAT"
                             | "DP4R"  | "DP4R_SAT"
                             | "DP4H"  | "DP4H_SAT"
                             | "DP4X"  | "DP4X_SAT"
                             | "DP4C"  | "DP4C_SAT"
                             | "DP4RC" | "DP4RC_SAT"
                             | "DP4HC" | "DP4HC_SAT"
                             | "DP4XC" | "DP4XC_SAT"
                             | "DST"   | "DST_SAT"
                             | "DSTR"  | "DSTR_SAT"
                             | "DSTH"  | "DSTH_SAT"
                             | "DSTC"  | "DSTC_SAT"
                             | "DSTRC" | "DSTRC_SAT"
                             | "DSTHC" | "DSTHC_SAT"
                             | "MAX"   | "MAX_SAT"
                             | "MAXR"  | "MAXR_SAT"
                             | "MAXH"  | "MAXH_SAT"
                             | "MAXX"  | "MAXX_SAT"
                             | "MAXC"  | "MAXC_SAT"
                             | "MAXRC" | "MAXRC_SAT"
                             | "MAXHC" | "MAXHC_SAT"
                             | "MAXXC" | "MAXXC_SAT"
                             | "MIN"   | "MIN_SAT"
                             | "MINR"  | "MINR_SAT"
                             | "MINH"  | "MINH_SAT"
                             | "MINX"  | "MINX_SAT"
                             | "MINC"  | "MINC_SAT"
                             | "MINRC" | "MINRC_SAT"
                             | "MINHC" | "MINHC_SAT"
                             | "MINXC" | "MINXC_SAT"
                             | "MUL"   | "MUL_SAT"
                             | "MULR"  | "MULR_SAT"
                             | "MULH"  | "MULH_SAT"
                             | "MULX"  | "MULX_SAT"
                             | "MULC"  | "MULC_SAT"
                             | "MULRC" | "MULRC_SAT"
                             | "MULHC" | "MULHC_SAT"
                             | "MULXC" | "MULXC_SAT"
                             | "RFL"   | "RFL_SAT"
                             | "RFLR"  | "RFLR_SAT"
                             | "RFLH"  | "RFLH_SAT"
                             | "RFLC"  | "RFLC_SAT"
                             | "RFLRC" | "RFLRC_SAT"
                             | "RFLHC" | "RFLHC_SAT"
                             | "SEQ"   | "SEQ_SAT"
                             | "SEQR"  | "SEQR_SAT"
                             | "SEQH"  | "SEQH_SAT"
                             | "SEQX"  | "SEQX_SAT"
                             | "SEQC"  | "SEQC_SAT"
                             | "SEQRC" | "SEQRC_SAT"
                             | "SEQHC" | "SEQHC_SAT"
                             | "SEQXC" | "SEQXC_SAT"
                             | "SFL"   | "SFL_SAT"
                             | "SFLR"  | "SFLR_SAT"
                             | "SFLH"  | "SFLH_SAT"
                             | "SFLX"  | "SFLX_SAT"
                             | "SFLC"  | "SFLC_SAT"
                             | "SFLRC" | "SFLRC_SAT"
                             | "SFLHC" | "SFLHC_SAT"
                             | "SFLXC" | "SFLXC_SAT"
                             | "SGE"   | "SGE_SAT"
                             | "SGER"  | "SGER_SAT"
                             | "SGEH"  | "SGEH_SAT"
                             | "SGEX"  | "SGEX_SAT"
                             | "SGEC"  | "SGEC_SAT"
                             | "SGERC" | "SGERC_SAT"
                             | "SGEHC" | "SGEHC_SAT"
                             | "SGEXC" | "SGEXC_SAT"
                             | "SGT"   | "SGT_SAT"
                             | "SGTR"  | "SGTR_SAT"
                             | "SGTH"  | "SGTH_SAT"
                             | "SGTX"  | "SGTX_SAT"
                             | "SGTC"  | "SGTC_SAT"
                             | "SGTRC" | "SGTRC_SAT"
                             | "SGTHC" | "SGTHC_SAT"
                             | "SGTXC" | "SGTXC_SAT"
                             | "SLE"   | "SLE_SAT"
                             | "SLER"  | "SLER_SAT"
                             | "SLEH"  | "SLEH_SAT"
                             | "SLEX"  | "SLEX_SAT"
                             | "SLEC"  | "SLEC_SAT"
                             | "SLERC" | "SLERC_SAT"
                             | "SLEHC" | "SLEHC_SAT"
                             | "SLEXC" | "SLEXC_SAT"
                             | "SLT"   | "SLT_SAT"
                             | "SLTR"  | "SLTR_SAT"
                             | "SLTH"  | "SLTH_SAT"
                             | "SLTX"  | "SLTX_SAT"
                             | "SLTC"  | "SLTC_SAT"
                             | "SLTRC" | "SLTRC_SAT"
                             | "SLTHC" | "SLTHC_SAT"
                             | "SLTXC" | "SLTXC_SAT"
                             | "SNE"   | "SNE_SAT"
                             | "SNER"  | "SNER_SAT"
                             | "SNEH"  | "SNEH_SAT"
                             | "SNEX"  | "SNEX_SAT"
                             | "SNEC"  | "SNEC_SAT"
                             | "SNERC" | "SNERC_SAT"
                             | "SNEHC" | "SNEHC_SAT"
                             | "SNEXC" | "SNEXC_SAT"
                             | "STR"   | "STR_SAT"
                             | "STRR"  | "STRR_SAT"
                             | "STRH"  | "STRH_SAT"
                             | "STRX"  | "STRX_SAT"
                             | "STRC"  | "STRC_SAT"
                             | "STRRC" | "STRRC_SAT"
                             | "STRHC" | "STRHC_SAT"
                             | "STRXC" | "STRXC_SAT"
                             | "SUB"   | "SUB_SAT"
                             | "SUBR"  | "SUBR_SAT"
                             | "SUBH"  | "SUBH_SAT"
                             | "SUBX"  | "SUBX_SAT"
                             | "SUBC"  | "SUBC_SAT"
                             | "SUBRC" | "SUBRC_SAT"
                             | "SUBHC" | "SUBHC_SAT"
                             | "SUBXC" | "SUBXC_SAT"

    <TRIop-instruction>    ::= <TRIop> <maskedDstReg> ","
                               <vectorSrc> "," <vectorSrc> ","
                               <vectorSrc>

    <TRIop>                ::= "MAD"   | "MAD_SAT"
                             | "MADR"  | "MADR_SAT"
                             | "MADH"  | "MADH_SAT"
                             | "MADX"  | "MADX_SAT"
                             | "MADC"  | "MADC_SAT"
                             | "MADRC" | "MADRC_SAT"
                             | "MADHC" | "MADHC_SAT"
                             | "MADXC" | "MADXC_SAT"
                             | "LRP"   | "LRP_SAT"
                             | "LRPR"  | "LRPR_SAT"
                             | "LRPH"  | "LRPH_SAT"
                             | "LRPX"  | "LRPX_SAT"
                             | "LRPC"  | "LRPC_SAT"
                             | "LRPRC" | "LRPRC_SAT"
                             | "LRPHC" | "LRPHC_SAT"
                             | "LRPXC" | "LRPXC_SAT"
                             | "X2D"   | "X2D_SAT"
                             | "X2DR"  | "X2DR_SAT"
                             | "X2DH"  | "X2DH_SAT"
                             | "X2DC"  | "X2DC_SAT"
                             | "X2DRC" | "X2DRC_SAT"
                             | "X2DHC" | "X2DHC_SAT"

    <KILop-instruction>    ::= <KILop> <ccMask>

    <KILop>                ::= "KIL"

    <TEXop-instruction>    ::= <TEXop> <maskedDstReg> ","
                               <vectorSrc> "," <texImageId>

    <TEXop>                ::= "TEX"  | "TEX_SAT"
                             | "TEXC" | "TEXC_SAT"
                             | "TXP"  | "TXP_SAT"
                             | "TXPC" | "TXPC_SAT"

    <TXDop-instruction>    ::= <TXDop> <maskedDstReg> ","
                               <vectorSrc> "," <vectorSrc> ","
                               <vectorSrc> "," <texImageId>

    <TXDop>                ::= "TXD"  | "TXD_SAT"
                             | "TXDC" | "TXDC_SAT"

    <scalarSrc>            ::= <absScalarSrc>
                             | <baseScalarSrc>

    <absScalarSrc>         ::= <negate> "|" <baseScalarSrc> "|"

    <baseScalarSrc>        ::= <signedScalarConstant>
                             | <negate> <namedScalarConstant>
                             | <negate> <vectorConstant> <scalarSuffix>
                             | <negate> <namedLocalParameter> <scalarSuffix>
                             | <negate> <numberedLocal> <scalarSuffix>
                             | <negate> <srcRegister> <scalarSuffix>

    <vectorSrc>            ::= <absVectorSrc>
                             | <baseVectorSrc>

    <absVectorSrc>         ::= <negate> "|" <baseVectorSrc> "|"

    <baseVectorSrc>        ::= <signedScalarConstant>
                             | <negate> <namedScalarConstant>
                             | <negate> <vectorConstant> <scalarSuffix>
                             | <negate> <vectorConstant> <swizzleSuffix>
                             | <negate> <namedLocalParameter> <scalarSuffix>
                             | <negate> <namedLocalParameter> <swizzleSuffix>
                             | <negate> <numberedLocal> <scalarSuffix>
                             | <negate> <numberedLocal> <swizzleSuffix>
                             | <negate> <srcRegister> <scalarSuffix>
                             | <negate> <srcRegister> <swizzleSuffix>

    <maskedDstReg>         ::= <dstRegister> <optionalWriteMask> 
                               <optionalCCMask>

    <dstRegister>          ::= <fragTempReg>
                             | <fragOutputReg>
                             | "RC"
                             | "HC"

    <optionalCCMask>       ::= "(" <ccMask> ")"
                             | ""

    <ccMask>               ::= <ccMaskRule> <swizzleSuffix>
                             | <ccMaskRule> <scalarSuffix>

    <ccMaskRule>           ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" |
                               "TR" | "FL"
                           
    <optionalWriteMask>    ::= ""
                             | "." "x"
                             | "."     "y"
                             | "." "x" "y"
                             | "."         "z"
                             | "." "x"     "z"
                             | "."     "y" "z"
                             | "." "x" "y" "z"
                             | "."             "w"
                             | "." "x"         "w"
                             | "."     "y"     "w"
                             | "." "x" "y"     "w"
                             | "."         "z" "w"
                             | "." "x"     "z" "w"
                             | "."     "y" "z" "w"
                             | "." "x" "y" "z" "w"

    <srcRegister>          ::= <fragAttribReg>
                             | <fragTempReg>

    <fragAttribReg>        ::= "f" "[" <fragAttribRegId> "]"

    <fragAttribRegId>      ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0"
                             | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5"
                             | "TEX6" | "TEX7"

    <fragTempReg>          ::= <fragF32Reg>
                             | <fragF16Reg>

    <fragF32Reg>           ::= "R0"  | "R1"  | "R2"  | "R3"
                             | "R4"  | "R5"  | "R6"  | "R7"
                             | "R8"  | "R9"  | "R10" | "R11"
                             | "R12" | "R13" | "R14" | "R15"
                             | "R16" | "R17" | "R18" | "R19"
                             | "R20" | "R21" | "R22" | "R23"
                             | "R24" | "R25" | "R26" | "R27"
                             | "R28" | "R29" | "R30" | "R31"

    <fragF16Reg>           ::= "H0"  | "H1"  | "H2"  | "H3"
                             | "H4"  | "H5"  | "H6"  | "H7"
                             | "H8"  | "H9"  | "H10" | "H11"
                             | "H12" | "H13" | "H14" | "H15"
                             | "H16" | "H17" | "H18" | "H19"
                             | "H20" | "H21" | "H22" | "H23"
                             | "H24" | "H25" | "H26" | "H27"
                             | "H28" | "H29" | "H30" | "H31"
                             | "H32" | "H33" | "H34" | "H35"
                             | "H36" | "H37" | "H38" | "H39"
                             | "H40" | "H41" | "H42" | "H43"
                             | "H44" | "H45" | "H46" | "H47"
                             | "H48" | "H49" | "H50" | "H51"
                             | "H52" | "H53" | "H54" | "H55"
                             | "H56" | "H57" | "H58" | "H59"
                             | "H60" | "H61" | "H62" | "H63"

    <fragOutputReg>        ::= "o" "[" <fragOutputRegName> "]"

    <fragOutputRegName>    ::= "COLR" | "COLH" | "DEPR"

    <numberedLocal>        ::= "p" "[" <localNumber> "]"

    <localNumber>          ::= <integer> from 0 to
                               MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1

    <scalarSuffix>         ::= "." <component>

    <swizzleSuffix>        ::= ""
                             | "." <component> <component>
                                   <component> <component>

    <component>            ::= "x" | "y" | "z" | "w"

    <texImageId>           ::= <texImageUnit> "," <texImageTarget>

    <texImageUnit>         ::= "TEX0"  | "TEX1"  | "TEX2"  | "TEX3"
                             | "TEX4"  | "TEX5"  | "TEX6"  | "TEX7"
                             | "TEX8"  | "TEX9"  | "TEX10" | "TEX11"
                             | "TEX12" | "TEX13" | "TEX14" | "TEX15"

    <texImageTarget>       ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"

    <constantDefinition>   ::= "DEFINE" <namedVectorConstant> "=" 
                               <vectorConstant>
                             | "DEFINE" <namedScalarConstant> "=" 
                               <scalarConstant>

    <localDeclaration>     ::= "DECLARE" <namedLocalParameter> 
                               <optionalLocalValue>

    <optionalLocalValue>   ::= ""
                             | "=" <vectorConstant>
                             | "=" <scalarConstant>

    <vectorConstant>       ::= "{" <vectorConstantList> "}"
                             | <namedVectorConstant>

    <vectorConstantList>   ::= <scalarConstant>
                             | <scalarConstant> "," <scalarConstant>
                             | <scalarConstant> "," <scalarConstant> ","
                               <scalarConstant>
                             | <scalarConstant> "," <scalarConstant> ","
                               <scalarConstant> "," <scalarConstant>

    <scalarConstant>       ::= <signedScalarConstant>
                             | <namedScalarConstant>

    <signedScalarConstant> ::= <optionalSign> <floatConstant>

    <namedScalarConstant>  ::= <identifier>    ((name of a scalar constant
                                                 in a DEFINE instruction))

    <namedVectorConstant>  ::= <identifier>    ((name of a vector constant
                                                 in a DEFINE instruction))

    <namedLocalParameter>  ::= <identifier>    ((name of a local parameter
                                                 in a DECLARE instruction))

    <negate>               ::= "-" | "+" | ""

    <optionalSign>         ::= "-" | "+" | ""

    <identifier>           ::= see text below

    <floatConstant>        ::= see text below


    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
    the first character must be a letter.  The underscore ("_") and dollar
    sign ("$") count as letters.  Upper and lower case letters are different
    (names are case-sensitive).

    The <floatConstant> rule matches a floating-point constant consisting
    of an integer part, a decimal point, a fraction part, an "e" or
    "E", and an optionally signed integer exponent.  The integer and
    fraction parts both consist of a sequence of one or more digits ("0"
    through "9").  Either the integer part or the fraction part (not
    both) may be missing; either the decimal point or the "e" (or "E")
    and the exponent (not both) may be missing.
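
    The <floatConstant> rule can be expressed as a regular expression, shown
    here as a Python sketch.  The names are hypothetical.  Read literally,
    the rule requires either a decimal point or an exponent, yet examples
    elsewhere in this section use constants such as "2"; accepting a bare
    digit sequence is therefore an assumption, controlled by the
    allow_plain_integer flag.

```python
import re

_FLOAT_RE = re.compile(r"""
    (?: [0-9]+ \. [0-9]* | \. [0-9]+ )   # decimal point with integer and/or fraction part
    (?: [eE] [+-]? [0-9]+ )?             # optional exponent
  | [0-9]+ [eE] [+-]? [0-9]+             # no decimal point, so the exponent is required
""", re.VERBOSE)


def is_float_constant(text, allow_plain_integer=True):
    """Return True if text matches the <floatConstant> rule.

    A leading sign is not part of <floatConstant>; it belongs to
    <signedScalarConstant>.
    """
    if _FLOAT_RE.fullmatch(text):
        return True
    # Assumption: bare digit sequences such as "2" are also accepted.
    return allow_plain_integer and re.fullmatch(r"[0-9]+", text) is not None
```

    Under this sketch "3.1415926535", ".5", "1.", and "1e5" are accepted,
    while ".", "e5", and "1e" are rejected.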

    A fragment program fails to load if it contains more than the maximum
    number of executable instructions.  If ARB_fragment_program is supported,
    this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the
    FRAGMENT_PROGRAM_ARB target.  Otherwise, the limit is 1024.  Executable
    instructions are those matching the <instruction> rule in the grammar, and
    do not include DEFINE or DECLARE instructions.

    A fragment program fails to load if its total temporary and output
    register count exceeds 64.  Each fp32 temporary or output register used by
    the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each
    fp16 temporary or output register used by the program (H0-H63 and o[COLH])
    counts as a single register.
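
    The register accounting can be sketched in Python as follows.  The
    function names register_cost and fits_register_limit are hypothetical.
    Registers are identified by the names used in the grammar.

```python
import re


def register_cost(registers):
    """Return the register count charged for the registers a program uses."""
    cost = 0
    for reg in set(registers):             # each distinct register counts once
        if re.fullmatch(r"R[0-9]+", reg) or reg in ("o[COLR]", "o[DEPR]"):
            cost += 2                      # fp32 register counts as two
        elif re.fullmatch(r"H[0-9]+", reg) or reg == "o[COLH]":
            cost += 1                      # fp16 register counts as one
        else:
            raise ValueError("not a temporary or output register: " + reg)
    return cost


def fits_register_limit(registers, limit=64):
    """Return True if the program's register count does not exceed limit."""
    return register_cost(registers) <= limit
```

    For example, a program using R0, H0, and o[COLR] is charged
    2 + 1 + 2 = 5 registers.  A program using all of R0-R31 is charged
    exactly 64, so using any additional temporary or output register
    exceeds the limit.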
      
    A fragment program fails to load if any instruction sources more than one
    unique fragment attribute register.  Instructions sourcing the same
    attribute register multiple times are acceptable.

    A fragment program fails to load if any instruction sources more than one
    unique program parameter register.  Instructions sourcing the same program
    parameter multiple times are acceptable.

    A fragment program fails to load if multiple texture lookup instructions
    reference different targets for the same texture image unit.

    A fragment program fails to load if it writes to both the o[COLR] and
    o[COLH] output registers.
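
    The texture target and color output restrictions can be checked with a
    sketch such as the following.  The function names and the input
    encodings are hypothetical and are used only for illustration.

```python
def check_texture_targets(lookups):
    """Check that each texture image unit is used with a single target.

    lookups is a list of (unit, target) pairs, e.g. ("TEX0", "2D"), one per
    texture lookup instruction in the program.
    """
    seen = {}
    for unit, target in lookups:
        # setdefault records the first target seen for this unit.
        if seen.setdefault(unit, target) != target:
            raise ValueError(
                unit + " used with both " + seen[unit] + " and " + target)
    return seen


def check_color_outputs(written):
    """Check that the program does not write both o[COLR] and o[COLH]."""
    if "o[COLR]" in written and "o[COLH]" in written:
        raise ValueError("program writes both o[COLR] and o[COLH]")
```

    In this model a ValueError corresponds to the program failing to load.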

    The error INVALID_OPERATION is generated by LoadProgramNV if a fragment
    program fails to load because it is not syntactically correct or for one
    of the semantic restrictions listed above.

    The error INVALID_OPERATION is generated by LoadProgramNV if a program is
    loaded for id when id is currently loaded with a program of a different
    target.

    A successfully loaded fragment program is parsed into a sequence of
    instructions.  Each instruction is identified by its tokenized name.  The
    operation of these instructions when executed is defined in Sections
    3.11.4 and 3.11.5.


    Section 3.11.4, Fragment Program Operation

    There are forty-five fragment program instructions.  Fragment program
    instructions may have up to sixteen variants, including a suffix of "R",
    "H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix
    of "C" to allow an update of the condition code register (section
    3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to
    the range [0,1] (section 3.11.4.4).  For example, the sixteen forms of the
    "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
    "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
    "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".

    Some mathematical instructions that support precision suffixes, typically
    those that involve complicated floating-point computations, do not support
    the "X" precision suffix.
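
    The way the suffixes combine can be illustrated with a short Python
    sketch.  The function name instruction_variants is hypothetical; the
    precisions argument lists the precision suffixes that a given
    instruction supports.

```python
def instruction_variants(base, precisions="RHX"):
    """List the names formed from base by the optional suffixes.

    Each name is base + precision suffix + "C" suffix + "_SAT" suffix,
    where every suffix may be absent.
    """
    names = []
    for sat in ("", "_SAT"):
        for cc in ("", "C"):
            for prec in [""] + list(precisions):
                names.append(base + prec + cc + sat)
    return names
```

    With all three precision suffixes, instruction_variants("ADD") produces
    4 x 2 x 2 = 16 names.  An instruction without the "X" suffix, such as
    instruction_variants("DDX", "RH"), produces 3 x 2 x 2 = 12 names.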

    The fragment program instructions and their respective input and output
    parameters are summarized in Table X.4.

      Instruction          Inputs  Output   Description
      -----------------    ------  ------   --------------------------------
      ADD[RHX][C][_SAT]    v,v     v        add
      COS[RH ][C][_SAT]    s       ssss     cosine
      DDX[RH ][C][_SAT]    v       v        derivative relative to x
      DDY[RH ][C][_SAT]    v       v        derivative relative to y
      DP3[RHX][C][_SAT]    v,v     ssss     3-component dot product
      DP4[RHX][C][_SAT]    v,v     ssss     4-component dot product
      DST[RH ][C][_SAT]    v,v     v        distance vector
      EX2[RH ][C][_SAT]    s       ssss     exponential base 2
      FLR[RHX][C][_SAT]    v       v        floor
      FRC[RHX][C][_SAT]    v       v        fraction
      KIL                  none    none     conditionally discard fragment
      LG2[RH ][C][_SAT]    s       ssss     logarithm base 2
      LIT[RH ][C][_SAT]    v       v        compute light coefficients
      LRP[RHX][C][_SAT]    v,v,v   v        linear interpolation
      MAD[RHX][C][_SAT]    v,v,v   v        multiply and add
      MAX[RHX][C][_SAT]    v,v     v        maximum
      MIN[RHX][C][_SAT]    v,v     v        minimum
      MOV[RHX][C][_SAT]    v       v        move
      MUL[RHX][C][_SAT]    v,v     v        multiply
      PK2H                 v       ssss     pack two 16-bit floats
      PK2US                v       ssss     pack two unsigned 16-bit scalars
      PK4B                 v       ssss     pack four signed 8-bit scalars
      PK4UB                v       ssss     pack four unsigned 8-bit scalars
      POW[RH ][C][_SAT]    s,s     ssss     exponentiation (x^y)
      RCP[RH ][C][_SAT]    s       ssss     reciprocal
      RFL[RH ][C][_SAT]    v,v     v        reflection vector
      RSQ[RH ][C][_SAT]    s       ssss     reciprocal square root
      SEQ[RHX][C][_SAT]    v,v     v        set on equal
      SFL[RHX][C][_SAT]    v,v     v        set on false
      SGE[RHX][C][_SAT]    v,v     v        set on greater than or equal
      SGT[RHX][C][_SAT]    v,v     v        set on greater than
      SIN[RH ][C][_SAT]    s       ssss     sine
      SLE[RHX][C][_SAT]    v,v     v        set on less than or equal
      SLT[RHX][C][_SAT]    v,v     v        set on less than
      SNE[RHX][C][_SAT]    v,v     v        set on not equal
      STR[RHX][C][_SAT]    v,v     v        set on true
      SUB[RHX][C][_SAT]    v,v     v        subtract
      TEX[C][_SAT]         v       v        texture lookup
      TXD[C][_SAT]         v,v,v   v        texture lookup w/partials
      TXP[C][_SAT]         v       v        projective texture lookup
      UP2H[C][_SAT]        s       v        unpack two 16-bit floats
      UP2US[C][_SAT]       s       v        unpack two unsigned 16-bit scalars
      UP4B[C][_SAT]        s       v        unpack four signed 8-bit scalars
      UP4UB[C][_SAT]       s       v        unpack four unsigned 8-bit scalars
      X2D[RH ][C][_SAT]    v,v,v   v        2D coordinate transformation
     
    Table X.4:  Summary of fragment program instructions.  "[RHX]" indicates
    an optional arithmetic precision suffix.  "[C]" indicates an optional
    condition code update suffix.  "[_SAT]" indicates an optional clamp of
    result vector components to [0,1].  "v" indicates a 4-component vector
    input or output, "s" indicates a scalar input, and "ssss" indicates a
    scalar output replicated across a 4-component vector.


    Section 3.11.4.1:  Fragment Program Storage Precision

    Registers in fragment programs are stored in two different representations:
    16-bit floating-point (fp16) and 32-bit floating-point (fp32).  There is
    an additional 12-bit fixed-point representation (fx12) used only as an
    internal representation for instructions with the "X" precision qualifier.

    In the 32-bit float (fp32) representation, each component is represented
    in floating-point with eight exponent and twenty-three mantissa bits, as
    in the standard IEEE single-precision format.  If S represents the sign (0
    or 1), E represents the exponent in the range [0,255], and M represents
    the mantissa in the range [0,2^23-1], then an fp32 float is decoded as:

       (-1)^S * 0.0,                           if E == 0,
       (-1)^S * 2^(E-127) * (1 + M/2^23),      if 0 < E < 255,
       (-1)^S * INF,                           if E == 255 and M == 0,
       NaN,                                    if E == 255 and M != 0.

    INF (Infinity) is a special representation indicating numerical overflow.
    NaN (Not a Number) is a special representation indicating the result of
    illegal arithmetic operations, such as computing the square root or
    logarithm of a negative number.  Note that all normal fp32 values, zero,
    and INF have an associated sign.  -0.0 and +0.0 are considered equivalent
    for the purposes of comparisons.

    This representation is identical to the IEEE single-precision
    floating-point standard, except that no special representation is provided
    for denorms -- numbers in the range (-2^-126, +2^-126).  All such numbers
    are flushed to zero.
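
    The flush-to-zero rule can be sketched in C as follows.  This is an
    illustration only: it assumes the host "float" type is an IEEE 754
    32-bit value, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (assumes IEEE 754). */
static uint32_t bits_from_float(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

static float float_from_bits(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Values with a zero exponent field (E == 0) decode to signed zero, so
 * any denorm is replaced by a zero of the same sign. */
static float flush_denorm_fp32(float value)
{
    uint32_t bits = bits_from_float(value);

    if (((bits >> 23) & 0xFFu) == 0)
        bits &= 0x80000000u;   /* keep the sign bit, clear the mantissa */
    return float_from_bits(bits);
}
```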

    In a 16-bit float (fp16) register, each component is represented
    similarly, except with only five exponent and ten mantissa bits.  If S
    represents the sign (0 or 1), E represents the exponent in the range
    [0,31], and M represents the mantissa in the range [0,2^10-1], then an
    fp16 float is decoded as:

       (-1)^S * 0.0,                           if E == 0 and M == 0,
       (-1)^S * 2^-14 * M/2^10,                if E == 0 and M != 0,
       (-1)^S * 2^(E-15) * (1 + M/2^10),       if 0 < E < 31,
       (-1)^S * INF,                           if E == 31 and M == 0,
       NaN,                                    if E == 31 and M != 0.

    One important difference is that the fp16 representation, unlike fp32,
    supports denorms to maximize the limited precision of the 16-bit floating
    point encodings.
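
    The fp16 decoding rules above can be written out as a small C function.
    This is a sketch for illustration; the function name decode_fp16 is an
    assumption, and the result is returned as a double so that every fp16
    value is represented exactly.

```c
#include <math.h>     /* INFINITY and NAN macros only */
#include <stdint.h>

/* Decode a 16-bit pattern using the S/E/M rules given for fp16. */
static double decode_fp16(uint16_t bits)
{
    int      s    = (bits >> 15) & 0x1;    /* sign             */
    unsigned e    = (bits >> 10) & 0x1F;   /* exponent, [0,31] */
    unsigned m    = bits & 0x3FF;          /* mantissa         */
    double   sign = s ? -1.0 : 1.0;

    if (e == 31)
        return (m == 0) ? sign * INFINITY : NAN;
    if (e == 0)
        /* Zero or denorm: 2^-14 * M/2^10 == M / 2^24. */
        return sign * (double)m / 16777216.0;
    /* Normal: 2^(E-15) * (1 + M/2^10) == (1024 + M) * 2^E / 2^25. */
    return sign * (double)(1024u + m) * (double)(1u << e) / 33554432.0;
}
```

    For example, the pattern 0x3C00 (S=0, E=15, M=0) decodes to 1.0, and the
    smallest positive denorm 0x0001 decodes to 2^-24.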

    In the 12-bit fixed-point (fx12) format, numbers are represented as signed
    12-bit two's complement integers with 10 fraction bits.  The range of
    representable values is [-2048/1024, +2047/1024].

    Section 3.11.4.2:  Fragment Program Operation Precision

    Fragment program instructions frequently perform mathematical operations.
    Such operations may be performed at one of three different precisions.
    Fragment programs can specify the precision of each instruction by using
    the precision suffix.  If an instruction has a suffix of "R", calculations
    are carried out with 32-bit floating point operands and results.  If an
    instruction has a suffix of "H", calculations are carried out using 16-bit
    floating point operands and results.  If an instruction has a suffix of
    "X", calculations are carried out using 12-bit fixed point operands and
    results.  For example, the instruction "MULR" performs a 32-bit
    floating-point multiply, "MULH" performs a 16-bit floating-point multiply,
    and "MULX" performs a 12-bit fixed-point multiply.  If no precision suffix
    is specified, calculations are carried out using the precision of the
    temporary register receiving the result.

    Fragment program instructions may source registers or constants whose
    precisions differ from the precision specified with the instruction.
    Instructions may also generate intermediate results with a different
    precision than that of the destination register.  In these cases, the
    values sourced are converted to the precision specified by the
    instruction.

    When converting to fx12 format, -INF and any values less than -2048/1024
    become -2048/1024.  +INF, and any values greater than +2047/1024 become
    +2047/1024.  NaN becomes 0.
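
    A minimal C sketch of the fx12 conversion follows.  The clamping and NaN
    behavior are taken from the rules above.  The rounding of in-range values
    is not specified in this section, so the round-half-away-from-zero step
    below is an assumption, as are the function names.

```c
#include <math.h>   /* isnan, and NAN/INFINITY for callers */

/* Convert to the raw fx12 integer in [-2048, 2047]; the represented
 * value is raw / 1024. */
static int to_fx12(float v)
{
    float scaled;

    if (isnan(v))
        return 0;                          /* NaN becomes 0            */
    if (v < -2048.0f / 1024.0f)
        return -2048;                      /* includes -INF            */
    if (v > 2047.0f / 1024.0f)
        return 2047;                       /* includes +INF            */

    scaled = v * 1024.0f;
    /* ASSUMPTION: round half away from zero; not stated by the text. */
    return (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Convert a raw fx12 integer back to a float value. */
static float from_fx12(int raw)
{
    return (float)raw / 1024.0f;
}
```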

    When converting to fp16 format, any values less than or equal to -2^16 are
    converted to -INF.  Any values greater than or equal to +2^16 are
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
    other values that are not exactly representable in fp16 format are
    converted to one of the two nearest representable values.

    When converting to fp32 format, any values less than or equal to -2^128
    are converted to -INF.  Any values greater than or equal to +2^128 are
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
    other values that are not exactly representable in fp32 format are
    converted to one of the two nearest representable values.

    Fragment program instructions using the fragment attribute registers
    f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32
    precision, regardless of the precision specified by the instruction.

    Section 3.11.4.3:  Fragment Program Operands

    Except for KIL, fragment program instructions operate on either vector or
    scalar operands, indicated in the grammar (see section 3.11.3) by the
    rules <vectorSrc> and <scalarSrc> respectively.

    The basic set of scalar operands is defined by the grammar rule
    <baseScalarSrc>.  Scalar operands can be scalar constants (embedded or
    named), or single components of vector constants, local parameters, or
    registers allowed by the <srcRegister> rule.  A vector component is
    selected by the <scalarSuffix> rule, where the characters "x", "y", "z",
    and "w" select the x, y, z, and w components, respectively, of the vector.

    The basic set of vector operands is defined by the grammar rule
    <baseVectorSrc>.  Vector operands can include vector constants, local
    parameters, or registers allowed by the <srcRegister> rule.

    Basic vector operands can be swizzled according to the <swizzleSuffix>
    rule.  In its most general form, the <swizzleSuffix> rule matches the
    pattern ".????" where each question mark is one of "x", "y", "z", or "w".
    For such patterns, the x, y, z, and w components of the operand are taken
    from the vector components named by the first, second, third, and fourth
    character of the pattern, respectively.  For example, if the swizzle
    suffix is ".yzzx" and the specified source contains {2,8,9,0}, the
    swizzled operand used by the instruction is {8,9,9,2}.  If the
    <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".

    Operands can optionally be negated according to the <negate> rule in
    <baseScalarSrc> or <baseVectorSrc>.  If the <negate> matches "-", each
    value is negated.

    The absolute value of operands can be taken if the <vectorSrc> or
    <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>.  In this case,
    the absolute value of each component is taken.  In addition, if the
    <negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result
    is then negated.

    Instructions requiring vector operands can also use scalar operands in the
    case where the <vectorSrc> rule matches <scalarSrc>.  In such cases, a
    4-component vector is produced by replicating the scalar.
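
    As an illustration of the swizzle rule described above, the following C
    sketch applies a four-character swizzle suffix to a vector.  The Vec4
    type and function names are assumptions for this example; only the
    general ".????" form and the empty suffix are handled.

```c
#include <assert.h>
#include <string.h>

typedef struct { float x, y, z, w; } Vec4;

/* Return the component of `v` named by 'x', 'y', 'z', or 'w'. */
static float component(Vec4 v, char name)
{
    switch (name) {
    case 'x': return v.x;
    case 'y': return v.y;
    case 'z': return v.z;
    case 'w': return v.w;
    }
    assert(0 && "invalid component name");
    return 0.0f;
}

/* Apply a swizzle suffix such as ".yzzx"; "" is treated as ".xyzw". */
static Vec4 swizzle(Vec4 src, const char *suffix)
{
    const char *pattern = (suffix[0] == '\0') ? "xyzw" : suffix + 1;
    Vec4 result;

    assert(strlen(pattern) == 4);
    result.x = component(src, pattern[0]);
    result.y = component(src, pattern[1]);
    result.z = component(src, pattern[2]);
    result.w = component(src, pattern[3]);
    return result;
}
```

    With a source of {2,8,9,0} and the suffix ".yzzx", this yields {8,9,9,2},
    matching the example in the text.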

    After operands are loaded, they are converted to a data type corresponding
    to the operation precision specified in the fragment program instruction.
 
    The following pseudo-code spells out the operand generation process.
    "SrcT" and "InstT" refer to the data types of the specified register or
    constant and the instruction, respectively.  "VecSrcT" and "VecInstT"
    refer to 4-component vectors of the corresponding type.  "absolute" is
    TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,
    and FALSE otherwise.  "negateBase" is TRUE if the <negate> rule in
    <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
    "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or
    <absVectorSrc> matches "-" and FALSE otherwise.  The ".c***", ".*c**",
    ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained
    by the swizzle operation.  TypeConvert() is assumed to convert a scalar of
    type SrcT to a scalar of type InstT using the type conversion process
    specified above.

      VecInstT VectorLoad(VecSrcT source)
      {
          VecSrcT srcVal;
          VecInstT convertedVal;

          srcVal.x = source.c***;
          srcVal.y = source.*c**;
          srcVal.z = source.**c*;
          srcVal.w = source.***c;
          if (negateBase) {
             srcVal.x = -srcVal.x;
             srcVal.y = -srcVal.y;
             srcVal.z = -srcVal.z;
             srcVal.w = -srcVal.w;
          }
          if (absolute) {
             srcVal.x = abs(srcVal.x);
             srcVal.y = abs(srcVal.y);
             srcVal.z = abs(srcVal.z);
             srcVal.w = abs(srcVal.w);
          }
          if (negateAbs) {
             srcVal.x = -srcVal.x;
             srcVal.y = -srcVal.y;
             srcVal.z = -srcVal.z;
             srcVal.w = -srcVal.w;
          }

          convertedVal.x = TypeConvert(srcVal.x);
          convertedVal.y = TypeConvert(srcVal.y);
          convertedVal.z = TypeConvert(srcVal.z);
          convertedVal.w = TypeConvert(srcVal.w);
          return convertedVal;
      }

      InstT ScalarLoad(VecSrcT source) 
      {
          SrcT srcVal;
          InstT convertedVal;

          srcVal = source.c***;
          if (negateBase) {
            srcVal = -srcVal;
          }
          if (absolute) {
             srcVal = abs(srcVal);
          }
          if (negateAbs) {
            srcVal = -srcVal;
          }

          convertedVal = TypeConvert(srcVal);
          return convertedVal;
      }


    Section 3.11.4.4, Fragment Program Destination Register Update

    Each fragment program instruction, except for KIL, writes a 4-component
    result vector to a single temporary or output register.  

    The four components of the result vector are first optionally clamped to
    the range [0,1].  The components will be clamped if and only if the result
    clamp suffix "_SAT" is present in the instruction name.  The instruction
    "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent
    instruction "ADD" will not.

    Since the instruction may be carried out at a different precision than the
    destination register, the components of the results vector are then
    converted to the data type corresponding to the destination register.

    Writes to individual components of the destination register are controlled
    by two sets of enables: individual component write masks specified as part
    of the instruction and the optional condition code mask.

    The component write mask is specified by the <optionalWriteMask> rule
    found in the <maskedDstReg> rule.  If the optional mask is "", all
    components are enabled.  Otherwise, the optional mask names the individual
    components to enable.  The characters "x", "y", "z", and "w" match the x,
    y, z, and w components respectively.  For example, an optional mask of
    ".xzw" indicates that the x, z, and w components should be enabled for
    writing but the y component should not.  The grammar requires that the
    destination register mask components must be listed in "xyzw" order.
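
    The write mask rules can be sketched as a small C parser.  This is
    illustrative only: the bit assignment (x=1, y=2, z=4, w=8) and the
    function name are assumptions, not part of the grammar.

```c
#include <stddef.h>

enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8, MASK_ALL = 15 };

/* Parse an optional write mask such as ".xzw".  Returns a bitmask of
 * enabled components, or -1 if the mask is malformed (unknown letter,
 * repeated letter, or letters not in "xyzw" order). */
static int parse_write_mask(const char *mask)
{
    static const char order[] = "xyzw";
    size_t next = 0;   /* first position in "xyzw" still allowed */
    int bits = 0;

    if (mask[0] == '\0')
        return MASK_ALL;               /* "" enables all components */
    if (mask[0] != '.' || mask[1] == '\0')
        return -1;
    for (const char *p = mask + 1; *p != '\0'; p++) {
        while (next < 4 && order[next] != *p)
            next++;
        if (next == 4)
            return -1;
        bits |= 1 << (int)next;
        next++;
    }
    return bits;
}
```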

    The optional condition code mask is specified by the <optionalCCMask> rule
    found in the <maskedDstReg> rule.  If <optionalCCMask> matches "", all
    components are enabled.  Otherwise, the condition code register is loaded
    and swizzled according to the swizzling specified by <swizzleSuffix>.
    Each component of the swizzled condition code is tested according to the
    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
    "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
    condition code field evaluates to equal, not equal, less than, greater
    than or equal, less than or equal, or greater than, respectively.
    Comparisons involving condition codes of "UN" (unordered) evaluate to true
    for "NE" and false otherwise.  For example, if the condition code is
    (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
    operation will load (EQ,LT,GT,GT) and the mask will thus enable
    writes on the y, z, and w components.  In addition, "TR" always enables
    writes and "FL" always disables writes, regardless of the condition code.
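
    The condition code mask evaluation can be sketched in C as follows.  The
    enum and function names are assumptions for this example, and the swizzle
    is passed as four component indices (0 through 3 for x through w) rather
    than parsed from a suffix.

```c
typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } CondCode;
typedef enum {
    RULE_EQ, RULE_NE, RULE_LT, RULE_GE, RULE_LE, RULE_GT, RULE_TR, RULE_FL
} MaskRule;

/* Test one condition code field; "UN" passes only "NE" (and "TR"). */
static int test_cc(MaskRule rule, CondCode field)
{
    switch (rule) {
    case RULE_EQ: return field == CC_EQ;
    case RULE_NE: return field != CC_EQ;
    case RULE_LT: return field == CC_LT;
    case RULE_GE: return field == CC_GT || field == CC_EQ;
    case RULE_LE: return field == CC_LT || field == CC_EQ;
    case RULE_GT: return field == CC_GT;
    case RULE_TR: return 1;
    case RULE_FL: return 0;
    }
    return 0;
}

/* Return a bitmask (x=1, y=2, z=4, w=8) of components whose writes are
 * enabled.  swz[i] selects the condition code field tested for
 * destination component i. */
static int cc_write_enables(const CondCode cc[4], MaskRule rule,
                            const int swz[4])
{
    int enables = 0;

    for (int i = 0; i < 4; i++)
        if (test_cc(rule, cc[swz[i]]))
            enables |= 1 << i;
    return enables;
}
```

    For the condition code (GT,LT,EQ,GT) and the mask "(NE.zyxw)", the
    swizzle indices are {2,1,0,3} and the y, z, and w bits are set.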

    Each component of the destination register is updated with the result of
    the fragment program if and only if the component is enabled for writes by
    both the component write mask and the optional condition code mask.
    Otherwise, the component of the destination register remains unchanged.

    A fragment program instruction can also optionally update the condition
    code register.  The condition code is updated if the condition code
    register update suffix "C" is present in the instruction name.  The
    instruction "ADDC" will update the condition code; the otherwise
    equivalent instruction "ADD" will not.  If condition code updates are
    enabled, each component of the destination register enabled for writes is
    compared to zero.  The corresponding component of the condition code is
    set to "LT", "EQ", or "GT", if the written component is less than, equal
    to, or greater than zero, respectively.  Condition code components are set
    to "UN" if the written component is NaN.  Note that values of -0.0 and
    +0.0 both evaluate to "EQ".  If a component of the destination register is
    not enabled for writes, the corresponding condition code component is
    unchanged.

    In the following example code,

        # R1=(-2, 0, 2, NaN)              R0                  CC
        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)

    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
    code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and
    "z" components of R0 and the condition code are updated, so R0 ends up
    with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In
    the third instruction, the condition code mask disables writes to the x
    component (its condition code field is "EQ"), so R0 ends up with
    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
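
    The three-instruction example can be traced with the following C sketch.
    It is a simplified model written for this illustration: the State type
    and function names are assumptions, only the MOVC instruction with an
    optional unswizzled "(NE)" mask is modeled, and precision conversion is
    ignored.

```c
#include <math.h>   /* NAN, isnan */

typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } CondCode;

typedef struct {
    float    r0[4];   /* destination register R0  */
    CondCode cc[4];   /* condition code register  */
} State;

static CondCode generate_cc(float v)
{
    if (isnan(v))  return CC_UN;
    if (v < 0.0f)  return CC_LT;
    if (v == 0.0f) return CC_EQ;
    return CC_GT;
}

/* Model of "MOVC R0<mask> [(NE)], src.<swz>".  Bit i of write_mask
 * enables component i; use_ne applies an unswizzled "(NE)" mask. */
static void movc(State *s, const float src[4], const int swz[4],
                 int write_mask, int use_ne)
{
    State old = *s;   /* mask tests read the condition code before update */

    for (int i = 0; i < 4; i++) {
        int cc_ok = !use_ne || old.cc[i] != CC_EQ;
        if (((write_mask >> i) & 1) && cc_ok) {
            s->r0[i] = src[swz[i]];
            s->cc[i] = generate_cc(s->r0[i]);
        }
    }
}

/* Run the three instructions from the example with R1 = (-2, 0, 2, NaN). */
static State run_example(void)
{
    const float r1[4]   = { -2.0f, 0.0f, 2.0f, NAN };
    const int   xyzw[4] = { 0, 1, 2, 3 };
    const int   yzwx[4] = { 1, 2, 3, 0 };
    const int   zywx[4] = { 2, 1, 3, 0 };
    State s = { { 0.0f, 0.0f, 0.0f, 0.0f },
                { CC_EQ, CC_EQ, CC_EQ, CC_EQ } };

    movc(&s, r1, xyzw, 0xF, 0);   /* MOVC R0, R1;            */
    movc(&s, r1, yzwx, 0x7, 0);   /* MOVC R0.xyz, R1.yzwx;   */
    movc(&s, r1, zywx, 0xF, 1);   /* MOVC R0 (NE), R1.zywx;  */
    return s;
}
```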

    The following pseudocode illustrates the process of writing a result
    vector to the destination register.  In the example, "ccMaskRule" refers
    to the condition code mask rule given by <ccMaskRule> (or "" if no rule is
    specified), "instrMask" refers to the component write mask given by the
    <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are
    enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.
    "destination" and "cc" refer to the register selected by <dstRegister> and
    the condition code, respectively.

      boolean TestCC(CondCode field) {
          switch (ccMaskRule) {
          case "EQ":  return (field == "EQ");
          case "NE":  return (field != "EQ");
          case "LT":  return (field == "LT");
          case "GE":  return (field == "GT" || field == "EQ");
          case "LE":  return (field == "LT" || field == "EQ");
          case "GT":  return (field == "GT");
          case "TR":  return TRUE;
          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }

      enum GenerateCC(DstT value) {
        if (isNaN(value)) {
          return UN;
        } else if (value < 0) {
          return LT;
        } else if (value == 0) {
          return EQ;
        } else {
          return GT;
        }
      }

      void UpdateDestination(VecDstT destination, VecInstT result)
      {
          // Temporaries for the converted result and the merged outputs.
          VecDstT resultDst;
          VecDstT merged;
          VecCC   mergedCC;

          // Clamp the result vector components to [0,1], if requested.
          if (clamp01) {
              if (result.x < 0)      result.x = 0;
              else if (result.x > 1) result.x = 1;
              if (result.y < 0)      result.y = 0;
              else if (result.y > 1) result.y = 1;
              if (result.z < 0)      result.z = 0;
              else if (result.z > 1) result.z = 1;
              if (result.w < 0)      result.w = 0;
              else if (result.w > 1) result.w = 1;
          }

          // Convert the result to the type of the destination register.
          resultDst.x = TypeConvert(result.x);
          resultDst.y = TypeConvert(result.y);
          resultDst.z = TypeConvert(result.z);
          resultDst.w = TypeConvert(result.w);

          // Merge the converted result into the destination register, under
          // control of the compile- and run-time write masks.
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = resultDst.x;
              if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = resultDst.y;
              if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = resultDst.z;
              if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = resultDst.w;
              if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }

    Section 3.11.5, Fragment Program Instruction Set

    The following sections describe the instruction set available to fragment
    programs.


    Section 3.11.5.1,  ADD:  Add

    The ADD instruction performs a component-wise add of the two operands to
    yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x + tmp1.x;
      result.y = tmp0.y + tmp1.y;
      result.z = tmp0.z + tmp1.z;
      result.w = tmp0.w + tmp1.w;

    The following special-case rules apply to addition:

      1. "A+B" is always equivalent to "B+A".
      2. NaN + <x> = NaN, for all <x>.
      3. +INF + <x> = +INF, for all <x> except NaN and -INF.
      4. -INF + <x> = -INF, for all <x> except NaN and +INF.
      5. +INF + -INF = NaN.
      6. -0.0 + <x> = <x>, for all <x>.
      7. +0.0 + <x> = <x>, for all <x> except -0.0.
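
    The special-case rules above match ordinary IEEE 754 addition, which the
    following C sketch can be used to confirm on a host machine.  It assumes
    the host "float" type follows IEEE 754 with the default rounding mode and
    that the compiler is not using relaxed floating-point optimizations; the
    helper names are hypothetical.

```c
#include <math.h>   /* INFINITY, NAN, isnan, signbit */

/* Plain host addition, standing in for one component of ADD. */
static float add(float a, float b)
{
    return a + b;
}

/* Compare two values, treating NaN as matching NaN and distinguishing
 * -0.0 from +0.0. */
static int same_value(float a, float b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b && !signbit(a) == !signbit(b);
}
```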


    Section 3.11.5.2,  COS:  Cosine

    The COS instruction approximates the cosine of the angle specified by the
    scalar operand and replicates the approximation to all four components of
    the result vector.  The angle is specified in radians and does not have to
    be in the range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxCosine(tmp);
      result.z = ApproxCosine(tmp);
      result.w = ApproxCosine(tmp);

    The approximation function ApproxCosine is accurate to at least 22 bits
    with an angle in the range [0,2*PI].

      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to cosine approximation:

      1. ApproxCosine(NaN) = NaN.
      2. ApproxCosine(+/-INF) = NaN.
      3. ApproxCosine(+/-0.0) = +1.0.


    Section 3.11.5.3,  DDX:  Derivative Relative to X

    The DDX instruction computes approximate partial derivatives of the four
    components of the single operand with respect to the X window coordinate
    to yield a result vector.  The partial derivative is evaluated at the
    center of the pixel.

      f = VectorLoad(op0);
      result = ComputePartialX(f);

    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.

    For components with partial derivatives that overflow (including +/-INF
    inputs), the resulting partials may be encoded as large floating-point
    numbers instead of +/-INF.


    Section 3.11.5.4,  DDY:  Derivative Relative to Y

    The DDY instruction computes approximate partial derivatives of the four
    components of the single operand with respect to the Y window coordinate
    to yield a result vector.  The partial derivative is evaluated at the
    center of the pixel.

      f = VectorLoad(op0);
      result = ComputePartialY(f);

    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.

    For components with partial derivatives that overflow (including +/-INF
    inputs), the resulting partials may be encoded as large floating-point
    numbers instead of +/-INF.


    Section 3.11.5.5,  DP3:  3-Component Dot Product

    The DP3 instruction computes a three component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);


    Section 3.11.5.6,  DP4:  4-Component Dot Product

    The DP4 instruction computes a four component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);


    Section 3.11.5.7,  DST:  Distance Vector

    The DST instruction computes a distance vector from two specially-
    formatted operands.  The first operand should be of the form [NA, d^2,
    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
    where NA values are not relevant to the calculation and d is a vector
    length.  If both vectors satisfy these conditions, the result vector will
    be of the form [1.0, d, d^2, 1/d].

    The exact behavior is specified in the following pseudo-code:

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = 1.0;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z;
      result.w = tmp1.w;

    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-fragment light attenuation
    calculations:  a DP3 operation involving the distance vector and an
    attenuation constants vector will yield the attenuation factor.
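
    The use of the distance vector can be sketched in C as follows.  The
    Vec4 type, the function names, and the layout of the constants vector as
    (k_c, k_l, k_q) are assumptions for this example.  In the standard OpenGL
    lighting model the sum k_c + k_l*d + k_q*d^2 is the denominator of the
    attenuation factor, so a program following that model would take the
    reciprocal of the dot product computed here.

```c
typedef struct { float x, y, z, w; } Vec4;

/* DST: (1, op0.y * op1.y, op0.z, op1.w), as in the pseudo-code above. */
static Vec4 dst(Vec4 op0, Vec4 op1)
{
    Vec4 result = { 1.0f, op0.y * op1.y, op0.z, op1.w };
    return result;
}

/* Three-component dot product (x, y, and z only). */
static float dp3(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Build the two specially-formatted operands from d^2 and 1/d, then dot
 * the distance vector (1, d, d^2, 1/d) with hypothetical constants
 * k = (k_c, k_l, k_q, unused). */
static float attenuation_sum(float d2, float inv_d, Vec4 k)
{
    Vec4 op0 = { 0.0f, d2,    d2,   0.0f  };   /* [NA, d^2, d^2, NA] */
    Vec4 op1 = { 0.0f, inv_d, 0.0f, inv_d };   /* [NA, 1/d, NA, 1/d] */

    return dp3(dst(op0, op1), k);
}
```

    For d = 4 (d^2 = 16, 1/d = 0.25), the distance vector is
    (1, 4, 16, 0.25).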


    Section 3.11.5.8,  EX2:  Exponential Base 2

    The EX2 instruction approximates 2 raised to the power of the scalar
    operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = Approx2ToX(tmp);
      result.y = Approx2ToX(tmp);
      result.z = Approx2ToX(tmp);
      result.w = Approx2ToX(tmp);

    The approximation function is accurate to at least 22 bits:

      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,

    and, in general,
   
      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).

    The following special-case rules apply to exponential approximation:

      1. Approx2ToX(NaN) = NaN.
      2. Approx2ToX(-INF) = +0.0.
      3. Approx2ToX(+INF) = +INF.
      4. Approx2ToX(+/-0.0) = +1.0.


    Section 3.11.5.9,  FLR:  Floor

    The FLR instruction performs a component-wise floor operation on the
    operand to generate a result vector.  The floor of a value is defined as
    the largest integer less than or equal to the value.  The floor of 2.3 is
    2.0; the floor of -3.6 is -4.0.

      tmp = VectorLoad(op0);
      result.x = floor(tmp.x);
      result.y = floor(tmp.y);
      result.z = floor(tmp.z);
      result.w = floor(tmp.w);

    The following special-case rules apply to floor computation:

      1. floor(NaN) = NaN.
      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
         sign of the result is equal to the sign of the operand.


    Section 3.11.5.10,  FRC:  Fraction

    The FRC instruction extracts the fractional portion of each component of
    the operand to generate a result vector.  The fractional portion of a
    component is defined as the result after subtracting off the floor of the
    component (see FLR), and is always in the range [0.00, 1.00).

    For negative values, the fractional portion is NOT the number written to
    the right of the decimal point -- the fractional portion of -1.7 is not
    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
    from -1.7.

      tmp = VectorLoad(op0);
      result.x = tmp.x - floor(tmp.x);
      result.y = tmp.y - floor(tmp.y);
      result.z = tmp.z - floor(tmp.z);
      result.w = tmp.w - floor(tmp.w);

    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:

      1. fraction(NaN) = NaN.
      2. fraction(+/-INF) = NaN.
      3. fraction(+/-0.0) = +0.0.
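
    The relationship between FLR and FRC for ordinary values can be sketched
    in C as follows.  The function names are assumptions, and the sketch only
    covers finite values small enough to fit in a "long"; the special cases
    listed above (NaN, +/-INF, and the sign of zero) are not modeled.

```c
/* Floor of a finite value that fits in a long: truncate toward zero,
 * then step down by one if truncation rounded up (negative inputs). */
static float flr(float v)
{
    float t = (float)(long)v;

    return (t > v) ? t - 1.0f : t;
}

/* Fraction: the value minus its floor, always in [0, 1). */
static float frc(float v)
{
    return v - flr(v);
}
```

    For -1.7, flr gives -2.0 and frc gives approximately 0.3, as described
    in the text.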


    Section 3.11.5.11,  KIL:  Conditionally Discard Fragment

    The KIL instruction is unlike any other instruction in the instruction
    set.  This instruction evaluates components of a swizzled condition code
    using a test expression identical to that used to evaluate condition code
    write masks (Section 3.11.4.4).  If any condition code component evaluates
    to TRUE, the fragment is discarded.  Otherwise, the instruction has no
    effect.  The condition code components are specified, swizzled, and
    evaluated in the same manner as the condition code write mask.

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
         // Discard the fragment.
      } else {
        // Do nothing.
      }

    If the fragment is discarded, it is treated as though it were not produced
    by rasterization.  In particular, none of the per-fragment operations
    (such as stencil tests, blends, stencil, depth, or color buffer writes)
    are performed on the fragment.


    Section 3.11.5.12,  LG2:  Logarithm Base 2

    The LG2 instruction approximates the base 2 logarithm of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxLog2(tmp);
      result.y = ApproxLog2(tmp);
      result.z = ApproxLog2(tmp);
      result.w = ApproxLog2(tmp);
   
    The approximation function is accurate to at least 22 bits:

      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.

    Note that for large values of x, there are not enough bits in the
    floating-point storage format to represent a result that precisely.

    The following special-case rules apply to logarithm approximation:

      1. ApproxLog2(NaN) = NaN.
      2. ApproxLog2(+INF) = +INF.
      3. ApproxLog2(+/-0.0) = -INF.
      4. ApproxLog2(x) = NaN, -INF < x < -0.0.
      5. ApproxLog2(-INF) = NaN.


    Section 3.11.5.13,  LIT:  Compute Light Coefficients

    The LIT instruction accelerates per-fragment lighting by computing
    lighting coefficients for ambient, diffuse, and specular light
    contributions.  The "x" component of the operand is assumed to hold a
    diffuse dot product (n dot VP_pli, as in the vertex lighting equations in
    Section 2.13.1).  The "y" component of the operand is assumed to hold a
    specular dot product (n dot h_i).  The "w" component of the operand is
    assumed to hold the specular exponent of the material (s_rm).

    The "x" component of the result vector receives the value that should be
    multiplied by the ambient light/material product (always 1.0).  The "y"
    component of the result vector receives the value that should be
    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
    component of the result vector receives the value that should be
    multiplied by the specular light/material product (f_i * (n dot h_i) ^
    s_rm).  The "w" component of the result is the constant 1.0.

    Negative diffuse and specular dot products are clamped to 0.0, as is done
    in the standard per-vertex lighting operations.  In addition, if the
    diffuse dot product is zero or negative, the specular coefficient is
    forced to zero.

      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;

    The exponentiation approximation used to compute result.z is identical to
    that used in the POW instruction, including errors and the processing of
    any special cases.


    Section 3.11.5.14,  LRP:  Linear Interpolation

    The LRP instruction performs a component-wise linear interpolation to
    yield a result vector.  It interpolates between the components of the
    second and third operands, using the first operand as a weight.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
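
    As an illustration of the operand order, the following C sketch
    (hypothetical names vec4, lrp1, and lrp) applies the same formula.  A
    weight of 1.0 selects the second operand and a weight of 0.0 selects the
    third.

```c
typedef struct { float x, y, z, w; } vec4;

/* Scalar form of the LRP formula: weight * a + (1 - weight) * b. */
float lrp1(float weight, float a, float b)
{
    return weight * a + (1.0f - weight) * b;
}

/* Component-wise form; each component uses its own weight. */
vec4 lrp(vec4 weight, vec4 a, vec4 b)
{
    vec4 r;
    r.x = lrp1(weight.x, a.x, b.x);
    r.y = lrp1(weight.y, a.y, b.y);
    r.z = lrp1(weight.z, a.z, b.z);
    r.w = lrp1(weight.w, a.w, b.w);
    return r;
}
```

    For example, lrp1(0.25, 4.0, 8.0) evaluates to 0.25 * 4 + 0.75 * 8 = 7.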


    Section 3.11.5.15,  MAD:  Multiply and Add

    The MAD instruction performs a component-wise multiply of the first two
    operands, and then does a component-wise add of the product to the third
    operand to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + tmp2.x;
      result.y = tmp0.y * tmp1.y + tmp2.y;
      result.z = tmp0.z * tmp1.z + tmp2.z;
      result.w = tmp0.w * tmp1.w + tmp2.w;


    Section 3.11.5.16,  MAX:  Maximum

    The MAX instruction computes component-wise maximums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = max(tmp0.x, tmp1.x);
      result.y = max(tmp0.y, tmp1.y);
      result.z = max(tmp0.z, tmp1.z);
      result.w = max(tmp0.w, tmp1.w);

    The following special cases apply to the maximum operation:

      1. max(A,B) is always equivalent to max(B,A).
      2. max(NaN, <x>) == NaN, for all <x>.
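
    The NaN rule matters when emulating MAX on a CPU.  The C library's
    fmaxf() returns the non-NaN operand when exactly one operand is NaN, so
    it does not match rule 2.  A minimal sketch that does (fp_max is a
    hypothetical name):

```c
#include <math.h>

/* max() with the NaN rule from the text: any NaN operand yields NaN. */
float fp_max(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    return (a > b) ? a : b;
}
```

    Both fp_max(NAN, 1.0f) and fp_max(1.0f, NAN) return NaN here, which also
    preserves the symmetry required by rule 1.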

    

    Section 3.11.5.17,  MIN:  Minimum

    The MIN instruction computes component-wise minimums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = min(tmp0.x, tmp1.x);
      result.y = min(tmp0.y, tmp1.y);
      result.z = min(tmp0.z, tmp1.z);
      result.w = min(tmp0.w, tmp1.w);

    The following special cases apply to the minimum operation:

      1. min(A,B) is always equivalent to min(B,A).
      2. min(NaN, <x>) == NaN, for all <x>.


    Section 3.11.5.18,  MOV:  Move

    The MOV instruction copies the value of the operand to yield a result
    vector.

      result = VectorLoad(op0);


    Section 3.11.5.19,  MUL:  Multiply

    The MUL instruction performs a component-wise multiply of the two operands
    to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x * tmp1.x;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z * tmp1.z;
      result.w = tmp0.w * tmp1.w;

    The following special-case rules apply to multiplication:

      1. "A*B" is always equivalent to "B*A".
      2. NaN * <x> = NaN, for all <x>.
      3. +/-0.0 * +/-INF = NaN.
      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN.  The
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN.  The 
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      6. +1.0 * <x> = <x>, for all <x>.


    Section 3.11.5.20,  PK2H:  Pack Two 16-bit Floats

    The PK2H instruction converts the "x" and "y" components of the single
    operand into 16-bit floating-point format, packs the bit representation of
    these two floats into a 32-bit value, and replicates that value to all
    four components of the result vector.  The PK2H instruction can be
    reversed by the UP2H instruction below.

      tmp0 = VectorLoad(op0);
      /* result obtained by combining raw bits of tmp0.x and tmp0.y after
         conversion to 16-bit floating-point format */
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.21,  PK2US:  Pack Two Unsigned 16-bit Scalars

    The PK2US instruction converts the "x" and "y" components of the single
    operand into a packed pair of 16-bit unsigned scalars.  The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the two converted
    components are packed into a 32-bit value, and that value is replicated to
    all four components of the result vector.  The PK2US instruction can be
    reversed by the UP2US instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
      us.y = round(65535.0 * tmp0.y);
      /* result obtained by combining raw bits of us. */
      result.x = ((us.x) | (us.y << 16));
      result.y = ((us.x) | (us.y << 16));
      result.z = ((us.x) | (us.y << 16));
      result.w = ((us.x) | (us.y << 16));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
    any other register type is specified.
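
    The packing above can be sketched in C as follows.  The names clamp01 and
    pack_2us are hypothetical.  Two choices here are assumptions of the
    sketch rather than statements of this specification: round() is taken to
    mean round-to-nearest (implemented by adding 0.5 and truncating), and a
    NaN input is mapped to 0.0.

```c
#include <stdint.h>

/* Clamp to [0,1].  Mapping NaN to 0.0 is an assumption of this sketch. */
float clamp01(float v)
{
    if (!(v > 0.0f)) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Pack x into bits 0..15 and y into bits 16..31. */
uint32_t pack_2us(float x, float y)
{
    /* Adding 0.5 and truncating rounds to nearest for non-negative values. */
    uint32_t ux = (uint32_t)(65535.0f * clamp01(x) + 0.5f);
    uint32_t uy = (uint32_t)(65535.0f * clamp01(y) + 0.5f);
    return ux | (uy << 16);
}
```

    For example, pack_2us(0.0f, 1.0f) produces 0xFFFF0000, and out-of-range
    inputs such as (-5.0, 7.0) clamp to the same bit pattern.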


    Section 3.11.5.22,  PK4B:  Pack Four Signed 8-bit Scalars

    The PK4B instruction converts the four components of the single operand
    into 8-bit signed quantities.  The signed quantities are represented in a
    bit pattern where all '0' bits correspond to -128/127 and all '1' bits
    correspond to +127/127.  The bit representations of the four converted
    components are packed into a 32-bit value, and that value is replicated to
    all four components of the result vector.  The PK4B instruction can be
    reversed by the UP4B instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < -128/127) tmp0.x = -128/127;
      if (tmp0.y < -128/127) tmp0.y = -128/127;
      if (tmp0.z < -128/127) tmp0.z = -128/127;
      if (tmp0.w < -128/127) tmp0.w = -128/127;
      if (tmp0.x > +127/127) tmp0.x = +127/127;
      if (tmp0.y > +127/127) tmp0.y = +127/127;
      if (tmp0.z > +127/127) tmp0.z = +127/127;
      if (tmp0.w > +127/127) tmp0.w = +127/127;
      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
      ub.y = round(127.0 * tmp0.y + 128.0);
      ub.z = round(127.0 * tmp0.z + 128.0);
      ub.w = round(127.0 * tmp0.w + 128.0);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
    any other register type is specified.
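
    A C sketch of this encoding is shown below (encode_b and pack_4b are
    hypothetical names).  It clamps the scaled value to [0,255], which is
    equivalent to clamping the input to [-128/127, +127/127].  Rounding to
    nearest and mapping NaN to code 0 are assumptions of the sketch.

```c
#include <stdint.h>

/* Code 0 encodes -128/127 and code 255 encodes +127/127. */
uint32_t encode_b(float v)
{
    float scaled = 127.0f * v + 128.0f;
    if (!(scaled > 0.0f)) return 0u;     /* also maps NaN to 0 (assumption) */
    if (scaled > 255.0f) return 255u;
    return (uint32_t)(scaled + 0.5f);    /* round to nearest */
}

/* x occupies the least significant byte, w the most significant. */
uint32_t pack_4b(float x, float y, float z, float w)
{
    return encode_b(x) | (encode_b(y) << 8) |
           (encode_b(z) << 16) | (encode_b(w) << 24);
}
```

    With this encoding, 0.0 maps to code 128, +1.0 to code 255, and -1.0 to
    code 1, so pack_4b(-2.0f, 0.0f, 1.0f, -1.0f) produces 0x01FF8000.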


    Section 3.11.5.23,  PK4UB:  Pack Four Unsigned 8-bit Scalars

    The PK4UB instruction converts the four components of the single operand
    into a packed grouping of 8-bit unsigned scalars.  The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the four
    converted components are packed into a 32-bit value, and that value is
    replicated to all four components of the result vector.  The PK4UB
    instruction can be reversed by the UP4UB instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      if (tmp0.z < 0.0) tmp0.z = 0.0;
      if (tmp0.z > 1.0) tmp0.z = 1.0;
      if (tmp0.w < 0.0) tmp0.w = 0.0;
      if (tmp0.w > 1.0) tmp0.w = 1.0;
      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
      ub.y = round(255.0 * tmp0.y);
      ub.z = round(255.0 * tmp0.z);
      ub.w = round(255.0 * tmp0.w);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.24,  POW:  Exponentiation

    The POW instruction approximates the value of the first scalar operand
    raised to the power of the second scalar operand and replicates it to all
    four components of the result vector.

      tmp0 = ScalarLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = ApproxPower(tmp0, tmp1);
      result.y = ApproxPower(tmp0, tmp1);
      result.z = ApproxPower(tmp0, tmp1);
      result.w = ApproxPower(tmp0, tmp1);
   
    The exponentiation approximation function is defined in terms of the base
    2 exponentiation and logarithm approximation operations in the EX2 and LG2
    instructions, including errors and the processing of any special cases.
    In particular,

      ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).

    The following special-case rules, which can be derived from the rules in
    the LG2, MUL, and EX2 instructions, apply to exponentiation:

      1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
      2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.
      3. ApproxPower(+/-0.0, +/-0.0) = NaN.
      4. ApproxPower(+INF, +/-0.0) = NaN.
      5. ApproxPower(+1.0, +/-INF) = NaN.
      6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.
      7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.
      8. ApproxPower(+1.0, <x>)   = +1.0, if -INF < x < +INF.
      9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.
      10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
      11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.
      12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.
      13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                   +INF, if x > +1.0.
      14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                   +0.0, if x > +1.0.

    Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and
    0*(-INF) = NaN.  In many other applications, including the standard C
    pow() function, 0^0 is defined as 1.0.  This behavior can be emulated
    using additional instructions in much the same way that the pow()
    function is implemented on many CPUs.

    Note that a logarithm is involved even if the exponent is an integer.
    This means that any exponentiation with a negative base will produce NaN.
    In contrast, it is possible in a "normal" mathematical formulation to
    raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
    (-0.5)^-2 == 4).


    Section 3.11.5.25,  RCP:  Reciprocal

    The RCP instruction approximates the reciprocal of the scalar operand and
    replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxReciprocal(tmp);
      result.y = ApproxReciprocal(tmp);
      result.z = ApproxReciprocal(tmp);
      result.w = ApproxReciprocal(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.

    The following special-case rules apply to reciprocation:

      1. ApproxReciprocal(NaN) = NaN.
      2. ApproxReciprocal(+INF) = +0.0.
      3. ApproxReciprocal(-INF) = -0.0.
      4. ApproxReciprocal(+0.0) = +INF.
      5. ApproxReciprocal(-0.0) = -INF.


    Section 3.11.5.26,  RFL:  Reflection Vector

    The RFL instruction computes the reflection of the second vector operand
    (the "direction" vector) about the vector specified by the first vector
    operand (the "axis" vector).  Both operands are treated as 3D vectors (the
    w components are ignored).  The result vector is another 3D vector (the
    "reflected direction" vector).  The length of the result vector, ignoring
    rounding errors, should equal that of the second operand.

      axis = VectorLoad(op0);
      direction = VectorLoad(op1);
      tmp.w = (axis.x * axis.x + axis.y * axis.y +
               axis.z * axis.z);
      tmp.x = (axis.x * direction.x + axis.y * direction.y + 
               axis.z * direction.z);
      tmp.x = 2.0 * tmp.x;
      tmp.x = tmp.x / tmp.w;
      result.x = tmp.x * axis.x - direction.x;
      result.y = tmp.x * axis.y - direction.y;
      result.z = tmp.x * axis.z - direction.z;

    A fragment program will fail to load if the w component of the result is
    enabled in the component write mask (see the <optionalWriteMask> rule in
    the grammar).
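
    The reflection formula can be sketched in C as follows (vec3, dot3, and
    reflect_about are hypothetical names).  Because the computation divides
    by the squared length of the axis, the axis does not need to be
    normalized.

```c
typedef struct { float x, y, z; } vec3;

float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* result = 2 * (axis . direction) / (axis . axis) * axis - direction */
vec3 reflect_about(vec3 axis, vec3 direction)
{
    float scale = 2.0f * dot3(axis, direction) / dot3(axis, axis);
    vec3 r;
    r.x = scale * axis.x - direction.x;
    r.y = scale * axis.y - direction.y;
    r.z = scale * axis.z - direction.z;
    return r;
}
```

    Reflecting the direction (1, 1, 0) about the unnormalized axis (0, 2, 0)
    gives (-1, 1, 0), which has the same length as the input direction.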


    Section 3.11.5.27,  RSQ:  Reciprocal Square Root

    The RSQ instruction approximates the reciprocal of the square root of the
    scalar operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxRSQRT(tmp);
      result.y = ApproxRSQRT(tmp);
      result.z = ApproxRSQRT(tmp);
      result.w = ApproxRSQRT(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

    The following special-case rules apply to reciprocal square roots:

      1. ApproxRSQRT(NaN) = NaN.
      2. ApproxRSQRT(+INF) = +0.0.
      3. ApproxRSQRT(-INF) = NaN.
      4. ApproxRSQRT(+0.0) = +INF.
      5. ApproxRSQRT(-0.0) = -INF.
      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.


    Section 3.11.5.28,  SEQ:  Set on Equal To

    The SEQ instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SEQ:

      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.


    Section 3.11.5.29,  SFL:  Set on False

    The SFL instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to
    0.0.

      result.x = 0.0;
      result.y = 0.0;
      result.z = 0.0;
      result.w = 0.0;


    Section 3.11.5.30,  SGE:  Set on Greater Than or Equal

    The SGE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SGE:

      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.


    Section 3.11.5.31,  SGT:  Set on Greater Than

    The SGT instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than that of the second, and
    0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SGT:

      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.


    Section 3.11.5.32,  SIN:  Sine

    The SIN instruction approximates the sine of the angle specified by the
    scalar operand and replicates it to all four components of the result
    vector.  The angle is specified in radians and does not have to be in the
    range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxSine(tmp);
      result.y = ApproxSine(tmp);
      result.z = ApproxSine(tmp);
      result.w = ApproxSine(tmp);

    The approximation function is accurate to at least 22 bits with an angle
    in the range [0,2*PI].

      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to sine approximation:

      1. ApproxSine(NaN) = NaN.
      2. ApproxSine(+/-INF) = NaN.
      3. ApproxSine(+/-0.0) = +/-0.0.  The sign of the result is equal to the
         sign of the single operand.


    Section 3.11.5.33,  SLE:  Set on Less Than or Equal

    The SLE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SLE:

      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.


    Section 3.11.5.34,  SLT:  Set on Less Than

    The SLT instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SLT:

      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.


    Section 3.11.5.35,  SNE:  Set on Not Equal

    The SNE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is not equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SNE:

      1. (<x> != <y>) and (<y> != <x>) always produce the same result.
      2. (NaN != <x>) is TRUE for all <x>, including NaN.
      3. (+INF != +INF) and (-INF != -INF) are FALSE.
      4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.


    Section 3.11.5.36,  STR:  Set on True

    The STR instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to 1.0.

      result.x = 1.0;
      result.y = 1.0;
      result.z = 1.0;
      result.w = 1.0;


    Section 3.11.5.37,  SUB:  Subtract

    The SUB instruction performs a component-wise subtraction of the second
    operand from the first to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x - tmp1.x;
      result.y = tmp0.y - tmp1.y;
      result.z = tmp0.z - tmp1.z;
      result.w = tmp0.w - tmp1.w;

    The SUB instruction is completely equivalent to an identical ADD
    instruction in which the negate operator on the second operand is
    reversed:

      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".


    Section 3.11.5.38,  TEX: Texture Lookup

    The TEX instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
    
    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
    components of the single operand.

    The texture lookup is performed as specified in Section 3.8.  The LOD
    calculations in Section 3.8.5 are performed using an
    implementation-dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy,
    dr/dx, and dr/dy.
    The mapping of filtered texture components to the components of the result
    vector is dependent on the base internal format of the texture and is
    specified in Table X.5.

                                 Result Vector Components
      Base Internal Format        X      Y      Z      W
      --------------------      -----  -----  -----  -----
      ALPHA                      0.0    0.0    0.0    At
      LUMINANCE                  Lt     Lt     Lt     1.0
      LUMINANCE_ALPHA            Lt     Lt     Lt     At
      INTENSITY                  It     It     It     It
      RGB                        Rt     Gt     Bt     1.0
      RGBA                       Rt     Gt     Bt     At
      HILO_NV (signed)           HIt    LOt    HEMI   1.0
      HILO_NV (unsigned)         HIt    LOt    1.0    1.0
      DSDT_NV                    DSt    DTt    0.0    1.0
      DSDT_MAG_NV                DSt    DTt    MAGt   1.0
      DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It
      FLOAT_R_NV                 Rt     0.0    0.0    1.0
      FLOAT_RG_NV                Rt     Gt     0.0    1.0
      FLOAT_RGB_NV               Rt     Gt     Bt     1.0
      FLOAT_RGBA_NV              Rt     Gt     Bt     At
      
      Table X.5:  Mapping of filtered texel components to result vector
      components for the TEX instruction.  0.0 and 1.0 indicate that the
      corresponding constant value is written to the result vector.
      DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,
      as specified in the texture's depth texture mode.

      For HILO_NV textures with signed components, "HEMI" is defined as
      sqrt(MAX(0, 1-(HIt^2+LOt^2))).
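
    As an illustration of Table X.5, the C sketch below maps a filtered texel
    to a result vector for the six base internal formats of unextended
    OpenGL.  The type and enumerant names (vec4, texel, base_format, FMT_*)
    and the function tex_result are hypothetical and exist only for this
    example; the remaining rows of the table would follow the same pattern.

```c
typedef struct { float x, y, z, w; } vec4;

/* Filtered texel; only the fields relevant to the format are meaningful. */
typedef struct { float r, g, b, a, l, i; } texel;

typedef enum {
    FMT_ALPHA, FMT_LUMINANCE, FMT_LUMINANCE_ALPHA,
    FMT_INTENSITY, FMT_RGB, FMT_RGBA
} base_format;

vec4 tex_result(base_format fmt, texel t)
{
    vec4 v = { 0.0f, 0.0f, 0.0f, 0.0f };
    switch (fmt) {
    case FMT_ALPHA:           v = (vec4){ 0.0f, 0.0f, 0.0f, t.a };  break;
    case FMT_LUMINANCE:       v = (vec4){ t.l,  t.l,  t.l,  1.0f }; break;
    case FMT_LUMINANCE_ALPHA: v = (vec4){ t.l,  t.l,  t.l,  t.a };  break;
    case FMT_INTENSITY:       v = (vec4){ t.i,  t.i,  t.i,  t.i };  break;
    case FMT_RGB:             v = (vec4){ t.r,  t.g,  t.b,  1.0f }; break;
    case FMT_RGBA:            v = (vec4){ t.r,  t.g,  t.b,  t.a };  break;
    }
    return v;
}
```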

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL.  If the specified texture target has a consistent set of images, a
    lookup is performed.  Otherwise, the result of the instruction is the
    vector (0,0,0,0).

    Although this instruction allows the selection of any texture target, a
    fragment program cannot use more than one texture target for any given
    texture image unit.
      

    Section 3.11.5.39,  TXD: Texture Lookup with Derivatives

    The TXD instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
    
    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
    components of the first operand.  The partial derivatives in the X
    direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z
    components of the second operand.  The partial derivatives in the Y
    direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z
    components of the third operand.

    The texture lookup is performed as specified in Section 3.8.  The LOD
    calculations in Section 3.8.5 are performed using the specified partial
    derivatives.  The mapping of filtered texture components to the components
    of the result vector is dependent on the base internal format of the
    texture and is specified in Table X.5.

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL.  If the specified texture target has a consistent set of images, a
    lookup is performed.  Otherwise, the result of the instruction is the
    vector (0,0,0,0).
      
    Although this instruction allows the selection of any texture target, a
    fragment program cannot use more than one texture target for any given
    texture image unit.
      

    Section 3.11.5.40,  TXP: Projective Texture Lookup

    The TXP instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

    For cube map textures, the (s,t,r) texture coordinates used for the lookup
    are given by x, y, and z, respectively.  For all other textures, the
    (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and
    z/w, respectively, where x, y, z, and w are the corresponding components
    of the operand.
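
    The coordinate selection can be sketched in C as follows (vec4, texcoord,
    and txp_coords are hypothetical names, and the is_cube_map flag stands in
    for the <texImageTarget> check).

```c
typedef struct { float x, y, z, w; } vec4;
typedef struct { float s, t, r; } texcoord;

/* Cube maps use (x,y,z) directly; other targets divide by w. */
texcoord txp_coords(vec4 op, int is_cube_map)
{
    texcoord c;
    if (is_cube_map) {
        c.s = op.x;
        c.t = op.y;
        c.r = op.z;
    } else {
        c.s = op.x / op.w;
        c.t = op.y / op.w;
        c.r = op.z / op.w;
    }
    return c;
}
```

    For an operand of (2, 4, 6, 2), a non-cube-map lookup uses (1, 2, 3),
    while a cube map lookup uses (2, 4, 6).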

    The texture lookup is performed as specified in Section 3.8.  The LOD
    calculations in Section 3.8.5 are performed using an
    implementation-dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy,
    dr/dx, and dr/dy.
    The mapping of filtered texture components to the components of the result
    vector is dependent on the base internal format of the texture and is
    specified in Table X.5.

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL.  If the specified texture target has a consistent set of images, a
    lookup is performed.  Otherwise, the result of the instruction is the
    vector (0,0,0,0).
      
    Although this instruction allows the selection of any texture target, a
    fragment program cannot use more than one texture target for any given
    texture image unit.
      

    Section 3.11.5.41,  UP2H:  Unpack Two 16-Bit Floats

    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
    scalar operand.  The first 16-bit float (stored in the 16 least
    significant bits) is written into the "x" and "z" components of the result
    vector; the second is written into the "y" and "w" components of the
    result vector.

    This operation undoes the type conversion and packing performed by the
    PK2H instruction.

      tmp = ScalarLoad(op0);
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
    
    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.
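
    A C sketch of the unpacking is shown below.  It assumes that the 16-bit
    floating-point format uses the common layout of 1 sign bit, 5 exponent
    bits with a bias of 15, and 10 mantissa bits, including denormals, INF,
    and NaN; the definition of the format elsewhere in this specification is
    authoritative.  The names vec4, half_to_float, and up2h are hypothetical,
    and the operand is passed as its raw 32-bit pattern.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } vec4;

/* Expand a 16-bit float bit pattern (assumed 1-5-10 layout) to a float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exponent == 0x1Fu) {            /* INF or NaN */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {         /* normalized: rebias 15 -> 127 */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa == 0) {         /* signed zero */
        bits = sign;
    } else {                            /* denormal: shift until normalized */
        uint32_t e = 113u;
        while ((mantissa & 0x400u) == 0) {
            mantissa <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mantissa & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Low 16 bits go to x and z; high 16 bits go to y and w. */
vec4 up2h(uint32_t packed)
{
    float lo = half_to_float((uint16_t)(packed & 0xFFFFu));
    float hi = half_to_float((uint16_t)(packed >> 16));
    vec4 r = { lo, hi, lo, hi };
    return r;
}
```

    Under these assumptions, up2h(0xC0003C00) yields (1.0, -2.0, 1.0, -2.0).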


    Section 3.11.5.42,  UP2US:  Unpack Two Unsigned 16-Bit Scalars

    The UP2US instruction unpacks two 16-bit unsigned values packed together
    in a 32-bit scalar operand.  The unsigned quantities are encoded where a
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
    bits corresponds to 1.0.  The "x" and "z" components of the result vector
    are obtained from the 16 least significant bits of the operand; the "y"
    and "w" components are obtained from the 16 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK2US instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.


    Section 3.11.5.43,  UP4B:  Unpack Four Signed 8-Bit Values

    The UP4B instruction unpacks four 8-bit signed values packed together in a
    32-bit scalar operand.  The signed quantities are encoded where a bit
    pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'
    bits corresponds to +127/127.  The "x" component of the result vector is
    the converted value corresponding to the 8 least significant bits of the
    operand; the "w" component corresponds to the 8 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK4B instruction.

      tmp = ScalarLoad(op0);
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.
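
    The decoding can be sketched in C as follows (vec4, decode_b, and up4b
    are hypothetical names; the operand is passed as its raw 32-bit pattern).

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;

/* Code 0 decodes to -128/127, code 128 to 0.0, and code 255 to +1.0. */
float decode_b(uint32_t code)
{
    return ((float)code - 128.0f) / 127.0f;
}

/* x comes from the least significant byte, w from the most significant. */
vec4 up4b(uint32_t packed)
{
    vec4 r;
    r.x = decode_b((packed >> 0)  & 0xFFu);
    r.y = decode_b((packed >> 8)  & 0xFFu);
    r.z = decode_b((packed >> 16) & 0xFFu);
    r.w = decode_b((packed >> 24) & 0xFFu);
    return r;
}
```

    For example, up4b(0x01FF8000) yields (-128/127, 0.0, 1.0, -1.0).  Note
    that the smallest code decodes to a value slightly below -1.0.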


    Section 3.11.5.44,  UP4UB:  Unpack Four Unsigned 8-Bit Scalars

    The UP4UB instruction unpacks four 8-bit unsigned values packed together
    in a 32-bit scalar operand.  The unsigned quantities are encoded where a
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
    bits corresponds to 1.0.  The "x" component of the result vector is
    obtained from the 8 least significant bits of the operand; the "w"
    component is obtained from the 8 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK4UB instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.
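
    As a rough C illustration of the unsigned case, the sketch below unpacks
    a 32-bit word into four values in [0,1].  The function name up4ub is
    hypothetical, and the operand's raw bits are assumed to be supplied as a
    uint32_t.

```c
#include <stdint.h>

/* Hypothetical model of UP4UB: result[0..3] hold x, y, z, w. */
static void up4ub(uint32_t raw_bits, float result[4])
{
    unsigned i;
    for (i = 0; i < 4; i++) {
        /* Component i comes from bits 8*i .. 8*i+7 of the operand. */
        uint32_t field = (raw_bits >> (8u * i)) & 0xFFu;
        result[i] = (float)field / 255.0f;
    }
}
```

    With this layout, a word holding a color packed as A:B:G:R from most to
    least significant byte unpacks to (R, G, B, A) in (x, y, z, w).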


    Section 3.11.5.45,  X2D:  2D Coordinate Transformation

    The X2D instruction multiplies the 2D offset vector specified by the "x"
    and "y" components of the second vector operand by the 2x2 matrix
    specified by the four components of the third vector operand, and adds the
    transformed offset vector to the 2D vector specified by the "x" and "y"
    components of the first vector operand.  The first component of the sum is
    written to the "x" and "z" components of the result; the second component
    is written to the "y" and "w" components of the result.

    The X2D instruction can be used to displace texture coordinates in the
    same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader
    extension.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
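
    The same computation can be written as a small C function, shown here
    only to make the matrix layout explicit.  The name x2d and the
    representation of each operand as a float[4] holding {x, y, z, w} are
    assumptions of this sketch.

```c
/* Hypothetical model of X2D.  op2 holds the 2x2 matrix as
 * {m00, m01, m10, m11} in its x, y, z, w components. */
static void x2d(const float op0[4], const float op1[4], const float op2[4],
                float result[4])
{
    /* Transform the 2D offset (op1.x, op1.y) and add the base (op0.x, op0.y). */
    float sx = op0[0] + op1[0] * op2[0] + op1[1] * op2[1];
    float sy = op0[1] + op1[0] * op2[2] + op1[1] * op2[3];

    result[0] = sx;  /* x */
    result[1] = sy;  /* y */
    result[2] = sx;  /* z replicates x */
    result[3] = sy;  /* w replicates y */
}
```

    With an identity matrix in op2, the result is simply the sum of the two
    2D vectors, replicated into (z, w).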


    Section 3.11.6, Fragment Program Outputs

    Upon completion of fragment program execution, the output registers are
    used to replace the fragment's associated data.

    The RGBA color of the fragment is taken from the color output register
    used by the program (COLR or COLH).  The R, G, B, and A color components
    are extracted from the "x", "y", "z", and "w" components, respectively, of
    the output register and are clamped to the range [0,1].

    If the DEPR output register is written by the fragment program, the depth
    value of the fragment is taken from the z component of the DEPR output
    register.  If depth clamping is enabled, the depth value is clamped to the
    range [min(n,f), max(n,f)], where n and f are the near and far depth range
    values.  If depth clamping is disabled, the fragment is discarded if its
    depth value is outside the range [min(n,f), max(n,f)].
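
    The output rules above can be modeled in C as follows.  This is a sketch
    under stated assumptions: the function names fp_resolve_color and
    fp_resolve_depth are hypothetical, the depth function applies only when
    the program has written DEPR, and NaN inputs are not considered.

```c
#include <stdbool.h>

/* Clamp the color output register (x, y, z, w -> R, G, B, A) to [0,1]. */
static void fp_resolve_color(const float out_reg[4], float rgba[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        float v = out_reg[i];
        rgba[i] = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    }
}

/* Apply the depth range rule to the z component of DEPR.  n and f are the
 * near and far depth range values.  Returns false if the fragment is
 * discarded; otherwise stores the final depth in *depth. */
static bool fp_resolve_depth(float depr_z, float n, float f,
                             bool depth_clamp_enabled, float *depth)
{
    float lo = n < f ? n : f;
    float hi = n < f ? f : n;

    if (depr_z < lo || depr_z > hi) {
        if (!depth_clamp_enabled)
            return false;                 /* out of range: discard */
        depr_z = depr_z < lo ? lo : hi;   /* out of range: clamp */
    }
    *depth = depr_z;
    return true;
}
```

    Taking min and max of n and f keeps the model correct for a reversed
    depth range, where n is greater than f.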


    Section 3.11.7, Required Fragment Program State

    The state required for managing fragment programs consists of:

      a bit indicating whether or not fragment program mode is enabled;

      an unsigned integer naming the currently bound fragment program;

      and the state that must be maintained to indicate which integers are
      currently in use as fragment program names.

    Fragment program mode is initially disabled.  The initial state of all 128
    fragment program parameter registers is (0,0,0,0).  The initial currently
    bound fragment program is zero.

    Each fragment program object consists of:

      an enumerant giving the program target (FRAGMENT_PROGRAM_NV);

      a boolean indicating whether the program is resident;

      an array of type ubyte containing the program string;

      an integer representing the length of the program string array;

      one four-component floating-point vector for each named local
      parameter in the program;

      and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component
      floating-point vectors to hold numbered local parameters, each initially
      set to (0,0,0,0).

    Initially, no program objects exist.

    Additionally, the state required during the execution of a fragment
    program consists of:  twelve 4-component floating-point fragment attribute
    registers, thirty-two 128-bit physical temporary registers, and a single
    4-component condition code, whose components have one of four values (LT,
    EQ, GT, or UN).

    Each time a fragment program is executed, the fragment attribute registers
    are initialized with the fragment's location and associated data, all
    temporary register components are initialized to zero, and all condition
    code components are initialized to EQ.
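
    The per-execution state and its initialization can be pictured with the
    following C sketch.  The type and function names are hypothetical, and
    modeling each 128-bit temporary register as four 32-bit floats is a
    simplifying assumption of this example.

```c
/* Hypothetical condition code values. */
typedef enum { FP_CC_LT, FP_CC_EQ, FP_CC_GT, FP_CC_UN } fp_cond;

/* Hypothetical model of the state used while one fragment program runs. */
typedef struct {
    float   attrib[12][4];  /* twelve fragment attribute registers */
    float   temp[32][4];    /* thirty-two temporaries, modeled as 4 floats */
    fp_cond cc[4];          /* one condition code per component */
} fp_exec_state;

/* Reset the state before executing the program for one fragment.
 * frag_data holds 12 * 4 floats: the fragment's location and associated
 * data, in register order. */
static void fp_exec_state_reset(fp_exec_state *s, const float *frag_data)
{
    int r, c;

    for (r = 0; r < 12; r++)
        for (c = 0; c < 4; c++)
            s->attrib[r][c] = frag_data[r * 4 + c];

    for (r = 0; r < 32; r++)
        for (c = 0; c < 4; c++)
            s->temp[r][c] = 0.0f;   /* temporaries start at zero */

    for (c = 0; c < 4; c++)
        s->cc[c] = FP_CC_EQ;        /* condition codes start at EQ */
}
```

    Because the reset happens on every execution, no temporary or condition
    code value carries over from one fragment to the next.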


    Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).
    No changes to the text of the section.


Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
Operations and the Framebuffer) 

    None

Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions) 

    Add new section 5.7, Programs (after "Flush and Finish")

    Programs are specified as an array of ubytes used to control the operation
    of portions of the GL.  The array is a string of ASCII characters encoding
    the program.

    The command

      LoadProgramNV(enum target, uint id, sizei len, const ubyte *program);

    loads a program.  The target parameter specifies the type of program
    loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or
    FRAGMENT_PROGRAM_NV.  VERTEX_PROGRAM_NV specifies a program to be executed
    in vertex program mode as each vertex is specified.
    VERTEX_STATE_PROGRAM_NV specifies a program to be run manually to update
    vertex state.  FRAGMENT_PROGRAM_NV specifies a program to be executed in
    fragment program mode as each fragment is rasterized.

    Multiple programs can be loaded with different names.  id names the
    program to load.  The name space for programs is the set of positive
    integers (zero is reserved).  The error INVALID_VALUE is generated by
    LoadProgramNV if a program is loaded with an id of zero.  The error
    INVALID_OPERATION is generated by LoadProgramNV if a program is loaded
    for an id that is currently loaded with a program of a different program
    target.  program is a pointer to an array of ubytes that represents the
    program being loaded.  The length of the array in ubytes is indicated by
    len.

    At program load time, the program is parsed into a set of tokens possibly
    separated by white space.  Spaces, tabs, newlines, carriage returns, and
    comments are considered whitespace.  Comments begin with the character "#"
    and are terminated by a newline, a carriage return, or the end of the
    program array.  Tokens are processed in a case-sensitive manner:  upper
    and lower-case letters are not considered equivalent.
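
    As an illustration of the whitespace and comment rules, a tokenizer
    might advance to the next token along these lines.  The function name
    skip_program_whitespace is hypothetical, and this is one possible
    sketch, not the parser of any actual implementation.

```c
#include <stddef.h>

/* Return the offset of the first token character at or after pos, or len
 * if only whitespace and comments remain. */
static size_t skip_program_whitespace(const unsigned char *program,
                                      size_t len, size_t pos)
{
    while (pos < len) {
        unsigned char c = program[pos];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;                      /* plain whitespace */
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the end
             * of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                pos++;
        } else {
            break;                      /* start of a token */
        }
    }
    return pos;
}
```

    Since the program is passed with an explicit length, the loop is bounded
    by len and never relies on a terminating NUL byte.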

    Each program target has a corresponding Backus-Naur Form (BNF) grammar
    specifying the syntactically valid sequences for programs of the specified
    type.  The set of valid tokens can be inferred from the grammar.  The
    token "" represents an empty string and is used to indicate optional
    rules.  A program is invalid if it contains any undefined tokens or
    characters.

    The error INVALID_OPERATION is generated by LoadProgramNV if a program
    fails to load because it is not syntactically correct or fails to satisfy
    all of the semantic restrictions corresponding to the program target.

    A successfully loaded program is parsed into a sequence of instructions.
    Each instruction is identified by its tokenized name.  The operation of
    these instructions is specific to the program target and is defined
    elsewhere.

    A successfully loaded program replaces the program previously assigned to
    the name specified by id.  If the OUT_OF_MEMORY error is generated by
    LoadProgramNV, no change is made to the previous contents of the named
    program.

    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
    into the program string most recently passed to LoadProgramNV indicating
    the position of the first error, if any, in the program.  If the program
    fails to load because of a semantic restriction that cannot be determined
    until the program is fully scanned, the error position will be len, the
    length of the program.  If the program loads successfully, the value of
    PROGRAM_ERROR_POSITION_NV is assigned the value negative one.
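
    An application will often want to turn the reported offset into a line
    and column for diagnostics.  The helper below is a sketch of one way to
    do that.  The name locate_program_error is hypothetical, error_pos is
    assumed to be the value already queried from PROGRAM_ERROR_POSITION_NV,
    and counting only newline characters as line breaks is a simplification
    chosen for this example.

```c
#include <stddef.h>

/* Convert an error offset into a 1-based line and column.  Returns 0 if
 * error_pos is negative (the program loaded successfully), 1 otherwise. */
static int locate_program_error(const unsigned char *program, size_t len,
                                long error_pos, size_t *line, size_t *column)
{
    size_t i, end;

    if (error_pos < 0)
        return 0;

    /* An offset equal to len marks an error found after the full scan. */
    end = (size_t)error_pos < len ? (size_t)error_pos : len;

    *line = 1;
    *column = 1;
    for (i = 0; i < end; i++) {
        if (program[i] == '\n') {
            (*line)++;
            *column = 1;
        } else {
            (*column)++;
        }
    }
    return 1;
}
```

    An offset equal to len maps to the position just past the last
    character, which matches the case of a semantic error detected only
    after the whole program has been scanned.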

    For targets whose programs are executed automatically (e.g., vertex and
    fragment programs), there must be a current program.  The current vertex
    program is executed automatically in vertex program mode as vertices are
    specified.  The current fragment program is executed automatically in
    fragment program mode as fragments are generated by rasterization.
    Current programs for a program target are updated by

      BindProgramNV(enum target, uint id);

    where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV.  The error
    INVALID_OPERATION is generated by BindProgramNV if id names a program that
    has a type different than target (for example, if id names a vertex state
    program as described in section 2.14.4).

    Binding to a nonexistent program id does not generate an error.  In
    particular, binding to program id zero does not generate an error.
    However, because program zero cannot be loaded, program zero is always
    nonexistent.  If a program id is successfully loaded with a new vertex
    program and id is also the currently bound vertex program, the new program
    is considered the currently bound vertex program.

    The INVALID_OPERATION error is generated when both vertex program mode is
    enabled and Begin is called (or when a command that performs an implicit
    Begin is called) if the current vertex program is nonexistent or not
    valid.  A vertex program may not be valid for reasons explained in section
    2.14.5.

    The INVALID_OPERATION error is generated when both fragment program mode
    is enabled and Begin, another GL command that performs an implicit Begin,
    or any other GL command that generates fragments is called, if the current
    fragment program is nonexistent or not valid.  A fragment program may be
    invalid for reasons explained in Section 3.11.3.

    Programs are deleted by calling

      void DeleteProgramsNV(sizei n, const uint *ids);

    ids contains n names of programs to be deleted.  After a program is
    deleted, it becomes nonexistent, and its name is again unused.  If a
    program that is currently bound is deleted, it is as though BindProgramNV
    has been executed with the same target as the deleted program and program
    zero.  Unused names in ids are silently ignored, as is the value zero.

    The command

      void GenProgramsNV(sizei n, uint *ids);

    returns n currently unused program names in ids.  These names are marked
    as used, for the purposes of GenProgramsNV only, but they become existent
    programs only when they are first loaded using LoadProgramNV.

    An implementation may choose to establish a working set of programs on
    which binding and/or manual execution are performed with higher
    performance.  A program that is currently part of this working set is said
    to be resident.

    The command
      
      boolean AreProgramsResidentNV(sizei n, const uint *ids,
                                    boolean *residences);

    returns TRUE if all of the n programs named in ids are resident, or if the
    implementation does not distinguish a working set.  If at least one of the
    programs named in ids is not resident, then FALSE is returned, and the
    residence of each program is returned in residences.  Otherwise the
    contents of residences are not changed.  If any of the names in ids are
    nonexistent or zero, FALSE is returned, the error INVALID_VALUE is
    generated, and the contents of residences are indeterminate.  The
    residence status of a single named program can also be queried by calling
    GetProgramivNV (Section 6.1.13) with id set to the name of the program and
    pname set to PROGRAM_RESIDENT_NV.

    AreProgramsResidentNV indicates only whether a program is currently
    resident, not whether it could not be made resident.  An implementation
    may choose to make a program resident only on first use, for example.  The
    client may guide the GL implementation in determining which programs
    should be resident by requesting a set of programs to make resident.

    The command

      void RequestResidentProgramsNV(sizei n, const uint *ids);

    requests that the n programs named in ids should be made resident.
    While all the programs are not guaranteed to become resident,
    the implementation should make a best effort to make as many of
    the programs resident as possible.  As a result of making the
    requested programs resident, program names not among the requested
    programs may become non-resident.  Higher priority for residency
    should be given to programs listed earlier in the ids array.
    RequestResidentProgramsNV silently ignores attempts to make resident
    nonexistent program names or zero.  AreProgramsResidentNV can be
    called after RequestResidentProgramsNV to determine which programs
    actually became resident.

    The commands

      void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                     float x, float y, float z, float w);
      void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                     double x, double y, double z, double w);
      void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                      const float v[]);
      void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                      const double v[]);

    specify a new value for the named program local parameter <name> belonging
    to the fragment program specified by <id>.  <name> is a pointer to an
    array of ubytes holding the parameter name.  <len> specifies the number of
    ubytes in the array given by <name>.  The new x, y, z, and w components of
    the named local parameter are given by x, y, z, and w, respectively, for
    ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0],
    v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and
    ProgramNamedParameter4dvNV.  The error INVALID_OPERATION is generated if
    <id> specifies a nonexistent program or a program whose type does not
    support named local parameters.  The error INVALID_VALUE is generated
    if <name> does not specify the name of a local parameter in the program
    corresponding to <id>.  The error INVALID_VALUE is also generated if <len>
    is zero.

    The commands

      void ProgramLocalParameter4fARB(enum target, uint index,
                                      float x, float y, float z, float w);
      void ProgramLocalParameter4fvARB(enum target, uint index, 
                                       const float *params);
      void ProgramLocalParameter4dARB(enum target, uint index,
                                      double x, double y, double z, double w);
      void ProgramLocalParameter4dvARB(enum target, uint index, 
                                       const double *params);

    update the values of the numbered program local parameter <index>
    belonging to the program object currently bound to <target>.  For
    ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four
    components of the parameter are updated with the values of <x>, <y>, <z>,
    and <w>, respectively.  For ProgramLocalParameter4fvARB and
    ProgramLocalParameter4dvARB, the four components of the parameter are
    updated with the array of four values pointed to by <params>.  The error
    INVALID_VALUE is generated if <index> is greater than or equal to the
    number of numbered program local parameters supported by <target>.


Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and
State Requests)

    Modify Section 6.1.11, Pointer and String Queries (p. 206)

    (modify last paragraph, p. 206) ... The possible values for <name> are
    VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.

    (add after last paragraph of section, p. 207) Queries of
    PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent
    program load error string.  If the last call to LoadProgramNV failed to
    load a program, the returned string describes a reason that the program
    failed to load.  Otherwise, a pointer to an empty string (containing only
    a terminator) is returned.

    Rename and modify Section 6.1.13, Vertex and Fragment Program Queries
    (from GL_NV_vertex_program).  Portions of this section pertaining to
    fragment programs are copied verbatim.

    (insert after discussion of GetProgramParameter[fd]vNV)

    The commands

      void GetProgramNamedParameterfvNV(uint id, sizei len,
                                        const ubyte *name, float *params);
      void GetProgramNamedParameterdvNV(uint id, sizei len,
                                        const ubyte *name, double *params);

    obtain the current program named local parameter value for the parameter
    named <name> belonging to the program given by <id>.  <name> is a pointer
    to an array of ubytes holding the parameter name.  <len> specifies the
    number of ubytes in the array given by <name>.  The error
    INVALID_OPERATION is generated if <id> specifies a nonexistent program or
    a program whose type does not support named local parameters.  The error
    INVALID_VALUE is generated if <name> does not specify the name of a local
    parameter in the program corresponding to <id>.  The error INVALID_VALUE
    is also generated if <len> is zero.  Each named program local parameter is
    an array of four values.

    The commands

      void GetProgramLocalParameterdvARB(enum target, uint index,
                                         double *params);
      void GetProgramLocalParameterfvARB(enum target, uint index,
                                         float *params);

    obtain the current value for the numbered program local parameter <index>
    belonging to the program object currently bound to <target>, and places
    the information in the array <params>.  The error INVALID_ENUM is
    generated if <target> specifies a nonexistent program target or a program
    target that does not support numbered program local parameters.  The error
    INVALID_VALUE is generated if <index> is greater than or equal to the
    implementation-dependent number of supported numbered program local
    parameters for the program target.

    When the program target type is FRAGMENT_PROGRAM_NV, each numbered program
    local parameter returned is an array of four values.  ...

    The command

      void GetProgramivNV(uint id, enum pname, int *params);

    obtains program state named by pname for the program named id in the array
    params.  pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or
    PROGRAM_RESIDENT_NV.  The error INVALID_OPERATION is generated if the
    program named id does not exist.

    The command

      void GetProgramStringNV(uint id, enum pname,
                              ubyte *program);

    obtains the program string for program id.  pname must be
    PROGRAM_STRING_NV.  n ubytes are returned into the array program
    where n is the length of the program in ubytes.  GetProgramivNV with
    PROGRAM_LENGTH_NV can be used to query the length of a program's
    string.  The INVALID_OPERATION error is generated if the program
    named id does not exist.

    ...

    The command

      boolean IsProgramNV(uint id);

    returns TRUE if id is the name of a program object.  If id is zero or
    is a non-zero value that is not the name of a program object, or if an
    error condition occurs, IsProgramNV returns FALSE.  A name returned by
    GenProgramsNV but not yet loaded with a program is not the name of a
    program object.


Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)

    Modify Section F.2.3 (Changes to Section 2.6), p.240
 
    (modify last paragraph on p.240) ... Multiple sets of texture coordinates
    may be used to specify how multiple texture images are mapped onto a
    primitive.  The number of texture coordinate sets supported is
    implementation dependent, but must be at least 1.  The number of texture
    coordinate sets supported may be queried with the state
    MAX_TEXTURE_COORDS_NV.

    Modify Section F.2.4 (Changes to Section 2.7), p.241

    (modify the last paragraph on p.241, carrying over to p.243)
    Implementations may support more than one set of texture coordinates.  The
    commands

        void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)
        void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)

    take the coordinate set to be modified as the <texture> parameter.
    <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that
    texture coordinate set i is to be modified.  The constants obey
    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
    the implementation dependent number of texture coordinate sets defined by
    MAX_TEXTURE_COORDS_NV).


    Modify Section F.2.5 (Changes to Section 2.8), p.243

    (modify first and second paragraphs of section) ... The client may specify
    up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store
    vertex coordinates...

    In implementations which support more than one texture coordinate set, the
    command

        void ClientActiveTextureARB(enum texture)

    is used to select the vertex array client state parameters to be modified
    by the TexCoordPointer command and the array affected by EnableClientState
    and DisableClientState with the parameter TEXTURE_COORD_ARRAY.  This
    command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB.  Each texture
    coordinate set has a client state vector which is selected when this
    command is invoked.  This state vector also includes the vertex array
    state.  This command also selects the texture coordinate set state used
    for queries of client state.

    (modify first paragraph on p.244) If the number of supported texture
    coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...


    Modify Section F.2.6 (Changes to Section 2.10.2), p.244

    (modify first paragraph)  For each texture coordinate set, a 4x4 matrix is
    applied to the corresponding texture coordinates...

    (replace second and third paragraphs) The command

      void ActiveTextureARB(enum texture);

    specifies the active texture unit selector, ACTIVE_TEXTURE_ARB.  Each
    texture unit contains up to two distinct sub-units:  a texture coordinate
    processing unit (consisting of a texture matrix stack and texture
    coordinate generation state) and a texture image unit (consisting of all
    the texture state defined in Section 3.8).  In implementations with a
    different number of supported texture coordinate sets and texture image
    units, some texture units may consist of only one of the two sub-units.

    The active texture unit selector specifies the texture unit accessed by
    commands involving texture coordinate processing.  Such commands include
    those accessing the current matrix stack (if MATRIX_MODE is TEXTURE),
    TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate
    generation enum is selected), as well as queries of the current texture
    coordinates and current raster texture coordinates.  If the texture unit
    number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater
    than or equal to the implementation dependent constant
    MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any
    such command.

    The active texture unit selector also selects the texture unit accessed by
    commands involving texture image processing (Section 3.8).  Such commands
    include all variants of TexEnv, TexParameter, and TexImage commands,
    BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and
    queries of all such state.  If the texture unit number corresponding to
    the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the
    implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error
    INVALID_OPERATION is generated by any such command.

    ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture>
    is specified.  <texture> is a symbolic constant of the form TEXTUREi_ARB,
    indicating that texture unit i is to be modified.  The constants obey
    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
    the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).
    For compatibility with old OpenGL specifications, the implementation
    dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of
    conventional texture units supported by the implementation.  Its value
    must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and
    MAX_TEXTURE_IMAGE_UNITS_NV.
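
    The relationship between the texture unit enumerants and the two limits
    can be sketched in C as follows.  The function names are hypothetical,
    and the limits are passed in as plain integers that an application is
    assumed to have queried beforehand.

```c
#include <stdbool.h>

#define TEXTURE0_ARB 0x84C0u  /* enumerant value from ARB_multitexture */

/* Return i for a valid TEXTUREi_ARB, or -1 where ActiveTextureARB would
 * generate INVALID_ENUM.  Valid units run from 0 to k-1, where k is the
 * larger of the two limits. */
static int texture_unit_index(unsigned texture, int max_coords,
                              int max_image_units)
{
    int k = max_coords > max_image_units ? max_coords : max_image_units;

    if (texture < TEXTURE0_ARB || texture >= TEXTURE0_ARB + (unsigned)k)
        return -1;
    return (int)(texture - TEXTURE0_ARB);
}

/* True if commands involving texture coordinate processing are legal. */
static bool unit_has_coord_processing(int unit, int max_coords)
{
    return unit >= 0 && unit < max_coords;
}

/* True if commands involving texture image processing are legal. */
static bool unit_has_image_processing(int unit, int max_image_units)
{
    return unit >= 0 && unit < max_image_units;
}
```

    For example, with 8 texture coordinate sets and 16 texture image units,
    unit 10 may be selected with ActiveTextureARB and used with BindTexture,
    but a TexGen call on it would generate INVALID_OPERATION.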

    Modify Section F.2.12 (Changes to Section 3.8.10), p.249

    (modify next-to-last paragraph) Texturing is enabled and disabled
    individually for each texture unit.  If texturing is disabled for one of
    the units, then the fragment resulting from the previous unit is passed
    unaltered to the following unit.  Individual texture units beyond those
    specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always
    treated as disabled.

    Modify Section F.2.15 (Changes to Section 6.1.2), p.251
    
    (add to end of paragraph) Queries of texture state variables corresponding
    to the texture coordinate processing unit (namely, TexGen state and
    enables, and matrices) will produce an INVALID_OPERATION error if the
    value of ACTIVE_TEXTURE_ARB is greater than or equal to
    MAX_TEXTURE_COORDS_NV.  All other texture state queries will result in an
    INVALID_OPERATION error if the value of ACTIVE_TEXTURE_ARB is greater than
    or equal to MAX_TEXTURE_IMAGE_UNITS_NV.

Additions to the AGL/GLX/WGL Specifications

    Program objects are shared between AGL/GLX/WGL rendering contexts if
    and only if the rendering contexts share display lists.  No change
    is made to the AGL/GLX/WGL API.

Dependencies on GL_NV_vertex_program

    If NV_vertex_program is supported, the description of LoadProgramNV in
    Section 2.14.1.7 (up to the BNF description of vertex programs) is
    deleted, as it is replaced by the contents of Section 5.7 in this
    specification.  The general error descriptions in Section 2.14.1.7 common
    to Section 5.7 (like INVALID_OPERATION if the program fails to compile)
    should also be deleted.  Section 2.14.1.8 should also be deleted.  Section
    6.1.13 is modified by this specification as described above.

Dependencies on NV_texture_shader

    If NV_texture_shader is not supported, the comment about texture shaders
    being disabled in fragment program mode is not applicable.

Dependencies on NV_texture_rectangle
  
    If NV_texture_rectangle is not supported, the references to "RECT" in the
    <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable.

Dependencies on ARB_texture_cube_map
  
    If ARB_texture_cube_map is not supported, the references to "CUBE" in the
    <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.

Dependencies on EXT_fog_coord

    If EXT_fog_coord is not supported, references to "fog coordinate" in the
    definition of the "FOGC" fragment attribute register should be removed.

Dependencies on NV_depth_clamp

    If NV_depth_clamp is not supported, section 3.11.6 is modified to remove
    discussion of the depth clamp enable and instead indicate that fragments
    with depth values outside [min(n,f), max(n,f)] are always discarded.

Dependencies on ARB_depth_texture and SGIX_depth_texture

    If ARB_depth_texture is not supported, but SGIX_depth_texture is
    supported, the discussion of Table X.5 is modified to indicate that
    DEPTH_COMPONENT textures are treated as LUMINANCE.

    If neither extension is supported, the discussion of DEPTH_COMPONENT
    textures in Table X.5 should be removed.

Dependencies on NV_float_buffer

    If NV_float_buffer is not supported, references to FLOAT_R_NV,
    FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in
    Table X.5 should be removed.

Dependencies on ARB_vertex_program

    This extension has no explicit dependencies on ARB_vertex_program, but
    the APIs for setting and querying numbered local parameters
    (ProgramLocalParameter*ARB and GetProgramLocalParameter*ARB) were taken
    directly from that extension.

Dependencies on ARB_fragment_program

    If ARB_fragment_program is not supported, the maximum number of executable
    instructions in any !!FP1.0 program is 1024.  If ARB_fragment_program is
    supported, the maximum number of executable instructions for an !!FP1.0
    program is at least 1024, but can be larger.  The limit can be queried by
    calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and
    <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.


GLX Protocol

    Most of the GLX protocol needed to implement this extension is described
    in the GL_NV_vertex_program extension specification and will not be
    repeated here.

    The following two rendering commands are potentially large, and hence can
    be sent in a glXRender or glXRenderLarge request.

        ProgramNamedParameter4fvNV
            2           28+len+p        rendering command length
            2           4218            rendering command opcode
            4           CARD32          id
            4           CARD32          len
            4           FLOAT32         params[0]
            4           FLOAT32         params[1]
            4           FLOAT32         params[2]
            4           FLOAT32         params[3]
            len         LISTofCARD8     name
            p                           unused, p=pad(len)

         If the command is encoded in a glXRenderLarge request, the command
         opcode and command length fields above are expanded to 4 bytes each:

            4           32+len+p        rendering command length
            4           4218            rendering command opcode


        ProgramNamedParameter4dvNV
            2           44+len+p        rendering command length
            2           4219            rendering command opcode
            4           CARD32          id
            4           CARD32          len
            8           FLOAT64         params[0]
            8           FLOAT64         params[1]
            8           FLOAT64         params[2]
            8           FLOAT64         params[3]
            len         LISTofCARD8     name
            p                           unused, p=pad(len)

         If the command is encoded in a glXRenderLarge request, the command
         opcode and command length fields above are expanded to 4 bytes each:

            4           48+len+p        rendering command length
            4           4219            rendering command opcode
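
    The command length fields above follow directly from the field sizes.
    The C sketch below computes them for a given name length; the function
    names are hypothetical, and pad(len) is taken to be the number of bytes
    needed to round len up to a multiple of four.

```c
#include <stddef.h>

/* Number of padding bytes that bring len up to a multiple of 4. */
static size_t glx_pad(size_t len)
{
    return (4u - (len & 3u)) & 3u;
}

/* Rendering command length for ProgramNamedParameter4fvNV (is_double == 0)
 * or ProgramNamedParameter4dvNV (is_double != 0).  In a glXRenderLarge
 * request the length and opcode fields grow from 2 to 4 bytes each, adding
 * 4 bytes in total. */
static size_t named_param_cmd_length(size_t len, int is_double,
                                     int render_large)
{
    size_t fixed = is_double ? 44u : 28u;

    if (render_large)
        fixed += 4u;
    return fixed + len + glx_pad(len);
}
```

    For a five-character parameter name, this gives 28 + 5 + 3 = 36 bytes
    for the float variant in a glXRender request.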


    The remaining two commands are non-rendering commands.  These commands are
    sent separately (i.e., not as part of a glXRender or glXRenderLarge
    request), using the glXVendorPrivateWithReply request:

        GetProgramNamedParameterfvNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4+(len+p)/4     request length
            4           1310            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           len
            len         LISTofCARD8     name
            p                           unused, p=pad(len)
          =>

          If the command succeeds, 4 floats are sent in the reply:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           4               reply length
            24                          unused
            16          LISTofFLOAT32   params

          Otherwise, an empty reply is sent, indicating that a GL error
          occurred:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           0               reply length
            24                          unused


        GetProgramNamedParameterdvNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4+(len+p)/4     request length
            4           1311            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           len
            len         LISTofCARD8     name
            p                           unused, p=pad(len)
          =>

          If the command succeeds, 4 doubles are sent in the reply:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           8               reply length
            24                          unused
            32          LISTofFLOAT64   params

          Otherwise, an empty reply is sent, indicating that a GL error
          occurred:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           0               reply length
            24                          unused
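
    The two reply shapes above can be told apart by the reply length field
    alone.  The following C sketch shows one way a client might decode the
    GetProgramNamedParameterfvNV reply.  The function name and the
    byte-buffer interface are hypothetical; the sketch assumes the reply
    has already been converted to host byte order and that float is a
    32-bit type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of the reply: 1 + 1 + 2 + 4 + 24 bytes. */
#define GLX_REPLY_HEADER_BYTES 32u

/* Returns 1 and copies four floats into params if the reply carries data.
 * Returns 0 if the reply is empty (a GL error occurred) or malformed. */
static int parse_named_parameter_fv_reply(const unsigned char *reply,
                                          size_t reply_bytes,
                                          float params[4])
{
    uint32_t reply_length;   /* counted in 4-byte units after the header */

    assert(reply != NULL && params != NULL);
    if (reply_bytes < GLX_REPLY_HEADER_BYTES)
        return 0;

    /* The reply length field sits at byte offset 4. */
    memcpy(&reply_length, reply + 4, sizeof reply_length);
    if (reply_length != 4 ||
        reply_bytes < GLX_REPLY_HEADER_BYTES + 4 * sizeof(float))
        return 0;

    memcpy(params, reply + GLX_REPLY_HEADER_BYTES, 4 * sizeof(float));
    return 1;
}
```

    The GetProgramNamedParameterdvNV reply could be handled the same way,
    with an expected reply length of 8 and four 64-bit values.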


Errors 

    INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels,
    or a command that performs an explicit Begin if FRAGMENT_PROGRAM_NV is
    enabled and the currently bound fragment program does not exist.

    INVALID_OPERATION is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a
    program whose type does not support local parameters.

    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <len> is zero.

    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <name> does not specify the name of a
    local parameter in the program corresponding to <id>.
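
    The three error conditions above for the named parameter commands can
    be sketched in C as follows.  The program record and the function name
    are hypothetical, and the order in which the conditions are tested is
    an assumption; this specification does not state which error takes
    precedence when more than one applies.  The error values are those of
    the standard GL enumerants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GL_NO_ERROR          0
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502

/* Hypothetical program record; not a structure defined by this extension. */
struct program {
    int supports_local_params;        /* nonzero for fragment programs */
    const char *const *param_names;   /* names declared by the program */
    size_t param_count;
};

/* prog is NULL when <id> names a nonexistent program. */
static int check_named_parameter(const struct program *prog,
                                 size_t len, const unsigned char *name)
{
    size_t i;

    if (prog == NULL || !prog->supports_local_params)
        return GL_INVALID_OPERATION;
    if (len == 0)
        return GL_INVALID_VALUE;

    assert(name != NULL);
    for (i = 0; i < prog->param_count; i++) {
        /* <name> is not null-terminated, so compare length and bytes. */
        if (strlen(prog->param_names[i]) == len &&
            memcmp(prog->param_names[i], name, len) == 0)
            return GL_NO_ERROR;
    }
    return GL_INVALID_VALUE;   /* no local parameter with that name */
}
```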

    INVALID_OPERATION is generated by any command accessing texture coordinate
    processing state if the texture unit number corresponding to the current
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
    dependent constant MAX_TEXTURE_COORDS_NV.

    INVALID_OPERATION is generated by any command accessing texture image
    processing state if the texture unit number corresponding to the current
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
    dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.
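
    Because the two limits may differ, the same active texture unit can be
    legal for one class of state and illegal for the other.  A minimal C
    sketch of the check follows.  The function name is hypothetical, and
    the limits passed in are meant to be the values an application would
    obtain by querying MAX_TEXTURE_COORDS_NV or MAX_TEXTURE_IMAGE_UNITS_NV.

```c
#include <assert.h>

#define GL_NO_ERROR          0
#define GL_INVALID_OPERATION 0x0502
#define GL_TEXTURE0_ARB      0x84C0u

/* active_texture is the current value of ACTIVE_TEXTURE_ARB; unit_limit is
 * the limit that applies to the class of state being accessed. */
static int check_active_texture(unsigned int active_texture,
                                unsigned int unit_limit)
{
    unsigned int unit;

    assert(active_texture >= GL_TEXTURE0_ARB);
    unit = active_texture - GL_TEXTURE0_ARB;
    return (unit >= unit_limit) ? GL_INVALID_OPERATION : GL_NO_ERROR;
}
```

    With example limits of 8 texture coordinate sets and 16 texture image
    units, selecting unit 10 makes texture coordinate commands fail with
    INVALID_OPERATION while texture image commands remain legal.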


    (The following are error descriptions copied from GL_NV_vertex_program
     that apply to this extension as well.  These modifications do not affect
     the behavior of that extension.)

    INVALID_VALUE is generated by LoadProgramNV if id is zero.

    INVALID_OPERATION is generated by LoadProgramNV if the program
    corresponding to id is currently loaded but has a program type different
    from that given by target.

    INVALID_OPERATION is generated by LoadProgramNV if the program specified
    is syntactically incorrect for the program type specified by target.  The
    value of PROGRAM_ERROR_POSITION_NV is still updated when this error is
    generated.

    INVALID_OPERATION is generated by LoadProgramNV if the program specified
    fails to conform to any of the semantic restrictions imposed on programs
    of the type specified by target.  The value of PROGRAM_ERROR_POSITION_NV
    is still updated when this error is generated.

    INVALID_OPERATION is generated by BindProgramNV if target does not match
    the type of the program named by id.

    INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried
    programs are zero or do not exist.

    INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if
    the program named id does not exist.


New State

Get Value                          Type  Get Command              Initial Value  Description         Section   Attribute
---------------------------------  ----  -----------------------  -------------  ------------------  --------  ------------
FRAGMENT_PROGRAM_NV                B     IsEnabled                FALSE          fragment program    3.11      enable
                                                                                 mode enable
FRAGMENT_PROGRAM_BINDING_NV        Z+    GetIntegerv              0              bound fragment      5.7       -
                                                                                 program

Table X.6.  New State Introduced by NV_fragment_program.


Get Value                  Type    Get Command          Initial Value  Description         Section   Attribute
-------------------------  ------  ------------------   -------------  ------------------  --------  ---------
PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv          -1             program error       5.7       -
                                                                       position
PROGRAM_TARGET_NV          Z2      GetProgramivNV       0              program target      6.1.13    -
PROGRAM_LENGTH_NV          Z+      GetProgramivNV       0              program length      6.1.13    -
PROGRAM_RESIDENT_NV        Z2      GetProgramivNV       False          program residency   6.1.13    -
PROGRAM_STRING_NV          ubxn    GetProgramStringNV   ""             program string      6.1.13    -
-                          nxR4    GetProgramNamed-     (0,0,0,0)      named program local 5.7       -
                                   ParameterNV                         parameter value
-                          64+xR4  GetProgramLocal-     (0,0,0,0)      numbered program    5.7       -
                                   ParameterARB                        local parameter

Table X.7.  Program Object State common to NV_vertex_program and NV_fragment_program.


Get Value    Type    Get Command   Initial Value  Description               Section   Attribute
---------    ------  -----------   -------------  -----------------------   --------  ---------
-            12xR4   -             fragment data  fragment attribute        3.11.1.1  -
                                                  registers
-            16xR4   -             (0,0,0,0)      fp32 temporary registers  3.11.1.2  -
-            32xR4   -             (0,0,0,0)      fp16 temporary registers  3.11.1.2  -
-            (Z_4)4  -             (EQ,EQ,EQ,EQ)  condition code register   3.11.1.4  -
                                                  address register

Table X.8.  Fragment Program Per-Fragment Execution State.


New Implementation Dependent State

                                                 Minimum
Get Value                   Type   Get Command    Value       Description    Section  Attribute
---------                   ----   -----------   -------  -----------------  -------  ---------
MAX_TEXTURE_COORDS_NV       Z+     GetIntegerv      2     number of texture  2.6      -
                                                          coordinate sets
                                                          supported
MAX_TEXTURE_IMAGE_UNITS_NV  Z+     GetIntegerv      2     number of texture  2.10.2   -
                                                          image units
                                                          supported
MAX_FRAGMENT_PROGRAM_       Z+     GetIntegerv     64     number of numbered 3.11.7   -
  LOCAL_PARAMETERS_NV                                     local parameters
                                                          supported


Revision History

    Rev.    Date    Author   Changes
    ----  -------- --------  --------------------------------------------
     73   05/23/05  pbrown   Fixed cut-and-paste error in the dependency 
                             section where it said "NV_texture_rectangle"
                             instead of "ARB_texture_cube_map".

     72   05/16/04  pbrown   Documented that it's not possible to get results
                             from LG2 that are any more precise than what is
                             available in the fp32 storage format.

     71   04/23/04  pbrown   Fixed incorrect example.

     70   03/20/03  pbrown   Made the instruction count limit for !!FP1.0
                             programs queryable instead of a hard-wired value
                             of 1024.  The limit can be queried using
                             ARB_fragment_program mechanisms, and remains 1024
                             if ARB_fragment_program is unsupported.

     69   02/01/03  pbrown   Removed support for combiner fragment programs
                             (!!FCP1.0).

     68   01/08/03  pbrown   Correct spec language providing examples of NaNs,
                             such as sqrt(-1) or log(-1).  Division by zero
                             produces an infinity, not a NaN.

     67   12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"
                             instruction. The condition code test is not
                             parenthesized in KIL. 

     66   10/31/02  pbrown   Cleaned up special cases of POW, including the
                             fact that "POW dst, 0, 0" produces NaN in this
                             spec, not 1.0.

     65   10/28/02  pbrown   Documented that signed HILO textures will have
                             the hemisphere remapping applied, but unsigned
                             textures will not.

     64   09/17/02  pbrown   Minor typo fixes.

     63   08/14/02  pbrown   Clarified the value of the "other" components
                             of f[FOGC].

     62   07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.
                             Simplified the implementation of the temporary
                             and output register limit for combiner
                             programs by counting all four o[TEXn] registers
                             against the limit, whether or not they are
                             written.

     61   07/19/02  pbrown   Renamed ProgramLocalParameter*NV to
                             ProgramNamedParameter*NV to eliminate naming
                             conflicts with ARB_vertex_program (and presumably
                             ARB_fragment_program).
                             
                             Added support for numbered program local
                             parameters for compatibility with the ARB vertex
                             program extension (and upcoming ARB fragment
                             program extension), so it's possible to set local
                             parameters the same way in both extensions.

                             Eliminated the language describing "register
                             slots" and how the "H" and "R" registers overlap.
                             Instead, registers are guaranteed not to overlap,
                             and a semantic limit is added on the number of
                             temporaries and output registers that can be used
                             by a program.

                             Eliminated the requirement that non-combiner
                             programs actually write a color value; the only
                             requirement is that one output register be
                             written.  When using fragment programs that use
                             depth replacement, there may not be a need to
                             compute color if color writes are currently
                             disabled.

                             Cleaned up the issues section.  Added several
                             examples of fragment program operation.

                             Cleaned up GLX protocol.

     59   07/07/02  pbrown   Minor clarifications of texture lookup handling.
                             Documented that DDX and DDY may not always
                             produce infinities.

     58   06/27/02  pbrown   Added clarification that instructions can use the
                             same attribute or parameter register more than
                             once.  Added support for "X" precision on the
                             "set on" instructions.  Removed "X" precision
                             support from DST.

     57   06/27/02  pbrown   Added missing table entries covering the use of
                             floating-point textures.

     56   06/27/02  pbrown   Modified the spec to indicate that depth textures
                             are treated as alpha, luminance, or intensity
                             according to the depth texture mode in ARB_shadow.

     55   06/26/02  pbrown   Corrected the aliased register number and
                             "read-only" mappings for o[DEPR] in combiner
                             programs.

     54   06/05/02  pbrown   Fixed the spec to indicate that near and far
                             frustum clipping is disabled for depth
                             replacement programs.  Fixed the spec to indicate
                             that the register combiners enable is overridden
                             for fragment programs (enabled for combiner
                             programs, disabled for color programs).

     53   05/20/02  pbrown   Miscellaneous bug fixes for wording and
                             special-case handling errors.

     52   05/16/02  pbrown   Added "_SAT" suffix to clamp result vector
                             components to [0,1].  Fixed special case rules
                             for MUL instruction and the "UN" condition code.

     50   04/19/02  pbrown   Added "$" as a legal character in an identifier
                             name.  Added example for fixed and conditional
                             write masks and condition code updates.

     49   04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to
                             return more detailed information on program load
                             failures.

     48   04/02/02  pbrown   Added missing enum value for the
                             FRAGMENT_PROGRAM_BINDING_NV query. 

     47   03/15/02  pbrown   Fixed various typos, and an incorrect description
                             of the MAX operation.

     45   01/31/02  pbrown   Renamed the packing and unpacking opcodes to
                             more closely match OpenGL data type naming
                             conventions (PK2 becomes PK2H, PK16 becomes
                             PK2US, PK4 becomes PK4B, PKB becomes PK4UB).
                             Renamed "BEM" instruction to "X2D" to reflect
                             the fact that it does a 2D coordinate
                             transformation (not just a bump mapping
                             operation).  Added PK4UBG and UP4UBG
                             instructions to support sRGB gamma correction
                             when packing and unpacking components.

     44   01/18/02  pbrown   Double the number of available temporaries (16 to
                             32 fp32 vectors).  Add BEM (texture coordinate
                             offset), PKB/UPB (unsigned byte packing), and
                             PK16/UP16 (unsigned short packing) instructions.

     43   01/04/02  pbrown   Documented special cases for comparisons,
                             including the handling of NaN in the SNE
                             instruction. Added automatic generation of a
                             third normal component for HILO textures.
                             Documented the restriction that RFL can't write
                             to the w component of the result.  Trivial fix of
                             the special-cases for RCP.  Fixed minor typo on
                             the TEX instruction.

     40   11/26/01  pbrown   Eliminated "X" precision specifier on those
                             instructions that do complicated math or don't
                             otherwise need it (e.g., "SGE").  Fixed special
                             case math on LG2 instruction.  Eliminated
                             incorrectly specified exponent clamping on LIT
                             instruction.  Fixed description and special-case
                             math on LIT/POW instructions.  Specified that
                             combiner program outputs are clamped to [-1,+1],
                             not [+0,+1].

     39   11/16/01  pbrown   Added semantic restriction that PK2/PK4 must
                             write to a 32-bit register.  Cleaned up the
                             converse restrictions on UP2/UP4, making sure to
                             allow UP2/UP4 from a program parameter.  Fix
                             section numberings and a few typos.

     36   11/07/01  pbrown   Cleaned up explanation of the "negative q is
                             undefined" spec restriction for texture mapping.
                             Fixed a nit on the number of condition code
                             values (now 4 with UN - unordered).

     35   10/29/01  pbrown   Add a SUB instruction for programmer
                             convenience. Moved unresolved issue list back to
                             the "Issues" section.  Fix several minor wording
                             issues.  Clarify register combiners/texture
                             shader/fragment program flow control diagram.

     32   10/19/01  pbrown   Document the fragment program restriction that
                             instructions involving f[FOGC] and f[TEX0-TEX7]
                             are always carried out at fp32 precision.

     31   10/19/01  pbrown   Fixed incorrect description of encoding of fp16
                             denorms.

     30   10/12/01  pbrown   Documented (0,0,0,0) local parameter
                             initialization.  Disallow multiple defines of the
                             same token.  Allow tokens that look like a
                             possible register or texture name, but have
                             numbers that are too big (e.g., "TEX24", "R37").
                             Fixed up several grammar bugs.  Documented that
                             LG2 and RSQ now do not automatically take
                             absolute values, plus new math special cases.
