2D Rendering

This page is the design document for our 2D drawing renderer, intended as a reference for both AI coding assistants and human developers. It targets the most classic, traditional form of CAD. The specification was initially generated by ChatGPT5.5, taking inspiration from the already functioning 3D rendering system, and was subsequently modified extensively by hand by Ram to synchronize it fully with the application architecture.

For a 2D CAD surface, we would not build a CPU tessellator. We would build a GPU parametric renderer: CPU stores CAD objects and uploads compact primitive records; GPU performs culling, curve evaluation/flattening where needed, stroke expansion, hatch clipping, anti-aliasing, and indirect drawing.

Our current D3D12 engine already has useful foundations: high-performance adapter selection, per-tab/per-window resources, root-signature based binding, page-based geometry storage, and ExecuteIndirect drawing over GPU pages. That should be reused conceptually, but the payload should change from “already-built triangle geometry” to “CAD primitive records + GPU-generated draw work.”

MVP decisions for first implementation

These decisions are mandatory for the first Page2D implementation pass. The broader sections below remain the long-term renderer design; this MVP section overrides them wherever the first pass intentionally differs.

No 2D persistence in the first pass. Page2D line and text content may be generated in memory only. Similar to the current 3D test path that keeps adding one shape every second, the engineering thread should auto-generate a create-line action about once per second, consume it, and upload the resulting 2D primitive records to GPU memory.
Render directly into the existing scene RTT. Do not create a dedicated CAD render target in the first pass. A dedicated Page2D render target and multi-view/multi-window composition can come later.
ComputerUnit definition. For Page2D, 1.0 ComputerUnit always means 1 mm. CPU-side Page2D coordinates should be stored as float64/double for long-term CAD precision. GPU records may use page-local/rebased float32 values for rendering.
Coordinate origin. Page2D uses a lower-left origin. This matches the intended future print-layout child coordinate system.
Canvas. The first Page2D viewport is an infinite CAD canvas, not a paper sheet preview. Use a dulled white background with black default geometry/text.
Active container visibility. When a Page2D internal sub-tab is active, Scene3D is not rendered in that viewport. Scene3D engineering/copy-thread work may continue in the background and keep uploading to VRAM, but inactive container pages should not draw.
Object identity. Continue using the existing global memoryID generator. Do not introduce a separate Page2D object ID generator.
No optimization-first work. GPU culling, tile binning, compute-generated indirect counts, and advanced batching are deferred. The first implementation should prioritize a correct visible result.
Lineweight is day-0 CAD behavior. MVP line rendering must support CAD lineweight. Prefer analytic thick-line rendering: one line segment is one instance, the vertex shader expands SV_VertexID 0..5 into two triangles, and the pixel shader computes anti-aliased coverage. Geometry cost should remain proportional to the number of line segments, not to the number of pixels touched by the line.
Text. First-pass text uses font value 0, meaning Noto Sans. ASCII is sufficient for MVP. Text records must keep the future font system in mind and include string, text height in ComputerUnits, rotation, color, justification, xOffset, and yOffset.
Text zoom behavior. CAD text height is in ComputerUnits and therefore zooms with the drawing. At large zoom-out, text may become too small to read or effectively disappear.
Shader files. New 2D shader files should be prefixed with Shader2D_ and added to the existing Visual Studio FxCompile build pattern.
Input. Page2D navigation uses mouse wheel zoom and middle-button drag pan.
Source ownership. Keep all 2D GPU rendering implementation code in MemoryManagerGPU2D.h, MemoryManagerGPU2D.cpp, MemoryManagerGPU2D-DirectX12.h, and MemoryManagerGPU2D-DirectX12.cpp.
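
The unit, origin, and input decisions above can be sketched as a small view transform. This is a minimal illustration only: the View2D struct, its field names, and the helper functions are hypothetical and do not come from the existing engine. It assumes the screen has a top-left origin with Y pointing down, so the lower-left Page2D origin requires a Y flip.

```cpp
#include <cassert>

// Hypothetical view state; names are illustrative, not from the existing engine.
struct View2D {
    double centerX, centerY;     // model point (mm) shown at the viewport centre
    double pixelsPerMm;          // zoom: screen pixels per ComputerUnit (1 unit = 1 mm)
    double viewportW, viewportH; // viewport size in pixels
};

struct Point2 { double x, y; };

// Model (lower-left origin, Y up) -> screen (assumed top-left origin, Y down).
inline Point2 ModelToScreen(const View2D& v, Point2 m) {
    return { v.viewportW * 0.5 + (m.x - v.centerX) * v.pixelsPerMm,
             v.viewportH * 0.5 - (m.y - v.centerY) * v.pixelsPerMm };
}

inline Point2 ScreenToModel(const View2D& v, Point2 s) {
    return { v.centerX + (s.x - v.viewportW * 0.5) / v.pixelsPerMm,
             v.centerY - (s.y - v.viewportH * 0.5) / v.pixelsPerMm };
}

// Mouse-wheel zoom that keeps the model point under the cursor fixed on screen.
inline void ZoomAboutCursor(View2D& v, Point2 cursorPx, double factor) {
    assert(factor > 0.0);
    const Point2 anchor = ScreenToModel(v, cursorPx);
    v.pixelsPerMm *= factor;
    const Point2 moved = ScreenToModel(v, cursorPx);
    v.centerX += anchor.x - moved.x;
    v.centerY += anchor.y - moved.y;
}

// Middle-button drag pan by a cursor delta in pixels.
inline void PanByPixels(View2D& v, double dxPx, double dyPx) {
    v.centerX -= dxPx / v.pixelsPerMm;
    v.centerY += dyPx / v.pixelsPerMm;  // screen Y is flipped relative to model Y
}
```

Keeping the view state in double on the CPU matches the float64 storage decision; only the final rebased values would need to be narrowed to float32 for the GPU constant buffer.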

1. Overall architecture

DataStorage: Unlike the 3D world, where multiple views can depict the same object in different representations (solid, wireframe, transparent, etc.), Page2D is a flat structure. Each 2D element is uniquely owned by a single Page2D parent, or by an equivalent 2D logical container element such as a P&ID, SLD, or 3Dto2D sheet. Multiple plot layouts generated from a Scene2D collection will each render like a separate, independently rendered Page2D.

Page2D elements appear in the hierarchy like any other element; when opened by double-click, each opens in its own dedicated sub-tab. Plot layouts shall be child elements of Page2D. (Plot layouts are not in MVP scope.)

In the 3D world, multiple views can be rendered from the same GeometryPage (planned; not yet implemented as of June 2026). For 2D rendering, however, each VRAM page is unique to a particular Page2D element. GeometryPage2D stores the parametric shapes that shaders render at runtime, as described in detail in this specification.

Think of the new renderer as a separate subsystem. Do not merge it directly into the 3D triangle pipeline; keep it parallel, then composite its render target with the 3D viewport and UI. In practice, the composition appears as an internal sub-tab.

Recommended frame flow:

CPU:
  - Apply edits to CAD database.
  - Upload changed primitive records only.
  - Update View2D constant buffer: pan, zoom, DPI, model-to-screen matrix.

GPU:
  1. Compute culling / binning.
  2. Generate visible primitive lists and indirect draw/dispatch arguments.
  3. Optional compute expansion: NURBS / complex curves -> temporary segment buffer.
  4. Render fills / solid hatches.
  5. Render hatch patterns.
  6. Render strokes: lines, polylines, arcs, ellipses, NURBS outlines.
  7. Render selection/highlight overlay.
  8. Composite 2D CAD RT into viewport.
  9. Render UI on top.

Our current render thread already renders to intermediate render textures and then copies to the swap-chain backbuffer. That pattern is good for adding a dedicated CAD render target before final composition.

2. Core idea: store CAD primitives, not triangles

Use compact GPU records.

enum CadPrimType : uint32_t {
    CAD_LINE,
    CAD_POLYLINE,
    CAD_ARC,
    CAD_ELLIPSE,
    CAD_NURBS,
    CAD_HATCH_SOLID,
    CAD_HATCH_PATTERN,
    CAD_TEXT,
    CAD_BLOCK_INSTANCE
};

struct CadPrimitiveHeader {
    uint32_t type;
    uint32_t attrIndex;
    uint32_t firstData;
    uint32_t dataCount;
    float4   bboxModel;      // minX, minY, maxX, maxY
    uint32_t layerId;
    uint32_t zOrder;
    uint32_t flags;
    uint32_t objectId;
};

struct CadStrokeAttr {
    float4 color;
    float  lineWeight;       // pixels, mm, or model units, depending on lineWeightMode
    uint32_t lineWeightMode; // screen-space, model-space, plot-space
    uint32_t lineTypeId;
    uint32_t capJoinStyle;
};

Then separate payload buffers:

PrimitiveHeaderBuffer
PointBuffer
PolylineRangeBuffer
ArcBuffer
EllipseBuffer
NurbsControlPointBuffer
NurbsKnotBuffer
HatchLoopBuffer
HatchPatternBuffer
StrokeAttributeBuffer
LayerStateBuffer
BlockTransformBuffer

This is the main design shift versus our 3D renderer. Our current geometry pages allocate GPU buffers and indirect command buffers per page; reuse that idea, but each page should hold primitive records and generated draw commands, not permanent CPU-built vertices. Our existing CreateNewPage() style page allocation, indirect buffer, and active snapshot concept are very relevant.

3. Rendering lines and polylines

Do not use D3D line primitives for CAD lineweights. Use triangles or analytic coverage.

Best first implementation:

Each line segment = 1 instance.
VS uses SV_VertexID 0..5 to emit two triangles.
Expand in screen space perpendicular to the transformed segment.
PS computes anti-aliased coverage from signed distance to the segment.
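
The expansion step above can be mirrored on the CPU to check the corner math before writing the shader. This is a sketch under assumptions: the function name, the corner tables, and the aaPadPx parameter are hypothetical, and the quad is extended past both endpoints by the same radius so that caps and the anti-aliased fringe have room.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Float2 { float x, y; };

// CPU mirror of the vertex-shader expansion: vertexId 0..5 selects one corner
// of the two triangles covering a thick segment. All inputs are in screen pixels.
// aaPadPx widens the quad so the pixel shader has room for the anti-aliased edge.
inline Float2 ExpandSegmentVertex(Float2 p0, Float2 p1, float halfWidthPx,
                                  float aaPadPx, uint32_t vertexId) {
    assert(vertexId < 6);
    // Corner tables: along = 0 is the p0 end, 1 is the p1 end; side is -1 or +1.
    // Triangles are (0,1,2) and (3,4,5).
    static const float kAlong[6] = { 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f };
    static const float kSide[6]  = { -1.0f, -1.0f, 1.0f, 1.0f, -1.0f, 1.0f };

    float dx = p1.x - p0.x, dy = p1.y - p0.y;
    const float len = std::sqrt(dx * dx + dy * dy);
    // Degenerate segment: pick an arbitrary direction so the quad is still valid.
    if (len < 1e-6f) { dx = 1.0f; dy = 0.0f; } else { dx /= len; dy /= len; }
    const float nx = -dy, ny = dx;            // unit normal to the segment
    const float r = halfWidthPx + aaPadPx;    // expansion radius

    const float along = kAlong[vertexId];
    const float endSign = along * 2.0f - 1.0f;  // -1 at p0, +1 at p1
    const Float2 base = along < 0.5f ? p0 : p1;
    return { base.x + dx * endSign * r + nx * kSide[vertexId] * r,
             base.y + dy * endSign * r + ny * kSide[vertexId] * r };
}
```

Because the corner is derived from the vertex index alone, no vertex buffer is needed; the draw would use six vertices per instance with one instance per segment.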

For line thickness:

model-space lineweight: thickness transforms with zoom.
screen-space / plot lineweight: thickness remains fixed in pixels/mm regardless of zoom.
paper-space plotting: thickness = mm * DPI / 25.4
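
As a rough illustration of the three thickness rules above, the conversion to a pixel width might look like the following. The enum and function names are hypothetical; pixelsPerMm is assumed to be the current zoom expressed as screen pixels per ComputerUnit (1 unit = 1 mm).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of CadStrokeAttr::lineWeightMode.
enum class LineWeightMode : uint32_t { ScreenPixels, ModelUnits, PlotMillimetres };

// Returns the stroke width in pixels for the current view.
inline float StrokeWidthPx(float lineWeight, LineWeightMode mode,
                           float pixelsPerMm, float dpi) {
    assert(lineWeight >= 0.0f && pixelsPerMm > 0.0f && dpi > 0.0f);
    switch (mode) {
    case LineWeightMode::ScreenPixels:    return lineWeight;               // fixed on screen
    case LineWeightMode::ModelUnits:      return lineWeight * pixelsPerMm; // scales with zoom
    case LineWeightMode::PlotMillimetres: return lineWeight * dpi / 25.4f; // mm * DPI / 25.4
    }
    return lineWeight;  // unreachable for valid enum values
}
```

For example, a 0.5 mm plot lineweight at 254 DPI comes out as about 5 pixels, regardless of zoom.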

For polylines, initially draw each segment as a thick segment with bevel/round caps. Then add proper joins:

Phase 1: segment quads + overlap, acceptable for most engineering drawings.
Phase 2: join shader with prev/current/next vertices.
Phase 3: miter, bevel, round joins, round/square/butt caps.

Recommended GPU record for polyline segment draw:

struct PolylineSegmentRef {
    uint32_t polylineId;
    uint32_t vertexIndex;
    uint32_t attrIndex;
    uint32_t flags;
};

The vertex shader fetches p[i-1], p[i], p[i+1] from the point buffer and computes local join behavior.

4. Arcs and ellipses

Use two approaches. For normal zoom levels, draw an analytic bounding quad and evaluate distance in pixel shader:

Arc:
  store center, radius, startAngle, endAngle.
  PS computes distance from pixel to circular arc.
  Coverage = smoothstep around stroke half-width.

Ellipse:
  store center, axisU, axisV, start/end parameter.
  PS maps pixel into ellipse-local coordinates.
  Compute approximate signed distance.

Pros: crisp at any zoom, no CPU tessellation, very compact memory.
Cons: pixel shader can become expensive for many overlapping curves. Ellipse distance is more complex than circle distance.
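
The arc distance evaluation described above can be prototyped on the CPU first. The sketch below assumes a counter-clockwise arc described by startAngle plus a sweep (endAngle minus startAngle, wrapped into the range (0, 2*pi]); that convention and the function name are assumptions, not part of the record layout defined in this document.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Distance from p to a circular arc that starts at startAngle and runs
// counter-clockwise for `sweep` radians. Angles are in radians.
inline float DistanceToArc(Vec2 p, Vec2 center, float radius,
                           float startAngle, float sweep) {
    const float kTwoPi = 6.28318530718f;
    assert(radius > 0.0f && sweep > 0.0f && sweep <= kTwoPi);
    const float dx = p.x - center.x, dy = p.y - center.y;

    // Angle of p relative to the arc start, wrapped into [0, 2*pi).
    float rel = std::atan2(dy, dx) - startAngle;
    rel -= kTwoPi * std::floor(rel / kTwoPi);

    // Inside the angular range: the nearest point lies on the circle itself.
    if (rel <= sweep) return std::fabs(std::sqrt(dx * dx + dy * dy) - radius);

    // Outside the angular range: the nearest point is one of the two endpoints.
    const float endAngle = startAngle + sweep;
    const float d0 = std::hypot(p.x - (center.x + radius * std::cos(startAngle)),
                                p.y - (center.y + radius * std::sin(startAngle)));
    const float d1 = std::hypot(p.x - (center.x + radius * std::cos(endAngle)),
                                p.y - (center.y + radius * std::sin(endAngle)));
    return std::fmin(d0, d1);
}
```

The pixel shader would then turn this distance into coverage around the stroke half-width. An ellipse needs a different, approximate distance and is not covered by this sketch.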
For very dense drawings, add a compute prepass:

Compute shader:
  visible arcs/ellipses -> adaptive line segments based on current zoom.
  write segment records into a transient GPU segment buffer.
  draw generated segments using same thick-line pipeline.

This hybrid is powerful: analytic for simple scenes, GPU flattening for dense ones.
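
For the flattening path, the number of segments can be derived from the sagitta (the maximum gap between a chord and its arc). A chord spanning angle theta on a circle of radius r deviates by r * (1 - cos(theta / 2)), so the largest step that stays within a pixel tolerance is 2 * acos(1 - tolerance / r). The function below is a hypothetical helper illustrating that formula; the radius is assumed to be already converted to screen pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of straight segments needed so that no chord deviates from the arc
// by more than tolerancePx. radiusPx is the arc radius in screen pixels.
inline int ArcSegmentCount(double radiusPx, double sweepRad, double tolerancePx) {
    assert(radiusPx >= 0.0 && sweepRad > 0.0 && tolerancePx > 0.0);
    // The whole circle fits inside the tolerance: one segment is enough.
    if (2.0 * radiusPx <= tolerancePx) return 1;
    // Largest chord angle whose sagitta r * (1 - cos(step / 2)) equals the tolerance.
    const double step = 2.0 * std::acos(1.0 - tolerancePx / radiusPx);
    const int n = static_cast<int>(std::ceil(sweepRad / step));
    return std::max(n, 1);
}
```

With a 0.25 px tolerance, a full circle of radius 100 px needs 45 segments; the count grows roughly with the square root of the on-screen radius, which is what makes zoom-dependent flattening cheap at low zoom.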

5. NURBS strategy

NURBS are the most difficult primitive here. I would not attempt analytic pixel rendering first. Recommended implementation:

CPU:
  uploads control points, weights, knot vector, degree, trim range.

GPU compute:
  evaluates NURBS adaptively using current model-to-screen transform.
  emits temporary polyline segments into a GPU append buffer.
  generates indirect draw args.

GPU graphics:
  renders emitted segments using the same thick-line renderer.

This keeps the renderer GPU-based while avoiding the nightmare of per-pixel NURBS distance evaluation. Important detail: adaptive subdivision should be screen-error based, not model-error based.

If projected curve deviation < 0.25 px: stop subdividing. Else: subdivide.

For the first version, cubic Bézier support is easier than full NURBS. Internally, we can eventually convert NURBS spans to rational Bézier spans and evaluate those on the GPU. That conversion can be CPU-side, provided it is only a symbolic/structural conversion and not rasterization.
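
The screen-error subdivision rule can be prototyped on the CPU for a non-rational cubic Bézier before porting it to a compute shader. In this sketch the control points are assumed to be already transformed to screen pixels, and the error estimate uses a conservative second-difference bound (0.75 times the largest second difference of the control points) rather than an exact deviation; the function names and the depth limit of 16 are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P2 { double x, y; };

static P2 Mid(P2 a, P2 b) { return { (a.x + b.x) * 0.5, (a.y + b.y) * 0.5 }; }

// Appends the end point of each accepted sub-curve to `out`.
static void FlattenRec(P2 a, P2 b, P2 c, P2 d, double tolPx, int depth,
                       std::vector<P2>& out) {
    // Conservative bound on the deviation of the curve from its chord.
    const double e1 = std::hypot(a.x - 2.0 * b.x + c.x, a.y - 2.0 * b.y + c.y);
    const double e2 = std::hypot(b.x - 2.0 * c.x + d.x, b.y - 2.0 * c.y + d.y);
    if (depth >= 16 || 0.75 * std::fmax(e1, e2) <= tolPx) {
        out.push_back(d);  // flat enough: emit one segment ending at d
        return;
    }
    // de Casteljau split at t = 0.5.
    const P2 ab = Mid(a, b), bc = Mid(b, c), cd = Mid(c, d);
    const P2 abc = Mid(ab, bc), bcd = Mid(bc, cd);
    const P2 m = Mid(abc, bcd);
    FlattenRec(a, ab, abc, m, tolPx, depth + 1, out);
    FlattenRec(m, bcd, cd, d, tolPx, depth + 1, out);
}

// Returns a polyline (in screen pixels) approximating the cubic Bezier a,b,c,d.
inline std::vector<P2> FlattenCubicBezier(P2 a, P2 b, P2 c, P2 d, double tolPx) {
    assert(tolPx > 0.0);
    std::vector<P2> out{ a };
    FlattenRec(a, b, c, d, tolPx, 0, out);
    return out;
}
```

A GPU version would likely replace the recursion with a fixed-depth loop or a per-span segment count, since compute shaders have no call stack; the stopping criterion stays the same.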

6. Hatches

Hatches are the hardest part after NURBS, especially with holes, islands, and arbitrary curved boundaries. Separate them into two categories.

A. Pattern hatches

For simple pattern hatches, do this:

1. Render hatch boundary into stencil or coverage mask.
2. Render procedural pattern over hatch bounding box.
3. Clip by stencil/mask.

Pattern generation can be fully shader-based:

float2 p = modelPosition.xy;
float d = frac(dot(p, hatchDirection) / spacing);
float lineCoverage = smoothstep(...);

For multiple hatch lines, store pattern definitions in a GPU buffer.
Pros: compact, zoom-independent, very CAD-like.
Cons: clipping against complex boundaries is the pain point.
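
A CPU version of the pattern snippet above may help when tuning line width and spacing. The smoothstep edges used here (half a pixel either side of the stroke half-width) are an assumption, since the shader snippet leaves those arguments open; the function and parameter names are also hypothetical. The hatch direction is taken as the unit vector perpendicular to the hatch lines, matching its use in the dot product.

```cpp
#include <cassert>
#include <cmath>

inline float Smoothstep(float e0, float e1, float x) {
    float t = (x - e0) / (e1 - e0);
    t = std::fmin(1.0f, std::fmax(0.0f, t));
    return t * t * (3.0f - 2.0f * t);
}

// Coverage of one family of parallel hatch lines at model point (px, py).
// normalX/normalY: unit vector perpendicular to the hatch lines.
inline float HatchLineCoverage(float px, float py, float normalX, float normalY,
                               float spacingMm, float lineWidthPx, float pixelsPerMm) {
    assert(spacingMm > 0.0f && pixelsPerMm > 0.0f);
    const float s = (px * normalX + py * normalY) / spacingMm;
    const float d = s - std::floor(s);                          // frac(), in [0, 1)
    const float distMm = std::fmin(d, 1.0f - d) * spacingMm;    // to nearest hatch line
    const float distPx = distMm * pixelsPerMm;
    const float halfWidth = 0.5f * lineWidthPx;
    // Assumed edge: one-pixel transition centred on the stroke boundary.
    return 1.0f - Smoothstep(halfWidth - 0.5f, halfWidth + 0.5f, distPx);
}
```

Measuring the distance in pixels rather than model units keeps the hatch line width constant on screen while the spacing still zooms with the drawing.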

B. Solid hatches / fills

Options:

Method	Pros	Cons
GPU stencil path fill	Good for arbitrary loops	More render passes; tricky with curves
Compute scanline/tile fill	Fully GPU, scalable	Complex to implement correctly
GPU triangulation	Fast after triangulation	Robust triangulation on GPU is difficult
CPU triangulation	Easiest	We explicitly do not want CPU rendering/tessellation

For our project, I would start with stencil/mask clipping. It is not CPU software rendering, and it avoids building a full polygon triangulator immediately.

7. Culling and indirect drawing

This is where our existing 3D engine gives a strong hint. We already use page snapshots and ExecuteIndirect over published GPU pages. Extend that idea:

CadPrimitivePage:
  GPU buffer: primitive headers
  GPU buffer: payload data
  GPU buffer: visible primitive indices
  GPU buffer: indirect draw commands
  GPU buffer: generated segment output
  GPU counters

Frame compute pass:

For each primitive:
  - check layer visibility
  - check viewport bbox
  - check object flags
  - estimate screen size
  - append to visible list for its pipeline bucket

Pipeline buckets:

Lines
Polylines
Arcs
Ellipses
NURBS-generated-segments
Solid hatches
Pattern hatches
Selection overlay
Text

Then render each bucket with ExecuteIndirect. This avoids one draw call per CAD entity.
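
A CPU reference of the per-primitive culling test is useful for validating the compute shader output. The sketch below uses a trimmed-down header, treats the primitive type as the bucket index, and assumes that "estimate screen size" means dropping primitives smaller than a pixel threshold on both axes; the flag bit and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Bbox { float minX, minY, maxX, maxY; };

// Trimmed-down header for this sketch; the real CadPrimitiveHeader has more fields.
struct PrimHeader {
    uint32_t type;      // doubles as the pipeline bucket index here
    uint32_t layerId;
    uint32_t flags;
    Bbox     bboxModel;
};

constexpr uint32_t kFlagHidden = 1u;  // hypothetical flag bit

inline bool Overlaps(const Bbox& a, const Bbox& b) {
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minY <= b.maxY && a.maxY >= b.minY;
}

// Returns one list of visible primitive indices per bucket.
inline std::vector<std::vector<uint32_t>> CullToBuckets(
    const std::vector<PrimHeader>& prims, const std::vector<bool>& layerVisible,
    const Bbox& viewModel, float pixelsPerMm, float minSizePx, uint32_t bucketCount) {
    std::vector<std::vector<uint32_t>> buckets(bucketCount);
    for (uint32_t i = 0; i < prims.size(); ++i) {
        const PrimHeader& p = prims[i];
        assert(p.type < bucketCount && p.layerId < layerVisible.size());
        if (!layerVisible[p.layerId]) continue;             // layer visibility
        if (p.flags & kFlagHidden) continue;                // object flags
        if (!Overlaps(p.bboxModel, viewModel)) continue;    // viewport bbox
        const float w = (p.bboxModel.maxX - p.bboxModel.minX) * pixelsPerMm;
        const float h = (p.bboxModel.maxY - p.bboxModel.minY) * pixelsPerMm;
        if (w < minSizePx && h < minSizePx) continue;       // too small on both axes
        buckets[p.type].push_back(i);
    }
    return buckets;
}
```

On the GPU each bucket would be an append-style buffer with an atomic counter, and the counters would feed the indirect draw arguments. Note that an axis-aligned line has zero extent on one axis, so the size test must not reject a primitive unless both extents are small.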

8. Root signature and resources

A possible CAD 2D root signature:

b0  View2DConstants
t0  PrimitiveHeaderBuffer
t1  PointBuffer
t2  StrokeAttributeBuffer
t3  LayerStateBuffer
t4  HatchPatternBuffer
t5  Transform/BlockInstanceBuffer
u0  VisibleListBuffer
u1  GeneratedSegmentBuffer
u2  IndirectCommandBuffer

For compute PSOs:

CadCullCS
CadArcSegmentGenerateCS
CadEllipseSegmentGenerateCS
CadNurbsFlattenCS
CadHatchMaskCS
CadSelectionPickCS

For graphics PSOs:

CadLineStrokePSO
CadPolylineStrokePSO
CadArcAnalyticPSO
CadEllipseAnalyticPSO
CadHatchPatternPSO
CadSolidFillPSO
CadSelectionOverlayPSO
CadTextPSO

Our current UI system already uses atlas textures, shader-visible descriptor heaps, upload queues, and root parameters for 2D screen-space rendering. That is useful inspiration for CAD text, symbols, linetype textures, and hatch pattern tables.

9. Anti-aliasing

For CAD, analytic anti-aliasing is usually better than relying only on MSAA. Recommended:

Lines/arcs/ellipses: compute signed distance in the pixel shader; alpha = coverage based on distance to the stroke edge.
Hatches: use analytic coverage for hatch lines; optionally use 4x MSAA for hatch masks.
Text: reuse the MSDF approach from the UI/text pipeline.

Our UI already has MSDF-style atlas infrastructure, so CAD text can share that design rather than needing a separate text renderer.
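
One simple way to turn a distance into coverage is a one-pixel linear ramp centred on the stroke edge, sketched below. The sub-pixel handling (widen the stroke to one pixel and fade its alpha in proportion) is an assumption added here so that hairlines dim rather than flicker or vanish; it is not a decision recorded elsewhere in this document.

```cpp
#include <algorithm>
#include <cassert>

// Coverage (0..1) of a stroke at a pixel whose centre lies distToCenterPx away
// from the stroke centreline. Both arguments are in screen pixels.
inline float StrokeCoverage(float distToCenterPx, float widthPx) {
    assert(distToCenterPx >= 0.0f && widthPx >= 0.0f);
    // Assumed thin-line rule: never draw narrower than 1 px, scale alpha instead.
    const float effectiveWidth = std::max(widthPx, 1.0f);
    const float thinScale = std::min(widthPx, 1.0f);
    // Signed distance to the stroke edge: negative inside, positive outside.
    const float signedDist = distToCenterPx - 0.5f * effectiveWidth;
    // Linear ramp across one pixel centred on the edge.
    const float coverage = std::min(1.0f, std::max(0.0f, 0.5f - signedDist));
    return coverage * thinScale;
}
```

For a 4 px stroke this yields full coverage at the centreline, 0.5 exactly on the edge, and zero one pixel outside it.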

10. Precision problem: very important for CAD

CAD coordinates can be huge. Float32 alone will eventually fail when zooming into large coordinate drawings. Avoid this:

float2 screen = mul(float4(modelX, modelY, 0, 1), viewProj);

Use one of these:

Option A: page-local coordinates

Each geometry page has double-precision origin on CPU.
GPU stores float local coordinates relative to page origin.
View constant stores camera origin.
Shader computes local-to-view using rebased coordinates.

Option B: high/low float pair

struct GpuDouble2 {
    float2 hi;
    float2 lo;
};

Then shader reconstructs camera-relative values.

Option C: fixed-point integer coordinates

Useful for exact CAD databases, but more shader work.

Our choice: page-local float coordinates + per-page origin first. It aligns well with our existing page architecture.
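
The precision loss and the effect of rebasing can be demonstrated with a few lines of C++. Near 10,000,000 mm (10 km from the origin), adjacent float32 values are 1 mm apart, so a 0.25 mm offset disappears when the absolute coordinate is cast to float, but survives when the coordinate is first made relative to a double-precision page origin. The helper names below are hypothetical; the second helper illustrates the Option B hi/lo split for comparison.

```cpp
#include <cassert>

// Option A: subtract the page origin in double, then narrow to float.
inline float ToPageLocal(double modelCoord, double pageOrigin) {
    return static_cast<float>(modelCoord - pageOrigin);
}

// Option B: split a double into a float "hi" part and a float residual "lo".
struct HiLo { float hi, lo; };

inline HiLo SplitDouble(double v) {
    const float hi = static_cast<float>(v);
    const float lo = static_cast<float>(v - static_cast<double>(hi));
    return { hi, lo };
}
```

With a page origin at 10,000,000.0, the two points 10,000,000.0 and 10,000,000.25 become 0.0 and 0.25 in page-local float coordinates, whereas a direct cast maps both to the same float value.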

11. Selection and hit testing

Do not solve selection by CPU geometry tests only. Add a GPU picking path. Two options:

ID buffer:
  render objectId into R32_UINT texture.
  read one pixel or small rectangle around cursor.

Compute picking:
  dispatch around mouse pick box.
  test primitive distance analytically.
  output nearest objectId.

For CAD, compute picking is more accurate for thin geometry and lineweights. Use ID buffer for quick hover, compute picking for final selection.
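
The analytic test a compute picking pass would run per candidate can be sketched on the CPU for line segments. All names here are hypothetical, coordinates are assumed to be in screen pixels, and the distance is measured to the stroke edge (centreline distance minus half the lineweight) so that thick lines are as easy to pick as they look.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

struct Vec2 { float x, y; };
struct PickSegment { Vec2 a, b; float halfWidthPx; uint32_t objectId; };

constexpr uint32_t kNoHit = 0xFFFFFFFFu;  // hypothetical "nothing picked" value

inline float DistanceToSegment(Vec2 p, Vec2 a, Vec2 b) {
    const float abx = b.x - a.x, aby = b.y - a.y;
    const float len2 = abx * abx + aby * aby;
    float t = 0.0f;  // parameter of the closest point, clamped to the segment
    if (len2 > 0.0f) {
        t = ((p.x - a.x) * abx + (p.y - a.y) * aby) / len2;
        t = std::fmin(1.0f, std::fmax(0.0f, t));
    }
    const float dx = p.x - (a.x + t * abx), dy = p.y - (a.y + t * aby);
    return std::sqrt(dx * dx + dy * dy);
}

// Returns the objectId of the stroke nearest to the cursor within pickRadiusPx,
// or kNoHit. On ties the first candidate wins.
inline uint32_t PickNearest(const std::vector<PickSegment>& segs, Vec2 cursorPx,
                            float pickRadiusPx) {
    assert(pickRadiusPx >= 0.0f);
    uint32_t best = kNoHit;
    float bestDist = std::numeric_limits<float>::max();
    for (const PickSegment& s : segs) {
        const float d = std::fmax(
            0.0f, DistanceToSegment(cursorPx, s.a, s.b) - s.halfWidthPx);
        if (d <= pickRadiusPx && d < bestDist) { bestDist = d; best = s.objectId; }
    }
    return best;
}
```

A GPU version would typically resolve the nearest hit with an atomic minimum over a packed distance/objectId value instead of a sequential loop.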

12. Suggested implementation stages

Stage 1 — 2D CAD render target and camera

Create:

DX12Resources2DPerTab
DX12Resources2DPerWindow
CadViewConstants

Use orthographic projection, pan, zoom, DPI scaling, and render to our existing RTT path.

Stage 2 — GPU line renderer

Implement only:

Line segments
Screen-space thickness
Solid color
Anti-aliased edges
ExecuteIndirect draw path

Avoid polylines, hatches, arcs initially. Make one million independent segments render fast.

Stage 3 — Polyline joins and linetypes

Add:

Polyline adjacency
Dash/dot linetype
Round/butt/square caps
Miter/bevel/round joins

Linetype can be handled in shader using cumulative distance along segment/polyline.
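
The cumulative-distance idea can be sketched as a lookup into a repeating dash pattern. The pattern layout assumed here (alternating dash and gap lengths, starting with a dash) and the function name are illustrative only; the distance and the pattern entries must be in the same unit, whichever one the linetype is defined in.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// pattern holds alternating lengths: dash, gap, dash, gap, ...
// Returns true when the point at distanceAlong falls inside a dash.
inline bool IsInDash(float distanceAlong, const std::vector<float>& pattern) {
    float period = 0.0f;
    for (float len : pattern) { assert(len >= 0.0f); period += len; }
    if (pattern.empty() || period <= 0.0f) return true;  // treat as continuous

    // Position inside the current repeat of the pattern, in [0, period).
    float t = std::fmod(distanceAlong, period);
    if (t < 0.0f) t += period;

    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (t < pattern[i]) return (i % 2) == 0;  // even entries are dashes
        t -= pattern[i];
    }
    return false;  // only reachable through rounding at the end of a period
}
```

In the pixel shader the boolean would become an anti-aliased coverage value, and for polylines the distance must accumulate across segments so the pattern does not restart at every vertex.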

Stage 4 — Arcs and ellipses

Add analytic arc/ellipse shader first. Then add optional GPU flattening if dense drawings become slow.

Stage 5 — Hatch pattern renderer

Implement:

Rectangular bbox draw
Procedural hatch pattern in model space
Stencil/mask clipping
Boundary loops

Stage 6 — NURBS compute flattener

Add compute adaptive subdivision to generated segment buffer.

Stage 7 — GPU culling and tile/bin system

Upgrade from simple bbox culling to tiled culling:

Screen divided into 32x32 or 64x64 tiles.
Compute assigns primitives to tiles.
Render or dispatch only visible tile primitive lists.
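
The assignment step reduces to mapping a primitive's screen-space bounding box to a clamped range of tile indices. The sketch below is a hypothetical CPU version of that mapping; the struct and function names are assumptions, and tile indices are inclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct TileRange { int x0, y0, x1, y1; bool empty; };  // inclusive tile indices

// Maps a screen-space bbox (pixels) to the range of tiles it touches.
inline TileRange TilesForBbox(float minX, float minY, float maxX, float maxY,
                              int tileSize, int viewportW, int viewportH) {
    assert(tileSize > 0 && viewportW > 0 && viewportH > 0);
    // Entirely off screen: no tiles.
    if (maxX < 0.0f || maxY < 0.0f || minX >= viewportW || minY >= viewportH) {
        return { 0, 0, 0, 0, true };
    }
    const int tilesX = (viewportW + tileSize - 1) / tileSize;
    const int tilesY = (viewportH + tileSize - 1) / tileSize;
    TileRange r;
    r.x0 = std::max(0, static_cast<int>(std::floor(minX / tileSize)));
    r.y0 = std::max(0, static_cast<int>(std::floor(minY / tileSize)));
    r.x1 = std::min(tilesX - 1, static_cast<int>(std::floor(maxX / tileSize)));
    r.y1 = std::min(tilesY - 1, static_cast<int>(std::floor(maxY / tileSize)));
    r.empty = false;
    return r;
}
```

A bbox-based range over-assigns long diagonal lines to tiles they never touch; a later refinement could test the segment against each candidate tile before appending it.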

Stage 8 — Selection, snapping, highlighting

Add:

ID render target
GPU pick compute
nearest endpoint/midpoint/intersection candidates
highlight overlay pass

13. Pros and cons

Pros

Advantage	Result
Compact CAD records	Much less memory than permanent tessellated triangles
GPU culling + indirect draw	Scales to huge drawings
Analytic strokes	Crisp zooming, excellent lineweight control
Shared D3D12 architecture	Reuses our queues, fences, render threads, RTT flow
Compute curve expansion	NURBS/arcs can adapt to current zoom
Good CAD plotting model	Screen-space, model-space, and paper-space widths are all possible

Cons

Problem	Impact
Much harder than CPU tessellation	More shaders, more debug complexity
Hatches are difficult	Boundary clipping, holes, islands, pattern phase
NURBS are difficult	Adaptive GPU subdivision needed
Precision needs planning	Float32 is not enough for large CAD coordinates
Picking/snapping needs separate system	Rendering and selection are not automatically solved together
Many PSOs	Requires careful batching by primitive type/style
GPU synchronization complexity	UAV barriers, resource states, counters, indirect args

D3D12 resource state synchronization is explicitly application-managed with resource barriers, so the compute-to-draw path will need correct UAV/transition barriers between culling, generated buffers, indirect command buffers, and rendering.

14. Our recommended design choice

For our case, I would choose this hybrid:

Lines / polylines:
  analytic thick segment rendering using instanced triangles.

Arcs / circles / ellipses:
  analytic shader for normal cases;
  optional compute-generated segments for dense cases.

NURBS:
  GPU compute adaptive flattening into transient segment buffer.

Hatches:
  stencil/mask clipped procedural pattern first;
  solid fills later.

Text:
  reuse MSDF atlas approach from UI.

Culling:
  compute bbox culling first;
  tile binning later.

Draw submission:
  ExecuteIndirect, same philosophy as our current page-based 3D engine.

I would not start with mesh shaders. Keep the first version compatible with our current D3D12 style: compute + VS/PS + ExecuteIndirect. Mesh shaders can be an optional later path for GPU-generated geometry, but they introduce hardware/feature checks and another pipeline style.

Bottom line

Build it as a GPU CAD primitive renderer, not a triangle renderer. CPU uploads edited CAD records. GPU performs visibility, curve subdivision, stroke expansion, hatch clipping, and draw generation. Our existing page-based ExecuteIndirect architecture is a strong starting point; just change the abstraction from “geometry page of vertices/indices” to “CAD primitive page + generated GPU work.”