2D Rendering
This page is the design document for our 2D drawing renderer, the most classic, traditional form of CAD. It is intended as a reference for AI coding assistants as well as human developers. The specification was initially generated by ChatGPT5.5, taking inspiration from the already functioning 3D rendering system, and was subsequently modified extensively by hand by Ram to fully synchronize it with the application architecture.
For a 2D CAD surface, we would not build a CPU tessellator. We would build a GPU parametric renderer: CPU stores CAD objects and uploads compact primitive records; GPU performs culling, curve evaluation/flattening where needed, stroke expansion, hatch clipping, anti-aliasing, and indirect drawing.
Our current D3D12 engine already has useful foundations: high-performance adapter selection, per-tab/per-window resources, root-signature based binding, page-based geometry storage, and ExecuteIndirect drawing over GPU pages. That should be reused conceptually, but the payload should change from “already-built triangle geometry” to “CAD primitive records + GPU-generated draw work.”
MVP decisions for first implementation
These decisions are mandatory for the first Page2D implementation pass. The broader sections below remain the long-term renderer design; this MVP section overrides them wherever the first pass intentionally differs.
- No 2D persistence in the first pass. Page2D line and text content may be generated in memory only. Similar to the current 3D test path that keeps adding one shape every second, the engineering thread should auto-generate a create-line action about once per second, consume it, and upload the resulting 2D primitive records to GPU memory.
- Render directly into the existing scene RTT. Do not create a dedicated CAD render target in the first pass. A dedicated Page2D render target and multi-view/multi-window composition can come later.
- ComputerUnit definition. For Page2D, 1.0 ComputerUnit always means 1 mm. CPU-side Page2D coordinates should be stored as float64/double for long-term CAD precision. GPU records may use page-local/rebased float32 values for rendering.
- Coordinate origin. Page2D uses a lower-left origin. This matches the intended future print-layout child coordinate system.
- Canvas. The first Page2D viewport is an infinite CAD canvas, not a paper sheet preview. Use a dulled white background with black default geometry/text.
- Active container visibility. When a Page2D internal sub-tab is active, Scene3D is not rendered in that viewport. Scene3D engineering/copy-thread work may continue in the background and keep uploading to VRAM, but inactive container pages should not draw.
- Object identity. Continue using the existing global memoryID generator. Do not introduce a separate Page2D object ID generator.
- No optimization-first work. GPU culling, tile binning, compute-generated indirect counts, and advanced batching are deferred. The first implementation should prioritize a correct visible result.
- Lineweight is day-0 CAD behavior. MVP line rendering must support CAD lineweight. Prefer analytic thick-line rendering: one line segment is one instance, the vertex shader expands SV_VertexID 0..5 into two triangles, and the pixel shader computes anti-aliased coverage. Geometry cost should remain proportional to the number of line segments, not to the number of pixels touched by the line.
- Text. First-pass text uses font value 0, meaning Noto Sans. ASCII is sufficient for MVP. Text records must keep the future font system in mind and include string, text height in ComputerUnits, rotation, color, justification, xOffset, and yOffset.
- Text zoom behavior. CAD text height is in ComputerUnits and therefore zooms with the drawing. At large zoom-out, text may become too small to read or effectively disappear.
- Shader files. New 2D shader files should be prefixed with Shader2D_ and added to the existing Visual Studio FxCompile build pattern.
- Input. Page2D navigation uses mouse wheel zoom and middle-button drag pan.
- Source ownership. Keep all 2D GPU rendering implementation code in MemoryManagerGPU2D.h, MemoryManagerGPU2D.cpp, MemoryManagerGPU2D-DirectX12.h, and MemoryManagerGPU2D-DirectX12.cpp.
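The MVP line and text records described in the list above could be sketched on the CPU side roughly as follows. This is only an illustration of the listed fields: the struct names, the Justify2D enum, the 64-bit type for memoryID, the packed RGBA color, and every default value are assumptions, not part of the existing code base.

```cpp
#include <cstdint>
#include <string>

// Hypothetical justification values; the real set is not defined in this document.
enum class Justify2D : uint8_t { Left, Center, Right };

// Hypothetical CPU-side line record. Coordinates are double, in ComputerUnits
// (1.0 = 1 mm), measured from the Page2D lower-left origin.
struct LineRecord2D {
    uint64_t memoryID = 0;            // from the existing global memoryID generator (type assumed)
    double x0 = 0.0, y0 = 0.0;
    double x1 = 0.0, y1 = 0.0;
    float lineWeightMm = 0.25f;       // CAD lineweight (default value assumed)
    uint32_t colorRGBA = 0x000000FFu; // black default geometry
};

// Hypothetical CPU-side text record carrying the fields the MVP list requires.
struct TextRecord2D {
    uint64_t memoryID = 0;
    std::string text;                 // ASCII is sufficient for MVP
    uint32_t font = 0;                // 0 means Noto Sans
    double x = 0.0, y = 0.0;          // insertion point in ComputerUnits
    double heightCU = 2.5;            // text height in ComputerUnits (default assumed)
    double rotationRad = 0.0;
    uint32_t colorRGBA = 0x000000FFu; // black default text
    Justify2D justify = Justify2D::Left;
    double xOffset = 0.0;
    double yOffset = 0.0;
};
```

Keeping the font as an index rather than a name leaves room for the future font system without changing the record layout.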
1. Overall architecture
DataStorage: Unlike the 3D world, where multiple views can depict the same object in different representations (solid, wireframe, transparency, etc.), Page2D is a flat structure. Each 2D element is uniquely owned by a single Page2D parent, or by an equivalent 2D logical container element such as a P&ID, SLD, or 3Dto2D sheet. Multiple plot layouts generated from a Scene2D collection will each render like a separate, independently rendered Page2D.
Page2D elements appear in the hierarchy as usual; double-clicking one opens it in its own dedicated sub-tab. Plot layouts shall be child elements of Page2D (plot layouts are not in MVP scope).
In the 3D world, multiple views can be rendered from the same GeometryPage (planned, but not yet implemented as of June 2026). For 2D rendering, however, each VRAM page is unique to a particular Page2D element. GeometryPage2D stores the parametric shapes that shaders render at runtime, as described in detail in this specification.
Think of the new renderer as a separate subsystem: do not merge it directly into the 3D triangle pipeline. Keep it parallel, then composite its render target with the 3D viewport and UI. In practice, the composed result appears as an internal sub-tab.
Recommended frame flow:
CPU:
- Apply edits to CAD database.
- Upload changed primitive records only.
- Update View2D constant buffer: pan, zoom, DPI, model-to-screen matrix.
GPU:
1. Compute culling / binning.
2. Generate visible primitive lists and indirect draw/dispatch arguments.
3. Optional compute expansion: NURBS / complex curves -> temporary segment buffer.
4. Render fills / solid hatches.
5. Render hatch patterns.
6. Render strokes: lines, polylines, arcs, ellipses, NURBS outlines.
7. Render selection/highlight overlay.
8. Composite 2D CAD RT into viewport.
9. Render UI on top.
Our current render thread already renders to intermediate render textures and then copies to the swap-chain backbuffer. That pattern is good for adding a dedicated CAD render target before final composition.
2. Core idea: store CAD primitives, not triangles
Use compact GPU records.
enum CadPrimType : uint32_t {
    CAD_LINE,
    CAD_POLYLINE,
    CAD_ARC,
    CAD_ELLIPSE,
    CAD_NURBS,
    CAD_HATCH_SOLID,
    CAD_HATCH_PATTERN,
    CAD_TEXT,
    CAD_BLOCK_INSTANCE
};
struct CadPrimitiveHeader {
    uint32_t type;
    uint32_t attrIndex;
    uint32_t firstData;
    uint32_t dataCount;
    float4 bboxModel; // minX, minY, maxX, maxY
    uint32_t layerId;
    uint32_t zOrder;
    uint32_t flags;
    uint32_t objectId;
};
struct CadStrokeAttr {
    float4 color;
    float lineWeight;        // pixels, mm, or model units, depending on lineWeightMode
    uint32_t lineWeightMode; // screen-space, model-space, plot-space
    uint32_t lineTypeId;
    uint32_t capJoinStyle;
};
Then separate payload buffers:
PrimitiveHeaderBuffer
PointBuffer
PolylineRangeBuffer
ArcBuffer
EllipseBuffer
NurbsControlPointBuffer
NurbsKnotBuffer
HatchLoopBuffer
HatchPatternBuffer
StrokeAttributeBuffer
LayerStateBuffer
BlockTransformBuffer
This is the main design shift versus our 3D renderer. Our current geometry pages allocate GPU buffers and indirect command buffers per page; reuse that idea, but have each page hold primitive records and generated draw commands rather than permanent CPU-built vertices. Our existing CreateNewPage()-style page allocation, indirect buffer, and active snapshot concepts are all directly relevant.
3. Rendering lines and polylines
Do not use D3D line primitives for CAD lineweights. Use triangles or analytic coverage.
Best first implementation:
Each line segment = 1 instance.
VS uses SV_VertexID 0..5 to emit two triangles.
Expand in screen space perpendicular to the transformed segment.
PS computes anti-aliased coverage from signed distance to the segment.
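As a CPU reference for the vertex-shader expansion listed above, the following sketch maps a vertex index 0..5 to one corner of the screen-space quad around a segment. It is not the actual shader: the name expandLineVertex, the corner ordering, and the extra anti-aliasing padding parameter are all assumptions made for illustration.

```cpp
#include <cmath>

struct Float2 { float x, y; };

// Returns the screen-space position for vertex `vertexId` (0..5) of the two
// triangles covering segment a-b. The quad is widened by half the stroke width
// plus `aaPadPx` on every side so the pixel shader has room to fade coverage.
Float2 expandLineVertex(Float2 a, Float2 b, float widthPx, unsigned vertexId,
                        float aaPadPx = 1.0f) {
    // Corner table (assumed ordering): triangle 1 = A-, B-, A+; triangle 2 = A+, B-, B+.
    static const float along[6] = {0.f, 1.f, 0.f, 0.f, 1.f, 1.f};     // 0 = endpoint a, 1 = endpoint b
    static const float side[6]  = {-1.f, -1.f, 1.f, 1.f, -1.f, 1.f};  // which side of the segment

    float dx = b.x - a.x, dy = b.y - a.y;
    float len = std::sqrt(dx * dx + dy * dy);
    if (len < 1e-6f) { dx = 1.f; dy = 0.f; }   // degenerate segment: pick any direction
    else { dx /= len; dy /= len; }

    float nx = -dy, ny = dx;                   // left-hand unit normal
    float half = 0.5f * widthPx + aaPadPx;

    unsigned i = vertexId % 6u;
    bool atB = along[i] > 0.5f;
    Float2 base = atB ? b : a;
    float ext = atB ? half : -half;            // extend past the endpoints for caps
    return { base.x + dx * ext + nx * side[i] * half,
             base.y + dy * ext + ny * side[i] * half };
}
```

Because every vertex is derived from the instance's two endpoints and an index, no per-vertex buffer is needed; geometry cost stays proportional to the number of segments.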
For line thickness:
model-space lineweight: thickness transforms with zoom.
screen-space / plot lineweight: thickness remains fixed in pixels/mm regardless of zoom.
paper-space plotting: thickness = mm * DPI / 25.4
For polylines, initially draw each segment as a thick segment with bevel/round caps. Then add proper joins:
Phase 1: segment quads + overlap, acceptable for most engineering drawings.
Phase 2: join shader with prev/current/next vertices.
Phase 3: miter, bevel, round joins, round/square/butt caps.
Recommended GPU record for polyline segment draw:
struct PolylineSegmentRef {
    uint32_t polylineId;
    uint32_t vertexIndex;
    uint32_t attrIndex;
    uint32_t flags;
};
The vertex shader fetches p[i-1], p[i], p[i+1] from the point buffer and computes local join behavior.
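One possible form of the local join computation from p[i-1], p[i], p[i+1] is a miter offset with a bevel fallback, sketched here on the CPU. The function names, the miter limit default, and the fallback policy are assumptions for illustration; zero-length segments are assumed to have been filtered out beforehand.

```cpp
#include <cmath>

struct Float2 { float x, y; };

// Unit normal pointing to the left of the direction from -> to.
static Float2 leftNormal(Float2 from, Float2 to) {
    float dx = to.x - from.x, dy = to.y - from.y;
    float len = std::sqrt(dx * dx + dy * dy);
    if (len < 1e-6f) return {0.f, 0.f};
    return {-dy / len, dx / len};
}

// Offset from `cur` to the left-side join corner of a stroke with the given
// half width. Uses a miter when it is short enough, otherwise falls back to the
// plain normal of the outgoing segment (which yields a bevel-like corner).
Float2 joinOffset(Float2 prev, Float2 cur, Float2 next,
                  float halfWidth, float miterLimit = 4.0f) {
    Float2 n0 = leftNormal(prev, cur);
    Float2 n1 = leftNormal(cur, next);
    Float2 fallback = {n1.x * halfWidth, n1.y * halfWidth};

    float mx = n0.x + n1.x, my = n0.y + n1.y;
    float mlen = std::sqrt(mx * mx + my * my);
    if (mlen < 1e-6f) return fallback;          // 180 degree turn: no usable miter

    mx /= mlen; my /= mlen;
    float cosHalf = mx * n1.x + my * n1.y;      // cosine of half the turning angle
    if (cosHalf < 1e-6f) return fallback;
    float scale = 1.0f / cosHalf;               // miter length relative to half width
    if (scale > miterLimit) return fallback;    // too spiky: use bevel-style corner
    return {mx * halfWidth * scale, my * halfWidth * scale};
}
```

The right-side corner is the negated offset, so one evaluation per vertex serves both sides of the stroke.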
4. Arcs and ellipses
Use two approaches. For normal zoom levels, draw an analytic bounding quad and evaluate distance in pixel shader:
Arc:
store center, radius, startAngle, endAngle.
PS computes distance from pixel to circular arc.
Coverage = smoothstep around stroke half-width.
Ellipse:
store center, axisU, axisV, start/end parameter.
PS maps pixel into ellipse-local coordinates.
Compute approximate signed distance.
Pros: crisp at any zoom, no CPU tessellation, very compact memory.
Cons: pixel shader can become expensive for many overlapping curves. Ellipse distance is more complex than circle distance.
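The arc case above can be prototyped on the CPU before it is written as a pixel shader. The sketch below assumes angles in radians, a counter-clockwise sweep from startAngle to endAngle, and a one-pixel-wide smoothstep band; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

static const double kTwoPi = 6.283185307179586;

// Wraps an angle into [0, 2*pi).
static double wrapAngle(double a) {
    a = std::fmod(a, kTwoPi);
    return a < 0.0 ? a + kTwoPi : a;
}

// Distance from point (px, py) to a circular arc that runs counter-clockwise
// from startAngle to endAngle (radians) around centre (cx, cy).
double distanceToArc(double px, double py, double cx, double cy, double radius,
                     double startAngle, double endAngle) {
    double sweep = wrapAngle(endAngle - startAngle);
    if (sweep == 0.0 && endAngle != startAngle) sweep = kTwoPi; // full circle

    double dx = px - cx, dy = py - cy;
    double rel = wrapAngle(std::atan2(dy, dx) - startAngle);
    if (rel <= sweep) return std::fabs(std::hypot(dx, dy) - radius); // inside the wedge

    // Outside the angular range: nearest point is one of the two arc endpoints.
    double sx = cx + radius * std::cos(startAngle), sy = cy + radius * std::sin(startAngle);
    double ex = cx + radius * std::cos(endAngle),   ey = cy + radius * std::sin(endAngle);
    return std::min(std::hypot(px - sx, py - sy), std::hypot(px - ex, py - ey));
}

static double smoothstep(double e0, double e1, double x) {
    double t = std::clamp((x - e0) / (e1 - e0), 0.0, 1.0);
    return t * t * (3.0 - 2.0 * t);
}

// Stroke coverage in [0, 1] for a pixel at distance `distPx` from the centreline.
double strokeCoverage(double distPx, double halfWidthPx) {
    return 1.0 - smoothstep(halfWidthPx - 0.5, halfWidthPx + 0.5, distPx);
}
```

The ellipse case follows the same structure but needs an approximate distance in ellipse-local coordinates, which is why it costs more per pixel.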
For very dense drawings, add a compute prepass:
Compute shader:
visible arcs/ellipses -> adaptive line segments based on current zoom.
write segment records into a transient GPU segment buffer.
draw generated segments using same thick-line pipeline.
This hybrid is powerful: analytic for simple scenes, GPU flattening for dense ones.
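For the zoom-adaptive flattening step, one common way to choose the segment count is to bound the chord sagitta in pixels. This sketch assumes that rule and a 0.25 px default tolerance; the function name, the tolerance default, and the segment cap are illustrative assumptions rather than decided values.

```cpp
#include <algorithm>
#include <cmath>

// Number of straight segments needed so that the chord-to-arc deviation
// (sagitta) stays below `tolPx` on screen.
//   sagitta = r * (1 - cos(step / 2))  =>  step = 2 * acos(1 - tol / r)
int arcSegmentCount(double radiusModel, double sweepRad, double pixelsPerUnit,
                    double tolPx = 0.25, int maxSegments = 4096) {
    double radiusPx = radiusModel * pixelsPerUnit;
    if (radiusPx <= tolPx) return 1;                 // sub-pixel arc: one segment is enough
    double step = 2.0 * std::acos(1.0 - tolPx / radiusPx);
    int n = static_cast<int>(std::ceil(std::fabs(sweepRad) / step));
    return std::clamp(n, 1, maxSegments);
}
```

With these defaults, a full circle of 100 px radius needs 45 segments, and the count grows roughly with the square root of the on-screen radius as the user zooms in.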
5. NURBS strategy
NURBS are the most difficult primitive here. I would not attempt analytic pixel rendering first. Recommended implementation:
CPU:
uploads control points, weights, knot vector, degree, trim range.
GPU compute:
evaluates NURBS adaptively using current model-to-screen transform.
emits temporary polyline segments into a GPU append buffer.
generates indirect draw args.
GPU graphics:
renders emitted segments using the same thick-line renderer.
This keeps the renderer GPU-based while avoiding the nightmare of per-pixel NURBS distance evaluation. Important detail: adaptive subdivision should be screen-error based, not model-error based.
If projected curve deviation < 0.25 px: stop subdividing. Else: subdivide.
For the first version, cubic Bézier support is easier than full NURBS. Internally, we can eventually convert NURBS spans to rational Bézier spans and evaluate those on the GPU. That conversion can run on the CPU as long as it is purely a symbolic/structural conversion, not rasterization.
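The screen-error subdivision rule can be prototyped on the CPU for a non-rational cubic Bézier before moving it to a compute shader. This is a sketch under assumptions: control points are already transformed to screen pixels, flatness is estimated from the control points' distance to the chord line, and the names and depth limit are hypothetical. A GPU version would likely replace the recursion with an iterative scheme.

```cpp
#include <cmath>
#include <vector>

struct P2 { double x, y; };

// Distance from p to the infinite line through a and b (or to a if a == b).
static double distToChord(P2 p, P2 a, P2 b) {
    double dx = b.x - a.x, dy = b.y - a.y;
    double len = std::hypot(dx, dy);
    if (len < 1e-12) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs((p.x - a.x) * dy - (p.y - a.y) * dx) / len;
}

static P2 mid(P2 a, P2 b) { return {(a.x + b.x) * 0.5, (a.y + b.y) * 0.5}; }

// Appends the end point of each emitted segment to `out`; the caller is
// expected to push p0 first. Stops when both inner control points are within
// `tolPx` of the chord, or at a fixed recursion depth as a safety limit.
void flattenCubic(P2 p0, P2 p1, P2 p2, P2 p3, double tolPx,
                  std::vector<P2>& out, int depth = 0) {
    bool flat = distToChord(p1, p0, p3) <= tolPx && distToChord(p2, p0, p3) <= tolPx;
    if (flat || depth >= 16) { out.push_back(p3); return; }

    // de Casteljau split at t = 0.5
    P2 a = mid(p0, p1), b = mid(p1, p2), c = mid(p2, p3);
    P2 ab = mid(a, b), bc = mid(b, c);
    P2 m = mid(ab, bc);
    flattenCubic(p0, a, ab, m, tolPx, out, depth + 1);
    flattenCubic(m, bc, c, p3, tolPx, out, depth + 1);
}
```

Because the test runs on projected points, the same curve emits few segments when zoomed out and more when zoomed in, which is the behavior the screen-error rule is meant to give.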
6. Hatches
Hatches are the hardest part after NURBS, especially with holes, islands, and arbitrary curved boundaries. Separate them into two categories.
A. Pattern hatches
For simple pattern hatches, do this:
1. Render hatch boundary into stencil or coverage mask.
2. Render procedural pattern over hatch bounding box.
3. Clip by stencil/mask.
Pattern generation can be fully shader-based:
float2 p = modelPosition.xy;
float d = frac(dot(p, hatchDirection) / spacing);
float lineCoverage = smoothstep(...);
For multiple hatch lines, store pattern definitions in a GPU buffer.
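The three-line shader fragment above leaves the smoothstep arguments open. One way to complete the idea, shown here as a CPU reference, is to turn the fractional phase into a distance to the nearest hatch line and fade coverage around the line's half width. The function name and the aaWidth parameter are assumptions, and the direction vector is treated as the unit normal of the hatch lines, since the dot product measures position across them.

```cpp
#include <algorithm>
#include <cmath>

static double frac(double x) { return x - std::floor(x); }

static double smoothstep(double e0, double e1, double x) {
    double t = std::clamp((x - e0) / (e1 - e0), 0.0, 1.0);
    return t * t * (3.0 - 2.0 * t);
}

// Coverage in [0, 1] of a family of parallel hatch lines at model position (px, py).
//   (normalX, normalY): unit vector perpendicular to the hatch lines
//   spacing:            distance between neighbouring lines, in model units
//   halfWidth, aaWidth: line half width and fade band, in model units
double hatchLineCoverage(double px, double py, double normalX, double normalY,
                         double spacing, double halfWidth, double aaWidth) {
    double phase = frac((px * normalX + py * normalY) / spacing); // 0..1 across one period
    double dist = std::min(phase, 1.0 - phase) * spacing;         // distance to nearest line
    return 1.0 - smoothstep(halfWidth - aaWidth, halfWidth + aaWidth, dist);
}
```

In a shader, halfWidth and aaWidth would typically be derived from the current zoom so that hatch lines keep a stable on-screen thickness.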
Pros: compact, zoom-independent, very CAD-like.
Cons: clipping against complex boundaries is the pain point.
B. Solid hatches / fills
Options:
| Method | Pros | Cons |
|---|---|---|
| GPU stencil path fill | Good for arbitrary loops | More render passes; tricky with curves |
| Compute scanline/tile fill | Fully GPU, scalable | Complex to implement correctly |
| GPU triangulation | Fast after triangulation | Robust triangulation on GPU is difficult |
| CPU triangulation | Easiest | We explicitly do not want CPU rendering/tessellation |
For our project, I would start with stencil/mask clipping. It is not CPU software rendering, and it avoids building a full polygon triangulator immediately.
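When validating a stencil or mask implementation, a CPU reference for "is this point inside the hatch" is useful for tests. The sketch below assumes an even-odd fill rule (this document does not fix the fill rule) and boundary loops that have already been flattened to polygons; it is a test oracle only, not part of the render path, and the names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct P2 { double x, y; };
using Loop = std::vector<P2>;

// Even-odd test across all boundary loops: a point is filled when a ray cast
// in +X crosses an odd number of edges. Holes and islands fall out naturally,
// which mirrors an invert-on-draw stencil pass.
bool insideEvenOdd(const std::vector<Loop>& loops, P2 p) {
    bool inside = false;
    for (const Loop& loop : loops) {
        std::size_t n = loop.size();
        for (std::size_t i = 0, j = n ? n - 1 : 0; i < n; j = i++) {
            const P2& a = loop[i];
            const P2& b = loop[j];
            bool straddles = (a.y > p.y) != (b.y > p.y);
            if (straddles) {
                double xAtY = (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x;
                if (p.x < xAtY) inside = !inside;
            }
        }
    }
    return inside;
}
```

Comparing a few sampled pixels of the GPU mask against this function is a cheap way to catch winding or hole-handling mistakes early.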
7. Culling and indirect drawing
This is where our existing 3D engine gives a strong hint. We already use page snapshots and ExecuteIndirect over published GPU pages. Extend that idea:
CadPrimitivePage:
GPU buffer: primitive headers
GPU buffer: payload data
GPU buffer: visible primitive indices
GPU buffer: indirect draw commands
GPU buffer: generated segment output
GPU counters
Frame compute pass:
For each primitive:
- check layer visibility
- check viewport bbox
- check object flags
- estimate screen size
- append to visible list for its pipeline bucket
Pipeline buckets:
Lines
Polylines
Arcs
Ellipses
NURBS-generated-segments
Solid hatches
Pattern hatches
Selection overlay
Text
Then render each bucket with ExecuteIndirect. This avoids one draw call per CAD entity.
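The per-primitive checks of the culling pass can be written first as a plain CPU loop, which doubles as a reference when the compute shader is debugged later. Everything below is a hypothetical sketch: the reduced header layout, the bucket enum, the hidden flag bit, and the minimum-size rule are assumptions and do not match CadPrimitiveHeader field for field.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical, reduced set of pipeline buckets.
enum CadBucket : uint32_t { BUCKET_LINES, BUCKET_ARCS, BUCKET_TEXT, BUCKET_COUNT };

// Hypothetical subset of the primitive header that culling needs.
struct CullHeader {
    uint32_t bucket;
    uint32_t layerId;
    uint32_t flags;
    float minX, minY, maxX, maxY;   // model-space bounding box
};

struct ViewRect { float minX, minY, maxX, maxY; }; // visible model-space rectangle

constexpr uint32_t kFlagHidden = 1u; // assumed flag bit

using VisibleLists = std::array<std::vector<uint32_t>, BUCKET_COUNT>;

// Appends the index of every visible primitive to the list for its bucket.
VisibleLists cullPrimitives(const std::vector<CullHeader>& prims,
                            const std::vector<uint8_t>& layerVisible,
                            ViewRect view, float pixelsPerUnit, float minSizePx) {
    VisibleLists out;
    for (uint32_t i = 0; i < prims.size(); ++i) {
        const CullHeader& p = prims[i];
        if (p.bucket >= BUCKET_COUNT) continue;
        if (p.layerId >= layerVisible.size() || !layerVisible[p.layerId]) continue; // layer off
        if (p.flags & kFlagHidden) continue;                                        // object hidden
        if (p.maxX < view.minX || p.minX > view.maxX ||
            p.maxY < view.minY || p.minY > view.maxY) continue;                     // off screen
        float extentPx = std::max(p.maxX - p.minX, p.maxY - p.minY) * pixelsPerUnit;
        if (extentPx < minSizePx) continue;                                         // too small to see
        out[p.bucket].push_back(i);
    }
    return out;
}
```

On the GPU, each push_back becomes an atomic append into the bucket's visible-index buffer, and the final counts feed the indirect arguments.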
8. Root signature and resources
A possible CAD 2D root signature:
b0 View2DConstants
t0 PrimitiveHeaderBuffer
t1 PointBuffer
t2 StrokeAttributeBuffer
t3 LayerStateBuffer
t4 HatchPatternBuffer
t5 Transform/BlockInstanceBuffer
u0 VisibleListBuffer
u1 GeneratedSegmentBuffer
u2 IndirectCommandBuffer
For compute PSOs:
CadCullCS
CadArcSegmentGenerateCS
CadEllipseSegmentGenerateCS
CadNurbsFlattenCS
CadHatchMaskCS
CadSelectionPickCS
For graphics PSOs:
CadLineStrokePSO
CadPolylineStrokePSO
CadArcAnalyticPSO
CadEllipseAnalyticPSO
CadHatchPatternPSO
CadSolidFillPSO
CadSelectionOverlayPSO
CadTextPSO
Our current UI system already uses atlas textures, shader-visible descriptor heaps, upload queues, and root parameters for 2D screen-space rendering. That is useful inspiration for CAD text, symbols, linetype textures, and hatch pattern tables.
9. Anti-aliasing
For CAD, analytic anti-aliasing is usually better than relying only on MSAA. Recommended:
Lines/arcs/ellipses: compute signed distance in the pixel shader; alpha = coverage based on distance to the stroke edge.
Hatches: use analytic coverage for hatch lines; optionally use 4x MSAA for hatch masks.
Text: reuse MSDF approach from UI/text pipeline.
Our UI already has MSDF-style atlas infrastructure, so CAD text can share that design rather than needing a separate text renderer.
10. Precision problem: very important for CAD
CAD coordinates can be huge. Float32 alone will eventually fail when zooming into large coordinate drawings. Avoid this:
float2 screen = mul(float4(modelX, modelY, 0, 1), viewProj);
Use one of these:
Option A: page-local coordinates
Each geometry page has double-precision origin on CPU.
GPU stores float local coordinates relative to page origin.
View constant stores camera origin.
Shader computes local-to-view using rebased coordinates.
Option B: high/low float pair
struct GpuDouble2 {
    float2 hi;
    float2 lo;
};
Then shader reconstructs camera-relative values.
Option C: fixed-point integer coordinates
Useful for exact CAD databases, but more shader work.
Our choice: start with page-local float coordinates plus a per-page origin. It aligns well with our existing page architecture.
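A minimal sketch of the page-local option: the CPU keeps world coordinates and the page origin in double, subtracts in double, and only then narrows to float for the GPU record. The struct and function names are hypothetical, and the camera offset is assumed to be computed on the CPU in the same way.

```cpp
struct PageOrigin2D { double x, y; };  // per-page origin, kept in double on the CPU
struct Float2 { float x, y; };

// World (double) to page-local (float): subtract in double first, then narrow.
Float2 toPageLocal(PageOrigin2D origin, double worldX, double worldY) {
    return { static_cast<float>(worldX - origin.x),
             static_cast<float>(worldY - origin.y) };
}

// Offset from the camera origin to the page origin, also computed in double.
// This small value goes into the view constants instead of raw world coordinates.
Float2 pageToCameraOffset(PageOrigin2D origin, double cameraX, double cameraY) {
    return { static_cast<float>(origin.x - cameraX),
             static_cast<float>(origin.y - cameraY) };
}

// What the shader would do: rebased local coordinate to camera-relative pixels.
Float2 localToView(Float2 local, Float2 pageToCamera, float pixelsPerUnit) {
    return { (local.x + pageToCamera.x) * pixelsPerUnit,
             (local.y + pageToCamera.y) * pixelsPerUnit };
}
```

As an illustration of the failure this avoids: a coordinate of 1,000,000,000.125 mm narrowed directly to float loses the fractional part entirely, while the same point stored relative to a page origin of 1,000,000,000 mm keeps 0.125 exactly.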
11. Selection and hit testing
Do not solve selection by CPU geometry tests only. Add a GPU picking path. Two options:
ID buffer:
render objectId into R32_UINT texture.
read one pixel or small rectangle around cursor.
Compute picking:
dispatch around mouse pick box.
test primitive distance analytically.
output nearest objectId.
For CAD, compute picking is more accurate for thin geometry and lineweights. Use ID buffer for quick hover, compute picking for final selection.
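The compute-picking idea can be prototyped as a CPU loop over screen-space segments. This sketch assumes segments already projected to pixels, a hypothetical kNoHit sentinel, and a rule that the stroke's half width is subtracted from the centreline distance so thick lines are easier to hit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical screen-space segment record used only for picking.
struct PickSegment {
    float ax, ay, bx, by;   // endpoints in pixels
    float halfWidthPx;      // half of the rendered stroke width
    uint32_t objectId;
};

constexpr uint32_t kNoHit = 0xFFFFFFFFu; // assumed "nothing picked" sentinel

float distanceToSegment(float px, float py, const PickSegment& s) {
    float dx = s.bx - s.ax, dy = s.by - s.ay;
    float len2 = dx * dx + dy * dy;
    float t = len2 > 0.f ? ((px - s.ax) * dx + (py - s.ay) * dy) / len2 : 0.f;
    t = std::clamp(t, 0.f, 1.f);                   // stay on the segment
    return std::hypot(px - (s.ax + t * dx), py - (s.ay + t * dy));
}

// Returns the objectId whose stroke edge is nearest to the cursor, or kNoHit if
// nothing lies within pickRadiusPx. On ties, the later segment wins.
uint32_t pickNearest(const std::vector<PickSegment>& segs,
                     float mouseX, float mouseY, float pickRadiusPx) {
    uint32_t best = kNoHit;
    float bestDist = pickRadiusPx;
    for (const PickSegment& s : segs) {
        float d = std::max(0.f, distanceToSegment(mouseX, mouseY, s) - s.halfWidthPx);
        if (d <= bestDist) { bestDist = d; best = s.objectId; }
    }
    return best;
}
```

A compute-shader version would run the same distance test per candidate primitive and reduce to the minimum with an atomic on a packed distance/ID value.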
12. Suggested implementation stages
Stage 1 — 2D CAD render target and camera
Create:
DX12Resources2DPerTab
DX12Resources2DPerWindow
CadViewConstants
Use orthographic projection, pan, zoom, and DPI scaling, and render to our existing RTT path.
Stage 2 — GPU line renderer
Implement only:
Line segments
Screen-space thickness
Solid color
Anti-aliased edges
ExecuteIndirect draw path
Avoid polylines, hatches, arcs initially. Make one million independent segments render fast.
Stage 3 — Polyline joins and linetypes
Add:
Polyline adjacency
Dash/dot linetype
Round/butt/square caps
Miter/bevel/round joins
Linetype can be handled in shader using cumulative distance along segment/polyline.
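The cumulative-distance approach to linetypes boils down to a lookup of where a given distance falls inside a repeating dash/gap pattern. The sketch below assumes the pattern is stored as alternating dash and gap lengths starting with a dash; the function name and that storage convention are illustrative assumptions.

```cpp
#include <cmath>
#include <vector>

// Returns true when the point at `distAlong` (cumulative distance from the start
// of the segment or polyline) falls on a dash, false when it falls in a gap.
// `pattern` alternates dash, gap, dash, gap, ... lengths in the same units.
bool dashIsOn(double distAlong, const std::vector<double>& pattern) {
    double period = 0.0;
    for (double len : pattern) period += len;
    if (pattern.empty() || period <= 0.0) return true;   // treat as continuous

    double t = std::fmod(distAlong, period);
    if (t < 0.0) t += period;                            // support negative phase

    bool on = true;                                      // first entry is a dash
    for (double len : pattern) {
        if (t < len) return on;
        t -= len;
        on = !on;
    }
    return true; // only reachable through rounding at the period boundary
}
```

In the pixel shader the same lookup would be combined with the stroke's distance-based coverage, so dashes get anti-aliased ends instead of hard cuts.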
Stage 4 — Arcs and ellipses
Add analytic arc/ellipse shader first. Then add optional GPU flattening if dense drawings become slow.
Stage 5 — Hatch pattern renderer
Implement:
Rectangular bbox draw
Procedural hatch pattern in model space
Stencil/mask clipping
Boundary loops
Stage 6 — NURBS compute flattener
Add compute adaptive subdivision to generated segment buffer.
Stage 7 — GPU culling and tile/bin system
Upgrade from simple bbox culling to tiled culling:
Screen divided into 32x32 or 64x64 tiles.
Compute assigns primitives to tiles.
Render or dispatch only visible tile primitive lists.
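The "assign primitives to tiles" step starts from mapping a screen-space bounding box to a range of tile indices. This sketch assumes a top-left pixel origin, square tiles, and hypothetical names; the actual per-tile list construction would happen in the compute shader.

```cpp
#include <algorithm>
#include <cmath>

// Inclusive range of tiles touched by a bounding box; `empty` is set when the
// box lies completely outside the screen.
struct TileRange { int x0, y0, x1, y1; bool empty; };

TileRange tilesForBBox(float minX, float minY, float maxX, float maxY,
                       int screenW, int screenH, int tileSize = 32) {
    if (maxX < 0.f || maxY < 0.f || minX >= screenW || minY >= screenH)
        return {0, 0, -1, -1, true};

    int tilesX = (screenW + tileSize - 1) / tileSize;   // round up to cover partial tiles
    int tilesY = (screenH + tileSize - 1) / tileSize;

    auto toTile = [tileSize](float v, int count) {
        int t = static_cast<int>(std::floor(v / tileSize));
        return std::clamp(t, 0, count - 1);             // clip to the screen's tile grid
    };
    return { toTile(minX, tilesX), toTile(minY, tilesY),
             toTile(maxX, tilesX), toTile(maxY, tilesY), false };
}
```

A primitive is then appended to every tile in the returned range, so long thin diagonal lines are the case to watch: their bounding box covers many tiles they never touch.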
Stage 8 — Selection, snapping, highlighting
Add:
ID render target
GPU pick compute
nearest endpoint/midpoint/intersection candidates
highlight overlay pass
13. Pros and cons
Pros
| Advantage | Result |
|---|---|
| Compact CAD records | Much less memory than permanent tessellated triangles |
| GPU culling + indirect draw | Scales to huge drawings |
| Analytic strokes | Crisp zooming, excellent lineweight control |
| Shared D3D12 architecture | Reuses our queues, fences, render threads, RTT flow |
| Compute curve expansion | NURBS/arcs can adapt to current zoom |
| Good CAD plotting model | Screen-space, model-space, and paper-space widths are all possible |
Cons
| Problem | Impact |
|---|---|
| Much harder than CPU tessellation | More shaders, more debug complexity |
| Hatches are difficult | Boundary clipping, holes, islands, pattern phase |
| NURBS are difficult | Adaptive GPU subdivision needed |
| Precision needs planning | Float32 is not enough for large CAD coordinates |
| Picking/snapping needs separate system | Rendering and selection are not automatically solved together |
| Many PSOs | Requires careful batching by primitive type/style |
| GPU synchronization complexity | UAV barriers, resource states, counters, indirect args |
D3D12 resource state synchronization is explicitly application-managed with resource barriers, so the compute-to-draw path will need correct UAV/transition barriers between culling, generated buffers, indirect command buffers, and rendering.
14. Our recommended design choice
For our case, I would choose this hybrid:
Lines / polylines:
analytic thick segment rendering using instanced triangles.
Arcs / circles / ellipses:
analytic shader for normal cases;
optional compute-generated segments for dense cases.
NURBS:
GPU compute adaptive flattening into transient segment buffer.
Hatches:
stencil/mask clipped procedural pattern first;
solid fills later.
Text:
reuse MSDF atlas approach from UI.
Culling:
compute bbox culling first;
tile binning later.
Draw submission:
ExecuteIndirect, same philosophy as our current page-based 3D engine.
I would not start with mesh shaders. Keep the first version compatible with our current D3D12 style: compute + VS/PS + ExecuteIndirect. Mesh shaders can be an optional later path for GPU-generated geometry, but they introduce hardware/feature checks and another pipeline style.
Bottom line
Build it as a GPU CAD primitive renderer, not a triangle renderer. CPU uploads edited CAD records. GPU performs visibility, curve subdivision, stroke expansion, hatch clipping, and draw generation. Our existing page-based ExecuteIndirect architecture is a strong starting point; just change the abstraction from “geometry page of vertices/indices” to “CAD primitive page + generated GPU work.”