Mission Vishwakarma

Graphics API

API stands for Application Programming Interface: essentially a set of conventions / standards that computer engineers have defined for writing software against. We need to pick sides here.

Choosing a graphics API to base our software on is one of the most fundamental design decisions we are going to make. For all practical purposes (read: sunk man-month costs), once we choose an API we will be “stuck” with it forever. This is one of the topics where I intentionally choose performance over development velocity. We could speed up software development by choosing ready-built engines such as the open-source ImGui, Godot, Qt etc. However, though such “engines” isolate the software from the underlying APIs, we may get constrained by the engine itself at some point in the future. We rule out closed-source engines such as Unity and Unreal Engine for political reasons! Fun fact: this attitude is sometimes called NIH Syndrome, i.e. Not-Invented-Here Syndrome. ;) So, coming back to lower-level APIs, each operating system offers only a limited set of them.

On Windows, we have DirectX 9 / 10 / 11 / 12, OpenGL and Vulkan. OpenGL has not received a new core version since 4.6 (2017), and newer graphics features such as ray tracing aren’t supported by it. Vulkan is generally a second-class citizen on Windows compared to DirectX. Hence we choose the most modern flavor, DirectX 12. Remember, DirectX 12 itself was announced in 2014 and shipped with Windows 10 in 2015, so setting it as a baseline requirement for our software is a reasonable decision. DirectX 12 is therefore our ONLY graphics API for the Windows operating system. We support both Windows 10 and 11 for now (2025). This covers perhaps 90% of our target users worldwide. We also presume support for Heap_Tier_2 (D3D12_RESOURCE_HEAP_TIER_2) in DirectX 12. Note: Heap_Tier_2 hardware started appearing in the 2015/2016 timeline. Which Shader Model level? To be figured out. If you are feeling over-hyped to get deep down, read the 1st (of 4) tutorial on DirectX 12 here. It is ~100 pages!

The operating system with the next-highest market share is macOS on Apple devices. In the Apple world, Metal is the only recommended (non-deprecated) API, hence we go with Metal. Vulkan does work through a translation layer such as MoltenVK. Still, for performance and first-party support, we choose the Metal API. Mac graphics / Metal code shall also be partially reusable on iPhone / iPad devices, since they also have Metal as the preferred API.

Next up is the Linux (Ubuntu) operating system. This being an open-source operating system, the open standard Vulkan is preferred here. We want our software to be available even on free operating systems, hence we must have a Vulkan-based UI as well. Another reason for keeping this Vulkan interface is the overlap with the Android mobile operating system. For Android phones, we have only 2 options: the older OpenGL ES or modern Vulkan. Hence we choose Vulkan, targeting a version from within the last 10 years, i.e. Vulkan 1.1.

The above 3 APIs are for desktop applications. Next up is the browser-based engine. Here the upcoming (as of 2025) API named WebGPU is the chosen one. It is supported by all major web browser vendors, i.e. Google Chrome, Apple Safari and Mozilla Firefox.

Having made the above decisions, we have to be realistic about our core-engineering-degree-holder software developers. We can’t expect people with a chemical / civil / electrical / instrumentation / mechanical background to be familiar with such deep computer science concepts. Hence we structure our code as a sort of mini-engine (NIH?), where adding a new UI element doesn’t involve fiddling deep down in graphics APIs. This will be sorted out progressively as our software matures.

Our software installer will verify that all the relevant APIs are present on the system before installation. This way, inside the application, we don't check every time whether a particular feature is supported by the available hardware, unless the installed hardware itself changes. By default this check shouldn't take more than a few microseconds during application startup.
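A minimal sketch of the "full check at install, cheap check at startup" idea, assuming the installer saves a small hardware fingerprint after its full feature check and the application only compares fingerprints at startup. The GpuFingerprint struct and NeedsFullFeatureCheck function are hypothetical; on Windows the vendor and device IDs would come from DXGI_ADAPTER_DESC1 (VendorId, DeviceId), and how the driver version is obtained is left open here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fingerprint saved by the installer after its full feature check.
struct GpuFingerprint {
    uint32_t vendorId = 0;
    uint32_t deviceId = 0;
    uint64_t driverVersion = 0;

    bool operator==(const GpuFingerprint& other) const {
        return vendorId == other.vendorId && deviceId == other.deviceId &&
               driverVersion == other.driverVersion;
    }
};

// Startup check: a few integer comparisons instead of re-querying every feature.
// Returns true only when the hardware (or driver) changed since installation,
// i.e. when the slow, full capability check must run again.
inline bool NeedsFullFeatureCheck(const GpuFingerprint& savedAtInstall,
                                  const GpuFingerprint& currentHardware) {
    return !(savedAtInstall == currentHardware);
}
```

If the comparison fails, the application would rerun the installer's full check once and store the new fingerprint.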

More graphics design decisions are specified in our source code!

// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.

/*
Windows Desktop C++ DirectX12 application for CAD / CAM use.
This file documents our architecture. Primitive data structures common to all platforms may be added here.

At startup, pick up the GPU with the highest VRAM. All rendering happens on this GPU only. Only 1 device is supported for rendering.
However, the OS may send the display frame to a monitor connected to another / integrated GPU.
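A portable sketch of the "highest VRAM wins" rule. AdapterInfo is a hypothetical stand-in for what DXGI reports per adapter (DXGI_ADAPTER_DESC1::DedicatedVideoMemory and the DXGI_ADAPTER_FLAG_SOFTWARE flag); the real code would fill it while enumerating adapters. Software adapters such as WARP are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-adapter summary; on Windows it would be filled from DXGI.
struct AdapterInfo {
    std::wstring name;
    uint64_t dedicatedVideoMemory = 0; // bytes of dedicated VRAM
    bool isSoftware = false;           // e.g. WARP / Basic Render Driver
};

// Returns the index of the hardware adapter with the most dedicated VRAM,
// or -1 if there is no hardware adapter at all.
inline int PickRenderAdapter(const std::vector<AdapterInfo>& adapters) {
    int best = -1;
    for (std::size_t i = 0; i < adapters.size(); ++i) {
        if (adapters[i].isSoftware) continue;
        if (best < 0 || adapters[i].dedicatedVideoMemory >
                            adapters[static_cast<std::size_t>(best)].dedicatedVideoMemory) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```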

VERTEX DETAILS:
Vertex layout common to all geometry:
3x4 bytes for position, 4 bytes for normal, 4 bytes for color RGBA (8 bytes if an HDR monitor is present)
    = 20 / 24 bytes per vertex.
We go with the 24-byte format ONLY. Tone mapping (HDR -> SDR) should happen in the pixel shader.
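The 24-byte layout can be pinned down with a struct and compile-time checks. This is a sketch under assumptions: the 8-byte HDR color is taken to be four 16-bit half floats stored as raw uint16_t bit patterns (matching DXGI_FORMAT_R16G16B16A16_FLOAT), the normal is the 4-byte packed value described under NORMALS, and the name VertexCAD is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 24-byte vertex shared by all geometry.
struct VertexCAD {
    float    position[3]; // 12 bytes, object local space
    uint32_t normal;      // 4 bytes, packed 10:10:10:2 (see NORMALS)
    uint16_t color[4];    // 8 bytes, RGBA as half-float bit patterns (HDR-ready)
};

static_assert(sizeof(VertexCAD) == 24, "vertex must stay 24 bytes");
static_assert(offsetof(VertexCAD, normal) == 12, "normal follows position");
static_assert(offsetof(VertexCAD, color) == 16, "color follows normal");

// Bytes of vertex data for a mesh, e.g. when sub-allocating inside a page.
constexpr std::size_t VertexBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(VertexCAD);
}
```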

Initially Hemispheric Ambient Lighting:
Factor = (Normal.z * 0.5) + 0.5
AmbientLight = Lerp(GroundColor, SkyColor, Factor)
Screen Space Ambient Occlusion (SSAO) to darken creases and corners in a future revision.
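A CPU-side sketch of the hemispheric ambient formula, useful for unit-testing the shader math. It assumes Z-up (matching the camera's up vector) and plain RGB floats; Float3, Lerp and HemisphericAmbient are illustrative names (in HLSL the same line would use the lerp intrinsic).

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

inline Float3 Lerp(const Float3& a, const Float3& b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// normal must be unit length. normal.z = +1 faces the sky, -1 faces the ground.
inline Float3 HemisphericAmbient(const Float3& normal, const Float3& groundColor,
                                 const Float3& skyColor) {
    float factor = normal.z * 0.5f + 0.5f; // maps [-1, 1] to [0, 1]
    return Lerp(groundColor, skyColor, factor);
}
```

A vertical wall (normal.z = 0) gets exactly the midpoint of the ground and sky colors.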

All vertices are positioned in object local space. The world matrix is applied in the vertex shader.
This means moving even a 1000-vertex object needs just one 48-byte world matrix update per object.
We use a packed 48-byte world matrix instead of 64 bytes to save bandwidth.
Since the last row is always 0,0,0,1, we can omit it. In the shader, we reconstruct the last row.
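A sketch of the 48-byte packing, assuming the matrix is stored as three rows of four floats in the column-vector convention (translation in the fourth column), so the implicit fourth row is (0, 0, 0, 1). With DirectXMath's row-vector convention the constant part is the fourth column instead, so CPU and shader code must agree on one layout. WorldMatrix3x4, TransformPoint and ExpandTo4x4 are hypothetical names.

```cpp
#include <cassert>

// Hypothetical packed world matrix: rows 0..2 of a 4x4 affine matrix.
// Row 3 is always (0, 0, 0, 1) and is not stored.
struct WorldMatrix3x4 {
    float m[3][4]; // 12 floats = 48 bytes
};
static_assert(sizeof(WorldMatrix3x4) == 48, "packed world matrix must be 48 bytes");

// Column-vector convention: p' = M * (x, y, z, 1).
// The implicit last row (0, 0, 0, 1) always yields w = 1, which is why
// the shader can rebuild it instead of fetching it.
inline void TransformPoint(const WorldMatrix3x4& w, const float p[3], float out[3]) {
    for (int r = 0; r < 3; ++r) {
        out[r] = w.m[r][0] * p[0] + w.m[r][1] * p[1] + w.m[r][2] * p[2] + w.m[r][3];
    }
}

// Rebuilds the full 4x4 (e.g. for CPU-side picking).
inline void ExpandTo4x4(const WorldMatrix3x4& w, float full[4][4]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c) full[r][c] = w.m[r][c];
    full[3][0] = 0.0f; full[3][1] = 0.0f; full[3][2] = 0.0f; full[3][3] = 1.0f;
}
```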

Separate render threads (1 per monitor) and a single copy thread. The copy thread is the ringmaster of VRAM!
Each per-monitor render thread is synchronized (VSync) with that monitor's own refresh rate.
Each monitor has its own render queue.

We use the ExecuteIndirect command with a start vertex location instead of DrawIndexedInstanced per object.
I want per-tab VRAM isolation: each tab will be completely separate,
except for the uncloseable tab 0, which stores common textures and UI elements.

To support 100s of simultaneous tabs, we start with a small heap, say 4MB per tab, and grow the heap size only when necessary,
instead of allocating 1 giant 256MB buffer. Each page could be a mixture of various geometry types, say cylinders, cubes, I-beams etc.
Don't manually destroy heaps on tab switch. Use Evict; it allows the OS to handle the caching.
If the user clicks back to a heavy tab, MakeResident is faster than re-creating heaps. Tab 0 is always resident.
Eviction happens with a time lag of a few seconds.
An advanced eviction strategy based on the system memory budget comes after the rest of the spec is implemented.
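A sketch of the delayed-eviction rule, assuming the render side records when each tab was last visible and a housekeeping pass asks which tabs may be evicted. The 3-second default delay and the names TabResidency and TabsToEvict are assumptions; the real code would pass the returned tabs' heaps to ID3D12Device::Evict, and call ID3D12Device::MakeResident when a tab becomes visible again.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TabResidency {
    uint64_t tabId = 0;
    double   lastVisibleSeconds = 0.0; // time the tab was last shown in any view
    bool     resident = true;
};

// Returns resident tabs that have been hidden longer than evictionDelay seconds.
// Tab 0 (common textures and UI elements) is never evicted.
inline std::vector<uint64_t> TabsToEvict(const std::vector<TabResidency>& tabs,
                                         double nowSeconds, double evictionDelay = 3.0) {
    assert(evictionDelay >= 0.0);
    std::vector<uint64_t> result;
    for (const TabResidency& t : tabs) {
        if (t.tabId == 0 || !t.resident) continue;
        if (nowSeconds - t.lastVisibleSeconds > evictionDelay) result.push_back(t.tabId);
    }
    return result;
}
```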

Each page will be accompanied by a corresponding ExecuteIndirect argument buffer.
Each tab will also have its own dedicated world matrix buffer.
When we defragment a page, we must simultaneously rebuild its corresponding argument buffer.

There will be multiple views per tab.
Each view will maintain a pair (double buffered) of ExecuteIndirect command buffers.
When an object is deleted, the copy thread receives a command from the engineering thread.
The copy thread then updates the next buffer of the double-buffered pair and records the hole in the vertex/index buffer,
except for the currently filling head buffer.

Maintain a free-list allocator (e.g., a segregated free list) on the CPU, per tab.
The allocator knows: "I have a 12KB middle gap in Page 3, and a 40KB middle gap in Page 8."
When a 10KB request comes in, the allocator immediately returns "Page 3". No iterating through Page objects.
If the free list says none of the existing pages can accommodate the new geometry, then create a new heap/placed resource buffer.
The free list does not track internal holes created by deleting objects,
only the middle empty space. Aggregate holes are tracked per page, and pages are defragmented occasionally.
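A minimal sketch of the per-tab lookup, assuming each page reports only its middle gap (holes from deleted objects are ignored, as stated above). It uses a std::multimap keyed by gap size and lower_bound for a best-fit answer in O(log n); a true segregated free list would bucket sizes instead, so treat this as a stand-in. MiddleGapIndex and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>

// Per-tab index: middle-gap size -> page number.
class MiddleGapIndex {
public:
    // Record (or update) the middle gap of a page.
    void SetGap(uint32_t page, uint64_t gapBytes) {
        Remove(page);
        byPage_[page] = bySize_.emplace(gapBytes, page);
    }

    void Remove(uint32_t page) {
        auto it = byPage_.find(page);
        if (it == byPage_.end()) return;
        bySize_.erase(it->second);
        byPage_.erase(it);
    }

    // Smallest gap that still fits the request; nullopt means "create a new page".
    std::optional<uint32_t> FindPage(uint64_t requestBytes) const {
        auto it = bySize_.lower_bound(requestBytes);
        if (it == bySize_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::multimap<uint64_t, uint32_t> bySize_;
    std::unordered_map<uint32_t, std::multimap<uint64_t, uint32_t>::iterator> byPage_;
};
```

Using the example from the text: with gaps of 12KB (Page 3) and 40KB (Page 8), a 10KB request returns Page 3, and a 20KB request returns Page 8.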

When a buffer has more than 25% holes, a new defragmented buffer is created; once complete, rendering switches over to the new buffer.
For new geometry addition, at most 1 buffer is defragmented at a time (between 2 frames). Since the max page size is 64MB,
this will not produce a high-latency stall while working asynchronously with the copy thread.

The root signature puts the "constants" (View/Proj matrix) in root constants or a very fast descriptor table,
as these don't change between pages. Only the VBV/IBV and the EI argument buffer change per batch/page.

Here is the realistic "worst case" hierarchy for a CAD frame:
• Index depth: 16-bit vs 32-bit (hardware requirement). Examples: nuts/bolts (16) vs engine blocks (32).
• Transparency: opaque vs transparent (sorting requirement). Transparent objects must be drawn last for alpha blending.
• Topology: triangles (solid) vs lines (wireframe) (PSO requirement).
    We cannot draw lines and triangles in the same call.
• Culling: single-sided vs double-sided (PSO requirement). Sheet metal vs solids.
    Since section views are a common use case, perhaps we could make all geometry double sided. To be ascertained later.
• Buffer pages (N): how many buffer pages (up to 64MB each) are in use.
Total unique batches = 2 x 2 x 2 x 2 x N = 16 x N
This ensures no pipeline state change while rendering a single page: one ExecuteIndirect call per page.
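One way to make the 16 x N hierarchy concrete is to pack the four binary choices into a 4-bit state above the page index, so sorting batches by key groups identical pipeline states and puts all transparent batches last. The bit layout and the names BatchKey, MakeBatchKey, StateOf, PageOf and WorstCaseBatches are assumptions for illustration, not part of the spec.

```cpp
#include <cassert>
#include <cstdint>

using BatchKey = uint32_t;

constexpr uint32_t kPageBits = 24;                     // room for 16M pages per tab
constexpr uint32_t kPageMask = (1u << kPageBits) - 1u;

// The four binary choices (2 x 2 x 2 x 2 = 16 states) sit above the page index,
// so sorting keys groups batches that share a pipeline state. Transparent is the
// highest state bit, so every transparent batch sorts after every opaque one.
constexpr BatchKey MakeBatchKey(bool index32, bool doubleSided, bool lines,
                                bool transparent, uint32_t page) {
    return (static_cast<uint32_t>(index32)     << (kPageBits + 0)) |
           (static_cast<uint32_t>(doubleSided) << (kPageBits + 1)) |
           (static_cast<uint32_t>(lines)       << (kPageBits + 2)) |
           (static_cast<uint32_t>(transparent) << (kPageBits + 3)) |
           (page & kPageMask);
}

constexpr uint32_t StateOf(BatchKey key) { return key >> kPageBits; } // 0..15
constexpr uint32_t PageOf(BatchKey key)  { return key & kPageMask; }

// Upper bound from the hierarchy above: 16 state combinations x N pages.
constexpr uint32_t WorstCaseBatches(uint32_t pageCount) { return 16u * pageCount; }

static_assert(StateOf(MakeBatchKey(true, true, true, true, 0)) == 15u, "4 state bits");
```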

To be clarified later: how do we handle repeated geometry, say bolts?
They will only need 1 set of vertex/index buffers. We can draw them with different world matrices.

NORMALS:
A common industry solution for normals is not 16-bit floats, but packed 10-bit integers.
We use the format: DXGI_FORMAT_R10G10B10A2_UNORM.
X: 10 bits (0 to 1023), Y: 10 bits (0 to 1023), Z: 10 bits (0 to 1023), Padding: 2 bits (unused).
Total: 32 bits (4 bytes). Why this is perfect for normals:
Size: It is 3x smaller than a 12-byte float normal (4 bytes vs 12 bytes). Precision: 10 bits gives us 2^10 = 1024 steps.
Since normal components are always between -1.0 and 1.0, this gives a step size of 2/1023, roughly 0.002.
This is visually indistinguishable from 32-bit floats for lighting, even in high-end CAD.
Vertex shader decode: Normal = normalize(Input.Normal.xyz * 2.0 - 1.0). The GPU already returns UNORM values as 0..1;
renormalizing removes the small length error introduced by quantization.
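A CPU-side sketch of packing a unit normal into the 32-bit R10G10B10A2_UNORM layout (red in the lowest 10 bits, alpha in the top 2) and decoding it the way the vertex shader would. The function names are hypothetical; the per-component rounding error is at most half a step, about 0.001.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a component in [-1, 1] to a 10-bit UNORM value in [0, 1023].
inline uint32_t EncodeUnorm10(float v) {
    float unit = std::clamp(v * 0.5f + 0.5f, 0.0f, 1.0f);
    return static_cast<uint32_t>(std::lround(unit * 1023.0f));
}

// Packs X, Y, Z into R10G10B10A2_UNORM: X in bits 0-9, Y in 10-19, Z in 20-29.
// The 2 alpha bits (30-31) stay 0 (unused padding).
inline uint32_t PackNormal(float x, float y, float z) {
    return EncodeUnorm10(x) | (EncodeUnorm10(y) << 10) | (EncodeUnorm10(z) << 20);
}

// Mirrors the shader: UNORM is read as [0, 1], then Normal = value * 2 - 1,
// renormalized because quantization slightly changes the vector's length.
inline void UnpackNormal(uint32_t packed, float out[3]) {
    float len2 = 0.0f;
    for (int i = 0; i < 3; ++i) {
        float unorm = static_cast<float>((packed >> (10 * i)) & 0x3FFu) / 1023.0f;
        out[i] = unorm * 2.0f - 1.0f;
        len2 += out[i] * out[i];
    }
    float invLen = 1.0f / std::sqrt(len2);
    for (int i = 0; i < 3; ++i) out[i] *= invLen;
}
```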

PAGE STRUCTURE:
Vertex and index buffers in the same page: the superior architectural choice, for three reasons:
Halves the allocation overhead: we only manage 1 heap/resource per 4MB page instead of 2.
Cache locality: when the GPU fetches a mesh, the vertices and indices are physically close in VRAM (same memory page).
This can slightly improve cache hit rates.
Flexible ratio: the double-ended layout below adapts to any mix of vertex and index data.
Vertices start at offset 0 and grow UP. Indices start at offset Max (the page end, e.g. 4MB) and grow DOWN.
Free space is always the gap in the middle. The page is full when Vertex_Head_Ptr meets or crosses Index_Tail_Ptr.
A mandatory 32-byte gap is kept in the middle to address alignment concerns.
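A CPU-side bookkeeping sketch of one page, assuming offsets are plain byte counts and that both stacks keep 4-byte alignment (an assumption; the spec only states the 32-byte middle gap). PageAllocator and its members are hypothetical. The aggregate hole counter also supports the "more than 25% holes" defragmentation trigger described earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct PageOffsets {
    uint64_t vertexOffset; // where the new vertices start
    uint64_t indexOffset;  // where the new indices start
};

class PageAllocator {
public:
    static constexpr uint64_t kMiddleGap = 32; // mandatory gap between the two stacks

    explicit PageAllocator(uint64_t pageBytes) : size_(pageBytes), indexTail_(pageBytes) {
        assert(pageBytes >= kMiddleGap && pageBytes % 4 == 0);
    }

    // Vertices grow UP from offset 0; indices grow DOWN from the page end.
    // Succeeds only if BOTH blocks fit, so one mesh never spans two pages.
    std::optional<PageOffsets> Allocate(uint64_t vertexBytes, uint64_t indexBytes) {
        const uint64_t v = AlignUp4(vertexBytes);
        const uint64_t i = AlignUp4(indexBytes);
        if (vertexHead_ + v + kMiddleGap + i > indexTail_) return std::nullopt; // page full
        PageOffsets out{vertexHead_, indexTail_ - i};
        vertexHead_ += v;
        indexTail_ -= i;
        return out;
    }

    // Deleting an object only adds to the aggregate hole count (holes are not reused).
    void Free(uint64_t vertexBytes, uint64_t indexBytes) {
        holeBytes_ += AlignUp4(vertexBytes) + AlignUp4(indexBytes);
    }

    // Middle space that a new allocation can still use.
    uint64_t UsableGap() const { return indexTail_ - vertexHead_ - kMiddleGap; }

    // Defragmentation trigger: more than 25% of the page is holes.
    bool NeedsDefrag() const { return holeBytes_ * 4 > size_; }

private:
    static uint64_t AlignUp4(uint64_t n) { return (n + 3) & ~uint64_t{3}; }

    uint64_t size_;
    uint64_t vertexHead_ = 0;
    uint64_t indexTail_;
    uint64_t holeBytes_ = 0;
};
```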

Lazy creation:
When a user creates a new tab, allocated memory = 0 MB.
User draws a bolt (solid): allocate Solid_Page_0 (4MB).
User draws a glass window: allocate Transparent_Page_0 (4MB).
User never draws a wireframe: Wireframe_Page remains null.

Resource states are combined, i.e. D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER | D3D12_RESOURCE_STATE_INDEX_BUFFER.
Feature           Decision              Benefit
Page Content      Single Type Only      Zero PSO switching during draw.
Growth Logic      Chained Doubling      4->8->16->32->64 MB. No moving old data.
Max Page Size     64 MB                 Prevents fragmentation failure on low-VRAM GPUs.
Allocation        Lazy (On Demand)      Keeps "Hello World" tabs lightweight.
Sub-Allocation    Double-Ended Stack    Maximizes usage for varying vertex/index ratios.
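The chained-doubling row can be sketched as a size schedule: the first page of a type is 4MB, each newly chained page doubles the previous one, growth stops at the 64MB cap, and a request larger than the cap cannot be served by any page. The skip-ahead for a single large mesh, the assumption that requestBytes already includes alignment and the 32-byte middle gap, and the name NextPageSize are all illustrative choices, not part of the spec.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint64_t kMB = 1024ull * 1024ull;
constexpr uint64_t kFirstPageSize = 4 * kMB;
constexpr uint64_t kMaxPageSize = 64 * kMB;

// Size of the next page to chain after a page of previousSize bytes
// (0 = this tab has no page of this type yet). Old pages are never moved.
// Returns 0 when the request exceeds the cap: use a dedicated "Big Buffer" instead.
inline uint64_t NextPageSize(uint64_t previousSize, uint64_t requestBytes) {
    if (requestBytes > kMaxPageSize) return 0;
    uint64_t size = previousSize == 0 ? kFirstPageSize
                                      : std::min(previousSize * 2, kMaxPageSize);
    while (size < requestBytes && size < kMaxPageSize) size *= 2; // skip ahead for big meshes
    return size;
}
```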

New geometry is appended (in the middle) only if both the new vertex and index data fit inside.
Otherwise, allocate a new buffer. The copy thread also does batching:
it aggregates all objects coming from the engineering thread that fit in the current buffer into a single GPU upload.
The copy thread should consume batches of updates,
coalescing them into single ExecuteCommandLists calls where possible to reduce API overhead.
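A sketch of the copy thread's coalescing step, assuming pending ADD requests arrive in FIFO order with their byte sizes and that a batch ends at the first object that no longer fits in the current page (so order is preserved). PendingAdd and TakeBatch are hypothetical; the real code would then record one copy command list for the whole batch and submit it with a single ID3D12CommandQueue::ExecuteCommandLists call.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

struct PendingAdd {
    uint64_t objectId;
    uint64_t bytes; // vertex + index bytes for this object
};

// Pops objects from the front of the queue while they fit in freeBytes.
// Returns the batch to upload together; the rest stays queued for the next page.
inline std::vector<PendingAdd> TakeBatch(std::deque<PendingAdd>& queue, uint64_t freeBytes) {
    std::vector<PendingAdd> batch;
    while (!queue.empty() && queue.front().bytes <= freeBytes) {
        freeBytes -= queue.front().bytes;
        batch.push_back(queue.front());
        queue.pop_front();
    }
    return batch;
}
```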

"Big Buffer" fallback: if Allocation_Size > Max_Page_Size,
allocate a dedicated committed resource just for that object, bypassing the paging system.
This handles large STL files or terrain maps. Treat "Big Buffers" as a special page type. Add a "Large Object List" to your loop.
Do not try to jam them into the standard EI logic if they require unique resource bindings per object.
1 separate draw command for such jumbo objects.

Create a separate std::vector<BigObject> in the Tab structure. Rendering:
Loop through pages (ExecuteIndirect).
Loop through BigObjects (standard DrawIndexedInstanced or EI with count 1).

Defragmentation logic:
The copy queue marks the page for defragmentation. All frames of that tab freeze; we keep presenting the previous render output.
Any 1 of the render threads/queues reads the mark, transitions the resource to COMMON, and signals a fence.
The copy queue picks it up and, once defragmented, returns the new resource.
I am willing to accept a freeze of a few frames on screen.
This is a recognised engineering trade-off, acceptable to CAD users.

EI argument buffers are tightly coupled to the memory pages.
When we defragment a page, we must simultaneously rebuild its corresponding argument buffer.
Do not try to "patch" the argument buffer; regenerate it for that page.

Growth logic: similar to defragmentation above. How does the copy queue handle asynchronous (without blocking the render thread)
addition of 1 small geometry, say 10KB, to an already existing 64MB heap of which 50MB is filled?
All views/frames of that particular tab freeze. However, other tabs handled by the render thread keep processing.
No thread stall. Transition that page to COPY_DEST. Copy the new data.
Transition it back to a render state for the render thread to pick up.

FREEZE LOGIC:
Render-to-texture implements the frame freeze, since the swap chain uses FLIP_DISCARD (back-buffer contents are not preserved after Present).
Side benefits: HDR handling, UI composition, multi-monitor flexibility, eviction safety, clean defrag freezes.

Known Issues / Limitations (to be resolved in a later revision):
Transparency sorting: accepting imperfect sorting for "Glass" pages during rotation,
    and doing a CPU sort + args rebuild only when the camera stops.
Hot page for object drag / active mutation.
Evict logic.
Compute shader frustum culling.
Telemetry: per-tab VRAM usage graphs, page fragmentation heatmap, eviction frequency counters,
    copy queue stall tracking.
Selection highlighter methodology.
Mesh shaders on supported hardware (RTX 2000 series onwards, RX 6000 series onwards).
Instance-based LOD optimization, optionally using a compute shader.

Miscellaneous Specification:
There will be a uniform object ID (64-bit), unique across all objects in the entire process memory.
Each object can have up to 16? different simultaneous variations of vertex geometry / graphics representation.
We are expecting 1000 to 5000 draw calls per frame?
How should I handle multiple partially overlapping windows?
Each window can be independently resized or maximized / minimized.
The lowest distance between an object and ALL the different view camera positions shall be used by the logic threads
    to decide the level of detail.
There will be some mechanism to manage memory pressure,
to signal the logic threads to reduce the level of detail within some distance.
Our GPU memory manager will be a singleton. There will be only 1 instance of that class managing the entire GPU memory.
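A sketch of the "closest view wins" LOD rule: the logic threads take the minimum distance from an object to every view camera and map it to a level. The distance thresholds, the detailScale knob for memory pressure, and the names Vec3 and LodLevelFor are assumptions for illustration; the spec does not define them yet.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

inline float Distance(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Returns 0 for the finest level. detailScale < 1 (e.g. under memory pressure)
// makes coarser levels kick in closer to the camera.
inline int LodLevelFor(const Vec3& objectCenter, const std::vector<Vec3>& cameraPositions,
                       float detailScale = 1.0f) {
    assert(!cameraPositions.empty() && detailScale > 0.0f);
    float nearest = std::numeric_limits<float>::max();
    for (const Vec3& cam : cameraPositions)
        nearest = std::min(nearest, Distance(objectCenter, cam)); // closest view wins

    const float thresholds[] = {10.0f, 50.0f, 200.0f}; // hypothetical, in scene units
    int level = 0;
    for (float t : thresholds)
        if (nearest > t * detailScale) ++level;
    return level;
}
```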

Consider a desktop PC with 2 discrete graphics cards and 1 integrated graphics card,
with 1 monitor connected and active on each of these 3 devices.
We use exactly 1 device for rendering to all monitors!
Windows 10/11 WDDM supports heterogeneous multi-adapter. When a window moves, DWM composites the surfaces,
and frames are copied across adapters if needed. This works but is slow, since every frame must traverse the PCIe bus.

TO-DO LIST: As things get completed,
    they will be removed from this pending list and incorporated appropriately in the design document.

Phase 1: The Visual Baseline (Get these out of the way)
[Done] Update vertex format to include normals. (Required for lighting.)
[Done] Hemispherical lighting in shader. (Verify normals are correct.)
[Done] Mouse zoom/pan/rotate (basic).

Phase 2: The "Freeze" Infrastructure
Before you break the memory model, build the mechanism that hides the breakage.
[Done] Render To Texture (RTT) & full-screen quad. Goal: detach the "drawing" from the "presenting."

Phase 3: The API Pivot (The Hardest Part)
Switching to ExecuteIndirect changes how you pass data. Do this BEFORE implementing custom heaps to isolate variables.
[Done] Implement structured buffer for world matrix: StructuredBuffer<float4x4> and a root constant index.
Critical: We cannot do ExecuteIndirect for multiple objects without a way to tell the shader which object
    is being drawn.
[ ] DrawIndexedInstanced → ExecuteIndirect (EI).
Advice: Implement this using your current committed resources first. Just get the API call working.
[ ] Double-buffered ExecuteIndirect arguments.

Phase 4: The Memory Manager (The "Vishwakarma" Core)
Now that EI is working, replace the backing memory.
[ ] [MISSING] Global upload ring buffer.
Critical: The copy thread needs a staging area. If we don't build this,
our "VRAM Pages" step will stall waiting on CreateCommittedResource for uploads.
[ ] VRAM pages per tab (the stack allocator). Advice: Implement the "Double-Ended Stack" (vertex up, index down) here.
[ ] CPU-side free list allocator. (The logic that tracks the holes.)
[ ] Tab management / view management. (Integrating the heaps into the UI.)

Phase 5: Advanced Features & Polish
[ ] VRAM defragmentation. (Now safe to implement because RTT exists.)
[ ] Click selection / window selection. (Requires raycasting against your CPU free list/data structures.)
[ ] Instanced optimization for pipes.
[ ] SSAO.
[ ] Upgrade vertices to HDR + tone mapping.
[ ] Transparency sorting. (CPU sort + args rebuild when the camera stops moving.)

Phase 6: Performance & Telemetry
[ ] Per-tab VRAM usage graphs. (Helps identify memory leaks or inefficient usage.)
[ ] Page fragmentation heatmap. (Visualize which pages are most fragmented.)
[ ] Eviction frequency counters. (Track how often eviction occurs and its impact on performance.)
[ ] Copy queue stall tracking. (Identify bottlenecks in the copy thread.)

Phase 7: Extreme Performance Optimizations (only after all of the above is done and stable)
[ ] LOD optimization. (Using instancing or compute shaders to manage levels of detail based on camera distance.)
[ ] Compute shader frustum culling. (To reduce the number of objects the GPU draws.)
[ ] Mesh shader implementation. (For supported hardware, to further reduce draw call overhead.) (Only for pipes)
[ ] GPU-based defragmentation. (Offload defragmentation to the GPU to minimize CPU stalls.)
[ ] Asynchronous resource creation. (Create resources off the render thread to further reduce stalls
  during heap growth or defragmentation.)
[ ] Page-level optimization: static pages → single draw, semi-dynamic pages → EI,
  highly dynamic pages → EI + GPU compaction.

Not-to-do list:
Multi-GPU rendering. (Too complex for initial implementation, and Windows' multi-adapter support is limited.)
Face-wise geometry colors. (Implementation detail.) Maybe necessary for future mechanical parts.

*/

#include <DirectXMath.h>
#include <cmath> // hypotf, atan2f, cosf, sinf

struct CameraState { // Each view gets its own camera state.
    // This is part of the "View" data structure, not the "Tab" data structure. Each tab can have multiple views.
    DirectX::XMFLOAT3 position;
    DirectX::XMFLOAT3 target;
    DirectX::XMFLOAT3 up;
    float fov;
    float aspect;
    float nearZ;
    float farZ;

    CameraState() { Initialize(); }
    void Initialize() {
        position = { 0.0f, -10.0f, 2.0f };
        target = { 0.0f, 0.0f,  0.0f };
        up = { 0.0f, 0.0f,  1.0f }; // Z-Up is perfect for an XY orbit.

        fov = DirectX::XMConvertToRadians(60.0f);
        aspect = 1.0f; // SAFE DEFAULT
        nearZ = 0.1f;
        farZ = 1000.0f;
    }
};

inline void InitCamera(CameraState& cam, float aspectRatio)
{
    cam.Initialize();          // Same defaults as the constructor.
    cam.aspect = aspectRatio;
}

// Orbits the camera around its target in the XY plane by deltaAngle radians per call.
// The angle is derived from the camera's current position, so every view orbits
// independently and the first frame does not jump. (A function-level static angle
// would be shared by all cameras and threads, and would start at 0 regardless of position.)
inline void UpdateCameraOrbit(CameraState& cam, float deltaAngle = 0.002f)
{
    // 2D radius from the target on the XY plane. We ignore Z here to prevent the "spiral away" bug.
    float dx = cam.position.x - cam.target.x;
    float dy = cam.position.y - cam.target.y;
    float radius = hypotf(dx, dy);
    if (radius < 0.001f) { // Safety check: a zero radius would lock the camera.
        radius = 10.0f;
        dx = 0.0f;
        dy = -radius;
    }

    float angle = atan2f(dy, dx) + deltaAngle;
    float x = cam.target.x + cosf(angle) * radius; // Orbit in XY plane
    float y = cam.target.y + sinf(angle) * radius;
    float z = cam.position.z; // Z remains static (height)
    cam.position = { x, y, z };
}

Actual Code of our graphics engine.

// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.
#pragma once

// DirectX 12 headers. The best place to learn DirectX 12 is the original Microsoft documentation.
// https://learn.microsoft.com/en-us/windows/win32/direct3d12/direct3d-12-graphics
// You need a good dose of prior C++ knowledge and computer fundamentals before learning DirectX 12.
// Expect to read it at least twice before you start grasping it!

// Shader debug information: pass D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION (both defined in
// d3dcompiler.h) as compile flags to D3DCompileFromFile in debug builds. Do not #define
// D3DCOMPILE_DEBUG here; d3dcompiler.h already defines it. TODO: Drop these flags from production builds.
#include <d3d12.h> //Main DirectX12 API. Included from %WindowsSdkDir\Include%WindowsSDKVersion%\\um
//helper structures Library. MIT Licensed. Added to the project as git submodule.
//https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12.h
#include <d3dx12.h>
#include <dxgi1_6.h>
#include <dxgidebug.h>
#include <wrl.h>
#include <d3dcompiler.h>
#include <DirectXMath.h> // Ships with the Windows SDK; also on GitHub: https://github.com/Microsoft/DirectXMath
#include <vector>
#include <string>
#include <unordered_map>
#include <random>
#include <ctime>
#include <iostream>
#include <thread>
#include <chrono>
#include <map>
#include <list>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <atomic>
#include <optional>
#include <stdexcept>

#include "ConstantsApplication.h"
#include "MemoryManagerGPU.h"
#include "डेटा.h" // "डेटा" is Hindi for "data".

using namespace Microsoft::WRL;

//DirectX12 Libraries.
#pragma comment(lib, "d3d12.lib") //%WindowsSdkDir\Lib%WindowsSDKVersion%\\um\arch
#pragma comment(lib, "dxgi.lib")
#pragma comment(lib, "d3dcompiler.lib")
#pragma comment(lib, "dxguid.lib")

/* Double buffering is preferred for a CAD application due to its low input lag. Caveat: if rendering time
exceeds the frame refresh interval, stuttering will appear. However,
the lower input latency outweighs the slightly smoother frame pacing of triple buffering.
Double buffering (2 buffers) also needs one-third less render-target memory than triple buffering (3 buffers). */
const UINT FRAMES_PER_RENDERTARGETS = 2; //Initially we are going with double buffering.

// Constants
constexpr UINT64 MaxVertexBufferSize = 1024 * 1024 * 64; // 64 MB
constexpr UINT64 MaxIndexBufferSize = 1024 * 1024 * 16; // 16 MB

/* DirectX 12 resources are organized at 3 levels:
1. The Data   : Per Tab (Jumbo Buffers for geometry data, materials, textures, etc.)
2. The Target : Per Window (Swap Chain, Render Targets, Depth Stencil Buffer etc.)
3. The Worker : Per Render Thread. 1 for each monitor. (Command Queue, Command List etc.
    Resources shared across multiple windows on the same monitor) */
struct GpuResourceVertexIndexInfo; //Forward declaration.
struct DX12ResourcesPerTab { // (The Data) Geometry Data
    // Since data is isolated per tab, these live here. We use a "Jumbo" buffer approach to reduce switching.
    ComPtr<ID3D12Resource> vertexBuffer;
    ComPtr<ID3D12Resource> indexBuffer;

    // Upload Heaps (CPU -> GPU Transfer)
    // Moved here because the Copy Thread writes to these when adding objects to the TAB.
    ComPtr<ID3D12Resource> vertexBufferUpload;
    ComPtr<ID3D12Resource> indexBufferUpload;

    // Persistent Mapped Pointers (CPU Address)
    UINT8* pVertexDataBegin = nullptr; // Pointer for mapped vertex upload buffer
    UINT8* pIndexDataBegin = nullptr;  // Pointer for mapped index upload buffer

    // Views into the buffers (to be bound during Draw)
    D3D12_VERTEX_BUFFER_VIEW vertexBufferView;
    D3D12_INDEX_BUFFER_VIEW indexBufferView;

    // TODO: We will generalize this to hold materials, shaders, textures etc. unique to this project/tab
    ComPtr<ID3D12DescriptorHeap> srvHeap;

    mutable std::mutex objectsOnGPUMutex; // mutable so that const references can lock it in rendering paths.
    // Copy thread will update the following map whenever it adds/removes/modifies an object on GPU.
    // Note: GpuResourceVertexIndexInfo is only forward-declared at this point; std::map with an
    // incomplete value type relies on standard-library support (it works with MSVC's STL).
    std::map<uint64_t, GpuResourceVertexIndexInfo> objectsOnGPU;
    std::atomic<uint64_t> lastCopyFenceValue{ 0 };
    std::atomic<uint64_t> lastRenderFenceValue{ 0 }; // TODO : Upgrade it for multi monitor.

    // Track how much of the jumbo buffer is used
    uint64_t vertexDataSize = 0;
    uint64_t indexDataSize = 0;

    ComPtr<ID3D12Resource> worldMatrixBuffer; // TODO: Double-buffer it. Or make it per Page ?
    UINT8* pWorldMatrixDataBegin = nullptr;
    uint32_t               matrixCapacity = 4096;
    uint32_t               matrixCount = 0;
    std::vector<uint32_t>  freeMatrixSlots;   // Free-list of matrix indices,
    // to enable re-use of slots when objects are removed.

    CameraState camera; // Updated per frame.
    // Currently per tab, but later we will have this per view, since each tab can have multiple views.
};

struct DX12ResourcesPerWindow { // Presentation Logic
    int WindowWidth = 800; // Current viewport (rendering area) size, excluding task bar etc.
    int WindowHeight = 600;
    ID3D12CommandQueue* creatorQueue = nullptr; // Tracks which queue this window was created with, to assist with migrations.

    ComPtr<IDXGISwapChain3>         swapChain; // The link to the OS Window
    //ComPtr<ID3D12CommandQueue>    commandQueue; // Moved to OneMonitorController
    ComPtr<ID3D12DescriptorHeap>    rtvHeap;
    ComPtr<ID3D12Resource>          renderTargets[FRAMES_PER_RENDERTARGETS];
    UINT rtvDescriptorSize = 0;

    // Render To Texture Infrastructure
    ComPtr<ID3D12Resource>          renderTextures[FRAMES_PER_RENDERTARGETS];
    ComPtr<ID3D12DescriptorHeap>    rttRtvHeap;
    ComPtr<ID3D12DescriptorHeap>    rttSrvHeap;
    DXGI_FORMAT                     rttFormat = DXGI_FORMAT_R8G8B8A8_UNORM;
    // TODO: When we implement HDR support, we will have to change the above format to the following.
    //DXGI_FORMAT                     rttFormat = DXGI_FORMAT_R16G16B16A16_FLOAT; // HDR ready

    ComPtr<ID3D12RootSignature>     rootSignature;
    ComPtr<ID3D12PipelineState>     pipelineState;

    ComPtr<ID3D12Resource> depthStencilBuffer; // Depth Buffer (Sized to the window dimensions)
    ComPtr<ID3D12DescriptorHeap> dsvHeap;

    D3D12_VIEWPORT viewport; // Viewport & Scissor (Dependent on Window Size).
    D3D12_RECT scissorRect;

    ComPtr<ID3D12Resource> constantBuffer;
    ComPtr<ID3D12DescriptorHeap> cbvHeap;
    UINT8* cbvDataBegin = nullptr;

    UINT frameIndex = 0; // Different from allocatorIndex in the render thread. It can change even during window resize.
};

struct DX12ResourcesPerRenderThread { // One of these is created for each monitor.
    // For convenience only. It simply points to OneMonitorController.commandQueue
    ComPtr<ID3D12CommandQueue> commandQueue;

    // Note that there are as many render threads as monitors attached.
    // Command Allocators MUST be unique to the thread.
    // We need one per frame-in-flight to avoid resetting while GPU is reading.
    ComPtr<ID3D12CommandAllocator> commandAllocators[FRAMES_PER_RENDERTARGETS];
    UINT allocatorIndex = 0; // Different from frameIndex, which is tracked per window.

    // The Command List (The recording pen). Can be reset and reused for multiple windows within the same frame.
    ComPtr<ID3D12GraphicsCommandList> commandList;

    // Synchronization (Per Window VSync)
    HANDLE fenceEvent = nullptr;
    ComPtr<ID3D12Fence> fence;
    UINT64 fenceValue = 0;
};

struct OneMonitorController { // Variables stored per monitor.
    // System Fetched information.
    bool isScreenInitalized = false;
    int screenPixelWidth = 800;
    int screenPixelHeight = 600;
    int screenPhysicalWidth = 0; // in mm
    int screenPhysicalHeight = 0; // in mm
    int WindowWidth = 800; // Current viewport (rendering area) size, excluding task bar etc.
    int WindowHeight = 600;

    HMONITOR hMonitor = NULL; // Monitor handle. Remains fixed as long as monitor is not disconnected / disabled.
    std::wstring monitorName;            // Monitor device name (e.g., "\\\\.\\DISPLAY1")
    std::wstring friendlyName;           // Human readable name (e.g., "Dell U2720Q")
    RECT monitorRect;                    // Full monitor rectangle
    RECT workAreaRect;                   // Work area (excluding task bar)
    int dpiX = 96;                       // DPI X
    int dpiY = 96;                       // DPI Y
    double scaleFactor = 1.0;            // Scale factor (100% = 1.0, 125% = 1.25, etc.)
    bool isPrimary = false;              // Is this the primary monitor?
    DWORD orientation = DMDO_DEFAULT;    // Monitor orientation
    int refreshRate = 60;                // Refresh rate in Hz
    int colorDepth = 32;                 // Color depth in bits per pixel

    bool isVirtualMonitor = false;       // To support headless mode.

    // DirectX12 Resources.
    ComPtr<ID3D12CommandQueue> commandQueue;    // Persistent. Survives thread restarts.
    bool hasActiveThread = false; // Whether this specific monitor is currently being serviced by a thread.
};
184
185// Commands sent from Generator thread(s) to the Copy thread
186enum class CommandToCopyThreadType { ADD, MODIFY, REMOVE };
187struct CommandToCopyThread
188{
189    CommandToCopyThreadType type;
190    std::optional<GeometryData> geometry; // Present for ADD and MODIFY
191    uint64_t id; // Always present
192    uint64_t tabID; // NEW: We must know which tab this object belongs to!
193};
194// Thread synchronization between Main Logic thread and Copy thread
195extern std::mutex toCopyThreadMutex;
196extern std::condition_variable toCopyThreadCV;
197extern std::queue<CommandToCopyThread> commandToCopyThreadQueue;
198
199extern std::atomic<bool> pauseRenderThreads; // Defined in Main.cpp
// Represents the complete geometry and index data associated with 1 engineering object.
// This structure holds information about a resource allocated in GPU memory (VRAM).
struct GpuResourceVertexIndexInfo {
    ComPtr<ID3D12Resource> vertexBuffer;
    D3D12_VERTEX_BUFFER_VIEW vertexBufferView{};
    ComPtr<ID3D12Resource> indexBuffer;
    D3D12_INDEX_BUFFER_VIEW indexBufferView{};
    UINT indexCount = 0;
    uint32_t matrixIndex = 0;

    // TODO: Later on we will generalize this structure to hold textures, materials, shaders etc.
    // Currently we let the driver manage GPU memory fragmentation. Later we will manage it ourselves.
    //uint64_t vramOffset; // Simulated VRAM address
    //uint64_t size;
};

// Packet of work for a Render Thread for one frame
struct RenderPacket {
    uint64_t frameNumber = 0;
    std::vector<uint64_t> visibleObjectIds;
};

// Simple exception helper for HRESULT checks
class HrException : public std::runtime_error
{
public:
    explicit HrException(HRESULT hr) : std::runtime_error("HRESULT Exception"), hr(hr) {}
    HRESULT Error() const { return hr; }
private:
    const HRESULT hr;
};

// Throws HrException if a D3D12 / DXGI call returned a failure HRESULT.
inline void ThrowIfFailed(HRESULT hr)
{
    if (FAILED(hr)) { throw HrException(hr); }
}

// Thread-safe FIFO of commands for the GPU Copy thread.
class ThreadSafeQueueGPU {
public:
    void push(CommandToCopyThread value) {
        std::lock_guard<std::mutex> lock(mutex);
        fifoQueue.push(std::move(value));
        cond.notify_one(); // Wakes one thread waiting on cond, if any.
    }

    // Non-blocking pop: returns false immediately when the queue is empty.
    bool try_pop(CommandToCopyThread& value) {
        std::lock_guard<std::mutex> lock(mutex);
        if (fifoQueue.empty()) {
            return false;
        }
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    // Shuts down the queue: sets the shutdown flag and wakes every thread waiting on cond.
    void shutdownQueue() {
        std::lock_guard<std::mutex> lock(mutex);
        shutdown = true;
        cond.notify_all();
    }

private:
    std::queue<CommandToCopyThread> fifoQueue; // FIFO = First-In, First-Out
    std::mutex mutex;
    std::condition_variable cond;
    bool shutdown = false;
};

inline ThreadSafeQueueGPU g_gpuCommandQueue;

// VRAM Manager: This class handles GPU memory dynamically.
// There will be exactly 1 object of this class in the entire application, hence the special name.
// May the grace of Lord Shankar remain upon us (भगवान शंकर की कृपा बनी रहे). The corresponding object is named "gpu".
class शंकर {
public:
    //std::vector<OneMonitorController> screens;
    OneMonitorController screens[MV_MAX_MONITORS];
    int currentMonitorCount = 0; // Global monitor count. It can be 0 when no monitors are found (headless mode).

    // IDXGIFactory6 / IDXGIAdapter4 prerequisite: Windows 10 1803+ / Windows 11
    ComPtr<IDXGIFactory6> factory6;        // The OS-level display system manager. Can iterate over GPUs.
    ComPtr<IDXGIAdapter4> hardwareAdapter; // Represents a physical GPU.
    // Represents 1 logical GPU device on the above adapter. Creates all DirectX12 memory / resources / command objects etc.

    ComPtr<ID3D12Device> device; // Very important: we support EXACTLY 1 GPU device in this version.
    bool isGPUEngineInitialized = false; // TODO: To be implemented.

    // To be added later:
    //ID3D12DescriptorHeapMgr    ← Global descriptor allocator
    //Shader & PSO Cache         ← Shared by all threads
    //AdapterInfo                ← For device selection / VRAM stats

    /* We will have 1 render queue per monitor, owned by that monitor's Render Thread.
    IMPORTANT: A typical GPU has only 1 hardware 3D (graphics) engine. Even if 4 command lists are submitted
    to 4 independent direct queues, the graphics driver / WDDM scheduler serializes them on that engine.
    Still, we need 4 separate queues to properly handle different refresh rates.

    Ex: If we put all 4 windows on the same queue: Window A (60Hz) submits a Present command. The queue STALLS
    waiting for Monitor A's VSync interval. Window B (144Hz) then submits a draw command.
    Window B cannot be processed because the queue is blocked by Window A's VSync wait.
    By using 4 queues, Queue A can sit blocked waiting for VSync,
    while Queue B immediately pushes work to the GPU for the faster monitor. */

    ComPtr<ID3D12CommandQueue> renderCommandQueue; // Only used by Monitor No. 0, i.e. the 1st Render Thread.
    ComPtr<ID3D12Fence> renderFence; // Synchronization for the Render Queue
    UINT64 renderFenceValue = 0;
    HANDLE renderFenceEvent = nullptr;

    ComPtr<ID3D12CommandQueue> copyCommandQueue; // There is only 1 across the application.
    ComPtr<ID3D12Fence> copyFence; // Synchronization for the Copy Queue
    UINT64 copyFenceValue = 0;
    HANDLE copyFenceEvent = nullptr;

public:
    UINT8* pVertexDataBegin = nullptr; // Pointer to the mapped vertex upload buffer
    UINT8* pIndexDataBegin = nullptr;  // Pointer to the mapped index upload buffer

    // Maps our CPU ObjectID to its resource info in VRAM
    std::unordered_map<uint64_t, GpuResourceVertexIndexInfo> resourceMap;

    // Simulates a simple heap allocator with 16 MB chunks
    uint64_t m_nextFreeOffset = 0;
    static constexpr uint64_t CHUNK_SIZE = 16ull * 1024 * 1024;
    uint64_t m_vram_capacity = 4 * CHUNK_SIZE; // Simulates 64 MB of VRAM

    // When an object is updated, its old VRAM allocation is put here to be freed later,
    // once the GPU has completed every frame that may still reference it.
    struct DeferredFree {
        uint64_t frameNumber; // The frame in which it became obsolete
        GpuResourceVertexIndexInfo resource;
    };
    std::list<DeferredFree> deferredFreeQueue;

    // Allocate space in VRAM and return its handle. TODO: Is this still used?
    // std::optional<GpuResourceVertexIndexInfo> Allocate(size_t size);

    void ProcessDeferredFrees(uint64_t lastCompletedRenderFrame);

    शंकर() {} // Our main function (wWinMain) initializes DirectX12 global resources by calling InitD3DDeviceOnly().
    void InitD3DDeviceOnly();
    void InitD3DPerTab(DX12ResourcesPerTab& tabRes); // Call this when a new Tab is created
    void InitD3DPerWindow(DX12ResourcesPerWindow& dx, HWND hwnd, ID3D12CommandQueue* commandQueue);
    void PopulateCommandList(ID3D12GraphicsCommandList* cmdList, // Called by the per-monitor render thread.
        DX12ResourcesPerWindow& winRes, const DX12ResourcesPerTab& tabRes);
    void WaitForPreviousFrame(DX12ResourcesPerRenderThread dx);
    void ResizeD3DWindow(DX12ResourcesPerWindow& dx, UINT newWidth, UINT newHeight);

    // Called when a monitor is unplugged or a window is destroyed. Destroys the SwapChain/RTVs but KEEPS the geometry.
    void CleanupWindowResources(DX12ResourcesPerWindow& winRes);
    // Called when a TAB is closed by the user. Destroys the Jumbo Vertex/Index Buffers.
    void CleanupTabResources(DX12ResourcesPerTab& tabRes);
    // Called ONLY at application exit (end of wWinMain). Destroys the Device, Factory, and global Copy Queue.
    // Thread resources are cleaned up by each Render Thread itself before it exits.
    void CleanupD3DGlobal();
};

void FetchAllMonitorDetails();
BOOL CALLBACK MonitorEnumProc(HMONITOR hMonitor, HDC hdcMonitor, LPRECT lprcMonitor, LPARAM dwData);

/*
IID_PPV_ARGS is a macro used in DirectX (and COM programming in general) to safely and correctly
retrieve interface pointers during object creation or querying. Besides saving repetitive typing, it derives
the interface ID from the pointer's type at compile time, so the IID and the output pointer cannot mismatch.
COM interfaces are identified by unique GUIDs (IIDs). The output pointer is then passed as a generic void**.

Ex: IID_PPV_ARGS(&device) expands to two function arguments, roughly equivalent to:
IID iid = __uuidof(ID3D12Device);
void** ppv = reinterpret_cast<void**>(&device);
*/

// Constant buffer layout for the transformation matrices (currently the combined view-projection matrix).
// Note: a CBV's SizeInBytes must be a multiple of 256 bytes, so the GPU buffer backing it needs at least 256 bytes.
struct ConstantBuffer {
    DirectX::XMFLOAT4X4 viewProj;   // 64 bytes
};

// Externs for inter-thread communication
extern std::atomic<bool> shutdownSignal;
extern ThreadSafeQueueGPU g_gpuCommandQueue;

// Logic Thread "Fence"
extern std::mutex g_logicFenceMutex;
extern std::condition_variable g_logicFenceCV;
extern uint64_t g_logicFrameCount;

// Copy Thread "Fence"
extern std::mutex g_copyFenceMutex;
extern std::condition_variable g_copyFenceCV;
extern uint64_t g_copyFrameCount;

// TODO: Implement this. A real allocator would manage free lists and possibly defragment memory.
/*
std::optional<GpuResourceVertexIndexInfo> शंकर::Allocate(size_t size) {

    if (m_nextFreeOffset + size > m_vram_capacity) {
        std::cerr << "VRAM MANAGER: Out of memory!" << std::endl;
        // Here, the Main Logic thread would be signaled to reduce LOD.
        return std::nullopt;
    }
    // Requires the vramOffset / size members that are currently commented out in GpuResourceVertexIndexInfo.
    GpuResourceVertexIndexInfo info{};
    info.vramOffset = m_nextFreeOffset;
    info.size = size;
    m_nextFreeOffset += size; // Simple bump allocator
    return info;
}*/

// Utility Functions

// Waits for the previous frame to complete rendering.
// NOTE: The commented-out body uses commandQueue / fence / fenceEvent / fenceValue, which are no longer members of
// DX12ResourcesPerWindow (the queue now lives in OneMonitorController, the fence in DX12ResourcesPerRenderThread).
inline void WaitForGpu(DX12ResourcesPerWindow dx)
{   // TODO: Where are we using this function?
    /*
    dx.commandQueue->Signal(dx.fence.Get(), dx.fenceValue);
    dx.fence->SetEventOnCompletion(dx.fenceValue, dx.fenceEvent);
    WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    dx.fenceValue++;*/
}

// Waits for a specific fence value to be reached
inline void WaitForFenceValue(DX12ResourcesPerWindow dx, UINT64 fenceValue)
{   // TODO: Where are we using this?
    /*
    if (dx.fence->GetCompletedValue() < fenceValue)
    {
        ThrowIfFailed(dx.fence->SetEventOnCompletion(fenceValue, dx.fenceEvent));
        WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    }*/
}

// Thread-shared state and thread entry points
// Thread synchronization between the Main Logic thread and the Copy thread
inline std::mutex toCopyThreadMutex;
inline std::condition_variable toCopyThreadCV;
inline std::queue<CommandToCopyThread> commandToCopyThreadQueue;

// Thread entry points (declarations only; defined elsewhere).
void GpuCopyThread();
void GpuRenderThread(int monitorId, int refreshRate);
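ThreadSafeQueueGPU declares a condition variable and a shutdown flag, but it only exposes a non-blocking try_pop, so nothing in the header ever waits on cond or reads shutdown. Below is a minimal sketch of a blocking pop that uses both. It is a standalone template with hypothetical names (BlockingQueueSketch, wait_pop, shutdown), not the project's actual class.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical sketch: same push/shutdown shape as ThreadSafeQueueGPU, plus a blocking
// wait_pop() that actually consumes the condition variable and the shutdown flag.
template <typename T>
class BlockingQueueSketch {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            fifo_.push(std::move(value));
        }
        cond_.notify_one(); // Wake one consumer blocked in wait_pop().
    }

    // Blocks until an item is available or shutdown() has been called.
    // Returns std::nullopt only when the queue is shut down AND drained.
    std::optional<T> wait_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks the condition, so spurious wakeups are harmless.
        cond_.wait(lock, [this] { return shutdown_ || !fifo_.empty(); });
        if (fifo_.empty()) {
            return std::nullopt; // Shut down and nothing left to process.
        }
        std::optional<T> value(std::move(fifo_.front()));
        fifo_.pop();
        return value;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        cond_.notify_all(); // Wake every blocked consumer so it can observe shutdown_.
    }

private:
    std::queue<T> fifo_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool shutdown_ = false;
};
```

A Copy thread could then loop with `while (auto cmd = queue.wait_pop()) { /* process *cmd */ }`. The loop exits once shutdown is requested and the remaining commands have been drained. Draining before exiting is a design choice. Returning immediately on shutdown would be equally valid if pending commands may be discarded.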
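The "Fence" externs (g_logicFenceMutex / g_logicFenceCV / g_logicFrameCount and their Copy thread counterparts) suggest a CPU-side analogue of an ID3D12Fence. One thread publishes a monotonically increasing value, and another thread blocks until that value is reached. Below is a minimal sketch of that pattern, assuming one counter per fence. CpuFenceSketch, Signal, WaitFor and Completed are hypothetical names, not the project's code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of a CPU "fence": a monotonically increasing counter guarded by a
// mutex and a condition variable, mirroring the g_*FenceMutex / g_*FenceCV / g_*FrameCount trio.
class CpuFenceSketch {
public:
    // Producer side: publish that `value` has been reached (e.g. a finished logic frame).
    void Signal(uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > completed_) {
                completed_ = value; // Never move the counter backwards.
            }
        }
        cv_.notify_all();
    }

    // Consumer side: block until the counter is at least `value`.
    void WaitFor(uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }

    // Comparable in spirit to ID3D12Fence::GetCompletedValue().
    uint64_t Completed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t completed_ = 0;
};
```

Keeping the counter monotonic lets a waiter use a simple "completed >= target" test. The commented-out WaitForFenceValue makes the same comparison against GetCompletedValue() before it blocks on the fence event.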
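The commented-out शंकर::Allocate is a bump allocator. It hands out the next free offset and advances it, and it fails once the simulated 64 MB capacity is exhausted. Below is a standalone sketch under the same simulated-VRAM assumption (no D3D12 calls). The optional power-of-two alignment parameter is an addition that is not in the original. VramAllocationSketch and BumpAllocatorSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <optional>

// Hypothetical stand-in for the vramOffset / size fields commented out in GpuResourceVertexIndexInfo.
struct VramAllocationSketch {
    uint64_t vramOffset = 0;
    uint64_t size = 0;
};

// Minimal bump allocator over a simulated VRAM range.
class BumpAllocatorSketch {
public:
    explicit BumpAllocatorSketch(uint64_t capacity) : capacity_(capacity) {}

    // `alignment` must be a power of two (e.g. 256, the D3D12 constant buffer placement alignment).
    std::optional<VramAllocationSketch> Allocate(uint64_t size, uint64_t alignment = 1) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current offset up to the next multiple of `alignment`.
        const uint64_t offset = (nextFreeOffset_ + alignment - 1) & ~(alignment - 1);
        // Written this way so that offset + size cannot overflow.
        if (offset > capacity_ || size > capacity_ - offset) {
            std::cerr << "VRAM MANAGER: Out of memory!\n";
            return std::nullopt;
        }
        nextFreeOffset_ = offset + size;
        return VramAllocationSketch{offset, size};
    }

    uint64_t Used() const { return nextFreeOffset_; }

private:
    uint64_t capacity_;
    uint64_t nextFreeOffset_ = 0;
};
```

A bump allocator cannot free individual blocks. This is why the original TODO says that a real allocator would need free lists and possibly defragmentation.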
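The deferredFreeQueue / ProcessDeferredFrees pair delays releasing replaced VRAM until the GPU has completed the frame in which that VRAM became obsolete. Below is a sketch of what ProcessDeferredFrees might do. It assumes entries are appended in non-decreasing frameNumber order. FakeGpuResource, DeferredFreeSketch and ProcessDeferredFreesSketch are hypothetical stand-ins that record names instead of releasing D3D12 resources.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for GpuResourceVertexIndexInfo. In the real struct, the ComPtr
// members would release the D3D12 resources when the entry is destroyed.
struct FakeGpuResource {
    std::string name;
};

struct DeferredFreeSketch {
    uint64_t frameNumber; // The frame in which the resource became obsolete
    FakeGpuResource resource;
};

// Releases every entry whose frame the GPU has already completed. Because entries are assumed
// to be queued in non-decreasing frame order, we can stop at the first entry still in flight.
std::vector<std::string> ProcessDeferredFreesSketch(std::list<DeferredFreeSketch>& queue,
                                                    uint64_t lastCompletedRenderFrame) {
    std::vector<std::string> released;
    while (!queue.empty() && queue.front().frameNumber <= lastCompletedRenderFrame) {
        released.push_back(queue.front().resource.name);
        queue.pop_front(); // Destroys the entry. With real ComPtr members, this is the actual release.
    }
    return released;
}
```

With the ordering assumption, each call touches only the entries it releases plus one. If objects could be retired out of order, a full scan with `std::list::remove_if` would be the safer choice.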