
Commit 06a8620: "Next version of The Command Queue" (1 parent: 952b319)

1 file changed: 132 additions, 5 deletions

next/getting-started/the-command-queue.md

@@ -1,11 +1,13 @@
The Command Queue <span class="bullet">🟢</span>
=================

```{lit-setup}
:tangle-root: 015 - The Command Queue - Next
:parent: 010 - The Device - Next
```

*Resulting code:* [`step015-next`](https://github.com/eliemichel/LearnWebGPU-Code/tree/step015-next)
Now that we have a `WGPUDevice` object in our hands, we can use it to **send data and instructions** to the GPU. In this chapter we learn a **key concept** of WebGPU (and of most modern graphics APIs as well): **the command queue**.

```{important}
@@ -33,7 +35,7 @@ They are not too far, but for high performance applications like real time graph
```{themed-figure} /images/command-queue/bandwidth_{theme}.svg
:align: center

The bandwidth tells **how much information** can travel at the same time.
```

Since the GPU is meant for **massive parallel data processing**, its performance can easily be **bound by memory transfers** rather than by the actual computation.
@@ -49,7 +51,7 @@ The connection between the **CPU memory** (RAM) and **GPU memory (vRAM)** depend
```{themed-figure} /images/command-queue/latency_{theme}.svg
:align: center

The latency is **the time it takes** for each bit to travel.
```

**Even the smallest bit of information** needs some time for the round trip to the GPU and back. As a consequence, functions that send instructions to the GPU return almost immediately: they **do not wait for the instruction to have actually been executed**, because that would require waiting for the GPU to transfer back the "I'm done" information.
@@ -217,11 +219,136 @@ std::cout << "Command submitted." << std::endl;
Waiting for completion
----------------------

As repeated and illustrated above, instructions submitted to the GPU get executed at their own pace. In many cases this is not a big issue, as long as the instructions are executed in the right order.

Sometimes, however, we really want to **have the CPU wait until submitted instructions have been executed**. For that, we may use the function `wgpuQueueOnSubmittedWorkDone`. This creates **an asynchronous operation that does nothing on the GPU side**, or more precisely nothing other than signaling that everything received before it has been executed.

Like any asynchronous operation, it takes a **callback info** as argument and returns a `WGPUFuture`. In this case, it only takes one other argument, namely the queue into which we push the operation:

```C++
// Signature of the wgpuQueueOnSubmittedWorkDone function in webgpu.h
WGPUFuture wgpuQueueOnSubmittedWorkDone(
    WGPUQueue queue,
    WGPUQueueWorkDoneCallbackInfo callbackInfo
);
```

As usual, the callback info contains a `nextInChain` pointer for extensions, a `mode`, two `userdata` pointers and a `callback` function pointer. The latter must have the following type:

```C++
// Definition of the WGPUQueueWorkDoneCallback function type in webgpu.h
typedef void (*WGPUQueueWorkDoneCallback)(
    WGPUQueueWorkDoneStatus status,
    void* userdata1,
    void* userdata2
);
```

In other words, all the callback receives besides our potential `userdata` pointers is a status.

````{warning}
The returned status **does not** tell whether the **other** operations succeeded. All it says is whether the query for the work-done notification itself succeeded. Possible values are:

- `WGPUQueueWorkDoneStatus_Success` when the query operation went well.
- `WGPUQueueWorkDoneStatus_InstanceDropped` when the WebGPU instance was dropped before the previous instructions were executed. The callback is executed nonetheless, but with this special status value.
- `WGPUQueueWorkDoneStatus_Error` when something went wrong in the process (which is probably a bad sign about the overall course of your program).
````

Inspired by what we did when requesting the adapter and the device, we can create a callback that takes a boolean as its first user pointer and sets it to `true` whenever the callback is invoked:

```{lit} C++, Wait for completion
// Our callback, invoked when GPU instructions have been executed
auto onQueuedWorkDone = [](
    WGPUQueueWorkDoneStatus status,
    void* userdata1,
    void* /* userdata2 */
) {
    // Display a warning when the status is not a success
    if (status != WGPUQueueWorkDoneStatus_Success) {
        std::cout << "Warning: wgpuQueueOnSubmittedWorkDone failed, this is suspicious!" << std::endl;
    }

    // Interpret userdata1 as a pointer to a boolean (and turn it into a
    // mutable reference), then set it to 'true'
    bool& workDone = *reinterpret_cast<bool*>(userdata1);
    workDone = true;
};

// Create the boolean that will be passed to the callback as userdata1
// and initialize it to 'false'
bool workDone = false;

// Create the callback info
WGPUQueueWorkDoneCallbackInfo callbackInfo = WGPU_QUEUE_WORK_DONE_CALLBACK_INFO_INIT;
callbackInfo.mode = WGPUCallbackMode_AllowProcessEvents;
callbackInfo.callback = onQueuedWorkDone;
callbackInfo.userdata1 = &workDone; // pass the address of workDone

// Add the async operation to the queue
wgpuQueueOnSubmittedWorkDone(queue, callbackInfo);

{{Wait for workDone to be true}}

std::cout << "All queued instructions have been executed!" << std::endl;
```

To wait for the callback to effectively be invoked, and thus for `workDone` to become `true`, we **reuse the same loop as before**, which calls `wgpuInstanceProcessEvents` interleaved with a small sleep:

```{lit} C++, Wait for workDone to be true
// Hand execution over to the WebGPU instance until onQueuedWorkDone gets invoked
wgpuInstanceProcessEvents(instance);
while (!workDone) {
#ifdef __EMSCRIPTEN__
    emscripten_sleep(200);
#else
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
#endif
    wgpuInstanceProcessEvents(instance);
}
```

```{note}
Again, we will see in the [next chapter](playing-with-buffers.md) a more fine-grained method to wait for asynchronous operations, using the `WGPUFuture` handle returned by `wgpuQueueOnSubmittedWorkDone`; but for this case, using `wgpuInstanceProcessEvents` works well.
```

You should finally see something like this in your program's output:

```
Submitting command...
Command submitted.
All queued instructions have been executed!
Device 0000009EC06FF4F0 was lost: reason 3 (A valid external Instance reference no longer exists.)
```

````{tip}
It is **normal that our device gets lost** at the end of our program. We can **change the reported reason a bit**, though, by making sure the instance realizes that the device has been released before we release the instance itself. To do so, you may call `wgpuInstanceProcessEvents` between `wgpuDeviceRelease` and `wgpuInstanceRelease`:

```C++
// At the end
wgpuQueueRelease(queue);
wgpuDeviceRelease(device);
// We clean up the WebGPU instance
wgpuInstanceProcessEvents(instance); // <-- add this!
wgpuInstanceRelease(instance);
```

You should now see something like this in your output:

```
Device 000000D266AFEF50 was lost: reason 2 (Device was destroyed.)
```
````
Conclusion
----------

We have seen a few important notions in this chapter:

- The CPU and the GPU live in **different timelines**.
- Commands are streamed from the CPU to the GPU through a **command queue**.
- Queued command buffers must be encoded using a **command encoder**.
- We can **wait for enqueued commands** to be executed with `wgpuQueueOnSubmittedWorkDone`.

This chapter was a bit abstract because, although we can now queue operations, we have not seen any actual operation yet. In the next chapter we start with **simple operations on memory buffers**, followed by **our first shader**, to compute things on the GPU!

*Resulting code:* [`step015-next`](https://github.com/eliemichel/LearnWebGPU-Code/tree/step015-next)
