Add a raytracing demo · Issue #148 · mratsim/weave

This adds a raytracing demo and mini writeup on the performance observed: > The Nim version has can use a single parallel-for loop or nested parallel-for loop for better load balancing (important when the inner loop .is actually larget than the outer loop) > > At the moment, there is no random number generator that can deal with the dynamic thread migration of Weave so we get interesting artifacts compared to the single-threaded or the single parallel-for versions. See #147 on multithreaded RNG **Single-thread or single parallel-for** ![ray_trace_300samples_nim_threaded](https://raw.githubusercontent.com/mratsim/weave/raytracing-demo/demos/raytracing/ray_trace_300samples_nim_threaded.png) **Nested parallel-for** ![ray_trace_300samples_nim_nested](https://raw.githubusercontent.com/mratsim/weave/raytracing-demo/demos/raytracing/ray_trace_300samples_nim_nested.png) > CPU: i9-9980XE, 18 cores, overclocked at 4.1GHz all-core turbo (from 3.0 nominal) > The code was compiled with default flag, hence x86-64, hence SSE2. > > | Bench | Nim | Clang C++ | GCC C++ | > |------------------|------------:|----------:|------------:| > | Single-threaded | 4min43.369s | 4m51.052s | 4min50.934s | > | Multithreaded | 13.211s | 14.428s | 2min14.616s | > | Nested-parallel | 12.981s | | | > | Parallel speedup | 21.83x | 20.17x | 2.16x | > > Single-threaded Nim is 2.7% faster than Clang C++ > Multithreaded Nim via Weave is 11.1% faster Clang C++ > > Note: I only have 18 cores but we observe over 18x speedup > with Weave and LLVM. This is probably due to 2 factors: > - Raytracing is pure compute, in particular contrary to high-performance computing > and machine learning workloads which are also very memory-intensive (matrices and tensors with thousands to millions of elements) > The scene has only 10 objects and a camera to keep track off. > Memory is extremely slow, you can do 100 additions while waiting for data > in the L2 cache. > - A CPU has a certain number of execution ports and can use instruction-level parallelism (we call that superscalar). Hyperthreading is a way to > use those extra execution ports. However in many computing workloads > the sibling cores are also competing for memory bandwidth. > From the previous point, this is not true for raytracing > and so we enjoy super linear speedup.

AI Analysis

This issue appears to be discussing a feature request or bug report related to the repository. Based on the content, it seems to be resolved. The issue was opened by mratsim and has received 0 comments.

Add a comment

Comment form would go here

Add a raytracing demo#148