I have a project ready to try this out. It’s a software simulator, and each run (typically 10-10,000 iterations) can be done independently, with the results aggregated and shown at the end. It’s also instrumented to show CPU and memory usage, and on MacOS, you can watch how busy each core gets (hint: PEGGED in multicore mode).
Can run it single-threaded, then with multiprocessing, then with multi-core and time each one. Pretty happy with multicore, but as soon as the no-GIL/subinterpreter version is stable, will try it out and see if it makes any difference. Under the hood it uses numpy and scipy, so will have to wait for them.
Edit: on my todo list is to try it all out in Mojo. They make pretty big performance gain claims.