threading bugs are sometimes hard to catch
Putting it mildly! Threading bugs are probably the worst class of bugs to debug
Definitely debatable if this is worth the risk of impossible bugs. Python is very slow, and multi threading isn’t going to change that. 4x extremely slow is still extremely slow. If you care remotely about performance you need to use a different language anyway.
Python can be extremely slow, it doesn’t have to be. I recently re-wrote a stats program at work and got a ~500x speedup over the original python and a 10x speed up over the c++ rewrite of that. If you know how python works and avoid the performance foot-guns like nested loops you can often (though not always) get good performance.
Unless the C++ code was doing something wrong there’s literally no way you can write pure Python that’s 10x faster than it. Something else is going on there. Maybe the c++ code was accidentally O(N^2) or something.
In general Python will be 10-200 times slower than C++. 50x slower is typical.
Unless the C++ code was doing something wrong there’s literally no way you can write pure Python that’s 10x faster than it. Something else is going on there.
Completely agreed, but it can be surprising just how often C++ really is written that inefficiently; I have had multiple successes in my career of rewriting C++ code in Python and making it faster in the process, but never because Python is inherently faster than C++.
Nope, if you’re working on large arrays of data you can get significant speed ups using well optimised BLAS functions that are vectorised (numpy) which beats out simply written c++ operating on each array element in turn. There’s also Numba which uses LLVM to jit compile a subset of python to get compiled performance, though I didnt go to that in this case.
You could link the BLAS libraries to c++ but its significantly more work than just importing numpy from python.
You’re thinking of CPython. PyPy can routinely compete with C and C++, particularly in allocation-heavy or pointer-heavy scenarios.
You’re both at least partly right. The only interpreted language that can compete with compiled for execution speed is Java and it has the downside of being Java.
That being said, you might be surprised at how fast you can make Python code execute, even pre-GIL changes. I certainly was. Using multiprocessing and code architected to be run massively parallel, it can be blazingly fast. It would still be blown out of the water by similarly optimized compiled code but, is worth serious consideration if you want to optimize for iterative development.
My view on such workflows would be:
- Write iteration of code component in Python.
- Release.
- Evaluate if any functional changes are required. If so, goto 1.
- Port component to compiled language, changing function calls/imports to make use of the compiled binary alongside the other interpreted components.
- Release.
- Refactor code to optimize for compiled language, features that compiled language enables, and/or security/bug fixes.
- Release.
- Evaluate if further refactor is required at this time, if so, goto 6.