This was a nice opportunity to learn about Grand Central Dispatch through AI for me as well. I knew about the page alignment and async read techniques but not on OSX.
On semi-related note, it's worth noting that if you're trying to make a Python script run faster and don't have the know-how to re-write your program in C or how to write SIMD (if applicable), you can always try to run the script through pypy, merely replacing python3 with pypy3 in bench.sh, with no other changes, brings the runtime of the first program down from 104s to 9s on my machine:
Benchmark 1: python3 0_mvp.py bench.txt
Time (mean ± σ): 104.739 s ± 3.982 s [User: 104.213 s, System: 0.158 s]
Range (min … max): 100.303 s … 108.005 s 3 runs
Benchmark 2: python3 1_c_regex.py bench.txt
Time (mean ± σ): 14.777 s ± 0.017 s [User: 14.563 s, System: 0.158 s]
Range (min … max): 14.759 s … 14.791 s 3 runs
Benchmark 1: pypy3 0_mvp.py bench.txt
Time (mean ± σ): 9.381 s ± 0.204 s [User: 9.110 s, System: 0.234 s]
Range (min … max): 9.245 s … 9.616 s 3 runs
Benchmark 2: pypy3 1_c_regex.py bench.txt
Time (mean ± σ): 4.296 s ± 0.031 s [User: 4.038 s, System: 0.236 s]
Range (min … max): 4.262 s … 4.324 s 3 runs
davidst 2 hours ago [-]
It's a wonderful problem for optimizing code. Michael Abrash hosted a performance contest for word counting back in... 1991? (If my memory serves.) The article and code can be found here:
There Ain’t No Such Thing as the Fastest Code:
Lessons Learned in the Pursuit of the Ultimate Word Counter
Would be interesting to see how far this is with exaloop's codon compiler.
kragen 3 hours ago [-]
As it happens, splitting input text into words fast is one of the things I most want to do this week! But maybe that's because it's a distraction from benchmarking hash tables.
kwillets 1 hours ago [-]
I think you can just xor the whitespace mask with the shifted one.
Also, when counting 0xFF bytes from a boolean etc., sub the mask; 0xFF == -1.
This was a nice opportunity to learn about Grand Central Dispatch through AI for me as well. I knew about the page alignment and async read techniques but not on OSX.
Edit: actually you can get even faster with mmap https://github.com/healeycodes/counting-words-at-simd-speed/...
There Ain’t No Such Thing as the Fastest Code: Lessons Learned in the Pursuit of the Ultimate Word Counter
Article: https://www.phatcode.net/res/224/files/html/ch16/16-01.html
Code: https://www.phatcode.net/res/224/files/html/ch16/16-05.html
Also, when counting 0xFF bytes from a boolean etc., sub the mask; 0xFF == -1.