[Discuss] 'C' string tokenizer for those who hate strtok
David Bronaugh
dbronaugh at linuxboxen.org
Fri Jun 30 10:26:22 PDT 2006
Brian Quinlan wrote:
> Whether they are significantly slower at runtime depends on the
> algorithm that you are expressing and how your code is written e.g. in
> this case, my Python code can split a 3,889 character string into 1000
> substrings 10000 times in 3.89 seconds. Your C code takes 13x longer.
> I would expect that all the calls to malloc are killing you - Python
> pre-allocates memory in medium-size chunks and manages its own pools
> so it probably ran my entire test using a single malloc call where the
> C code required 1000 * 2 * 10000 malloc calls (and a corresponding
> number of free calls).
>
> But I agree that C code can always be made to be faster than Python
> code if you are willing to spend enough time optimizing it. In this
> case, you could pre-allocate len(string) * 2 bytes to store the tokens.
You could also use strtok_r...
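For anyone who hasn't used it, here is a minimal sketch of what I mean. It
assumes a POSIX system; split_tokens and its parameters are names I made up
for illustration, not anything from pw's code. The tokens are just pointers
into the caller's buffer, so there is no malloc per token at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split buf in place on any character in delims,
 * storing up to max_tokens pointers into out. Returns the number stored.
 * buf is modified: delimiters are overwritten with '\0'. */
size_t split_tokens(char *buf, const char *delims,
                    char **out, size_t max_tokens)
{
    char *save = NULL;  /* caller-owned state; this is what makes it reentrant */
    size_t n = 0;

    for (char *tok = strtok_r(buf, delims, &save);
         tok != NULL && n < max_tokens;
         tok = strtok_r(NULL, delims, &save)) {
        out[n++] = tok;  /* points into buf, nothing is copied */
    }
    return n;
}
```

Note that, like strtok, this treats a run of delimiters as one separator, so
empty fields are skipped. If you need empty fields preserved, this is not the
right tool.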
However, I'm curious about your testing:
- First, pw's Makefile specifies conflicting optimization flags -- the last
one is -O0, so I believe that's the one that takes effect. I changed that first.
- Second, it seems that more than half the time spent in the C code is
spent calling printf. I removed the printf call and the program runs
more than twice as fast.
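To show the kind of measurement I have in mind, here is a rough sketch that
times only the tokenizing and keeps all output out of the timed loop. It is
not pw's code or my exact test harness: time_tokenize is a hypothetical
helper, it uses strtok_r as a stand-in tokenizer, and it measures CPU time
with clock().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical benchmark helper: tokenize input reps times and return the
 * CPU time used, in seconds. The total token count is written to *total so
 * the work has an observable result without printing in the timed region.
 * Returns -1.0 if the scratch buffer cannot be allocated. */
double time_tokenize(const char *input, const char *delims,
                     int reps, unsigned long *total)
{
    size_t len = strlen(input) + 1;
    char *scratch = malloc(len);      /* one allocation, reused every pass */
    if (scratch == NULL)
        return -1.0;

    unsigned long count = 0;
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        char *save = NULL;
        memcpy(scratch, input, len);  /* strtok_r modifies its input */
        for (char *tok = strtok_r(scratch, delims, &save);
             tok != NULL;
             tok = strtok_r(NULL, delims, &save)) {
            count++;                  /* no printf inside the timed loop */
        }
    }
    clock_t end = clock();

    free(scratch);
    *total = count;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Print the returned time and the count once, after the call returns. That way
the cost of formatting and writing output doesn't get charged to the
tokenizer.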
My test input is up at http://bronaugh.linuxboxen.org/test.txt -- can
you please make these changes and compare? For that matter, rip the
print out of the Python, for fairness.
Keep in mind that I just wanted to clean up the current code, not
rearchitect it. My own opinion is: just use strtok_r.
David.