I have a piece of code for which I am trying to work out the exact number of clock cycles it takes to execute. However, my experiments indicate that my count is incorrect. Please don't ask for details of how I run the experiment; what I need to know is where I am going wrong in counting the clock cycles. I am working with a TI LaunchPad for the MSP430. Is it adding any optimizations I am not aware of? Any other ideas would be welcome. The code and the generated assembly with clock cycles are given below. The initial value of i is 0.
int get_sign(int i){
    int j=i+1000;
    int y=0;
    while(i<100){
        if (y == 0){
            j=j-i;
            y=1;
        }
        else{
            j=j*2+i*2;
            y=0;
        }
        i++;
    }
    return j;
}
Assembly generated | Clock cycles | Times executed |
push r4 | 3 | |
mov r1, r4 | 1 | |
add #2, r4 | 1 | |
add #llo(-6), r1 | 2 | |
mov r15, -4(r4) | 4 | |
mov -4(r4), r15 | 3 | |
add #1000, r15 | 2 | |
mov r15, -8(r4) | 4 | |
mov #0, -6(r4) | 4 | |
jmp | 2 | |
cmp #0, -6(r4) | 4 | x100 |
jne | 2 | x100 |
sub -4(r4), -8(r4) | 6 | x50 |
mov #1, -6(r4) | 4 | x50 |
jmp | 2 | x50 |
mov -8(r4), r15 | 3 | x50 |
add -4(r4), r15 | 3 | x50 |
mov r15, -8(r4) | 4 | x50 |
rla -8(r4) | 6 | x50 |
mov #0, -6(r4) | 4 | x50 |
add #1, -4(r4) | 4 | x100 |
cmp #100, -4(r4) | 5 | x100 |
jl | 2 | x100 |
mov -8(r4), r15 | 3 | |
add #6, r1 | 2 | |
pop r4 | 2 | |
ret | 3 | |
Total | 3336 |
There are various factors that can affect instruction timing. Some are shown in the list below:
- Interrupts (the biggest offender)
If interrupts are enabled, an interrupt firing in the middle of the test will cause the timing to be wrong.
- Flash wait states
Reading flash is slower than reading RAM so if your code is running from flash, you have to account for wait states when an instruction is loaded. Burst mode further complicates things since some processors can perform a read-ahead of multiple instructions so that execution times are reduced. Even slower RAM may require wait states.
- Instruction caching
Some processors have instruction caches that can hold many instructions to eliminate the need to read them from flash. Figuring out the timing is very difficult since a jump taken may clear the cache and instructions would have to be read from flash again.
- Hardware register access
Reading data from a hardware register introduces variable clock cycles since the processor may have to execute wait states until the data is ready at the register.
These are only four examples out of many. There are others such as bus arbitration, DRAM refresh, turbo modes, etc. The best you can hope for is an approximation. As for the MSP430, I'm not familiar with the processor but I'm pretty sure that it has one or more of the items I posted above.
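To get a feel for how much the flash wait states mentioned above could move a count, here is a rough sketch in C. It assumes the simplest possible model, where every flash access pays the full wait-state penalty and nothing is prefetched or cached, so it is a pessimistic bound rather than a prediction. The function name and parameters are made up for illustration, and whether a given MSP430 part needs wait states at all depends on the device and clock speed.

```c
#include <stdint.h>

/*
 * Pessimistic estimate of execution cycles when code runs from flash.
 * Assumption (not from any datasheet): each flash access stalls the CPU
 * for `wait_states` extra cycles, with no prefetch or cache to hide it.
 *
 * base_cycles     cycle count taken from the instruction timing table
 * flash_accesses  number of word fetches from flash (hypothetical input)
 * wait_states     extra cycles per flash access (0 means no penalty)
 */
uint32_t cycles_with_wait_states(uint32_t base_cycles,
                                 uint32_t flash_accesses,
                                 uint32_t wait_states)
{
    return base_cycles + flash_accesses * wait_states;
}
```

With zero wait states the estimate equals the hand count, so any gap between the two numbers is only as good as the guess for `flash_accesses`.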
Aaaand -- this is why I gave up on counting clock cycles a long long time ago. I benchmark code instead, and allow for a wide margin in timing in actual use. Not only does benchmarking give me (in my opinion) a more accurate picture, it also accounts for instances where I might be committing pure old-fashioned screwups, like failing to set up the flash memory accesses for the best-case processor speed.
I suppose that if I were working on something time-critical, on a processor that had an easy-to-determine cycle count, then I might count clock ticks -- but on the other hand, such a processor would, almost by definition, be much slower than a modern alternative. So if I were doing that sort of thing with that sort of processor, and there wasn't some compelling reason (legacy hardware, radiation hardness, etc.) to keep that processor, I might advocate for something newer and faster.
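For what it's worth, the benchmarking approach described above could be sketched in C roughly as follows. This is a minimal sketch, not anyone's actual harness: `bench`, `bench_stats`, `host_ticks` and `sample_work` are all hypothetical names. The tick source is passed in as a function pointer so that on a real target it could read a free-running hardware timer; the `host_ticks` version here uses the standard `clock()` only so the sketch can be tried on a PC.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical tick source: returns a free-running counter value. */
typedef uint32_t (*tick_source)(void);

/* Summary of repeated timings, in ticks of whatever source was used. */
typedef struct {
    uint32_t min_ticks;
    uint32_t max_ticks;
    uint64_t total_ticks;
    size_t   runs;
} bench_stats;

/* Call fn `runs` times and record the shortest, longest and total time. */
bench_stats bench(void (*fn)(void), size_t runs, tick_source read_ticks)
{
    bench_stats s = { UINT32_MAX, 0, 0, runs };

    for (size_t k = 0; k < runs; k++) {
        uint32_t t0 = read_ticks();
        fn();
        /* Unsigned subtraction stays correct across one counter wraparound. */
        uint32_t dt = read_ticks() - t0;

        if (dt < s.min_ticks) s.min_ticks = dt;
        if (dt > s.max_ticks) s.max_ticks = dt;
        s.total_ticks += dt;
    }
    if (runs == 0)
        s.min_ticks = 0;   /* nothing was measured */
    return s;
}

/* Host-side stand-in for a hardware timer, using standard clock(). */
uint32_t host_ticks(void)
{
    return (uint32_t)clock();
}

/* Dummy workload so there is something to time. */
void sample_work(void)
{
    static volatile unsigned sink;
    for (unsigned n = 0; n < 1000u; n++)
        sink += n;
}
```

Calling `bench(sample_work, 10, host_ticks)` and comparing `min_ticks` with `max_ticks` gives the spread to allow for; the wider the spread, the more margin is needed in actual use.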
You're not giving away much, are you?
How many clock cycles do you measure?
Is the error between what you measure and what you predict constant?
You haven't told us what processor you are using, what else is running on it, etc.
I'm struggling to see why you care - I often measure how long things take but I haven't needed to do anything complicated in an exact number of clock cycles for over 30 years (battery powered instrument using a very feeble early CMOS processor to generate sine waves and measure the response to them at the same time, no on chip timers.)
MK
The best way to count clock cycles is with a dual-channel oscilloscope. You cannot use a processor of any kind to both generate a signal and count clock cycles, since the interrupts and service routines will skew the results unless it's a very slow signal, on the order of 1% of the processor clock speed. Your calculations above should be close. See the reply by Jorick; as he said, there are a lot of people employed trying to solve similar problems. Just the supply voltage drift or clock crystal drift can cause changes in execution times, and therefore in total clock cycles per time interval, not to mention stray capacitance between board traces, etc.
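As a cross-check on the arithmetic in the question's table, here is a small C sketch that re-adds cycles times executions for every row. The per-instruction cycle counts are copied from the table as posted, not verified against a datasheet. The one parameter, `loop_tests`, is how many times the loop test (`cmp #100` / `jl`) runs. The table uses 100. If the unlabeled `jmp` after the initialisation targets the loop test, which is common for an unoptimised `while` loop but is only an assumption here since the listing shows no labels, the test would run 101 times: 100 times true and once false.

```c
#include <stddef.h>

/* One row of the table: cycles per execution and number of executions. */
typedef struct {
    const char *insn;
    unsigned    cycles;
    unsigned    count;
} row;

/* Sum cycles * count over all rows. */
unsigned long total_cycles(const row *rows, size_t n)
{
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += (unsigned long)rows[k].cycles * rows[k].count;
    return sum;
}

/* Rebuild the question's table; loop_tests is the number of times the
   loop test (cmp #100 / jl) executes. The question assumes 100. */
unsigned long question_total(unsigned loop_tests)
{
    const row rows[] = {
        /* prologue and initialisation, executed once */
        {"push r4", 3, 1},            {"mov r1, r4", 1, 1},
        {"add #2, r4", 1, 1},         {"add #llo(-6), r1", 2, 1},
        {"mov r15, -4(r4)", 4, 1},    {"mov -4(r4), r15", 3, 1},
        {"add #1000, r15", 2, 1},     {"mov r15, -8(r4)", 4, 1},
        {"mov #0, -6(r4)", 4, 1},     {"jmp", 2, 1},
        /* if (y == 0) test, once per iteration */
        {"cmp #0, -6(r4)", 4, 100},   {"jne", 2, 100},
        /* "if" branch, every other iteration */
        {"sub -4(r4), -8(r4)", 6, 50}, {"mov #1, -6(r4)", 4, 50},
        {"jmp", 2, 50},
        /* "else" branch, every other iteration */
        {"mov -8(r4), r15", 3, 50},   {"add -4(r4), r15", 3, 50},
        {"mov r15, -8(r4)", 4, 50},   {"rla -8(r4)", 6, 50},
        {"mov #0, -6(r4)", 4, 50},
        /* i++ and the loop test */
        {"add #1, -4(r4)", 4, 100},
        {"cmp #100, -4(r4)", 5, loop_tests},
        {"jl", 2, loop_tests},
        /* epilogue, executed once */
        {"mov -8(r4), r15", 3, 1},    {"add #6, r1", 2, 1},
        {"pop r4", 2, 1},             {"ret", 3, 1},
    };
    return total_cycles(rows, sizeof rows / sizeof rows[0]);
}
```

`question_total(100)` reproduces the posted total of 3336, so the addition itself is fine. `question_total(101)` gives 3343, a difference of 7 cycles. Whether that explains the mismatch can't be said without the measured figure.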
You might also consider putting in a delay loop and calibrating it with an oscilloscope; you could then measure the time between delays and convert that to clock cycles.
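The conversion from a measured time to clock cycles is just the interval multiplied by the clock frequency. A minimal C sketch, assuming the interval is read off the scope in nanoseconds and the CPU clock frequency is known (the helper name and the example frequencies are illustrative, not taken from the question):

```c
#include <stdint.h>

/*
 * Convert a measured interval to CPU clock cycles, rounded to nearest.
 *   cycles = interval * frequency
 * interval_ns is in nanoseconds, clock_hz in hertz. The intermediate
 * product must fit in 64 bits, which holds for intervals of a few
 * seconds at typical microcontroller clock rates.
 */
uint64_t ns_to_cycles(uint64_t interval_ns, uint64_t clock_hz)
{
    return (interval_ns * clock_hz + 500000000ull) / 1000000000ull;
}
```

For example, an interval of 3.336 ms at an assumed 1 MHz clock converts to 3336 cycles. The result is only as accurate as the clock frequency, so drift in the clock source shows up directly in the cycle figure.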
Good luck.