Exploring Assembly Language Optimization (LAB2)
Introduction:
Welcome to our lab experiment, where we explored 6502 Assembly Language to understand how it works and to find ways to optimize code execution. In this blog post, we'll take you through our journey, from calculating the performance of an initial assembly program to creating a significantly faster version, and finally share our experiences and reflections on the process.
Calculating Performance:
In the first part of our experiment, we analyzed a provided assembly program designed to fill a bitmap display with a solid color. The challenge was to calculate its execution time accurately. To do this, we went through the code and counted the clock cycles for each instruction and each loop iteration. Exact timing can differ slightly between 6502-compatible processors, so we assume standard 6502 instruction timings here. Here's the breakdown of the initialization instructions, based on our spreadsheet:
LDA #$00 - 2 clock cycles
STA $40 - 3 clock cycles
LDA #$02 - 2 clock cycles
STA $41 - 3 clock cycles
LDA #$07 - 2 clock cycles
LDY #$00 - 2 clock cycles
Beyond that, the provided code has two loops: an inner loop and an outer loop.
The inner loop has the following instructions:
STA ($40),y - 6 clock cycles (indirect indexed addressing; a store in this mode always takes 6 cycles)
INY - 2 clock cycles
BNE loop - 3 clock cycles when the branch is taken, 2 when it is not (a taken branch costs one more cycle if it crosses a page boundary, which we assume does not happen here)
The inner loop iterates 256 times per page (once for each pixel on the page). The branch is taken on the first 255 iterations, so each of those costs 6 + 2 + 3 = 11 clock cycles, and the final iteration costs 6 + 2 + 2 = 10. One page therefore takes (255 * 11) + 10 = 2815 clock cycles.
The outer loop iterates 4 times (for pages $02 through $05, ending when the page number reaches $06) and consists of:
- inc $41 - 5 clock cycles
- ldx $41 - 3 clock cycles
- cpx #$06 - 2 clock cycles
- bne loop - 3 clock cycles when the branch is taken, 2 when it is not
The branch is taken 3 times and falls through once, so the outer loop overhead is (3 * 13) + 12 = 51 clock cycles.
Now, Total Execution Time = Initialization + (Inner Loop Cycles per Page * Pages) + Outer Loop Overhead
Total Execution Time = (2+3+2+3+2+2) + (2815 * 4) + 51
= 14 + 11260 + 51
= 11325 clock cycles.
So, the given assembly code takes a total of 11325 clock cycles to execute on a standard 6502 processor.
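A count like this is easy to get wrong by hand, so here is a minimal Python sketch that tallies it. It assumes the documented NMOS 6502 timings, a taken branch costing 3 cycles with no page crossing, and four display pages ($02 through $05). The function names are ours, for illustration only.

```python
# Cycle costs per instruction (documented NMOS 6502 timings).
INIT = [2, 3, 2, 3, 2, 2]   # lda #, sta zp, lda #, sta zp, lda #, ldy #
STA_IND_Y, INY, INC_ZP, LDX_ZP, CPX_IMM = 6, 2, 5, 3, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2   # assumes no page crossing


def loop_cycles(body, iterations):
    """Cycles for a loop whose closing BNE is taken on all but the last pass."""
    return (iterations - 1) * (body + BNE_TAKEN) + (body + BNE_NOT_TAKEN)


def original_fill_cycles(pages=4, pixels_per_page=256):
    """Total cycles for the pointer-based fill routine."""
    inner = loop_cycles(STA_IND_Y + INY, pixels_per_page)       # one page
    outer = loop_cycles(INC_ZP + LDX_ZP + CPX_IMM, pages)       # overhead only
    return sum(INIT) + pages * inner + outer


print(original_fill_cycles())
```

Changing `pages` or the per-instruction constants shows how sensitive the total is to each assumption.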
We can also work out the total memory usage for the program code plus any pointers or variables.
- $40 and $41 are two zero-page memory locations used as a pointer. Together they occupy 2 bytes.
- $00, $02, $07, and $06 are immediate values. They don't occupy extra memory beyond the program code because they are stored as operand bytes inside the instructions themselves.
- The program code is 25 bytes: the six initialization instructions take 2 bytes each (12 bytes), the inner loop takes 2 + 1 + 2 = 5 bytes (sta, iny, bne), and the outer loop takes 2 + 2 + 2 + 2 = 8 bytes (inc, ldx, cpx, bne).
So, the total memory usage is 25 bytes of program code plus 2 bytes for the pointer at $40 and $41, or 27 bytes in all.
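The byte count can be tallied the same way. This is a small sketch, assuming the standard 6502 encoding of one opcode byte plus one operand byte for the immediate, zero-page, indirect indexed, and relative modes, and a single byte for implied instructions. The table and variable names are ours.

```python
# Instruction sizes in bytes by addressing mode (opcode + operand bytes).
SIZE = {"imm": 2, "zp": 2, "ind_y": 2, "rel": 2, "impl": 1}

# The original fill routine, one (mnemonic, addressing mode) pair per instruction.
PROGRAM = [
    ("lda", "imm"), ("sta", "zp"), ("lda", "imm"), ("sta", "zp"),
    ("lda", "imm"), ("ldy", "imm"),
    ("sta", "ind_y"), ("iny", "impl"), ("bne", "rel"),
    ("inc", "zp"), ("ldx", "zp"), ("cpx", "imm"), ("bne", "rel"),
]
POINTER_BYTES = 2   # zero-page locations $40 and $41

code_bytes = sum(SIZE[mode] for _, mode in PROGRAM)
total_bytes = code_bytes + POINTER_BYTES
print(code_bytes, total_bytes)
```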
The original code performed the task as expected, but we were determined to optimize it for greater efficiency. One way to do this is to reduce the number of loop iterations and eliminate unnecessary instructions.
Optimized Code:
        lda #$07        ; Color number
        ldy #$00        ; Set index to 0
loop:   sta $0200,y     ; Set pixel color on page $02 at offset Y
        sta $0300,y     ; ... same offset on page $03
        sta $0400,y     ; ... page $04
        sta $0500,y     ; ... page $05
        iny             ; Increment index
        bne loop        ; Continue until Y rolls over to 0
Optimizations made:
- Instead of walking one page at a time through a pointer, each pass of the loop writes the same offset on all four pages using absolute,Y addressing. This cuts the number of loop iterations from 1024 to 256.
- The pointer at $40 and $41 is no longer needed, so its four setup instructions are gone, along with the outer loop (inc, ldx, cpx, bne).
- sta $0200,y takes 5 clock cycles, one fewer than sta ($40),y.
Each pass costs (4 * 5) + 2 + 3 = 25 clock cycles while the branch is taken and 24 on the final pass, so the total is 4 + (255 * 25) + 24 = 6403 clock cycles. That works out to roughly 6 cycles per pixel, compared with 10 to 11 cycles per pixel in the original inner loop.
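As a cross-check on loop costs like this, here is a rough Python sketch that prices a single fill loop issuing several `sta absolute,y` stores per pass. It assumes `sta absolute,y` at 5 cycles, `iny` at 2, a taken `bne` at 3 (no page crossing), and two 2-cycle setup instructions. The function names are ours and purely illustrative.

```python
STA_ABS_Y, INY = 5, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2   # assumes no page crossing
SETUP = 2 + 2                     # lda #imm, ldy #imm


def unrolled_fill_cycles(stores_per_pass=4, passes=256):
    """Total cycles for one loop that does several absolute,Y stores per pass."""
    body = stores_per_pass * STA_ABS_Y + INY
    return SETUP + (passes - 1) * (body + BNE_TAKEN) + (body + BNE_NOT_TAKEN)


def cycles_per_pixel(stores_per_pass=4, passes=256):
    """Average cost of writing one pixel, setup included."""
    pixels = stores_per_pass * passes
    return unrolled_fill_cycles(stores_per_pass, passes) / pixels


print(unrolled_fill_cycles(), round(cycles_per_pixel(), 2))
```

With one store per pass and 1024 pixels to cover, the same formula no longer fits an 8-bit index, which is why the four-store form pairs naturally with a 256-pass loop.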
Modifying Code:
(6) To modify the code to fill the display with light blue instead of yellow, you need to change the value you load into the Accumulator (A register) to the light blue color code. In the 6502 emulator page, the color code for light blue is $e. Here's the modified code:
        lda #$00        ; set a pointer in memory location $40 to point to $0200
        sta $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
        lda #$0e        ; color number: light blue (instead of $07, yellow)
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel color at the address (pointer)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)
        inc $41         ; increment the page
        ldx $41         ; get the current page number
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
This code will fill the display with light blue instead of
yellow.
(7) To modify the provided assembly code to fill the
display with a different color on each page (each "page" being
one-quarter of the bitmapped display), we can use a loop to change the color
number for each page. Here's the modified code:
        lda #$00        ; set a pointer in memory location $40 to point to $0200
        sta $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
        ldx #$00        ; X holds the color number for the current page
loop:   txa             ; copy the color number into the Accumulator
        ldy #$00        ; set index to 0
inner_loop:
        sta ($40),y     ; set pixel color at the address (pointer)+Y
        iny             ; increment index
        bne inner_loop  ; continue until done the page (256 pixels)
        inc $41         ; increment the page
        inx             ; increment X for the next page's color
        cpx #$04        ; compare X with the number of pages (4)
        bne loop        ; continue until all pages are filled
This code sets a different color for each page and fills each page with that color. The X register keeps track of the current page's color: it is copied into the Accumulator with txa before each page is filled and incremented afterwards. With X starting at 0, the four pages are filled with colors $0 through $3, and the loop ends once X reaches 4.
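To sanity-check the intended result without running the emulator, here is a small Python model of the per-page fill. It is only a sketch: it assumes four 256-byte pages starting at $0200 and a color counter starting at 0, and `fill_pages` is a name of our own rather than anything from the emulator.

```python
def fill_pages(first_page=0x02, pages=4, first_color=0):
    """Model of the per-page fill: returns a dict mapping address -> color."""
    memory = {}
    color = first_color                    # plays the role of the X register
    for page in range(first_page, first_page + pages):
        for offset in range(256):          # plays the role of the Y index
            memory[(page << 8) | offset] = color
        color += 1                         # next page gets the next color
    return memory


display = fill_pages()
# Show the color stored at the first address of each page.
print({hex(p << 8): display[p << 8] for p in range(0x02, 0x06)})
```

Starting `first_color` at a nonzero value is an easy way to try other palettes before editing the assembly.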
Experiences and Reflections:
Working with Assembly Language was both challenging and
rewarding. It exposed me to the inner workings of the processor, where I had to
consider each clock cycle and memory access meticulously. It was a valuable
learning experience, gaining a deeper understanding of how code execution can
be optimized for efficiency.
Through experimentation and optimization, I realized that
even small changes in code can have a significant impact on performance. This
reinforced the importance of writing efficient code and the value of
understanding the low-level details of a system.
In conclusion, our lab experiment in Assembly Language not
only improved my technical skills but also provided a new perspective on the
world of programming. It's a reminder that even in the age of high-level
languages, understanding the fundamentals can make us better programmers.