Exploring Assembly Language Optimization (LAB2)

Introduction: Welcome to our lab experiment where we delved into the fascinating world of Assembly Language, seeking to understand its intricacies and uncover ways to optimize code execution. In this blog post, we'll take you through our journey, from calculating the performance of an initial assembly program to the creation of a significantly faster version, and finally, share our experiences and reflections on the process.

Calculating Performance:

In the first part of our experiment, we analyzed a provided assembly program designed to fill a bitmap display with a solid color. The challenge was to calculate its execution time accurately. To do this, we went through the code and counted the clock cycles for each instruction and each loop iteration. The exact time depends on the specific processor, as different 6502-compatible processors can have slightly different timing characteristics. Assuming a standard 6502, here's the breakdown based on the spreadsheet:

LDA #$00 - 2 clock cycles

STA $40 - 3 clock cycles

LDA #$02 - 2 clock cycles

STA $41 - 3 clock cycles

LDA #$07 - 2 clock cycles

LDY #$00 - 2 clock cycles

Other than that, the provided code has two loops: an inner loop that fills one page and an outer loop that steps through the pages.

→ The inner loop has the following instructions:

STA ($40),y - 6 clock cycles (indirect indexed addressing; a store in this mode always takes 6 cycles)

INY - 2 clock cycles

BNE loop - 3 clock cycles when the branch is taken, 2 when it is not (a branch that crosses a page boundary costs 1 more; we assume that does not happen here)

The inner loop iterates 256 times per page (once for each pixel on the page). The branch is taken on the first 255 iterations and falls through on the last, so one page costs 255 * (6 + 2 + 3) + (6 + 2 + 2) = 2805 + 10 = 2815 clock cycles.

→ On the other hand, the outer loop iterates 4 times, once for each page of the display ($0200-$05FF, so the high byte in $41 runs from $02 to $05), and consists of:

  1. inc $41 - 5 clock cycles (zero-page read-modify-write)
  2. ldx $41 - 3 clock cycles
  3. cpx #$06 - 2 clock cycles
  4. bne loop - 3 clock cycles when the branch is taken, 2 when it is not

The outer branch is taken 3 times and falls through once, so the outer loop overhead is 3 * (5+3+2+3) + (5+3+2+2) = 39 + 12 = 51 clock cycles.

Now, Total Execution Time = Initialization + (Inner Loop per Page * Pages) + Outer Loop Overhead

Total Execution Time = (2+3+2+3+2+2) + (2815 * 4) + 51

         = 14 + 11260 + 51

         = 11325 clock cycles.

So, the given assembly code takes a total of 11325 clock cycles to execute on a standard 6502 processor.
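
The tally can also be scripted as a cross-check. The sketch below is a minimal Python model, not part of the lab material: the constant and function names are made up for illustration, and it assumes standard NMOS 6502 timings, four display pages ($02 to $05), a taken branch costing 3 cycles, and no page-crossing penalty on the branches.

```python
# Cycle tally for the original fill routine (illustrative model, not an emulator).
INIT = 2 + 3 + 2 + 3 + 2 + 2      # lda, sta, lda, sta, lda, ldy

PAGES = 4                         # high byte of the pointer runs $02..$05
PIXELS_PER_PAGE = 256

STA_IND_Y = 6                     # sta ($40),y
INY = 2
BNE_TAKEN = 3
BNE_NOT_TAKEN = 2


def inner_loop_cycles():
    """Cycles to fill one page: 256 stores/increments plus 255 taken branches."""
    body = (STA_IND_Y + INY) * PIXELS_PER_PAGE
    branches = BNE_TAKEN * (PIXELS_PER_PAGE - 1) + BNE_NOT_TAKEN
    return body + branches


def outer_loop_cycles():
    """Cycles spent on inc $41 / ldx $41 / cpx #$06 / bne over all pages."""
    body = (5 + 3 + 2) * PAGES
    branches = BNE_TAKEN * (PAGES - 1) + BNE_NOT_TAKEN
    return body + branches


def total_cycles():
    return INIT + PAGES * inner_loop_cycles() + outer_loop_cycles()


print(inner_loop_cycles())   # 2815
print(total_cycles())        # 11325
```

Changing PAGES or the branch costs in one place makes it easy to see how much each assumption contributes to the total.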

We can also work out the total memory usage for the program code plus any pointers or variables by counting the bytes of each.

  •  $40 and $41 are two zero-page locations used together as a pointer, so they occupy 2 bytes in memory.
  •  $00, $02, $07, and $06 are immediate values used in instructions. They need no separate storage because each one is the operand byte of its instruction, so they are already counted in the code size.
  •  The program code itself is 25 bytes: every instruction here is 2 bytes (an opcode plus a one-byte operand or branch offset) except INY, which is 1 byte. That is 12 two-byte instructions and 1 one-byte instruction.

So, the total memory usage for this assembly code is 25 bytes of code plus 2 bytes for the pointer at $40 and $41, which is 27 bytes in all.
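
The byte count can be checked the same way. This is a small sketch with hypothetical names (PROGRAM, POINTER_BYTES); the sizes are the standard 6502 encodings for each addressing mode used by the original routine.

```python
# Size in bytes of each instruction in the original routine (opcode + operand).
PROGRAM = [
    ("lda #$00", 2), ("sta $40", 2), ("lda #$02", 2), ("sta $41", 2),
    ("lda #$07", 2), ("ldy #$00", 2),
    ("sta ($40),y", 2), ("iny", 1), ("bne loop", 2),
    ("inc $41", 2), ("ldx $41", 2), ("cpx #$06", 2), ("bne loop", 2),
]
POINTER_BYTES = 2   # zero-page locations $40 and $41

code_bytes = sum(size for _, size in PROGRAM)
total_bytes = code_bytes + POINTER_BYTES

print(code_bytes)    # 25
print(total_bytes)   # 27
```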

Having analyzed the original code, we found that it performed the task as expected, but we were determined to optimize it for greater efficiency. One way to do this is to reduce the number of loop iterations and eliminate unnecessary instructions.

Optimized Code:

               lda #$07              ; Color number
               ldy #$00              ; Set index to 0
loop:          sta $0200,y           ; Set pixel color in the first page
               sta $0300,y           ; ... the second page
               sta $0400,y           ; ... the third page
               sta $0500,y           ; ... and the fourth page
               iny                   ; Increment index
               bne loop              ; Continue until Y rolls over to 0

Optimizations made:

  1. Instead of going through the pointer at $40 with sta ($40),y (6 cycles), we store with absolute indexed addressing, sta $0200,y, which takes 5 cycles. The pointer setup is no longer needed.
  2. We unrolled the fill so that each pass stores to all four pages. One iny and one bne now serve four pixels instead of one, the loop runs 256 times instead of 1024, and the outer loop (inc $41, ldx $41, cpx #$06, bne) disappears.

With these changes each pass of the loop costs 4 * 5 + 2 + 3 = 25 cycles for four pixels, compared with 6 + 2 + 3 = 11 cycles for a single pixel in the original inner loop. The total is 4 + (256 * 22) + (255 * 3 + 2) = 6403 clock cycles, against 11325 for the original when it is counted the same way (four pages, taken branches at 3 cycles), so the optimized version needs a little over half the time. It is also smaller: 19 bytes of code and no pointer.
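
The cycle cost of an unrolled fill that stores to all four pages per pass (sta $0200,y through sta $0500,y, followed by iny and bne) can be tallied with a few lines of Python. This is a sketch under stated assumptions: standard 6502 timings (an absolute,Y store takes 5 cycles), no page-crossing penalty on the branch, and illustrative names that do not come from the lab material.

```python
# Cycle tally for a four-way unrolled fill loop (illustrative model).
PASSES = 256             # Y runs 0..255 once
STORES_PER_PASS = 4      # sta $0200,y / $0300,y / $0400,y / $0500,y
STA_ABS_Y = 5
INY = 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
INIT = 2 + 2             # lda #$07, ldy #$00


def unrolled_cycles():
    body = (STORES_PER_PASS * STA_ABS_Y + INY) * PASSES
    branches = BNE_TAKEN * (PASSES - 1) + BNE_NOT_TAKEN
    return INIT + body + branches


def cycles_per_pixel():
    return unrolled_cycles() / (PASSES * STORES_PER_PASS)


print(unrolled_cycles())              # 6403
print(round(cycles_per_pixel(), 2))   # 6.25
```

Raising STORES_PER_PASS does not apply to this display, which has only four pages, but the model shows where the saving comes from: the loop overhead (iny and bne) is shared across more stores.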

Modifying Code:

(6) To modify the code to fill the display with light blue instead of yellow, you need to change the value you load into the Accumulator (A register) to the light blue color code. In the 6502 emulator page, the color code for light blue is $e. Here's the modified code:


lda #$00  ; set a pointer in memory location $40 to point to $0200
sta $40   ; ... low byte ($00) goes in address $40
lda #$02
sta $41   ; ... high byte ($02) goes into address $41

lda #$0e  ; load light blue color code (after the pointer setup, which also uses A)
ldy #$00  ; set index to 0

loop:  sta ($40),y  ; set pixel color at the address (pointer)+Y
iny       ; increment index
bne loop  ; continue until done the page (256 pixels)

inc $41   ; increment the page
ldx $41   ; get the current page number
cpx #$06  ; compare with 6
bne loop  ; continue until done all pages

This code will fill the display with light blue instead of yellow. The only change from the original is the color value: lda #$07 becomes lda #$0e, and it stays in the same place, after the pointer has been stored.
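
The position of the color load matters because lda is also used to build the pointer, so whatever is loaded last is what the fill loop stores. A tiny Python trace of the accumulator illustrates this; it is a hypothetical model that only understands lda with an immediate value and sta to an address, not an emulator.

```python
def accumulator_at_loop(instructions):
    """Run ('lda', value) / ('sta', address) steps; return (A, memory written)."""
    a = 0
    memory = {}
    for op, operand in instructions:
        if op == "lda":
            a = operand          # lda #value overwrites A
        elif op == "sta":
            memory[operand] = a  # sta address copies A to memory
    return a, memory


# Color loaded before the pointer setup: A is overwritten by $00 and then $02.
color_first = [("lda", 0x0E), ("lda", 0x00), ("sta", 0x40),
               ("lda", 0x02), ("sta", 0x41)]
# Color loaded after the pointer setup: A still holds $0e when the loop starts.
pointer_first = [("lda", 0x00), ("sta", 0x40), ("lda", 0x02),
                 ("sta", 0x41), ("lda", 0x0E)]

print(accumulator_at_loop(color_first)[0])    # 2
print(accumulator_at_loop(pointer_first)[0])  # 14
```

In both orderings the pointer bytes at $40 and $41 end up as $00 and $02; only the value left in A differs.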

 

(7) To modify the provided assembly code to fill the display with a different color on each page (each "page" being one-quarter of the bitmapped display), we can use a loop to change the color number for each page. Here's the modified code:

 

lda #$00      ; set a pointer in memory location $40 to point to $0200
sta $40       ; ... low byte ($00) goes in address $40
lda #$02
sta $41       ; ... high byte ($02) goes into address $41

ldx #$00      ; X holds the color number for the current page
ldy #$00      ; set index to 0

loop: txa     ; load the color number for the current page into A

inner_loop: sta ($40),y   ; set pixel color at the address (pointer)+Y
iny                      ; increment index
bne inner_loop           ; continue until done the page (256 pixels)

inx                      ; next page gets the next color
inc $41                  ; increment the page
lda $41                  ; get the current page number
cmp #$06                 ; compare with 6 (one past the last page, $05)
bne loop                 ; continue until done all pages

This code sets a different color for each page and fills each page with that color. The X register keeps track of the current page's color: it starts at 0 and is incremented after each page, so the four pages are filled with colors $0 to $3 (black, white, red and cyan in the emulator's palette). Because X is busy holding the color, the page number is checked through A with lda $41 and cmp #$06 rather than ldx $41 and cpx #$06; the txa at the top of the loop puts the color back into A before the next page is filled. Y is already 0 again when the inner loop ends, so it does not need to be reset.
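
The effect of a per-page color counter can be checked with a small Python model of display memory. This is only a sketch: fill_pages and its parameters are hypothetical names, and it assumes the display occupies $0200-$05FF with the color counter starting at 0 and going up by one per page.

```python
def fill_pages(first_color=0x00, start_page=0x02, end_page=0x06):
    """Fill each 256-byte page with its own color, one color step per page."""
    memory = bytearray(0x10000)      # 64 KiB address space
    color = first_color              # plays the role of the X register
    page = start_page                # plays the role of the byte at $41
    while page != end_page:          # cmp #$06 / bne loop
        for y in range(256):         # iny / bne inner_loop
            memory[(page << 8) + y] = color
        color = (color + 1) & 0xFF   # inx
        page += 1                    # inc $41
    return memory


mem = fill_pages()
page_colors = [mem[page << 8] for page in range(0x02, 0x06)]
print(page_colors)   # [0, 1, 2, 3]
```

Passing a different first_color shifts the whole sequence, which is a quick way to preview which four palette entries a given starting value would produce.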

 

Experiences and Reflections:

Working with Assembly Language was both challenging and rewarding. It exposed me to the inner workings of the processor, where I had to consider each clock cycle and memory access meticulously. It was a valuable learning experience, gaining a deeper understanding of how code execution can be optimized for efficiency.

Through experimentation and optimization, I realized that even small changes in code can have a significant impact on performance. This reinforced the importance of writing efficient code and the value of understanding the low-level details of a system.

In conclusion, our lab experiment in Assembly Language not only improved my technical skills but also provided a new perspective on the world of programming. It's a reminder that even in the age of high-level languages, understanding the fundamentals can make us better programmers.

 

