Re: Optimization questions – HP-15C
Message #40 Posted by Marcel Samek on 1 Nov 2012, 7:08 p.m.,
in response to message #39 by Thomas KlemmThomas,
Thank you very much for your all your truly excellent suggestions. I am not sure you realize how much you helped!
I ran some instrumented profiles on the algorithm earlier today and it turns out that roughly 80% of the subroutine calls to mod() were coming from the main loop (LBL 9) and the other 20% from the change() subroutine (LBL 6).
Your suggestions for removing the calls to MOD in the LBL 9 subroutine immediately eliminated roughly 80% of the calls to MOD
Furthermore, the few bytes of memory saved by yours and others various suggestions allowed me to put in an optimization in the main loop that reduced the amount of work that it performs by roughly 40%. Previously, it would check whether a digit was found in the current row, column, and block with each of those checks setting a flag if it was found. Only after all three checks had been made, was there a conditional to check the flag and try the next digit if the one being tried was already found somewhere. With a few extra bytes of memory available, I was able to put that conditional check after each check, so, for example, it will not do the work to check the column or the block, if the digit was already found in the row.
I have not included your reordering of the subroutines yet. When I did the instrumented profiles I got data on how often each subroutine is called from each different place in the code. I am now pondering how to use that data to optimally arrange the subroutines that are called from multiple places.
I have not yet included your suggestion on changing the LBL 4 subroutine yet, but I will investigate it.
I am attaching a new annotated listing and a nonpareil state file which reflect the current state of the code I am working with.
One question: When I use nonpareil, I am always having to issue the DIM command. It never seems to be sticky. Is there something I can do about that or do I have to live with it?
Current annotated listing:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; mod(x, y)
; Returns y % x. Note the RND appears necessary because of precision issues.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
LBL 0
/ ; y / x – start with the floating point division
LASTX ; put x back onto the stack
X<>Y ; switch x & y so that we can operate on y
FRAC ; take the fractional part of y
* ; and multiply it with the original x to get the modulus
RND ; have to round on the 15c to ensure it's an integer
RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; getPart(x) ; Returns the integer representing the entire Xth row of the flag matrix ; Row numbers start at 0. ; returns value in x – input parameter x ends up in y ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 1 ENTER ENTER ; duplicate the parameter so we can leave it for the caller 2 ; 26 is the starting register for the flag matrix 6 GSB 3 ; set the indirect register to the row specified by x RCL (i) ; retrieve the entire row from the flag matrix RTN
; subroutine to set the indirect register and remove the parameters from the stack LBL 3 + ; x+y is the memory offset we want STO I ; put it in the indirect register RDN ; get rid of the sum from the stack RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; setPow2(x) ; Sets the utility temp register to 2^(x-1). Leaves x in place. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 5 STO 0 ; store the input X in the temp register 1 ; we want to subtract 1 from the exponent – ; calculate x-1 2 ; set the base as 2 X<>Y ; the y^x function wants x and y reversed y^x ; calculate the value X<>0 ; stuff the result in the temp register and restore the input x RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; setU(x) ; Sets/clears the flag matrix values (used to indicate that x is set at the current row/column/block) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL D GSB 5 ; calculate the bit value (for the row) we need to set or clear in the existing values RCL 2 ; Get the current row index into x GSB B ; set the appropriate flag matrix values and calculate the new bit value for the column RCL 3 ; Get the current column index into x GSB B ; set the appropriate flag matrix values and calculate the new bit value for the block RCL 4 ; Get the current block index into x
; MUST IMMEDIATELY FOLLOW PRECEEDING SUBROUTINE ; utility subroutine for setting flag matrix values
LBL B GSB 1 ; get the current flag matrix row at index x
RCL 0 ; get the temp register which holds the bit value we will be setting F? 3 ; flag 3 indicates if we are setting or clearing the flag CHS ; if we are clearing, we will do a subtraction instead + ; add the values which is equivalent to a boolean operation either setting or clearing the flag
X<>Y ; bring the row index back into x 2 ; 26 is the starting register for the flag matrix 6 GSB 3 ; set the indirect register so that we are ready to store the new value STO (i) ; store the new value into the flag matrix RDN ; get rid of the new value to restore the stack 9 ; just for convenience, the next bit value will be 9 bits to the left + ; set the next bit index GTO 5 ; calculate the value with that bit set ; we GTO instead of GSB and it will take care of returning for us
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; putA(x) ; Set the value x into the current row/column in the trial solution. ; Does it by subtracting the previous value and adding the new one. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 7 X <> 6 ; swap new value with register that holds current value STO 0 ; store the old value in the temp register RCL 2 ; Get the current row index into x 1 ; 17 is the starting register for the current trial solution 7 GSB 3 ; Set the indirect register RCL (i) ; Get the current value for the entire row RCL 6 ; Get the new value RCL- 0 ; subtract the old value from the new value RCL* 5 ; shift the power of 10 to the appropriate column + ; add to the old value STO (i) ; store the new row value from where we got it RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; change(x) ; Increments or decrements the current position in the trial solution. ; Updates the registers containing the current row, column and block index, ; as well as those containing the power of 10 factor for the current column and others ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 6 STO+ 1 ; x holds +1 or -1; Register 1 is the current index
RCL 1 ; get the current index (0 to 80) RCL 1 ; get the current index (0 to 80) 9 ; integer divide by 9 to get the row index (0 to 8) / ; no integer divide on 15c so do a floating point divide INT ; use the INT operator to finish of the integer divide STO 2 ; register 2 contains the current row index
9 * – ; col = index – 9 * row STO 3 ; register 3 contains the current column index
3 ; calculate the block index from the row & column indexes / ; block = INT(col/3) + INT(row/3)*3 INT ; it would be nice to save a couple of bytes in this section of code RCL 2 3 / INT 3 * + STO 4 ; register 4 holds the block index
8 ; now calculate the power of 10 of the current column RCL- 3 ; Get the digit (from right) based on the column 10^X ; calculate the exponent STO 5 ; save in register 5 which is used throughout the code
RCL 2 ; get the current row 1 ; 17 is the start register of the current trial solution 7 GSB 4 ; extract the value at the current column STO 6 ; register 6 contains the current trial value at the current row/column
RCL 2 ; get the current row 8 ; 8 is the start register of the input data from the user GSB 4 ; extract the value at the current column STO 7 ; register 7 contains the starting value at the current row/column (0 if none) RTN
; Utility routing to extract the value at the current column from the matrix indirectly specified by x&y LBL 4 GSB 3 ; set the indirect register based on x & y RCL (i) ; get the row from the matrix passed in RCL / 5 ; shift the row to the right so that digit we are interested in is least significant INT ; trim off the digits shifted to the right of the decimal 1 ; we will do a modulus 10 to extract the last digit 0 GTO 0 ; do the modulus operation ; do a GTO instead of GSB and it will return for us
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; main() ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL A CF 2 ; make sure flag 2 is unset – CLR REG does not do this CF 3 ; make sure flag 3 is unset – CLR REG does not do this 1 ; start with a index in register 1 of -1 (0 to 80) CHS ; that way we can start with an increment operation STO 1 ; and actually start at 0 where we want.
LBL 2 ; we are going to loop through all input values and set the flags to show they are set 1 ; go forward one position at a time GSB 6 ; go to the next position in the trial solution and set all temp variables
RCL 7 ; get the starting input value at this row/col GSB 7 ; set the value in the trial solution RCL 7 ; get the starting input value again because the last call destroyed it TEST 1 ; if it is > 0 then the user actually input a value for this row/col GSB D ; set the flags to indicate this value is set
8 ; 80 is the upper bound of the indexes (9×9 = 80 = 0:80) 0 RCL 1 ; get the current index TEST 6 ; if the current index hasn’t reached 80 GTO 2 ; do the next value 1 ; reset the starting value CHS ; to -1 as we did at the beginning of the program STO 1 ; register 1 holds the current index
LBL E ; m1 ; main solution loop 8 ; when we reach the last index (80) we are done 0 RCL 1 ; register 1 holds the current index TEST 5 ; see if we are at the end RTN ; finished ; woohoo – we are done! 1 ; Go forward one spot GSB 6 ; Do the position increment RCL 7 ; get the starting input value at this row/col TEST 1 ; if it’s > 0, the user specified a value here GTO E ; go forward, since this value was explicitly specified by the user and doesn’t need to be solved for GSB 7 ; Set the value in the trial solution
LBL 8 ; m2 9 ; we check the possible digits in order 1-9. If we are on nine, we’ve tried them all at this location RCL 6 ; Get the current trial solution value TEST 5 ; Check to see if it is 9 GTO C ; If it is, backup one step 1 ; We weren’t at 9 yet, so increment the value by 1 + GSB 7 ; Set the value in the trial solution
RCL 6 ; Get the current trial solution value GSB 5 ; Calc 2^x-1 to get the bit mask CF 2 ; Clear the flag we will use to see if the test value has already been used in row/column/block RCL 2 ; Get the current row index into x GSB 9 ; see if the current trial solution value has already been used in the row F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 3 ; Get the current column index into x GSB 9 ; see if the current trial solution value has already been used in the column F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 4 ; Get the current block index into x GSB 9 ; see if the current trial solution value has already been used in the block F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 6 ; Get the current trial solution value GSB D ; set the flags to indicate this value is set GTO E ; move on to the next position in the puzzle
LBL C ; m3 ; Come here to back up to the previous position 1 ; We will go one spot backwards CHS GSB 6 ; Set the new current position and all temp values TEST 1 ; the previous call leaves the starting value at this position in X GTO C ; if that value is > 0, meaning the value was set, backup one more spot RCL 6 ; Get the current trial solution value
SF 3 ; Set the 3 flag to indicate we will be clearing the flag matrix bits, not setting them. GSB D ; Set/Clear the flag matrix bits CF 3 ; unset the 3 flag GTO 8 ; check the next digit
LBL 9 GSB 1 ; get the appropriate row (x) from the flag matrix RCL/ 0 ; get the temp register stuffed by the caller INT 2 ; if the bit is set, there will be a remainder when dividing by two / FRAC TEST 1 ; if the bit is set, set user flag 2 which is used as a return value SF 2 RDN ; move the stack down to prepare the caller for the next call RDN ; move the stack down to prepare the caller for the next call 9 ; the bits flags for row/column/block are shifted by 9 from each other + ; calculates the appropriate bit offset for the next call GTO 5 ; calc 2^x-1 to get the bit mask ; do a GTO instead of GSB and it will return for us
Nonpareil state file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE state SYSTEM "nonpareil.dtd">
<state version="1.00" model="15C" platform="voyager" arch="nut">
<ui/>
<chip name="Nut">
<registers>
<reg name="a" data="00000000000000"/>
<reg name="b" data="00000000000000"/>
<reg name="c" data="00000000000eae"/>
<reg name="m" data="00000000000000"/>
<reg name="n" data="00000000000000"/>
<reg name="g" data="09"/>
<reg name="p" data="c"/>
<reg name="q" data="4"/>
<reg name="q_sel" data="0"/>
<reg name="fo" data="00"/>
<reg name="s" data="0800"/>
<reg name="pc" data="0000"/>
<reg name="stack" index="0" data="009b"/>
<reg name="stack" index="1" data="0042"/>
<reg name="stack" index="2" data="0000"/>
<reg name="stack" index="3" data="0000"/>
<reg name="decimal" data="0"/>
<reg name="carry" data="0"/>
<reg name="awake" data="0"/>
<reg name="pf_addr" data="00"/>
<reg name="ram_addr" data="007"/>
<reg name="active_bank" index="0" data="0"/>
<reg name="active_bank" index="1" data="0"/>
<reg name="active_bank" index="2" data="0"/>
<reg name="active_bank" index="3" data="0"/>
<reg name="active_bank" index="4" data="0"/>
<reg name="active_bank" index="5" data="0"/>
<reg name="active_bank" index="6" data="0"/>
<reg name="active_bank" index="7" data="0"/>
<reg name="active_bank" index="8" data="0"/>
<reg name="active_bank" index="9" data="0"/>
<reg name="active_bank" index="a" data="0"/>
<reg name="active_bank" index="b" data="0"/>
<reg name="active_bank" index="c" data="0"/>
<reg name="active_bank" index="d" data="0"/>
<reg name="active_bank" index="e" data="0"/>
<reg name="active_bank" index="f" data="0"/>
</registers>
</chip>
<chip name="Voyager LCD">
<registers>
<reg name="enable" data="1"/>
<reg name="blink" data="0"/>
</registers>
</chip>
<memory as="ram">
<loc addr="000" data="00000000000000"/>
<loc addr="001" data="00000000000000"/>
<loc addr="002" data="00000000000000"/>
<loc addr="003" data="00000000000000"/>
<loc addr="004" data="00000000000000"/>
<loc addr="005" data="00000000000008"/>
<loc addr="006" data="0000000000000c"/>
<loc addr="007" data="00000000000eae"/>
<loc addr="008" data="00000000000000"/>
<loc addr="009" data="2faf8befbe2280"/>
<loc addr="00a" data="bef282bcbcaf80"/>
<loc addr="010" data="00000000000000"/>
<loc addr="011" data="00000000000000"/>
<loc addr="012" data="00000000000000"/>
<loc addr="013" data="00000000000000"/>
<loc addr="014" data="00000000000000"/>
<loc addr="015" data="c0d2d2d2d2d2d2"/>
<loc addr="016" data="00000000000000"/>
<loc addr="016" data="000000000003e1"/>
<loc addr="017" data="00000000000000"/>
<loc addr="018" data="00000000000000"/>
<loc addr="019" data="00000000000000"/>
<loc addr="01a" data="00000000a00000"/>
<loc addr="0c0" data="00000000000000"/>
<loc addr="0c1" data="00000000000000"/>
<loc addr="0c2" data="00000000000000"/>
<loc addr="0c3" data="00000000000000"/>
<loc addr="0c4" data="00000000000000"/>
<loc addr="0c5" data="00000000000000"/>
<loc addr="0c6" data="00000000000000"/>
<loc addr="0c7" data="00000000000000"/>
<loc addr="0c8" data="00000000000000"/>
<loc addr="0c9" data="00000000000000"/>
<loc addr="0ca" data="00000000000000"/>
<loc addr="0cb" data="00000000000000"/>
<loc addr="0cc" data="00000000000000"/>
<loc addr="0cd" data="00000000000000"/>
<loc addr="0ce" data="00000000000000"/>
<loc addr="0cf" data="00000000000000"/>
<loc addr="0d0" data="00000000000000"/>
<loc addr="0d1" data="00000000000000"/>
<loc addr="0d2" data="00000000000000"/>
<loc addr="0d3" data="00000000000000"/>
<loc addr="0d4" data="00000000000000"/>
<loc addr="0d5" data="00000000000000"/>
<loc addr="0d6" data="00000000000000"/>
<loc addr="0d7" data="00000000000000"/>
<loc addr="0d8" data="00000000000000"/>
<loc addr="0d9" data="00000000000000"/>
<loc addr="0da" data="00000000000000"/>
<loc addr="0db" data="00000000000000"/>
<loc addr="0dc" data="00000000000000"/>
<loc addr="0dd" data="00000000000000"/>
<loc addr="0de" data="00000000000000"/>
<loc addr="0df" data="00000000000000"/>
<loc addr="0e0" data="00000000000000"/>
<loc addr="0e1" data="0000000015faf9"/>
<loc addr="0e2" data="c4c432ff71a3fd"/>
<loc addr="0e3" data="f2ebe0cf210918"/>
<loc addr="0e4" data="43ff2d33ff361c"/>
<loc addr="0e5" data="7126c3f10c1e2d"/>
<loc addr="0e6" data="361852ff293418"/>
<loc addr="0e7" data="52ff29331852ff"/>
<loc addr="0e8" data="293242ff253627"/>
<loc addr="0e9" data="faf11c7536f908"/>
<loc addr="0ea" data="271e713726f1b2"/>
<loc addr="0eb" data="7531f0f80e41c3"/>
<loc addr="0ec" data="f1127631f0f82d"/>
<loc addr="0ed" data="7137273726f102"/>
<loc addr="0ee" data="41c3f143ff42ff"/>
<loc addr="0ef" data="0a10f0f1ebe5cf"/>
<loc addr="0f0" data="862304b24724f8"/>
<loc addr="0f1" data="324624f7f13245"/>
<loc addr="0f2" data="cca3cff844fafc"/>
<loc addr="0f3" data="f3ebfdf332ebfd"/>
<loc addr="0f4" data="f343fbfcf942eb"/>
<loc addr="0f5" data="fdf9313181df06"/>
<loc addr="0f6" data="b296fac5cfa0cf"/>
<loc addr="0f7" data="368623f7f13240"/>
<loc addr="0f8" data="56ef0715faf9c4"/>
<loc addr="0f9" data="9623f6f2c5fac3"/>
<loc addr="0fa" data="53ff30210b342b"/>
<loc addr="0fb" data="332b32250db280"/>
<loc addr="0fc" data="cdc5f2fbf14005"/>
<loc addr="0fd" data="b2c497fa03b286"/>
<loc addr="0fe" data="23f6f2c1c101b2"/>
<loc addr="0ff" data="b5fca3c5b1fd00"/>
</memory>
</state>
|