jump if signed 16-bit number is >0, <=0, fastest code
jump if signed 16-bit number is >0, <=0, fastest code
i need to check if signed 16-bit number (in some register, of course) is >0, and perform a conditional jump. the same for <=0. of course, it can be solved with two JPs (that's what i did), but i maybe i missed some smart trick? ah, i forgot to tell that register contents could be destroyed (and accumulator is free to use too, of course). and it should work without any exceptions, so "everything except $8000", for example, is not a solution. ;-) i need code to be fastest, not smallest (but without LUTs ;-).
it is not a competition, i will simply steal your code for my Forth system. ;-)
it is not a competition, i will simply steal your code for my Forth system. ;-)
Re: jump if signed 16-bit number is >0, <=0, fastest code
You should download the M4 Forth and test a piece of code in it in the console.
I assume you will be able to remove that part
With zero it's the easy part. The more interesting code is if the constant is non-zero and the tested number is with a sign. No one has done optimized code for this.
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(0) GT IF'
;[8:38] 0 > if ( x -- ) flag: x > 0
ld A, H ; 1:4 0 > if save sign
dec HL ; 1:6 0 > if zero to negative
or H ; 1:4 0 > if
ex DE, HL ; 1:4 0 > if
pop DE ; 1:10 0 > if
jp m, else101 ; 3:10 0 > if
; seconds: 0 ;[ 8:38]
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(0) LE IF'
;[8:38] 0 <= if ( x -- ) flag: x <= 0
ld A, H ; 1:4 0 <= if save sign
dec HL ; 1:6 0 <= if zero to negative
or H ; 1:4 0 <= if
ex DE, HL ; 1:4 0 <= if
pop DE ; 1:10 0 <= if
jp p, else101 ; 3:10 0 <= if
; seconds: 1 ;[ 8:38]
Code: Select all
ex DE, HL ; 1:4 0 > if
pop DE ; 1:10 0 > if
Spoiler
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(100) GT IF'
;[11:46] 100 > if ( x -- )
ld A, L ; 1:4 100 > if HL>100 --> HL-100-1>=0 --> HL+(0x7FFF-100)>=0x8000
add A, 0x9B ; 2:7 100 > if HL>100 --> HL+(0x7FFF-100)>=0x8000 --> false if not overflow
ld A, H ; 1:4 100 > if HL>100 --> --> HL+(0x7FFF-100)>=0x8000 --> false if not overflow
adc A, 0x7F ; 2:7 100 > if HL>100 --> HL+(0x7FFF-100)>=0x8000 --> false if not overflow
ex DE, HL ; 1:4 100 > if
pop DE ; 1:10 100 > if
jp po, else101 ; 3:10 100 > if variant: positive constant (false if not overflow)
; seconds: 0 ;[11:46]
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(100) LE IF'
;[11:46] 100 <= if ( x -- )
ld A, L ; 1:4 100 <= if HL<=100 --> HL-100-1<=0 --> HL+(0x7FFF-100)<0x8000
add A, 0x9B ; 2:7 100 <= if HL<=100 --> HL+(0x7FFF-100)<0x8000 --> false if overflow
ld A, H ; 1:4 100 <= if HL<=100 --> --> HL+(0x7FFF-100)<0x8000 --> false if overflow
adc A, 0x7F ; 2:7 100 <= if HL<=100 --> HL+(0x7FFF-100)<0x8000 --> false if overflow
ex DE, HL ; 1:4 100 <= if
pop DE ; 1:10 100 <= if
jp pe, else101 ; 3:10 100 <= if variant: positive constant (false if overflow)
; seconds: 1 ;[11:46]
Spoiler
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(-100) GT IF'
;[11:46] -100 > if ( x -- )
ld A, L ; 1:4 -100 > if HL>-100 --> HL--100>=0 --> HL-(-100-0x7FFF)>=0x8000
sub 0x9D ; 2:7 -100 > if HL>-100 --> HL-(-100-0x7FFF)>=0x8000 --> false if underflow
ld A, H ; 1:4 -100 > if HL>-100 --> HL-(-100-0x7FFF)>=0x8000 --> false if underflow
sbc A, 0x7F ; 2:7 -100 > if HL>-100 --> HL-(-100-0x7FFF)>=0x8000 --> false if underflow
ex DE, HL ; 1:4 -100 > if
pop DE ; 1:10 -100 > if
jp pe, else101 ; 3:10 -100 > if variant: negative constant (false if underflow)
; seconds: 1 ;[11:46]
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(-100) LE IF'
;[11:46] -100 <= if ( x -- )
ld A, L ; 1:4 -100 <= if HL<=-100 --> HL--100-1<0 --> HL-(-100-0x7FFF)<0x8000
sub 0x9D ; 2:7 -100 <= if HL<=-100 --> HL-(-100-0x7FFF)<0x8000 --> false if not underflow
ld A, H ; 1:4 -100 <= if HL<=-100 --> HL-(-100-0x7FFF)<0x8000 --> false if not underflow
sbc A, 0x7F ; 2:7 -100 <= if HL<=-100 --> HL-(-100-0x7FFF)<0x8000 --> false if not underflow
ex DE, HL ; 1:4 -100 <= if
pop DE ; 1:10 -100 <= if
jp po, else101 ; 3:10 -100 <= if variant: negative constant (false if not underflow)
; seconds: 0 ;[11:46]
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
I tried to write it optimally for any constant, so if it finds a value that can have a shorter code, it will probably have it.
Spoiler
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(255) LE IF'
;[8:35] 255 <= if ( x -- )
ld A, H ; 1:4 255 <= if HL<=255 --> HL-255-1<=0 --> HL+(0x7FFF-0x00FF)<0x8000
add A, 0x7F ; 2:7 255 <= if H+0x7F+never_carry<0x80 --> false if overflow
ex DE, HL ; 1:4 255 <= if
pop DE ; 1:10 255 <= if
jp pe, else101 ; 3:10 255 <= if variant: positive constant (false if overflow) and lo is 255
; seconds: 0 ;[ 8:35]
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
Actually, this is still the easy part. Complexity starts to increase with 32-bit numbers. There I found out that there are some patterns of constants that will simplify the test. But because it started to get too complicated, I just wrote a general function for an n-bit number. Because I have support from 8*256 bit numbers.
There I saw other optimizations.
This mostly applies when there are bytes with some special value behind them. Quietly in the middle of the number, from a certain length, can be optimized.
I wanted to write something about it, but then I gave up. Nobody would care anyway.
There I saw other optimizations.
This mostly applies when there are bytes with some special value behind them. Quietly in the middle of the number, from a certain length, can be optimized.
I wanted to write something about it, but then I gave up. Nobody would care anyway.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
You might be interested in the zero and 32 bit comparison.
or 8-bit
Spoiler
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSHDOT(0) DGT IF'
;[10:37/23] 0. d> if ( d -- d ) carry: d > 0.
ld A, D ; 1:4 0. d> if
cp 0x80 ; 2:7 0. d> if not carry if negative
jr nc, $+7 ; 2:7/12 0. d> if negative d with positive constant --> not carry
or E ; 1:4 0. d> if
or H ; 1:4 0. d> if
or L ; 1:4 0. d> if
add A, 0xFF ; 2:7 0. d> if not carry if 0.
pop HL ; 1:10 0. d> if
pop DE ; 1:10 0. d> if
jp nc, else101 ; 3:10 0. d> if ( d -- ) bool: d > 0x00000000
; seconds: 0 ;[15:67]
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSHDOT(0) DLE IF'
;[10:37/23] 0. d<= if ( d -- d ) carry: d > 0.
ld A, D ; 1:4 0. d<= if
cp 0x80 ; 2:7 0. d<= if not carry if negative
jr nc, $+7 ; 2:7/12 0. d<= if negative d with positive constant --> not carry
or E ; 1:4 0. d<= if
or H ; 1:4 0. d<= if
or L ; 1:4 0. d<= if
add A, 0xFF ; 2:7 0. d<= if not carry if 0.
pop HL ; 1:10 0. d<= if
pop DE ; 1:10 0. d<= if
jp c, else101 ; 3:10 0. d<= if ( d -- ) bool: d <= 0x00000000
; seconds: 1 ;[15:67]
Spoiler
Code: Select all
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(0) CGT IF'
;[8:35] 0 c> if ( char1 -- ) bool: lo(char1) > lo(0)
ld A, L ; 1:4 0 c> if L>0 --> L-0-1>=0 --> L+(0x7F-0)>=0x80
add A, 0x7F ; 2:7 0 c> if L>0 --> L+(0x7F-0)>=0x80 --> false if not overflow
ex DE, HL ; 1:4 0 c> if
pop DE ; 1:10 0 c> if
jp po, else101 ; 3:10 0 c> if variant: positive constant (false if not overflow)
; seconds: 0 ;[ 8:35]
dworkin@dw-A15:~/Programovani/ZX/Forth/Pasmo_test$ ../check_word.sh 'PUSH(0) CLE IF'
;[8:35] 0 c<= if ( char1 -- ) bool: lo(char1) <= lo(0)
ld A, L ; 1:4 0 c<= if L<=0 --> L-0-1>=0 --> L+(0x7F-0)<0x80
add A, 0x7F ; 2:7 0 c<= if L<=0 --> L+(0x7F-0)<0x80 --> false if overflow
ex DE, HL ; 1:4 0 c<= if
pop DE ; 1:10 0 c<= if
jp pe, else101 ; 3:10 0 c<= if variant: positive constant (false if overflow)
; seconds: 0 ;[ 8:35]
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
oh. thank you again! i tried to invent something to "fix" zero, but my brain is not as good as it was, so i overcomplicated everything. and trying to read M4 code makes my brain hurts even more. ;-)
tbh, i forgot that i can destroy HL, and tried to get away with HL unmodified. funny, because i myself wrote in OP that keeping register value intact is not necessary. of course, without destroying HL all my attempts were slower than two jumps.
with two jumps, it is 18ts/32ts, and your code is stable 24ts. as i cannot predict the number, i believe that stable 24ts is better. it is 18/28 for 8-bit numbers, tho, so i'll prolly keep two JPs for 8-bit case.
tbh, i'm not planning to optimise generated code for arbitrary constants. it's not something that occurs often enough to worth the efforts (i think). at least not in the first version of F8. ;-)
but i definitely will take a look at your code again when i decide to make F8 v2! thank you!
Last edited by ketmar on Thu Apr 25, 2024 7:43 am, edited 1 time in total.
Re: jump if signed 16-bit number is >0, <=0, fastest code
that's why i'm not even planning to support 32-bit numbers natively. if you need 32-bit math in your game, you're doing something wrong. ;-) even 3d engines are ok with 16-bit math. and F8 is designed to write game logic and utilities anyway, with built-in asm for time-sensitive parts.
why, it would be interesting to read! and definitely of good value for others. please, do!
Re: jump if signed 16-bit number is >0, <=0, fastest code
The completely basic variant is quite easy even for an interpreter. Divide the constants into two sets.
The first 0 .. 0x7FFF is close to the number 0x8000 when adding a positive constant smaller than 0x8000 (that constant must be smaller than 0x8000). According to the result, you will know whether the number is larger or smaller, if you exceed 0x8000, it will set flags for you.
The second (negative) set of constants is close to 0x8000 if you subtract a constant smaller than 0x8000 from the tested number (again, it is important to have a positive number), and if you underflow, it will set flags again.
So the code for generation looks like it chooses whether it is closer to adding a constant and reaching 0x8000 or it is closer to subtracting a constant and reaching 0x8000. From this you can also figure out how to calculate the constant.
It is best to check it in M4. .))
The first 0 .. 0x7FFF is close to the number 0x8000 when adding a positive constant smaller than 0x8000 (that constant must be smaller than 0x8000). According to the result, you will know whether the number is larger or smaller, if you exceed 0x8000, it will set flags for you.
The second (negative) set of constants is close to 0x8000 if you subtract a constant smaller than 0x8000 from the tested number (again, it is important to have a positive number), and if you underflow, it will set flags again.
So the code for generation looks like it chooses whether it is closer to adding a constant and reaching 0x8000 or it is closer to subtracting a constant and reaching 0x8000. From this you can also figure out how to calculate the constant.
It is best to check it in M4. .))
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
no wai. i can read Forth, but M4 is far beyond my cognitive abilities. ;-)
Re: jump if signed 16-bit number is >0, <=0, fastest code
I suspect the easiest way is to check for 0 first, by just ORing the bytes and checking the Z flag. Assuming it's not 0, it's just a single bit test of the relevant sign bit. Should be easily adapted to any size of integer.
Whether it's the fastest sort of depends on how you're measuring it, since the speed will inevitably vary depending on which of the three states you're in and the optimal solution will depend on whether that's 0, a negative or a positive number.
Whether it's the fastest sort of depends on how you're measuring it, since the speed will inevitably vary depending on which of the three states you're in and the optimal solution will depend on whether that's 0, a negative or a positive number.
Re: jump if signed 16-bit number is >0, <=0, fastest code
Maybe if you take the upper part after a DEC and check for bit 7? Something like posted by @_dw above -
and add an INC HL before jumping if you need the original value.
Code: Select all
ld a h
dec hl
or h
jp m LE_0
Last edited by sn3j on Thu Apr 25, 2024 9:39 am, edited 2 times in total.
POKE 23614,10: STOP 1..0 hold, SS/m/n colors, b/spc toggle
Re: jump if signed 16-bit number is >0, <=0, fastest code
Ah, that's what you meant.
I don't expect anyone to understand the M4 source code. (Even if it's not that impossible, it just takes a little practice, most of the code is primitive)
I meant that you look at what the assembler writes to the console when you enter pseudo Forth code.
Just like you understood the code I copied here.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
the first one is identical, but the second code does something different than you expect. Because 0x8000 will underflow into positive numbers.sn3j wrote: ↑Thu Apr 25, 2024 9:06 am Maybe if you take the upper part after a DEC and check for bit 7? Something like posted by @_dw above -orCode: Select all
ld a h dec hl or h jp m LE_0
and add an INC HL before jumping if you need the original value.Code: Select all
dec hl bit 7,h
And if you know from the previous operation that A has bit 7 unset you can spare the LD.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
yes this is not right, I'll remove it from my post... thank you
POKE 23614,10: STOP 1..0 hold, SS/m/n colors, b/spc toggle
Re: jump if signed 16-bit number is >0, <=0, fastest code
that's what i did first, just in a slightly different order.
Code: Select all
ld a, h
or a
jp m, # skip
or l
jp nz, # dest
skip:
oops. i am a little late here, so i removed the outdated post part. ;-)
Re: jump if signed 16-bit number is >0, <=0, fastest code
Testing zero first is not worth it in most cases.AndyC wrote: ↑Thu Apr 25, 2024 9:00 am I suspect the easiest way is to check for 0 first, by just ORing the bytes and checking the Z flag. Assuming it's not 0, it's just a single bit test of the relevant sign bit. Should be easily adapted to any size of integer.
Whether it's the fastest sort of depends on how you're measuring it, since the speed will inevitably vary depending on which of the three states you're in and the optimal solution will depend on whether that's 0, a negative or a positive number.
It's 18 t-cloks if it were zero and 36 if not.
Possibly 20 t-clocks or 33 if not.
I have 24 t-clocks for all cases.
If zero occurs in half of the cases, it is (18+36)/2=27
Or (20+33)/2 = 26.5
This is still worse than 24 clocks despite the disproportionate occurrence of zeros to the remaining 65534 options.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
This is a nice solution that you keep the intermediate result in A!ketmar wrote: ↑Thu Apr 25, 2024 9:39 am that's what i did first, just in a slightly different order.somewhat faster than checking for 0 first. it is faster for negative numbers (18ts), but slower for 0 (32ts). and if programmer wrote such check, they prolly expect to see 0 quite often. so i opted for stable 24ts version. it is also smaller.Code: Select all
ld a, h or a jp m, # skip or l jp nz, # dest skip:
oops. i am a little late here, so i removed the outdated post part.
18 or 32 t-clocks.
It didn't occur to me to test the first negative number.
That's an average of 25 clocks if half the numbers are negative, which they probably are.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
…and i just found another two missing instructions in Z80: "jump if both sign and zero flags are not set", and "jump if either sign, or z flag is set". ;-) so, Z80 is missing 3 useful instructions: "EX SP, SPx", and two jumps. i really have to fix my time machine!
Re: jump if signed 16-bit number is >0, <=0, fastest code
There is an overflow attribute for signed arithmetic. I ignored PO and PE for a long time before I realized how to use it. And how does the credit actually work at all. It is best to read the "Z80 Family CPU User Manual.pdf" directly. Most sites deal only with carry and zero flag. At the same time, "overflow" is just as strong/effective as carry. Only for the value 0x7FFF-0x8000 on the difference from 0xFFFF-0x0000 for carry.
If it's zero value, then you don't need to carry + zero flag in the >,<,<= or >= comparison. The result is enough for you. And the overflow flag will tell you.
Zero is just one case out of 65536, which can be written more optimally. An 8-bit comparison does not differ in any way from a comparison of <= 13, for example.
If it's zero value, then you don't need to carry + zero flag in the >,<,<= or >= comparison. The result is enough for you. And the overflow flag will tell you.
Zero is just one case out of 65536, which can be written more optimally. An 8-bit comparison does not differ in any way from a comparison of <= 13, for example.
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
Re: jump if signed 16-bit number is >0, <=0, fastest code
I'm sure that 40 years ago the creators of the Z80 (8080?) had it well thought out.
It was just forgotten along the way.
When you write in assembler, you don't choose a signed number, but you have something easier for a loop. The best ending in zero. Just adjust the problem to something easier.
If you write in C, you completely ignore this, so that the translator can deal with it.
When you write FORTH, you are in an abnormal situation. It's supposed to be signed and 16-bit...
It was just forgotten along the way.
When you write in assembler, you don't choose a signed number, but you have something easier for a loop. The best ending in zero. Just adjust the problem to something easier.
If you write in C, you completely ignore this, so that the translator can deal with it.
When you write FORTH, you are in an abnormal situation. It's supposed to be signed and 16-bit...
Z80 Forth compiler (ZX Spectrum 48kb): https://codeberg.org/DW0RKiN/M4_FORTH
- Antonio Luque
- Drutt
- Posts: 8
- Joined: Wed Jul 08, 2020 5:04 pm
- Location: Spain
Re: jump if signed 16-bit number is >0, <=0, fastest code
This takes 26ts, but preserve HL:
Code: Select all
xor a
sub l
sbc a,a
sub h
jp m,GreaterThan0
Re: jump if signed 16-bit number is >0, <=0, fastest code
yet 16-bit ADD sets only carry flag. and for ADC/SBC you need another free 16-bit register, and they are slower (and you need to reset carry too). even if you have enough free registers, it will be slow. and there is no "CP HL, r16" instruction for non-destructive 16-bit comparisons. so i believe that two extra jumps would be useful, even with 14ts instead of 10ts.
the sad thing is that "0< IF" and "0>= IF" checks are very useful. and the compiler has to invert conditions to skip over "if" body. and i am used to write code like this (because on x86 it doesn't matter, there are all kinds of jumps there). so i want the compiler to generate something more-or-less good for such cases.
the sad thing is that "0< IF" and "0>= IF" checks are very useful. and the compiler has to invert conditions to skip over "if" body. and i am used to write code like this (because on x86 it doesn't matter, there are all kinds of jumps there). so i want the compiler to generate something more-or-less good for such cases.
Re: jump if signed 16-bit number is >0, <=0, fastest code
Here's another one:
which is 31T for Zero, 20T for negative numbers (50% of all cases) and 26T for the positive numbers.
Average should be 23T, depending on how your cases are distributed.
Code: Select all
ld a, h
add a
jr c LE_0
or l
jr z LE_0
Average should be 23T, depending on how your cases are distributed.
POKE 23614,10: STOP 1..0 hold, SS/m/n colors, b/spc toggle
Re: jump if signed 16-bit number is >0, <=0, fastest code
sadly, this doesn't work with $8000. that pesky $8000 again! ;-)Antonio Luque wrote: ↑Thu Apr 25, 2024 12:36 pm This takes 26ts, but preserve HL:
Code: Select all
xor a sub l sbc a,a sub h jp m,GreaterThan0
- Antonio Luque
- Drutt
- Posts: 8
- Joined: Wed Jul 08, 2020 5:04 pm
- Location: Spain