decr count, incr src & dst, iterate if needed
dx || cx is 32-bit byte count
bx || ax is 32-bit actual count used
compute byte count - actual count used
dx || cx is # bytes not yet processed
see if it is 0
if more bytes then go to L7
keep testing
if loop done, fall through