David Stafford's Journal

Counting cycles ~~~~ Twiddling bits

A Shorter Sort — Found After 34 Years

The human mind can be a funny thing. It can hold a problem in reserve, sometimes for years, patiently waiting for its chance to reassert it when the time is right. This was one of those problems.

Back in 1991, I needed to sort a small array of integers within the Turbo C startup code. I could have linked in qsort() from the standard library, but it would have been overkill and, worse, would have increased the size of every program built with Turbo C by about a kilobyte. That was unacceptable so I set about making the smallest sort function I could.

I built a working 25-byte sort. I felt certain a smaller one was possible and though I made many attempts, and managed to find more sorts at 25 bytes, I could not do any better. It was as if 25 was some kind of fundamental barrier to how small it could be. It bothered me but I was out of time and had to put the problem aside and move on.

;------------------------------------------------------------------------
; cdecl void sort25( int count, int *array )
;
; Sorts an array of ints. C-callable. Time complexity is quadratic (n^2).
;
; This is the original sort I shared in 1991.
;
; Size: 25 bytes
;
; In:   array to sort
;       count of integers
;
; Out:  array is sorted
;------------------------------------------------------------------------

		public	_sort25

sort25_proc	proc	near

@@top:		mov	dx,[bx]		;swap two adjacent integers
		xchg	dx,[bx+2]
		xchg	dx,[bx]
		cmp	dx,[bx]		;are they in the right order?
		jl	@@top		;no, swap them back
		inc	bx		;bump the pointer
		inc	bx
		loop	@@top

_sort25:	pop	dx		;get return address (entry point)
		pop	cx		;get count
		pop	bx		;get array
		push	bx		;save array
		dec	cx		;(1) decrement count
		push	cx		;save count
		push	dx		;save return address

		jg	@@top		;(1) at least two to sort?

		ret

		endp

Fast-forward 34 years to 2025. I had just retired and one night, feeling sleepless, and, for reasons I can't explain, this ancient problem returned to mind. I reached for my Kindle Scribe on the nightstand and began sketching out sorts in 8086 assembler. After an hour or so of making no progress, I fell asleep.

The same thing happened the following night. And the next. I found more sorts, just as I did in 1991, and hit the same frustrating barrier. Why my mind fixated on this problem is a mystery.

After the third night, as I stood in my morning shower, it suddenly occurred to me why I was hitting the 25-byte wall and how to get past it. The structure of all the 25-byte sorts had been the common "selection sort" pair of nested loops. But we don't need two! We can do it with just one loop. The idea is to loop through the array comparing adjacent items and if there are no swaps, exit the function, we're done. If a swap is required, do it, and structure the code flow so the swap simply falls through to the entry point of the function -- effectively restarting it from scratch. This results in one less branch, saving two bytes.

;------------------------------------------------------------------------
; cdecl void sort23( int count, int *array )
;
; Sorts an array of ints. C-callable.
;
; Caution: This sort has cubic (n^3) performance. It's fine for
;          a few hundred integers. Beware of sorting thousands.
;
; Size: 23 bytes
;
; In:   array to sort
;       count of integers
;
; Out:  array is sorted
;------------------------------------------------------------------------

		public	_sort23

sort23_proc	proc	near

@@top:        	mov	ax,[bx]		;get the first integer of a pair
		inc	bx		;bump pointer
		inc	bx
		cmp	ax,[bx]		;compare to the second integer
		jle	@@bottom	;no swap? continue the loop

		xchg	ax,[bx]		;swap the integer pair
		xchg	ax,[bx-2]

		; If we get here, there has been a swap, and we do not
		; continue looping over the array. Instead, the code
		; path falls through to the entry point which starts
		; everything over from the beginning.

_sort23:	pop	ax		;get return address
		pop	cx		;cx=count
		pop	bx		;bx=array
		push	bx		;save array
		push	cx		;save count
		push	ax		;save return address

@@bottom:	dec	cx		;decrement count
		jg	@@top		;at least two to sort?

		ret

		endp

There is a price to pay for its smaller size. It has cubic runtime performance so it should only be used on very small arrays. And I wouldn't trade quadratic performance for cubic performance to save two bytes. Not in 2025. But if it was 1991... and if the array was certain to be small? Hmmmm... I would have to think very hard about it.

I wondered: How small could it be if it didn't have to obey any calling convention?

;------------------------------------------------------------------------
; Sorts an array of ints. NOT C-callable.
;
; Same algorithm as _sort23 but optimized for size by removing the
; constraint of following a calling convention. This reduces the code to
; its purest and simplest form. As with _sort23, the runtime complexity
; is cubic, and sorting more than a few hundred items is not recommended.
;
; Size:      18 bytes
;
; In:        bx = array to sort
;            dx = count of integers
;            direction-flag = clear
;
; Out:       array at bx is sorted
;
; Preserves: bx, dx, bp, di
;
; Destroys:  ax, cx, si
;------------------------------------------------------------------------

		public	_sort18

sort18_proc	proc	near

@@top:		lodsw			;fetch integer
		cmp	ax,[si]		;compare to next integer
		jle	@@bottom	;no swap? continue the loop

		xchg	ax,[si]		;swap the integer pair
		xchg	ax,[si-2]

		; If we get here, there has been a swap, and we do not
		; continue looping over the array. Instead, the code
		; path falls through to the entry point which starts
		; everything over from the beginning.

_sort18:	mov	si,bx		;si=array (entry point)
		mov	cx,dx		;cx=count

@@bottom:	dec	cx		;decrement count
		jg	@@top		;at least two to sort?

		ret

		endp

18 bytes. Nice. Those 8- and 16-bit days were special for size optimization.

The source code, Borland's assembler and linker, and prebuilt binaries, are available in the link below in testsort.zip. The source includes nine sorting algorithms, a test harness to generate sample data, validate correctness, and report results, in 998 total bytes of 8086 assembly.

Download link: testsort.zip

About Me

I recently semi-retired. Over the years I’ve been a software engineer, scientist, startup co-founder, and leader of mid-sized engineering and science groups. It was a deeply fulfilling career and I would happily do it all over again.

There were two aspects of my working life I was particularly passionate about:

  1. As an individual contributor—performance optimization. I loved the puzzle of mapping a problem to the capabilities of the machine. I wrote a lot of assembler code in the 8- and 16-bit days, when getting a computer to do even simple things was a challenge.[1] That interest has never left me and I look forward to continuing to pursue it.
  2. As an organizational leader responsible for delivering new technologies—building and guiding teams to function and succeed under conditions of high invention risk and ambiguity. I often worked with people who were much smarter than me and while I couldn't solve their technical problems myself, what I could do was provide them with a structured approach where, if a solution existed, we'd find it.
Work with the best people on the worthiest problems. Take big risks and make big mistakes. Occasionally build something great.

That is the motto that guided my entire career. I worked with outstanding people on projects with real potential to make a better world. There were some successes and more failures than I would have liked, but my mistakes were my own and I learned so much along the way. I hope to share some of what I learned in this journal.

To everyone I had the honor of working with over the years: Thank you for the memories. You were wonderful!