Decoded: Sopwith

Sopwith (1984) by David L. Clark

My school computer teacher used Sopwith as a reward for suffering through several rounds of Math Blaster. The educational 'games' were a drag in school, but soon enough we were given the thumbs up to pop in the 5.25" disk labeled 'Sopwith'. Back in the mid 1980s, our computer lab had two shades of monochrome: lime green and amber. I always went for the green.

Our version of the game was probably the earliest -- it didn't include birds, missiles, or networking, and I don't even recall the little ox at the end of the runway. Initially, I wasn't very good against the AI planes. It never seemed worth it to engage them in dogfights when the purpose of the game was to bomb all the buildings. I didn't have a serious breakthrough until I asked the teacher to let me borrow one of the disks to take home to my XT. After that, it was game over back at school.

I played Sopwith on and off again over the years, but it finally disappeared from my collection sometime after 2000. I rediscovered the game in 2012 and made a fun free app for iOS and Android. My final tribute for this amazing game (in 2017) will be to decode the source for beginning programmers who may be interested in code archeology. Here are some of the learning highlights:

  • Hand-crafted x86 assembly that makes efficient use of DOS API
  • Managing that CGA memory one bit at a time
  • Pixel-level collision detection
  • Direct PC speaker programming
  • Replacing OS interrupt handlers to incorporate game mechanics
  • Atari code! Yes, Sopwith was built for IBM and Atari
  • ...and how to survive code with a whole lot of bit rot!

    Sopwith Source Code

    David Clark was gracious enough to release the source code for one of his later editions several years ago. Let's take a closer look at the code statistics:

    Lines of C Code: 6,000 (doesn't include unused/duplicated source)
    Lines of Assembly: 2,500
    Number of functions: 250

    Here are the 28 source files with links to my line-by-line code walkthroughs. If you're really interested in reading the entire walkthrough then please help me save bandwidth by downloading it compressed.

    Source File | Purpose | Code
    BMBLIB.C | Console argument, register management routines, and legacy wrappers | (Code w/lines) (Code Walkthrough)
    SWASYNIO.C | Network communication support functions (dead code) | --unused--
    SWAUTO.C | Enemy AI routines | (Code w/lines) (Code Walkthrough)
    SWCOLLSN.C | Handles in-game object collisions | (Code w/lines) (Code Walkthrough)
    SWDISP.C | Extended drawing actions for drawing individual object types | (Code w/lines) (Code Walkthrough)
    SWEND.C | End game routines for winning and losing | (Code w/lines) (Code Walkthrough)
    SWGAMES.C | The data for the game's scenario. Structure defined in SW.H | (Code w/lines) (Code Walkthrough)
    SWGROUND.C | The game map as an array of ground height values | (Code w/lines) (Code Walkthrough)
    SWGRPHA.C | The game renderer plus helper functions for Atari (dead code) | --unused--
    SWINIT.C | Sets up main game environment and gets play mode from the title | (Code w/lines) (Code Walkthrough)
    SWMAIN.C | Game entry point. Global declarations and main game loop is here | (Code w/lines) (Code Walkthrough)
    SWMISC.C | Text printing wrapper invoking assembly in SWUTIL.ASM | (Code w/lines) (Code Walkthrough)
    SWMOVE.C | Movement and other actions for game objects | (Code w/lines) (Code Walkthrough)
    SWMULTIO.C | Network communication support functions (dead code) | --unused--
    swnetio.c | Alternate to SWMULTIO for networking (worse than dead - not linked) | --unused--
    SWOBJECT.C | Low-level memory pool allocator for game object | (Code w/lines) (Code Walkthrough)
    SWPLANES.C | Graphics for the planes hardcoded in various positions | --Raw data, see SWSYMBOL.C--
    SWSOUND.C | PC Speaker sound manager | (Code w/lines) (Code Walkthrough)
    SWSYMBOL.C | Graphics for buildings, bombs, and other effects | (Code w/lines) (Code Walkthrough)
    SWTITLE.C | Draws elements of the title screen | (Code w/lines) (Code Walkthrough)
    _INTC.C | Interrupt handler management | (Code w/lines) (Code Walkthrough)
    SWCOMM.ASM | Communications assembly (dead code) | --unused--
    SWGRPH.ASM | Assembly for graphics for IBM | (Code w/lines) (Code Walkthrough)
    SWHIST.ASM | Debugging tool? Record input (not malicious or even used) | --unused--
    SWUTIL.ASM | A lot of utilities in assembly (My favorite Sopwith source file!) | (Code w/lines) (Code Walkthrough)
    _DKIO.ASM | Disk manager (mostly dead) | --unused--
    _INTA.ASM | Replacing standard interrupts | (Code w/lines) (Code Walkthrough)
    _INTS.ASM | Assembly procedures to enable/disable interrupts | (Code w/lines) (Code Walkthrough)


    This version of Sopwith is designed to run on a personal computer using the Microsoft DOS operating system. The target display uses CGA hardware, while the sound comes from the on-board PC speaker. Between the hardware and the game lies an interface of I/O ports and interrupts. In the 1980s, there weren't robust software libraries to mediate between applications, the OS, and hardware, so developers had to write their own hardware drivers. For Sopwith, this meant direct management of the keyboard, CGA card, and PC speaker. Here is a simplified view of the target architecture that Sopwith was built for:

    Architecture of an IBM PC used for Sopwith

    The mid 1980s were still too early for the idea that the engine and the game logic should be handled separately. It would be a few years before the shareware revolution unveiled Apogee and the superstars at id Software, who pushed the capabilities of the PC to its limits.

    Sopwith does show signs of this eventual separation. Consider this (simplified) source map overlaid on the previous architecture diagram:

    Sopwith code organization as related to an IBM PC

    We see that the game logic files only interact with memory and are written entirely in C, while the source that interacts with hardware is written in assembly. Logically, there is a natural separation between these tasks, although the files remain highly coupled in Sopwith. The Hovertank/Wolfenstein 3-D engine in 1991 is an easy example where the engine and the game are completely independent components. One element that was already well developed by the mid 1980s is the idea of a game loop. We'll take a closer look at that below.

    Code Quirks

    Today, a complete remake of Sopwith is a beginner project that could probably be done in fewer than 1,000 lines of C using SDL. For novice code readers, however, David Clark's original work probably crosses the line into intermediate territory for three reasons. First, this C predates even the earliest standards. Second, over 25% of the game is written in 16-bit segmented x86 assembly. Finally, there's a lot of dead code: routines that control flow can never reach.

    K&R C

    Most of the code was written in the early to middle 1980s, years before the ANSI C standard, so we see many things considered unusual today. None of these differences are difficult to grasp per se, but they do slow down the code reading. One example is that Dave tried to keep variable names at 8 characters or less, sometimes leading to unusual abbreviations. Another example is the K&R style of function definition, where parameter types are declared between the function header and the body rather than inside the parentheses.

    The K&R and ANSI forms accomplish the same thing, but the K&R form might take newcomers a minute to adjust their eyes.
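
    As a minimal illustration of the two styles, here is a sketch with a made-up function. The name clamp_speed and its parameters are hypothetical and do not come from the Sopwith source. The K&R form is kept inside a comment because recent C standards (C23) no longer accept it, while older compilers still do.

```c
/*
 * K&R (pre-ANSI) style: only the parameter names appear in the
 * parentheses; their types are declared before the opening brace.
 *
 *     int clamp_speed(speed, max)
 *     int speed;
 *     int max;
 *     {
 *         return speed > max ? max : speed;
 *     }
 */

/* ANSI style: the types live inside the parameter list. */
int clamp_speed(int speed, int max)
{
    return speed > max ? max : speed;
}
```

    Calling clamp_speed(12, 8) behaves identically in either form; only the declaration syntax differs.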

    Lots of assembly

    The second challenge to this code is the sheer amount of assembly for such a small code base. This was necessary at the time due to the lack of standard library functions/wrappers and possibly for better speed (low call overhead). Portability between architectures and operating systems was an afterthought because there just weren't that many options. Although if you look closely, you'll see a few pre-processor statements that manage things like Atari. The good news is that the assembly is fairly well documented. I discovered quite a few interesting functions that I didn't know or haven't seen in ages, for example:

    Shifting a 32-bit value stored across two different 16-bit registers. Today, we have 32- or 64-bit registers, so we just store the value in one register and shift. In the 1980s, shifting a large number without using extra registers looked something like this:

    Shifting values across multiple registers

    We can't simply shift 3 bits at once. Instead, we shift the most significant register (DX) by 1 bit. The LSB of that register moves into the carry flag in the FLAGS register. Then we rotate AX through carry by 1 bit. This brings the carry flag into the MSB, while the LSB of this register goes back into carry. Repeat this for the number of bits you want to shift.
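
    The same bit-by-bit dance can be modeled in C. This is only a sketch of the technique described above, not a transcription of the Sopwith assembly: the RegPair type and the shift_right_pair name are hypothetical, and the SHR/RCR pairing in the comments is my reading of the "shift, then rotate through carry" description.

```c
#include <stdint.h>

/* A 32-bit value split across two 16-bit "registers", as in DX:AX. */
typedef struct {
    uint16_t dx;  /* most significant 16 bits  */
    uint16_t ax;  /* least significant 16 bits */
} RegPair;

/* Shift the combined 32-bit value right by `count` bits, one bit per pass. */
RegPair shift_right_pair(RegPair r, unsigned count)
{
    while (count-- > 0) {
        /* Like SHR DX,1: the LSB of DX drops into the carry flag. */
        uint16_t carry = (uint16_t)(r.dx & 1u);
        r.dx = (uint16_t)(r.dx >> 1);

        /* Like RCR AX,1: the carry enters the MSB of AX. */
        r.ax = (uint16_t)((r.ax >> 1) | (carry << 15));
    }
    return r;
}
```

    Starting from DX = 0x1234 and AX = 0x5678 and shifting by 3 gives the same result as shifting the 32-bit value 0x12345678 right by 3 in a single modern register.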

    This example appears several times throughout the code and there are a dozen other interesting snippets that have effectively disappeared these days. The best way to pick up x86 assembly is to understand the thought process, read a lot of examples and test if possible. GAS or MASM are common choices for assemblers. Here are some of the key points about Sopwith assembly:

  • The syntax is Intel (my preferred): command, destination, source
  • This is DOS, so we have to use segments and offsets, near and far, etc
  • We're in real mode so we're stuck with 20-bit memory addresses and fixed I/O ports
  • Sopwith ASM procedures use custom calling conventions based on actual need
  • ASM functions use the 'public' directive and are resolved at link time

    Now we'll take a brief look at some specific assembly instructions. I counted a total of 64 unique instructions, which means that Sopwith used most of the original ~90 instruction mnemonics available in the 8086/8088. But as you can probably guess, the top 20 instructions account for around half of the total assembly:

    Assembly in Sopwith - top 20 instructions by count

    It's no surprise that MOV is the big winner in terms of instruction count (ignoring that the MOV mnemonic comprises many variants). Much of the work in assembly is setting up the operands for a more serious instruction. The next several instructions make up the basis for function calls: pushing, popping, calling, and returning. There's plenty of information about these individual instructions available on the web. Here is a quick list, but the most detailed source is the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B.

    Dead Code

    The third and final challenge in reading Sopwith is the sheer amount of dead code. Scanning through the source, you'll see typical code sections commented out with various dates of change, but those aren't the serious problem. The problem is that features were removed, but the code wasn't commented out. In fact, several source files are included, linked, but no pathway exists for the code to be executed at run time. For example:

  • Multiplayer is disabled, menu options were removed and command line switches are ignored. This effectively removes SWASYNIO.C, SWMULTIO.C, SWCOMM.ASM, and _DKIO.ASM from run time. swnetio.c was already removed from the Makefile.
  • Missiles and flares (starbursts) aren't usable in game without reading the source and using the appropriate switches, so normal users won't benefit.
  • The source includes management of IBM and non-IBM keyboards. However, this version has IBM hardcoded and the other snippets go unused
  • This game retains some code related to Atari

    Removing all of these dead code sections would probably reduce the amount of code by at least 25%. I may still decode some of those other source files, but they'll be last on my list of things to do.

    Game Initialization

    Initializing Sopwith at run time is different from most modern games. First, all of the assets are hardcoded, so there's no need to read data files. Second, the game replaces operating system interrupts with custom handlers that directly modify the game's state in memory. This complicates code reading since we can never predict when an interrupt will fire and run handler code. The game disables interrupts at times when game state needs to retain its integrity.

    The heavy lifting comes in SWINIT.C between lines 190 and 300. It begins by reading the command line arguments and setting the various switches that could have been passed, such as difficulty level settings, input settings, sound, etc. If this is the first game of the session, the title is prepared for display. Several variables are set, including the system delay and the random number seed used for explosions. Sound is initialized, followed by interrupt handler overrides for Ctrl-Break and the system timer (DOS IVT: 0x1B and 0x1C).
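
    To make the idea of overriding an interrupt vector more concrete, here is a small simulation in portable C. It is not DOS code and not the Sopwith implementation: the table, hook_vector, raise_interrupt, and game_timer_handler are all hypothetical stand-ins. The only details taken from the text are the vector numbers 0x1B and 0x1C.

```c
#include <stddef.h>

#define VECTOR_COUNT   256
#define VEC_CTRL_BREAK 0x1B  /* Ctrl-Break vector, as named in the text */
#define VEC_TIMER_TICK 0x1C  /* system timer vector, as named in the text */

typedef void (*Handler)(void);

/* Stand-in for the real-mode interrupt vector table. */
static Handler vector_table[VECTOR_COUNT];

/* Game state that the custom handler modifies behind the main loop's back. */
unsigned tick_count;

void game_timer_handler(void)
{
    tick_count++;
}

/* Install a new handler and return the old one so it can be restored on exit. */
Handler hook_vector(unsigned vec, Handler handler)
{
    Handler old;
    if (vec >= VECTOR_COUNT)
        return NULL;
    old = vector_table[vec];
    vector_table[vec] = handler;
    return old;
}

/* Pretend the hardware raised interrupt `vec`. */
void raise_interrupt(unsigned vec)
{
    if (vec < VECTOR_COUNT && vector_table[vec] != NULL)
        vector_table[vec]();
}
```

    The important habit this sketch shows is saving the previous handler when hooking, so the original vector can be put back before the program returns to the operating system.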

    Then the initialization prepares the title sequence to retrieve the modes that weren't passed on the command line, such as game mode, control method, difficulty level, etc. Each of these requires input from the user at the title screen.

    Finally, we prepare to launch the main game by initializing the buildings, the computer planes, the bird flocks, and the oxen, before overriding the keyboard interrupt for the action sequence.

    Now we're ready to play the game!

    The Game Loop

    Sopwith's main game loop covers only 29 lines of code in SWMAIN.C between lines 149 and 178. The general flow looks like this:

  • Delay the game for a few ticks
  • Perform plane actions (move, bomb, shoot, etc)
  • Draw world: ground and objects
  • Test and perform collisions
  • Play sounds

    This early style game loop isn't quite what you'd expect to find these days. The typical cycle, check input -> update game state -> render output, isn't clear thanks to the custom input interrupts configured at load time. These interrupts directly affect game state, and thus the loop disables interrupts for each task of the game loop. The joystick state is polled between each task, which I assume helps responsiveness on very slow systems.

    An important point regarding game objects and interrupts: all objects begin their update phase by removing themselves from the active object list. That seems an unusual idea for 1984, when all processors were single core and single threaded, so why the concurrency protection? Unfortunately, the interrupt handlers are capable of touching objects still linked to the global game object lists. Yes, even this old game suffers from race conditions!
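
    The loop order listed above, with interrupts masked around each task, can be sketched roughly as follows. This is a simulation under my own assumptions, not the code in SWMAIN.C: every name here is hypothetical, the interrupts_enabled variable stands in for the CPU interrupt flag, and the tasks only record that they ran. Joystick polling between tasks is left out for brevity.

```c
#include <stddef.h>

enum Step { STEP_DELAY, STEP_MOVE, STEP_DRAW, STEP_COLLIDE, STEP_SOUND };

#define LOG_MAX 16
enum Step step_log[LOG_MAX];  /* order in which tasks ran */
size_t step_count;

int interrupts_enabled = 1;   /* stand-in for the CPU interrupt flag */

static void record(enum Step s)
{
    if (step_count < LOG_MAX)
        step_log[step_count++] = s;
}

/* Run one task with interrupts masked so no handler can touch
   game state while the task is halfway through its work. */
static void run_guarded(enum Step s)
{
    interrupts_enabled = 0;   /* like CLI */
    record(s);                /* the real work would happen here */
    interrupts_enabled = 1;   /* like STI */
}

/* One pass through the loop, in the order given in the text. */
void game_loop_iteration(void)
{
    record(STEP_DELAY);       /* wait a few timer ticks */
    run_guarded(STEP_MOVE);   /* plane actions */
    run_guarded(STEP_DRAW);   /* ground and objects */
    run_guarded(STEP_COLLIDE);
    run_guarded(STEP_SOUND);
}
```

    Input never appears as an explicit step because, in this model as in the description above, it arrives through the interrupt handlers whenever interrupts are enabled between tasks.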

    The CGA (Color Graphics Adapter)

    Decoding the graphics pipeline of Sopwith requires some basic understanding of the CGA. The direct interface is in the file SWGRPH.ASM, and can be fairly dense for the uninitiated. I'll focus on three points: The memory for the screen, the memory layout for the pixels, and the XOR operations.

    CGA Memory Layout

    The CGA includes 16 KB of memory that we access starting at segment 0xB800 (linear address 0xB8000). This marks the first byte, starting in the upper left of the screen. However, only the EVEN rows occupy the first 8 KB of memory; the ODD rows begin 8 KB higher. Thus the base address is stored as a macro SCR_SEGM == 0xB800, while the raster offset for the odd rows is stored as SCR_ROFF == 0x2000. During graphics processing in SWGRPH.ASM, the offset between rows is stored in a register and is added to the segment offset at the end of a line. It is then inverted to reflect the upcoming jump back to the other half of video memory. A useful consequence is that if we know the memory location of a pixel on an even row, the same pixel one row below is at the current address + SCR_ROFF.

    CGA memory map in Sopwith
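
    The interleaved layout can be expressed as a small address calculation. This sketch assumes the 320x200 four-color mode, which gives 80 bytes per row; BYTES_PER_ROW and both function names are mine, while SCR_SEGM and SCR_ROFF reuse the macro names mentioned above.

```c
#include <stdint.h>

#define SCR_SEGM      0xB800u  /* video segment */
#define SCR_ROFF      0x2000u  /* start of the odd-row bank within the segment */
#define BYTES_PER_ROW 80u      /* assumed: 320 pixels / 4 pixels per byte */

/* Offset within the video segment of the byte holding pixel (x, y). */
uint16_t cga_byte_offset(unsigned x, unsigned y)
{
    unsigned bank = (y & 1u) ? SCR_ROFF : 0u;  /* odd rows live 8 KB higher */
    return (uint16_t)(bank + (y >> 1) * BYTES_PER_ROW + (x >> 2));
}

/* Real-mode linear address: segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

    For example, pixel (0, 1) lands at offset 0x2000 while pixel (0, 2) lands at offset 80, which is why moving down one row alternates between adding SCR_ROFF and jumping back to the even bank.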

    CGA Pixel Layout

    Pixels in CGA are packed into bytes. Each byte holds four pixels of two bits. The complication is that we can't directly address every pixel. Instead, we have to figure out which byte holds the pixel we need, read that whole byte, manipulate the two bits we want to change, then write that byte back to memory. In assembly, this involves many repeated operations to shift and mask values into the proper location.

    CGA packed pixels
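
    The read, mask, and write-back steps look like this in C. It is a sketch of the general technique rather than the routine in SWGRPH.ASM; the function names are hypothetical, and it assumes the standard CGA ordering in which the leftmost of the four pixels occupies the two highest bits of the byte.

```c
#include <stdint.h>

/* Return `byte` with the 2-bit pixel at `index` (0 = leftmost) set to `color`. */
uint8_t pack_pixel(uint8_t byte, unsigned index, uint8_t color)
{
    unsigned shift = (3u - (index & 3u)) * 2u;  /* leftmost pixel sits in bits 7-6 */
    uint8_t mask = (uint8_t)(0x03u << shift);   /* the two bits we want to change */

    /* Clear the old pixel, then OR in the new color at the same position. */
    return (uint8_t)((byte & (uint8_t)~mask) | ((color & 0x03u) << shift));
}

/* Read back the 2-bit pixel at `index` (0 = leftmost). */
uint8_t unpack_pixel(uint8_t byte, unsigned index)
{
    unsigned shift = (3u - (index & 3u)) * 2u;
    return (uint8_t)((byte >> shift) & 0x03u);
}
```

    Setting pixel 0 of an all-black byte to color 3 produces 0xC0, and the other three pixels in that byte are left untouched.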

    XOR Operations in CGA

    Most writes to video memory in Sopwith are done through XOR. Sometimes it's as simple as XOR of the graphic on to the black background. In other cases, we need to change the color of the graphic to its inverse, so we XOR the graphic before XORing that result on to the black background. Recall that in CGA we have 4 pixel colors: black, cyan, magenta, and white. The XOR of cyan is magenta and vice versa. Collision detection is done through a lookup of expected XOR values; any pixels written that do not match the lookup table values trigger a collision.

    CGA XOR effects
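
    Here is a simplified model of both XOR tricks. It is an assumption-laden sketch, not Sopwith's lookup-table code: the function names are hypothetical, the color indices follow the usual cyan/magenta/white palette order, color inversion is modeled as XOR against all ones, and the "expected value" is reduced to the one case of drawing on a black background.

```c
#include <stdint.h>

/* 2-bit color indices, assuming the cyan/magenta/white palette order. */
enum { BLACK = 0, CYAN = 1, MAGENTA = 2, WHITE = 3 };

/* Invert every pixel in a packed byte: cyan <-> magenta, black <-> white. */
uint8_t invert_colors(uint8_t sprite_byte)
{
    return (uint8_t)(sprite_byte ^ 0xFFu);
}

/* XOR a packed sprite byte into a screen byte. Returns nonzero when the
   result is not what a black background would have produced, meaning
   something else was already drawn there. */
int xor_draw(uint8_t *screen_byte, uint8_t sprite_byte)
{
    *screen_byte ^= sprite_byte;
    return *screen_byte != sprite_byte;
}
```

    A byte of four cyan pixels is 0x55; inverting it yields 0xAA, four magenta pixels. Drawing 0x55 over an empty byte reports no mismatch, while drawing it over a byte that already holds a white pixel does.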

    Everything Else

    This high-level look at the code is probably all you need to begin. The rest of the learning is in the code itself. I recommend starting with SWMAIN.C and working through to the game loop, possibly checking out the initialization in SWINIT.C. The game loop branches to all of the subsystems. I'll fill in the code walkthroughs one at a time and link them above.

    Here is a surviving screenshot from my iOS/Android app remake, Sopwith Barons:

    Sopwith Barons (2012) by MaiZure