#td4 — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #td4, aggregated by home.social.

diyelectromusic @[email protected] · 2026-05-14 · 19:25 UTC

@niconiconi This looked like such a great publication, at one point I was seriously looking to see if I could get a copy, after messing around quite a lot with the #TD4 CPU design myself. But ultimately had to decide there would be little point given my lack of language skills.
But I was still looking if I could get a copy. I mean, this really is such a great publication! :)

#td4
diyelectromusic @[email protected] · 2026-01-31 · 20:39 UTC

I appear to have created a "sound card" for a 4-bit CPU.
As you do.
https://diyelectromusic.com/2026/01/31/td4-4-bit-sound/
#TD4 #SynthDIY

#td4 #synthdiy
Simple DIY Electronic Music Projects @[email protected] · 2026-01-31 · 20:37 UTC
TD4 4-bit Sound
Over on my other blog, I spent a fair bit of time looking at the TD4 4-bit CPU. One of the things I wanted to do with my NAND Oscillators and Logic Sequencer PCB was hook up the address/select pins to something else. And with three select pins, allowing the choice between 8 notes, what better to connect it to than a 4-bit CPU?
https://makertube.net/w/aroDZYM2BHYpoB9QLJvHnk
Warning! I strongly recommend using old or second hand equipment for your experiments. I am not responsible for any damage to expensive instruments!
If you are new to microcontrollers, see the Getting Started pages.
Parts list
- Built TD4 CPU
- Built NAND Oscillators and Logic Sequencer PCB
- 5V power, jumper wires, amplifier, etc.
The Setup
The most obvious thing, to my mind, is to hook up three of the four outputs to the three selection pins of the NAND sequencer, so that is what this post explores.
The NAND PCB needs the jumpers removing, which disconnects the pot-driven oscillators. Then the three select/address lines can be connected to three of the four resistors supporting the OUTPUT LEDs of the TD4, as shown above.
It is also possible to use the POWER header pins to power the NAND PCB too.
Any of the variants of TD4 I’ve built could be used, but I’ve shown above where they would need to be connected on the original. In the end I actually soldered four header pins to the appropriate side of the resistors on my own PCB version of the TD4 as shown below. A bit crude, but it does the job.
Connecting these over to the NAND sequencer and hooking up power gives me the following.
The Code
The simplest way to create a sequence is a set of OUT xx instructions where the least significant 3 bits (so values 0 to 7) map onto the eight possible notes played by the NAND sequencer.
This is the simple LED OUTPUT code from Part 3 of my series, but here it continually toggles between the second-lowest note (001) and the highest note (111).
```
0000 OUT 0001   # 1000 1101
0001 OUT 0111   # 1110 1101
0010 JMP 0000   # 0000 1111
```
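The bit patterns in the comment column read as if each nibble is written least significant bit first, with the immediate before the opcode (possibly the D0 to D7 switch order, but that is my guess). A minimal Python sketch of that mapping follows. The opcode values assume the commonly documented TD4 instruction set, and the names `OPCODES`, `reversed_nibble` and `encode` are hypothetical helpers, not part of any existing tool.

```python
# Assumed TD4 opcodes (upper nibble of each instruction byte).
OPCODES = {
    "ADD A": 0b0000,
    "IN A": 0b0010,
    "ADD B": 0b0101,
    "OUT B": 0b1001,
    "OUT": 0b1011,
    "JNC": 0b1110,
    "JMP": 0b1111,
}


def reversed_nibble(value):
    """Return a 4-bit value as a string with the least significant bit first."""
    return format(value & 0xF, "04b")[::-1]


def encode(mnemonic, immediate=0):
    """Format one instruction as 'immediate opcode', each nibble LSB first."""
    return f"{reversed_nibble(immediate)} {reversed_nibble(OPCODES[mnemonic])}"


print(encode("OUT", 0b0001))
print(encode("OUT", 0b0111))
print(encode("JMP", 0b0000))
```

Under those assumptions the three printed patterns match the comments in the listing above.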
A counter can be used to play all 8 notes. Note that in this code B will go from 0 to 15 (b0000 to b1111) but only the last three bits select notes. This means that the sequence will count from b000 to b111 twice for each full count of B from 0 to 15, with the top bit being ignored.
```
0000 ADD B,0001  # 1000 1010
0001 OUT B       # 0000 1001
0010 JMP 0000    # 0000 1111
```
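The effect of ignoring the top bit can be sketched in a few lines of Python. This assumes B starts at 0 after reset and that only the low three OUTPUT bits reach the sequencer; `counter_notes` is a hypothetical name used just for this illustration.

```python
def counter_notes(steps=16):
    """Model ADD B,0001 followed by OUT B, keeping only the three select bits."""
    b = 0  # assumption: B is 0 after reset
    notes = []
    for _ in range(steps):
        b = (b + 1) & 0xF        # ADD B,0001 wraps at 4 bits
        notes.append(b & 0b111)  # OUT B: the sequencer sees bits 0 to 2 only
    return notes


print(counter_notes())
```

Over 16 steps the list runs through the eight note values twice, starting from note 1 because the ADD happens before the first OUT.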
There are only two clock speeds though, 1 Hz and 10 Hz, so the above, which takes three instructions per note, has a tempo of 20 bpm (1 note every 3 seconds) or 200 bpm (approx 3 notes every second). The tempo can be slowed down in steps of 1 second or 1/10 second by moving the JMP an instruction further down and back-filling with other instructions (ADD A,0 or b00000000 is a good one, and is essentially equivalent to a NOP).
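The tempo arithmetic is simply 60 times the clock rate divided by the number of instructions per note. A small Python sketch, where `tempo_bpm` is a hypothetical helper name:

```python
def tempo_bpm(clock_hz, instructions_per_note):
    """Notes per minute when each note lasts a fixed number of clock ticks."""
    return 60.0 * clock_hz / instructions_per_note


print(tempo_bpm(1, 3))   # three-instruction loop at 1 Hz
print(tempo_bpm(10, 3))  # same loop at 10 Hz
print(tempo_bpm(10, 4))  # one extra NOP in the loop at 10 Hz
```

Each NOP added to the loop increases `instructions_per_note` by one, which is where the 1 second or 1/10 second steps come from.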
The following code uses the INPUT as a counter in a loop to provide a partly configurable tempo.
```
0000 IN A        # 0000 0100    A = INPUT
0001 OUT B       # 0000 1001    OUTPUT = B                  # Plays the note in B
0010 ADD B,0001  # 1000 1010    B = B + 1
0011 ADD A,1111  # 1111 0000    A = A + (-1)                # Loops until A wraps past 0 (no carry)
0100 JNC 0000    # 0000 0111    JUMP IF NO CARRY TO 0000    # Jump back to start for next note
0101 ADD A,0     # 0000 0000    Optional additional NOPs
0110 JMP 0011    # 1100 1111    JUMP to 0011                # Else keep counting
```
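To see how the INPUT value changes the tempo, the loop can be stepped through in a tiny simulation. This is a rough sketch rather than a full TD4 model: it assumes the carry flag is latched on every instruction, that only the ADD instructions can set it, and that JNC tests the carry left by the previous instruction. The function name `ticks_per_note` is hypothetical.

```python
def ticks_per_note(input_value):
    """Count clock ticks between two successive OUT B instructions."""
    program = [
        ("IN A", 0), ("OUT B", 0), ("ADD B", 1), ("ADD A", 15),
        ("JNC", 0), ("ADD A", 0), ("JMP", 3),
    ]
    a = b = pc = carry = 0
    ticks = 0
    out_ticks = []
    while len(out_ticks) < 2:
        op, im = program[pc]
        next_pc = pc + 1
        new_carry = 0  # assumption: non-ADD instructions leave carry clear
        if op == "IN A":
            a = input_value & 0xF
        elif op == "OUT B":
            out_ticks.append(ticks)
        elif op == "ADD A":
            total = a + im
            a, new_carry = total & 0xF, total >> 4
        elif op == "ADD B":
            total = b + im
            b, new_carry = total & 0xF, total >> 4
        elif op == "JNC":
            if carry == 0:
                next_pc = im
        elif op == "JMP":
            next_pc = im
        carry = new_carry
        pc = next_pc
        ticks += 1
    return out_ticks[1] - out_ticks[0]


for value in (0, 1, 15):
    print(value, ticks_per_note(value))
```

Under these assumptions each step of INPUT adds four ticks (ADD, JNC, NOP, JMP) on top of a fixed five, so the note length is 4 x INPUT + 5 ticks.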
This still only cycles through each note in turn, but that is kind of what an 8-step sequencer would do.
Getting more creative with the programmability of the sequencer requires a series of OUT instructions with NOPs between them, for example:
```
0000 OUT 0000    # 0000 1101    OUTPUT = 0000 # Play note 000
0001 OUT 0010    # 0100 1101    OUTPUT = 0010 # Play note 010
0010 ADD A,0     # 0000 0000    A = A + 0     # NOP
0011 OUT 0001    # 1000 1101    OUTPUT = 0001 # Play note 001
0100 OUT 0100    # 0010 1101    OUTPUT = 0100 # Play note 100
0101 ADD A,0     # 0000 0000    A = A + 0     # NOP
0110 ADD A,0     # 0000 0000    A = A + 0     # NOP
0111 OUT 0110    # 0110 1101    OUTPUT = 0110 # Play note 110
1000 OUT 0101    # 1010 1101    OUTPUT = 0101 # Play note 101
1001 OUT 0011    # 1100 1101    OUTPUT = 0011 # Play note 011
1010 ADD A,0     # 0000 0000    A = A + 0     # NOP
1011 ADD A,0     # 0000 0000    A = A + 0     # NOP
1100 ADD A,0     # 0000 0000    A = A + 0     # NOP
1101 OUT 0111    # 1110 1101    OUTPUT = 0111 # Play note 111
1110 ADD A,0     # 0000 0000    A = A + 0     # NOP
1111 ADD A,0     # 0000 0000    A = A + 0     # NOP
```
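Since every instruction takes one clock tick, the NOPs after each OUT set how long that note is held. The following Python sketch works out the note lengths for the listing above. The `PROGRAM` list is my transcription of the OUT immediates (with `None` for each ADD A,0), and `note_durations` is a hypothetical helper.

```python
# OUT immediate at each ROM address 0000 to 1111, None where the listing has ADD A,0.
PROGRAM = [0, 2, None, 1, 4, None, None, 6, 5, 3, None, None, None, 7, None, None]


def note_durations(rom):
    """Return (note, ticks) pairs, where each NOP extends the previous note."""
    durations = []
    for value in rom:
        if value is not None:
            durations.append([value & 0b111, 1])
        elif durations:
            durations[-1][1] += 1
    return [tuple(pair) for pair in durations]


for note, ticks in note_durations(PROGRAM):
    print(f"note {note:03b} for {ticks} tick(s)")
```

There is no JMP in this programme, so it relies on the 4-bit program counter wrapping from 1111 back to 0000, giving a 16-tick pattern.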
This last programme is the one running in the video at the start of this post.
Closing Thoughts
I appear to have made a sound card for a 4-bit CPU 🙂
One thing I am quite keen to do is connect up the sequencer’s select pins to the TD4’s address lines, as I’d like to be able to have some incidental (accidental?) music that appears as a result of the CPU just running any other normal programme.
To do this I’d need to either hook into the output of the PC register or the input to the HC154 ROM decoder.
In fact, it would be really interesting to be able to hook up any sets of four signals – so the INPUT selector, or even the control decoding logic – just to see what it sounds like as the CPU is running normal code. That might require a special build of the CPU though.
I also have an address line spare of course, so it would also be interesting to use that to select between two NAND sequencers to give me a 16 step sequence.
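As a rough sketch of that last idea, the fourth OUTPUT bit could pick the sequencer while the low three bits pick the step. This is untested on hardware and the function name `decode_output` is hypothetical.

```python
def decode_output(value):
    """Split a 4-bit OUTPUT value into (sequencer, step)."""
    sequencer = (value >> 3) & 1  # spare top bit chooses between two NAND sequencers
    step = value & 0b111          # low three bits choose one of 8 steps
    return sequencer, step


for value in (0b0000, 0b0111, 0b1000, 0b1111):
    print(f"{value:04b}", decode_output(value))
```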
Kevin
#4Bit #74hc4051 #nand #sequencer #td4
#4bit #74hc4051 #nand #sequencer #td4
Kevin's Blog @[email protected] · 2025-11-26 · 17:40 UTC
TD4 4-bit DIY CPU – Part 8
Now that I’ve shown I could support more ROM if required using a microcontroller (see Part 6) I can start to ponder how that might be possible.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
There are several other expansions to consider too. Other things I’m pondering are:
- Can I find a way to add the two registers together?
- Are there options to add another register?
- Is 4-bit data still enough?
- Could a 4×4 output grid be supported?
- Could any extensions be added in a way that is backwards compatible with the existing instructions and behaviours?
And probably a few other odds and ends as I go back and reconsider the schematic as it stands, but they can wait for a future post.
TD4 Simulation
Before I get stuck into the updates, I thought it would be useful to be able to simulate the TD4 to allow for quick turn-around experiments.
I’ve used the “Digital” logic simulator which can be found here: https://github.com/hneemann/Digital
I could have built the simulator from basic logic gates, and that would perhaps have been more useful in helping to understand how the design works. But I wanted something that would be easy to fiddle about with to test enhancements, so I built it using the actual 74xx logic chips instead. This doesn’t make for such a readable simulation, as I’ve had to go with actual pinouts for chips rather than logical groupings of signals. But it does map more closely onto the final hardware, which is handy for thinking in actual chip usage rather than abstract logic.
I’ve not bothered simulating the clock circuit; I’ve just wired in a clock source. I’ve also not added the ROM DIP switches, instead adding a ROM element and wiring it into the address and data lines. By right-clicking and viewing the attributes, it is possible to define a 16-byte ROM (4 address, 8 data lines) and edit the contents.
The ROM element takes a multiplexed source and produces a multiplexed output, so I use a splitter/mixer function to turn that into D0-D7 as shown above. Similarly the output of the 74HC161 acting as the program counter (PC) has A0-A3 mixed into a single ADDR bus line.
I’ve added outputs to the two registers to show their contents during execution. I’ve also added a DIP switch on the /RESET line to allow me to start and stop the simulation.
The video below shows it running the above ROM contents, which is the same demo program I used in Part 6 with the microcontroller ROM.
The simulator can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU
https://makertube.net/w/5njzGmYvqXiU3DLCMMtwqp
Now that I have an easier way of experimenting, on to the enhancements.
Increasing the Address Space
The address space is currently implemented as follows:
- A 4-bit counter register based on a HC161 4-bit synchronous binary counter.
- A HC154 4 to 16 line decoder/demultiplexer for DIP switch selection.
- A HC540 octal buffer/line driver to buffer (and invert) the data outputs.
The counter auto increments on each clock pulse, thus moving through the address space, but it can also be a destination for the adder, allowing absolute jumps to specific addresses, thus implementing a JMP instruction.
To increase the address space, there are a few considerations:
- With more than 4 bits how should JMPs work? They will have to remain 4 bits unless the data width is increased.
- Each additional bit of address space will double the number of DIP switches required.
- The next size of binary counter above 4-bits is typically 8-bits.
One idea is to use the RCO pin of the 161. This is the “ripple carry out” and can be used to cascade counters for greater than 4-bit counting. As I understand things, RCO will be HIGH once all outputs are also HIGH, for a single clock pulse. This can be used to enable a following counter for that pulse. This is shown below (taken from the datasheet).
And this is the sample application, again from the datasheet, showing how it would work, with extensions on to additional stages.
A simple way to add an additional bit of address space might be to feed RCO into a flip-flop acting as a toggle in the configuration shown below.
This can then be used to select between two HC154 4 to 16 decoders. As I already have an unused flip flop as part of the HC74 used for the CARRY, this could be quite an appealing solution and in simulation it does appear to work.
There is one slight complication. As shown above, A4 will toggle when A0-A3 = 1111, not as they change back to 0000. This is because the flip-flop toggles on the rising edge of the provided clock signal, which in this case is RCO from the 74HC161. Adding a NOT gate means that the rising edge happens as the 161’s RCO signal drops when it resets back to 0000.
Whilst this solves the sequencing problem it does have the unfortunate side effect that the RESET state means that A4 is 1 on power up. That too could be solved with another NOT gate if required, or by simply hanging A4 off the /Q output of the flip-flop rather than the Q output.
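The edge timing described above can be checked with a small behavioural model. This is only a sketch in Python, not the Digital simulation itself: it assumes an ideal 4-bit counter whose RCO is high only while the count is 1111, and a flip-flop that toggles on each rising edge of its clock input. The function name `toggle_points` is made up for illustration, and power-up state is not modelled.

```python
def toggle_points(invert_rco, cycles=34):
    """Return the counter values at which the toggle flip-flop changes state.

    Assumption: RCO is 1 only while the 4-bit count is 1111, and the
    flip-flop toggles on every rising edge of its clock input.
    """
    q = 0
    prev_clk = None
    points = []
    for n in range(cycles):
        count = n & 0x0F                      # free-running 4-bit counter
        rco = 1 if count == 0x0F else 0
        clk = 1 - rco if invert_rco else rco  # optional NOT gate on RCO
        if prev_clk == 0 and clk == 1:        # rising edge seen by the flip-flop
            q ^= 1
            points.append(count)
        prev_clk = clk
    return points


print(toggle_points(invert_rco=False))  # toggles while the count is 1111
print(toggle_points(invert_rco=True))   # toggles as the count returns to 0000
```

Without the inverter the toggle lands on count 15; with it, the toggle lands on count 0, which is the behaviour wanted for the extra address bit.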
Here is the additional wiring, in simulator form, to allow this to work.
Note the addition of A4 which now comes from the spare flip-flop /1Q output, and the linking of RCO via a NOT gate to the flip-flop 1CP clock input. The rest of flip-flop 1 is configured in toggle mode, with /1RD and /1SD both tied high (inactive) and 1D linked to /1Q for the feedback. The non-inverting output 1Q is not used.
Whilst this seems to require an additional logic gate (for the NOT) it turns out that there is a spare Schmitt trigger inverter on the 74HC14 that supports the clock circuit, so that is pretty convenient.
The ROM has also been reconfigured for 5 address inputs with the same 8 data bits, creating a 32 x 8 bit ROM.
There are a few issues with this though:
- JMP/JNC only work within the same half of the memory, so JMP 4 in the first 16 locations will jump to location address 0x04, but JMP 4 in the second 16 locations will jump to location address 0x14.
- A JMP 0 in the last location of each half will carry forward into the next half, as the counter ticks over at the same time as the load happens. So JMP 0 in address 0x0F will jump to address 0x10 and JMP 0 in address 0x1F will jump to address 0x00.
But if one can program around those constraints this is quite a simple solution.
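The two constraints above can be reproduced with a short Python model of the program counter. It is a sketch under stated assumptions rather than a description of the real hardware timing: the low four bits behave like the 74HC161 (increment or load), and A4 toggles whenever the counter leaves 1111, which is when the inverted RCO produces a rising edge. The function name `step` is hypothetical.

```python
def step(pc, jump_to=None):
    """Advance a 5-bit PC built from a 4-bit counter plus a toggle flip-flop.

    pc      -- current 5-bit address (A4 is bit 4)
    jump_to -- 4-bit jump target, or None for a normal increment
    """
    low = pc & 0x0F
    a4 = (pc >> 4) & 1
    if jump_to is None:
        new_low = (low + 1) & 0x0F     # normal count
    else:
        new_low = jump_to & 0x0F       # JMP/JNC load only the low 4 bits
    # Assumption: RCO is high only at 1111, and its inverse clocks the
    # flip-flop, so A4 toggles whenever the counter moves away from 1111.
    if low == 0x0F and new_low != 0x0F:
        a4 ^= 1
    return (a4 << 4) | new_low


print(hex(step(0x02, jump_to=4)))  # JMP 4 in the first half
print(hex(step(0x12, jump_to=4)))  # JMP 4 in the second half
print(hex(step(0x0F, jump_to=0)))  # JMP 0 in the last location of the first half
print(hex(step(0x1F, jump_to=0)))  # JMP 0 in the last location of the second half
```

In this model a jump never changes A4 on its own; only leaving 1111 does, which is why a JMP 0 at the end of a half lands in the other half.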
An Alternative Solution
There is a neat solution to adding a 5th address bit here: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html#memory
This uses the duplicate JMP/JNC instructions to encode a JMP2/JNC2 that results in the 5th address bit being set, thus enabling a jump to the second half of the memory.
In order to create the additional address line, there is a second PC register added – i.e. a 5th HC161 counter. As far as I can see the operation is as follows:
- When the first PC register carries over, the second PC register counts up.
- As only the first output of the second PC register is used, as it counts that output will simply alternate between 0 and 1.
- The second PC register can take 1 as an input when the decoded instructions match JMP2 or JNC2 (D5 low, D6 and D7 high, with either D4 or CARRY), forcing A4 on when the first PC register is loaded with the 4-bit jump value, creating a JMP to the second half of the address space.
- There is an A4 and /A4 signal which alternately enable the two address decoders for the ROM.
- This specific circuit uses four HC138 chips rather than two HC154, but the principle of operation is the same – generate one of 32 signals for the ROM from 5 bits of address line.
The modifications to support this are fairly simple and it is neat how it uses redundancy in the instruction set to work, but it does require an additional 74HC161 chip.
Combine the two?
If additional logic can be used to address the second PC in the second solution above, then I’m wondering if that could also be used to deliberately set or reset the flip-flop in the first solution too.
The key will be overriding the flip-flop state to preset A4 if the logic sequence for the spare JNC/JMP instructions turns up. If the /1SD input is active (LOW) then the output will be HIGH. If the /1RD input is active (LOW) then the output will be LOW.
Here is the additional instruction decoding logic – I’m using NAND gates as the NOTs here, so I can just use a single quad NAND gate chip.
So, the truth table for this is as follows:
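| D4 | D5 | D6 | D7 | /C | /LDPC | ENA4 |
|----|----|----|----|----|-------|------|
| 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | X | 0 | 1 |
| 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | X | 0 | 0 |
| X | X | 0 | 0 | X | 1 | 0 |
| X | X | 1 | 0 | X | 1 | 0 |
| X | X | 0 | 1 | X | 1 | 0 |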
This corresponds to D7 and D6 both being set, D5 being clear, and either D4 or the carry condition (the /C column in the table) being set. Together these cause the ENA4 signal to be true, thus implementing the second JNC and JMP instructions (b1100 and b1101).
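The truth table can be written as two Boolean expressions. The Python sketch below is my own reading of the table rows, not a transcription of the NAND gate wiring: it assumes /LDPC goes low whenever D6 and D7 are set together with D4 or /C, and that ENA4 is that same condition with D5 low. The function name `decode` is made up for illustration.

```python
def decode(d4, d5, d6, d7, not_c):
    """Return (/LDPC, ENA4) for the instruction bits D4-D7 and the /C signal."""
    jump_taken = bool(d6 and d7 and (d4 or not_c))
    ldpc_n = 0 if jump_taken else 1               # active-low PC load
    ena4 = 1 if (jump_taken and not d5) else 0    # only the spare b1100/b1101 codes
    return ldpc_n, ena4


# Rows of the table, as (D4, D5, D6, D7, /C) -> (/LDPC, ENA4)
print(decode(0, 0, 1, 1, 1))  # second JNC (b1100)
print(decode(1, 0, 1, 1, 0))  # second JMP (b1101)
print(decode(0, 1, 1, 1, 1))  # original JNC
print(decode(1, 1, 1, 1, 0))  # original JMP
print(decode(0, 0, 0, 0, 1))  # not a jump
```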
Unfortunately, so far, I’ve not been able to figure out an option for driving the flip-flop where the logic pans out to correctly set A0-A3 and A4 to successfully load the PC + flip-flop as required by the new instruction, so I might have to leave that for now.
TD4 Arduino 5-bit Address PCB
At this point I thought I had enough to warrant building a new PCB for a microcontroller memory version of the TD4 with the option to support a 5-bit address bus with the limitations described above.
I took the PCB from Part 5 as the starting point and replaced the ROM logic with an Arduino Nano and added in the flip-flop to create the 5-bit address bus.
The ROM section is replaced with the Arduino as shown below.
The CPU section now uses the spare NOT gate from the PWRCLK section and the spare flip-flop from the CPU section as shown below.
I believe these were the only parts to change. I have included the option to disable the RESET button by cutting a solder jumper and replacing it with a link to an Arduino IO pin.
I’ve also added headers to break out the unused Arduino IO pins just in case that becomes useful at some point.
The complete Arduino Nano pinout is as follows:
| TD4 Signal | Arduino Nano IO |
|------------|-----------------|
| A0-A4 | A0-A4 (A4 optional) |
| D0-D3 | D8-D11 |
| D4-D7 | D4-D7 |
| /RESET | D12 (optional) |
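As a rough illustration of what this pinout implies, the Python sketch below models how one ROM byte would be split across the two groups of data pins. It relies on the standard ATmega328P Nano port mapping (D8-D11 are PB0-PB3, D4-D7 are PD4-PD7, A0-A4 are PC0-PC4). Whether the actual sketch uses direct port access is an assumption on my part, and the function names are hypothetical.

```python
def split_for_ports(data_byte):
    """Split an 8-bit ROM value into (PORTB bits, PORTD bits)."""
    portb = data_byte & 0x0F   # TD4 D0-D3 -> Nano D8-D11 (PB0-PB3)
    portd = data_byte & 0xF0   # TD4 D4-D7 -> Nano D4-D7 (PD4-PD7)
    return portb, portd


def read_address(pinc, five_bit=False):
    """Extract the TD4 address from a PINC-style value (A0-A4 on PC0-PC4)."""
    mask = 0x1F if five_bit else 0x0F   # A4 is optional
    return pinc & mask


portb, portd = split_for_ports(0xA1)
print(f"PORTB={portb:#04x} PORTD={portd:#04x}")
print(read_address(0b10101, five_bit=True))
```

With this mapping neither nibble needs shifting, as each half of the data byte already sits at the right bit positions for its port.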
The board can be powered either via the Arduino’s USB port or via the PCB micro USB port.
Complete Bill Of Materials
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 1x 74HC74 Dual D-Type Flip Flop
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
Semiconductors and Passive Components
- 25x 3x2mm rectangular LED
- Resistors: 2x 100R; 33x 1K; 1x 3K3; 1x 10K; 1x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switch
- DIP sockets: 7x 16 way; 4x 14 way
- 2x 15-way pin header sockets
And 1 Arduino Nano of course.
The PCB can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU. The video at the end of this post shows it in action.
Nano Assembler Update
I’ve updated my Nano assembler with a new command to change to 5-bit address mode if required.
```
Help
----
H: Help
L: List
G: Goto
C: Clear
R: Restore
A: Addr Mode
O: Opcodes
OpCode
OpCode im

Current line: b0101 [15]

Address Mode = 5 bit


RAM Disassembly

b00000 [0]: OUT b0001    b1010 0001  0xA1    b10000 [10]: OUT b0001    b1010 0001  0xA1
b00001 [1]: ADDA b0001   b0000 0001  0x01    b10001 [11]: OUT b0010    b1010 0010  0xA2
b00010 [2]: OUT b0010    b1010 0010  0xA2    b10010 [12]: OUT b0100    b1010 0100  0xA4
b00011 [3]: ADDB b0001   b0101 0001  0x51    b10011 [13]: OUT b1000    b1010 1000  0xA8
b00100 [4]: OUT b0100    b1010 0100  0xA4    b10100 [14]: OUT b0100    b1010 0100  0xA4
b00101 [5]: ADDA b0001   b0000 0001  0x01    b10101 [15]: OUT b0100    b1010 0100  0xA4
b00110 [6]: OUT b1000    b1010 1000  0xA8    b10110 [16]: OUT b0010    b1010 0010  0xA2
b00111 [7]: ADDB b0001   b0101 0001  0x51    b10111 [17]: OUT b0001    b1010 0001  0xA1
b01000 [8]: OUT b0100    b1010 0100  0xA4    b11000 [18]: OUT b0001    b1010 0001  0xA1
b01001 [9]: ADDA b0001   b0000 0001  0x01    b11001 [19]: OUT b0010    b1010 0010  0xA2
b01010 [A]: OUT b0010    b1010 0010  0xA2    b11010 [1A]: OUT b0100    b1010 0100  0xA4
b01011 [B]: ADDB b0001   b0101 0001  0x51    b11011 [1B]: OUT b1000    b1010 1000  0xA8
b01100 [C]: OUT b0001    b1010 0001  0xA1    b11100 [1C]: OUT b1000    b1010 1000  0xA8
b01101 [D]: ADDA b0000   b0000 0000  0x00    b11101 [1D]: OUT b0100    b1010 0100  0xA4
b01110 [E]: OUT b1111    b1010 1111  0xAF    b11110 [1E]: OUT b0010    b1010 0010  0xA2
b01111 [F]: ADDA b0000   b0000 0000  0x00    b11111 [1F]: OUT b0001    b1010 0001  0xA1
Current line: b10101 [15]
```
When in 4-bit mode (the default) it will continue to act as previously, wrapping the address around between 0 and 15. But when it switches to 5-bit mode it will now wrap between 0 and 31, and the list function will show the whole 32 bytes of RAM/ROM side by side as shown above.
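The wrapping behaviour can be sketched in a few lines of Python. This is not the Arduino sketch itself; the function names `next_line` and `format_line` are made up for illustration, and the label format simply mimics the "Current line" text in the listing above.

```python
def next_line(current, addr_bits=4):
    """Move to the next address, wrapping at 16 (4-bit) or 32 (5-bit)."""
    size = 1 << addr_bits
    return (current + 1) % size


def format_line(addr, addr_bits=4):
    """Format an address as binary plus hex, e.g. 'b10101 [15]'."""
    return f"b{addr:0{addr_bits}b} [{addr:X}]"


print(next_line(15))                 # 4-bit mode wraps back to 0
print(next_line(15, addr_bits=5))    # 5-bit mode carries on to 16
print(next_line(31, addr_bits=5))    # 5-bit mode wraps back to 0
print(format_line(21, addr_bits=5))  # b10101 [15]
```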
The updated sketch is available on GitHub here.
Conclusion
I was hopeful I could add a 5th address line just using the spare components in the circuit and not adding to the chip count, and that is kind of possible as long as I’m ok with the limitations of the JMPs.
Building all this onto a PCB will make further programming experiments quite a lot easier.
But the next step is to see if the instruction set can be expanded. I am still in search of that elusive two-register add.
Kevin
https://makertube.net/w/fAp8ZsbPLUYEKiStc34J9o
#arduinoNano #pcb #TD4
Kevin's Blog @[email protected] · 2025-11-26 · 17:40 UTC
TD4 4-bit DIY CPU – Part 8
Now that I’ve shown I could support more ROM if required using a microcontroller (see Part 6) I can start to ponder how that might be possible.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
There are several other expansions to consider too. Other things I’m pondering are:
- Can I find a way to add the two registers together?
- Are there options to add another register?
- Is 4-bit data still enough?
- Could any extensions be added in a way that is backwards compatible with the existing instructions and behaviours?
And probably a few other odds and ends as I go back and reconsider the schematic as it stands, but they can wait for a future post.
TD4 Simulation
Before I get stuck into the updates, I thought it would be useful to be able to simulate the TD4 to allow for quick turn-around experiments.
I’ve used the “Digital” logic simulator which can be found here: https://github.com/hneemann/Digital
I could have build the simulator from basic logic gates and that would perhaps have been more useful in helping to understand how the design works. But I wanted something that would be easy to fiddle about with to test enhancements, so I build it using the actual 74xx logic chips instead. This doesn’t make for such a readable simulation, as I’ve had to go with actual pinouts for chips rather than logical groupings of signals. But it does map more closely onto the final hardware which is handy for thinking in actual chip-usage rather than abstract logic.
I’ve not bothered simulating the clock circuit, I’ve just wired in a clock source. I’ve also not added the ROM DIP switches, instead adding a ROM element and wiring it into the address and data lines. By right-clicking and viewing the attributes, it is possible to define a 16-byte ROM (4 address, 8 data lines) and edit the contents.
The ROM element takes a multiplexed source and produces a multiplexed output, so I use a splitter/mixer function to turn that into D0-D7 as shown above. Similarly the output of the 74HC161 acting as the program counter (PC) has A0-A3 mixed into a single ADDR bus line.
I’ve added outputs to the two registers to show their contents during execution. I’ve also added a DIP switch on the /RESET line to allow me to start and stop the simulation.
The video below shows it running the above ROM contents, which is the same demo program I used in Part 6 with the microcontroller ROM.
The simulator can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU
https://makertube.net/w/5njzGmYvqXiU3DLCMMtwqp
Now I have an easier way of experimenting, onto the enhancements.
Increasing the Address Space
The address space is currently implemented as follows:
- A 4-bit counter register based on a HC161 4-bit synchronous binary counter.
- A HC154 4 to 16 line decoder/multiplexer for DIP switch selection.
- A HC540 octal buffer/line driver to buffer (and invert) the data outputs.
The counter auto increments on each clock pulse, thus moving through the address space, but it can also be a destination for the adder, allowing absolute jumps to specific addresses, thus implementing a JMP instruction.
To increase the address space, there are a few considerations:
- With more than 4 bits how should JMPs work? They will have remain 4-bits unless the data width is increased.
- Each additional bit of address space will double the number of DIP switches required.
- The next size of binary counter above 4-bits is typically 8-bits.
One idea is to use the RCO pin of the 161. This is the “ripple carry out” and can be used to cascade counters for greater than 4-bit counting. As I understand things, RCO will be HIGH once all outputs are also HIGH, for a single clock pulse. This can be used to enable a following counter for that pulse. This is shown below (taken from the datasheet).
And this is the sample application, again from the datasheet, showing how it would work, with extensions on to additional stages.
A simple way to add an additional bit of address space might be to feed RCO into a flip-flop acting as a toggle in the configuration shown below.
This can then be used to select between two HC154 4 to 16 decoders. As I already have an unused flip flop as part of the HC74 used for the CARRY, this could be quite an appealing solution and in simulation it does appear to work.
There is one slight complication. As show above, A5 will toggle with A0-A3 = 1111 not as they change back to 0000. This is because the flip-flop toggles on the rising edge of the provided clock signal, which in this case is RCO from the 74HC161. Adding a NOT gate means that the rising edge happens as the 161’s RCO signal drops when it resets back to 0000.
Whilst this solves the sequencing problem it does have the unfortunately side effect that the RESET state means that A5 is 1 on power up. That too could be solved with another NOT gate if required, or simply hanging A5 off the /Q output of the flip-flop rather than the Q output.
Here is the additional wiring, in simulator form, to allow this to work.
Note the addition of A4 which now comes from the spare flip-flop /1Q output, and the linking of RCO via a NOT gate to the flip-flop 1CP clock input. The rest of flip-flop 1 is configured in toggle mode, with /1RD and /1SD both tied high (inactive) and 1D linked to /1Q for the feedback. The non-inverting output 1Q is not used.
Whilst this seems to require an additional logic gate (for the NOT) it turns out that there is a spare Schmidt trigger inverter on the 74HC14 that supports the clock circuit, so that is pretty convenient.
The ROM has also been reconfigured for 5 address inputs with the same 8 data bits, creating a 32 x 8 bit ROM.
There are a few issues with this though:
- JMP/JNC only work within the same half of the memory, so JMP 4 in the first 16 locations will jump to location address 0x04, but JMP 4 in the second 16 locations will jump to location address 0x14.
- A JMP 0 in the last location of each half will carry forward into the next half, as the counter ticks over at the same time as the load happens. So JMP 0 in address 0x0F will jump to address 0x10 and JMP 0 in address 0x1F will jump to address 0x00.
But if one can program around those constraints this is quite a simple solution.
An Alternative Solution
There is a neat solution to adding a 5th address bit here: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html#memory
This uses the duplicate JMP/JNC instructions to encode a JMP2/JNC2 that results in the 5th address bit being set, this enabling a jump to the second half of the memory.
In order to create the additional address line, there is a second PC register added – i.e. a 5th HC161 counter. As far as I can see the operation is as follows:
- When the first PC register carries over, the second PC register counts up.
- As only the first output of the second PC register is used, as it counts that output will simply alternate between 0 and 1.
- The second PC register can take 1 as an input when the decoded instructions match JMP2 or JNC2 (D5 low, D6 and D7 high, with either D4 or CARRY), forcing A4 on when the first PC register is loaded with the 4-bit jump value, creating a JMP to the second half of the address space.
- There is an A4 and /A4 signal which alternatively enable the two address decoders for the ROM.
- This specific circuit uses four HC138 chips rather than two HC154, but the principle of operation is the same – generate one of 32 signals for the ROM from 5 bits of address line.
The modifications to support this are fairly simple and it is neat how it uses redundancy in the instruction set to work, but it does require an additional 74HC161 chip.
Combine the two?
If additional logic can be used to address the second PC in the second solution above, then I’m wondering if that could also be used to deliberately set or reset the flip-flop in the first solution too.
The key will be overriding the flip-flop state to preset A4 if the logic sequence for one of the spare JNC/JMP instructions turns up. If the /1SD input is active (LOW) then the output will be HIGH. If the /1RD input is active (LOW) then the output will be LOW.
Here is the additional instruction decoding logic – I’m using NAND gates as the NOTs here, so I can just use a single quad NAND gate chip.
So, the truth table for this is as follows:
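| D4 | D5 | D6 | D7 | /C | /LDPC | ENA4 |
|----|----|----|----|----|-------|------|
| 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | X | 0 | 1 |
| 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | X | 0 | 0 |
| X | X | 0 | 0 | X | 1 | 0 |
| X | X | 1 | 0 | X | 1 | 0 |
| X | X | 0 | 1 | X | 1 | 0 |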
This corresponds to D7 and D6 both high and D5 low, together with either D4 high (JMP) or /C high (JNC with no carry), causing the ENA4 signal to go true, thus implementing the second JNC and JMP instructions (b1100 and b1101).
Unfortunately, so far, I’ve not been able to figure out an option for driving the flip-flop where the logic pans out to correctly set A0-A3 and A4 to successfully load the PC + flip-flop as required by the new instruction, so I might have to leave that for now.
TD4 Arduino 5-bit Address PCB
At this point I thought I had enough to warrant building a new PCB for a microcontroller memory version of the TD4 with the option to support a 5-bit address bus with the limitations described above.
I took the PCB from Part 5 as the starting point and replaced the ROM logic with an Arduino Nano and added in the flip-flop to create the 5-bit address bus.
The ROM section is replaced with the Arduino as shown below.
The CPU section now uses the spare NOT gate from the PWRCLK section and the spare flip-flop from the CPU section as shown below.
I believe these were the only parts to change. I have included the option to disable the RESET button by cutting a solder jumper and replacing it with a link to an Arduino IO pin.
I’ve also added headers to breakout the unused Arduino IO pins just in case that becomes useful at some point.
The complete Arduino Nano pinout is as follows:
| TD4 Signal | Arduino Nano IO |
|------------|-----------------|
| A0-A4 | A0-A4 (A4 optional) |
| D0-D3 | D8-D11 |
| D4-D7 | D4-D7 |
| /RESET | D12 (optional) |
The board can be powered either via the Arduino’s USB port or via the PCB micro USB port.
Complete Bill Of Materials
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 1x 74HC74 Dual D-Type Flip Flop
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
Semiconductors and Passive Components
- 25x 3x2mm rectangular LED
- Resistors: 2x 100R; 33x 1K; 1x 3K3; 1x 10K; 1x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switch
- DIP sockets: 7x 16 way; 4x 14 way
- 2x 15-way pin header sockets
And 1 Arduino Nano of course.
The PCB can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU. The video at the end of this post shows it in action.
Conclusion
I was hopeful I could add a 5th address line just using the spare components in the circuit and not adding to the chip count, and that is kind of possible as long as I’m ok with the limitations of the JMPs.
Building all this onto a PCB will make further programming experiments quite a lot easier.
But the next step is to see if the instruction set can be expanded. I am still in search of that elusive two-register add.
Kevin
https://makertube.net/w/fAp8ZsbPLUYEKiStc34J9o
#arduinoNano #pcb #td4
#arduinonano #pcb #td4
Kevin's Blog @[email protected] · 2025-11-26 · 17:40 UTC
TD4 4-bit DIY CPU – Part 8
Now that I’ve shown I could support more ROM if required using a microcontroller (see Part 6) I can start to ponder how that might be possible.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
There are several other expansions to consider too. Other things I’m pondering are:
- Can I find a way to add the two registers together?
- Are there options to add another register?
- Is 4-bit data still enough?
- Could a 4×4 output grid be supported?
- Could any extensions be added in a way that is backwards compatible with the existing instructions and behaviours?
And probably a few other odds and ends as I go back and reconsider the schematic as it stands, but they can wait for a future post.
TD4 Simulation
Before I get stuck into the updates, I thought it would be useful to be able to simulate the TD4 to allow for quick turn-around experiments.
I’ve used the “Digital” logic simulator which can be found here: https://github.com/hneemann/Digital
I could have built the simulator from basic logic gates and that would perhaps have been more useful in helping to understand how the design works. But I wanted something that would be easy to fiddle about with to test enhancements, so I built it using the actual 74xx logic chips instead. This doesn’t make for such a readable simulation, as I’ve had to go with actual pinouts for chips rather than logical groupings of signals. But it does map more closely onto the final hardware, which is handy for thinking in actual chip usage rather than abstract logic.
I’ve not bothered simulating the clock circuit, I’ve just wired in a clock source. I’ve also not added the ROM DIP switches, instead adding a ROM element and wiring it into the address and data lines. By right-clicking and viewing the attributes, it is possible to define a 16-byte ROM (4 address, 8 data lines) and edit the contents.
The ROM element takes a multiplexed source and produces a multiplexed output, so I use a splitter/mixer function to turn that into D0-D7 as shown above. Similarly the output of the 74HC161 acting as the program counter (PC) has A0-A3 mixed into a single ADDR bus line.
I’ve added outputs to the two registers to show their contents during execution. I’ve also added a DIP switch on the /RESET line to allow me to start and stop the simulation.
The video below shows it running the above ROM contents, which is the same demo program I used in Part 6 with the microcontroller ROM.
The simulator can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU
https://makertube.net/w/5njzGmYvqXiU3DLCMMtwqp
Now I have an easier way of experimenting, onto the enhancements.
Increasing the Address Space
The address space is currently implemented as follows:
- A 4-bit counter register based on a HC161 4-bit synchronous binary counter.
- A HC154 4 to 16 line decoder/multiplexer for DIP switch selection.
- A HC540 octal buffer/line driver to buffer (and invert) the data outputs.
The counter auto increments on each clock pulse, thus moving through the address space, but it can also be a destination for the adder, allowing absolute jumps to specific addresses, thus implementing a JMP instruction.
To increase the address space, there are a few considerations:
- With more than 4 bits how should JMPs work? They will have to remain 4 bits unless the data width is increased.
- Each additional bit of address space will double the number of DIP switches required.
- The next size of binary counter above 4-bits is typically 8-bits.
One idea is to use the RCO pin of the 161. This is the “ripple carry out” and can be used to cascade counters for greater than 4-bit counting. As I understand things, RCO will be HIGH once all outputs are also HIGH, for a single clock pulse. This can be used to enable a following counter for that pulse. This is shown below (taken from the datasheet).
And this is the sample application, again from the datasheet, showing how it would work, with extensions on to additional stages.
A simple way to add an additional bit of address space might be to feed RCO into a flip-flop acting as a toggle in the configuration shown below.
This can then be used to select between two HC154 4 to 16 decoders. As I already have an unused flip flop as part of the HC74 used for the CARRY, this could be quite an appealing solution and in simulation it does appear to work.
There is one slight complication. As shown above, A4 will toggle when A0-A3 = 1111, not as they change back to 0000. This is because the flip-flop toggles on the rising edge of the provided clock signal, which in this case is RCO from the 74HC161. Adding a NOT gate means that the rising edge happens as the 161’s RCO signal drops when it resets back to 0000.
Whilst this solves the sequencing problem it does have the unfortunate side effect that the RESET state means that A4 is 1 on power up. That too could be solved with another NOT gate if required, or simply by hanging A4 off the /Q output of the flip-flop rather than the Q output.
Here is the additional wiring, in simulator form, to allow this to work.
Note the addition of A4 which now comes from the spare flip-flop /1Q output, and the linking of RCO via a NOT gate to the flip-flop 1CP clock input. The rest of flip-flop 1 is configured in toggle mode, with /1RD and /1SD both tied high (inactive) and 1D linked to /1Q for the feedback. The non-inverting output 1Q is not used.
Whilst this seems to require an additional logic gate (for the NOT) it turns out that there is a spare Schmitt trigger inverter on the 74HC14 that supports the clock circuit, so that is pretty convenient.
The ROM has also been reconfigured for 5 address inputs with the same 8 data bits, creating a 32 x 8 bit ROM.
There are a few issues with this though:
- JMP/JNC only work within the same half of the memory, so JMP 4 in the first 16 locations will jump to location address 0x04, but JMP 4 in the second 16 locations will jump to location address 0x14.
- A JMP 0 in the last location of each half will carry forward into the next half, as the counter ticks over at the same time as the load happens. So JMP 0 in address 0x0F will jump to address 0x10 and JMP 0 in address 0x1F will jump to address 0x00.
But if one can program around those constraints this is quite a simple solution.
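Both limitations fall out of a small behavioural model of the counter plus the toggle flip-flop. The following is only a sketch, not the simulator file: `Pc5`, `step`, `address` and `jumpTarget` are made-up names, it assumes the 161's count enable stays high so RCO simply tracks Q0-Q3 = 1111, and it assumes the flip-flop toggles whenever RCO falls, whether the counter counted on or was loaded by a jump.

```cpp
#include <cassert>
#include <cstdint>

// Behavioural model of the 4-bit PC (74HC161) plus the toggle flip-flop
// that supplies A4. Names are illustrative, not taken from the schematic.
struct Pc5 {
    uint8_t low = 0;   // Q0-Q3 of the 74HC161
    bool a4 = false;   // fifth address bit from the flip-flop

    // Assumption: count enable is always high, so RCO is just "Q0-Q3 all 1".
    static bool rco(uint8_t q) { return q == 0x0F; }

    // One clock edge. load = true models a taken JMP/JNC with a 4-bit immediate.
    void step(bool load, uint8_t imm = 0) {
        const bool before = rco(low);
        low = static_cast<uint8_t>(load ? (imm & 0x0F) : ((low + 1) & 0x0F));
        // The flip-flop is clocked by NOT RCO, so it toggles when RCO falls.
        if (before && !rco(low)) a4 = !a4;
    }

    uint8_t address() const {
        return static_cast<uint8_t>((a4 ? 0x10 : 0x00) | low);
    }
};

// Where does a taken JMP imm land when it is executed at address `from`?
uint8_t jumpTarget(uint8_t from, uint8_t imm) {
    assert(from < 32 && imm < 16);
    Pc5 pc;
    pc.low = from & 0x0F;
    pc.a4 = (from & 0x10) != 0;
    pc.step(true, imm);
    return pc.address();
}
```

Under those assumptions `jumpTarget(0x02, 4)` stays in the lower half at 0x04 and `jumpTarget(0x12, 4)` stays in the upper half at 0x14, while `jumpTarget(0x0F, 0)` crosses over to 0x10 and `jumpTarget(0x1F, 0)` crosses back to 0x00.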
An Alternative Solution
There is a neat solution to adding a 5th address bit here: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html#memory
This uses the duplicate JMP/JNC instructions to encode a JMP2/JNC2 that sets the 5th address bit, thus enabling a jump to the second half of the memory.
In order to create the additional address line, there is a second PC register added – i.e. a 5th HC161 counter. As far as I can see the operation is as follows:
- When the first PC register carries over, the second PC register counts up.
- Only the first output of the second PC register is used, so as it counts that output simply alternates between 0 and 1.
- The second PC register can take 1 as an input when the decoded instructions match JMP2 or JNC2 (D5 low, D6 and D7 high, with either D4 or CARRY), forcing A4 on when the first PC register is loaded with the 4-bit jump value, creating a JMP to the second half of the address space.
- There are A4 and /A4 signals which alternately enable the two address decoders for the ROM.
- This specific circuit uses four HC138 chips rather than two HC154, but the principle of operation is the same – generate one of 32 signals for the ROM from 5 bits of address line.
The modifications to support this are fairly simple and it is neat how it uses redundancy in the instruction set to work, but it does require an additional 74HC161 chip.
Combine the two?
If additional logic can be used to address the second PC in the second solution above, then I’m wondering if that could also be used to deliberately set or reset the flip-flop in the first solution too.
The key will be overriding the flip-flop state to preset A4 if the logic sequence for one of the spare JNC/JMP instructions turns up. If the /1SD input is active (LOW) then the output will be HIGH. If the /1RD input is active (LOW) then the output will be LOW.
Here is the additional instruction decoding logic – I’m using NAND gates as the NOTs here, so I can just use a single quad NAND gate chip.
So, the truth table for this is as follows:
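| D4 | D5 | D6 | D7 | /C | /LDPC | ENA4 |
|----|----|----|----|----|-------|------|
| 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | X | 0 | 1 |
| 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | X | 0 | 0 |
| X | X | 0 | 0 | X | 1 | 0 |
| X | X | 1 | 0 | X | 1 | 0 |
| X | X | 0 | 1 | X | 1 | 0 |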
This corresponds to D7 and D6 both high and D5 low, together with either D4 high (JMP) or /C high (JNC with no carry), causing the ENA4 signal to go true, thus implementing the second JNC and JMP instructions (b1100 and b1101).
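The truth table can be written down as two boolean expressions, which makes it easy to check each row. This is a sketch of the logic only, not the gate-level NAND arrangement: `JumpDecode`, `decodeJump` and `checkAgainstTruthTable` are made-up names, and the case the table does not list (JNC with carry set) is assumed not to load the PC.

```cpp
#include <cassert>

// Decoded control signals for the jump instructions.
struct JumpDecode {
    bool nLdPc;  // /LDPC: LOW (false) loads the program counter
    bool enA4;   // ENA4: HIGH requests a jump into the upper half
};

// d4..d7 are the opcode bits; nC is the /C input (HIGH when carry is clear).
JumpDecode decodeJump(bool d4, bool d5, bool d6, bool d7, bool nC) {
    // JMP always jumps, JNC only when there is no carry.
    const bool jumpTaken = d7 && d6 && (d4 || nC);
    return JumpDecode{ !jumpTaken, jumpTaken && !d5 };
}

// Walk the rows of the table. Arguments are D4, D5, D6, D7, /C.
void checkAgainstTruthTable() {
    JumpDecode r = decodeJump(false, false, true, true, true);   // JNC2, no carry
    assert(!r.nLdPc && r.enA4);
    r = decodeJump(true, false, true, true, false);              // JMP2
    assert(!r.nLdPc && r.enA4);
    r = decodeJump(false, true, true, true, true);               // JNC, no carry
    assert(!r.nLdPc && !r.enA4);
    r = decodeJump(true, true, true, true, false);               // JMP
    assert(!r.nLdPc && !r.enA4);
    r = decodeJump(true, false, false, false, true);             // not a jump
    assert(r.nLdPc && !r.enA4);
}
```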
Unfortunately, so far, I’ve not been able to figure out an option for driving the flip-flop where the logic pans out to correctly set A0-A3 and A4 to successfully load the PC + flip-flop as required by the new instruction, so I might have to leave that for now.
TD4 Arduino 5-bit Address PCB
At this point I thought I had enough to warrant building a new PCB for a microcontroller memory version of the TD4 with the option to support a 5-bit address bus with the limitations described above.
I took the PCB from Part 5 as the starting point and replaced the ROM logic with an Arduino Nano and added in the flip-flop to create the 5-bit address bus.
The ROM section is replaced with the Arduino as shown below.
The CPU section now uses the spare NOT gate from the PWRCLK section and the spare flip-flop from the CPU section as shown below.
I believe these were the only parts to change. I have included the option to disable the RESET button by cutting a solder jumper and replacing it with a link to an Arduino IO pin.
I’ve also added headers to breakout the unused Arduino IO pins just in case that becomes useful at some point.
The complete Arduino Nano pinout is as follows:
| TD4 Signal | Arduino Nano IO |
|------------|-----------------|
| A0-A4 | A0-A4 (A4 optional) |
| D0-D3 | D8-D11 |
| D4-D7 | D4-D7 |
| /RESET | D12 (optional) |
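One consequence of this pinout is that the data byte is split across two groups of pins. The sketch below shows that split as plain bit masking so it can be run on a PC. It assumes an ATmega328P-based Nano, where D8-D11 are PORTB bits 0-3, D4-D7 are PORTD bits 4-7 and A0-A4 are PORTC bits 0-4; `PortBits`, `splitDataByte` and `addressFromPinC` are hypothetical names, and the actual sketch may well use `digitalWrite()` and `digitalRead()` instead of whole-port access.

```cpp
#include <cassert>
#include <cstdint>

// The two halves of a TD4 data byte, already aligned to the port bits.
struct PortBits {
    uint8_t portB;  // bits 0-3 carry TD4 D0-D3 (Nano D8-D11)
    uint8_t portD;  // bits 4-7 carry TD4 D4-D7 (Nano D4-D7)
};

PortBits splitDataByte(uint8_t data) {
    const PortBits bits{ static_cast<uint8_t>(data & 0x0F),
                         static_cast<uint8_t>(data & 0xF0) };
    // Sanity check: the two halves recombine into the original byte.
    assert((bits.portB | bits.portD) == data);
    return bits;
}

// Mask a raw PORTC input value down to the TD4 address.
// A4 is optional, so 4-bit mode ignores it.
uint8_t addressFromPinC(uint8_t pinc, bool fiveBit) {
    return static_cast<uint8_t>(pinc & (fiveBit ? 0x1F : 0x0F));
}
```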
The board can be powered either via the Arduino’s USB port or via the PCB micro USB port.
Complete Bill Of Materials
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 1x 74HC74 Dual D-Type Flip Flop
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
Semiconductors and Passive Components
- 25x 3x2mm rectangular LED
- Resistors: 2x 100R; 33x 1K; 1x 3K3; 1x 10K; 1x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switch
- DIP sockets: 7x 16 way; 4x 14 way
- 2x 15-way pin header sockets
And 1 Arduino Nano of course.
The PCB can be found on GitHub here: https://github.com/diyelectromusic/TD4-CPU. The video at the end of this post shows it in action.
Nano Assembler Update
I’ve updated my Nano assembler with a new command to change to 5-bit address mode if required.
```
Help
----
H: Help
L: List
G: Goto
C: Clear
R: Restore
A: Addr Mode
O: Opcodes
OpCode
OpCode im

Current line: b0101 [15]

Address Mode = 5 bit


RAM Disassembly

b00000 [0]: OUT b0001   b1010 0001  0xA1    b10000 [10]: OUT b0001  b1010 0001  0xA1
b00001 [1]: ADDA b0001  b0000 0001  0x01    b10001 [11]: OUT b0010  b1010 0010  0xA2
b00010 [2]: OUT b0010   b1010 0010  0xA2    b10010 [12]: OUT b0100  b1010 0100  0xA4
b00011 [3]: ADDB b0001  b0101 0001  0x51    b10011 [13]: OUT b1000  b1010 1000  0xA8
b00100 [4]: OUT b0100   b1010 0100  0xA4    b10100 [14]: OUT b0100  b1010 0100  0xA4
b00101 [5]: ADDA b0001  b0000 0001  0x01    b10101 [15]: OUT b0100  b1010 0100  0xA4
b00110 [6]: OUT b1000   b1010 1000  0xA8    b10110 [16]: OUT b0010  b1010 0010  0xA2
b00111 [7]: ADDB b0001  b0101 0001  0x51    b10111 [17]: OUT b0001  b1010 0001  0xA1
b01000 [8]: OUT b0100   b1010 0100  0xA4    b11000 [18]: OUT b0001  b1010 0001  0xA1
b01001 [9]: ADDA b0001  b0000 0001  0x01    b11001 [19]: OUT b0010  b1010 0010  0xA2
b01010 [A]: OUT b0010   b1010 0010  0xA2    b11010 [1A]: OUT b0100  b1010 0100  0xA4
b01011 [B]: ADDB b0001  b0101 0001  0x51    b11011 [1B]: OUT b1000  b1010 1000  0xA8
b01100 [C]: OUT b0001   b1010 0001  0xA1    b11100 [1C]: OUT b1000  b1010 1000  0xA8
b01101 [D]: ADDA b0000  b0000 0000  0x00    b11101 [1D]: OUT b0100  b1010 0100  0xA4
b01110 [E]: OUT b1111   b1010 1111  0xAF    b11110 [1E]: OUT b0010  b1010 0010  0xA2
b01111 [F]: ADDA b0000  b0000 0000  0x00    b11111 [1F]: OUT b0001  b1010 0001  0xA1
Current line: b10101 [15]
```
When in 4-bit mode (the default) it will continue to act as previously, wrapping the address around between 0 and 15. But when it switches to 5-bit mode it will now wrap between 0 and 31 and the list function will show the whole 32 bytes of RAM/ROM side by side as shown above.
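The wrapping itself is just a modulo on the size of the active address space. A minimal sketch, with `wrapLine` and `pairedLine` as hypothetical helper names rather than functions from the actual sketch:

```cpp
#include <cassert>
#include <cstdint>

// Keep a line number inside the active address space:
// 4-bit mode wraps 0..15, 5-bit mode wraps 0..31.
uint8_t wrapLine(int line, bool fiveBit) {
    const int size = fiveBit ? 32 : 16;
    // Add size before the final modulo so stepping back from 0 wraps too.
    return static_cast<uint8_t>(((line % size) + size) % size);
}

// Side-by-side listing: row r pairs line r with line r + 16.
uint8_t pairedLine(uint8_t row) {
    assert(row < 16);
    return static_cast<uint8_t>(row + 16);
}
```

So line 16 wraps back to 0 in 4-bit mode but is a valid line in 5-bit mode, and row 5 of the listing sits next to line 21 (hex 15).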
The updated sketch is available on GitHub here.
Conclusion
I was hopeful I could add a 5th address line just using the spare components in the circuit and not adding to the chip count, and that is kind of possible as long as I’m ok with the limitations of the JMPs.
Building all this onto a PCB will make further programming experiments quite a lot easier.
But the next step is to see if the instruction set can be expanded. I am still in search of that elusive two-register add.
Kevin
https://makertube.net/w/fAp8ZsbPLUYEKiStc34J9o
#arduinoNano #pcb #TD4
#arduinonano #pcb #td4
Kevin's Blog @[email protected] · 2025-11-21 · 17:11 UTC
TD4 4-bit DIY CPU – Part 7
Once the idea of creating an Arduino “direct to ROM” assembler was floated in Part 6, I had to just do it, so this post is a little diversion from the hardware discussion into how that could work.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
Basic Concepts
This relies on using an Arduino as the ROM as described in Part 6, but the Arduino now has the option to change the ROM contents independently of the TD4 itself.
The Arduino sketch will do the following:
- Run the TD4 ROM routine off a timer interrupt so that it is always running and responsive.
- Take input over the Arduino serial port to allow basic control, e.g. list, clear, etc.
- Allow the direct input of assembler instructions, such as MOVE A,B or OUT B and so on.
- Provide a means of selecting which line of the program to change.
The code will thus have a number of key sections:
- The TD4 ROM routine.
- Some kind of serial-port command-line interpreter.
- Handler routines for all the commands.
- An assembler.
- A disassembler.
The TD4 ROM routine has already been fully described in Part 6. The only difference is that the scanning routine will be driven from a 1 ms timer using the TimerOne library.
As I want to still support a built-in demo, I now have the concept of ROM being the demo code and RAM being the “live” code to pass onto the TD4. The Arduino will initialise the RAM on startup from the ROM.
As far as the TD4 is concerned of course, this is all still ROM.
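The ROM/RAM split might look something like the following. This is a PC-runnable sketch of the idea rather than the actual Arduino code: the names are hypothetical, the demo bytes are placeholders, and on the Arduino the demo table could equally live in PROGMEM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int MEM_SIZE = 16;   // 4-bit address space

// Placeholder demo program; the real demo bytes live in the author's sketch.
// Entries not listed here are zero.
const uint8_t demoRom[MEM_SIZE] = { 0xA1, 0xA2, 0xA4, 0xA8 };

uint8_t ram[MEM_SIZE];         // the "live" program served to the TD4

// Copy the demo into RAM: done at startup and again on a restore.
void restoreRam() { std::memcpy(ram, demoRom, MEM_SIZE); }

// Reset all working memory to zeros.
void clearRam() { std::memset(ram, 0, MEM_SIZE); }

// What the ROM routine would put on the data bus for a given address.
uint8_t readForTd4(uint8_t addr) {
    assert(addr < MEM_SIZE);
    return ram[addr];
}
```

The TD4 only ever sees `readForTd4()`, so editing `ram` changes the running program without touching the demo copy.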
Command Line Interpreter
The standard Arduino Serial routines will be used to scan for input via the serial port. It will support a line-oriented input as follows:
```
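// Collect serial characters into cmdInput until a newline arrives.
// Returns true once a complete line has been copied into cmdSaved.
bool cmdRunner (void) {
  while (Serial.available()) {
    char c = Serial.read();
    if (c == '\n') {
      // End of line: hand the finished command over and restart the buffer.
      strcpy(cmdSaved, cmdInput);
      cmdIdx = 0;
      return true;
    }
    else if (cmdIdx < CMD_BUFFER-1) {
      // Append the character, keeping the buffer null-terminated.
      // Characters beyond the buffer size are silently dropped.
      cmdInput[cmdIdx++] = c;
      cmdInput[cmdIdx] = '\0';
    }
  }
  return false;
}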
```
This will keep adding any received characters to the cmdInput buffer until a newline is received, at which point the command is saved in cmdSaved and the routine will return true indicating a full line is ready to be processed.
Once a complete line is received, then a processing function will parse it.
Key to the processing of commands is a command table that stores the text to match and the handler function to call on finding a valid command. There is an additional parameter that will be passed into the handler function to allow the same handler function to support several commands. This will be used in the assembler itself later.
```
struct cmd_t {
  char    cmd[CMD_BUFFER+1];
  hdlr_t  pFn;
  uint8_t idx;
};

const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
  {"H", hdlrHelp, 0},
  {"L", hdlrList, 0},
  {"G", hdlrGoto, 0},
};
```
The algorithm for parsing commands is as follows:
```
cmdProcess:
  Look for a space or newline
  IF found a space THEN
    This is the start of the parameter

  Look for the command in the command table
  IF command found THEN
    Call the handler function with the parameters
```
The implementation is a bit complex, as it uses string pointers and has to chop and parse strings as it goes. It is also dealing with the command table in the Arduino’s PROGMEM, which is an additional complication too.
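Stripped of the PROGMEM handling, the parsing algorithm can be sketched as below. This is a simplified stand-in that runs on a PC, not the code from the sketch: the table lives in ordinary memory, `hdlrRecord` is a dummy handler that only records what it was called with, and matching is assumed to be exact and case-sensitive.

```cpp
#include <cassert>
#include <cstring>

typedef void (*hdlr_t)(int idx, char *param);

struct Cmd { const char *cmd; hdlr_t pFn; int idx; };

// Stand-in handler that just records what it was called with.
int lastIdx = -1;
char lastParam[16] = "";
void hdlrRecord(int idx, char *param) {
    lastIdx = idx;
    std::strncpy(lastParam, param ? param : "", sizeof(lastParam) - 1);
    lastParam[sizeof(lastParam) - 1] = '\0';
}

const Cmd cmdTable[] = {
    {"L",    hdlrRecord, 0},
    {"G",    hdlrRecord, 1},
    {"ADDA", hdlrRecord, 2},
};

// Split "CMD param" at the first space, then dispatch on an exact match.
// The line is modified in place. Returns false for an unknown command.
bool cmdProcess(char *line) {
    assert(line != nullptr);
    char *param = std::strchr(line, ' ');
    if (param != nullptr) {
        *param++ = '\0';   // terminate the command, step onto the parameter
    }
    for (const Cmd &c : cmdTable) {
        if (std::strcmp(line, c.cmd) == 0) {
            c.pFn(c.idx, param);
            return true;
        }
    }
    return false;
}
```

The `idx` field is what lets one handler serve several table entries, since the handler can tell which entry matched.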
In order to be able to use the same command line interpreter for the input of assembler instructions, I’ve had to simplify the syntax. There are no spaces in opcodes and there has to be a space between the opcode and immediate value if used.
Here are some examples:
```
IN A       -> INA
MOVE A,B   -> MOVAB
OUT im     -> OUT im
JNC im     -> JNC im
ADD A,im   -> ADDA im
```
Handler Routines
All handler routines have the following prototype:
```
typedef void (*hdlr_t)(int idx, char *param);

void hdlrHelp(int idx, char *pParam) {
  Serial.print("\nHelp\n----\n");
  Serial.println("H: Help");
}
```
The idx parameter is the number in the last field of the command table. pParam will be a pointer to the parameter string for the command (if used).
As we’re dealing with strings all the time, there are a number of helper functions to do things like convert strings to numbers as well as others to print numbers in various formats.
Number formats are assumed to be as follows:
```
0..9   - decimal digits
0x0..F - hex digits
b0..1  - binary digits
```
The code provides the following:
- str2num – the basic string parsing routine to recognise all three number formats as strings.
- printbin – print a number in b0..1 format.
- printhex – print a number in 0x0..F format, allowing for a possible leading zero if required.
- printins – print an instruction in textual format.
- printop – print an instruction in binary and hex opcode format.
- printline – print a line number in a consistent binary and hex format.
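As an illustration of the three number formats, a parser along the lines of str2num might look like the following. The signature and the choice of -1 as an error value are assumptions for this sketch; the real routine may well differ, and no overflow checking is attempted here.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>

// Parse "13", "0xD" or "b1101". Returns -1 for an empty or malformed string.
int str2num(const char *s) {
  if (s == nullptr || *s == '\0') return -1;

  int base = 10;
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    base = 16;          // 0x prefix: hex
    s += 2;
  } else if (s[0] == 'b' || s[0] == 'B') {
    base = 2;           // b prefix: binary
    s += 1;
  }
  if (*s == '\0') return -1;   // a prefix with no digits after it

  int value = 0;
  for (; *s != '\0'; s++) {
    unsigned char c = static_cast<unsigned char>(*s);
    int digit;
    if (std::isdigit(c)) {
      digit = c - '0';
    } else if (std::isxdigit(c)) {
      digit = std::toupper(c) - 'A' + 10;
    } else {
      return -1;        // not a digit in any supported base
    }
    if (digit >= base) return -1;   // e.g. '2' in a binary number
    value = value * base + digit;
  }
  return value;
}
```

All three spellings of thirteen (`13`, `0xD`, `b1101`) then produce the same value, which is what lets G and the immediate values accept any of the formats.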
The code supports the following commands, so each has its own handler function:
- H – help – show the list of commands.
- L – list – show the disassembly of the whole working memory (RAM).
- G – goto – set the working line number.
- C – clear – reset all working memory (RAM) to zeros.
- R – restore – restore the working memory (RAM) to the pre-built demo code (ROM).
- O – opcodes – list the supported opcodes.
Assembler
As already mentioned, I’m using the same command line interpreter code to create the assembler. To do this, each opcode has an entry in the command table:
```
const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
  // Assembly commands - must be first
  {"ADDA", hdlrAsm, 0},
  {"MOVAB", hdlrAsm, 1},
  {"INA", hdlrAsm, 2},
  {"MOVA", hdlrAsm, 3},
  {"MOVBA", hdlrAsm, 4},
  {"ADDB", hdlrAsm, 5},
  {"INB", hdlrAsm, 6},
  {"MOVB", hdlrAsm, 7},
  {"OUTB", hdlrAsm, 8},
  {"OUT2B", hdlrAsm, 9},
  {"OUT", hdlrAsm, 10},
  {"OUT2", hdlrAsm, 11},
  {"JNCB", hdlrAsm, 12},
  {"JMPB", hdlrAsm, 13},
  {"JNC", hdlrAsm, 14},
  {"JMP", hdlrAsm, 15},

  // Other commands
  {"H", hdlrHelp, 0},
  {"L", hdlrList, 0},
  {"G", hdlrGoto, 0},
  {"C", hdlrClear, 0},
  {"R", hdlrRestore, 0},
  {"O", hdlrOpcodes, 0},
};
```
The order corresponds to the opcode command value, as does the parameter. As these are at the start of the table, I can assume that the position in the table is the same as the command value. This does mean that I also need to account for the duplicated instructions even if I don’t need to use them.
I’m making the following design decisions:
- There is the concept of a “current line” which can be set with the G (goto) command.
- Entering a valid opcode automatically moves the current line on by 1.
- No line information is entered as part of the opcode.
The main logic of the assembler handler is as follows:
```
Assembler:
  Command value is the provided index parameter
  Determine the im value from the provided string parameter
  RAM[line] = (cmd << 4) + im
  Increment current line
```
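The core of that logic is small enough to show as a host-side sketch. The names `assemble` and `curLine` are hypothetical, and wrapping from line 15 back to line 0 is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>

uint8_t RAM[16];       // working program memory
uint8_t curLine = 0;   // hypothetical name for the "current line"

// cmd is the 4-bit opcode (its index in the command table),
// im is the 4-bit immediate value (0 when the opcode takes none).
void assemble(uint8_t cmd, uint8_t im) {
  RAM[curLine] = static_cast<uint8_t>(((cmd & 0x0F) << 4) | (im & 0x0F));
  curLine = (curLine + 1) & 0x0F;   // assumed: wrap from line 15 to line 0
}
```

Note the brackets around the shift: in C and C++ the `+` operator binds more tightly than `<<`, so an unbracketed `cmd << 4 + im` would shift by `4 + im` instead.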
Disassembler
Disassembly is largely a look-up table matching opcode command values to text. This is all hidden away behind the two print routines printins() and printop().
```
void printins (uint8_t ins) {
  uint8_t cmd = ins >> 4;
  uint8_t im = ins & 0x0F;

  Serial.print(FSH(cmdTable[cmd].cmd));
  if (HASIM(cmd)) {
    Serial.print(" b");
    printbin(im,4);
  } else {
    Serial.print("      ");
  }
}

void printop (uint8_t op) {
  uint8_t cmd = op >> 4;
  uint8_t im = op & 0x0F;

  Serial.print("b");
  printbin(cmd,4);
  Serial.print(" ");
  printbin(im,4);
  Serial.print("\t0x");
  printhex(op,2);
}
```
The main complexity is pulling the strings out of the command table. I’ve had to include a macro to provide access to the strings from the Arduino’s PROGMEM:
```
#define FSH(x) ((const __FlashStringHelper *)(x))
```
This feels like a bit of a hack, but apparently this is how it should be done for the kind of thing I need to do!
There is another macro here that needs explaining:
```
#define HASIM(op) ((op)==0||(op)==3||(op)==5||(op)==7||(op)>9)
```
This is a set of conditions that if true means that the command supports an immediate value. This is used in a few places to know how to parse the commands.
Whilst in principle all commands could use the immediate value, the “official” statement of how they work assumes im=0 in many cases. So, for example, OUT B does not require an immediate value, but if one is provided then OUT B becomes OUT B+im.
I’m not really supporting that with this code at the moment.
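For experimenting away from the hardware, the same disassembly logic can be sketched with ordinary strings. This is an illustration only: `MNEMONICS`, `hasIm`, `toBin` and `disassemble` are hypothetical names, the mnemonic table is held in normal memory instead of PROGMEM, and the result is returned as a `std::string` instead of being printed to Serial.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mnemonics in opcode order, matching the first 16 command table entries.
const char *const MNEMONICS[16] = {
  "ADDA", "MOVAB", "INA",  "MOVA", "MOVBA", "ADDB", "INB", "MOVB",
  "OUTB", "OUT2B", "OUT",  "OUT2", "JNCB",  "JMPB", "JNC", "JMP"
};

// Same test as the HASIM macro, written as a function.
bool hasIm(uint8_t cmd) {
  return cmd == 0 || cmd == 3 || cmd == 5 || cmd == 7 || cmd > 9;
}

// Format the low `bits` bits of value as 0s and 1s, most significant first.
std::string toBin(uint8_t value, int bits) {
  std::string out;
  for (int i = bits - 1; i >= 0; i--) {
    out += ((value >> i) & 1) ? '1' : '0';
  }
  return out;
}

// Turn one instruction byte into text such as "JNC b1000" or "OUTB".
std::string disassemble(uint8_t ins) {
  uint8_t cmd = ins >> 4;
  uint8_t im  = ins & 0x0F;
  std::string out = MNEMONICS[cmd];
  if (hasIm(cmd)) {
    out += " b" + toBin(im, 4);
  }
  return out;
}
```

Here `disassemble(0xE8)` gives `JNC b1000` and `disassemble(0x80)` gives plain `OUTB`, since opcode 8 is not in the immediate list.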
Putting it all together
Here is a serial output log of a session using the assembler.
```
> H
Help
----
H: Help
L: List
G: Goto
C: Clear
R: Restore
O: Opcodes
OpCode
OpCode im

Current line: b0000 [0]

> L
RAM Disassembly

b0000 [0]: JNC b1000    b1110 1000    0xE8
b0001 [1]: JMP b0011    b1111 0011    0xF3
b0010 [2]: OUT b0010    b1010 0010    0xA2
b0011 [3]: ADDB b0001   b0101 0001    0x51
b0100 [4]: OUT b0100    b1010 0100    0xA4
b0101 [5]: ADDA b0001   b0000 0001    0x01
b0110 [6]: OUT b1000    b1010 1000    0xA8
b0111 [7]: ADDB b0001   b0101 0001    0x51
b1000 [8]: OUT b0100    b1010 0100    0xA4
b1001 [9]: ADDA b0001   b0000 0001    0x01
b1010 [A]: OUT b0010    b1010 0010    0xA2
b1011 [B]: ADDB b0001   b0101 0001    0x51
b1100 [C]: JMP b0000    b1111 0000    0xF0
b1101 [D]: ADDA b0000   b0000 0000    0x00
b1110 [E]: ADDA b0000   b0000 0000    0x00
b1111 [F]: ADDA b0000   b0000 0000    0x00
Current line: b0010 [2]

> G 13
Goto line 13
Current line: b1101 [D]

> OUTB
Assemble:
b1101 [D] OUTB         b1000 0000    0x80
Current line: b1110 [E]

> L
RAM Disassembly

b0000 [0]: JNC b1000    b1110 1000    0xE8
b0001 [1]: JMP b0011    b1111 0011    0xF3
b0010 [2]: OUT b0010    b1010 0010    0xA2
b0011 [3]: ADDB b0001   b0101 0001    0x51
b0100 [4]: OUT b0100    b1010 0100    0xA4
b0101 [5]: ADDA b0001   b0000 0001    0x01
b0110 [6]: OUT b1000    b1010 1000    0xA8
b0111 [7]: ADDB b0001   b0101 0001    0x51
b1000 [8]: OUT b0100    b1010 0100    0xA4
b1001 [9]: ADDA b0001   b0000 0001    0x01
b1010 [A]: OUT b0010    b1010 0010    0xA2
b1011 [B]: ADDB b0001   b0101 0001    0x51
b1100 [C]: JMP b0000    b1111 0000    0xF0
b1101 [D]: OUTB         b1000 0000    0x80
b1110 [E]: ADDA b0000   b0000 0000    0x00
b1111 [F]: ADDA b0000   b0000 0000    0x00
Current line: b1110 [E]

> O
Supported OpCodes:
  b0000 data    ADDA im
  b0001 0000    MOVAB
  b0010 0000    INA
  b0011 data    MOVA im
  b0100 0000    MOVBA
  b0101 data    ADDB im
  b0110 0000    INB
  b0111 data    MOVB im
  b1000 0000    OUTB
  b1001 0000    OUT2B
  b1010 data    OUT im
  b1011 data    OUT2 im
  b1100 data    JNCB im
  b1101 data    JMPB im
  b1110 data    JNC im
  b1111 data    JMP im

> C
Clearing RAM ... Done
```
Find the code on GitHub here.
Conclusion
The basics for this actually came together fairly quickly, but I must admit to spending a fair bit of time fiddling about with output formats and refactoring various bits of code to try to give some consistency in terms of when newlines are applied, what is shown in binary, what in hex, and so on.
I can’t guarantee everything has been caught, but I’ve typed in all the programs from Part 3 (using the newer, limited syntax) and they all seem to work.
It would be nice to be able to automatically reset the TD4 from the Arduino, but for now, pressing the button when required is fine.
For the most part, unless there is a loop to get caught in, the code will cycle back to the start anyway.
In terms of possible updates and enhancements, there are a few on my mind:
- It would be nice to support the undocumented use of immediate values somehow.
- It might be nice to have a way to save/load the code. It only needs to be a string of 16 two-digit hex codes, one byte per instruction.
- It might be nice to have several demo programs to choose from.
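To give a feel for the save/load idea, here is one possible host-side sketch. None of it exists in the current code: `saveProgram`, `loadProgram` and `hexVal` are hypothetical names, and the format (32 hex digits with no separators, one byte per instruction) is just one assumption about what such a string could look like. An Arduino version would probably use fixed char buffers in place of `std::string`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode the 16 program bytes as 32 hex digits.
std::string saveProgram(const uint8_t ram[16]) {
  static const char HEX_DIGITS[] = "0123456789ABCDEF";
  std::string out;
  for (int i = 0; i < 16; i++) {
    out += HEX_DIGITS[ram[i] >> 4];
    out += HEX_DIGITS[ram[i] & 0x0F];
  }
  return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexVal(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decode a 32-digit string; returns false and leaves ram untouched on bad input.
bool loadProgram(const std::string &text, uint8_t ram[16]) {
  if (text.size() != 32) return false;
  uint8_t tmp[16];
  for (int i = 0; i < 16; i++) {
    int hi = hexVal(text[2 * i]);
    int lo = hexVal(text[2 * i + 1]);
    if (hi < 0 || lo < 0) return false;
    tmp[i] = static_cast<uint8_t>((hi << 4) | lo);
  }
  for (int i = 0; i < 16; i++) ram[i] = tmp[i];
  return true;
}
```

Under this format the demo program from the listing above would be saved as `E8F3A251A401A851A401A251F0000000`.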
If I expand the instruction set and architecture, then I’ll have to think again about chunks of this code, but for now, it seems to work pretty well.
Kevin
#4bit #arduinoUno #define #td4
#4bit #arduinouno #define #td4
Kevin's Blog @[email protected] · 2025-11-21 · 17:11 UTC
TD4 4-bit DIY CPU – Part 7
Once the idea of creating an Arduino “direct to ROM” assembler was floated in Part 6, I just had to do it, so this post is a little diversion from the hardware discussion into how that could work.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Basic Concepts
This relies on using an Arduino as the ROM as described in Part 6, but the Arduino now has the option to change the ROM contents independently of the TD4 itself.
The Arduino sketch will do the following:
- Run the TD4 ROM routine off a timer interrupt so that it is always running and responsive.
- Take input over the Arduino serial port to allow basic control, e.g. list, clear, etc.
- Allow the direct input of assembler instructions, such as MOVE A,B or OUT B and so on.
- Provide a means of selecting which line of the program to change.
The code will thus have a number of key sections:
- The TD4 ROM routine.
- Some kind of serial-port command-line interpreter.
- Handler routines for all the commands.
- An assembler.
- A disassembler.
The TD4 ROM routine has already been fully described in Part 6. The only difference is that the scanning routine will be driven from a 1 ms timer using the TimerOne library.
As I want to still support a built-in demo, I now have the concept of ROM being the demo code and RAM being the “live” code to pass onto the TD4. The Arduino will initialise the RAM on startup from the ROM.
As far as the TD4 is concerned of course, this is all still ROM.
Command Line Interpreter
The standard Arduino Serial routines will be used to scan for input via the serial port. It will support a line-oriented input as follows:
```
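bool cmdRunner (void) {
  // Drain whatever has arrived on the serial port so far.
  while (Serial.available()) {
    char c = Serial.read();
    if (c == '\n') {
      // End of line: hand the completed command over and reset the index.
      // cmdInput is not cleared here, so unless it is reset elsewhere
      // a bare newline resubmits the previous line.
      strcpy(cmdSaved, cmdInput);
      cmdIdx = 0;
      return true;
    }
    else if (cmdIdx < CMD_BUFFER-1) {
      // Append the character and keep the buffer NUL-terminated.
      // Characters beyond the buffer size are silently dropped.
      cmdInput[cmdIdx++] = c;
      cmdInput[cmdIdx] = '\0';
    }
  }
  return false;
}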
```
This will keep adding any received characters to the cmdInput buffer until a newline is received, at which point the command is saved in cmdSaved and the routine will return true indicating a full line is ready to be processed.
Once a complete line is received, then a processing function will parse it.
Key to the processing of commands is a command table that stores the text to match and the handler function to call on finding a valid command. There is an additional parameter that will be passed into the handler function to allow the same handler function to support several commands. This will be used in the assembler itself later.
```
struct cmd_t {
  char    cmd[CMD_BUFFER+1];
  hdlr_t  pFn;
  uint8_t idx;
};

const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
  {"H", hdlrHelp, 0},
  {"L", hdlrList, 0},
  {"G", hdlrGoto, 0},
};
```
The algorithm for parsing commands is as follows:
```
cmdProcess:
  Look for a space or newline
  IF found a space THEN
    This is the start of the parameter

  Look for the command in the command table
  IF command found THEN
    Call the handler function with the parameters
```
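The parsing steps above might look something like the following in plain C++. This is only a sketch under a few assumptions: the table lives in ordinary memory rather than PROGMEM, and `demoTable`, `hdlrDemo`, `lastIdx` and `lastParam` are hypothetical names used purely for illustration, not part of the actual sketch.

```cpp
#include <stdint.h>
#include <string.h>

typedef void (*hdlr_t)(int idx, char *param);

// Simplified command table entry (no PROGMEM handling in this sketch).
struct cmd_t {
  const char *cmd;
  hdlr_t      pFn;
  uint8_t     idx;
};

// Hypothetical: records what the demo handler was last called with.
int  lastIdx = -1;
char lastParam[16] = "";

void hdlrDemo(int idx, char *param) {
  lastIdx = idx;
  strncpy(lastParam, param ? param : "", sizeof(lastParam) - 1);
  lastParam[sizeof(lastParam) - 1] = '\0';
}

const cmd_t demoTable[] = {
  {"G",    hdlrDemo, 0},
  {"ADDA", hdlrDemo, 1},
};
const int NUM_DEMO_CMDS = sizeof(demoTable) / sizeof(demoTable[0]);

// Split "CMD param" at the first space, look CMD up in the table and
// call its handler. Returns true if a matching command was found.
bool cmdProcess(char *line) {
  char *param = strchr(line, ' ');
  if (param != nullptr) {
    *param = '\0';   // terminate the command word in place
    param++;         // the parameter starts after the space
  }
  for (int i = 0; i < NUM_DEMO_CMDS; i++) {
    if (strcmp(line, demoTable[i].cmd) == 0) {
      demoTable[i].pFn(demoTable[i].idx, param);
      return true;
    }
  }
  return false;
}
```

Note that the line is modified in place, which is one reason to parse a saved copy of the input rather than the live receive buffer.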
The implementation is a bit complex, as it uses string pointers and has to chop and parse strings as it goes. It is also dealing with the command table in the Arduino’s PROGMEM, which is an additional complication.
In order to be able to use the same command line interpreter for the input of assembler instructions, I’ve had to simplify the syntax. There are no spaces in opcodes and there has to be a space between the opcode and immediate value if used.
Here are some examples:
```
IN A       -> INA
MOVE A,B   -> MOVAB
OUT im     -> OUT im
JNC im     -> JNC im
ADD A,im   -> ADDA im
```
Handler Routines
All handler routines have the following prototype:
```
typedef void (*hdlr_t)(int idx, char *param);

void hdlrHelp(int idx, char *pParam) {
  Serial.print("\nHelp\n----\n");
  Serial.println("H: Help");
}
```
The idx parameter is the number in the last field of the command table. pParam will be a pointer to the parameter string for the command (if used).
As we’re dealing with strings all the time, there are a number of helper functions to do things like convert strings to numbers as well as others to print numbers in various formats.
Number formats are assumed to be as follows:
```
0..9   - decimal digits
0x0..F - hex digits
b0..1  - binary digits
```
The code provides the following:
- str2num – the basic string parsing routine to recognise all three number formats as strings.
- printbin – print a number in b0..1 format.
- printhex – print a number in 0x0..F format, allowing for a possible leading zero if required.
- printins – print an instruction in textual format.
- printop – print an instruction in binary and hex opcode format.
- printline – print a line number in a consistent binary and hex format.
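As a rough illustration of the three number formats, here is one way a str2num-style parser could be written. It is a hypothetical re-creation, not the code from the repository: the return convention (-1 on error) and the handling of upper/lower case are my assumptions, and it does not guard against overflow on very long inputs.

```cpp
// Hypothetical str2num-style parser; the real routine may differ.
// Accepts "13" (decimal), "0xD" (hex) or "b1101" (binary).
// Returns the value, or -1 if the string is empty or malformed.
int str2num(const char *s) {
  if (s == nullptr || *s == '\0') return -1;

  int base = 10;
  if (s[0] == 'b' || s[0] == 'B') {
    base = 2;
    s += 1;
  } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    base = 16;
    s += 2;
  }
  if (*s == '\0') return -1;   // a prefix with no digits after it

  int value = 0;
  for (; *s != '\0'; s++) {
    int digit;
    if (*s >= '0' && *s <= '9')      digit = *s - '0';
    else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
    else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
    else return -1;
    if (digit >= base) return -1;    // e.g. '2' in binary, 'A' in decimal
    value = value * base + digit;
  }
  return value;
}
```

With this, `G 13`, `G 0xD` and `G b1101` would all select the same line.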
The code supports the following commands, so each has its own handler function:
- H – help – show the list of commands.
- L – list – show the disassembly of the whole working memory (RAM).
- G – goto – set the working line number.
- C – clear – reset all working memory (RAM) to zeros.
- R – restore – restore the working memory (RAM) to the pre-built demo code (ROM).
- O – opcodes – list the supported opcodes.
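The clear and restore commands boil down to two small memory operations. A minimal sketch, assuming a 16-byte working array and a demo program held in ordinary memory (on the Arduino the demo may well live in PROGMEM, in which case a flash-aware copy would be needed instead). The names `ramClear` and `ramRestore` are hypothetical, and the byte values are taken from the hex column of the session log later in this post.

```cpp
#include <stdint.h>
#include <string.h>

const int MEM_SIZE = 16;

// Demo program ("ROM"), one byte per TD4 instruction.
const uint8_t ROM[MEM_SIZE] = {
  0xE8, 0xF3, 0xA2, 0x51, 0xA4, 0x01, 0xA8, 0x51,
  0xA4, 0x01, 0xA2, 0x51, 0xF0, 0x00, 0x00, 0x00
};

// Working copy ("RAM") that is actually served to the TD4.
uint8_t RAM[MEM_SIZE];

// C command: zero the whole working memory (0x00 is ADDA 0).
void ramClear(void) {
  memset(RAM, 0, sizeof(RAM));
}

// R command (and startup): copy the demo program into working memory.
void ramRestore(void) {
  memcpy(RAM, ROM, sizeof(RAM));
}
```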
Assembler
As already mentioned, I’m using the same command line interpreter code to create the assembler. To do this, each opcode has an entry in the command table:
```
const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
  // Assembly commands - must be first
  {"ADDA", hdlrAsm, 0},
  {"MOVAB", hdlrAsm, 1},
  {"INA", hdlrAsm, 2},
  {"MOVA", hdlrAsm, 3},
  {"MOVBA", hdlrAsm, 4},
  {"ADDB", hdlrAsm, 5},
  {"INB", hdlrAsm, 6},
  {"MOVB", hdlrAsm, 7},
  {"OUTB", hdlrAsm, 8},
  {"OUT2B", hdlrAsm, 9},
  {"OUT", hdlrAsm, 10},
  {"OUT2", hdlrAsm, 11},
  {"JNCB", hdlrAsm, 12},
  {"JMPB", hdlrAsm, 13},
  {"JNC", hdlrAsm, 14},
  {"JMP", hdlrAsm, 15},

  // Other commands
  {"H", hdlrHelp, 0},
  {"L", hdlrList, 0},
  {"G", hdlrGoto, 0},
  {"C", hdlrClear, 0},
  {"R", hdlrRestore, 0},
  {"O", hdlrOpcodes, 0},
};
```
The order corresponds to the opcode command value, as does the parameter. As these are at the start of the table, I can assume that the position in the table is the same as the command value. This does mean that I also need to account for the duplicated instructions even if I don’t need to use them.
I’m making the following design decisions:
- There is the concept of a “current line” which can be set with the G (goto) command.
- Entering a valid opcode automatically moves the current line on by 1.
- No line information is entered as part of the opcode.
The main logic of the assembler handler is as follows:
```
Assembler:
  Command value is the provided index parameter
  Determine the im value from the provided string parameter
  RAM[line] = (cmd << 4) + im
  Increment current line
```
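The core of that handler could be sketched in C++ as below. This is not the author's implementation: `assemble` and `curLine` are hypothetical names, the range checks are my addition, and wrapping from line 15 back to line 0 is an assumption about what should happen at the end of memory.

```cpp
#include <stdint.h>

uint8_t RAM[16];       // working memory served to the TD4
uint8_t curLine = 0;   // the "current line" set by the G command

// Store one instruction at the current line and advance by one.
// cmd: 4-bit opcode value (the command table index).
// im:  4-bit immediate value, 0 if the opcode does not use one.
// Returns false (and stores nothing) if either value is out of range.
bool assemble(uint8_t cmd, int im) {
  if (cmd > 15 || im < 0 || im > 15) return false;
  // Parentheses matter here: in C/C++ "cmd << 4 + im" would be
  // parsed as cmd << (4 + im), because + binds tighter than <<.
  RAM[curLine] = (uint8_t)((cmd << 4) | im);
  curLine = (curLine + 1) & 0x0F;   // wrap after line 15 (assumption)
  return true;
}
```

For example, with the current line at 13, `assemble(8, 0)` stores 0x80 (OUTB) in line 13 and moves the current line on to 14.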
Disassembler
Disassembly is really largely a look-up table matching opcode command values to text. This is all hidden away behind the two print routines printins() and printop().
```
void printins (uint8_t ins) {
  uint8_t cmd = ins >> 4;
  uint8_t im = ins & 0x0F;

  Serial.print(FSH(cmdTable[cmd].cmd));
  if (HASIM(cmd)) {
    Serial.print(" b");
    printbin(im,4);
  } else {
    Serial.print("      ");
  }
}

void printop (uint8_t op) {
  uint8_t cmd = op >> 4;
  uint8_t im = op & 0x0F;

  Serial.print("b");
  printbin(cmd,4);
  Serial.print(" ");
  printbin(im,4);
  Serial.print("\t0x");
  printhex(op,2);
}
```
The main complexity is pulling the strings out of the command table. I’ve had to include a macro to provide access to the strings from the Arduino’s PROGMEM:
```
#define FSH(x) ((const __FlashStringHelper *)x)
```
This feels like a bit of a hack, but apparently this is how it should be done for the kind of thing I need to do!
There is another macro here that needs explaining:
```
#define HASIM(op) (op==0||op==3||op==5||op==7||op>9)
```
This is a set of conditions that if true means that the command supports an immediate value. This is used in a few places to know how to parse the commands.
Whilst in principle all commands could use the immediate value, the “official” statement of how they work assumes im=0 in many cases. So, for example, OUT B does not require an immediate value, but if one is provided then OUT B becomes OUT B+im.
I’m not really supporting that with this code at the moment.
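To show how the look-up and the immediate-value test fit together without any Arduino dependencies, here is a string-returning variant of the disassembly in plain C++. It is a sketch only: `disassemble`, `toBin`, `hasIm` and `MNEMONIC` are hypothetical names, the mnemonics are copied from the command table above, and unlike printins() it does not pad instructions that have no immediate value.

```cpp
#include <stdint.h>
#include <string>

// Mnemonics in opcode order, as in the command table.
const char *const MNEMONIC[16] = {
  "ADDA", "MOVAB", "INA", "MOVA", "MOVBA", "ADDB", "INB", "MOVB",
  "OUTB", "OUT2B", "OUT", "OUT2", "JNCB", "JMPB", "JNC", "JMP"
};

// Same condition as the HASIM macro, written as a function.
bool hasIm(uint8_t op) {
  return op == 0 || op == 3 || op == 5 || op == 7 || op > 9;
}

// Format the low `bits` bits of v as a binary string, MSB first.
std::string toBin(uint8_t v, int bits) {
  std::string s;
  for (int i = bits - 1; i >= 0; i--) {
    s += ((v >> i) & 1) ? '1' : '0';
  }
  return s;
}

// Turn one instruction byte into text such as "OUT b0100" or "OUTB".
std::string disassemble(uint8_t ins) {
  uint8_t cmd = ins >> 4;     // high nibble selects the opcode
  uint8_t im  = ins & 0x0F;   // low nibble is the immediate value
  std::string text = MNEMONIC[cmd];
  if (hasIm(cmd)) {
    text += " b" + toBin(im, 4);
  }
  return text;
}
```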
Putting it all together
Here is a serial output log of a session using the assembler.
```
> H
Help
----
H: Help
L: List
G: Goto
C: Clear
R: Restore
O: Opcodes
OpCode
OpCode im

Current line: b0000 [0]

> L
RAM Disassembly

b0000 [0]: JNC b1000   b1110 1000  0xE8
b0001 [1]: JMP b0011   b1111 0011  0xF3
b0010 [2]: OUT b0010   b1010 0010  0xA2
b0011 [3]: ADDB b0001  b0101 0001  0x51
b0100 [4]: OUT b0100   b1010 0100  0xA4
b0101 [5]: ADDA b0001  b0000 0001  0x01
b0110 [6]: OUT b1000   b1010 1000  0xA8
b0111 [7]: ADDB b0001  b0101 0001  0x51
b1000 [8]: OUT b0100   b1010 0100  0xA4
b1001 [9]: ADDA b0001  b0000 0001  0x01
b1010 [A]: OUT b0010   b1010 0010  0xA2
b1011 [B]: ADDB b0001  b0101 0001  0x51
b1100 [C]: JMP b0000   b1111 0000  0xF0
b1101 [D]: ADDA b0000  b0000 0000  0x00
b1110 [E]: ADDA b0000  b0000 0000  0x00
b1111 [F]: ADDA b0000  b0000 0000  0x00
Current line: b0010 [2]

> G 13
Goto line 13
Current line: b1101 [D]

> OUTB
Assemble:
b1101 [D] OUTB        b1000 0000  0x80
Current line: b1110 [E]

> L
RAM Disassembly

b0000 [0]: JNC b1000   b1110 1000  0xE8
b0001 [1]: JMP b0011   b1111 0011  0xF3
b0010 [2]: OUT b0010   b1010 0010  0xA2
b0011 [3]: ADDB b0001  b0101 0001  0x51
b0100 [4]: OUT b0100   b1010 0100  0xA4
b0101 [5]: ADDA b0001  b0000 0001  0x01
b0110 [6]: OUT b1000   b1010 1000  0xA8
b0111 [7]: ADDB b0001  b0101 0001  0x51
b1000 [8]: OUT b0100   b1010 0100  0xA4
b1001 [9]: ADDA b0001  b0000 0001  0x01
b1010 [A]: OUT b0010   b1010 0010  0xA2
b1011 [B]: ADDB b0001  b0101 0001  0x51
b1100 [C]: JMP b0000   b1111 0000  0xF0
b1101 [D]: OUTB        b1000 0000  0x80
b1110 [E]: ADDA b0000  b0000 0000  0x00
b1111 [F]: ADDA b0000  b0000 0000  0x00
Current line: b1110 [E]

> O
Supported OpCodes:
  b0000 data  ADDA im
  b0001 0000  MOVAB
  b0010 0000  INA
  b0011 data  MOVA im
  b0100 0000  MOVBA
  b0101 data  ADDB im
  b0110 0000  INB
  b0111 data  MOVB im
  b1000 0000  OUTB
  b1001 0000  OUT2B
  b1010 data  OUT im
  b1011 data  OUT2 im
  b1100 data  JNCB im
  b1101 data  JMPB im
  b1110 data  JNC im
  b1111 data  JMP im

> C
Clearing RAM ... Done
```
Find the code on GitHub here.
Conclusion
The basics for this actually came together fairly quickly, but I must admit to spending a fair bit of time fiddling about with output formats and refactoring various bits of code to try to give some consistency in terms of when newlines are applied, what is shown in binary, what in hex, and so on.
I can’t guarantee everything has been caught, but I’ve typed in all the code (using the newer, limited syntax) from Part 3 and they all seem to work.
It would be nice to be able to automatically reset the TD4 from the Arduino, but for now, pressing the button when required is fine.
For the most part, unless there is a loop to get caught in, the code will cycle back to the start anyway.
In terms of possible updates and enhancements, there are a few on my mind:
- It would be nice to support the undocumented use of immediate values somehow.
- It might be nice to have a way to save/load the code. It only needs to be a string of 16 two-digit hex codes, one byte per instruction.
- It might be nice to have several demo programs to choose from.
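As a thought experiment for the save/load idea, the whole program fits in a 32-character hex string. The following is purely a hypothetical sketch of that encoding in plain C++. None of it exists in the current code, and `saveProgram`, `loadProgram` and `hexVal` are made-up names.

```cpp
#include <stdint.h>
#include <string>

// Encode 16 program bytes as 32 hex characters, two per instruction.
std::string saveProgram(const uint8_t ram[16]) {
  static const char HEX_DIGITS[] = "0123456789ABCDEF";
  std::string out;
  for (int i = 0; i < 16; i++) {
    out += HEX_DIGITS[ram[i] >> 4];
    out += HEX_DIGITS[ram[i] & 0x0F];
  }
  return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexVal(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decode a 32-character hex string into ram[]. Returns false, leaving
// ram[] untouched, if the length or any character is invalid.
bool loadProgram(const std::string &text, uint8_t ram[16]) {
  if (text.size() != 32) return false;
  uint8_t tmp[16];
  for (int i = 0; i < 16; i++) {
    int hi = hexVal(text[2 * i]);
    int lo = hexVal(text[2 * i + 1]);
    if (hi < 0 || lo < 0) return false;
    tmp[i] = (uint8_t)((hi << 4) | lo);
  }
  for (int i = 0; i < 16; i++) ram[i] = tmp[i];
  return true;
}
```

Decoding into a temporary buffer first means a mistyped string cannot leave the TD4 running a half-loaded program.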
If I expand the instruction set and architecture, then I’ll have to think again about chunks of this code, but for now, it seems to work pretty well.
Kevin
#4bit #arduinoUno #define #td4
Kevin's Blog @[email protected] · 2025-11-15 · 18:13 UTC
TD4 4-bit DIY CPU – Part 6
Having now successfully built my own version of the TD4 4-bit CPU in Part 5, I’m now chewing over some of the ways I’d like to try to expand it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
I already have a list of other extended projects at the end of Part 4, so I might be drawing on some of them for inspiration moving forward. Many of these are very similar projects, but with a completely different architecture. But really at this stage rather than build a different, more capable, 4-bit CPU from someone else’s design, I’m interested in seeing how far the TD4 design can go. So, ultimately, like all my projects, the fun here is in the reinventing and learning on the way.
One of the questions I have is can I replace the DIP switches with something that can provide the data in a better way? This would be particularly critical if I expand the address space in the future. A ROM is the obvious option, but something more dynamic might be an interesting experiment too.
This post looks at options for replacing the DIP switches with microcontrollers.
Now I feel like I really ought to state right up front that this is a pretty ludicrous thing to do.
At the more charitable end of the endeavor I’m using a 16MHz 8-bit AVR microcontroller with 2kB of RAM to serve up 16 8-bit values to a 10Hz 4-bit CPU.
At the most extreme end I’m using a 125MHz, dual-core, 32-bit ARM Cortex M0+ CPU with 264 kB of RAM running an entire interpreted programming environment requiring (probably) millions of lines of low-level code to implement it, to do the same thing.
So why bother? Well – why not?
TD4 without the ROM
To interface to a microcontroller, I’m after two things:
- Ability to read the 4 address lines.
- Ability to drive the 8 data lines.
The best place to get at these signals is on the interface to the ROM itself – the 74HC540 octal line driver, and 74HC154 4-to-16 line decoder.
Conveniently, these signals can be broken out quite easily on my board as shown below.
The pink shaded area shows which components are needed for a ROM-less build. The two yellow highlights show where headers should be soldered to permit access to the address lines (top) and data lines (bottom).
In this build, the following components are omitted from the full board:
- 74HC154
- 74HC540
- 16x 8-way DIP switches
- 128x small signal diodes
- 8x 10k pull-up resistors
I’ve used 6-way and 10-way pin header sockets to allow me to patch in a microcontroller. This allows for each header to conveniently include 5V and GND too. I’ve included the USB socket for power to the PCB but expect I’ll probably power the board via these 5V and GND links from the microcontroller.
Using Arduino
The natural choice here is to use one of the older Arduino boards, as these are all 5V IO which makes interfacing with the 4-bit CPU fairly straightforward.
Using Arduino direct PORTIO should also make it pretty trivial to read address lines and write the data. I’ve configured the connections as follows:
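| TD4 Signal | Arduino GPIO | Arduino PORTIO |
|------------|--------------|----------------|
| A0 | A0 | PORTC:0 |
| A1 | A1 | PORTC:1 |
| A2 | A2 | PORTC:2 |
| A3 | A3 | PORTC:3 |
| D0 | D8 | PORTB:0 |
| D1 | D9 | PORTB:1 |
| D2 | D10 | PORTB:2 |
| D3 | D11 | PORTB:3 |
| D4 | D4 | PORTD:4 |
| D5 | D5 | PORTD:5 |
| D6 | D6 | PORTD:6 |
| D7 | D7 | PORTD:7 |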
I’m avoiding D0/D1 (PORTD[0:1]) and D13 as they all have other hardware attached (serial port and LED in this case).
Accessing the data corresponding to any specific address is as simple as follows:
```
uint8_t ROM[16];

loop:
    uint8_t addr = PINC & 0x0F;
    PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
    PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
```
The code could be simplified if I didn’t mind trashing whatever is configured for the other GPIO pins via the PORTIO, but it is good practice to preserve those values when only writing to a subset of the IO ports.
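The read-modify-write pattern can be tried out on a PC by standing in plain variables for the port registers. In this sketch `portB`, `portD` and `writeData` are hypothetical names. On a real Uno the same expressions would be applied to the PORTB and PORTD registers directly.

```cpp
#include <stdint.h>

// Stand-ins for the AVR port registers so the logic can run anywhere.
uint8_t portB = 0;
uint8_t portD = 0;

// Drive one ROM byte onto the data lines: low nibble to port B bits 0-3,
// high nibble to port D bits 4-7, leaving all other port bits unchanged.
void writeData(uint8_t value) {
  portB = (uint8_t)((portB & ~0x0F) | (value & 0x0F));
  portD = (uint8_t)((portD & ~0xF0) | (value & 0xF0));
}
```

For example, if port B holds 0x30 and port D holds 0x03 before the call, `writeData(0xA4)` leaves them at 0x34 and 0xA3: the bits outside the data lines survive. Each port has to be masked against its own current value for that to hold.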
In the final code below, I’ve included a toggle for A5 which allows me to do some timing measurements too.
```
uint8_t ROM[16] = {
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
};

void setup() {
  DDRB |= 0x0F;
  DDRD |= 0xF0;
  DDRC |= 0x20;
}

int toggle;
void loop() {
  if (toggle == 0) {
    toggle = 1;
    PORTC |= 0x20;
  } else {
    toggle = 0;
    PORTC &= ~(0x20);
  }

  // Read the 4-bit address, then update only the data-line bits of each port.
  uint8_t addr = PINC & 0x0F;
  PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
  PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
}
```
Running the code in a loop like this gives a scan frequency of around 500 kHz and a response time of something like 2-3 µs for each read. That seems pretty responsive and I’m sure will be fine for a 10 Hz CPU. And it is – it works great!
Using Circuitpython
One thing that would be really nice is a workflow that allows more of a “direct save to the CPU” approach to programming it. One option is to use a more modern microcontroller that supports a filesystem.
The obvious choice here is a 32-bit microcontroller that supports Circuitpython. But will IO in Circuitpython be fast enough to respond to the CPU? There is one obvious way to find out – give it a try.
There is another complication too – most Circuitpython boards run at 3.3V not 5V so that needs to be addressed too.
Level Shifting
I’m going to use a 74LVC245. The Adafruit product page puts it best:
“essentially: connect VCC to your logic level you want to convert to (say 3.3V), Ground connects to Ground. Wire OE (output enable) to ground to enable the device and DIR (direction) to VCC. Then digital logic on the A pins up to 5V will appear on the B pins shifted down to the VCC logic.”
This is an 8-way bi-directional bus transceiver and should be powered by 3V3; the direction pin then determines the direction of the conversion, as shown below.
Two devices will be required. The address lines will need a 5V to 3V3 conversion; the data lines will need 3V3 to 5V.
Here is how I’ve wired these up for a Raspberry Pi Pico:
The Pico is connected as follows:
- INPUT: GPIO 10-13 = A0-A3
- OUTPUT: GPIO 2-9 = D7-D0 (note the ordering!)
CircuitPython ROM
The basic algorithm will be as follows:
```
ROM = [16 command byte values]

LOOP:
  Read four address lines
  Set data lines from ROM[address]
```
For performance reasons it would be best to optimise both the reading of the address lines and the writing of the data lines, ideally into a single access. But as this is for a CPU that runs at a maximum of 10 Hz, for now I’m just going with simple to see how it goes.
```
import board
import digitalio

ROM = [
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
    ]

Tpin = digitalio.DigitalInOut(board.GP21)
Tpin.direction = digitalio.Direction.OUTPUT

A0pin = digitalio.DigitalInOut(board.GP10)
A1pin = digitalio.DigitalInOut(board.GP11)
A2pin = digitalio.DigitalInOut(board.GP12)
A3pin = digitalio.DigitalInOut(board.GP13)

D0pin = digitalio.DigitalInOut(board.GP2)
D0pin.direction = digitalio.Direction.OUTPUT
D1pin = digitalio.DigitalInOut(board.GP3)
D1pin.direction = digitalio.Direction.OUTPUT
D2pin = digitalio.DigitalInOut(board.GP4)
D2pin.direction = digitalio.Direction.OUTPUT
D3pin = digitalio.DigitalInOut(board.GP5)
D3pin.direction = digitalio.Direction.OUTPUT
D4pin = digitalio.DigitalInOut(board.GP6)
D4pin.direction = digitalio.Direction.OUTPUT
D5pin = digitalio.DigitalInOut(board.GP7)
D5pin.direction = digitalio.Direction.OUTPUT
D6pin = digitalio.DigitalInOut(board.GP8)
D6pin.direction = digitalio.Direction.OUTPUT
D7pin = digitalio.DigitalInOut(board.GP9)
D7pin.direction = digitalio.Direction.OUTPUT

def doOutput (data):
    if (data & 0x01):
        D0pin.value = True
    else:
        D0pin.value = False
    
    if (data & 0x02):
        D1pin.value = True
    else:
        D1pin.value = False
    
    if (data & 0x04):
        D2pin.value = True
    else:
        D2pin.value = False
    
    if (data & 0x08):
        D3pin.value = True
    else:
        D3pin.value = False
    
    if (data & 0x10):
        D4pin.value = True
    else:
        D4pin.value = False
    
    if (data & 0x20):
        D5pin.value = True
    else:
        D5pin.value = False

    if (data & 0x40):
        D6pin.value = True
    else:
        D6pin.value = False

    if (data & 0x80):
        D7pin.value = True
    else:
        D7pin.value = False

while True:
    Tpin.value = True
    addr = 0
    if (A0pin.value == True):
        addr = addr + 1
    if (A1pin.value == True):
        addr = addr + 2
    if (A2pin.value == True):
        addr = addr + 4
    if (A3pin.value == True):
        addr = addr + 8

    Tpin.value = False
    doOutput(ROM[addr])
```
I’ve included a timing pin on GPIO21 so I can see how long it takes to access the IO.
It turns out that it takes something of the order of 50-60µs to read the four address lines and something in the region of 70-80µs to write out the 8 data lines. The above simple CircuitPython code to do this is running with a frequency of around 7kHz.
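As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is only arithmetic on the numbers quoted above, not a new measurement; it ignores any loop overhead, and the variable and function names are made up for illustration.

```python
# Rough loop-rate estimate from the measured read and write times.
READ_US = (50, 60)    # time to read the four address lines, in microseconds
WRITE_US = (70, 80)   # time to write the eight data lines, in microseconds
CPU_CLOCK_HZ = 10     # maximum TD4 clock rate

def loop_rate_hz(read_us, write_us):
    """Loop frequency if one pass costs read_us + write_us microseconds."""
    return 1_000_000 / (read_us + write_us)

fastest = loop_rate_hz(READ_US[0], WRITE_US[0])   # best case: 120us per pass
slowest = loop_rate_hz(READ_US[1], WRITE_US[1])   # worst case: 140us per pass

# Number of complete ROM scans that fit into one CPU clock period.
scans_per_clock = slowest / CPU_CLOCK_HZ

print(f"loop rate: {slowest:.0f} to {fastest:.0f} Hz")
print(f"scans per CPU clock: at least {scans_per_clock:.0f}")
```

The estimate of roughly 7.1kHz to 8.3kHz lines up with the measured 7kHz or so, and even the slow end leaves several hundred scans of the address lines for every tick of a 10Hz CPU clock.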
Now at this point I ought to be reading through the datasheets for the ICs used in the CPU to check response times and timing tolerances to see if this is OK. But I didn’t bother with any of that as it all appears to work!
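For comparison, the same read-then-write logic can be written with lists of pins and a loop instead of one if/else per pin. The following is a minimal sketch, not the code that was actually run: FakePin is a hypothetical stand-in for digitalio.DigitalInOut so the logic can be tried on a desktop Python, and the helper names read_address and write_data are made up for illustration. Whether this is any faster on the Pico is untested.

```python
# On the Pico these would be digitalio.DigitalInOut objects; FakePin is a
# hypothetical stand-in that only provides the .value attribute.
class FakePin:
    def __init__(self, value=False):
        self.value = value

ROM = [
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00,
]

def read_address(addr_pins):
    """Combine the address pins into a number; addr_pins[0] is A0 (LSB)."""
    addr = 0
    for bit, pin in enumerate(addr_pins):
        if pin.value:
            addr |= 1 << bit
    return addr

def write_data(data_pins, data):
    """Drive each data pin from the matching bit; data_pins[0] is D0 (LSB)."""
    for bit, pin in enumerate(data_pins):
        pin.value = bool(data & (1 << bit))

addr_pins = [FakePin() for _ in range(4)]   # A0..A3
data_pins = [FakePin() for _ in range(8)]   # D0..D7

# Simulate the CPU presenting address 2 (A1 high), then do one pass.
addr_pins[1].value = True
write_data(data_pins, ROM[read_address(addr_pins)])
print([int(p.value) for p in data_pins])
```

With address 2 selected, ROM[2] is 0xB2, so the data pins D0 to D7 end up as 0, 1, 0, 0, 1, 1, 0, 1.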
Conclusion
CircuitPython is obviously a lot slower than the Arduino running optimised PORTIO code, even though CircuitPython is running on a 125MHz processor compared to the Arduino’s 16MHz. Of course, if performance were critical then switching to direct GPIO access in C on the Pico would be a lot faster again. Even just having a way to do a single block-access of GPIO would probably make quite a difference.
But for this application, both seem to work absolutely fine as they are.
The ability to quickly edit the ROM contents is pretty useful with CircuitPython. But I am now wondering how difficult it would be to have some kind of uploader to the Arduino over the serial port. There are only 16 bytes to transfer after all.
In fact it might even be possible to create a simple interactive assembler that allows code to be typed in over the serial port using proper word-based op-codes (like ADD, IN, OUT, etc). At the very least a simple serial port interface to type in numeric values would be relatively straightforward I think. It might also be possible to allow the microcontroller to reset the CPU too.
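To give a feel for the "type in numeric values" idea, here is a minimal sketch of the parsing step only, written in Python for brevity. It is an assumption about how such an interface might work, not an existing tool: the function name parse_rom and the input format (hex bytes separated by spaces or commas) are made up for illustration, the serial transport is not shown, and the same logic would need porting to run on the Arduino itself.

```python
ROM_SIZE = 16  # the TD4 has a 4-bit address space

def parse_rom(text):
    """Turn whitespace- or comma-separated hex bytes into a 16-entry ROM list.

    Missing entries are padded with 0x00; too many entries or values outside
    0x00..0xFF raise ValueError.
    """
    tokens = text.replace(",", " ").split()
    if len(tokens) > ROM_SIZE:
        raise ValueError(f"too many bytes: {len(tokens)} > {ROM_SIZE}")
    rom = []
    for token in tokens:
        value = int(token, 16)   # accepts "B1", "b1" and "0xB1"
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not an 8-bit value: {token}")
        rom.append(value)
    return rom + [0x00] * (ROM_SIZE - len(rom))

# The same program as the ROM table above, typed as a single line of text.
typed = "B1 01 B2 51 B4 01 B8 51 B4 01 B2 51 F0"
rom = parse_rom(typed)
print([f"{b:02X}" for b in rom])
```

Padding short input with zeros means only the bytes that matter need to be typed, which suits a ROM of just 16 entries.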
I’m not sure the added complications of level shifting, etc, make it worth carrying on with a Pico version at this stage, so I think improving the Arduino is probably the way to go for now.
Kevin
#4bit #arduinoUno #circuitpython #PORTIO #raspberryPiPico #TD4
#4bit #arduinouno #circuitpython #portio #raspberrypipico #td4
Kevin's Blog @[email protected] · 2025-11-15 · 18:13 UTC
TD4 4-bit DIY CPU – Part 6
Having now successfully built my own version of the TD4 4-bit CPU in Part 5, I’m now chewing over some of the ways I’d like to try to expand it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
I already have a list of other extended projects at the end of Part 4, so I might be drawing on some of them for inspiration moving forward. Many of these are very similar projects, but with a completely different architecture. But really at this stage rather than build a different, more capable, 4-bit CPU from someone else’s design, I’m interested in seeing how far the TD4 design can go. So, ultimately, like all my projects, the fun here is in the reinventing and learning on the way.
One of the questions I have is whether I can replace the DIP switches with something that can provide the data in a better way. This would be particularly critical if I expand the address space in the future. A ROM is the obvious option, but something more dynamic might be an interesting experiment too.
This post looks at options for replacing the DIP switches with microcontrollers.
Now I feel like I really ought to state right up front that this is a pretty ludicrous thing to do.
At the more charitable end of the endeavor I’m using a 16MHz 8-bit AVR microcontroller with 2kB of RAM to serve up 16 8-bit values to a 10Hz 4-bit CPU.
At the most extreme end I’m using a 125MHz, dual-core, 32-bit ARM Cortex M0+ CPU with 264 kB of RAM running an entire interpreted programming environment requiring (probably) millions of lines of low-level code to implement it, to do the same thing.
So why bother? Well – why not?
TD4 without the ROM
To interface to a microcontroller, I’m after two things:
- Ability to read the 4 address lines.
- Ability to drive the 8 data lines.
The best place to get at these signals is on the interface to the ROM itself – the 74HC540 octal line driver, and 74HC154 4-to-16 line decoder.
Conveniently, these signals can be broken out quite easily on my board as shown below.
The pink shaded area shows which components are needed for a ROM-less build. The two yellow highlights show where headers should be soldered to permit access to the address lines (top) and data lines (bottom).
In this build, the following components are omitted from the full board:
- 74HC154
- 74HC540
- 16x 8-way DIP switches
- 128x small signal diodes
- 8x 10k pull-up resistors
I’ve used 6-way and 10-way pin header sockets to allow me to patch in a microcontroller. This allows for each header to conveniently include 5V and GND too. I’ve included the USB socket for power to the PCB but expect I’ll probably power the board via these 5V and GND links from the microcontroller.
Using Arduino
The natural choice here is to use one of the older Arduino boards, as these are all 5V IO which makes interfacing with the 4-bit CPU fairly straightforward.
Using Arduino direct PORTIO should also make it pretty trivial to read address lines and write the data. I’ve configured the connections as follows:
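| TD4 Signal | Arduino GPIO | Arduino PORTIO |
|---|---|---|
| A0 | A0 | PORTC:0 |
| A1 | A1 | PORTC:1 |
| A2 | A2 | PORTC:2 |
| A3 | A3 | PORTC:3 |
| D0 | D8 | PORTB:0 |
| D1 | D9 | PORTB:1 |
| D2 | D10 | PORTB:2 |
| D3 | D11 | PORTB:3 |
| D4 | D4 | PORTD:4 |
| D5 | D5 | PORTD:5 |
| D6 | D6 | PORTD:6 |
| D7 | D7 | PORTD:7 |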
I’m avoiding D0/D1 (PORTD[0:1]) and D13 as they all have other hardware attached (serial port and LED in this case).
Accessing the data corresponding to any specific address is as simple as follows:
```
uint8_t ROM[16];

loop:
    uint8_t addr = PINC & 0x0F;
    PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
    PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
```
The code could be simplified if I didn’t mind trashing whatever is configured for the other GPIO pins via the PORTIO, but it is good practice to preserve those values when only writing to a subset of the IO ports.
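To illustrate the masking, here is a minimal sketch in Python rather than Arduino C, simulating the port registers as plain integers. The helper name masked_write and the starting port values are made up for illustration; only the bits selected by the mask change, and everything else on the port is left alone.

```python
def masked_write(port, value, mask):
    """Return the new 8-bit port value: masked bits come from value, the rest are kept."""
    return (port & ~mask & 0xFF) | (value & mask)

rom_byte = 0xB2       # example ROM entry
portb = 0b00100000    # pretend D13 (PORTB:5) is currently high
portd = 0b00000011    # pretend D0/D1 (PORTD:0-1) are currently high

portb = masked_write(portb, rom_byte, 0x0F)   # low nibble -> PORTB:0-3
portd = masked_write(portd, rom_byte, 0xF0)   # high nibble -> PORTD:4-7

print(f"PORTB = {portb:08b}, PORTD = {portd:08b}")
```

Here PORTB ends up as 00100010 and PORTD as 10110011: the two nibbles of 0xB2 land on the data pins while the pretend D13 and D0/D1 states survive untouched.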
In the final code below, I’ve included a toggle for A5 which allows me to do some timing measurements too.
```
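// 16-byte program image served in place of the DIP-switch ROM.
uint8_t ROM[16] = {
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
};

void setup() {
  DDRB |= 0x0F;  // D8-D11 (PORTB:0-3) as outputs for data lines D0-D3
  DDRD |= 0xF0;  // D4-D7 (PORTD:4-7) as outputs for data lines D4-D7
  DDRC |= 0x20;  // A5 (PORTC:5) as output for the timing toggle
}

int toggle;
void loop() {
  // Flip A5 on every pass so the scan rate can be measured.
  if (toggle == 0) {
    toggle = 1;
    PORTC |= 0x20;
  } else {
    toggle = 0;
    PORTC &= ~(0x20);
  }

  // Read the address on A0-A3, then update only the data bits of each port.
  uint8_t addr = PINC & 0x0F;
  PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
  PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
}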
```
Running the code in a loop like this gives a scan frequency of around 500kHz and a response time of something like 2-3µs for each read. That seems pretty responsive and I’m sure will be fine for a 10Hz CPU. And it is – it works great!
Using CircuitPython
One thing that would be really nice is a workflow that allows more of a “direct save to the CPU” approach to programming it. One option is to use a more modern microcontroller that supports a filesystem.
The obvious choice here is a 32-bit microcontroller that supports CircuitPython. But will IO in CircuitPython be fast enough to respond to the CPU? There is one obvious way to find out – give it a try.
There is another complication – most CircuitPython boards run at 3.3V not 5V, so that needs to be addressed too.
Level Shifting
I’m going to use a 74LVC245. The Adafruit product page puts it best:
“essentially: connect VCC to your logic level you want to convert to (say 3.3V), Ground connects to Ground. Wire OE (output enable) to ground to enable the device and DIR (direction) to VCC. Then digital logic on the A pins up to 5V will appear on the B pins shifted down to the VCC logic.”
This is an 8-way bi-directional bus transceiver and should be powered by 3V3; the direction pin then determines the direction of the conversion as shown below.
Two devices will be required. The address lines will need a 5V to 3V3 conversion; the data lines will need 3V3 to 5V.
Here is how I’ve wired these up for a Raspberry Pi Pico:
The Pico is connected as follows:
- INPUT: GPIO 10-13 = A0-A3
- OUTPUT: GPIO 2-9 = D7-D0 (note the ordering!)
CircuitPython ROM
The basic algorithm will be as follows:
```
ROM = [16 command byte values]

LOOP:
  Read four address lines
  Set data lines from ROM[address]
```
For performance reasons it would be best to optimise both the reading of the address lines and the writing of the data lines, ideally into a single access. But as this is for a CPU that runs at a maximum of 10Hz, for now I’m just going with simple to see how it goes.
```
import board
import digitalio

ROM = [
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
    ]

Tpin = digitalio.DigitalInOut(board.GP21)
Tpin.direction = digitalio.Direction.OUTPUT

A0pin = digitalio.DigitalInOut(board.GP10)
A1pin = digitalio.DigitalInOut(board.GP11)
A2pin = digitalio.DigitalInOut(board.GP12)
A3pin = digitalio.DigitalInOut(board.GP13)

D0pin = digitalio.DigitalInOut(board.GP2)
D0pin.direction = digitalio.Direction.OUTPUT
D1pin = digitalio.DigitalInOut(board.GP3)
D1pin.direction = digitalio.Direction.OUTPUT
D2pin = digitalio.DigitalInOut(board.GP4)
D2pin.direction = digitalio.Direction.OUTPUT
D3pin = digitalio.DigitalInOut(board.GP5)
D3pin.direction = digitalio.Direction.OUTPUT
D4pin = digitalio.DigitalInOut(board.GP6)
D4pin.direction = digitalio.Direction.OUTPUT
D5pin = digitalio.DigitalInOut(board.GP7)
D5pin.direction = digitalio.Direction.OUTPUT
D6pin = digitalio.DigitalInOut(board.GP8)
D6pin.direction = digitalio.Direction.OUTPUT
D7pin = digitalio.DigitalInOut(board.GP9)
D7pin.direction = digitalio.Direction.OUTPUT

def doOutput (data):
    if (data & 0x01):
        D0pin.value = True
    else:
        D0pin.value = False
    
    if (data & 0x02):
        D1pin.value = True
    else:
        D1pin.value = False
    
    if (data & 0x04):
        D2pin.value = True
    else:
        D2pin.value = False
    
    if (data & 0x08):
        D3pin.value = True
    else:
        D3pin.value = False
    
    if (data & 0x10):
        D4pin.value = True
    else:
        D4pin.value = False
    
    if (data & 0x20):
        D5pin.value = True
    else:
        D5pin.value = False

    if (data & 0x40):
        D6pin.value = True
    else:
        D6pin.value = False

    if (data & 0x80):
        D7pin.value = True
    else:
        D7pin.value = False

while True:
    Tpin.value = True
    addr = 0
    if (A0pin.value == True):
        addr = addr + 1
    if (A1pin.value == True):
        addr = addr + 2
    if (A2pin.value == True):
        addr = addr + 4
    if (A3pin.value == True):
        addr = addr + 8

    Tpin.value = False
    doOutput(ROM[addr])
```
I’ve included a timing pin to GPIO21 so I can see how long it takes to access the IO.
It turns out that it takes something of the order of 50-60µs to read the four address lines and something in the region of 70-80µs to write out the 8 data lines. The simple Circuitpython code above runs at a frequency of around 7kHz.
Now at this point I ought to be reading through the datasheets for the ICs used in the CPU to check response times and timing tolerances to see if this is ok. But I didn’t bother with any of that as it all appears to work!
Conclusion
The Circuitpython is obviously a lot slower than the Arduino running optimised PORTIO code, even though the Circuitpython is running on a 125MHz processor compared to the Arduino’s 16MHz. Of course, if performance was critical then switching to direct GPIO access in C on the Pico would be a lot faster again. Even just having a way to do a single block-access of GPIO would probably make quite a difference.
But for this application, both seem to work absolutely fine as they are.
The ability to quickly edit the ROM contents is pretty useful with the Circuitpython. But I am now wondering how difficult it would be to have some kind of uploader to the Arduino over the serial port. There are only 16 bytes to transfer after all.
In fact it might even be possible to create a simple interactive assembler that allows code to be typed in over the serial port using proper word-based op-codes (like ADD, IN, OUT, etc). At the very least a simple serial port interface to type in numeric values would be relatively straightforward, I think. It might also be possible to allow the microcontroller to reset the CPU too.
I’m not sure the added complications of logic shifting, etc, make it worth carrying on with a Pico version at this stage, so I think improving the Arduino is probably the way to go for now.
Kevin
#4bit #arduinoUno #circuitpython #portio #raspberryPiPico #td4
Kevin's Blog @[email protected] · 2025-11-15 · 18:13 UTC
TD4 4-bit DIY CPU – Part 6
Having now successfully built my own version of the TD4 4-bit CPU in Part 5, I’m now chewing over some of the ways I’d like to try to expand it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
I already have a list of other extended projects at the end of Part 4, so I might be drawing on some of them for inspiration moving forward. Many of these are very similar projects, but with a completely different architecture. But really at this stage rather than build a different, more capable, 4-bit CPU from someone else’s design, I’m interested in seeing how far the TD4 design can go. So, ultimately, like all my projects, the fun here is in the reinventing and learning on the way.
One of the questions I have is whether I can replace the DIP switches with something that can provide the data in a better way. This would be particularly critical if I expand the address space in the future. A ROM is the obvious option, but something more dynamic might be an interesting experiment too.
This post looks at options for replacing the DIP switches with microcontrollers.
Now I feel like I really ought to state right up front that this is a pretty ludicrous thing to do.
At the more charitable end of the endeavor I’m using a 16MHz 8-bit AVR microcontroller with 2kB of RAM to serve up 16 8-bit values to a 10Hz 4-bit CPU.
At the most extreme end I’m using a 125MHz, dual-core, 32-bit ARM Cortex M0+ CPU with 264 kB of RAM running an entire interpreted programming environment requiring (probably) millions of lines of low-level code to implement it, to do the same thing.
So why bother? Well – why not?
TD4 without the ROM
To interface to a microcontroller, I’m after two things:
- Ability to read the 4 address lines.
- Ability to drive the 8 data lines.
The best place to get at these signals is on the interface to the ROM itself – the 74HC540 octal line driver and the 74HC154 4-to-16 line decoder.
Conveniently, these signals can be broken out quite easily on my board as shown below.
The pink shaded area shows which components are needed for a ROM-less build. The two yellow highlights show where headers should be soldered to permit access to the address lines (top) and data lines (bottom).
In this build, the following components are omitted from the full board:
- 74HC154
- 74HC540
- 16x 8-way DIP switches
- 128x small signal diodes
- 8x 10k pull-up resistors
I’ve used 6-way and 10-way pin header sockets to allow me to patch in a microcontroller. This allows for each header to conveniently include 5V and GND too. I’ve included the USB socket for power to the PCB but expect I’ll probably power the board via these 5V and GND links from the microcontroller.
Using Arduino
The natural choice here is to use one of the older Arduino boards, as these are all 5V IO which makes interfacing with the 4-bit CPU fairly straightforward.
Using Arduino direct PORTIO should also make it pretty trivial to read address lines and write the data. I’ve configured the connections as follows:
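| TD4 Signal | Arduino GPIO | Arduino PORTIO |
|------------|--------------|----------------|
| A0 | A0 | PORTC:0 |
| A1 | A1 | PORTC:1 |
| A2 | A2 | PORTC:2 |
| A3 | A3 | PORTC:3 |
| D0 | D8 | PORTB:0 |
| D1 | D9 | PORTB:1 |
| D2 | D10 | PORTB:2 |
| D3 | D11 | PORTB:3 |
| D4 | D4 | PORTD:4 |
| D5 | D5 | PORTD:5 |
| D6 | D6 | PORTD:6 |
| D7 | D7 | PORTD:7 |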
I’m avoiding D0/D1 (PORTD[0:1]) and D13 as they all have other hardware attached (serial port and LED in this case).
Accessing the data corresponding to any specific address is as simple as follows:
```
uint8_t ROM[16];

loop:
    uint8_t addr = PINC & 0x0F;
    PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
    PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
```
The code could be simplified if I didn’t mind trashing whatever is configured for the other GPIO pins via the PORTIO, but it is good practice to preserve those values when only writing to a subset of the IO ports.
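As a rough illustration of that read-modify-write masking, here is a small Python model that treats the two port registers as plain 8-bit integers. The function name `write_rom_byte` and the example register values are made up for this sketch; it only shows how the byte is split across the two ports while the remaining bits are carried over from each port's own current value.

```python
# Hypothetical model of the AVR port registers as plain 8-bit integers.
LOW_NIBBLE = 0x0F   # PORTB:0-3 carry TD4 data lines D0-D3
HIGH_NIBBLE = 0xF0  # PORTD:4-7 carry TD4 data lines D4-D7


def write_rom_byte(portb: int, portd: int, value: int) -> tuple[int, int]:
    """Return new (PORTB, PORTD) values with `value` split across them.

    Bits outside each mask are copied from that port's current value, so
    unrelated pins keep their state.
    """
    new_portb = (portb & ~LOW_NIBBLE & 0xFF) | (value & LOW_NIBBLE)
    new_portd = (portd & ~HIGH_NIBBLE & 0xFF) | (value & HIGH_NIBBLE)
    return new_portb, new_portd


# Example (assumed starting state): PORTB bit 5 (D13, the LED) is set and
# PORTD bits 0-1 (the serial pins) are set before writing 0xB1.
portb, portd = write_rom_byte(0b0010_0000, 0b0000_0011, 0xB1)
print(f"PORTB={portb:08b} PORTD={portd:08b}")
```

In this example the LED bit on PORTB and the serial bits on PORTD come through unchanged, which is the point of the masking.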
In the final code below, I’ve included a toggle for A5 which allows me to do some timing measurements too.
```
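// 16-byte TD4 program served in place of the DIP-switch ROM.
uint8_t ROM[16] = {
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
};

void setup() {
  DDRB |= 0x0F;  // D8-D11 (PORTB:0-3) as outputs for TD4 D0-D3
  DDRD |= 0xF0;  // D4-D7 (PORTD:4-7) as outputs for TD4 D4-D7
  DDRC |= 0x20;  // A5 (PORTC:5) as output for the timing toggle
}

int toggle;
void loop() {
  // Flip A5 on every pass so the scan rate can be measured.
  if (toggle == 0) {
    toggle = 1;
    PORTC |= 0x20;
  } else {
    toggle = 0;
    PORTC &= ~(0x20);
  }

  // Read A0-A3 from PORTC:0-3, then write the low nibble to PORTB and the
  // high nibble to PORTD, preserving the other pins on each port.
  uint8_t addr = PINC & 0x0F;
  PORTB = (PORTB & ~(0x0F)) | (ROM[addr] & 0x0F);
  PORTD = (PORTD & ~(0xF0)) | (ROM[addr] & 0xF0);
}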
```
Running the code in a loop like this gives a scan frequency of around 500kHz and a response time of something like 2-3µs for each read. That seems pretty responsive and I’m sure will be fine for a 10Hz CPU. And it is – it works great!
Using Circuitpython
One thing that would be really nice is a workflow that allows more of a “direct save to the CPU” approach to programming it. One option is to use a more modern microcontroller that supports a filesystem.
The obvious choice here is a 32-bit microcontroller that supports Circuitpython. But will IO in Circuitpython be fast enough to respond to the CPU? There is one obvious way to find out – give it a try.
There is another complication too – most Circuitpython boards run at 3.3V not 5V so that needs to be addressed too.
Level Shifting
I’m going to use a 74LVC245. The Adafruit product page puts it best:
“essentially: connect VCC to your logic level you want to convert to (say 3.3V), Ground connects to Ground. Wire OE (output enable) to ground to enable the device and DIR (direction) to VCC. Then digital logic on the A pins up to 5V will appear on the B pins shifted down to the VCC logic.”
This is an 8-way bi-directional bus transceiver and should be powered from 3V3; the direction pin then determines the direction of the conversion, as shown below.
Two devices will be required. The address lines will need a 5V to 3V3 conversion; the data lines will need 3V3 to 5V.
Here is how I’ve wired these up for a Raspberry Pi Pico:
The Pico is connected as follows:
- INPUT: GPIO 10-13 = A0-A3
- OUTPUT: GPIO 2-9 = D7-D0 (note the ordering!)
CircuitPython ROM
The basic algorithm will be as follows:
```
ROM = [16 command byte values]

LOOP:
  Read four address lines
  Set data lines from ROM[address]
```
For performance reasons it would be best to optimise both the reading of the address lines and the writing of the data lines, ideally into a single access. But as this is for a CPU that runs at a maximum of 10Hz, for now I’m just going with the simple approach to see how it goes.
```
import board
import digitalio

ROM = [
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00
    ]

Tpin = digitalio.DigitalInOut(board.GP21)
Tpin.direction = digitalio.Direction.OUTPUT

A0pin = digitalio.DigitalInOut(board.GP10)
A1pin = digitalio.DigitalInOut(board.GP11)
A2pin = digitalio.DigitalInOut(board.GP12)
A3pin = digitalio.DigitalInOut(board.GP13)

D0pin = digitalio.DigitalInOut(board.GP2)
D0pin.direction = digitalio.Direction.OUTPUT
D1pin = digitalio.DigitalInOut(board.GP3)
D1pin.direction = digitalio.Direction.OUTPUT
D2pin = digitalio.DigitalInOut(board.GP4)
D2pin.direction = digitalio.Direction.OUTPUT
D3pin = digitalio.DigitalInOut(board.GP5)
D3pin.direction = digitalio.Direction.OUTPUT
D4pin = digitalio.DigitalInOut(board.GP6)
D4pin.direction = digitalio.Direction.OUTPUT
D5pin = digitalio.DigitalInOut(board.GP7)
D5pin.direction = digitalio.Direction.OUTPUT
D6pin = digitalio.DigitalInOut(board.GP8)
D6pin.direction = digitalio.Direction.OUTPUT
D7pin = digitalio.DigitalInOut(board.GP9)
D7pin.direction = digitalio.Direction.OUTPUT

def doOutput (data):
    if (data & 0x01):
        D0pin.value = True
    else:
        D0pin.value = False
    
    if (data & 0x02):
        D1pin.value = True
    else:
        D1pin.value = False
    
    if (data & 0x04):
        D2pin.value = True
    else:
        D2pin.value = False
    
    if (data & 0x08):
        D3pin.value = True
    else:
        D3pin.value = False
    
    if (data & 0x10):
        D4pin.value = True
    else:
        D4pin.value = False
    
    if (data & 0x20):
        D5pin.value = True
    else:
        D5pin.value = False

    if (data & 0x40):
        D6pin.value = True
    else:
        D6pin.value = False

    if (data & 0x80):
        D7pin.value = True
    else:
        D7pin.value = False

while True:
    Tpin.value = True
    addr = 0
    if (A0pin.value == True):
        addr = addr + 1
    if (A1pin.value == True):
        addr = addr + 2
    if (A2pin.value == True):
        addr = addr + 4
    if (A3pin.value == True):
        addr = addr + 8

    Tpin.value = False
    doOutput(ROM[addr])
```
I’ve included a timing pin to GPIO21 so I can see how long it takes to access the IO.
It turns out that it takes something of the order of 50-60µs to read the four address lines and something in the region of 70-80µs to write out the 8 data lines. The simple Circuitpython code above runs at a frequency of around 7kHz.
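To put those figures in context, the short calculation below uses only the numbers quoted in this post (taking the midpoint where a range is given) to compare the loop period with the measured I/O time, and to count how many complete passes each microcontroller makes per 10Hz CPU clock. The constant and function names are made up for this sketch.

```python
# Figures quoted in the post; midpoints are used where a range is given.
CPU_CLOCK_HZ = 10               # maximum TD4 clock
ARDUINO_SCAN_HZ = 500_000       # measured Arduino loop rate
CIRCUITPYTHON_SCAN_HZ = 7_000   # measured Circuitpython loop rate
READ_US = (50 + 60) / 2         # time to read the four address lines
WRITE_US = (70 + 80) / 2        # time to write the eight data lines


def scans_per_clock(scan_hz: float, cpu_hz: float = CPU_CLOCK_HZ) -> float:
    """Number of complete ROM-emulation passes per CPU clock period."""
    return scan_hz / cpu_hz


loop_period_us = 1_000_000 / CIRCUITPYTHON_SCAN_HZ
print(f"Circuitpython loop period: {loop_period_us:.0f} us")
print(f"Read + write time: {READ_US + WRITE_US:.0f} us")
print(f"Circuitpython scans per CPU clock: {scans_per_clock(CIRCUITPYTHON_SCAN_HZ):.0f}")
print(f"Arduino scans per CPU clock: {scans_per_clock(ARDUINO_SCAN_HZ):.0f}")
```

The loop period of roughly 143µs is consistent with about 130µs of pin access plus some overhead, and even at 7kHz the data lines are refreshed around 700 times per CPU clock.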
Now at this point I ought to be reading through the datasheets for the ICs used in the CPU to check response times and timing tolerances to see if this is ok. But I didn’t bother with any of that as it all appears to work!
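For comparison, the same read-address and write-data logic can be expressed as loops over lists of pins rather than one `if` per pin. The sketch below is not the code that was measured above, and its timing on real hardware is unknown. `FakePin` is a hypothetical stand-in so the logic can be run on a desktop; on a board the lists would instead hold the `digitalio.DigitalInOut` objects, which expose the same `.value` attribute.

```python
class FakePin:
    """Hypothetical stand-in for digitalio.DigitalInOut; only .value is used."""

    def __init__(self, value: bool = False) -> None:
        self.value = value


def read_address(address_pins) -> int:
    """Combine pins [A0, A1, A2, A3] into a 4-bit address, A0 as bit 0."""
    addr = 0
    for bit, pin in enumerate(address_pins):
        if pin.value:
            addr |= 1 << bit
    return addr


def write_data(data_pins, data: int) -> None:
    """Drive pins [D0 .. D7] from the bits of `data`, D0 as bit 0."""
    for bit, pin in enumerate(data_pins):
        pin.value = bool(data & (1 << bit))


ROM = [
    0xB1, 0x01, 0xB2, 0x51,
    0xB4, 0x01, 0xB8, 0x51,
    0xB4, 0x01, 0xB2, 0x51,
    0xF0, 0x00, 0x00, 0x00,
]

# One pass of the loop with only A1 high (address 2), using fake pins.
address_pins = [FakePin(False), FakePin(True), FakePin(False), FakePin(False)]
data_pins = [FakePin() for _ in range(8)]
write_data(data_pins, ROM[read_address(address_pins)])
print([int(pin.value) for pin in data_pins])  # D0 first
```

Keeping the pins in lists would also make it easier to change the wiring order in one place, at the possible cost of some extra interpreter overhead per pass.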
Conclusion
The Circuitpython is obviously a lot slower than the Arduino running optimised PORTIO code, even though the Circuitpython is running on a 125MHz processor compared to the Arduino’s 16MHz. Of course, if performance was critical then switching to direct GPIO access in C on the Pico would be a lot faster again. Even just having a way to do a single block-access of GPIO would probably make quite a difference.
But for this application, both seem to work absolutely fine as they are.
The ability to quickly edit the ROM contents is pretty useful with the Circuitpython. But I am now wondering how difficult it would be to have some kind of uploader to the Arduino over the serial port. There are only 16 bytes to transfer after all.
In fact it might even be possible to create a simple interactive assembler that allows code to be typed in over the serial port using proper word-based op-codes (like ADD, IN, OUT, etc). At the very least a simple serial port interface to type in numeric values would be relatively straightforward, I think. It might also be possible to allow the microcontroller to reset the CPU too.
I’m not sure the added complications of logic shifting, etc, make it worth carrying on with a Pico version at this stage, so I think improving the Arduino is probably the way to go for now.
Kevin
#4bit #arduinoUno #circuitpython #portio #raspberryPiPico #td4
Kevin's Blog @[email protected] · 2025-10-29 · 22:08 UTC
TD4 4-bit DIY CPU – Part 5
As a prelude to expanding the TD4 in my own way, I thought I ought to at least attempt to reproduce the circuit myself. But if I’m going to design my own version of the TD4 PCB and build it, I may as well add a few little extras.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
PCB Design
I’ve built the schematic up from the published schematics to be found in: https://github.com/wuxx/TD4-4BIT-CPU
But I wanted to ensure I had the following additions:
- LED outputs for the two registers.
- Some kind of LED output to show which instruction is being worked on.
- Minimal number of surface mount components.
- Option to experiment with replacing the diodes with LEDs.
With all that in mind, I’ve ended up with the following schematic, which I’ve spread over four sheets.
The core CPU:
Power, clock and reset:
IO and User Interface:
ROM:
I’ve managed to get this all into a 180×110 mm board.
Design choices/notes:
- I’ve largely followed the same layout as the cheap TD4 kit I bought.
- I’ve added the register LEDs to the top of the board.
- And moved the INPUT and OUTPUT up there to match.
- I’ve added LEDs next to each bank of switches to show which instruction is being worked on.
- All LEDs are 3x2mm rectangle LEDs, including the OUTPUT LEDs (which were surface mount on the original).
- I’ve kept a micro USB socket, which unfortunately is still surface mount. I’m hoping it is the right footprint for a commonly available connector. But I’ve moved it to the bottom of the board.
- I’ve made sure all IC labels are included on the silkscreen along with all resistor values.
Full BOM
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 1x 74HC154 4-to-16 decoder/demultiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
- 1x 74HC540 Octal inverting buffer/line driver
Semiconductors:
- 128x 1N4148 or 1N914 small signal diode
- 28x 3x2mm rectangular LED
Passive components:
- Resistors: 2x 100R; 35x 1K; 1x 3K3; 9x 10K; 1x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switches
- 16x 8-way DIP switches
- DIP sockets: 7x 16 way; 4x 14 way; 1x 20 way; 1x 24 way
Build Photos
Conclusion
I’ve managed to get the sense of the address LEDs reversed. They are all lit apart from the running instruction. But actually I quite like that effect. It has more LEDs active and still shows which instruction is active.
Apart from that, this seems to work as far as I can tell at present, so I’m saying this is a success!
I haven’t decided if I want to publish this board yet though. It relies on so much of the effort of others. I’ve really not done very much myself at all.
But now I can go back to the schematic and see if I can expand on the logic in any way, knowing I have a known-good, working starting point.
Kevin
https://makertube.net/w/beyW1XWnp4dfxkXhFWfhhm
#4bit #cpu #TD4
Kevin's Blog @[email protected] · 2025-10-29 · 22:08 UTC
TD4 4-bit DIY CPU – Part 5
As a prelude to expanding the TD4 in my own way, I thought I ought to at least attempt to reproduce the circuit myself. But if I’m going to design my own version of the TD4 PCB and build it, I may as well add a few little extras.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
PCB Design
I’ve built the schematic up from the published schematics to be found in: https://github.com/wuxx/TD4-4BIT-CPU
But I wanted to ensure I had the following additions:
- LED outputs for the two registers.
- Some kind of LED output to show which instruction is being worked on.
- Minimal number of surface mount components.
- Option to experiment with replacing the diodes with LEDs.
With all that in mind, I’ve ended up with the following schematic, which I’ve spread over four sheets.
The core CPU:
Power, clock and reset:
IO and User Interface:
ROM:
I’ve managed to get this all into a 180×110 mm board.
Design choices/notes:
- I’ve largely followed the same layout as the cheap TD4 kit I bought.
- I’ve added the register LEDs to the top of the board.
- And moved the INPUT and OUTPUT up there to match.
- I’ve added LEDs next to each bank of switches to show which instruction is being worked on.
- All LEDs are 3x2mm rectangle LEDs, including the OUTPUT LEDs (which were surface mount on the original).
- I’ve kept a micro USB socket, which unfortunately is still surface mount. I’m hoping it is the right footprint for a commonly available connector. But I’ve moved it to the bottom of the board.
- I’ve made sure all IC labels are included on the silkscreen along with all resistor values.
Full BOM
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 1x 74HC74 Dual D-Type Flip Flop
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 1x 74HC154 4-to-16 decoder/demultiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
- 1x 74HC540 Octal inverting buffer/line driver
Semiconductors:
- 128x 1N4148 or 1N914 small signal diode
- 28x 3x2mm rectangular LED
Passive components:
- Resistors: 2x 100R; 35x 1K; 1x 3K3; 9x 10K; 1x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switches
- 16x 8-way DIP switches
- DIP sockets: 7x 16 way; 4x 14 way; 1x 20 way; 1x 24 way
Build Photos
Rough Costs
Adding up the BOM for typical pack sizes of components, and using prices from cheap overseas marketplaces (which will of course come with a degree of risk for ICs, but that is fine for me), I would estimate that my costs to build this are as follows:
- Costs of 5 PCBs (typical minimum order) = £20.
- Initial outlay for components to build a single kit, assuming no existing components = £60.
- PCB + component cost of single board, assuming unused components are spares = £20 + £60 of spares.
- Additional costs for enough components for five kits = £20.
- Full cost per five kits = £100.
So to build five PCBs they can be done for a BOM price of around £20 each, leaving some spares.
If it is costed up for 10 kits, then assuming a second order of 5 PCBs is required (there may be a saving of around £5 if 10 were ordered up front) then it would require an additional ~£90 of components. This means a BOM price of around £19 each for 10 kits.
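The arithmetic behind those per-kit figures can be checked with a few lines of Python. One assumption is needed: the additional ~£90 is treated here as covering everything extra for kits six to ten, including the second PCB order, because that is the reading that reproduces the quoted £19 per kit. If the PCBs were on top of the £90, the ten-kit total would be £210, or about £21 each. The variable names are made up for this sketch.

```python
# Figures from the list above, in pounds.
PCB_ORDER = 20             # five PCBs, the typical minimum order
FIRST_KIT_PARTS = 60       # component packs bought to build the first kit
EXTRA_PARTS_FOR_FIVE = 20  # top-up so the packs cover five kits
EXTRA_FOR_TEN = 90         # further outlay to reach ten kits

# Assumption: the ~£90 includes the second PCB order, since that is what
# yields the quoted £19 per kit.
total_five = PCB_ORDER + FIRST_KIT_PARTS + EXTRA_PARTS_FOR_FIVE
total_ten = total_five + EXTRA_FOR_TEN

print(f"5 kits: £{total_five} total, £{total_five / 5:.0f} each")
print(f"10 kits: £{total_ten} total, £{total_ten / 10:.0f} each")
```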
Conclusion
I’ve managed to get the sense of the address LEDs reversed. They are all lit apart from the running instruction. But actually I quite like that effect. It has more LEDs active and still shows which instruction is active.
Apart from that, this seems to work as far as I can tell at present, so I’m saying this is a success!
The PCB can be found on Github here: https://github.com/diyelectromusic/TD4-CPU
But now I can go back to the schematic and see if I can expand on the logic in any way, knowing I have a known-good, working starting point.
Kevin
https://makertube.net/w/beyW1XWnp4dfxkXhFWfhhm
#4bit #cpu #td4
Kevin's Blog @[email protected] · 2025-10-29 · 22:08 UTC
TD4 4-bit DIY CPU – Part 5
As a prelude to expanding the TD4 in my own way, I thought I ought to at least attempt to reproduce the circuit myself. But if I’m going to design my own version of the TD4 PCB and build it, I may as well add a few little extras.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
PCB Design
I’ve build the schematic up from the published schematics to be found in: https://github.com/wuxx/TD4-4BIT-CPU
But I wanted to ensure I had the following additions:
- LED outputs for the two registers.
- Some kind of LED output to show which instruction is being worked on.
- Minimal number of surface mount components.
- Option to experiment with replace the diodes with LEDs.
With all that in mind, I’ve ended up with the following schematic, which I’ve spread over four sheets.
The core CPU:
Power, clock and reset:
IO and User Interface:
ROM:
I’ve managed to get this all into a 180×110 mm board.
Design choices/notes:
- I’ve largely followed the same layout as the cheap TD4 kit I bought.
- I’ve added the register LEDs to the top of the board.
- And moved the INPUT and OUTPUT up there to match.
- I’ve added LEDs next to each bank of switches to show which instruction is being worked on.
- All LEDs are 3x2mm rectangle LEDs, including the OUTPUT LEDs (which were surface mount on the original).
- I’ve kept a micro USB socket, which unfortunately is still surface mount. I’m hoping it is the right footprint for a commonly available connector. But I’ve moved it to the bottom of the board.
- I’ve made sure all IC labels are included on the silkscreen along with all resistor values.
Full BOM
ICs:
- 1x 74HC10 Triple 3-input NAND
- 1x 74HC14 Hex Schmitt trigger inverters
- 1x 74HC32 Quad 2-input OR
- 1x 74HC74 Dual D-Type Flip Flop
- 2x 74HC153 Dual 4-to-1 selector/multiplexer
- 1x 74HC154 4-to-16 decoder/demultiplexer
- 4x 74HC161 4-bit binary counter
- 1x 74HC283 4-bit binary full adder
- 1x 74HC540 Octal inverting buffer/line driver
Semiconductors:
- 128x 1N4148 or 1N914 small signal diode
- 28x 3x2mm rectangular LED
Passive components:
- Resistors: 2x100R; 35x 1K; 1x 3K3; 9x 10K; 1 x 33K; 3x 100K
- Capacitors: 3x 10uF electrolytic
Other components:
- 2x SPDT slider switches (see PCB for footprint)
- 1x micro USB socket (Molex, see PCB for footprint)
- 2x tactile switches
- 1x 4-way DIP switches
- 16x 8-way DIP switches
- DIP sockets: 7x 16 way; 4x 14 way; 1x 20 way; 1x 24 way
Build Photos
Rough Costs
Adding up the BOM for typical pack sizes of components, and using prices from cheap overseas marketplaces (which will of course come with a degree of risk for ICs, but that is fine for me), I would estimate that my costs to build this are as follows:
- Costs of 5 PCBs (typical minimum order) = £20.
- Initial outlay for components to build a single kit, assuming no existing components = £60.
- PCB + component cost of single board, assuming unused components are spares = £20 + £60 of spares.
- Additional costs for enough components for five kits = £20.
- Full cost per five kits = £100.
So to build five PCBs they can be done for a BOM price of around £20 each, leaving some spares.
If it is costed up for 10 kits, then assuming a second order of 5 PCBs is required (there may be a saving of around £5 if 10 were ordered up front) then it would require an additional ~£90 of components. This means a BOM price of around £19 each for 10 kits.
Conclusion
I’ve managed to get the sense of the address LEDs reversed. They are all lit apart from the running instruction. But actually I quite like that effect. It has more LEDs active and still shows which instruction is active.
Apart from that, this seems to work as far as I can tell at present, so I’m saying this is a success!
The PCB can be found on Github here: https://github.com/diyelectromusic/TD4-CPU
But now I can go back to the schematic and see if I can expand on the logic in any way, knowing I have a known-good, working starting point.
Kevin
https://makertube.net/w/beyW1XWnp4dfxkXhFWfhhm
#4bit #cpu #td4
Kevin's Blog @[email protected] · 2025-10-05 · 10:43 UTC
TD4 4-bit DIY CPU – Part 4
Having had quite a good play with the TD4 so far, I’m now starting to think how I might be able to enhance it slightly. First, I’m thinking about fairly simple things I can do with the kit as is. Then I might consider what I could do if I was to redesign it myself.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Debug Register Output
The biggest limitation I’ve found with using the kit as is, compared to say using a simulator online, is that it isn’t possible to see what is going on with the registers.
But there is a pretty simple hack that will add LED outputs to each register.
The pinout for the 74HC161 is as follows:
And looking at the schematic, we can see that TC (15) is unconnected, but CET (10) links to GND. This means that if we want to catch the outputs O0-O3 (14-11) we just need an LED and resistor between the On pin and pin 10.
By far the easiest way to do that is with one of these LED “bars”:
These are available very cheaply online and tend to be blocks of four, six or eight surface mount LEDs and resistors connected to a common GND or VCC point. These are common-GND, six-way, blue modules which are perfect.
When the GND is aligned with CET on pin 10, the LED connections cover pins 11-16, which provides a handy VCC indicator for the register chip too, which makes it easy to know everything is connected and working.
The best way of installing these I decided was to use a spare 16-way DIP socket and insert the LED bar in the correct place as shown above. Then this can be simply slotted over the top of the chip itself on the PCB. This isn’t, obviously, a robust connection, but it is sufficient for debugging.
I’ve used two of these as shown below, for each register. I don’t need one for OUT as that is shown on the board anyway, and whilst I could add one for the PC, for now, this is fine.
Debug PC Output
As mentioned above, adding an indicator for the Program Counter should be relatively straightforward.
But I’d forgotten something. The PC has an automatic increment enabled, which means that CET isn’t tied to GND but to VCC. This means that the above won’t work directly, as there is no convenient GND point next to the outputs. Once rewired to an actual GND it would still work in principle, but it would show a binary number representing the address of the instruction.
What I’d really like is something next to the ROM bank showing which instruction is being read. One option is to hang LEDs off the ROM address decoder, shown in the schematic below.
But when we look at the pinout, it isn’t quite as simple as the registers. Also, as these are active LOW then I’d need one of the common VCC LED modules I think so that the LED will only turn on when the signal goes LOW, or just accept that all LEDs will be on apart from the active instruction.
One option might be a small, custom PCB to provide the required readout that can go between the 154 and the PCB. But ultimately all the signals from the 154 end up connected to each bank of 8 diodes feeding the DIP switches, so that is another possibility.
Each DIP switch circuit is as follows:
The “scan” works as follows:
- All data lines are pulled HIGH by the resistors shown.
- By default all feed lines (the single connection point for all the diodes) will also be HIGH.
- Except when the 154 selects that output and drives it LOW.
- When the block is selected, then any switches that are set will drive the associated data lines LOW too, but just for this selected switch block.
- Later in the circuit all data lines go via the 74HC540 octal inverting buffer/line driver so they become active HIGH again.
So attaching an LED between the common connection point and VCC could show which block is active (LOW).
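The scan can be modelled in a few lines of JavaScript. This is only a sketch of the logic levels described above, not of the electrical behaviour. The names `readRom` and `banks` are made up for illustration, and I'm assuming a `1` in a bank means that switch is closed (ON).

```javascript
// Hypothetical model of the diode/DIP-switch ROM scan (logic levels only).
// switchBanks[n][bit] === 1 means that switch in bank n is closed (ON).
function readRom(switchBanks, address) {
  // The 154 drives exactly one feed line LOW; all the others stay HIGH.
  const feedLines = switchBanks.map((_, n) => (n === address ? 0 : 1));

  // Every data line idles HIGH because of its pull-up resistor.
  const dataLines = new Array(8).fill(1);

  switchBanks.forEach((bank, n) => {
    bank.forEach((closed, bit) => {
      // A closed switch on the selected (LOW) bank pulls its data line LOW.
      // The diodes stop the unselected (HIGH) banks from affecting the line.
      if (closed && feedLines[n] === 0) dataLines[bit] = 0;
    });
  });

  // The 74HC540 inverts each line, so a closed switch reads as 1 again.
  return dataLines.map((level) => (level === 0 ? 1 : 0));
}

// Two example banks, listed bit 0 first. Bank 1 holds OUT 0001.
const banks = [
  [0, 0, 0, 0, 0, 0, 0, 0],
  [1, 0, 0, 0, 1, 1, 0, 1],
];

console.log(readRom(banks, 1).join('')); // 10001101
```

Only the selected bank ever reaches the data lines, which is why its feed line going LOW is a usable "this instruction is active" signal.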
But now I’m wondering if the diodes could simply be replaced with LEDs anyway. In principle they would serve the same purpose – prevent “ghosting” between the blocks of switches, and the current is already limited by the 10K pull-up resistors.
This would mean that the active program counter would have its data word illuminated when active for all situations apart from the data word being all zero. This would correspond to the instruction ADD A, 0000 which is kind of a “NOP” anyway, so I don’t think that would be a significant limitation.
I am seriously considering buying a second kit now to see if I can use LEDs instead of diodes…
Returning to some kind of PC indicator for a moment. It would be possible to attach an LED to the underside of the board across the diode and switch such that it could indicate when that last block in the row is active. However this would only work if the switch that the LED is soldered to is open.
The LED has to be soldered so that the cathode (short leg) is soldered to the top of the existing diode and the anode (long leg) is bent around to connect to the bottom of the switch block as shown below.
So when switch 8 is ON the LED will not work. That means if there is an OUT or a JMP then the LED won’t light, but that is better than nothing perhaps and is a pretty simple addition to the board.
And if there are spare instruction slots remaining then the last instruction can just be a NOP (ADD A, 0).
Simulator Enhancement
Whilst testing out the log2 idea from Part 3, I started with the simulator provided by umtkm on github here: https://umtkm.github.io/td4/
But that has a minor issue in that it implements the TD4 as described, but not as fully implemented in logic. In particular, the undocumented feature that all commands will always add an immediate value if provided, even if the officially described assembler instruction doesn’t include it.
The example in question, used for the log2 code, was IN A. The logic actually implements IN A + im even though the description doesn’t mention it. The instruction table lists this as:
```
IN A        0010 0000
```
But there is no reason why that 0000 couldn’t be any value. This is a consequence of using the adder to load a register – it will always add in the immediate value too.
By downloading the javascript from the original repository https://github.com/umtkm/td4 it is possible to tweak it to implement the undocumented feature.
The function “purse_order” handles the implementation of each instruction, so the immediate value needs adding to the register for the IN instruction, and it needs the CARRY building in too. I’ve also added it to the two MOV register instructions.
Here is my new implementation:
```
// Decode and execute one 8-bit instruction given as a binary string.
function purse_order(bin) {
  var op = bin.slice(0,4);            // upper four bits: the opcode
  var im = parseInt(bin.slice(-4),2); // lower four bits: the immediate value
  registor_a &= 15;                   // drop any carry bit left by the previous add
  registor_b &= 15;
  switch (op) {
    case '0000': // ADD A,im
      registor_a += im; c_flag = registor_a&16 ? 1 : 0; return;
    case '0101': // ADD B,im
      registor_b += im; c_flag = registor_b&16 ? 1 : 0; return;
    case '0011': // MOV A,im
      registor_a = im; break;
    case '0111': // MOV B,im
      registor_b = im; break;
    case '0001': // MOV A,B (plus im)
      registor_a = registor_b + im; c_flag = registor_a&16 ? 1 : 0; return;
    case '0100': // MOV B,A (plus im)
      registor_b = registor_a + im; c_flag = registor_b&16 ? 1 : 0; return;
    case '1111': // JMP im
      program_counter = im; break;
    case '1110': // JNC im
      if (c_flag === 0) program_counter = im; break;
    case '0010': // IN A (plus im)
      registor_a = input_port + im; c_flag = registor_a&16 ? 1 : 0; return;
    case '0110': // IN B (plus im)
      registor_b = input_port + im; c_flag = registor_b&16 ? 1 : 0; return;
    case '1001': // OUT B
      output_port = registor_b; break;
    case '1011': // OUT im
      output_port = im; break;
  }
  // Only the instructions that return early above set the carry flag.
  c_flag = 0;
}
```
There are four instructions not implemented in the emulator at present – the two duplicate OUT instructions, JMP B and JNC B. I might see if I can add the two jumps as that would be really useful too.
But this allows me to test the log2 routine in simulation now too.
Conclusion
Both of the hardware updates described so far are relatively simple, but quite useful additions to the kit.
I’ll come back to this post as I think of other things to do.
Kevin
#td4
Kevin's Blog @[email protected] · 2025-10-01 · 21:28 UTC
TD4 4-bit DIY CPU – Part 3
Having now spent some time thinking about and building my TD4 4-bit CPU, this post looks at some code that I can run on it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Programming
As described in Part 1 each command has a command part (bits 4-7) and an immediate data part (bits 0-3).
Unfortunately, due to how the PCB is built, when programming the DIP switches, the bits are ordered 01234567.
Now it would be possible to turn the board upside down of course, but then the program would flow from bottom right to top left instead of the more natural top left to bottom right.
This is also slightly confusing as the IO (LED OUTPUT and DIP switch INPUT) are in the more expected 3210 order.
Consequently, in the following programs, I list the “assembly” but then show the bit patterns in DIP switch order for easy programming!
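As a rough illustration of that reordering, here is a small JavaScript helper that turns an opcode and immediate value into the DIP switch order. The function name `toDipOrder` is my own, and it assumes the usual TD4 opcodes for OUT (1011) and JMP (1111).

```javascript
// Convert a TD4 instruction to the order the DIP switches are set in.
// opcode: bits 7-4 as a string, e.g. '1011'; immediate: bits 3-0, e.g. '0001'.
function toDipOrder(opcode, immediate) {
  const reverse = (bits) => bits.split('').reverse().join('');
  // The switches run from bit 0 to bit 7, so the immediate comes first
  // and both halves are read back to front.
  return reverse(immediate) + ' ' + reverse(opcode);
}

console.log(toDipOrder('1011', '0001')); // OUT 0001 -> 1000 1101
console.log(toDipOrder('1111', '0000')); // JMP 0000 -> 0000 1111
```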
As an aside, after a bit more messing around online, I stumbled across a web-based TD4 emulator here: https://umtkm.github.io/td4/
And even an implementation in Minecraft…
Clock Options
There are several options for running the board. It has a built-in clock with two speeds: 1Hz or 10Hz. But there is also a “single stepping” option using the button.
Simple LED Output
Sending a value via the OUT instruction will push that value into the OUT register and consequently onto the LEDs. The simplest LED program is therefore the simple “OUT value”, but to make it at least look like something is happening, we can send two alternating values as follows:
```
0000 OUT 0001   # 1000 1101
0001 OUT 1000   # 0001 1101
0010 JMP 0000   # 0000 1111
```
This lights bit 0, then bit 3, and then jumps back to the start.
Larson Scanner
Swapping values is no fun, but it is easy to create a 4-bit “Larson Scanner” with a sequence of OUT instructions as shown below.
```
0000 OUT 0001   # 1000 1101
0001 ADD A,0000 # 0000 0000
0010 OUT 0010   # 0100 1101
0011 ADD A,0000 # 0000 0000
0100 OUT 0100   # 0010 1101
0101 ADD A,0000 # 0000 0000
0110 OUT 1000   # 0001 1101
0111 ADD A,0000 # 0000 0000
1000 OUT 0100   # 0010 1101
1001 ADD A,0000 # 0000 0000
1010 OUT 0010   # 0100 1101
1011 ADD A,0000 # 0000 0000
1100 JMP 0000   # 0000 1111
```
The ADD instructions are effectively NOP instructions. They are just there to make every change take two clock cycles. Without them, the LEDs change quite smoothly but then there is a short pause whilst the JMP takes place.
Counting Up and Down
We only have an OUT instruction that takes an immediate value or will use the B register, so if we want to do some counting and output the result, that has to happen using B.
```
0000 ADD B,0001  # 1000 1010
0001 OUT B       # 0000 1001
0010 JMP 0000    # 0000 1111
```
Counting down is a bit more tricky as there is no subtract. But like the vast majority of other CPU architectures, if we use two’s complement for negative numbers, then the magic of wrapping around binary numbers will do the trick.
Two’s complement means taking the bitwise opposite and adding 1, so the two’s complement representation of -n is NOT(n) + 1. For example:
```
-1 => ~(0001) + 0001 = 1110 + 0001 = 1111 = 15
```
So working through 5 – 1:
```
  0101 -> 5
+ 1111 -> -1
 -----
  0100 -> 4
 1111    <- carries
```
When the binary digit wraps around, and we ignore the “carry”, that gives the correct answer: 4.
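The same arithmetic can be checked with a short JavaScript sketch. The helper names `add4` and `negate4` are mine, and the masking with 15 simply stands in for the fact that the registers are only four bits wide.

```javascript
// Add two 4-bit values the way a 4-bit adder does: keep the low four bits
// as the result and report the fifth bit as the carry.
function add4(a, b) {
  const total = (a & 15) + (b & 15);
  return { sum: total & 15, carry: total > 15 ? 1 : 0 };
}

// Two's complement negation within 4 bits: invert the bits, then add 1.
function negate4(n) {
  return (~n + 1) & 15;
}

console.log(negate4(1));          // 15, i.e. 1111
console.log(add4(5, negate4(1))); // { sum: 4, carry: 1 }
```

The carry that is thrown away here is the same one that JNC tests in the later programs.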
```
0000 ADD B,1111  # 1111 1010
0001 OUT B       # 0000 1001
0010 JMP 0000    # 0000 1111
```
Taking INPUT
This will simply echo what is on the INPUT switches to the OUTPUT LEDs.
```
0000 IN B      # 0000 0110
0001 OUT B     # 0000 1001
0010 JMP 0000  # 0000 1111
```
This will add 1 to the INPUT:
```
0000 IN B       # 0000 0110
0001 ADD B,0001 # 1000 1010
0010 OUT B      # 0000 1001
0011 JMP 0000   # 0000 1111
```
Although there is actually a bit of a cheat here. OUT B assumes the immediate data is 0; otherwise it gets added to B as part of the OUT. So really the above could be shortened to the following:
```
0000 IN B            # 0000 0110
0001 OUT B, ADD 0001 # 1000 1001
0010 JMP 0000        # 0000 1111
```
Pretty much any instruction that assumes an immediate value of 0 will perform an addition if a non-zero value is used instead, as a side effect of using the adder to move values around…
Programmable Counter
This will read the input and count up to that value then stop. It makes use of JNC (jump if not carry) and two’s complement subtraction to count down from the INPUT value.
```
0000 IN A        # 0000 0100    A = INPUT
0001 MOV B,1111  # 1111 1110    B = -1
0010 ADD B,0001  # 1000 1010    B = B + 1
0011 OUT B       # 0000 1001    OUTPUT = B
0100 ADD A,1111  # 1111 0000    A = A + (-1)
0101 JNC 0111    # 1110 0111    JUMP IF NO CARRY TO 0111
0110 JMP 0010    # 0100 1111    JUMP to 0010
0111 MOV B,0000  # 0000 1110    B = 0
```
The JUMP IF NOT CARRY works as when 1111 is added to A, the only value of A that won’t generate a CARRY is 0, so this is really executing the logic:
```
DO
  (other stuff)
  A--
WHILE (A != 0)
```
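To convince myself of the carry behaviour, the loop can be modelled in JavaScript as below. This is a sketch of the listing rather than a real simulator, and `runCounter` is just a name I've picked for it.

```javascript
// Hypothetical model of the programmable counter listing above.
// Returns every value sent to the OUTPUT, in order.
function runCounter(input) {
  const outputs = [];
  let a = input & 15;     // IN A
  let b = 15;             // MOV B,1111 (B = -1)
  let carry;
  do {
    b = (b + 1) & 15;     // ADD B,0001
    outputs.push(b);      // OUT B
    const total = a + 15; // ADD A,1111 (A = A - 1)
    carry = total > 15;   // the carry is set unless A was 0
    a = total & 15;
  } while (carry);        // JNC leaves the loop when there was no carry
  return outputs;
}

console.log(runCounter(3)); // [ 0, 1, 2, 3 ]
```

Note that the test happens on the value of A before the subtraction, which is why an INPUT of 3 gives four passes through the loop.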
Bit Shift/Doubling
It isn’t possible to add two registers together directly, but it is possible to keep adding 1 in a loop until a register counts down to zero. This can kind of add the two registers together.
The following reads a number from the INPUT and puts it in both A and B and then proceeds to add them together, the result of course being to double the INPUT value. As this is binary, that also acts as a bit-shift-left.
```
0000 IN A        # 0000 0100    A = B = INPUT
0001 MOV B,A     # 0000 0010
0010 ADD B,1111  # 1111 1010    B = B + (-1)
0011 ADD B,0001  # 1000 1010    B = B + 1
0100 OUT B       # 0000 1001    OUT B
0101 ADD A,1111  # 1111 0000    A = A + (-1)
0110 JNC 1000    # 0001 0111    JUMP IF NO CARRY to 1000
0111 JMP 0011    # 1100 1111    JUMP to 0011
1000 OUT B       # 0000 1001    OUT B
1001 JMP 1001    # 1001 1111    LOOP HERE FOREVER
```
Once again, I’m counting down from a value to act as the loop counter, so the logic going on here is something like the following:
```
B = A = INPUT
DO
  ADD 1 to B
  A--
WHILE (A != 0)
```
As I show the working by sending B to the OUTPUT on each pass through the loop, the last instruction is a “JUMP forever” instruction to pause the CPU to allow me to see the OUTPUT.
This can be optimised a bit more if we swap the test (in A) to before the addition (in B).
```
0000 IN B       # 0000 0110   A = B = INPUT
0001 MOV A,B    # 0000 1000
0010 ADD A,1111 # 1111 0000   A = A - 1
0011 JNC 0111   # 1110 0111   JUMP IF NO CARRY to 0111
0100 ADD B,0001 # 1000 1010   B++
0101 OUT B      # 0000 1001   OUTPUT = B (temporary display)
0110 JMP 0010   # 0100 1111   JUMP to 0010
0111 OUT B      # 0000 1001   OUTPUT = B (final result)
```
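Here is the optimised version as a JavaScript sketch, following the listing instruction by instruction but leaving out the temporary display. The name `doubleByCounting` is my own and the `& 15` masks stand in for the 4-bit registers.

```javascript
// Hypothetical model of the optimised doubling listing above.
function doubleByCounting(input) {
  let b = input & 15;       // IN B
  let a = b;                // MOV A,B
  for (;;) {
    const total = a + 15;   // ADD A,1111 (A = A - 1)
    a = total & 15;
    if (total <= 15) break; // JNC: no carry means A was already 0
    b = (b + 1) & 15;       // ADD B,0001
  }
  return b;                 // OUT B (final result)
}

console.log(doubleByCounting(3)); // 6
console.log(doubleByCounting(9)); // 2, because 18 does not fit in 4 bits
```

The second example shows the shift-left behaviour: the top bit simply falls off the end.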
In theory, being able to add two registers like this ought to be pretty useful. But as these are the only two registers, and this operation is destructive – i.e. A counts down to zero – and there is no stack or other temporary storage, in reality the options are pretty limited!
Multiplication
Of course what I’d really like to do is multiply two numbers. Or at least, with a single INPUT, be able to multiply it by itself, so square it.
In principle that could be done via repeated addition in a loop, but I can’t think of a way to use two registers (with no storage) to allow me to run two loops. What I think I need to do is something like the following:
```
A = B = INPUT
TOTAL = 0
DO
  B = INPUT
  DO
    TOTAL++
    B--
  WHILE (B != 0)
  A--
WHILE (A != 0)
OUT TOTAL
```
But that requires keeping track of at least three different things, and I only have two areas of workable storage. Of course, if one of those things is fixed, e.g. multiplication by a fixed amount, then it is possible. But that fixed amount has to be hardcoded in a ROM instruction.
I have wondered if I could somehow hook up the OUTPUT to the INPUT to create a sort of third register.
The best I can do is use half the ROM as a look-up table for the squares of 0, 1, 2 and 3. Then I’m using the sort of “undocumented” instruction JMP B, which adds the immediate data to whatever is in the B register and jumps to that location.
```
0000 IN A        # 0000 0100    A = INPUT
0001 MOV B,A     # 0000 0010    B = A
0010 ADD B,1111  # 1111 1010    B = B + (-1)
0011 ADD B,0001  # 1000 1010    B = B + 1
0100 ADD A,1111  # 1111 0000    A = A + (-1)
0101 JNC 0111    # 1110 0111    JUMP IF NO CARRY to 0111
0110 JMP 0011    # 1100 1111    JUMP TO 0011
0111 JMP B,1000  # 0001 1011    "Undocumented" JUMP to B+1000
1000 OUT 0000    # 0000 1101    0*0 = 0
1001 JMP 1111    # 1111 1111
1010 OUT 0001    # 1000 1101    1*1 = 1
1011 JMP 1111    # 1111 1111
1100 OUT 0100    # 0010 1101    2*2 = 4
1101 JMP 1111    # 1111 1111
1110 OUT 1001    # 1001 1101    3*3 = 9
1111 JMP 1111    # 1111 1111
```
I have to allow for the lookup table to execute both an OUT and a JMP so I have to use the previous logic to multiply my index by two and then add it to 1000 and JMP to that value.
The pseudo code is as follows:
```
A = B = INPUT
B = B * 2
JMP 1000 + B
1000 OUT 0; JMP END
1010 OUT 1; JMP END
1100 OUT 4; JMP END
1110 OUT 9; JMP END
1111 END
```
So, this works, and as 3*3 is the maximum we can calculate on a 4-bit OUTPUT register without overflowing anyway, arguably this is doing the job.
But it can hardly be described as multiplication!
If you know of a way to do it computationally, with only two registers, then answers on a postcard to… well, me.
Additional Notes on Multiplication
After posting this blog I ended up having two interesting conversations about the multiplication question.
Using logarithms
The first with [email protected] was a discussion about possibly using logarithms. You may recall that multiplication can be turned into addition by using logarithms.
They presented the following code to add together an arbitrary number of numbers by adding the log2(n) values:
```
0000 OUT B      # 0000 1001   Assumes B=0 at start, then accumulates
0001 IN A,1110  # 0111 0100   A = INPUT - 2
0010 JNC 0000   # 0000 0111   IF NC then A<2 STOP with B=0
0011 ADD B,0001 # 1000 1010   B++
0100 IN A,1100  # 0011 0100   A = INPUT - 4
0101 JNC 0000   # 0000 0111   IF NC then A<4 STOP with B=1
0110 ADD B,0001 # 1000 1010   B++
0111 IN A,1001  # 1001 0100   A = INPUT - 8
1000 JNC 0000   # 0000 0111   IF NC then A<8 STOP with B=2
1001 ADD B,0001 # 1000 1010   B++
1010 IN A,0011  # 1100 0100   A = INPUT + 3
1011 JNC 0000   # 0000 0111   IF NC then A<12 STOP with B=3
1100 ADD B,0001 # 1000 1010   B++
1101 JMP 0000   # 0000 1111   ELSE STOP with B=4
```
This is using the “undocumented” feature that IN A is actually IN A,data with values that test the range of A and will cumulatively add to B until the limit of the size of A has been found.
On entry, B will be 0 and the first number will be read from A until the JUMP happens. Then the second number can be set on the INPUT register and the calculation will continue with the last value of B, still adding values until the new A has been processed.
This is a great solution and very elegant, but the answer is given in terms of log2 values, so it isn’t a direct output. But apart from that it does work pretty well and is particularly neat for allowing the addition of an arbitrary number of different numbers. Well at least until the 4-bits are exhausted – it will carry on and keep overflowing of course, but the numbers will cease making sense.
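The underlying idea, that adding log2 values corresponds to multiplying the original numbers, can be sketched in JavaScript as follows. This is my own simplified model and not a translation of the listing: `log2Floor` only performs the tests against 2, 4 and 8 mentioned in the comments, it assumes inputs of 1 or more, and the result is only exact when both inputs are powers of two.

```javascript
// Integer log2 by threshold tests (only 2, 4 and 8 are modelled here).
// Assumes n is at least 1.
function log2Floor(n) {
  let log = 0;
  if (n >= 2) log++;
  if (n >= 4) log++;
  if (n >= 8) log++;
  return log;
}

// Multiply by adding logarithms. Exact only for powers of two.
function multiplyViaLogs(x, y) {
  const logSum = log2Floor(x) + log2Floor(y);
  return { logSum, product: 2 ** logSum };
}

console.log(multiplyViaLogs(2, 4)); // { logSum: 3, product: 8 }
```

The TD4 program stops at the `logSum` stage, which is why its answer is not a direct output.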
Noting a curious mathematical property
I had another idea sent to me in a private conversation, which pointed out that if I’m only doing squares, i.e. assuming a single INPUT value, then we can take advantage of the fact that the sum of the first n odd integers is equal to n^2:
```
0^2 = 0 = 0
1^2 = 0 + 1 = 1
2^2 = 0 + 1 + 3 = 4
3^2 = 0 + 1 + 3 + 5 = 9
4^2 = 0 + 1 + 3 + 5 + 7 = 16
5^2 = 0 + 1 + 3 + 5 + 7 + 9 = 25
6^2 = 0 + 1 + 3 + 5 + 7 + 9 + 11 = 36
....
```
There is an explanation here: https://www.mathsisfun.com/numbers/odd-square-number.html and of course the diagram makes it obvious! I don’t know that it has a name though – let me know if you know otherwise!
So we can get away without needing nested loops, as we’re now just adding numbers that are 2 apart.
```
B = INPUT
A = 2B - 1 # This is the largest odd number we will be adding
B = 0
DO
   B = B + A
   A = A - 2
WHILE A > 0
OUTPUT = B
```
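As a quick check of the algorithm itself, here it is in JavaScript, where adding two variables together is not a problem. The function name `squareByOddSum` is mine. I've used a plain `while` loop rather than DO/WHILE so that an input of 0 adds nothing, and there is no 4-bit limit here.

```javascript
// Square n by summing the first n odd numbers, largest first.
function squareByOddSum(n) {
  let a = 2 * n - 1; // the largest odd number to add
  let b = 0;         // running total
  while (a > 0) {    // a while loop, so that n = 0 adds nothing
    b += a;
    a -= 2;
  }
  return b;
}

for (let n = 0; n <= 6; n++) {
  console.log(n, squareByOddSum(n));
}
```

The line `b += a` is the step that has no direct equivalent on the TD4.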
So I have to use the routines I already have for adding registers and multiplying by 2, and if it all fits in 16 instructions, that should work!
```
0000 IN B       # 0000 0110   A = B = INPUT
0001 MOV A,B    # 0000 1000

# A = 2B - 1 - Shift left then subtract 1
0010 ADD B,1111 # 1111 1010   B = B - 1
0011 JNC 0110   # 0110 0111   JUMP IF NO CARRY to 0110
0100 ADD A,0001 # 1000 0000   A = A + 1
0101 JMP 0010   # 0100 1111   JUMP to 0010
0110 ADD A,1111 # 1111 0000   A = A - 1

# Now add consecutive odd numbers
...
```
Ok, here is where I hit a snag. The next part of the algorithm requires me to do B = B + A, but I can’t add two registers together directly – I have to count down one of the registers whilst adding 1 to the second.
That works fine for one pass through the loop, but at that point I’ve destroyed the value of either A or B and won’t be able to go back around for a 2nd pass…
I started to chew over whether there was a way to incrementally add the values, but every way I considered I ended up with the fact that I am having to add two calculated values together at some point and I just can’t do that.
One interesting train of thought initially looked promising: noticing that each odd value to be added is a number of 2s as follows:
```
n=4:  4^2 = 1 + 3 + 5 + 7
          = 2 - 1
          + 2 + 2 - 1
          + 2 + 2 + 2 - 1
          + 2 + 2 + 2 + 2 - 1
```
That is the triangle number for n times two, less n times 1. The formula for a triangle number is as follows:
```
Tri(n) = n (n + 1) / 2
```
So using that many 2s and taking off the appropriate values too, gives me
```
Sq (n) = (2n(n + 1)) / 2   - n
       = (2n(n + 1) - 2n) / 2
       = (2n^2 + 2n - 2n) / 2
       = 2n^2 / 2
       = n^2
```
Ah, so basically the square (n) = square (n). That isn’t particularly insightful…
Ok, so this looked really promising, but fundamentally, if I can’t add the two registers together without trashing one of the values, I think I’m stuck again.
Conclusion
I must admit to wondering if the addressing could be extended to permit longer programs. I don’t see why not, if the PC register could be extended. One limitation might be that jumps remain localised to a 4-bit value somehow.
It is proving a real limitation not being able to add the two registers together – that is really limiting what I can work out how to do with it.
It seems that at least one other person has had similar thoughts and had started to work on an extended version – the TD4E and even the TD4E8: https://github.com/Nekhaevalex/TD4. It looks like a lot of the work was done in a simulation, but they did get to produce an assembler: https://hackaday.io/project/161708-4-bit-cpu-td4-once-again.
Other related things I’ve found include:
- Crazy Small CPU (4-bit data, 8-bit address): https://minnie.tuhs.org/Programs/CrazySmallCPU/description.html
- CHUMP ICS4U 4-bit TTL project: http://darcy.rsgc.on.ca/ACES/TEI4M/4BitComputer/index.html
- TPS/MyCo (16 pages of 4-bit addressable memory): https://github.com/GClown25/BIT4
- MiniMax Enhanced TD4: https://github.com/denjhang/MiniMax-4-bit-CPU
- TD4 CPU (adds paging): https://hackaday.io/project/26215-td4-cpu
- 4-bit TTL Scratch Built CPU (4-bit data, 8-bit address): https://www.ttlcpu.com/articles/4-bit-ttl-scratchbuilt-computer
- “How to build a CPU TD4” includes a discussion of expanding the memory: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html
Finally, there is this “meta list” of DIY CPU projects: https://www.ttlcpu.com/content/links.
And this seems to be a massive list of TD4 related links: http://www.cable-net.ne.jp/user/takahisa/td4/index.html although, unfortunately many of the linked pages no longer exist. But there continues to be interest out there in the TD4!
I have a few ideas of my own, but if they amount to anything, well that still remains to be seen. But getting this far has been really interesting and I already feel like I’ve learned quite a lot about the lowest levels of our technology.
I have to say, “full stack developer” certainly takes on a new meaning when you get to these levels…
Kevin
#4bit #cpu #td4
#cpu #4bit #td4
Kevin's Blog @[email protected] · 2025-10-01 · 21:28 UTC
TD4 4-bit DIY CPU – Part 3
Having now spent some time thinking about and building my TD4 4-bit CPU, this post takes a look at some code that I can run on it.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Programming
As described in Part 1 each command has a command part (bits 4-7) and an immediate data part (bits 0-3).
Unfortunately, due to how the PCB is built, when programming the DIP switches, the bits are ordered 01234567.
Now it would be possible to turn the board upside down of course, but then the program would flow from bottom right to top left instead of the more natural top left to bottom right.
This is also slightly confusing as the IO (LED OUTPUT and DIP switch INPUT) are in the more expected 3210 order.
Consequently, in the following programs, I list the “assembly” but then show the bit patterns in DIP switch order for easy programming!
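As a rough sketch of that reordering (in Python rather than anything that runs on the TD4), the helper below prints an instruction in DIP switch order, bit 0 first. The function name `dip_order` is made up for illustration, and the two command values are inferred from the listings in this post rather than taken from the official documentation.

```python
def dip_order(command: int, data: int) -> str:
    """Return the eight switch settings left to right: bit 0 first, bit 7 last."""
    # Command sits in bits 4-7, immediate data in bits 0-3.
    word = ((command & 0xF) << 4) | (data & 0xF)
    bits = "".join(str((word >> i) & 1) for i in range(8))
    return bits[:4] + " " + bits[4:]


# Assumed command values, read back from the bit patterns in the listings.
OUT_IM = 0b1011
JMP = 0b1111

print(dip_order(OUT_IM, 0b0001))  # 1000 1101
print(dip_order(JMP, 0b0000))     # 0000 1111
```

Each group of four comes out mirrored, which is why the listings look backwards compared to the assembly.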
Aside, after a bit more messing around online, I stumbled across a web-based TD4 emulator here: https://umtkm.github.io/td4/
And even an implementation in Minecraft…
Clock Options
There are several options for running the board. It has a built-in clock with two speeds: 1Hz or 10Hz. But there is also a “single stepping” option using the button.
Simple LED Output
Sending a value via the OUT instruction will push that value into the OUT register and consequently onto the LEDs. The simplest LED program is therefore the simple “OUT value”, but to make it at least look like something is happening, we can send two alternating values as follows:
```
0000 OUT 0001   # 1000 1101
0001 OUT 1000   # 0001 1101
0010 JMP 0000   # 0000 1111
```
This flashes bit 0, then bit 3 and then jumps back to the start.
Larson Scanner
Swapping values is no fun, but it is easy to create a 4-bit “Larson Scanner” with a sequence of OUT instructions as shown below.
```
0000 OUT 0001   # 1000 1101
0001 ADD A,0000 # 0000 0000
0010 OUT 0010   # 0100 1101
0011 ADD A,0000 # 0000 0000
0100 OUT 0100   # 0010 1101
0101 ADD A,0000 # 0000 0000
0110 OUT 1000   # 0001 1101
0111 ADD A,0000 # 0000 0000
1000 OUT 0100   # 0010 1101
1001 ADD A,0000 # 0000 0000
1010 OUT 0010   # 0100 1101
1011 ADD A,0000 # 0000 0000
1100 JMP 0000   # 0000 1111
```
The ADD instructions are effectively NOP instructions. They are just there so that every change takes two clock cycles. Without these, the LEDs change quite smoothly but then there is a short pause whilst the JMP takes place.
Counting Up and Down
We only have an OUT instruction that takes an immediate value or will use the B register, so if we want to do some counting and output the result, that has to happen using B.
```
0000 ADD B,0001  # 1000 1010
0001 OUT B       # 0000 1001
0010 JMP 0000    # 0000 1111
```
Counting down is a bit more tricky as there is no subtract. But like the vast majority of other CPU architectures, if we use two’s complement for negative numbers, then the magic of wrapping around binary numbers will do the trick.
Twos complement means taking the bitwise opposite and adding 1, so the twos complement value for n is NOT(n) + 1. For example:
```
-1 => ~(0001) + 0001 = 1110 + 0001 = 1111 = 15
```
So working through 5 – 1:
```
  0101 -> 5
+ 1111 -> -1
 -----
  0100 -> 4
 1111    <- carries
```
When the binary digit wraps around, and we ignore the “carry”, that gives the correct answer: 4.
```
0000 ADD B,1111  # 1111 1010
0001 OUT B       # 0000 1001
0010 JMP 0000    # 0000 1111
```
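The wrap-around arithmetic behind the count-down can be checked off the board with a few lines of Python. This is only a sketch of 4-bit addition; the names `add4` and `neg4` are invented here and are not part of any TD4 tooling.

```python
def add4(a: int, b: int) -> tuple[int, int]:
    """Add two 4-bit values; return (4-bit result, carry out)."""
    total = (a & 0xF) + (b & 0xF)
    return total & 0xF, total >> 4


def neg4(n: int) -> int:
    """Two's complement of n in 4 bits: NOT(n) + 1."""
    return (~n + 1) & 0xF


print(neg4(1))           # 15, i.e. 1111
print(add4(5, neg4(1)))  # (4, 1): the answer 4, with the carry ignored
print(add4(0, neg4(1)))  # (15, 0): counting down from 0 wraps to 15, no carry
```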
Taking INPUT
This will simply echo what is on the INPUT switches to the OUTPUT LEDs.
```
0000 IN B      # 0000 0110
0001 OUT B     # 0000 1001
0010 JMP 0000  # 0000 1111
```
This will add 1 to the INPUT
```
0000 IN B       # 0000 0110
0001 ADD B,0001 # 1000 1010
0010 OUT B      # 0000 1001
0011 JMP 0000   # 0000 1111
```
Although there is actually a bit of a cheat here. OUT B assumes the immediate data is 0; otherwise that value gets added to B as part of the OUT. So really the above could be shortened to the following:
```
0000 IN B            # 0000 0110
0001 OUT B, ADD 0001 # 1000 1001
0010 JMP 0000        # 0000 1111
```
Pretty much any instruction that assumes an immediate value of 0 will perform an addition as a side effect if an actual value is used instead. This is a consequence of every value passing through the adder on its way to its destination…
Programmable Counter
This will read the input and count up to that value then stop. It makes use of the JNC – jump if not carry and twos-complement subtraction to count down from the INPUT value.
```
0000 IN A        # 0000 0100    A = INPUT
0001 MOV B,1111  # 1111 1110    B = -1
0010 ADD B,0001  # 1000 1010    B = B + 1
0011 OUT B       # 0000 1001    OUTPUT = B
0100 ADD A,1111  # 1111 0000    A = A + (-1)
0101 JNC 0111    # 1110 0111    JUMP IF NO CARRY TO 0111
0110 JMP 0010    # 0100 1111    JUMP to 0010
0111 MOV B,0000  # 0000 1110    B = 0
```
The JUMP IF NOT CARRY works as when 1111 is added to A, the only value of A that won’t generate a CARRY is 0, so this is really executing the logic:
```
DO
  (other stuff)
  A--
WHILE (A != 0)
```
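One detail worth checking is how many times such a loop actually runs. The sketch below models only the ADD A,1111 / JNC pair in Python (the function name is made up for illustration) and counts the passes. Because the exit test happens on the value of A before the decrement, the body runs INPUT + 1 times, which is presumably why the counter above starts B at 1111 rather than 0.

```python
def passes_until_no_carry(a: int) -> int:
    """Count loop passes for: DO (other stuff); ADD A,1111; loop while that add carried."""
    count = 0
    while True:
        count += 1               # "(other stuff)" would run here
        total = (a & 0xF) + 0xF  # ADD A,1111
        a = total & 0xF
        if total < 16:           # no carry, so JNC is taken and the loop ends
            return count


print(passes_until_no_carry(3))  # 4
print(passes_until_no_carry(0))  # 1
```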
Bit Shift/Doubling
It isn’t possible to add two registers together directly, but it is possible to keep adding 1 in a loop until a register counts down to zero. This can kind of add the two registers together.
The following reads a number from the INPUT and puts it in both A and B and then proceeds to add them together, the result of course being to double the INPUT value. As this is binary, that also acts as a bit-shift-left.
```
0000 IN A        # 0000 0100    A = B = INPUT
0001 MOV B,A     # 0000 0010
0010 ADD B,1111  # 1111 1010    B = B + (-1)
0011 ADD B,0001  # 1000 1010    B = B + 1
0100 OUT B       # 0000 1001    OUT B
0101 ADD A,1111  # 1111 0000    A = A + (-1)
0110 JNC 1000    # 0001 0111    JUMP IF NO CARRY to 1000
0111 JMP 0011    # 1100 1111    JUMP to 0011
1000 OUT B       # 0000 1001    OUT B
1001 JMP 1001    # 1001 1111    LOOP HERE FOREVER
```
Once again, I’m counting down from a value and using that as the loop counter, so the logic going on here is something like the following:
```
B = A = INPUT
DO
  ADD 1 to B
  A--
WHILE (A != 0)
```
As I show the working by sending B to the OUTPUT on each pass through the loop, the last instruction is a “JUMP forever” instruction to pause the CPU to allow me to see the OUTPUT.
This can be optimised a bit more if we swap the test (in A) to before the addition (in B).
```
0000 IN B       # 0000 0110   A = B = INPUT
0001 MOV A,B    # 0000 1000
0010 ADD A,1111 # 1111 0000   A = A - 1
0011 JNC 0111   # 1110 0111   JUMP IF NO CARRY to 0111
0100 ADD B,0001 # 1000 1010   B++
0101 OUT B      # 0000 1001   OUTPUT = B (temporary display)
0110 JMP 0010   # 0100 1111   JUMP to 0010
0111 OUT B      # 0000 1001   OUTPUT = B (final result)
```
In theory, being able to add two registers like this ought to be pretty useful. But as these are the only two registers, and this operation is destructive – i.e. A counts down to zero – and there is no stack or other temporary storage, in reality the options are pretty limited!
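To check listings like these without flipping switches, a very small mnemonic-level interpreter is enough. The following is a sketch in Python under some stated assumptions: the names `OPS`, `run` and `DOUBLE` are invented here, the carry flag is assumed to be latched from the adder on every instruction, and the run simply stops when it falls off the end of the listing (real hardware would carry on through the rest of the ROM).

```python
# Each mnemonic maps to (data selector source, destination register).
OPS = {
    "ADD A": ("A", "A"), "MOV A,B": ("B", "A"), "IN A": ("IN", "A"),
    "MOV B,A": ("A", "B"), "ADD B": ("B", "B"), "IN B": ("IN", "B"),
    "MOV B": ("ZERO", "B"), "OUT B": ("B", "OUT"), "OUT": ("ZERO", "OUT"),
    "JMP": ("ZERO", "PC"), "JNC": ("ZERO", "PC"),
}


def run(program, switches, max_steps=500):
    """Run (mnemonic, immediate) pairs; return every value written to OUT."""
    regs = {"A": 0, "B": 0, "OUT": 0, "PC": 0}
    carry = 0
    shown = []
    for _ in range(max_steps):
        pc = regs["PC"]
        if pc >= len(program):  # assumption: stop at the end of the listing
            break
        name, imm = program[pc]
        source, dest = OPS[name]
        selected = {"A": regs["A"], "B": regs["B"],
                    "IN": switches & 0xF, "ZERO": 0}[source]
        total = selected + (imm & 0xF)
        regs["PC"] = (pc + 1) & 0xF
        # JNC only loads the program counter when the previous add did not carry.
        if not (name == "JNC" and carry):
            regs[dest] = total & 0xF
            if dest == "OUT":
                shown.append(regs["OUT"])
        carry = total >> 4      # assumption: carry is updated on every instruction
    return shown


# The optimised doubling listing, one entry per ROM address.
DOUBLE = [
    ("IN B", 0), ("MOV A,B", 0), ("ADD A", 0b1111), ("JNC", 0b0111),
    ("ADD B", 0b0001), ("OUT B", 0), ("JMP", 0b0010), ("OUT B", 0),
]

print(run(DOUBLE, 3))  # [4, 5, 6, 6]
```

The intermediate values are the temporary displays on each pass, and the last entry is the final result.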
Multiplication
Of course what I’d really like to do is multiply two numbers. Or at least, with a single INPUT, be able to multiply it by itself, so square it.
In principle that could be done via repeated addition in a loop, but I can’t think of a way to use two registers (with no storage) to allow me to run two loops. What I think I need to do is something like the following:
```
A = B = INPUT
TOTAL = 0
DO
  B = INPUT
  DO
    TOTAL++
    B--
  WHILE (B != 0)
  A--
WHILE (A != 0)
OUT TOTAL
```
But that requires keeping track of at least three different things, and I only have two areas of workable storage. Of course, if one of those things is fixed, e.g. multiplication by a fixed amount, then it is possible. But that fixed amount has to be hardcoded in a ROM instruction.
I have wondered if I could somehow hook up the OUTPUT to the INPUT to create a sort of third register.
The best I can do is use half the ROM as a look-up table for the answers to the squares of 0, 1, 2, 3. Then I’m using the sort of “undocumented” instruction JMP B, which adds the immediate data to whatever is in the B register and jumps to that location.
```
0000 IN A        # 0000 0100    A = INPUT
0001 MOV B,A     # 0000 0010    B = A
0010 ADD B,1111  # 1111 1010    B = B + (-1)
0011 ADD B,0001  # 1000 1010    B = B + 1
0100 ADD A,1111  # 1111 0000    A = A + (-1)
0101 JNC 0111    # 1110 0111    JUMP IF NO CARRY to 0111
0110 JMP 0011    # 1100 1111    JUMP TO 0011
0111 JMP B,1000  # 0001 1011    "Undocumented" JUMP to B+1000
1000 OUT 0000    # 0000 1101    0*0 = 0
1001 JMP 1111    # 1111 1111
1010 OUT 0001    # 1000 1101    1*1 = 1
1011 JMP 1111    # 1111 1111
1100 OUT 0100    # 0010 1101    2*2 = 4
1101 JMP 1111    # 1111 1111
1110 OUT 1001    # 1001 1101    3*3 = 9
1111 JMP 1111    # 1111 1111
```
I have to allow for the lookup table to execute both an OUT and a JMP so I have to use the previous logic to multiply my index by two and then add it to 1000 and JMP to that value.
The pseudo code is as follows:
```
A = B = INPUT
B = B * 2
JMP 1000 + B
1000 OUT 0; JMP END
1010 OUT 1; JMP END
1100 OUT 4; JMP END
1110 OUT 9; JMP END
1111 END
```
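The address arithmetic in that pseudo code can be sketched in Python as follows. The dictionary simply mirrors the four OUT entries in the listing; the names `SQUARES` and `square_by_table` are invented for illustration.

```python
# ROM address of each OUT instruction in the look-up table -> value it outputs.
SQUARES = {0b1000: 0, 0b1010: 1, 0b1100: 4, 0b1110: 9}


def square_by_table(n: int) -> int:
    """Mimic: B = n * 2, then jump to 1000 + B and OUT the value stored there."""
    if not 0 <= n <= 3:
        raise ValueError("the table only covers 0..3")
    b = (n + n) & 0xF            # doubling leaves room for an OUT and a JMP per entry
    target = (b + 0b1000) & 0xF  # JMP B,1000
    return SQUARES[target]


print([square_by_table(n) for n in range(4)])  # [0, 1, 4, 9]
```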
So, this works, and as 3*3 is the maximum we can calculate on a 4-bit OUTPUT register without overflowing anyway, arguably this is doing the job.
But it can hardly be described as multiplication!
If you know of a way to do it computationally, with only two registers, then answers on a postcard to… well, me.
Additional Notes on Multiplication
After posting this blog I ended up having two interesting conversations about the multiplication question.
Using logarithms
The first with [email protected] was a discussion about possibly using logarithms. You may recall that multiplication can be turned into addition by using logarithms.
They presented the following code to combine an arbitrary number of numbers by adding up their log2(n) values:
```
0000 OUT B      # 0000 1001   Assumes B=0 at start, then accumulates
0001 IN A,1110  # 0111 0100   A = INPUT - 2
0010 JNC 0000   # 0000 0111   IF NC then A<2 STOP with B=0
0011 ADD B,0001 # 1000 1010   B++
0100 IN A,1100  # 0011 0100   A = INPUT - 4
0101 JNC 0000   # 0000 0111   IF NC then A<4 STOP with B=1
0110 ADD B,0001 # 1000 1010   B++
0111 IN A,1000  # 0001 0100   A = INPUT - 8
1000 JNC 0000   # 0000 0111   IF NC then A<8 STOP with B=2
1001 ADD B,0001 # 1000 1010   B++
1010 IN A,0011  # 1100 0100   A = INPUT + 3
1011 JNC 0000   # 0000 0111   IF NC then A<13 STOP with B=3
1100 ADD B,0001 # 1000 1010   B++
1101 JMP 0000   # 0000 1111   ELSE STOP with B=4
```
This is using the “undocumented” feature that IN A is actually IN A,data with values that test the range of A and will cumulatively add to B until the limit of the size of A has been found.
On entry, B will be 0 and the first number will be read from the INPUT into A and tested until the JUMP happens. Then the second number can be set on the INPUT switches and the calculation will continue with the last value of B, still adding values until the new INPUT has been processed.
This is a great solution and very elegant, but the answer is given in terms of log2 values, so it isn’t a direct output. But apart from that it does work pretty well and is particularly neat for allowing the addition of an arbitrary number of different numbers. Well at least until the 4-bits are exhausted – it will carry on and keep overflowing of course, but the numbers will cease making sense.
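The trick that makes the range tests work is that adding (16 - threshold) to a 4-bit value produces a carry exactly when the value is at least the threshold. A small Python sketch of that comparison follows; the helper names are made up here, and it models only the carry test, not the whole program.

```python
def carries(value: int, imm: int) -> bool:
    """True if a 4-bit add of value and imm overflows, i.e. sets the carry."""
    return (value & 0xF) + (imm & 0xF) > 0xF


def at_least(value: int, threshold: int) -> bool:
    """value >= threshold, tested by adding (16 - threshold) and checking the carry."""
    return carries(value, (16 - threshold) & 0xF)


# Threshold 2 uses immediate 1110 and threshold 4 uses 1100, as in the listing.
print(at_least(3, 2))  # True
print(at_least(3, 4))  # False
```

So a JNC straight after such an add is effectively "jump if INPUT is below the threshold".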
Noting a curious mathematical property
I have another idea sent to me in a private conversation that pointed out that if I’m only doing squares, i.e. assuming a single INPUT value, then we can take advantage of the fact that the sum of the first n odd integers is equal to n^2:
```
0^2 = 0 = 0
1^2 = 0 + 1 = 1
2^2 = 0 + 1 + 3 = 4
3^2 = 0 + 1 + 3 + 5 = 9
4^2 = 0 + 1 + 3 + 5 + 7 = 16
5^2 = 0 + 1 + 3 + 5 + 7 + 9 = 25
6^2 = 0 + 1 + 3 + 5 + 7 + 9 + 11 = 36
....
```
There is an explanation here: https://www.mathsisfun.com/numbers/odd-square-number.html and of course the diagram makes it obvious! I don’t know that it has a name though – let me know if you know otherwise!
So we can get away without needing nested loops as we’re just adding numbers now that are 2 integers apart.
```
B = INPUT
A = 2B - 1 # This is the largest odd number we will be adding
B = 0
DO
   B = B + A
   A = A - 2
WHILE A > 0
OUTPUT = B
```
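With unlimited registers the idea is easy to confirm. Here is the same algorithm as a Python sketch; the function name is made up, and it uses a plain `while` loop instead of DO ... WHILE so that an input of 0 is handled as well.

```python
def square_by_odds(n: int) -> int:
    """Square n by summing the first n odd numbers, largest first."""
    a = 2 * n - 1  # the largest odd number we will be adding
    b = 0
    while a > 0:
        b += a     # this is the B = B + A step that needs a register-to-register add
        a -= 2
    return b


print([square_by_odds(n) for n in range(5)])  # [0, 1, 4, 9, 16]
```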
So I have to use the routines I already have for adding registers and multiplying by 2, and if it all fits in 16 instructions, that should work!
```
0000 IN B       # 0000 0110   A = B = INPUT
0001 MOV A,B    # 0000 1000

# A = 2B - 1 - Shift left then subtract 1
0010 ADD B,1111 # 1111 1010   B = B - 1
0011 JNC 0110   # 0110 0111   JUMP IF NO CARRY to 0110
0100 ADD A,0001 # 1000 0000   A = A + 1
0101 JMP 0010   # 0100 1111   JUMP to 0010
0110 ADD A,1111 # 1111 0000   A = A - 1

# Now add consecutive odd numbers
...
```
Ok, here is where I hit a snag. The next part of the algorithm requires me to do B = B + A, but I can’t add two registers together directly – I have to count down one of the registers whilst adding 1 to the second.
That works fine for one pass through the loop, but at that point I’ve destroyed the value of either A or B and won’t be able to go back around for a 2nd pass…
I started to chew over whether there was a way to incrementally add the values, but every way I considered I ended up with the fact that I am having to add two calculated values together at some point and I just can’t do that.
One interesting train of thought initially looked promising: noticing that each odd value to be added is a number of 2s as follows:
```
n=4:  4^2 = 1 + 3 + 5 + 7
          = 2 - 1
          + 2 + 2 - 1
          + 2 + 2 + 2 - 1
          + 2 + 2 + 2 + 2 - 1
```
That is the triangle number for n times two, less n times 1. The formula for a triangle number is as follows:
```
Tri(n) = n (n + 1) / 2
```
So using that many 2s and taking off the appropriate values too, gives me
```
Sq (n) = (2n(n + 1)) / 2   - n
       = (2n(n + 1) - 2n) / 2
       = (2n^2 + 2n - 2n) / 2
       = 2n^2 / 2
       = n^2
```
Ah, so basically the square (n) = square (n). That isn’t particularly insightful…
Ok, so this looked really promising, but fundamentally, if I can’t add the two registers together without trashing one of the values, I think I’m stuck again.
Conclusion
I must admit to wondering if the addressing could be extended to permit longer programs. I don’t see why not, if the PC register could be extended. One limitation might be that jumps remain localised to a 4-bit value somehow.
It is proving a real limitation not being able to add the two registers together – that is really limiting what I can work out how to do with it.
It seems that at least one other person has had similar thoughts and had started to work on an extended version – the TD4E and even the TD4E8: https://github.com/Nekhaevalex/TD4. It looks like a lot of the work was done in a simulation, but they did get to produce an assembler: https://hackaday.io/project/161708-4-bit-cpu-td4-once-again.
Other related things I’ve found include:
- Crazy Small CPU (4-bit data, 8-bit address): https://minnie.tuhs.org/Programs/CrazySmallCPU/description.html
- CHUMP ICS4U 4-bit TTL project: http://darcy.rsgc.on.ca/ACES/TEI4M/4BitComputer/index.html
- TPS/MyCo (16 pages of 4-bit addressable memory): https://github.com/GClown25/BIT4
- MiniMax Enhanced TD4: https://github.com/denjhang/MiniMax-4-bit-CPU
- TD4 CPU (adds paging): https://hackaday.io/project/26215-td4-cpu
- 4-bit TTL Scratch Built CPU (4-bit data, 8-bit address): https://www.ttlcpu.com/articles/4-bit-ttl-scratchbuilt-computer
- “How to build a CPU TD4” includes a discussion of expanding the memory: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html
Finally, there is this “meta list” of DIY CPU projects: https://www.ttlcpu.com/content/links.
And this seems to be a massive list of TD4 related links: http://www.cable-net.ne.jp/user/takahisa/td4/index.html although, unfortunately many of the linked pages no longer exist. But there continues to be interest out there in the TD4!
I have a few ideas of my own, but if they amount to anything, well that still remains to be seen. But getting this far has been really interesting and I already feel like I’ve learned quite a lot about the lowest levels of our technology.
I have to say, “full stack developer” certainly takes on a new meaning when you get to these levels…
Kevin
#4bit #cpu #td4
#4bit #td4 #cpu
Kevin's Blog @[email protected] · 2025-09-28 · 15:33 UTC
TD4 4-bit DIY CPU – Part 2
Having spent some time analysing how the TD4 4-bit DIY CPU works, I now have my kit, so this post has some notes about the build.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
https://makertube.net/w/hhtpEp7XxpJkgxDSwhAMVQ
The kit I have is labelled “MUSE LAB TD4 Ver 1.3”. Muse labs have a store on Tindie here: https://www.tindie.com/stores/johnnywu/ and this kit is available all over Aliexpress. For me it cost around £20-£25.
The PCB
There are a few immediate things to note:
- I have v1.3 of the PCB according to the printed label.
- There are no component values, so the BOM and schematic will be pretty critical for working out what goes where.
- The micro USB connector and four OUTPUT LEDs are all surface mount.
- There is an additional 5V/GND pin header connector.
- That is a lot of diodes… 128 to be precise.
The GitHub link has a HTML BOM and schematic in the hardware/v1.3 directory. Note these are very slightly different to some of the information in the docs area (apparently there is a mistake in the docs schematic, but the one in v1.3 is correct).
The key parts of BOM, which lists every single component, including each diode, individually, are:
- Resistors R1, R4, R9, R12-19: 1K
- Resistors R2, R7, R8: 100K
- Resistors R3, R20-27: 10K
- Resistor R5: 3K3
- Resistor R6: 33K
- Resistors R10, R11: 100R
- Capacitors C1, C2, C3: 10uF
- Diodes (all): 1N4148
The ICs are as follows:
- U1: 74HC74, Dual D-Type Flip-flop
- U2: 74HC14, Hex Schmitt Trigger Inverter
- U3, U4, U5, U6: 74HC161, 4-bit Synchronous Binary Counter
- U7, U8: 74HC153, Dual 4-Line to 1-Line Data Selector/Multiplexer
- U9: 74HC283, 4-Bit Binary Full Adder with Fast Carry
- U10: 74HC32, Quad 2-Input Positive OR Gate
- U11: 74HC10, Three 3-Input Positive NAND Gate
- U12: 74HC540, Octal Buffer and Line Driver with Tri-State Outputs
- U13: 74HC154, 4-Line to 16-Line Decoder/Demultiplexer
In order to make building slightly easier, I printed out the board diagram from the GitHub repository and added the resistor values and IC labels in the correct places.
Before doing anything else, I tackled the two surface mount elements first: the micro USB connector and then the OUTPUT LEDs. I figured if I messed it up then at least I wouldn’t have wasted time building the rest of it first.
As it happens, although it was pretty tricky, as the USB connector only requires the outer two pins connected – for power and ground – it was just about manageable. I pasted the area with loads of flux and popped a bit of solder on each of the two pads. Then I pressed the connector in place and reheated the area until it seemed like it had taken.
One annoyance – the connector’s stress relief connections don’t go through the PCB very far, so I’m really not confident it will last, but I’ve done my best.
The LEDs weren’t too bad. Again after pasting the area with flux, I put some solder on the top pads and then used tweezers to position and solder on one side of an LED. Once that seemed ok I could do the other side and return to the first pad if necessary.
It is important (naturally with LEDs) to ensure the direction is correct. There is an arrow on the underside of each LED and the negative terminal is highlighted in green.
I was able to test the USB socket by plugging in a USB cable and checking for continuity or shorts between the outer USB connections and the GND and 5V header pin socket on the PCB. That seemed ok.
I could similarly test the LEDs by using a multimeter LED tester between the hole/pad just above the LED and the GND header pin socket. That all worked too.
Then it was a case of just getting on with it.
That was a lot of diodes! Referring to my resistor map from above, I did those next, followed by all the IC sockets.
Then I added the two tactile buttons, the 10uF capacitors, the 2-pin power header, all the DIP switches, and finally the two slider switches.
Here is the fully populated board. Note all the vertical ICs face the same direction – pin 1 down – apart from the largest one which is pin 1 up.
Conclusion
This was a really fun build and, once the two surface mount elements were done, relatively straightforward.
And best of all – it worked first time! The video shows a simple program to load four different values directly into the OUTPUT register and then loop back to the start.
The main issue I have is that each set of 8 bits is backwards to how I’d expect to be able to read them!
Anyway, time to think about some actual programs to run.
Kevin
#cpu #pcb #td4
#cpu #pcb #td4
Kevin's Blog @[email protected] · 2025-09-18 · 17:45 UTC
TD4 4-bit DIY CPU
I was looking for DIY CPU projects, as I like kits that help me think at the lowest level of processing. It helps keep me grounded in how far technology has come over the years.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Some of the options I know about for DIY computers that actually come as kits you can buy, and that interest me, are:
- RC2014 and compatible for Z80 based computers: https://rc2014.co.uk/
- Small Computer Central for a range of Z80, Z180, computers: https://smallcomputercentral.com/
- Ben Eater’s 6502 computer: https://eater.net/6502
- Nick Bild’s 6502 Vectron 64 computer: https://github.com/nickbild/vectron_64
But I wanted to go further down and actually find something that lets me build a simple CPU from gates. Here there are several options too:
- NAND to Tetris: https://www.nand2tetris.org/ (only available via emulation)
- Ben Eater’s 8-bit computer: https://eater.net/8bit
- Gigatron 8-bit computer: https://www.tindie.com/products/johnson/gigatron-ttl-microcomputer-diy-kit/
- TD4 4-bit computer: https://github.com/wuxx/TD4-4BIT-CPU
- TD4 4-bit computer deluxe kit: https://www.budgetronics.eu/en/building-kits/td4-deluxe-kit-build-your-own-mini-cpu-with-ttl-logic/a-26091-20
- MiniMax 4-bit CPU: https://www.tindie.com/products/denjhang/minimax-4bit-cpu-td4-architecture-cpusbc/
Whilst I’d love to build Ben Eater’s 8-bit CPU, the kit as provided is too much of an outlay for me. It is ~$300 – I mean, good for what you get and all the knowledge, but it is a solderless breadboard kit and that isn’t really what I’m after. The Gigatron is a distinct possibility that I’ll come back to at some point I think.
NAND to Tetris is excellent, and I have their book, but it is all emulated or virtualised, which does allow for all the scaling required for an (arguably) actually useful device, but isn’t designed to be built in actual hardware.
But the TD4 is really interesting. It is available as a PCB and components for approx £25 on Aliexpress and based on an open source design that shows the basic operation of a 4-bit CPU.
The “deluxe” kit mentioned above is a lot more expensive (~£120) but has all signals broken out to LEDs which, whilst an awful lot of soldering, does look incredibly impressive! The MiniMax is an evolution of the TD4 and kits for that are around £120. In fact, searching on Tindie and Hackaday.io for “TD4” will surface a few other DIY projects and even kits to purchase.
The TD4 does seem to fit the bill for me as an inexpensive kit to try. The downside is that documentation for it (in English) is pretty sparse.
The TD4 project itself is by “wuxx”, an embedded engineer from HangZhou, and much of the documentation is in Chinese. It is based on a Japanese book by Kaoru Tonami called “how to build a CPU” which can be found for ~£50 online, but as I don’t know Japanese either, it is unlikely to help me very much.
There are some sources of information that others have put together though, so I’m going to be using those as a starting point along with whatever I can figure out myself:
- The original GitHub project (plus online translation): https://github.com/wuxx/TD4-4BIT-CPU
- Philip Zucker’s “Guide to the TD4 4-bit DIY CPU”: https://www.philipzucker.com/td4-4bit-cpu/
- Kevin Gibb’s “teardownit” “DIY 4-bit CPU”: https://teardownit.quora.com/DIY-4-bit-CPU-Have-you-ever-made-a-processor-I-did-Took-me-just-12-microchips-and-a-clock-generator-The-processor-c
- Minoru Yamamoto’s “How to create a CPU TD4”: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html
This post is my own “thinking out loud” as I work through the various parts to see how they work.
Basic Architecture
This is a 4-bit computer, with a 4-bit data bus, 4-bit commands, and a 4-bit address bus.
There is a block diagram on GitHub:
The fundamental process is as follows. For each “tick” of the computer:
- An OpCode is read from the ROM using the current 4-bit address (0 to 15) from the program counter.
- Each ROM entry is an 8-bit word with 4-bits as a command and 4-bits as data for the command.
- The data selector determines a 4-bit INPUT value. This can come from one of the two registers (A or B); or a set of four switches for the IN register; or be set to zero.
- This goes to the adder, which adds it to the immediate data from the ROM (which could of course be zero).
- The OUTPUT of the adder can go to either of the two registers (A or B), an OUT register which is hooked up to four LEDs, or the program counter register to create a “jump”.
I’ll pull apart the different parts of the CPU in the following sections.
ROM Format
Each 8-bit word in the 16-byte ROM has the following format:
- 4 command bits
- 4 immediate data bits
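As a rough sketch of that split (my own illustration, not code from the TD4 project), here is how one ROM word could be pulled apart in Python. It assumes the command sits in the upper nibble (D7-D4) and the immediate data in the lower nibble (D3-D0), which matches the D0-D3/D4-D7 naming used later in this post; the function name `split_word` is made up.

```python
def split_word(word):
    """Return (command, immediate) for one 8-bit ROM word.

    Assumes command = D7-D4 (upper nibble), immediate = D3-D0 (lower nibble).
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("ROM words are 8 bits wide")
    return (word >> 4) & 0xF, word & 0xF


command, immediate = split_word(0b0011_0101)
print(f"{command:04b} {immediate:04b}")  # 0011 0101
```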
Instruction Decoding
The 4 command bits from each ROM instruction have to be turned into the various selection signals to activate different parts of the CPU.
There is a table from GitHub again:
The explanation in Japanese translates (apparently) to:
“Explanation: The SEL_B and SEL_A signals select the ALU data source, while #LOAD0-#LOAD3 select the ALU data destination. More formally, they control the source and destination operands of instructions, respectively.”
From this we can note the following:
- There is no instruction for 1000, 1010, 1100 or 1101.
- Instruction 1110 appears twice, and the selectors set are dependent on the state of the C (carry) flag.
- Some instructions act on immediate data, others assume it will be 0.
The LOAD# signals have the following meanings in the system:
- LOAD#0 – Register A (A)
- LOAD#1 – Register B (B)
- LOAD#2 – OUTPUT (OUT)
- LOAD#3 – Program counter (PC)
The actual decoding happens in two parts: input selection; and output selection.
Registers
The system has four registers, each formed from a 74HC161 “presettable, synchronous, 4-bit binary counter”. There are two general purpose registers: A and B. There is one output register, whose contents drive the state of four LEDs. And there is a program counter. Here is the schematic for register A:
P0-P3 come from the output of the adder directly. RST and CLK are hopefully self-explanatory. For the A and B registers, Q0-Q3 go into the INPUT selection section (see later). For the OUTPUT register, these go directly to LEDs. For the program counter, these go into the ROM address logic (again more on that later).
The relevant operation of the 161 is described in the datasheet:
“The outputs (Q0 to Q3) of the counters may be preset HIGH or LOW. A LOW at the parallel enable input (PE) disables the counting action and causes the data at the data inputs (D0 to D3) to be loaded into the counter on the positive-going edge of the clock… A LOW at the master reset input (MR) sets Q0 to Q3 LOW…”
So on reset the outputs are all 0. When PE goes LOW, on the next clock pulse, the value on the inputs (P0-P3) is loaded into the counter and reflected on Q0-Q3. However, because CET and CEP are LOW the counter won’t actually count any further.
The program counter is a bit special, in that it is actually allowed to count by having CET and CEP set HIGH. This allows it to step through the instructions, one per clock pulse.
In this case Q0-Q3 go off to the ROM address decoding, which I’ll come to in a moment.
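To convince myself of the load-versus-count behaviour, here is a small behavioural model of how the 161 is used here. It is only a sketch under some assumptions: CET and CEP are treated as one tied-together `count_enable` flag, reset is modelled as a simple method call, and the class name `Counter161` is made up.

```python
class Counter161:
    """Behavioural sketch of a 74HC161 as used for the TD4 registers."""

    def __init__(self, count_enable):
        self.q = 0                        # Q0-Q3 as a 4-bit integer
        self.count_enable = count_enable  # CET and CEP, assumed tied together

    def reset(self):
        """A LOW on the master reset input sets Q0-Q3 LOW."""
        self.q = 0

    def clock(self, p, load_n):
        """Rising clock edge; load_n is the active-LOW /PE input."""
        if load_n == 0:
            self.q = p & 0xF                 # parallel load wins over counting
        elif self.count_enable:
            self.q = (self.q + 1) & 0xF      # only the program counter counts


reg_a = Counter161(count_enable=False)
reg_a.clock(p=5, load_n=0)   # loaded with 5
reg_a.clock(p=9, load_n=1)   # /PE HIGH and counting disabled: holds 5

pc = Counter161(count_enable=True)
pc.clock(p=0, load_n=1)      # steps to 1
print(reg_a.q, pc.q)         # 5 1
```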
INPUT Selection
Two SELECT lines select the INPUT data as follows:

| SEL_B | SEL_A | SOURCE |
| --- | --- | --- |
| 0 | 0 | Register A (A) |
| 0 | 1 | Register B (B) |
| 1 | 0 | INPUT (IN) |
| 1 | 1 | Zero value (0) |
Input selection is handled by two 74HC153 dual 4-input multiplexers. Two are required as there are four data lines to be switched, and they all have one of four options to switch between based on the SELECT lines above.
Here is the relevant part of the schematic.
On the left are the three sets of four data signals that come from the A, B and IN inputs. D0 from each of the inputs goes to U7/1Cn; D1 goes to U7/2Cn; D2 to U8/1Cn; and D3 to U8/2Cn. Notice that the fourth set of data signals (U7/1C3, 2C3 and U8 1C3, 2C3) are connected directly to GND for the “zero” INPUT state (SEL_A=1, SEL_B=1).
On the right, the two pairs of outputs make up the four data lines to feed into the adder section.
So where do the SEL_A and SEL_B signals come from? From the schematic, we can see:
- SEL_A = D4 OR D7 (via U10B – one of the 74HC32 2-input OR gates)
- SEL_B = D5
We can start to explain why some of the instruction combinations don’t exist (or at least, aren’t distinct) as we can see that SEL_A depends on either D4 or D7.
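The two equations and the selection table can be combined into a few lines of Python. This is just my own sketch: the command nibble is assumed to hold D4 in bit 0 up to D7 in bit 3, and the function names are made up.

```python
def select_lines(command):
    """Return (SEL_B, SEL_A) for a 4-bit command (bit 0 = D4 ... bit 3 = D7)."""
    d4 = command & 1
    d5 = (command >> 1) & 1
    d7 = (command >> 3) & 1
    sel_a = d4 | d7   # SEL_A = D4 OR D7
    sel_b = d5        # SEL_B = D5
    return sel_b, sel_a


def select_input(command, a, b, in_switches):
    """Pick the adder's INPUT value the way the two 74HC153s do."""
    sel_b, sel_a = select_lines(command)
    return (a, b, in_switches, 0)[(sel_b << 1) | sel_a]


# With A=3, B=9 and the switches set to 5:
for command in (0b0000, 0b0001, 0b0010, 0b0011, 0b1000):
    print(f"{command:04b}", select_input(command, 3, 9, 5))
```

Note how command 1000 selects register B rather than A: D7 forces SEL_A HIGH, so register A and IN are not reachable once D7 is set unless D5 is also considered.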
OUTPUT Selection
The OUTPUT selection is a little more complicated. As previously mentioned, there are four destinations: the two registers, the OUTPUT register, and the program counter.
Each register has a /PE (“parallel enable input”) signal which is active low. These are individually fed by the output of the LOAD# logic.
The three signals at the bottom are D6, D7 and D4. The lone signal top left is the carry (/C) flag, and the four outputs top right are the four LOAD# signals which feed directly into the /PE pins of the four registers.
So from this we deduce the following relationships:
- Reg A: LOAD0 HIGH = D6 OR D7 – so LOAD0 is only active (LOW) when both D6 and D7 are LOW.
- Reg B: LOAD1 HIGH = (NOT D6) OR D7 – so LOAD1 is only active (LOW) when D6 is HIGH and D7 is LOW.
- OUT: LOAD2 LOW = (NOT D6) AND D7 – so LOAD2 is only active (LOW) when D6 is LOW and D7 is HIGH.
- PC: LOAD3 LOW = D6 AND D7 AND (D4 OR /C) – so LOAD3 is only active (LOW) when both D6 and D7 are HIGH and either D4 is HIGH or the inverted carry signal (/C) is HIGH, i.e. the carry flag is not set.
This effectively means that D6 is used to select between registers A and B when D7 is LOW; and between OUT and PC when D7 is HIGH (subject to either D4 or the /C signal too in the case of PC).
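Written out as code, the four LOAD equations look something like the sketch below. It is my own illustration: `carry_n` stands for the /C signal and is assumed to be 1 when no carry has been latched, which is what the LOAD3 table later in this post shows, and the command nibble again has D4 in bit 0.

```python
def load_signals(command, carry_n):
    """Return (LOAD0, LOAD1, LOAD2, LOAD3); each is active when 0.

    command: 4-bit value with bit 0 = D4 ... bit 3 = D7.
    carry_n: the /C signal, assumed 1 when the carry flag is NOT set.
    """
    d4 = command & 1
    d6 = (command >> 2) & 1
    d7 = (command >> 3) & 1
    load0 = d6 | d7                           # A:   LOW only for D6=0, D7=0
    load1 = (1 - d6) | d7                     # B:   LOW only for D6=1, D7=0
    load2 = 1 - ((1 - d6) & d7)               # OUT: LOW only for D6=0, D7=1
    load3 = 1 - (d6 & d7 & (d4 | carry_n))    # PC:  LOW for D6=D7=1 and (D4 or /C)
    return load0, load1, load2, load3


print(load_signals(0b0000, 1))  # (0, 1, 1, 1) -> register A
print(load_signals(0b1110, 0))  # (1, 1, 1, 1) -> carry set, nothing is loaded
```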
Once again, we can see that there is some redundancy in the system for certain combinations of D4 to D7.
ROM Address Decoding
The 4-bit output from the program counter is effectively a 4-bit address bus. This gets turned into a set of selection signals to select which “byte” of the ROM should be active.
This simply uses a 74HC154, 4 to 16 line decoder, meaning that a 4-bit number goes in and one of 16 corresponding outputs goes LOW whilst the rest remain HIGH. There is no memory address or matrix handling – there is literally one control line per “memory” location.
The ROM itself is a set of 16 8-way DIP switches and diodes, so once its control signal is active (LOW) then those DIP switches become relevant on the data bus. Here is the last location and data bus logic. Note that all data signals are pulled HIGH by default, so will only be read as LOW if the DIP switch connects it to LOW via the diode, and that is only possible if that DIP block is selected from the 4 to 16 line decoder.
The 74HC540 is an inverting line buffer, turning any active LOW DIP switch settings into HIGH signals on the command/data bus. Recall that D0-D3 represent immediate data and D4-D7 represent command logic.
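Here is a small model of that read path, from decoder through DIP switches and pull-ups to the inverting buffer. It is only a sketch: representing each DIP block as an integer whose set bits are closed switches is an assumption of mine, as are the function names.

```python
def decode_154(address):
    """Model the 4-to-16 decoder: 16 active-LOW outputs, exactly one is 0."""
    return [0 if line == (address & 0xF) else 1 for line in range(16)]


def read_rom(dip_rows, address):
    """Return the byte seen on the command/data bus for a given address.

    dip_rows holds 16 integers; a set bit means that switch is closed (ON).
    That polarity is an assumption made for this sketch.
    """
    bus = 0xFF  # pull-ups: every line reads HIGH by default
    for row, select in zip(dip_rows, decode_154(address)):
        if select == 0:           # only the selected block can pull lines LOW
            bus &= ~row & 0xFF
    return ~bus & 0xFF            # the 74HC540 inverts the bus


rows = [0x00] * 16
rows[3] = 0xB7
print(f"{read_rom(rows, 3):08b}")  # 10110111
```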
The Adder (ALU)
The arithmetic logic unit (ALU) for this CPU is a simple adder. A 74HC283 is a 4-bit binary full adder: “full” in that it supports 4-bit add-with-carry functionality, although in this design carry is only used on the output stage – it doesn’t form part of the input addition.
A0-A3 come from the INPUT selection circuitry, so can represent either register A or B, the state of the IN switches, or a fixed zero (0) value. B0-B3 come directly from D0-D3 of the ROM contents as selected by the ROM addressing logic.
The COUT (carry) flag goes into a flip-flop and the active LOW version of the output is used as the carry flag in the LOAD# decoding logic to support the “JUMP IF NOT CARRY” instruction. So returning to the logic of #LOAD3, we have:
```
  COUT    /C    D4   D6   D7    LOAD3
    0      1     X    1    1      0    -> Dst = PC
    X      X     1    1    1      0    -> Dst = PC
```
Hence a jump will only happen (i.e. the PC gets loaded) if D6 and D7 are both 1 and either D4 is 1 (unconditional), or D4 is 0 (conditional) and the CARRY flag has NOT been set by the adder, resulting in /C = 1.
Some of the ROM instructions require D0-D3 to be zero in which case the adder is effectively taking the input (A, B, IN, 0) and loading it into the destination register (A, B, OUT, PC).
Notice that the adder does not use the carry in (CIN). This is tied to zero. Apparently this was left floating on an earlier revision of the board, which caused spurious results!
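The adder's behaviour is simple enough to write down directly. This is a sketch of the arithmetic only, with a made-up function name; `carry_in` defaults to 0 to mirror CIN being tied to zero on the board.

```python
def adder_283(a, b, carry_in=0):
    """Return (sum, carry_out) of two 4-bit values, like a 74HC283."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


print(adder_283(7, 8))    # (15, 0)
print(adder_283(15, 1))   # (0, 1): wraps round and sets the carry
```

A floating CIN would be the equivalent of `carry_in` randomly being 0 or 1, so results would sometimes be one too high.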
Putting it all Together
The complete truth table for the SEL, D4-7 and LOAD signals is as follows.
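| Instruction | D7-D4 | SEL_B | SEL_A | D4 | D5 | D6 | D7 | LD0/A | LD1/B | LD2/OP | LD3/PC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADD A,i | 0000 | L | L | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV A,B | 0001 | L | H | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| IN A | 0010 | H | L | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV A,i | 0011 | H | H | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV B,A | 0100 | L | L | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| ADD B,i | 0101 | L | H | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| IN B | 0110 | H | L | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| MOV B,i | 0111 | H | H | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| (none) | 1000 | L | H | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| OUT B | 1001 | L | H | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| (none) | 1010 | H | H | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| OUT i | 1011 | H | H | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| (none) | 1100 | L | H | 0 | 0 | 1 | 1 | 1 | 1 | 1 | =C |
| (none) | 1101 | L | H | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| JNC | 1110 | H | H | 0 | 1 | 1 | 1 | 1 | 1 | 1 | =C |
| JMP | 1111 | H | H | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |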
Returning to our instruction table, we can see how the decoding of the D4-D7 lines leads to enacting the various commands. In particular, we can now expand the table to show how the SEL and LOAD logic results in selecting the source and destination registers as follows:
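| Instruction | D7-D4 | D3-D0 | INPUT | OUTPUT |
| --- | --- | --- | --- | --- |
| ADD A, data | 0000 | data | A | A |
| MOV A, B | 0001 | 0000 | B | A |
| IN A | 0010 | 0000 | IN | A |
| MOV A, data | 0011 | data | 0 | A |
| MOV B, A | 0100 | 0000 | A | B |
| ADD B, data | 0101 | data | B | B |
| IN B | 0110 | 0000 | IN | B |
| MOV B, data | 0111 | data | 0 | B |
| OUT B * | 1000 | 0000 | B | OUT |
| OUT B | 1001 | 0000 | B | OUT |
| OUT data * | 1010 | data | 0 | OUT |
| OUT data | 1011 | data | 0 | OUT |
| JNC B * | 1100 | data | B | PC if /C = 1, otherwise none |
| JMP B * | 1101 | data | B | PC |
| JNC | 1110 | data | 0 | PC if /C = 1, otherwise none |
| JMP | 1111 | data | 0 | PC |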
As per the table, we can also now infer the missing, or duplicate, instructions (marked * above).
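The expanded table maps quite naturally onto a lookup for a tiny disassembler. This is my own sketch rather than anything from the TD4 project; in particular the `B+{d}` spellings for the inferred jump instructions are just labels I have made up.

```python
# Command nibble (D7-D4) -> mnemonic template; {d} is the immediate data.
MNEMONICS = {
    0b0000: "ADD A, {d}",
    0b0001: "MOV A, B",
    0b0010: "IN A",
    0b0011: "MOV A, {d}",
    0b0100: "MOV B, A",
    0b0101: "ADD B, {d}",
    0b0110: "IN B",
    0b0111: "MOV B, {d}",
    0b1000: "OUT B",        # inferred duplicate of 1001
    0b1001: "OUT B",
    0b1010: "OUT {d}",      # inferred duplicate of 1011
    0b1011: "OUT {d}",
    0b1100: "JNC B+{d}",    # inferred; the label is made up
    0b1101: "JMP B+{d}",    # inferred; the label is made up
    0b1110: "JNC {d}",
    0b1111: "JMP {d}",
}


def disassemble(word):
    """Turn one 8-bit ROM word into a mnemonic string."""
    command, data = (word >> 4) & 0xF, word & 0xF
    return MNEMONICS[command].format(d=data)


for word in (0x35, 0x10, 0xB7, 0xE2):
    print(f"{word:08b}  {disassemble(word)}")
```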
In this table, the output will always be the sum of the INPUT and D3-D0, so wherever 0000 is specified for D3-D0 a non-zero value could in reality be placed there instead. But then the instruction would take on a different meaning.
For example, MOV A, B is really MOV A, B+data, which only really makes sense when data is set to 0, otherwise overflows are very likely to occur.
It is also worth noting that SEL_A depends on either D4 or D7, and when SEL_A is set to 1 the input can only be either register B or zero. However, to output to OUT or PC, D7 has to be set. This means that instructions that act on OUT or PC can only take an input from register B or zero.
The two JMP B instructions are going to be of limited use too. They are essentially JMP to B+data instructions. There are probably some creative uses of such instructions, but for simplicity, keeping to the “0” versions that just depend on the immediate data is probably best.
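Pulling the whole analysis together, here is a minimal simulator sketch of one clock tick. It is my own reading of the logic above rather than a verified model of the board: in particular it assumes the carry flip-flop latches COUT on every tick, so a JNC tests the carry produced by the previous instruction. The names `step` and `run` are made up.

```python
def step(state, rom, in_switches=0):
    """Advance the machine by one clock tick and return the new state."""
    word = rom[state["pc"]]
    command, data = (word >> 4) & 0xF, word & 0xF
    d4, d5, d6, d7 = ((command >> i) & 1 for i in range(4))

    # Input selection: SEL_B = D5, SEL_A = D4 OR D7
    source = (state["a"], state["b"], in_switches & 0xF, 0)[(d5 << 1) | (d4 | d7)]

    total = source + data                   # the adder, CIN tied to 0
    result, carry = total & 0xF, total >> 4

    new = dict(state)
    new["pc"] = (state["pc"] + 1) & 0xF     # PC counts unless it is loaded
    if not d7:
        new["b" if d6 else "a"] = result    # LOAD0 / LOAD1
    elif not d6:
        new["out"] = result                 # LOAD2
    elif d4 or not state["carry"]:
        new["pc"] = result                  # LOAD3: JMP, or JNC with no carry
    new["carry"] = carry                    # assumed: latched on every tick
    return new


def run(rom, steps, in_switches=0):
    state = {"a": 0, "b": 0, "out": 0, "pc": 0, "carry": 0}
    for _ in range(steps):
        state = step(state, rom, in_switches)
    return state


# ADD A,1 / JNC 0 / OUT 5 / JMP 3: count A round to overflow, then show 5.
rom = [0x01, 0xE0, 0xB5, 0xF3] + [0x00] * 12
print(run(rom, 40))  # {'a': 0, 'b': 0, 'out': 5, 'pc': 3, 'carry': 0}
```

The loop takes 32 ticks for A to wrap from 15 back to 0; only then does JNC fall through to the OUT instruction.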
Utility Blocks
There is one section of the circuit that hasn’t been considered yet. There is a block that provides the clock and reset circuitry.
The clock is based on a Schmitt trigger oscillator and can run automatically or be triggered manually. There are two selectable speeds: 1Hz or 10Hz.
Both the clock and reset signals feed into the four registers and the carry flip-flop.
The remaining block is the power. It has a micro-USB socket and has to be powered from 5V directly either via the USB socket or directly into a 2-pin jumper header.
Conclusion
I have one on order. I’m looking forward to building it and giving it a go!
I really like the LEDs on the deluxe version, but that is a bit too much for me just for some messing around. I am wondering, though, how difficult it would be to attempt my own version with a few extra LEDs.
Assuming I manage to get one built and working, I’ll have a poke about at some signals and see what the art of the possible might be.
Kevin
#4bit #cpu #LOAD0 #LOAD3 #TD4
Kevin's Blog @[email protected] · 2025-09-18 · 17:45 UTC
TD4 4-bit DIY CPU
I was looking for DIY CPU projects, as I like kits that help me think at the lowest level of processing. It helps keep me grounded in how far technology has come over the years.
Some of the options that I know about, that actually come as kits you can buy and are interesting for me for DIY computers are:
- RC2014 and compatible for Z80 based computers: https://rc2014.co.uk/
- Small Computer Central for a range of Z80, Z180, computers: https://smallcomputercentral.com/
- Ben Eater’s 6502 computer: https://eater.net/6502
But I wanted to go further down and actually find something that lets me build a simple CPU from gates. Here there are several options too:
- NAND to Tetris: https://www.nand2tetris.org/ (only available via emulation)
- Ben Eater’s 8-bit computer: https://eater.net/8bit
- Gigatron 8-bit computer: https://www.tindie.com/products/johnson/gigatron-ttl-microcomputer-diy-kit/
- TD4 4-bit computer: https://github.com/wuxx/TD4-4BIT-CPU
- TD4 4-bit computer deluxe kit: https://www.budgetronics.eu/en/building-kits/td4-deluxe-kit-build-your-own-mini-cpu-with-ttl-logic/a-26091-20
- MiniMax 4-bit CPU: https://www.tindie.com/products/denjhang/minimax-4bit-cpu-td4-architecture-cpusbc/
Whilst I’d love to build Ben Eater’s 8-bit CPU, the kit as provided is too much of an outlay for me. It is ~$300 – I mean, good for what you get and all the knowledge, but it is a solderless breadboard kit and that isn’t really what I’m after. The Gigatron is a distinct possibility that I’ll come back to at some point I think.
NAND to Tetris is excellent, and I have their book, but it is all emulated or virtualised, which does allow for all the scaling required for an (arguably) actually useful device, but isn’t designed to be built in actual hardware.
But the TD4 is really interesting. It is available as a PCB and components for approx £25 on Aliexpress and based on an open source design that shows the basic operation of a 4-bit CPU.
The “deluxe” kit mentioned above is a lot more expensive ~£120 but has all signals broken out to LEDs which, whilst is an awful lot of soldering, does looks incredibly impressive! The MiniMax is an evolution of the TD4 and kits for that are around £120. In fact, searching on Tindie and Hackaday.io for “TD4” will surface a few other DIY projects and even kits to purchase.
The TD4 does seem to fit the bill for me as an inexpensive kit to try. The downside is that documentation for it (in English) is pretty sparse.
The TD4 project itself is by “wuxx” an embedded engineer from HangZhou and much of the documentation is in Chinese. It is based on a Japenese book by Kaoru Tonami called “how to build a CPU” which can be found for ~£50 online, but as I don’t know Japanese either is unlikely to help me very much.
There are some sources of information that others have put together though, so I’m going to be using those as a starting point along with whatever I can figure out myself:
- The original GitHub project (plus online translation): https://github.com/wuxx/TD4-4BIT-CPU
- Philip Zucker’s “Guide to the TD4 4-bit DIY CPU”: https://www.philipzucker.com/td4-4bit-cpu/
- Kevin Gibb’s “teardownit” “DIY 4-bit CPU”: https://teardownit.quora.com/DIY-4-bit-CPU-Have-you-ever-made-a-processor-I-did-Took-me-just-12-microchips-and-a-clock-generator-The-processor-c
This post is my own “thinking out loud” as I work through the various parts to see how they work.
Basic Architecture
This is a 4-bit computer, with a 4-bit data bus, 4-bit commands, and a 4-bit address bus.
There is a block diagram on GitHub:
The fundamental process is as follows. For each “tick” of the computer:
- An OpCode is read from the ROM using the current 4-bit address (0 to 15) from the program counter.
- Each ROM entry is an 8-bit word with 4-bits as a command and 4-bits as data for the command.
- The data selector determines a 4-bit INPUT value. This can come from one of the two registers (A or B); or a set of four switches for the IN register; or be set to zero.
- This goes to the adder which adds it with the immediate data from the ROM (which could of course be zero).
- The OUTPUT of the adder can go to either of the two registers (A or B), an OUT register which is hooked up to four LEDS, or the program counter register to create a “jump”.
I’ll pull apart the different parts of the CPU in the following sections.
ROM Format
Each 8-bit word in the 16-byte ROM has the following format:
- 4 command bits
- 4 immediate data bits
Instruction Decoding
The 4 command bits from each ROM instruction have to be turned into the various selection signals to activate different parts of the CPU.
There is a table from GitHub again:
The explanation in Japanese translates (apparently) to:
“Explanation: The SEL_B and SEL_A signals select the ALU data source, while #LOAD0-#LOAD3 select the ALU data destination. More formally, they control the source and destination operands of instructions, respectively.”
From this we can note the following:
- There is no instruction for 1000, 1010, 1100 or 1101.
- Instruction 1110 appears twice, and the selectors set are dependent on the state of the C (carry) flag.
- Some instructions act on immediate data, others assume it will be 0.
The LOAD# have the following meanings in the system:
- LOAD#0 – Register A (A)
- LOAD#1 – Register B (B)
- LOAD#2 – OUTPUT (OUT)
- LOAD#3 – Program counter (PC)
The actual decoding happens in two parts: input selection; and output selection.
Registers
The system has four registers, each formed from a 74HC161 “presettable, synchronous, 4-bit binary counter”. There are two general purpose registers: A and B. There is one output register, whose contents drive the state of four LEDs. And there is a program counter. Here is the schematic for register A:
P0-P3 come from the output of the adder directly. RST and CLK are hopefully self-explanatory. For the A and B registers, Q0-Q3 go into the INPUT selection section (see later). For the OUTPUT register, these go directly to LEDs. For the program counter, these go into the ROM address logic (again more on that later).
The relevant operation of the 161 is described in the datasheet:
“The outputs (Q0 to Q3) of the counters may be preset HIGH or LOW. A LOW at the parallel enable input (PE) disables the counting action and causes the data at the data inputs (D0 to D3) to be loaded into the counter on the positive-going edge of the clock… A LOW at the master reset input (MR) sets Q0 to Q3 LOW…”
So on reset the outputs are all 0. When PE goes LOW, on the next clock pulse, the value on the inputs (P0-P3) is loaded into the counter and reflected on Q0-Q3. However, because CET and CEP are LOW the counter won’t actually count any further.
The program counter is a bit special, in that it is actually allowed to count by having CET and CEP set HIGH. This allows it to step through the instructions, one per clock pulse.
In this case Q0-Q3 go off to the ROM address decoding, which I’ll come to in a moment.
INPUT Selection
Two SELECT lines select the INPUT data as follows:
| SEL_B | SEL_A | SOURCE |
|---|---|---|
| 0 | 0 | Register A (A) |
| 0 | 1 | Register B (B) |
| 1 | 0 | INPUT (IN) |
| 1 | 1 | Zero value (0) |
Input selection is handled by two 74HC153 dual 4-input multiplexers. Two are required as there are four data lines to be switched, and they all have one of four options to switch between based on the SELECT lines above.
Here is the relevant part of the schematic.
On the left are the three sets of four data signals that come from the A, B and IN inputs. D0 from each of the inputs goes to U7/1Cn; D1 goes to U7/2Cn; D2 to U8/1Cn; and D3 to U8/2Cn. Notice that the fourth set of data signals (U7/1C3, U7/2C3, U8/1C3 and U8/2C3) is connected directly to GND for the “zero” INPUT state (SEL_A=1, SEL_B=1).
On the right, the two pairs of outputs make up the four data lines to feed into the adder section.
So where do the SEL_A and SEL_B signals come from? From the schematic, we can see:
- SEL_A = D4 OR D7 (via U10B – one of the 74HC32 2-input OR gates)
- SEL_B = D5
We can start to explain why some of the instruction combinations don’t exist (or at least, aren’t distinct) as we can see that SEL_A depends on either D4 or D7.
OUTPUT Selection
The OUTPUT selection is a little more complicated. As previously mentioned, there are four destinations: the two registers, the OUTPUT register, and the program counter.
Each register has a /PE (“parallel enable input”) signal which is active low. These are individually fed by the output of the LOAD# logic.
The three signals at the bottom are D6, D7 and D4. The lone signal top left is the carry (/C) flag, and the four outputs top right are the four LOAD# signals which feed directly into the /PE pins of the four registers.
So from this we deduce the following relationships:
- Reg A LOAD0 HIGH = D6 OR D7 – so LOAD0 is only active (LOW) when both D6 and D7 are LOW.
- Reg B LOAD1 HIGH = NOT D6 OR D7 – so LOAD1 is only active (LOW) when D6 is HIGH and D7 is LOW.
- OUT LOAD2 LOW = NOT D6 AND D7 – so LOAD2 is only active (LOW) when D6 is LOW and D7 is HIGH.
- PC LOAD3 LOW = D6 AND D7 AND (D4 OR /C) – so LOAD3 is only active (LOW) when both D6 and D7 are HIGH and either D4 is HIGH or the carry flag (C) is LOW.
This effectively means that D6 is used to select between registers A and B when D7 is LOW; and between OUT and PC when D7 is HIGH (subject to either D4 or the CARRY too in the case of PC).
Once again, we can see that there is some redundancy in the system for certain combinations of D4 to D7.
ROM Address Decoding
The 4-bit output from the program counter is effectively a 4-bit address bus. This gets turned into a set of selection signals to select which “byte” of the ROM should be active.
This simply uses a 74HC154, 4 to 16 line decoder, meaning that a 4-bit number goes in and one of 16 corresponding outputs goes LOW whilst the rest remain HIGH. There is no memory address or matrix handling – there is literally one control line per “memory” location.
The ROM itself is a set of 16 8-way DIP switches and diodes, so once its control signal is active (LOW) then those DIP switches become relevant on the data bus. Here is the last location and data bus logic. Note that all data signals are pulled HIGH by default, so will only be read as LOW if the DIP switch connects it to LOW via the diode, and that is only possible if that DIP block is selected from the 4 to 16 line decoder.
The 74HC540 is an inverting line buffer, turning any active LOW DIP switch settings into HIGH signals on the command/data bus. Recall that D0-D3 represent immediate data and D4-D7 represent command logic.
The Adder (ALU)
The arithmetic logic unit (ALU) for this CPU is a simple adder. A 74HC283 is a 4-bit binary full adder: “full” in that it supports 4-bit add-with-carry functionality, although in this design, carry is only used on the output stage – it doesn’t form part of the input addition.
A0-A3 come from the INPUT selection circuitry, so can represent either register A or B, the state of the IN switches, or a fixed zero (0) value. B0-B3 come directly from D0-D3 of the ROM contents as selected by the ROM addressing logic.
The COUT (carry) flag goes into a flip-flop and the active LOW version of the output is used as the carry flag in the LOAD# decoding logic to support the conditional (“JUMP IF NOT CARRY”) instructions.
Some of the ROM instructions require D0-D3 to be zero in which case the adder is effectively taking the input (A, B, IN, 0) and loading it into the destination register (A, B, OUT, PC).
Notice that the adder does not use the carry in (CIN). This is tied to zero. Apparently this was left floating on an earlier revision of the board, which caused spurious results!
Putting it all Together
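Returning to our instruction table, we can see how the decoding of the D4-D7 lines leads to enacting the various commands. In particular, we can now expand the table to show how the SEL and LOAD logic results in selecting the source and destination registers as follows:
| Instruction | D7-D4 | D3-D0 | INPUT | OUTPUT |
|---|---|---|---|---|
| ADD A, data | 0000 | data | A | A |
| MOV A, B | 0001 | 0000 | B | A |
| IN A | 0010 | 0000 | IN | A |
| MOV A, data | 0011 | data | 0 | A |
| MOV B, A | 0100 | 0000 | A | B |
| ADD B, data | 0101 | data | B | B |
| IN B | 0110 | 0000 | IN | B |
| MOV B, data | 0111 | data | 0 | B |
| OUT B | 1000 | 0000 | B | OUT |
| OUT B | 1001 | 0000 | B | OUT |
| OUT data | 1010 | data | 0 | OUT |
| OUT data | 1011 | data | 0 | OUT |
| JMPC B | 1100 | data | B | PC or none (depending on /C) |
| JMP B | 1101 | data | B | PC |
| JMPC | 1110 | data | 0 | PC or none (depending on /C) |
| JMP | 1111 | data | 0 | PC |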
As per the table, we can also now infer the missing, or duplicate, instructions.
In this table, the output will always be the addition of the INPUT and D3-D0, so everywhere 0 is specified for D3-D0 then in reality a value could be placed here instead. But then the instruction would take on a different meaning.
For example, MOV A, B is really MOV A, B+data, which really only makes sense when data is set to 0 otherwise overflows are very likely to occur.
It is also worth noting that SEL_A depends on either D4 or D7, and when SEL_A is set to 1 the input can only be either register B or zero. However, to output to OUT or PC, D7 has to be set. This means that instructions that act on OUT or PC can only take an input from register B or zero.
The two JMP B instructions are going to be of limited use too. They are essentially JMP to B+data instructions. There are probably some creative uses of such instructions, but for simplicity, keeping to the “0” versions that just depend on the immediate data is probably best.
Utility Blocks
There is one section of the circuit that hasn’t been considered yet. There is a block that provides the clock and reset circuitry.
The clock is based on a Schmitt trigger oscillator and can run automatically or from a manual trigger. There are two selectable speeds: 1Hz or 10Hz.
Both the clock and reset signals feed into the four registers and the carry flip-flop.
The remaining block is the power. It has to be powered from 5V, either via the micro-USB socket or directly via a 2-pin jumper header.
Conclusion
I have one on order. I’m looking forward to building it and giving it a go!
I really like the LEDs on the deluxe version, but that is a bit too much for me just for some messing around. I am wondering, though, how difficult it would be to attempt my own version with a few extra LEDs.
Assuming I manage to get one built and working, I’ll have a poke about at some signals and see what the art of the possible might be.
Kevin
#cpu #LOAD0 #TD4
#td4 #load0 #cpu
Kevin's Blog @[email protected] · 2025-09-18 · 17:45 UTC
TD4 4-bit DIY CPU
I was looking for DIY CPU projects, as I like kits that help me think at the lowest level of processing. It helps keep me grounded in how far technology has come over the years.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
Some of the options I know about that actually come as kits you can buy, and that interest me for DIY computers, are:
- RC2014 and compatible for Z80 based computers: https://rc2014.co.uk/
- Small Computer Central for a range of Z80 and Z180 computers: https://smallcomputercentral.com/
- Ben Eater’s 6502 computer: https://eater.net/6502
- Nick Bild’s 6502 Vectron 64 computer: https://github.com/nickbild/vectron_64
But I wanted to go further down and actually find something that lets me build a simple CPU from gates. Here there are several options too:
- NAND to Tetris: https://www.nand2tetris.org/ (only available via emulation)
- Ben Eater’s 8-bit computer: https://eater.net/8bit
- Gigatron 8-bit computer: https://www.tindie.com/products/johnson/gigatron-ttl-microcomputer-diy-kit/
- TD4 4-bit computer: https://github.com/wuxx/TD4-4BIT-CPU
- TD4 4-bit computer deluxe kit: https://www.budgetronics.eu/en/building-kits/td4-deluxe-kit-build-your-own-mini-cpu-with-ttl-logic/a-26091-20
- MiniMax 4-bit CPU: https://www.tindie.com/products/denjhang/minimax-4bit-cpu-td4-architecture-cpusbc/
Whilst I’d love to build Ben Eater’s 8-bit CPU, the kit as provided is too much of an outlay for me. It is ~$300 – I mean, good for what you get and all the knowledge, but it is a solderless breadboard kit and that isn’t really what I’m after. The Gigatron is a distinct possibility that I’ll come back to at some point I think.
NAND to Tetris is excellent, and I have their book, but it is all emulated or virtualised, which does allow for all the scaling required for an (arguably) actually useful device, but isn’t designed to be built in actual hardware.
But the TD4 is really interesting. It is available as a PCB and components for approx £25 on AliExpress and is based on an open source design that shows the basic operation of a 4-bit CPU.
The “deluxe” kit mentioned above is a lot more expensive (~£120) but has all signals broken out to LEDs which, whilst an awful lot of soldering, does look incredibly impressive! The MiniMax is an evolution of the TD4 and kits for that are also around £120. In fact, searching on Tindie and Hackaday.io for “TD4” will surface a few other DIY projects and even kits to purchase.
The TD4 does seem to fit the bill for me as an inexpensive kit to try. The downside is that documentation for it (in English) is pretty sparse.
The TD4 project itself is by “wuxx”, an embedded engineer from HangZhou, and much of the documentation is in Chinese. It is based on a Japanese book by Kaoru Tonami called “how to build a CPU”, which can be found for ~£50 online, but as I don’t know Japanese either, it is unlikely to help me very much.
There are some sources of information that others have put together though, so I’m going to be using those as a starting point along with whatever I can figure out myself:
- The original GitHub project (plus online translation): https://github.com/wuxx/TD4-4BIT-CPU
- Philip Zucker’s “Guide to the TD4 4-bit DIY CPU”: https://www.philipzucker.com/td4-4bit-cpu/
- Kevin Gibb’s “teardownit” “DIY 4-bit CPU”: https://teardownit.quora.com/DIY-4-bit-CPU-Have-you-ever-made-a-processor-I-did-Took-me-just-12-microchips-and-a-clock-generator-The-processor-c
- Minoru Yamamoto’s “How to create a CPU TD4”: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html
This post is my own “thinking out loud” as I work through the various parts to see how they work.
Basic Architecture
This is a 4-bit computer, with a 4-bit data bus, 4-bit commands, and a 4-bit address bus.
There is a block diagram on GitHub:
The fundamental process is as follows. For each “tick” of the computer:
- An OpCode is read from the ROM using the current 4-bit address (0 to 15) from the program counter.
- Each ROM entry is an 8-bit word with 4-bits as a command and 4-bits as data for the command.
- The data selector determines a 4-bit INPUT value. This can come from one of the two registers (A or B); or a set of four switches for the IN register; or be set to zero.
- This goes to the adder which adds it with the immediate data from the ROM (which could of course be zero).
- The OUTPUT of the adder can go to either of the two registers (A or B), an OUT register which is hooked up to four LEDs, or the program counter register to create a “jump”.
I’ll pull apart the different parts of the CPU in the following sections.
ROM Format
Each 8-bit word in the 16-byte ROM has the following format:
- 4 command bits
- 4 immediate data bits
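As a minimal sketch of this format in Python: the function name `split_rom_word` is made up for illustration, and it assumes the command bits are the upper nibble (D7-D4) and the immediate data the lower nibble (D3-D0), which matches how the data bus is described later in the post.

```python
def split_rom_word(word):
    """Split an 8-bit ROM word into its (command, data) nibbles."""
    if not 0 <= word <= 0xFF:
        raise ValueError("ROM words are 8 bits wide")
    command = (word >> 4) & 0xF  # D7-D4: the 4 command bits
    data = word & 0xF            # D3-D0: the 4 immediate data bits
    return command, data


command, data = split_rom_word(0b0011_0101)
print(f"command={command:04b} data={data:04b}")
```

With 16 such words, the whole program fits in 16 bytes.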
Instruction Decoding
The 4 command bits from each ROM instruction have to be turned into the various selection signals to activate different parts of the CPU.
There is a table from GitHub again:
The explanation in Japanese translates (apparently) to:
“Explanation: The SEL_B and SEL_A signals select the ALU data source, while #LOAD0-#LOAD3 select the ALU data destination. More formally, they control the source and destination operands of instructions, respectively.”
From this we can note the following:
- There is no instruction for 1000, 1010, 1100 or 1101.
- Instruction 1110 appears twice, and the selectors set are dependent on the state of the C (carry) flag.
- Some instructions act on immediate data, others assume it will be 0.
The LOAD# have the following meanings in the system:
- LOAD#0 – Register A (A)
- LOAD#1 – Register B (B)
- LOAD#2 – OUTPUT (OUT)
- LOAD#3 – Program counter (PC)
The actual decoding happens in two parts: input selection; and output selection.
Registers
The system has four registers, each formed from a 74HC161 “presettable, synchronous, 4-bit binary counter”. There are two general purpose registers: A and B. There is one output register, whose contents drive the state of four LEDs. And there is a program counter. Here is the schematic for register A:
P0-P3 come from the output of the adder directly. RST and CLK are hopefully self-explanatory. For the A and B registers, Q0-Q3 go into the INPUT selection section (see later). For the OUTPUT register, these go directly to LEDs. For the program counter, these go into the ROM address logic (again more on that later).
The relevant operation of the 161 is described in the datasheet:
“The outputs (Q0 to Q3) of the counters may be preset HIGH or LOW. A LOW at the parallel enable input (PE) disables the counting action and causes the data at the data inputs (D0 to D3) to be loaded into the counter on the positive-going edge of the clock… A LOW at the master reset input (MR) sets Q0 to Q3 LOW…”
So on reset the outputs are all 0. When PE goes LOW, on the next clock pulse, the value on the inputs (P0-P3) is loaded into the counter and reflected on Q0-Q3. However, because CET and CEP are LOW the counter won’t actually count any further.
The program counter is a bit special, in that it is actually allowed to count by having CET and CEP set HIGH. This allows it to step through the instructions, one per clock pulse.
In this case Q0-Q3 go off to the ROM address decoding, which I’ll come to in a moment.
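The register behaviour described above can be approximated with a small behavioural model. This is only a sketch: the class name `Counter161` is hypothetical, CET and CEP are folded into a single `count_enable` flag, and the master reset is modelled as a plain method call rather than a pin.

```python
class Counter161:
    """Simplified behavioural model of a 74HC161 as used in the TD4."""

    def __init__(self, count_enable):
        # Stands in for CET and CEP: LOW for A, B and OUT, HIGH for the PC.
        self.count_enable = count_enable
        self.q = 0  # Q0-Q3 as a 4-bit integer

    def reset(self):
        """A LOW on the master reset sets Q0-Q3 LOW."""
        self.q = 0

    def clock(self, pe_n, p):
        """Positive clock edge; pe_n is the active-LOW parallel enable."""
        if pe_n == 0:
            self.q = p & 0xF              # load P0-P3
        elif self.count_enable:
            self.q = (self.q + 1) & 0xF   # count, wrapping from 15 to 0
        # otherwise the register simply holds its value


reg_a = Counter161(count_enable=False)
pc = Counter161(count_enable=True)

reg_a.clock(pe_n=0, p=0b1010)  # load 10 into A
reg_a.clock(pe_n=1, p=0b0001)  # not selected: A holds its value
pc.clock(pe_n=1, p=0)          # PC counts to 1
pc.clock(pe_n=1, p=0)          # PC counts to 2
pc.clock(pe_n=0, p=0b1100)     # PC is loaded with 12, i.e. a jump
print(reg_a.q, pc.q)
```

The same part therefore acts as a plain latch for A, B and OUT, and as a loadable counter for the program counter.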
INPUT Selection
Two SELECT lines select the INPUT data as follows:
| SEL_B | SEL_A | SOURCE |
|---|---|---|
| 0 | 0 | Register A (A) |
| 0 | 1 | Register B (B) |
| 1 | 0 | INPUT (IN) |
| 1 | 1 | Zero value (0) |
Input selection is handled by two 74HC153 dual 4-input multiplexers. Two are required as there are four data lines to be switched, and they all have one of four options to switch between based on the SELECT lines above.
Here is the relevant part of the schematic.
On the left are the three sets of four data signals that come from the A, B and IN inputs. D0 from each of the inputs goes to U7/1Cn; D1 goes to U7/2Cn; D2 to U8/1Cn; and D3 to U8/2Cn. Notice that the fourth set of data signals (U7/1C3, U7/2C3, U8/1C3 and U8/2C3) is connected directly to GND for the “zero” INPUT state (SEL_A=1, SEL_B=1).
On the right, the two pairs of outputs make up the four data lines to feed into the adder section.
So where do the SEL_A and SEL_B signals come from? From the schematic, we can see:
- SEL_A = D4 OR D7 (via U10B – one of the 74HC32 2-input OR gates)
- SEL_B = D5
We can start to explain why some of the instruction combinations don’t exist (or at least, aren’t distinct) as we can see that SEL_A depends on either D4 or D7.
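The input selection can be sketched in a few lines of Python. The function names are hypothetical, and the opcode is assumed to be passed as a 4-bit integer in D7 D6 D5 D4 order, so D4 is bit 0.

```python
def select_lines(opcode):
    """Return (SEL_B, SEL_A) for a 4-bit opcode (D7 D6 D5 D4)."""
    d4 = opcode & 1
    d5 = (opcode >> 1) & 1
    d7 = (opcode >> 3) & 1
    sel_a = d4 | d7   # the 74HC32 OR gate
    sel_b = d5
    return sel_b, sel_a


def input_mux(sel_b, sel_a, reg_a, reg_b, in_switches):
    """Model of the two 74HC153s: choose one 4-bit source."""
    sources = (reg_a, reg_b, in_switches, 0)  # index = SEL_B * 2 + SEL_A
    return sources[(sel_b << 1) | sel_a] & 0xF


for opcode in (0b0000, 0b0001, 0b0010, 0b0011, 0b1000):
    sel_b, sel_a = select_lines(opcode)
    value = input_mux(sel_b, sel_a, reg_a=0xA, reg_b=0xB, in_switches=0x5)
    print(f"{opcode:04b}: SEL_B={sel_b} SEL_A={sel_a} input={value:X}")
```

Running this shows opcode 1000 selecting register B exactly as 0001 does, because D7 forces SEL_A to 1.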
OUTPUT Selection
The OUTPUT selection is a little more complicated. As previously mentioned, there are four destinations: the two registers, the OUTPUT register, and the program counter.
Each register has a /PE (“parallel enable input”) signal which is active low. These are individually fed by the output of the LOAD# logic.
The three signals at the bottom are D6, D7 and D4. The lone signal top left is the carry (/C) flag, and the four outputs top right are the four LOAD# signals which feed directly into the /PE pins of the four registers.
So from this we deduce the following relationships:
- Reg A LOAD0 HIGH = D6 OR D7 – so LOAD0 is only active (LOW) when both D6 and D7 are LOW.
- Reg B LOAD1 HIGH = NOT D6 OR D7 – so LOAD1 is only active (LOW) when D6 is HIGH and D7 is LOW.
- OUT LOAD2 LOW = NOT D6 AND D7 – so LOAD2 is only active (LOW) when D6 is LOW and D7 is HIGH.
- PC LOAD3 LOW = D6 AND D7 AND (D4 OR /C) – so LOAD3 is only active (LOW) when both D6 and D7 are HIGH and either D4 is HIGH or the carry signal (/C) is HIGH, i.e. the carry flag (C) is not set.
This effectively means that D6 is used to select between registers A and B when D7 is LOW; and between OUT and PC when D7 is HIGH (subject to either D4 or the /C signal too in the case of PC).
Once again, we can see that there is some redundancy in the system for certain combinations of D4 to D7.
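The four relationships above translate directly into boolean expressions. The following Python sketch uses a hypothetical `load_signals` function, takes the opcode as a 4-bit integer in D7 D6 D5 D4 order, and returns the active-LOW LOAD lines, so 0 means "this register is loaded on the next clock".

```python
def load_signals(opcode, c_n):
    """Return (LOAD0, LOAD1, LOAD2, LOAD3), all active LOW.

    opcode is D7 D6 D5 D4 as a 4-bit integer; c_n is the /C signal.
    """
    d4 = opcode & 1
    d6 = (opcode >> 2) & 1
    d7 = (opcode >> 3) & 1
    load0 = d6 | d7                        # register A
    load1 = (1 - d6) | d7                  # register B
    load2 = 1 - ((1 - d6) & d7)            # OUT
    load3 = 1 - (d6 & d7 & (d4 | c_n))     # program counter
    return load0, load1, load2, load3


cases = ((0b0000, 1), (0b0100, 1), (0b1000, 1),
         (0b1110, 0), (0b1110, 1), (0b1111, 0))
for opcode, c_n in cases:
    print(f"{opcode:04b} /C={c_n} ->", load_signals(opcode, c_n))
```

At most one line is ever LOW, and for opcode 1110 with /C = 0 none of them is, so nothing is loaded and the program counter just counts on.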
ROM Address Decoding
The 4-bit output from the program counter is effectively a 4-bit address bus. This gets turned into a set of selection signals to select which “byte” of the ROM should be active.
This simply uses a 74HC154, 4 to 16 line decoder, meaning that a 4-bit number goes in and one of 16 corresponding outputs goes LOW whilst the rest remain HIGH. There is no memory address or matrix handling – there is literally one control line per “memory” location.
The ROM itself is a set of 16 8-way DIP switches and diodes, so once its control signal is active (LOW) then those DIP switches become relevant on the data bus. Here is the last location and data bus logic. Note that all data signals are pulled HIGH by default, so will only be read as LOW if the DIP switch connects it to LOW via the diode, and that is only possible if that DIP block is selected from the 4 to 16 line decoder.
The 74HC540 is an inverting line buffer, turning any active LOW DIP switch settings into HIGH signals on the command/data bus. Recall that D0-D3 represent immediate data and D4-D7 represent command logic.
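To make the signal polarities concrete, here is a rough Python model of the address decoder, the pulled-up data lines and the inverting buffer. The function names and the representation of each DIP block as an integer are illustrative only, and it assumes that a closed switch corresponds to a programmed 1.

```python
def decode_154(address):
    """4-to-16 line decoder: the selected output is LOW, the rest HIGH."""
    return [0 if line == (address & 0xF) else 1 for line in range(16)]


def read_rom(dip_blocks, address):
    """Return the 8-bit word seen on the command/data bus.

    dip_blocks holds 16 integers; bit n set means switch n is closed.
    """
    bus = 0xFF  # all eight data lines are pulled HIGH by default
    for block, select_n in zip(dip_blocks, decode_154(address)):
        if select_n == 0:
            # Closed switches on the selected block pull their lines LOW.
            bus &= ~block & 0xFF
    return ~bus & 0xFF  # the 74HC540 inverts the lines


rom = [0x00] * 16
rom[3] = 0b1011_0111
print(f"{read_rom(rom, 3):08b}")
print(f"{read_rom(rom, 4):08b}")
```

The double inversion (active-LOW diode matrix, then inverting buffer) means the bus carries the switch pattern as set.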
The Adder (ALU)
The arithmetic logic unit (ALU) for this CPU is a simple adder. A 74HC283 is a 4-bit binary full adder: “full” in that it supports 4-bit add-with-carry functionality, although in this design, carry is only used on the output stage – it doesn’t form part of the input addition.
A0-A3 come from the INPUT selection circuitry, so can represent either register A or B, the state of the IN switches, or a fixed zero (0) value. B0-B3 come directly from D0-D3 of the ROM contents as selected by the ROM addressing logic.
The COUT (carry) flag goes into a flip-flop and the active LOW version of the output is used as the carry flag in the LOAD# decoding logic to support the “JUMP IF NOT CARRY” instruction. So returning to the logic of #LOAD3, we have:
```
  COUT    /C    D4   D6   D7    LOAD3
    0      1     X    1    1      0    -> Dst = PC
    X      X     1    1    1      0    -> Dst = PC
```
Hence a jump will only happen (i.e. the PC gets loaded) when D6 and D7 are both 1 and either D4 is 1 (unconditional), or D4 is 0 and the CARRY flag was NOT set by the adder, resulting in /C = 1 (conditional).
Some of the ROM instructions require D0-D3 to be zero in which case the adder is effectively taking the input (A, B, IN, 0) and loading it into the destination register (A, B, OUT, PC).
Notice that the adder does not use the carry in (CIN). This is tied to zero. Apparently this was left floating on an earlier revision of the board, which caused spurious results!
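The adder itself is easy to model. This Python sketch (the name `add4` is made up) returns the 4-bit sum and the carry out, with the carry in defaulting to 0 as on the board.

```python
def add4(a, b, carry_in=0):
    """4-bit add in the style of a 74HC283: returns (sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


print(add4(3, 4))    # no carry
print(add4(15, 1))   # wraps round to 0 with carry out set
print(add4(11, 0))   # immediate data of 0: the input passes straight through
```

The last case is what makes the MOV, IN and OUT style instructions work: adding zero simply copies the selected input to the destination.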
Putting it all Together
The complete truth table for the SEL, D4-7 and LOAD signals is as follows.
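| Instruction | D7-D4 | SEL_B | SEL_A | D4 | D5 | D6 | D7 | LD0/A | LD1/B | LD2/OP | LD3/PC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADD A,i | 0000 | L | L | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV A,B | 0001 | L | H | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| IN A | 0010 | H | L | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV A,i | 0011 | H | H | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| MOV B,A | 0100 | L | L | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| ADD B,i | 0101 | L | H | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| IN B | 0110 | H | L | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| MOV B,i | 0111 | H | H | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| | 1000 | L | H | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| OUT B | 1001 | L | H | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| | 1010 | H | H | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| OUT i | 1011 | H | H | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| | 1100 | L | H | 0 | 0 | 1 | 1 | 1 | 1 | 1 | =C |
| | 1101 | L | H | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| JNC | 1110 | H | H | 0 | 1 | 1 | 1 | 1 | 1 | 1 | =C |
| JMP | 1111 | H | H | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |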
Returning to our instruction table, we can see how the decoding of the D4-D7 lines leads to enacting the various commands. In particular, we can now expand the table to show how the SEL and LOAD logic results in selecting the source and destination registers as follows:
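| Instruction | D7-D4 | D3-D0 | INPUT | OUTPUT |
|---|---|---|---|---|
| ADD A, data | 0000 | data | A | A |
| MOV A, B | 0001 | 0000 | B | A |
| IN A | 0010 | 0000 | IN | A |
| MOV A, data | 0011 | data | 0 | A |
| MOV B, A | 0100 | 0000 | A | B |
| ADD B, data | 0101 | data | B | B |
| IN B | 0110 | 0000 | IN | B |
| MOV B, data | 0111 | data | 0 | B |
| OUT B * | 1000 | 0000 | B | OUT |
| OUT B | 1001 | 0000 | B | OUT |
| OUT data * | 1010 | data | 0 | OUT |
| OUT data | 1011 | data | 0 | OUT |
| JNC B * | 1100 | data | B | PC or none (depending on /C) |
| JMP B * | 1101 | data | B | PC |
| JNC | 1110 | data | 0 | PC or none (depending on /C) |
| JMP | 1111 | data | 0 | PC |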
As per the table, we can also now infer the missing, or duplicate, instructions (marked * above).
In this table, the output will always be the addition of the INPUT and D3-D0, so everywhere 0 is specified for D3-D0 then in reality a value could be placed here instead. But then the instruction would take on a different meaning.
For example, MOV A, B is really MOV A, B+data, which really only makes sense when data is set to 0 otherwise overflows are very likely to occur.
It is also worth noting that SEL_A depends on either D4 or D7, and when SEL_A is set to 1 the input can only be either register B or zero. However, to output to OUT or PC, D7 has to be set. This means that instructions that act on OUT or PC can only take an input from register B or zero.
The two JMP B instructions are going to be of limited use too. They are essentially JMP to B+data instructions. There are probably some creative uses of such instructions, but for simplicity, keeping to the “0” versions that just depend on the immediate data is probably best.
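Pulling the pieces together, the whole CPU can be modelled as one function that advances the machine by a single clock tick. This is a sketch based purely on the decoding logic worked out in this post, not code from the TD4 project: the `step` function and the dictionary layout are invented for illustration, and it assumes the carry flip-flop is clocked on the same edge as the registers, so a conditional jump sees the carry produced by the previous instruction.

```python
def step(state, rom, in_switches=0):
    """Advance the model by one clock tick.

    state is a dict with keys a, b, out, pc and carry.
    rom is a list of 16 eight-bit words.
    """
    word = rom[state["pc"]]
    opcode, data = word >> 4, word & 0xF
    d4, d5, d6, d7 = [(opcode >> n) & 1 for n in range(4)]

    # INPUT selection: SEL_A = D4 OR D7, SEL_B = D5.
    sel_a, sel_b = d4 | d7, d5
    sources = (state["a"], state["b"], in_switches & 0xF, 0)
    source = sources[(sel_b << 1) | sel_a]

    # Adder: selected input plus immediate data, carry in tied to 0.
    total = source + data
    result, carry_out = total & 0xF, total >> 4

    # OUTPUT selection, using the carry latched on the previous tick.
    c_n = 1 - state["carry"]
    next_pc = (state["pc"] + 1) & 0xF
    if not d7:
        state["b" if d6 else "a"] = result
    elif not d6:
        state["out"] = result
    elif d4 | c_n:
        next_pc = result  # PC is loaded instead of counting: a jump

    state["pc"] = next_pc
    state["carry"] = carry_out
    return state


# ADD B,1 ; OUT B ; JMP 0 : counts upwards on the OUT LEDs.
rom = [0x51, 0x90, 0xF0] + [0x00] * 13
state = {"a": 0, "b": 0, "out": 0, "pc": 0, "carry": 0}
for _ in range(9):
    step(state, rom)
print(state)
```

Nine ticks run the three-instruction loop three times, so both B and OUT end up holding 3 with the program counter back at 0.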
Utility Blocks
There is one section of the circuit that hasn’t been considered yet. There is a block that provides the clock and reset circuitry.
The clock is based on a Schmitt trigger oscillator and can run automatically or from a manual trigger. There are two selectable speeds: 1Hz or 10Hz.
Both the clock and reset signals feed into the four registers and the carry flip-flop.
The remaining block is the power. It has to be powered from 5V, either via the micro-USB socket or directly via a 2-pin jumper header.
Conclusion
I have one on order. I’m looking forward to building it and giving it a go!
I really like the LEDs on the deluxe version, but that is a bit too much for me just for some messing around. I am wondering, though, how difficult it would be to attempt my own version with a few extra LEDs.
Assuming I manage to get one built and working, I’ll have a poke about at some signals and see what the art of the possible might be.
Kevin
#4bit #cpu #load0 #load3 #td4
#cpu #td4 #load0 #load3 #4bit
faraiwe @[email protected] · 2024-08-07 · 15:30 UTC

#TropicalStormDebby #TD4 still wreaking havoc.
About 35-40 miles off the coast of #SouthCarolina, and veering into land, projected to cruise over #NorthCarolina, then turn into a Tropical Depression as it moves onto #Virginia, #Maryland and #Pennsylvania. Expected to go as TD all the way to SE #Canada coast, into banks off #NovaScotia, Gulf of St Lawrence.
Coastal surges expected at about 3ft, and potential for flash flooding.
Be safe, don't underestimate the storm.
#weather #NHC #NOAA

#weather #nhc #noaa #tropicalstormdebby #td4 #southcarolina
faraiwe @[email protected] · 2024-08-05 · 22:47 UTC

#TropicalStormDebby (formerly known as #HurricaneDebby #TD4 #IN97L ) is passing over NE #Florida
#weather #NHC #NOAA

#tropicalstormdebby #hurricanedebby #td4 #in97l #florida #weather
faraiwe @[email protected] · 2024-08-05 · 12:59 UTC

#HurricaneDebby Made landfall, as a Cat1, along the Big Bend Wildlife Management Area. Currently over Upper Steinhatchee Conservation Area.
It is expected to revert to a #TropicalStorm soon, as it approaches town of Live Oak, Fla.
#Florida #NHC #NOAA #TD4 #IN97L

#hurricanedebby #tropicalstorm #florida #nhc #noaa #td4