I might have to give up on the "RAM issues" motherboard

Views: 50,848

Days ago

Comments: 245

@PP-xy9bg 7 months ago

Check the old tantalum caps that decouple the RAM. When you load the bus with more RAM chips, you get brownouts, because these old RAMs are quite power hungry. I think changing the tantalums will fix the issue.

@bentboybbz 7 months ago

That's something I always wondered about myself. I've run across many situations where the power was causing issues with not only new electronics but even older stuff. Once I worked on a house that had fried every LED light, including the ones built into their bougie azz touch mirrors, fried the stove, five expensive fans, four expensive thermostats, and much more. The power company came out and said it was inside the house. After a lot of troubleshooting, and finding 240v on the metal breaker panels too, I found out the main disconnect the power company had installed (and charged my customer for, because he has a large automatic generator) was installed incorrectly and had fried and melted into the disconnect... It had no Noalox on the source or load and was overtightened badly. I replaced it myself properly, at my customer's cost; he didn't want them doing anything else to his house lol. I'd love to see the power on the oscilloscope. If it's a little nasty, or something else in the house is feeding back, it could cause some intermittent problems when troubleshooting. Thank You For Your Time And Effort!

@SyldabiaHacks 7 months ago

Can be. It's a job to do and test results.

@suvetar 7 months ago

Sneaky - I love your thinking though! Genuine question - and please bear in mind, all I know about electronics is what I've gleaned from fanatically following Adrian's videos! ... Is there a way you can test certain components on a board without desoldering them, by using very high-ohm resistors between the probes and the rest of the circuit? Or would the natural flow still make the electrons take the path of least resistance?

@urgtuiop5455 7 months ago

@suvetar You can do in-circuit testing, but sometimes not as easily as just adding a big R. Depending on the circuit you can drop voltages to below Vbe-on (transistor) or Vf (diode) i.e.

@lowrybt1 7 months ago

I had a similar issue with an old OSI C8P. Bad cap adjacent to RAM bank or cap with lead just slightly broken at solder point on MB.

@Anaerin 7 months ago

The phantom "Game Port" and "Clock", and the lack of BASIC ROM, indicate to me that something weird is happening with the address lines - the BIOS is reading something from the Game Port address (201h) and the clock port and getting a value, and is trying to read from the BASIC ROM address and not getting anything. And the second bank of RAM causing errors tells me it's at least an address between 256k and 512k (address lines 10 and 11 - 0x201 is also on line 11).

@michaelturner2806 7 months ago

I don't know anything about repair, but this sounds like a good line to go down. The ram errors change every time, and with only 256k haven't shown up, but the extra hardware detected and inability to boot basic have been consistent the whole time. If he comes back to this, maybe set aside the ram error and try to tackle those other problems, and might solve the ram parity error along the way.

@DarkAlaranth 7 months ago

Kind of wonder if there's still corrosion crud under that ISA socket he mentioned. A slightly conductive link between two pins, perhaps?

@marcinmiklaszewski9336 7 months ago

Yep, I'd surely check those higher address lines for cross-talk.

@daybyter 7 months ago

Could you close in on this by adding your own test routines to a customized test rom?

@suvetar 7 months ago

How might you capture the address bus for 201h?

@MonochromeWench 7 months ago

The detected gameport and clock really seem to me like the best indication of where things are going wrong. I really think some IO signals are ending up at the RAM. This seems like a problem best solved with a logic analyser, or at least a quad-channel scope on those IO and RAM R/W lines, to see if any are enabled at the same time. Might also try just disabling RAM bank 2 on the DIP switch but leaving the chips installed and seeing if that helps. If it has issues with just the chips installed but the bank disabled with the DIP switch, then it might be a power issue as suggested by others, or an addressing problem that might be seen with a multichannel scope on the CAS/RAS lines of both banks. It might also help to pull out some of the bank 2 chips until it starts working.

@noahhill8483 7 months ago

There's a good chance that's correct. What I'm assuming is if the system is reading 201h (the game port) that tells me that different signals from somewhere are ending up where they aren't supposed to be. And if that's happening for the gameport and clock, I bet if anything that those same signals are ending up in RAM. I'm looking through the assembly code of the BIOS RN to see if I can pinpoint anything that could narrow it down further

@FrancisFjordCupola 7 months ago

Imagine if a bad board went around from retro channel to retro channel until finally someone manages to solve it... would be a thriller...

@c1ph3rpunk 7 months ago

That’s a (series of) crossover episodes I’d love to see.

@danieldawson4937 7 months ago

First time I've seen Adrian stumped by a hardware issue, yet still equally enjoyed the content. Thanks for sharing!

@urgtuiop5455 7 months ago

Desolder and check under the edge connector where the corrosion was. There's a bunch of address lines there, plus ALE.

@roboftherock 7 months ago

I admire your tenacity in trying to resolve this problem.

@tony359 7 months ago

I thoroughly enjoyed this video while swapping about 100 SMD capacitors on a board. Kind of therapeutic :) I love the in-depth discussion, I really look forward to a solution though I appreciate it might not be the case of course, so no pressure!

@ser_olmy 7 months ago

Adrian, you are definitely on to something regarding the I/O addressing. The fact that the BIOS consistently detects a non-existing clock and a game port is a major hint that something is messed up with I/O, and the most likely explanation for the BIOS detecting these non-devices is that at least some I/O reads (and probably writes as well) get redirected/mirrored to RAM. If I/O access gets mangled up with RAM access you'd expect some RAM addresses to work and others not, depending on which I/O addresses are in use. And since I/O addressing wraps around at $0400 on ISA systems, the issue would affect basically every memory bank. It's interesting to note that both the CGA card and the XT-IDE work, which you wouldn't expect if all I/O access was mangled. However, if the I/O issue is limited to certain onboard components using I/O, of which there are precious few on a PC XT, you'd get symptoms very much like the ones you're seeing. Unfortunately, the BIOS doesn't report exactly on which I/O addresses it believes it's seeing a clock, but while XT clocks were non-standard and could use a wide range of ports, the game port sits squarely at hex 201. If you can somehow track down the component responsible for the phantom I/O devices, you'll probably also have found the reason why RAM errors are being reported. How about probing pin 28 (IO/M) on the CPU, and then tracing the signal through the various chips on the board to see if either memory or I/O access is being selected when it shouldn't be? Perhaps a tiny program that just runs in a loop with interrupts disabled could help narrow it down?
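A minimal sketch of the wrap-around point above, assuming only address lines A0-A9 take part in I/O decoding (which is what "wraps around at $0400" implies). The function names are made up for illustration; this models the decoding idea, not this particular board.

```python
IO_DECODE_MASK = 0x3FF  # assumption: only A0-A9 are decoded for I/O


def decoded_port(port: int) -> int:
    """Return the port number as a 10-bit I/O decoder would see it."""
    return port & IO_DECODE_MASK


def aliases(port: int, limit: int = 0x1000) -> list:
    """List every port below `limit` that decodes to the same 10-bit port."""
    target = decoded_port(port)
    return [p for p in range(limit) if decoded_port(p) == target]


if __name__ == "__main__":
    # Every address printed here would select the game port at 201h.
    print([hex(p) for p in aliases(0x201)])
```

Under that assumption the game port repeats every 400h, so any stray signal on the upper address lines makes no difference to which I/O device answers.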

@herauthon 7 months ago

If the I/O signals for IDE are OK... might it be a rogue IRQ?

@vchris31415 7 months ago

I agree that it looks like there is something with the decoding of the IO/Memory signals. The XT has the 8088 in maximum mode, so S0, S1 and S2 on the 8088 are sent to the 8288 for IO/M decoding. Four signals come off the 8288 - IOR, IOW, MEMR, MEMW. Those lines go to sheet 5 and 10 on the 5160 schematic - to U13 on sheet five and the expansion ports on sheet ten. U13 output goes to a bunch of sheets - I like sheet 8, which has U58 (74LS32) buffering data from the 8253. I think that the 8253 is being activated when it shouldn't be.
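For anyone following along, here is a rough sketch of the maximum-mode decode described above. The table uses the status encoding as I understand it from the Intel 8088/8288 datasheets, with the command names shortened to the ones used in the comment (IOR, IOW, MEMR, MEMW); the helper names are hypothetical. The last function shows what a single stuck status line would do, purely as an illustration and not as a diagnosis of this board.

```python
# (S2, S1, S0) pin levels -> (bus cycle, command line the 8288 asserts)
STATUS_DECODE = {
    (0, 0, 0): ("interrupt acknowledge", "INTA"),
    (0, 0, 1): ("read I/O port", "IOR"),
    (0, 1, 0): ("write I/O port", "IOW"),
    (0, 1, 1): ("halt", None),
    (1, 0, 0): ("instruction fetch", "MEMR"),
    (1, 0, 1): ("read memory", "MEMR"),
    (1, 1, 0): ("write memory", "MEMW"),
    (1, 1, 1): ("passive", None),
}


def bus_command(s2: int, s1: int, s0: int):
    """Return the command line asserted for a given status code, or None."""
    return STATUS_DECODE[(s2, s1, s0)][1]


def with_line_forced(status, index: int, level: int):
    """Copy of (S2, S1, S0) with one line forced to `level` (index 0 is S2)."""
    forced = list(status)
    forced[index] = level
    return tuple(forced)


if __name__ == "__main__":
    memory_read = (1, 0, 1)
    print(bus_command(*memory_read))
    # Hypothetical fault: S2 held low turns a memory read into an I/O read.
    print(bus_command(*with_line_forced(memory_read, 0, 0)))
```

The point of the second print is that memory and I/O cycles differ by only one status bit, so a marginal signal there could plausibly blur the two.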

@FRKNetwork 7 months ago

It seems likely that whatever is causing your phantom game and clock port is also causing the RAM and parity errors. I dug into the Turbo XT BIOS source a bit after watching this episode. Both the game port and clock detection code work by reading an I/O device via the "in" instruction. To check for the game port, for example, the BIOS reads from the I/O device at 201h 100 times. If it ever gets something other than 0xFF it decides that a game port is present. Interestingly, the CPU talking to an I/O device is nearly identical to talking to memory. The main difference between the two is the behavior of pin 28 - IO/M. It's high for an IO operation and low for a memory operation. That tells me that whatever is misbehaving on that board is misbehaving for both memory and IO operations and that whatever it is fails consistently when checking for the game port or clock. The code to check for the clock is more interesting. It checks for a clock at 2C0h, 240h, and 340h, but it only checks each ONCE. That would imply a consistent failure if the clock is always detected. Could you maybe have a bad CPU? Or something is shorting pin 28 on the CPU to ground? I'm also suspicious about the machine often doing a warm boot after a power cycle.
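The game port check described above can be sketched roughly like this. It is a Python model of the logic as the comment describes it (100 reads of port 201h, anything other than 0xFF counts as "present"), not the BIOS code itself. The bus models are hypothetical stand-ins for the real hardware.

```python
from typing import Callable

GAME_PORT = 0x201    # port number quoted in the comment above
FLOATING_BUS = 0xFF  # what an undriven, pulled-up data bus should read as


def game_port_detected(read_port: Callable[[int], int], tries: int = 100) -> bool:
    """Return True if any of `tries` reads of port 201h differs from 0xFF."""
    return any(read_port(GAME_PORT) != FLOATING_BUS for _ in range(tries))


def healthy_bus(port: int) -> int:
    """Hypothetical empty bus: every data bit floats high."""
    return 0xFF


def bus_with_d3_low(port: int) -> int:
    """Hypothetical fault: data bit 3 always reads low."""
    return 0xF7


class OneGlitchBus:
    """Hypothetical fault: reads 0xFF except on one specific read."""

    def __init__(self, glitch_on_read: int):
        self.glitch_on_read = glitch_on_read
        self.reads = 0

    def __call__(self, port: int) -> int:
        self.reads += 1
        return 0xFE if self.reads == self.glitch_on_read else 0xFF


if __name__ == "__main__":
    print(game_port_detected(healthy_bus))
    print(game_port_detected(bus_with_d3_low))
    print(game_port_detected(OneGlitchBus(57)))
```

A single bad read out of 100 is enough to report a game port, so this check is sensitive to even rare glitches, while a one-shot check like the clock probe would need a more consistent fault.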

@DanieleMamelicoolestaura516 7 months ago

I noticed the added/removed peripherals in the last video, but I forgot to leave a comment! I love your videos!

@fredknox2781 7 months ago

To refresh a DRAM, it is not necessary to run through every address. A whole row of bits is refreshed at once, so it is only necessary to cycle through all the row addresses. So with every refresh operation, one of the rows is selected by the address lines and the row-address-strobe line is toggled. In a 41256, only 256 rows need to be refreshed every 4 ms.
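The arithmetic behind those numbers, as a quick sketch. The 256 rows and 4 ms come from the comment above. The timer figures are an assumption on my part: the stock IBM PC/XT drives refresh from a timer channel clocked at about 1.193182 MHz with a divisor of 18, and this clone board may be set up differently.

```python
REFRESH_ROWS = 256       # rows to refresh, per the comment above
REFRESH_WINDOW_S = 4e-3  # all rows must be refreshed within 4 ms

# Assumptions (stock IBM PC/XT values, not verified for this clone board):
PIT_CLOCK_HZ = 1_193_182
REFRESH_DIVISOR = 18


def max_refresh_interval_us(rows: int = REFRESH_ROWS,
                            window_s: float = REFRESH_WINDOW_S) -> float:
    """Longest allowed gap between row refreshes, in microseconds."""
    return window_s / rows * 1e6


def timer_refresh_interval_us(clock_hz: int = PIT_CLOCK_HZ,
                              divisor: int = REFRESH_DIVISOR) -> float:
    """Gap between refresh requests for a given timer clock and divisor."""
    return divisor / clock_hz * 1e6


if __name__ == "__main__":
    print(f"required: one row every {max_refresh_interval_us():.3f} us or faster")
    print(f"assumed timer: one row every {timer_refresh_interval_us():.3f} us")
```

With those assumed timer values the refresh runs slightly faster than the DRAM needs, so a missed or delayed refresh cycle now and then would eat into a fairly thin margin.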

@siberx4 7 months ago

If your cameras are running out of battery, I suggest picking up some dummy batteries. They're usually USB powered and most cameras have some kind of hole/slot in the door so you can hook them up to power with the battery door closed, and you can then run the cameras indefinitely on a bench setup like this. As a secondary benefit, because you're not discharging a lithium ion battery inside the camera housing, they tend to stay a bit cooler so if you ever have overheating problems recording for long time spans, dummy batteries can help somewhat to get longer recordings before the camera needs to cool off.

@FaithyJo 4 months ago

The machine spirit is with you Adrian

@insanelydigitalvids 7 months ago

Please don't take this the wrong way(!) but your frustration is our entertainment. Pure geekery at its finest. Thanks, Adrian 🙂

@Walczyk 7 months ago

i disagree, i want adrian to be happy

@KevinDotDay 7 months ago

Sorry if I missed that you've already looked at this, but check the HOLD line going into the CPU. This tells the CPU to wait while DMA is happening. If that isn't happening, the CPU memory accesses won't wait for DMA (like the dram refresh) and they'll occasionally conflict and cause crazy intermittent corruption like this.

@andyhu9542 7 months ago

Worth looking into. However, I would expect such a fault would cause the bus to conflict like crazy and not work at all. There's no way the board can boot into DOS with this fault present.

@KevinDotDay 7 months ago

@andyhu9542 Agreed, if it was totally dead it would probably be far worse. But I think he's made some good guesses that this is potentially connected to refresh circuitry, and I've seen similar weird corruption when the CPU wasn't holding correctly. It might not be reliably generated (isn't there an inverter in line there?) or a bad socket is making it slow to transition. Worst case it's a good place to trace backwards from. I'm not betting on my own suggestion, but I would be surprised if it was just totally missing or something. It's just where I'd look next and go from there.

@RuSrsbro 7 months ago

I've had issues with the hold line going wacky when the system attempted to access a drive. I believe it was on a 5150-style computer, but I can't remember when.

@TomStorey96 7 months ago

Refresh is initiated by strobing a row address into the DRAM using RAS. That causes the DRAM to latch an entire row, which then gets written back at the end of the cycle. CAS isn't necessary unless you then want to read or write a column, so a refresh is basically an access cycle without the access. This is usually known as "RAS only" refresh. Newer DRAMs can also do what is called a "CAS before RAS" refresh whereby strobing CAS before then strobing RAS causes the DRAM to refresh a row based on an internal row counter. In this mode you don't have to supply a row address because the DRAM takes care of that detail itself.
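A toy model of the two refresh styles described above, in case it helps to see them side by side. It only tracks which rows have been refreshed; timing and the actual pin behaviour are left out, and the class and method names are invented for this sketch.

```python
class DramRefreshModel:
    """Tracks which rows of a DRAM have been refreshed (no timing modelled)."""

    def __init__(self, rows: int = 256):
        self.rows = rows
        self.refreshed = set()
        self._row_counter = 0  # internal counter used only by CAS-before-RAS

    def ras_only_refresh(self, row: int) -> None:
        """Controller supplies the row address; CAS is never strobed."""
        self.refreshed.add(row % self.rows)

    def cas_before_ras_refresh(self) -> None:
        """DRAM picks the row itself from its internal counter."""
        self.refreshed.add(self._row_counter)
        self._row_counter = (self._row_counter + 1) % self.rows

    def fully_refreshed(self) -> bool:
        return len(self.refreshed) == self.rows


if __name__ == "__main__":
    ras_only = DramRefreshModel()
    for row in range(256):           # the controller has to count rows itself
        ras_only.ras_only_refresh(row)

    cbr = DramRefreshModel()
    for _ in range(256):             # no address needed from the controller
        cbr.cas_before_ras_refresh()

    print(ras_only.fully_refreshed(), cbr.fully_refreshed())
```

The practical difference is where the row counter lives: with RAS-only refresh, whatever generates the row addresses on the board has to cover every row, so a fault in that path can leave some rows unrefreshed.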

@networkg 7 months ago

It is enjoyable to see you apply logic, schematics and a red pointer to develop a theory of attack to solve these retro problems.

@krzbrew 7 months ago

The red pointer is important

@rarbiart 7 months ago

I watched the complete episode without skipping. I would do it for a second one. I love how the paths are leading into dead ends. That is quite a learning experience to watch. My bets are on "5v rail ripple".

@togst 7 months ago

These videos are extremely helpful. Your methodical approach and your explanations help a lot in understanding how these circuits work. I am in an unfortunate situation where I soldered together an Amiga 4000 using a brand new PCB and a mix of brand new parts and NOS for the most exotic bits. The computer boots, but has some stability issues and shows graphical problems (and eventually freezes) when scrolling graphics happen. These videos have been of great value to me in order to understand and pinpoint possible culprits. Hopefully I will get to the bottom of my problems thanks to you sharing your experience and knowledge, Adrian!

@atkelar 7 months ago

The "thinks it's warm starting" issue is interesting. The process behind it is essentially that there's a "magic word" written to a specific location in RAM. I think something like 0x55AA or 0xAA55 (some on/off bit combo), and if that 2-byte value is present during start, it is considered a "warm start". If not, it's a cold start. After booting, the value is put into the location, so as long as the RAM holds its values between boots, it is considered a "warm start". To "software cold boot" you first have to clear those magic bytes and then jump to the init vector of the BIOS.

@agranero6 7 months ago

I didn't know that. I just thought there were some vectors in low memory, and if they were not initialized (meaning they didn't have the correct values) it was considered a cold start. I didn't think I would see you here, by the way.

@nurmr 7 months ago

It's at location 0:472h, and the warm boot magic value is 1234h. (Thanks to The Assembly Language database published by Peter Norton Computing, Inc. in 1987)
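A small sketch of that check, using the location and value quoted above (0:472h and 1234h). The memory here is just a Python bytearray standing in for the first few bytes of RAM, and the helper names are made up; the real check lives in the BIOS.

```python
WARM_BOOT_SEGMENT = 0x0000
WARM_BOOT_OFFSET = 0x0472
WARM_BOOT_MAGIC = 0x1234

# Real-mode physical address = segment * 16 + offset
WARM_BOOT_ADDR = (WARM_BOOT_SEGMENT << 4) + WARM_BOOT_OFFSET


def read_word(mem: bytearray, addr: int) -> int:
    """Read a little-endian 16-bit word, as the 8088 would."""
    return mem[addr] | (mem[addr + 1] << 8)


def write_word(mem: bytearray, addr: int, value: int) -> None:
    """Write a little-endian 16-bit word."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def is_warm_boot(mem: bytearray) -> bool:
    """True if the warm-boot magic word is sitting at 0:472h."""
    return read_word(mem, WARM_BOOT_ADDR) == WARM_BOOT_MAGIC


if __name__ == "__main__":
    ram = bytearray(0x500)           # stand-in for the bottom of RAM
    print(is_warm_boot(ram))         # nothing written yet
    write_word(ram, WARM_BOOT_ADDR, WARM_BOOT_MAGIC)
    print(is_warm_boot(ram))         # flag left over from a previous session
```

If the DRAM keeps those two bytes alive across a short power cycle, the next boot would see the flag and treat it as a warm start, which fits the behaviour seen in the video.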

@TomStorey96 7 months ago

@agranero6 the vectors are for interrupts. If they aren't initialised to point to valid software routines when you enable interrupts and try to call one, you'll just start executing code from random locations which may or may not do anything useful. You don't need them to differentiate between cold and warm boots, but you probably could use them in that way. After all, you're just comparing the contents of a memory location to see if it's a certain value, and that value could always be some interrupt vectors.

@TheDefpom 7 months ago

Adrian, @10:47 I see a flip flop there just above your hand, that would certainly be a suspect, in my experience the most common failures of active devices in the gear I fix (not computers) is op-amps, followed by flip flops... so whenever I see a flip flop attached to a section that is misbehaving, that is a part I will look very closely at.

@AsahelFrost 7 months ago

Using a two channel oscilloscope allows you to focus in on different parts of the memory cycles. For example, connect one channel to the refresh DMA enable and trigger the scope from that, and use the other channel to examine the data, address and chip select signals while the DMA enable is active.

@rantsfromcanada1656 7 months ago

*** 3 Years Later *** Adrian robbing the 25th part from his 'parts board'... "Oh! That's what it was!"

@jaredwright5917 7 months ago

The 24S10 in the original XT is a PROM that maps address ranges to RAS lines based on the configuration DIP switch settings. The one XT clone I have used the same chip but only uses two locations in the whole PROM for switching between 256k/640k, which is selected by a jumper. That PAL is probably just doing the same job in this system. If it has any floating pins, it could be messing up the RAS lines by suddenly activating the wrong RAM bank. Another possibility is a marginal delay line screwing up the RAS/CAS timing.

@michaelallen1432 7 months ago

I was thinking of the delay line as well.

@philipl8184 7 months ago

Try Ruud's Test ROM, the updated one on minus zero degrees. It's quite comprehensive and still being actively updated.

@peachgrush 7 months ago

I took a look at the BIOS source code. It looks like the game port is detected by probing the port 0x201 100 times. If a value other than 0xFF (the floating data bus pulled up) appears at least once, it declares the port available. So, looks like one of the data bits is not being pulled high on I/O accesses. A cracked pull-up resistor array, possibly? A cracked/cold solder joint? Or a bus conflict of some kind? You could try replacing the 100-times loop in the BIOS with an infinite one and then probe the data bus with the oscilloscope to see what the bit patterns for D0...D7 look like.

@adamclark1928 7 months ago

Adrian, at the start of the video, I was wondering if the rise/fall time of the signals were fast enough. At about 1:07:56 (give or take) you mention the resistors for the multiplexers for the DRAM. From what I can see on my shoe-phone, they look like 220R or 330R resistors. Now, I am not sure if they are limiting current or pulling the signal down (with TTL logic I'd be more willing to put my money on pulling the signal, especially given the number of connected ICs). 220/330R is a pretty hard pull. If one of those resistors went open or high(er) resistance the signal might look more like a capacitor curve when zoomed in on the scope but still look reasonable when zoomed out. I do hope to see a follow-up when you can clear your head. Thank you!!!

@erickvond6825 7 months ago

If it were me, I'd take a good hard look at the sockets. I'd also be taking a long look at the capacitors in the memory bank. If you have an ESR meter I think it would be interesting to see how many of the capacitors in the memory bank are different. Another thought is to do a close inspection of the solder joints under a microscope if you have one.

@microcorelabs7698 7 months ago

An MCL86+ in place of the CPU could perform focused read-write tests on the DRAM for both your motherboard and ISA card.

@TheDefpom 7 months ago

@1:07:06 it looks to me to have some pulses that are at about 2V level, these might be causing issues as they are right at the threshold of being a 0 or a 1.
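To put rough numbers on that, here is a tiny sketch using the usual standard-TTL input limits (a guaranteed low at or below 0.8 V, a guaranteed high at or above 2.0 V). Those two figures are the common datasheet values and are an assumption here; the exact parts on this board may differ slightly, and the function name is made up.

```python
V_IL_MAX = 0.8  # assumption: standard TTL, highest voltage guaranteed to read low
V_IH_MIN = 2.0  # assumption: standard TTL, lowest voltage guaranteed to read high


def ttl_level(volts: float) -> str:
    """Classify an input voltage against the guaranteed TTL input limits."""
    if volts <= V_IL_MAX:
        return "low"
    if volts >= V_IH_MIN:
        return "high"
    return "undefined"


if __name__ == "__main__":
    for v in (0.4, 1.5, 1.9, 2.0, 3.4):
        print(f"{v:.1f} V -> {ttl_level(v)}")
```

A pulse peaking at about 2 V sits right on the edge of the guaranteed-high region, so a little noise or a slightly weak driver could push it into the undefined band.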

@v12alpine 7 months ago

Measure VCC on the various RAM chips, look for noise. Compare with one bank vs multiple installed. Swap tantalums, some of them might be open.

@cmahte 7 months ago

All this says to me that the RAM is struggling to operate at the clock speed of the board. RAM in that day could be too slow OR too fast for the board. Stuff like what you're seeing happened to me, but it was on a 386sx40 ... which ran screaming fast but only had a 16-bit bus... and it was difficult to expand the board memory, and I eventually gave up and just used 8- or 16-bit expansion bus memory, which killed the system speed, but let me run bigger programs. But .... find -150 memory (I think), not -120 ... it appears the board has mixed -150 ns and -120 ns chips. That screams smoking gun.

@melkiorwiseman5234 7 months ago

At the moment, I'm in the ball court with the people who are saying that the tantalum capacitors designed to suppress noise may be the main problem. If they've gone low or otherwise failed, there may be switching noise causing problems. What I'd try would be to temporarily solder on some extra bypass capacitors (especially near the RAM) and see if that makes a difference. If not, you've only wasted a little time. If it improves things, then you've likely found the problem. EDIT: Could it be that the PSU you're using for that older board isn't supplying sufficient current on the 5V line? Maybe those chips are taking more current than more modern chips would? EDIT2: If the problem is insufficient power to run the memory, then I'd be looking for a pattern that the more hardware is both connected and doing something, the more likely it is for the RAM to give errors, although to be fair, that could also be a problem with the bypass capacitors. EDIT3: Isn't the LS243 merely a buffer while the LS245 is a latch? What if someone put the wrong chip into the board, substituting a 245 for a 243? If that's so, the 245 could still have its output pins enabled instead of sending them to "high impedance" mode once the input has ceased. I'd be inclined to try putting a 243 in that socket and see if that has any effect on the problem. And you did say that the "enable" pin was tied active, so that supports my theory. EDIT4: The errors you're getting only cause me to be even more suspicious that the 245 is entirely the wrong chip in that location and it should be substituted for a 243. 47:00 is the one, in case you're lost. EDIT5: NM, my error. Both are bus transceivers, but the 243 is 4-bit and the 245 is 8-bit. My bad. I still suggest that the problem is insufficient switching noise suppression.

@jandjrandr 7 months ago

It could be flaky tantalum caps like others have said, but they more often fail short and any shorts would take down the system entirely. It isn't impossible to have a marginal tantalum cap, but that is extremely rare. Testing for that wouldn't be easy though. However, to me all of the symptoms so far point to a bank select issue between the RAM banks. Almost like either it is selecting banks at the wrong time or multiple banks at once, but we would expect to see conflicts on the RAM data bus on the scope if it were selecting multiple banks. It is quite possible there is poor sync when bank selection is happening, leading to some overlap. The DRAM refresh then could be selecting banks when refreshing, but CPU RAM access tries to get back to the bank it was looking at, but due to addressing slew it didn't switch in time. The DMA might not be affected, but as soon as the CPU tries to see its RAM it is accessing the wrong place for a fraction of time. Might not be noticeable on the scope if that were the case and it would look like gradual or random corruption of RAM.

@shmehfleh3115 7 months ago

The evergreen, bulletproof way to check if your PC has hung is by seeing if the caps, numlock or scroll lock keys are still responding. If they won't turn on or off, the sucker is hard-locked.

@eDoc2020 7 months ago

IIRC the XT and AT don't have keyboard indicators so the numlock trick doesn't work.

@Dethernal 7 months ago

I think it's worth checking how POST detects the clock and game port. Maybe fixing the problem from that lead is easier? I suspect I/O can leak into memory, but only in specific conditions?

@paulcohen1555 5 months ago

I hope you are reading all the comments. I don't know if you already fixed the stubborn MB, but I know from my bad experience with very old motherboards that they were bad copies with only two-SIDED PCBs and not MULTILAYER. As a result they sometimes worked only when installed over a ground plane. Is that the case here?

@JASPACB750RR 7 months ago

I have no clue about anything computer. I found your channel 8 months ago and it is so interesting. Your mannerism, engagement with the audience, knowledge on all the stuff you’ve shown, ability to diagnose, and much more. It’s all very engaging and both your channels are amazing! All that said, have you tried that dip switch selector down by the bios? It’s not that far from where the corrosion and leaky battery was. Maybe something leaked into it while in the upright position and the switch is bad? Allowing voltage crossover inside of it? Again, I know nothing about computers and their components. And this is just an outside look. Maybe it’s possible or maybe I’m way off base.

@JASPACB750RR 7 months ago

Please don’t take this as a bad thing. Your channel is also what I use to help shut off my brain and fall asleep. Not that your content, voice, or anything is boring. It’s quite the opposite. If something bores me my brain starts wandering and thinking about 10 things at once. Your videos are so engaging, entertaining, informative, descriptive, and detailed that it keeps my mind focused on what you’re talking about and doing. With that, I can slow my mind and actually sleep. Thank you greatly for the content and uploading so frequently.

@kaitlyn__L 7 months ago

@JASPACB750RR yeah, Adrian’s chill but still engaging. But he’s chill enough that you absolutely can safely have it on while asleep. I’ve got a whole playlist filled with people who speak calmly enough for that purpose. Like Radio 4! But about computers 😊

@jakethetech4958 7 months ago

Double check the power on those chips. I have had a failing cap on a board cause the occasional power drop giving me the wildest of random issues.

@jakethetech4958 7 months ago

maybe try the cold/hot trick on it. Just a thought. But I am convinced it is a cap.

@stevehorne5536 7 months ago

One thing that bothered me is when you tested the suspect board with the known-good DMA controller chip at around 45:30ish. If you have multiple hard-to-separate issues, this probably doesn't tell you much. If the suspect board doesn't work with the known-good chip, in principle that just tells you you still have a problem on the suspect board - it doesn't tell you that the suspect chip is good. To know if the suspect chip is working, test it in the known-good board, so that it's the only suspect component in the test. If you prove the suspect chip bad, you still need to try the board with the known-good chip to try to diagnose remaining problems, but you know better than to put the now known-bad chip back in the board. I understand things will normally go faster if you just test the known-good chip in the suspect board, and even if there are other problems on the board, usually the change of chips will change the behaviour of the board. I just think maybe this is a particular special case where maybe two or more components involved in RAM refresh have subtle problems that are each causing similar symptoms, so there isn't necessarily an obvious change of behaviour by replacing any one component.

@David_Ladd 7 months ago

@adriansdigitalbasement2 Great video and thank you for sharing! As far as the RAM issue goes: it seems to have worked when you had only 1 bank installed, but stopped when you added the second bank. What about the bank selection circuit? Could multiple banks being selected at the same time be causing a bus conflict?

@LozzTheDev 7 months ago

Maybe some decoupling cap issues? What does the 5v supply line look like around the DRAMs in terms of noise?

@user-nd8zh3ir7v 7 months ago

I enjoy the deep dives, some of these old computers really do have gremlins

@grinderkenny 5 months ago

To test whether the system is locked up, press Caps Lock or Num Lock and see if the light turns on and off. If the system is locked up, it cannot operate the Num Lock and Caps Lock lights.

@PatrickFinnegan 7 months ago

The second bank causing problems makes me think there's some cross talk between the CAS lines for the two banks, or between some other line and the CAS line for the second bank... Like some junk between pins or something.

@andyhu9542 7 months ago

That's what I think, too. Adding a second bank causing trouble just shouts 'bus conflict'.

@solarbirdyz 7 months ago

@andyhu9542 Or power load/supply problems, or ground problems, which are basically the same thing in effect. Like maybe there's an intermittent ground connection on bank two, like a passive component that just moved out of tolerance and is causing it to happen.

@PatrickFinnegan 7 months ago

It should be easy to test by putting the extra RAM in a higher bank and skipping the second bank.

@andyhu9542 7 months ago

@solarbirdyz I thought about power problems or even EMI issues with that newly installed lighting in the basement. However, although he did not show the +5V waveform in the video, the digital signal looks relatively clean and that implies an acceptable power ripple.

@jeremywh7 7 months ago

I'm curious if the '92 OKI bus controller IC ran cooler than the '83 NEC? And ya, I also wondered if some of those tantalums are flaky... at least compare them with an ESR meter in-circuit? But thanks as always!

@MrBrianms 7 months ago

Infrared Camera to see what gets hotter than it should. I was also thinking the mainboard may have a dry joint that needs to be reflowed. Interesting problem. Going away from it to do something else works for me too. It gets processed subconsciously and then goes to the project with a fresh eye. Thanks.

@moshly64 7 months ago

You need a scope with a separate channel / trigger input so you can trigger on a chip select or enable signal. Then probing the data bus will be synchronized to those types of bus cycles.

@laserhawk64 7 months ago

35:12 "Ooh, that socket's a little bit damaged, there..." when you went and pulled the parity chip. Did you replace the socket? Did you check the other RAM sockets for damage? You never said, if so. Hmmmmm... ;3

@DartMatter 7 months ago

Serious Deja vu! I lived this exact experience with a DTK non-turbo motherboard about two years ago. There is a long story, but it turned out to be a RAS/CAS/Address timing issue. There is a delay line that separates the CAS time from the RAS that could be the problem, but I always see this component soldered in. In desperation, I tried swapping it with one from another board and that was not my problem. Some DTK boards use 74S157 for address multiplexing whereas IBM uses 74LS158. A difference between these is propagation delay. Maybe the multiplexers are failing or are the wrong ones. Why would the “wrong” chips be in there? That’s part of the long story. In the end, I swapped ‘S157 for ‘LS158 (or other way around, can’t recall) and that board has been running ever since.

@andygozzo72 7 months ago

you'll need to check around all signals and chips involved with ram and io selection, i think, not just 'buffers' , theres a lot of gates in there as well, it only needs one output to be weak at pull down to cause issues, you may have to fully socket it! it'd then be useful for other testing purposes if you can sort it eventually 😁

@Cherijo78 7 months ago

One thing I think this video demonstrates is why deeper troubleshooting like this went the way of the dodo for many people once we hit the PC era. The issues are complex and difficult to trace out, and once PC parts became commodity, it just wasn't worth it anymore precisely because of how complex it can be to figure out the exact issue. I will echo others here that this is starting to look more and more like it is potentially related to the game port and clock showing up as enumerated, and/or there's a problem with noise on the board, possibly via the capacitors and/or resistors. I'm very hesitant to jump to recapping, I think it's far over-suggested in general, but in this case it may actually be time for a recap; tantalum caps are known for going bad on this era of motherboard, and this may be an early indication.

@8bitwiz_ 7 months ago

Hey, I finally got one of those little oscilloscopes, except it was "Zotek" brand, and it's great! I had an analog 1990s-era scope and it's completely primitive in comparison. Mostly I needed a frequency counter too for an old 6802 board that I've been trying to get working, and it's good to have a digital scope now, because they do that.

@fredknox2781 7 ай бұрын

I suggest you use the Trigger function of the oscilloscope (may need a better one than the handheld.. maybe your NI one). With it, you can trigger on the DMA pulse, for example, and see what other signals line up with it.

@flemmingschandorff57 7 ай бұрын

Just a thought, check the resistor packs, for pull-up and pull-down and in between. It has been seen before that a resistor breaks and causes strange errors.

@jerwahjwcc 7 ай бұрын

There were a couple of times you mentioned the voltages being a little low on the scope. If one of those chips is borderline a little low might be a little too low?

@jerwahjwcc 7 ай бұрын

And adding in the extra ram lowers the voltage just a bit more

@jorgelotr3752 7 ай бұрын

If your system thinks it's warm starting and warm starting requires certain keys to be pressed, there may be some issue with the keyboard circuit (or with the circuit responsible for turning that key press combination into a "warm start" signal).

@KorAllRBare 7 ай бұрын

@ 1:59 I wonder why those BASIC ROM chips are out of order like that? Could have sworn I've seen them in some other order.. yeah I must be getting old.. BTW.. Have you all checked the caps (the ceramic ones) near the RAM? What with the extra RAM, ergo extra noise

@SimonZerafa 7 ай бұрын

Could the jumper pads by the PAL be selecting 64K/256K DIMMS for each bank of RAM? Does this board support 1 MByte RAM installed? 🙂 Alternatively have you upset the Sophons in some way to cause them to mess with your diagnostics? 😁

@docwhogr 7 ай бұрын

may i ask a stupid question? if you start to take out the decoupling capacitors from a working motherboard, what problems are going to manifest and at what number of missing caps... i'm guessing instability and random crashes... (this is reverse troubleshooting, or an "experiment" for "experience")

@CandyGramForMongo_ 7 ай бұрын

Day in the life of a C64 repair tech! If we had to work on the board over half an hour, it was a board swap, the most expensive option. We cured more than we didn’t. Some were obvious (PLA), but some made no sense whatsoever. Gremlins!

@Renville80 7 ай бұрын

Yup. Every now and again, you find those boards that just don't want to behave. With those, you're better off recognizing the symptoms and putting the board out to pasture.

@CandyGramForMongo_ 7 ай бұрын

@@Renville80 During down time, we’d screw around with the reject boards, getting one back working some of the time. Most were multi component failure, not worth fixing with a brand new board for less money.

@brianfaherty31 7 ай бұрын

I would focus on why the system is showing a clock and a game port when none exists. I think that might be a better indication of what is really going on

@TechBench 7 ай бұрын

Bypass caps on the RAM chips? (The chips themselves might be good, but the power to them might not be...)

@InssiAjaton 7 ай бұрын

Why does it work on ROMs? I have a stupid answer - ROMs don’t use refresh… My background is 100% on static RAM systems, but I have still grown wary of timing issues in anything involving a PAL or GAL. The more chips in a chain, the more delay accumulates. But I have also been worried about the small power supply not being adequate once a massive set of memory chips is added. Finally, it is my understanding that some, if not most (?) chips slow down with reduced supply voltage. Another concern, in my opinion, is that looking at just one signal at a time on the scope misses all timing conflicts. For what all this rambling is worth…

@DrBeat-zs9eb 7 ай бұрын

Well, sharing a failed project characterises you as a great youtuber imho.

@gryffuscze 7 ай бұрын

I kinda feel, that this board is a good candidate for some CuriousMarc cooperation, although a little too new :-) You know, just for some Fancy-Pants power! I think I even saw Atkelar here, this board is getting some attention for sure! 🤣

@suvetar 7 ай бұрын

Sure does, Adrian's made the point in the past that he feels the bad results are as instructive as the good ones!

@Really........ 7 ай бұрын

Just a silly off the wall suggestion. Continuity check on the DIP switches to see if they are bad. I know it's a long shot.

@michaelturner2806 7 ай бұрын

I think the diagnostic bios he had showed the layout of the detected dip switch block and showed correct every time. So they have to be good enough for that.

@Really........ 7 ай бұрын

@@michaelturner2806 A failing, dirty, or corroded switch could cause intermittent problems. I know it is a long shot but I have had switches go bad. I just thought it was interesting that it worked with one bank of memory, then DIP 3 was switched when another bank was added, and then came the problems. Three minutes to check the switches for continuity to rule that out is time well spent.

@johnhermann762 7 ай бұрын

Hey, take a look at the slot connectors for pins that are across from each other but are very close. The pins can get close and short. If they look okay, check for corrosion from the leaked clock card battery around the slots; corrosion can partially conduct and cause weird problems.

@stphinkle 7 ай бұрын

I wonder if there is an issue with the controller that works the higher address lines. Look at the address and data lines that work above 256K. Perhaps there is an issue with one of the address or data lines that works them. Also the one character in the memory test that was purple when all the others were white could also be a clue that it may be using that address line for that area of the video RAM, but not the others.

@Spudz76 7 ай бұрын

Voting along with all the other comments about the pull-up/down resistors or capacitors and their condition. Both could decide to become different suddenly after working fine forever and injure the reaction times of DMA and drive strength of signals. Also having the second bank installed caused the problem to return which alludes to something doesn't like the extra load (out of spec).

@adverschueren 7 ай бұрын

With a single bank installed you get no errors, so the data buffering, address multiplexing and refresh work fine. With two banks everything breaks down. That could be due to too high loading on the buffers driving the RAM chips, but I would rule that out as it would indicate a very marginal design (would never be able to cope with four banks of RAM). My guess is that there is a problem with the RAS and CAS lines going to the individual banks. If those are decoded incorrectly or have flakey connections on the board (or even have shorts to ground, VCC or other lines), the errors you see are totally what I expect to happen. Happy debugging!
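To make the bank-select argument above concrete, here is a minimal Python sketch of how a single bad decoder input could make accesses to the second bank land in the first. Everything in it is an assumption for illustration: uniform 256K banks, four banks, and the function names are hypothetical, not taken from this board's schematic or its PAL equations.

```python
# Illustrative model only: bank size, bank count and names are assumptions.
BANK_SIZE = 256 * 1024   # assumed 256K per bank
NUM_BANKS = 4            # assumed number of banks

def select_bank(address):
    """Return the bank whose RAS/CAS would be strobed, or None if out of range."""
    bank = address // BANK_SIZE
    return bank if 0 <= bank < NUM_BANKS else None

def select_bank_stuck_low(address, stuck_bit=18):
    """Same decoder, but with one address input stuck low (hypothetical fault)."""
    return select_bank(address & ~(1 << stuck_bit))

if __name__ == "__main__":
    for addr in (0x00000, 0x3FFFF, 0x40000, 0x7FFFF):
        print(hex(addr), select_bank(addr), select_bank_stuck_low(addr))
```

With only bank 0 populated, the healthy and faulty decoders agree, which would match "one bank works, two banks break". Once bank 1 is in use, the faulty decoder sends its accesses to bank 0, so writes collide.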

@sanjyuu7616 7 ай бұрын

Power filtering issues or maybe address decoder issues are worth checking.

@exidy-yt 7 ай бұрын

A blown socket on the second bank of RAM since the first bank seems to work fine?

@VICTORYOVERNEPTUNE 7 ай бұрын

but what about the serial and clock installed? :) loving this series. Thank you Adrian

@davidellis6995 7 ай бұрын

I'd be checking the chip selects (CAS). Each bank should have one, likely from the PAL. May have a short even.

@imqqmi 7 ай бұрын

Maybe take a different angle, use cold spray/put the board in the freezer to see if the issue is temporarily resolved. Then use cold spray on smaller areas until you've found the culprit. Same can be done with heat.

@CompComp 7 ай бұрын

What is that little O-scope you're using? I'd love to have one.

@jamescronin7742 7 ай бұрын

I'm thinking the issue here is that the wrong bank of RAM is being selected, or both banks at the same time. I'm assuming the PAL is replacing the chips that generate the RAS/CAS lines for specific banks of RAM. It probably also generates the CE lines for the ROMs. Given the PAL swap doesn't make any difference, it would be worth checking the inputs to the PALs to see if there are any bad signals there. I'd look at the IO/M- line from the CPU, as if that's not working reliably then this could explain why it's finding the serial ports etc and not able to see the BASIC ROMs. What happens if you try and read the BASIC ROMs using DEBUG? If something else shows up instead of the BASIC code then this would point you in the right direction. Have you checked for continuity between the RAS/CAS lines? If there is a short/some resistance, then signals could be passing between them, i.e. between bank 0 and bank 1, due to a damaged trace/a blob of solder in the wrong place.

@soothcoder 7 ай бұрын

Is it time for a logic analyzer yet? It will allow you to trigger on a select line while watching other lines. Don’t need a HP one - can get something like a Hantek 4032L cheaply enough.

@thomasives7560 7 ай бұрын

My thoughts exactly! Address and data buses can't be easily diagnosed by looking at one pin at a time - unless you get lucky and find a bad I/O channel. The LA will indicate (by where the CPU is hanging) the addresses or bus ports that are having problems so the user can locate the specific failure area. Cheers!

@waxore1142 7 ай бұрын

I'm thinking the code on that chip for testing the RAM is not compatible with what is obviously a different variation of chip layout. Is that possible?

@waxore1142 7 ай бұрын

Nevermind. You're way ahead of me lmao

@custume 7 ай бұрын

In the other video I got a feeling about the BIOS chips (the BASIC ones). I know they get disabled if not in use, but they are still on the bus; perhaps they are being enabled all the time. I'm talking about the BASIC ones because they do not show up (not detected). There's something strange about those ROMs; try removing them just to check.

@KennethScharf 7 ай бұрын

During refresh you don't actually read every address in RAM. DRAM has both column and row addresses, and you only put out one of these, which will (during a refresh cycle) refresh every row in a column, or every column in a row. No actual data is put out and read by the processor, so you can refresh ALL of the memory banks at the same time. The DMA controller is providing up to 256 different row or column addresses, and then cycles over. This is how the refresh counter inside the Z80 CPU worked (well, it only handled 128 row or column addresses, having only a 7 bit R counter.)
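The wrap-around counter described above can be sketched in a few lines of Python. This is only an illustrative model: the function name and the choice of 7 and 8 bits are assumptions made to mirror the comment, not the behaviour of any specific controller on this board.

```python
def refresh_addresses(counter_bits, cycles, start=0):
    """Return the addresses a refresh counter would put on the bus.

    The counter simply increments once per refresh cycle and wraps
    around when it runs out of bits; no data is read or written.
    """
    mask = (1 << counter_bits) - 1
    addr = start & mask
    sequence = []
    for _ in range(cycles):
        sequence.append(addr)
        addr = (addr + 1) & mask   # wrap back to 0 after the last address
    return sequence

if __name__ == "__main__":
    seven_bit = refresh_addresses(7, 256)
    eight_bit = refresh_addresses(8, 256)
    print(len(set(seven_bit)))   # distinct addresses reached by a 7-bit counter
    print(len(set(eight_bit)))   # distinct addresses reached by an 8-bit counter
```

In this model a 7-bit counter never reaches addresses 128 and up, so a part that needs 256 refresh addresses would only be half covered by it.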

@agranero6 7 ай бұрын

As you seem to have excluded the decoding circuit for the RAS and CAS lines: my experience says that when everything works with parts like cards or memory taken out, and things begin to break when you put them back, the power supply is not delivering what is needed. As your workbench probably has a power supply that is way more powerful than necessary and the other board works, I would recap this whole board.

@user-marco-S 7 ай бұрын

I don't like it either when a computer doesn't work. I have two old computers (same type). One of them is working. In the non-working computer there are two ICs, both good; power is good, no shorts or missing connections. On the connection between those two ICs there is no signal, where the working one has one. All other signals on the two ICs are good. Strange.

@andygozzo72 7 ай бұрын

hmm, yeah, i have a laptop pc133 ram 'stick' that throws up errors in the machine it came in, that runs it at 133, when trying the suspect stick in an older laptop that runs memory at 100 only, tests perfectly ok.!! can ram chips degrade and slow down??!!!

@simonlathwell 7 ай бұрын

Have you had a look at the motherboard under a microscope to see if there is a very thin hair of solder growth causing a bridge that stops the RAM from functioning correctly and causes the RAM problems? I've seen solder joints that, when they get warm, develop a very thin hair-like growth that always ends up connecting to another solder joint on the motherboard. Always worth a check to see if that's the problem.

@TheDefpom 7 ай бұрын

@37:40 just because you replaced the chip doesn't mean the signals are OK, it could easily be a corroded via or through hole which is breaking the connection, I would be tracing directly from the driver pins to each pin of the DRAM.

@TheLordkiron 7 ай бұрын

As I said in the previous video's comments - concentrate on the clock and game port being wrongly detected. You do have the source code for the BIOS, so just check in the code to see how it detects clock and game port presence; this might give you a hint on which lines, ports or something else give wrong signals

@fumthings 7 ай бұрын

5:00 i would have thought the rom diagnostic for refresh, is testing the ram for stable at refresh, rather than testing the actual refresh circuit/system...

@morantaylor 7 ай бұрын

Hi Adrian, I have started ordering the parts to build one of your 8-bit ISA RAM cards but have been unable to find the value for the resistor network. Can you tell me what the resistance value should be?

@Renville80 7 ай бұрын

There comes a point where it's best to cut one's losses, save what can be, and move on. There's no guarantee that the board won't suddenly start working after replacing a marginal part, only to refuse to work the next time you power it up.

@bluerizlagirl 7 ай бұрын

It's screaming "Power supply trouble". Maybe there is a faulty decoupling capacitor on the motherboard. It just about works, until you add more power-hungry chips and then it becomes unreliable.

@andygozzo72 7 ай бұрын

do you have a rom image anywhere for that 'system already' bios?

@josephdewes 7 ай бұрын

Most people who work in repairs of any kind come across faults that refuse to be repaired. I'm a mechanic by trade and I can think of two cars that we never resolved; one was my own car, which refused to idle. Another became the workshop car that would run rough at low RPM but ran fine at high RPM…

@dave4882 7 ай бұрын

Is it possible for a RAM chip to screw with the address lines and cause memory failures in the other chips? Maybe swap the first bank(where it is known to work) into the second, and the second down to the first.

@fumthings 7 ай бұрын

it seems the problem with 64k chips made by mt was sometimes, the outputs being active when the cs or oe was INACTIVE. it is assumed that 256k ram chips made by mt are different and have not yet proved to have the problem of outputs being active even when deselected.

@ritchiemastemaker1139 7 ай бұрын

Hi i don't know if it is related, but i noticed at the boot screen an error that there was no basic rom present. And i saw you pointing at a chip where both some ram and ram signals are going through. Good luck

@ChaosHusky 7 ай бұрын

If it isn't a brownout, and now can't see the BASIC ROMs with less RAM, doesn't that imply whatever gates or ICs switch between ROM and RAM data busses for the system aren't working properly? Could indeed be a power issue, bad connections or perhaps weakened drive signals that flake out once more ICs are fitted. Odd board indeed, worse than this IBM Model 30-286 i have by far. Only boots if i have it powered without POST for 30 seconds then cycle power exactly - already replaced the caps and got it to POST at all by replacing the FDC control chip i was suspicious of (DIP40, went ahead and added a socket too) all without using my scope yet. Though i have a feeling its a flaky custom IC that has to 'warm up' to start working properly, or a bad internal chip connection or bad VIA nearby a chip that has to warm up lol

@grantnichols6765 7 ай бұрын

I’m enjoying all this testing. Just want to make sure I’m not missing something. What I remember from the first video is the board was working and you used a chip off it to fix another system. Then you replaced the chip with one that wasn’t exactly the same. Did you ever put back the original chip to make sure that wasn’t the problem?

@siberx4 7 ай бұрын

Here's a weird thought; it's possible to add RAM to XT-class machines with an ISA card, right? What happens if you put in the first 256kB on the motherboard (which works) and then add additional RAM on an ISA card instead of on the motherboard? Trying with an "old school" card with the classic small DRAM DIPs vs the "modern" versions of these cards that pack all the RAM into a single SRAM chip (which would load the whole system less) might be instructive too.

@SilentShadow-ss5xp 7 ай бұрын

I feel like it has to be damage to the traces in the PCB. Maybe the cracks are so small that simply touching the board or bumping it at all makes connections unreliable, thus causing intermittent faults like this. I doubt any of the chips are bad or have wonky outputs. I'm almost certain you could swap every chip from one board to the other and the problem would remain tied to the PCB. Maybe remove all the chips and inspect all around the RAM and stuff.