How to optimize a single-chip microcomputer system for better reliability

Reliability design for single-chip microcomputer hardware is generally approached from three aspects: optimizing the design, adding redundancy and fault tolerance, and applying hardware anti-interference measures. This article describes how each of these, together with complementary software techniques, improves the reliability of a single-chip microcomputer system.

1. Optimal design

When designing and building the system hardware, select good-quality connectors and design a sound process structure; select qualified components and subject them to strict testing, screening, and aging; design technical parameters (such as load) with a certain margin, or derate the components; and improve the quality of the printed circuit boards and assemblies.

2. Adopt hardware anti-interference measures

Electromagnetic interference that enters through the power supply system, is conducted along wires, or arrives by electromagnetic coupling is an important cause of unstable operation in a single-chip microcomputer system. Effective interference suppression measures must therefore be built into the hardware design. In single-chip application systems, a system monitoring circuit is often used to detect errors or faults and then either raise an alarm or automatically restore the system to its normal working state.

For example, a monitoring circuit built from the 89C51 single-chip microcomputer and the X25045 provides power failure monitoring, a watchdog timer, and related functions. The hardware connection diagram of the X25045 is shown in Figure 1. The X25045 contains a watchdog timer whose monitoring time can be preset by software. If there is no bus activity within the preset time, the X25045 outputs a high-level signal on RESET, and the differentiating circuit formed by C2 and R3 turns it into a positive pulse that resets the CPU. In the circuit shown in Figure 1, the CPU has three reset sources: power-on reset (C1, R2), manual reset (S, R1, R2), and watchdog reset (C2, R3). These are combined through an OR gate and applied to the RESET terminal. The time constant of C2 and R3 does not need to be large; a few hundred microseconds is enough, because the CPU oscillator is already running when the watchdog reset occurs.
 

The watchdog timeout is determined by the loop period of the specific application program, and is usually set slightly longer than the maximum loop time of the system in normal operation. When programming, dog-feeding instructions are placed at appropriate points in the software so that, in normal operation, the watchdog timer never reaches the preset time and the system is not reset. When the program runs away and cannot be brought back by other methods such as software traps, the dog is no longer fed, the watchdog timer soon reaches the preset time, and the system is forced to reset.
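The relationship between the watchdog timeout and the main loop period can be sketched in C. This is a host-side model only, not X25045 driver code: the `watchdog_t` type, the function names, and the tick counts are all hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical host-side model of an external watchdog timer. */
typedef struct {
    uint32_t timeout_ticks;  /* preset monitoring time */
    uint32_t elapsed_ticks;  /* time since the last feed */
} watchdog_t;

static void wdt_init(watchdog_t *w, uint32_t timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->elapsed_ticks = 0;
}

/* "Feeding the dog": restart the countdown. */
static void wdt_feed(watchdog_t *w)
{
    w->elapsed_ticks = 0;
}

/* Advance time by one tick; returns true once the preset time is reached,
   which is the moment the real device would reset the CPU. */
static bool wdt_tick(watchdog_t *w)
{
    if (w->elapsed_ticks < w->timeout_ticks) {
        w->elapsed_ticks++;
    }
    return w->elapsed_ticks >= w->timeout_ticks;
}

/* Simulate `loops` main-loop iterations of `ticks_per_loop` ticks each.
   When `feed` is true the dog is fed once at the top of every iteration.
   Returns true if the watchdog would have reset the system. */
static bool run_main_loop(watchdog_t *w, uint32_t loops,
                          uint32_t ticks_per_loop, bool feed)
{
    for (uint32_t i = 0; i < loops; i++) {
        if (feed) {
            wdt_feed(w);
        }
        for (uint32_t t = 0; t < ticks_per_loop; t++) {
            if (wdt_tick(w)) {
                return true;
            }
        }
    }
    return false;
}
```

With a timeout of 12 ticks and a 10-tick loop, a loop that feeds the dog never triggers a reset, while the same loop without feeding does. This mirrors the rule that the timeout should be slightly longer than the longest normal loop time.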

3. Redundant and fault-tolerant design

It is impossible to guarantee that a single-chip microcomputer application system will be 100% trouble-free. Fault tolerance means that when some part of the system fails, the system as a whole can still work normally; in other words, the ability to tolerate failures is built into the system. To make a system fault-tolerant, appropriate redundant units must be added so that when a component fails, a redundant component takes over its work, and once the original component has been repaired, the system is restored to the state it was in before the error. Hardware redundancy can be designed in at the component level, the subsystem level, or the system level.
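The takeover behavior described above can be illustrated with a minimal C sketch of a primary unit and one backup. The type and function names are hypothetical, and how a unit's health is actually determined (self-test, monitoring circuit, and so on) is outside the sketch.

```c
#include <stdbool.h>

/* Hypothetical identifiers for a duplicated unit. */
typedef enum {
    UNIT_NONE    = -1,  /* no healthy unit is available */
    UNIT_PRIMARY = 0,
    UNIT_BACKUP  = 1
} unit_id_t;

typedef struct {
    bool healthy[2];    /* indexed by UNIT_PRIMARY / UNIT_BACKUP */
} redundant_pair_t;

static void pair_init(redundant_pair_t *p)
{
    p->healthy[UNIT_PRIMARY] = true;
    p->healthy[UNIT_BACKUP]  = true;
}

/* Choose the unit that should do the work right now.
   The primary is preferred, so work returns to it after it is repaired. */
static unit_id_t pair_select(const redundant_pair_t *p)
{
    if (p->healthy[UNIT_PRIMARY]) {
        return UNIT_PRIMARY;
    }
    if (p->healthy[UNIT_BACKUP]) {
        return UNIT_BACKUP;
    }
    return UNIT_NONE;
}
```

Marking the primary as failed makes `pair_select` return the backup, and marking it healthy again after repair returns the work to the primary.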

4. Instruction redundancy

When the CPU fetches an instruction, it fetches the opcode first and then the operands. Deliberately inserting single-byte instructions at key places in the program, or repeating effective single-byte instructions, is called instruction redundancy. Usually two or more NOP instructions are inserted after two-byte and three-byte instructions. Then, even if a runaway program lands on the operand of a two-byte or three-byte instruction, the no-operation instructions absorb the misinterpreted bytes, the instructions that follow are not decoded incorrectly, and the program gets back on track. In addition, for instructions that play an important role in program flow, such as RET, RETI, LCALL, LJMP, and JC, two NOP instructions can be placed immediately before them, so that a runaway program is brought back into step and these important instructions are executed correctly. Instruction redundancy can only prevent the CPU from executing operands as opcodes; it cannot actively correct a wrong direction of execution. Correcting the direction of execution requires the techniques below.
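The resynchronizing effect of NOP padding can be checked with a small host-side simulation in C. It is a sketch under strong simplifications: `insn_length` knows only the handful of 8051 opcodes used in the two example byte sequences, and the walk ignores what a misdecoded instruction would actually do (for instance, a misdecoded LCALL would really jump elsewhere). The function and array names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Byte length of the few 8051 opcodes used below (not a full decoder). */
static size_t insn_length(uint8_t opcode)
{
    switch (opcode) {
    case 0x00: return 1;  /* NOP */
    case 0x22: return 1;  /* RET */
    case 0x34: return 2;  /* ADDC A,#data */
    case 0x12: return 3;  /* LCALL addr16 */
    case 0x90: return 3;  /* MOV DPTR,#data16 */
    default:   return 1;  /* simplification for opcodes not listed */
    }
}

/* MOV DPTR,#1234H followed directly by RET. */
static const uint8_t UNPADDED[] = { 0x90, 0x12, 0x34, 0x22 };

/* The same code with two NOPs between the three-byte instruction and RET. */
static const uint8_t PADDED[] = { 0x90, 0x12, 0x34, 0x00, 0x00, 0x22 };

/* Decode linearly from `start`, as a runaway CPU would fetch, and report
   whether the byte at `target` is fetched as an opcode. */
static bool reaches_opcode(const uint8_t *code, size_t len,
                           size_t start, size_t target)
{
    size_t pc = start;
    while (pc < target && pc < len) {
        pc += insn_length(code[pc]);
    }
    return pc == target;
}
```

Starting at offset 1 or 2 (the operand bytes 12H and 34H), the unpadded sequence swallows RET as an operand, while in the padded sequence the NOPs are consumed instead and RET at offset 5 is fetched as an opcode.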

5. Design software “traps”

Usually, the unused EPROM space in program memory is filled with the no-operation instruction NOP, followed by a jump instruction that transfers control to the runaway handling routine, or simply by the instruction LJMP 0000H. When a runaway program falls into this area, it executes a run of no-operation instructions and is then brought back on track. If the unused EPROM space is relatively large, groups of no-operation instructions followed by a jump instruction can be distributed evenly through it. This structure of several no-operation instructions plus one jump instruction is called a “software trap”.

The general structure of a software trap is:

NOP

LJMP FLY

FLY is the runaway handling routine. When the program executes normally, the software trap is never executed. Only when the program runs into the trap does the trap immediately redirect it to the normal track. Even if a runaway program does not land in a trap at once, it can still get back on track by running into a trap after it has performed some incorrect operations.

Besides the blank areas of program memory, software traps should also be placed at the end of each data table in the program. If a data table is relatively large, a trap should also be placed in the middle of it, so that a program that runs into the data area is brought back in time. If program memory is large enough, a trap can be placed between every two subroutines. For unused interrupts that may be enabled by interference, a software trap is placed in the corresponding interrupt service routine to catch the erroneous interrupt in time. The number of software traps should be chosen according to the actual interference conditions and the capacity of the program memory: too few and runaway programs will not be intercepted effectively, too many and they will occupy a large amount of program memory.
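Filling a blank region of a program image with evenly spaced traps can be sketched in C as a build-time helper. This is an illustration under assumptions: `fill_traps`, its parameters, and the choice to put any leftover bytes as NOPs at the start of the region (so that the region always ends with a jump) are hypothetical. The opcode values 00H for NOP and 02H for LJMP, with the address high byte first, are the standard 8051 encodings.

```c
#include <stdint.h>
#include <stddef.h>

#define OP_NOP  0x00u   /* 8051 NOP */
#define OP_LJMP 0x02u   /* 8051 LJMP addr16: opcode, high byte, low byte */

/* Fill image[start..end) with repeated software traps, each made of
   `nop_count` NOPs followed by LJMP handler_addr. Bytes that do not
   make up a whole trap are written as NOPs at the start of the region.
   Returns the number of complete traps written. */
static size_t fill_traps(uint8_t *image, size_t start, size_t end,
                         size_t nop_count, uint16_t handler_addr)
{
    if (end <= start) {
        return 0;
    }

    const size_t trap_len = nop_count + 3;        /* NOPs + 3-byte LJMP */
    size_t lead = (end - start) % trap_len;       /* leftover bytes */
    size_t pos = start;
    size_t traps = 0;

    while (lead-- > 0) {
        image[pos++] = OP_NOP;
    }
    while (pos < end) {
        for (size_t i = 0; i < nop_count; i++) {
            image[pos++] = OP_NOP;
        }
        image[pos++] = OP_LJMP;
        image[pos++] = (uint8_t)(handler_addr >> 8);
        image[pos++] = (uint8_t)(handler_addr & 0xFFu);
        traps++;
    }
    return traps;
}
```

For example, filling an 11-byte gap with two NOPs per trap writes one leading NOP followed by two complete traps. Passing 0000H as `handler_addr` corresponds to filling with LJMP 0000H.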

6. Software “watchdog” technology

After performing some incorrect operations, a runaway program will often enter an “infinite loop”, which is commonly called a “crash”. The “software watchdog” technique is usually used to get the program out of the loop. Its principle is to monitor the running time of the program loop continuously: if the loop time exceeds the maximum normal loop time, the system is considered to be trapped in an “infinite loop” and error handling is required. In practice, a timer interrupt service routine is usually used to check the main program at regular intervals. For example, select a byte in RAM as the software watchdog register; the main program adds 1 to this register on every loop, and the interrupt service routine of timer T0 subtracts 1 from it on every interrupt and then checks it. If the program is executing normally, the watchdog register does not change or changes only slightly. If the register changes greatly, the system is stuck in an “infinite loop” and error handling is required.

In industrial applications, severe interference sometimes corrupts the interrupt control word and disables the interrupts, which makes the watchdog fail. In that case a ring-shaped interrupt monitoring scheme can be used: timer T0 monitors timer T1, timer T1 monitors the main program, and the main program monitors timer T0.

A software “watchdog” with this ring structure has good anti-interference performance and greatly improves system reliability. In measurement and control systems where timer T1 is regularly needed for serial communication, the T1 interrupt cannot be used, and the serial port interrupt can be used for monitoring instead. The maximum loop period of the main program and the timing periods of timers T0 and T1 must of course be considered together. The software “watchdog” technique requires timers, which are a scarce resource in most control programs, and this limits its practical application. With some care, the software “watchdog” routine can share a timer with other timing routines, so that one timer provides both the timing function and the software “watchdog” function.
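The single-register scheme, in which the main program increments a register and the timer interrupt decrements and checks it, can be sketched in C as follows. The names, the starting value of 8, and the decision to cap the register at that value and to treat zero as the failure threshold are assumptions made for illustration; the ring structure with T0, T1, and the main program is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical starting (and maximum) value of the watchdog register. */
#define SWDT_START 8u

/* One byte of RAM used as the software watchdog register. */
static volatile uint8_t swdt_reg = SWDT_START;

/* Called by the main program once per loop: add 1, capped at SWDT_START
   so that a fast main loop cannot make the register grow without bound. */
static void swdt_main_loop_mark(void)
{
    if (swdt_reg < SWDT_START) {
        swdt_reg = (uint8_t)(swdt_reg + 1u);
    }
}

/* Called from the timer interrupt service routine: subtract 1 and check.
   Returns true when the register has fallen to zero, meaning the main
   loop has not run for SWDT_START consecutive timer periods. */
static bool swdt_timer_isr(void)
{
    if (swdt_reg > 0u) {
        swdt_reg = (uint8_t)(swdt_reg - 1u);
    }
    return swdt_reg == 0u;
}
```

While the main loop and the timer interrupt alternate, the register stays near its starting value. Once the main loop stops calling `swdt_main_loop_mark`, the register falls steadily and `swdt_timer_isr` reports the fault, at which point real firmware would run its error handling.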

7. Check the flag data in the RAM area to find serious interference in time

In this method, several fixed units in the RAM area are selected and set to fixed values in the initialization program. As long as the program runs normally, the contents of these units do not change. If the data in any of these units has changed because of a program “runaway” or other disturbance, the single-chip microcomputer system has been seriously disturbed and can no longer run reliably. The contents of these RAM units are checked at suitable points during program execution, and as soon as a change is found, the instruction LJMP 0000H is executed to force the microcontroller to restart from the reset address.
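A minimal C sketch of the flag check is shown below. The number of flag units, the pattern values, and the function names are hypothetical. On a real 8051 system the action on a mismatch would be the jump to 0000H described above, which the sketch leaves to the caller.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_COUNT 4u

/* Hypothetical fixed values written at initialization. */
static const uint8_t FLAG_PATTERN[FLAG_COUNT] = { 0x55, 0xAA, 0x5A, 0xA5 };

/* The fixed RAM units that hold the flag data. */
static volatile uint8_t ram_flags[FLAG_COUNT];

/* Called once from the initialization program. */
static void ram_flags_init(void)
{
    for (size_t i = 0; i < FLAG_COUNT; i++) {
        ram_flags[i] = FLAG_PATTERN[i];
    }
}

/* Called at suitable points during execution.
   Returns false if any flag unit no longer holds its fixed value. */
static bool ram_flags_intact(void)
{
    for (size_t i = 0; i < FLAG_COUNT; i++) {
        if (ram_flags[i] != FLAG_PATTERN[i]) {
            return false;
        }
    }
    return true;
}
```

The main loop would call `ram_flags_intact()` regularly and force a restart as soon as it returns false.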

8. Refresh the output port to eliminate serious interference

When the single-chip microcomputer system is seriously disturbed, the state of an output port may also be changed by the interference. If the output port is refreshed at suitable points during program execution, according to the results computed by the relevant program modules, the effect of the interference on the output port state is removed and a wrong output state is corrected in time.
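One way to sketch the refresh in C is to keep the computed output value in a variable and rewrite the port from it on every pass of the main loop. The variable `port_latch` only stands in for a hardware port register such as P1, and all names here are hypothetical.

```c
#include <stdint.h>

/* Stands in for the hardware output port register in this sketch. */
static volatile uint8_t port_latch;

/* The output value most recently computed by the application. */
static uint8_t port_shadow;

/* Called by the program module that computes a new output state. */
static void output_set(uint8_t value)
{
    port_shadow = value;
    port_latch = value;
}

/* Called at suitable points in the main loop: rewrite the port from the
   computed value so that a disturbed output state is corrected. */
static void output_refresh(void)
{
    port_latch = port_shadow;
}
```

If interference flips bits in the port after `output_set(0x3C)`, the next call to `output_refresh()` restores the value 0x3C.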

9. Perform multiple input sampling to avoid serious interference

Strong interference affects the input signals of the single-chip microcomputer and can make a single instantaneous sample of an input wrong. To reduce the influence of interference, the input is usually sampled several times and the samples are combined by a weighted average.
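The weighted average of repeated samples can be written in C with integer arithmetic. The function name, the sample width, and the rounding to the nearest integer are assumptions of this sketch; how the weights are chosen depends on the application and is not specified here.

```c
#include <stdint.h>
#include <stddef.h>

/* Weighted average of n samples: sum(w[i] * x[i]) / sum(w[i]),
   rounded to the nearest integer. Returns 0 if all weights are zero.
   Intended for a small number of samples, so the 32-bit sums
   cannot overflow. */
static uint16_t weighted_average(const uint16_t *samples,
                                 const uint8_t *weights, size_t n)
{
    uint32_t acc = 0;
    uint32_t wsum = 0;

    for (size_t i = 0; i < n; i++) {
        acc += (uint32_t)samples[i] * weights[i];
        wsum += weights[i];
    }
    if (wsum == 0) {
        return 0;
    }
    return (uint16_t)((acc + wsum / 2u) / wsum);
}
```

With equal weights this reduces to the ordinary mean, so the samples 100, 102, 98, 100 give 100. With samples 10 and 20 weighted 1 and 3, the result is 70/4 rounded to 18.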

Summary

To improve the reliability of a single-chip application system, both hardware and software must be addressed so that the system can better defend itself. The methods described above should not be used in isolation; only by combining them effectively according to the actual situation can the best anti-interference effect be achieved and the single-chip microcomputer system work stably and reliably. Even then, the reliability of the system can still be affected by other uncertain factors.
