Chapter 14 - Additional Diagnostics
This chapter describes procedures for diagnosing and recovering from problems that occur during server operations.
Selecting Troubleshooting Options
Troubleshooting options are available from the System Maintenance menu. To display the menu, choose System Maintenance from the Operator Menu.
You can run on-line diagnostic tests on the Banyan ICA and ICAmc™ cards. These diagnostic tests do not work for the newer Banyan ICAplus, ICA/RM, or ICA/HS cards. To test these cards, use the DOS diagnostics shipped with the card. For more information, refer to the ICA Installation Guide.
The ICA diagnostic tests are available on all Banyan servers. The tests run while the server is in operation, and do not affect server operations. However, you cannot use the ICA card while running the diagnostic tests. The diagnostics program automatically takes the card off-line, runs the diagnostic tests, and returns the card to on-line status.
If your server has several ICA cards, you can run diagnostic tests on some or all of the cards at once. Alternatively, you can run diagnostic tests on one card at a time, leaving the others in a usable state.
When you run the diagnostics, the information on the System Diagnostics screen lists the configuration for each ICA card in the server. A pound sign (#) next to an entry means that although a card is configured, the diagnostic tests could not communicate with the card. This situation may have one of two causes:
- The card is not actually installed in the server. Either reinstall the card, or remove the card from the configuration by using the Add Cards/Change Card Configuration function.
- The card is installed, but may not be seated correctly in its slot or may be malfunctioning. If you cannot reseat the card, call your Banyan service representative for assistance.
1. From the Operator Menu, choose System Maintenance. The System Maintenance menu appears.
2. Choose Configure/Diagnose Server. The BANYAN Server Configuration menu appears.
3. Choose Run Diagnostics. The System Diagnostics screen appears and displays information for each ICA card installed in the server.
4. Choose SELECT a card. The cursor moves to the entry for the first card.
5. Choose one or more cards. An asterisk (*) appears next to each card selected.
6. Choose DESELECT a card if you want to remove a card from the list of those to test. The asterisk is removed from the card listed.
7. When you are finished choosing the cards to test, choose RUN Diagnostics to begin testing. The Execute Tests screen appears.
8. Do one of the following:
- Allow the tests to run automatically. The diagnostic tests execute ten times for each card. After each round of tests, the PASSES COMPLETED counter increments by one.
- Stop the tests by choosing TERMINATE tests.
Two counters on the screen keep track of errors that occur during the testing. The ERRORS LAST PASS counter tells you how many errors were found during the last pass, and the TOTAL ERRORS counter keeps a running total of errors found.
When the tests are complete, the following message appears:
TESTING COMPLETE
The card re-initializes and is ready for use.
9. Do one of the following:
- Choose EXAMINE error log to see the results of the tests. Refer to "Examining the Error Log" next.
- Press ESC to exit. The System Diagnostics menu appears.
When the diagnostic tests report an error on a card, the error is logged in a file on the server. The file is overwritten each time you run diagnostic tests.
To Examine Errors Reported by the Diagnostic Tests
1. When the diagnostics complete, choose EXAMINE error log from the Execute Tests screen.
2. Do one of the following:
- If there is only one entry on the screen, press ENTER.
- If there are two or more entries, highlight the card for which you want to view the log and then press ENTER.
The Error Log screen appears.
3. Examine the error log.
4. Press ESC to exit from the log screen. The Execute Tests screen appears.
The error log begins by listing the following card configuration information for the card tested:
- Slot number
- Interrupt level
- I/O address
- RAM address
For example:
Slot #6, INT 6, I/O addr 0x140, RAM addr 0xa0000:
If the error log contains only the configuration information, the card passed the diagnostic tests. If errors were found, they indicate a malfunctioning card (see the following example).
Example Error Log Entry for a Malfunctioning ICA Card
Error Log
Slot #4 INT 10, I/O addr 0x180, RAM addr 0x80000:
Loop Count:1
Buffer 0 miscompare:
Addr = 0x00, Expected = 0x00, Actual = 0x20
Buffer 1 miscompare:
Addr = 0x00, Expected = 0x55, Actual = 0x20
Buffer 2 miscompare:
Addr = 0x00, Expected = 0xaa, Actual = 0x28
Buffer 3 miscompare:
Addr = 0x00, Expected = 0x00, Actual = 0x20
CODE FAILED TO EXECUTE
Data = 0x23
Press ESC to exit...
If the card is not found in the slot specified on the Add/Change Card screen, an error similar to this example is logged:
Slot #7, INT 4, I/O addr 0x120, RAM addr 0x40000:
Loop Count : 1
The following message indicates the test diagnostics failed to load on the ICA card (Failed to Dump to the ICA card):
Error : 165
The 165 error means that the diagnostic tests could not find the card in the specified slot. Your card is either not installed, or not properly seated in the server bus.
If you receive any error code, the card may be damaged. Record the errors and call your Banyan service representative.
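The log layout described above can be checked mechanically once the text has been transcribed. The following Python sketch is illustrative only and is not part of the Banyan software: the function name summarize_error_log and the rule for what counts as a finding (a miscompare line, a line containing FAILED, or an "Error : n" line) are assumptions drawn from the sample entries in this chapter.

```python
import re

# Configuration header, e.g. "Slot #6, INT 6, I/O addr 0x140, RAM addr 0xa0000:"
# The comma after the slot number is optional because one sample omits it.
HEADER_RE = re.compile(
    r"Slot #(\d+),?\s+INT (\d+),\s+I/O addr (0x[0-9a-fA-F]+),"
    r"\s+RAM addr (0x[0-9a-fA-F]+):"
)
ERROR_RE = re.compile(r"Error\s*:\s*(\d+)")


def summarize_error_log(text):
    """Extract the card configuration and a rough pass/fail verdict."""
    config = None
    findings = []      # lines that look like reported problems (assumed rule)
    error_codes = []   # numeric codes such as 165
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            config = {
                "slot": int(header.group(1)),
                "interrupt": int(header.group(2)),
                "io_addr": int(header.group(3), 16),
                "ram_addr": int(header.group(4), 16),
            }
            continue
        code = ERROR_RE.match(line)
        if code:
            error_codes.append(int(code.group(1)))
            findings.append(line)
        elif "miscompare" in line or "FAILED" in line:
            findings.append(line)
    # A log holding only the configuration line means the card passed.
    passed = config is not None and not findings
    return {"config": config, "findings": findings,
            "error_codes": error_codes, "passed": passed}


if __name__ == "__main__":
    sample = ("Slot #7, INT 4, I/O addr 0x120, RAM addr 0x40000:\n"
              "Loop Count : 1\n"
              "Error : 165\n")
    print(summarize_error_log(sample))
```

A result with passed set to False is only a prompt to record the errors and call your service representative; the sketch does not interpret individual codes.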
Automatic Service and Device Checking
Some recovery procedures occur automatically on all servers. For example, if a service does not respond properly to a check, the system automatically stops the service, then restarts it.
LAN cards in all servers are continually checked while in use. If a server's LAN card does not respond to a check after a period of time, the system automatically resets and re-initializes the LAN card.
The system performs disk and file services checking procedures to ensure the integrity of data storage. Disk checking occurs during reboot only if the system was not shut down properly.
If the system detects disk errors, it automatically attempts to correct them. If unrecoverable disk errors occur, you may need to replace the disk and reinstall your Banyan software. Refer to the Server Installation Guide for information on disk errors and Chapter 11 of this document for information on re-installing Banyan software.
SCSI Error Handling
PC-based servers report errors in the operation of a SCSI device. For disks, operations that encounter these errors are automatically retried. For tape devices, the operation is cancelled and an error is returned to the application. If a tape error is encountered during a backup/restore operation, an error will be entered in the backup/restore log.
For example, during a backup, these errors may indicate that the heads on the tape drive need cleaning. SCSI-specific information is displayed at the console and sent to the log file. The message is broken into three parts. Each part has a header and a trailer that identify it as a console message and include a date and time stamp. The message is as follows:
WARNING: SCSI drive error - Extended status indicates TYPE ERROR
WARNING: SCSI function = a, Block length = b
WARNING: SCSI sense key = x, subcode = y, residue = z
In the first line above, drive is either Disk or Tape, and TYPE is either HARDWARE or MEDIUM, indicating where the problem is. In the second line, a is 2 for READ or 3 for WRITE, and b is the number of blocks transferred. In the third line, x, y, and z are SCSI codes, where z is the number of blocks remaining in the buffer. See the specification for the particular device for information on the sense key and subcode.
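If you collect these warnings from the log file, the three parts can be pulled apart with a short script. The Python sketch below is an illustration under stated assumptions: the function name parse_scsi_warning is hypothetical, the exact spelling and capitalization of the message text is taken from the template above, and the sense key, subcode, and residue are kept as strings because this chapter does not say whether they are printed in decimal or hexadecimal.

```python
import re

# Function codes as described in the text: 2 is READ, 3 is WRITE.
FUNCTIONS = {"2": "READ", "3": "WRITE"}

LINE1 = re.compile(
    r"SCSI (Disk|Tape) error - Extended status indicates (HARDWARE|MEDIUM) ERROR"
)
LINE2 = re.compile(r"SCSI function = (\w+), Block length = (\w+)")
LINE3 = re.compile(r"SCSI sense key = (\w+), subcode = (\w+), residue = (\w+)")


def parse_scsi_warning(text):
    """Return the fields of one three-part SCSI warning, or None if incomplete.

    search() is used rather than match() so that the console header and
    trailer around each part are skipped.
    """
    m1, m2, m3 = LINE1.search(text), LINE2.search(text), LINE3.search(text)
    if not (m1 and m2 and m3):
        return None
    return {
        "device": m1.group(1),
        "error_type": m1.group(2),
        "function": FUNCTIONS.get(m2.group(1), "UNKNOWN"),
        "blocks_transferred": m2.group(2),
        "sense_key": m3.group(1),
        "subcode": m3.group(2),
        "residue_blocks": m3.group(3),
    }


if __name__ == "__main__":
    # Field values below are made up for illustration.
    sample = (
        "WARNING: SCSI Tape error - Extended status indicates MEDIUM ERROR\n"
        "WARNING: SCSI function = 3, Block length = 16\n"
        "WARNING: SCSI sense key = 3, subcode = 0, residue = 4\n"
    )
    print(parse_scsi_warning(sample))
```

The meaning of the sense key and subcode still has to be looked up in the device specification; the script only separates the fields.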
Swap Area Warning Messages
Use VNSM to monitor the swap space utilization on your server. The system has run out of swap space if it displays messages on the server console similar to the following example:
No swap for u-area, p=D011FC18
If this occurs, refer to Monitoring and Optimizing Servers for possible solutions.
When a system panic occurs on the server, the system memory may contain valuable information about the cause of the problem. Immediately copying the contents of the system memory to diskette or tape ensures that information related to the problem is available for analysis by a Banyan support representative.
If a server panics and enters the debugger, report the problem to your Banyan service representative. Your support representative will assign an incident number and request that you send in the system memory image on diskette or tape.
Recovering from a System Panic
Following a system panic, the server enters the Banyan System Debugger. Use debugger commands to examine the server's memory contents, dump the system memory to swap space on disk1, and copy the dump to diskette or tape.
Use one of the following debugger commands to copy a system dump to the appropriate media:
tapedump - Use to copy the dump to tape. Dumps the entire contents of memory, including the kernel and the loadable device driver symbols, to tape. The extra information extracted by the tapedump command makes it easier for support personnel to diagnose server problems.
sysdump - Use to copy the dump to diskettes. This command copies system memory to the primary swap partition on the boot disk (/disk1). This method copies only enough memory to fill the swap partition. Use sysdump only if the server does not have a tape drive or the tapedump command fails.
After the system enters the debugger, information similar to the following appears on the server console:
PANIC: PGF Kernel mode trap. Type 0x0000000e
Faulting Virtual Address 0x26007df0
Page Directory 0x000020000
debugger entered from open+0000000a PC 0d0060d0e
EAX EBX ECX EDX ESI EDI
d0060d04 d0085a24 e00010d0 e00010d0 0000000 6007de8
CS SS DS ES FS GS
e0000158 e0000160 d00d0160 d00d0160 e0000008 c03e0000
d0018523: E9 0C FE FF FF jmp 0xD0018334
Banyan Systems Debugger (type h for help)d:
The information listed represents the contents of the server's CPU registers at the time of the system panic. From the prompt that follows, you can enter any of six debugger commands (Table 14-1). Debugger commands are case sensitive; enter them in lower-case letters.
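Because this console output has to be written down and reported, it can help to pick out the key fields from a transcription. The Python sketch below is a hypothetical helper, not a Banyan tool; the name parse_panic_banner and the regular expressions are assumptions based solely on the sample output above. For reference, on Intel x86 processors trap type 0x0e (decimal 14) is the page-fault exception, which is consistent with the PGF label in the sample.

```python
import re


def parse_panic_banner(text):
    """Pull the trap description, trap type, faulting address, and the
    routine the debugger was entered from out of transcribed panic output."""
    trap = re.search(r"PANIC:\s*(.+?)\s+Type (0x[0-9a-fA-F]+)", text)
    addr = re.search(r"Faulting Virtual Address (0x[0-9a-fA-F]+)", text)
    entry = re.search(r"debugger entered from (\S+)", text)
    return {
        "description": trap.group(1) if trap else None,
        "trap_type": int(trap.group(2), 16) if trap else None,
        "faulting_address": int(addr.group(1), 16) if addr else None,
        "entered_from": entry.group(1) if entry else None,
    }


if __name__ == "__main__":
    banner = (
        "PANIC: PGF Kernel mode trap. Type 0x0000000e\n"
        "Faulting Virtual Address 0x26007df0\n"
        "debugger entered from open+0000000a\n"
    )
    info = parse_panic_banner(banner)
    print(info["description"], info["trap_type"], hex(info["faulting_address"]))
```

Fields that are absent from the transcription come back as None, so a partial copy of the screen can still be summarized.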
The next section explains how to save system memory dumps.
You can save the memory dump to tape or diskette.
Using tapedump to Copy a System Dump
The following restrictions apply to the tape dump operation:
- The tape dump is written to the 1/4-inch cartridge (QIC), 4mm, or 8mm tape drives.
- The tape dump cannot span tapes. You can use only one tape cartridge.
- Only high-density tapes are supported. Avoid using 45 MB and 60 MB tapes, because they are usually low-density tapes.
- The capacity of the tape must be equal to or greater than the size of the server memory. For example, if the server has 96 MB of memory, the tape must be able to hold at least 96 MB of data. It is recommended that you use 150 MB or larger tape cartridges.
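The capacity and density restrictions above amount to a simple check. The Python sketch below is illustrative; the function name tape_is_suitable is hypothetical, and treating every 45 MB and 60 MB cartridge as unsuitable is a simplifying assumption based on the advice that such tapes are usually low-density.

```python
# Cartridge sizes the text advises against, because they are usually low-density.
LOW_DENSITY_SIZES_MB = (45, 60)


def tape_is_suitable(tape_capacity_mb, server_memory_mb):
    """Return True if a single cartridge can hold the whole memory dump.

    The dump cannot span tapes, so the one cartridge must be at least as
    large as the server memory.
    """
    if tape_capacity_mb in LOW_DENSITY_SIZES_MB:
        return False
    return tape_capacity_mb >= server_memory_mb


if __name__ == "__main__":
    # A 150 MB cartridge for a server with 96 MB of memory.
    print(tape_is_suitable(150, 96))
```

The check does not cover the drive type; confirm separately that the drive is one of the supported QIC, 4mm, or 8mm units.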
To Copy a System Dump to Tape
1. When the server enters the debugger, write down the information that appears on the screen.
2. Insert a tape into the tape drive. Make sure the tape is not write protected.
3. Enter:
tapedump
You must enter the command in lower-case. The contents of memory are written to tape.
After the memory dump completes, a message warns you not to remove the tape because the system must later dump the kernel and driver symbols to that tape. Do not remove the tape from the drive.
After the tape dump, the system automatically reboots. After the drivers are loaded into the kernel as part of the reboot, the system writes the kernel and the driver symbols to tape.
4. After the system is running, remove the tape, set the write-protect switch, and label the tape.
5. Report the problem to your Banyan support representative. You may be asked to send in the tape.
Note: Do not send a system memory image to Banyan unless a support engineer requests one. Banyan Customer Support cannot accept tapes or diskettes that are submitted without pre-assigned incident numbers.
Using sysdump to Copy a System Dump
You need five or six high-density, formatted diskettes to complete this procedure if your primary swap partition is 8000 blocks (4 MB). You need more than twenty high-density, formatted diskettes if your primary swap partition is 65536 blocks (32 MB). The server's primary swap area is either approximately 4 MB on an existing VINES server, or 32 MB on a new server or an existing server where the installer specified creation of the larger swap area (see the Server Installation Guide).
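The block counts and sizes quoted above are consistent with a block size of 512 bytes. That block size is an inference from the figures in this section, not a value stated in the text, and the function name blocks_to_mb in the following Python sketch is hypothetical.

```python
def blocks_to_mb(blocks, block_size=512):
    """Convert a swap partition size in blocks to megabytes (2**20 bytes).

    block_size defaults to 512 bytes, an assumed value inferred from the
    pairing of 65536 blocks with 32 MB.
    """
    return blocks * block_size / 2**20


if __name__ == "__main__":
    for blocks in (8000, 65536):
        print(blocks, "blocks is about", blocks_to_mb(blocks), "MB")
```

Use the diskette counts given in this section or reported by the server at reboot; the sketch covers only the block-to-megabyte conversion and does not estimate how many diskettes a dump needs.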
Avoiding Swap Space Conflicts When Using sysdump
Avoid using the sysdump command when the system is swapping processes. During normal system activity, when memory resources are heavily used, the system writes, or swaps, portions of memory that are not being actively used by a process to the swap partition of the root disk to allow other processes to run. The sysdump command also writes kernel memory to the swap area. This may cause a panic during the subsequent shutdown and a longer reboot time because the file system must be checked.
When the system is actively swapping processes, if you enter sysdump at the debugger prompt, you see the following message:
If the system detects a configured tape drive, you also see the following message:
Note: If the system does not detect a tape drive, the T option does not appear. The C (Continue) and S (Shutdown) options follow immediately after the warning message.
On servers with a tape drive, Banyan recommends that you use the tapedump command and dump system memory to tape by choosing the T option.
If you enter C, the system dumps system memory to the swap area and then performs a non-graceful shutdown. After the server reboots, a full check of the file systems will occur. Depending on the size of the disk and the number of files it contains, a full check can take a considerable amount of time.
If you enter S, the system does not dump system memory to the swap area. Instead, the system flushes all buffers to disk, shuts down gracefully, and reboots. A full check of the file systems will not occur.
If your only options are C (Continue) and S (Shutdown), select C if you are sure that you need to save the system dump.
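The guidance on the T, C, and S options can be summarized as a small decision function. The Python sketch below is a reading of the paragraphs above, not vendor logic; the function name choose_dump_option and its two parameters are hypothetical.

```python
def choose_dump_option(tape_drive_detected, dump_required):
    """Suggest a response when sysdump warns that the system is swapping.

    Returns a (key, consequence) pair:
      T - dump system memory to tape (offered only if a tape drive is detected)
      C - dump to the swap area, then non-graceful shutdown and a full
          file system check after reboot
      S - no dump; buffers are flushed, graceful shutdown, no full check
    """
    if not dump_required:
        return ("S", "graceful shutdown, no dump, no full file system check")
    if tape_drive_detected:
        return ("T", "dump system memory to tape")
    return ("C", "dump to swap area, full file system check after reboot")


if __name__ == "__main__":
    for tape in (True, False):
        for needed in (True, False):
            print(tape, needed, choose_dump_option(tape, needed))
```

The sketch treats "dump not required" as a reason to choose S even when a tape drive is present, which is an interpretation; the text itself only recommends T on servers with a tape drive and C when the dump is definitely needed and no tape option exists.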
To Copy a System Dump to Diskettes
1. When the server enters the debugger, write down the information that appears on the screen.
2. Enter:
sysdump
You must enter the command in lower-case. The contents of memory are written to disk1 of the server. When the dump is complete, the server reboots.
3. When the system begins to reboot, it examines the disks and warns you that a system image dump has been saved to disk. You are prompted to save the dump on diskettes.
4. Do one of the following:
- Enter N. The system initializes and does not allow you to save the dump on diskettes.
- Enter Y. A message appears informing you of the number of diskettes required. Continue with step 6.
5. Enter Y. You are prompted to insert a diskette.
6. Insert a 3.5-inch diskette in the diskette drive.
7. Press ENTER. The server copies a portion of the memory dump to the diskette. When the diskette is full, you are prompted to insert the next diskette.
8. Remove the diskette from the drive.
9. Write protect the diskette.
10. Label the diskettes in the proper numerical sequence.
11. Repeat steps 7 through 10 until the entire system dump has been copied. The reboot continues normally.
12. Report the problem to your Banyan support representative. You may be asked to send in the diskettes. You are not prompted to save the memory dump following subsequent reboots.
Note: Do not send a system memory image to Banyan unless a support engineer requests one. Banyan Customer Support cannot accept tapes or diskettes that are submitted without pre-assigned incident numbers.