Bug fixing
Lesson 1:
In the program, some threads are designed to run forever, until the
program is shut down. Unfortunately, they did not catch every
exception that could be thrown, so an uncaught exception would make
a thread exit unexpectedly.
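A minimal sketch of the fix, assuming the thread's work can be expressed
as a repeated Runnable. The class name ResilientWorker, the shutdown()
method and the log message are hypothetical, not taken from the original
program. The loop catches Exception around each iteration so that one
failure is logged instead of ending the thread; Errors such as
OutOfMemoryError are deliberately left to propagate.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: keeps calling the task until shutdown() is requested.
class ResilientWorker implements Runnable {
    private final Runnable task;
    private final AtomicBoolean running = new AtomicBoolean(true);

    ResilientWorker(Runnable task) {
        this.task = task;
    }

    // Ask the loop to stop after the current iteration.
    void shutdown() {
        running.set(false);
    }

    @Override
    public void run() {
        while (running.get() && !Thread.currentThread().isInterrupted()) {
            try {
                task.run();
            } catch (Exception e) {
                // Log and keep going: one failed iteration must not kill the thread.
                System.err.println("worker iteration failed: " + e);
            }
        }
    }
}
```

It would be started with something like new Thread(new ResilientWorker(task)).start().
A real version would probably also pause briefly after a failure, so that
a task which fails every time does not spin the CPU.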
Lesson 2:
When a program runs slowly or behaves strangely, check the system
status, consider every possibility, and make reasonable guesses.
Some test defects looked very strange: intermittently, the program
could not send out multicast messages. At first I guessed it might be
a code problem. We had also upgraded the machine to a new operating
system and a new JDK, so maybe the new OS or the new JDK was the
culprit. While testing, I found that whenever I hit the problem, the
command 'java -version' would hang forever. But at that moment I
ignored this obvious clue.
In the end my colleague figured out the root cause: one process on
the machine was consuming too many system resources, which starved
all the other processes, so they froze or ran extremely slowly.
The ps output for that process:
UID      PID PPID  C    STIME TTY  TIME CMD
root  307694    1 50 15:14:20   - 33:08 /process_cmd
The ps documentation describes the C column as follows:
C (-f, l, and -l flags): CPU utilization of process or thread,
incremented each time the system clock ticks and the process or
thread is found to be running. The value is decayed by the scheduler
by dividing it by 2 once per second. For the sched_other policy, CPU
utilization is used in determining process scheduling priority. Large
values indicate a CPU intensive process and result in lower process
priority whereas small values indicate an I/O intensive process and
result in a more favorable priority.
How stupid of me: I didn't use the ps and top commands to check the
system's running status, I ignored the fact that 'java -version'
hung, and I failed to make the connection.
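As an illustration of that check, here is a small sketch that scans
captured ps output for processes with a high C value. It assumes the
eight-column layout shown above (UID PID PPID C STIME TTY TIME CMD) with
a single-token STIME; the class name PsScan, the method cpuHogs and any
threshold passed to it are hypothetical choices, not part of ps itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks out ps lines whose C (CPU utilization) column is high.
class PsScan {
    // Returns the lines whose C column is at or above the given threshold.
    static List<String> cpuHogs(List<String> psLines, int threshold) {
        List<String> hogs = new ArrayList<>();
        for (String line : psLines) {
            // Split into at most 8 fields so CMD keeps any embedded spaces.
            String[] fields = line.trim().split("\\s+", 8);
            if (fields.length < 8) {
                continue; // not a full ps row
            }
            try {
                int c = Integer.parseInt(fields[3]); // C is the fourth column
                if (c >= threshold) {
                    hogs.add(line.trim());
                }
            } catch (NumberFormatException e) {
                // Header row, or a layout this sketch does not handle: skip it.
            }
        }
        return hogs;
    }
}
```

Fed the two lines shown earlier with a threshold of, say, 40, it would
keep only the /process_cmd row. When the machine is already starved,
running ps or top by hand is still the quicker check.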