Related: How to Solve Problems
Source code is always the ultimate truth
- Don't assume or guess; check the code to verify.
- Always find the related code and understand it
-- how it works, how it gets called, how to configure it
- You can't troubleshoot without first understanding the code
- We can find examples of working code
- We can understand how and why the code works by running and debugging it
Check the code and the logs together
Solve problems quickly, but find the root cause slowly
- Solve problems quickly so others can move on
- But take time to find the root cause and reflect, if it matters
Troubleshooting is fun
- We can learn something new and avoid making the same mistake in the future.
- It brings a sense of accomplishment and satisfaction.
How to Troubleshoot and Debug
Check and save current state (logging)
We may not be able to recreate the problem easily, so analyze the problem and save the logs first.
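Saving the current state can be as simple as copying the logs somewhere safe before touching anything. A minimal sketch, assuming the logs are plain files on disk; the function name `save_snapshot` and the `environment.txt` file are illustrative choices, not part of any tool:

```python
import platform
import shutil
import sys
import tempfile
import time
from pathlib import Path


def save_snapshot(log_files, dest_root):
    """Copy log files and basic environment info into a timestamped folder."""
    snapshot = Path(dest_root) / time.strftime("snapshot-%Y%m%d-%H%M%S")
    snapshot.mkdir(parents=True, exist_ok=True)
    for log_file in map(Path, log_files):
        if log_file.is_file():  # silently skip logs that do not exist
            shutil.copy2(log_file, snapshot / log_file.name)  # keeps timestamps
    # Record the environment too, so the snapshot can be interpreted later.
    (snapshot / "environment.txt").write_text(
        f"python={sys.version}\nplatform={platform.platform()}\n"
    )
    return snapshot


if __name__ == "__main__":
    # Demo with a throwaway log file in a temporary directory.
    work = Path(tempfile.mkdtemp())
    (work / "app.log").write_text("ERROR something failed\n")
    print("saved to", save_snapshot([work / "app.log"], work / "snapshots"))
```

Taking the snapshot first means later restarts or log rotation cannot destroy the evidence.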
Understand the problem first
-- Especially when troubleshooting a problem for others (e.g. newcomers).
First take some time to understand the symptom, what it means, and whether it's really a problem
Don't make changes blindly without understanding the symptom
Check the error code, and understand how the user is writing the code
Understand the environment first
-- Especially when asked to help troubleshoot someone else's project that you are not familiar with
-- Don't blindly assume others made a simple or stupid mistake, or the same mistake you made before
-- If you can't find a configuration file, you may be looking in the wrong folder
-- Know how to quickly search and debug on Linux
-- Most of the time the root cause is simple: a configuration change is needed, or there is no problem at all
List/think about what may be the root cause
- Troubleshooting is thinking about what may have gone wrong
- How likely is this to be the cause?
- How easily can we verify it?
- If a cause is unlikely and hard to verify, list and try other possible causes or approaches first
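The two questions above (how likely, how easy to verify) can be turned into a rough ordering. A small sketch, assuming we are willing to put guessed numbers on each candidate; the `Hypothesis` class, the scoring formula, and the example entries are all made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    likelihood: float        # rough guess between 0 and 1
    minutes_to_verify: int   # rough effort estimate, must be > 0


def rank(hypotheses):
    """Order hypotheses so likely, cheap-to-check ones come first."""
    return sorted(
        hypotheses,
        key=lambda h: h.likelihood / h.minutes_to_verify,
        reverse=True,
    )


candidates = [
    Hypothesis("Bug in a third-party library", 0.1, 120),
    Hypothesis("Config file read from the wrong folder", 0.5, 5),
    Hypothesis("Stale build artifact", 0.3, 2),
]

if __name__ == "__main__":
    for h in rank(candidates):
        print(h.description)
```

The exact numbers matter little; the point is that an unlikely, expensive check should wait until the cheap, plausible ones are ruled out.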
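Read/Check the Log
-- Read more at Read the Error Message and List Likely Causes
Change the logging level to learn more about the internals
-- For example, if we are using the Spring Security library, change the level of org.springframework.security to debug in the dev environment.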
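Raising the log level for just one library keeps the extra output focused on the suspect area. A minimal sketch of that idea using Python's standard `logging` module rather than a Java logging framework; the logger names `vendor.security` and `myapp` are hypothetical:

```python
import logging


def enable_debug_for(logger_name):
    """Turn on DEBUG output for one library while the rest stays at WARNING."""
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    logging.getLogger().setLevel(logging.WARNING)         # everything else
    logging.getLogger(logger_name).setLevel(logging.DEBUG)  # suspect library only


enable_debug_for("vendor.security")  # hypothetical library logger name

if __name__ == "__main__":
    logging.getLogger("vendor.security.filter").debug("shown: child of the suspect logger")
    logging.getLogger("myapp").debug("hidden: still at WARNING")
```

Child loggers inherit the level of their parent, so one setting covers the whole library's package tree.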
Check/Use IDE
- The IDE can help detect problems.
- Eclipse's Problems view may report problems that prevent the project from building, or mistakes in configuration files such as web.xml.
Reproduce the problem (faster)
Try to reproduce the problem locally, in our own setup
-- It's much easier to troubleshoot in a local setup, where we can use every trick: add breakpoints, change variable values, force a return, drop frames to rerun, etc.
-- Sometimes the problem is data-related and only happens with the data in a particular line. We can point the local setup at the remote data (by creating a tunnel, etc.). This is much easier than remote debugging code on a remote server.
Simplify the suspect code
Create simple test cases to reproduce it
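A simple test case usually means cutting the suspect code down to one function and feeding it the one input that fails. A sketch using Python's `unittest`; `parse_price` and the sample inputs are hypothetical stand-ins for the real suspect code and the real failing record:

```python
import unittest


def parse_price(field):
    """Hypothetical suspect code, reduced to the one function that fails."""
    return float(field.replace(",", ""))


class ReproduceBadRow(unittest.TestCase):
    def test_normal_row_parses(self):
        # A known-good input confirms the test setup itself is sound.
        self.assertEqual(parse_price("1,250.50"), 1250.5)

    def test_empty_field_reproduces_the_failure(self):
        # The single input taken from the failing record.
        with self.assertRaises(ValueError):
            parse_price("")
```

Saved to a file, this can be run with `python -m unittest <file>`. Once the failure is pinned down this way, the test runs in milliseconds and can later be flipped to assert the fixed behavior.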
If possible, debug and test locally
-- For example, we can change the local setup to use data from a remote setup (test or, sometimes, production lines).
Remote debug if we can't debug locally
- but only as a last resort: it is slow and sometimes impossible
Use conditional breakpoints to change behavior dynamically
Use the Display view or Force Return to change behavior
Force execution of the suspect path
-- throw an exception, change a value, etc.
Change the Code to Help Debug
-- Add logging
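One low-effort way to add logging is to wrap the suspect function so every call, return value, and exception is recorded without editing its body. A sketch assuming Python's standard `logging` and `functools`; the `traced` decorator and `apply_discount` are illustrative names only:

```python
import functools
import logging

log = logging.getLogger("debug.trace")


def traced(func):
    """Log arguments, return value, and exceptions of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s raised", func.__name__)  # includes the traceback
            raise  # behavior is unchanged: the caller still sees the error
        log.debug("%s returned %r", func.__name__, result)
        return result
    return wrapper


@traced
def apply_discount(price, percent):  # hypothetical suspect function
    return price * (1 - percent / 100)
```

Because the decorator re-raises, it can be added and removed freely while debugging without changing what the code does.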
Construct and test hypotheses
- First verify the most-likely, easy-to-verify ones
Search Google for the error message together with the library, class, and method name
Search the source code on GitHub or in Eclipse
Search the logs: use the Linux commands grep, fgrep, and egrep
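When grep is not available, or the search needs to feed further processing, the same kind of filtering is easy to script. A rough Python equivalent of `grep -n -E`, with made-up sample log lines; `grep_lines` is an illustrative helper, not a library function:

```python
import re


def grep_lines(lines, pattern):
    """Return (line_number, text) for every line matching the regex."""
    regex = re.compile(pattern)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


# Made-up log content for illustration only.
sample_log = [
    "INFO start\n",
    "ERROR db timeout\n",
    "WARN retry\n",
    "ERROR db refused\n",
]

if __name__ == "__main__":
    for number, text in grep_lines(sample_log, r"ERROR|WARN"):
        print(f"{number}:{text}")
```

For a real file, pass the open file object as `lines`; it is read lazily, line by line.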
Compare the working code with the failing code
-- Compare different versions in Git (or on GitHub).
-- Search the company's code base for similar code
-- Others may have already fixed the same problem
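Outside of Git, a working and a failing variant can be compared directly with a diff. A minimal sketch using Python's standard `difflib`; the two configuration snippets are invented for the example:

```python
import difflib


def show_diff(working, failing):
    """Return unified-diff lines between a working and a failing text."""
    return list(
        difflib.unified_diff(
            working.splitlines(),
            failing.splitlines(),
            fromfile="working",
            tofile="failing",
            lineterm="",
        )
    )


# Invented example inputs.
working_config = "timeout=30\nretries=3\n"
failing_config = "timeout=30\nretries=0\n"

if __name__ == "__main__":
    print("\n".join(show_diff(working_config, failing_config)))
```

Lines starting with `-` exist only in the working version and lines starting with `+` only in the failing one, which narrows the search to what actually changed.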
If still stuck:
Take time to learn the framework, feature, etc.
Sleep on it and try to solve it later.
Scripting
-- Write test code in a scripting language or another language
Ask on Stack Overflow or the product's forum
Read the docs/Javadoc
Collaboration
Ask questions
-- When helping others solve problems or debugging others' code.
Get as much information as possible when helping others
Provide as much information as possible when asking for help
When working on urgent issues with others
-- Collaborate with others in a timely manner
-- Let others know what you are testing and what progress you have made, so there is no overlap.
Reflection: Lesson Learned
How we found the root cause, and why it took so long
What we learned
What the root cause was
Why we made the mistake
How we can prevent this from happening again
Share the knowledge with the team