How to Improve Problem Solving Skills
April 24, 2019
Why Problem-solving skills matters
Problem Solving and troubleshooting
- Is fun
- Is part of daily work
- Solve the problem, get things done
- Work More efficiently
- With more confidence
- Less pressure
- Go home earlier
Solve the Problem when Needed
- It’s your responsibility if it blocks you, the team
How to Solve a Problem
Understand the Problem/environment First
- Understand the problem before google search otherwise it may just lead you to totally wrong directions.
- Find related log/data
- Copy/Save logs/info/findings that may be related
Check the Log and Error Message
- Read/Understand the error message
- Where to find log
- Common places: /var/log
- From command line:
- -Dcassandra.logdir=/var/log/cassandra
Case Study – The Log and Error Message
Problem: Failed to talk Cassandra server: 10.10.10.10
Where 10.10.10.10 comes from?
- or search in corporate github/code base
- It comes from commons2 project; check what’s the current settings in Github
- Now the problem and solution is clear
- Just upgrade commons2 to latest version
Source code is always the ultimate truth
- Find related code in Github
- Find examples/working code
- Understand how/why the code works by running and debug the code
- Check the log with the code
- Most problems can be solved by checking log and source code
Reproduce the problem
- Find easier way to reproduce them
- main method, unit test, mock
- Simplify the suspect code
- Find the code related, remove things not realtedUse
- Reproduce locally
- Connect to the remote data in local dev
- Remote debug
- Last resort, slow
Solving Problem from Different Angles
- Sometime we find problem in production and we need code change to fix it
- Try to find a workaround by changing database/Solr or other configuration
- We can fix code later
Find Information Effectively
- Google search: error message, exception
- Search source code in Github/Eclipse
- Search log
- Search in IDE
- Cmd+alt+h, cmd+h
- Search command history
- history | grep git | grep erase
- history | grep ssh | grep 9042
- history | grep kubectl | awk ‘{$1="";print}’ | sort -u
- history | grep git | grep erase
Find Information Effectively Cont.
- Know company’s internal resource
- where to find them
- Know some experts (in company) you can ask help from
How to Troubleshoot and Debug
- Don’t overcomplicate it.
- In most cases, the solution/problem is quite simple
- Troubleshooting is about thinking what may go wrong.
- Track what change you have made
Resource About Troubleshooting
Urgent Issues in Production
- Collaborate and share update timely
- Let others know what you are testing/checking, the progress, what you have found, what you will do next
Ask for help
Before
- Try to understand the problem and fix it by yourself first
- If this applies: not urgent
Where and Who
- Coworkers
- Involve more people: the team, related teams
- Stackoverflow
- Specific forums
- Google Groups
- Github issues
- Ask in multiple channels
How to Ask Help
- Provide more context and info
- log, stack trace or any information that may help others understand the problem
- Provide what you have found, tried
- Ask help once for same/similar/related issues
Learn more
- The knowledge: root cause, etc
- Learn their thinking process
- how they approach this problem (logs, code), tools
Fix same/similar/related problems in other places
- People make same mistakes in different places
- Example:
GetMapping(value = "/config/{name:.+}")
- Example:
Knowledge & Tools
Knowledge
- Be prepared
- Learn how to debug/troubleshooting
- Learn tools used for debugging
- Learn framework, library, products, services used in your project
- Apache/Tomcat configuration
- How to manage/troubleshoot Cassandra/Kafka/Solr
- Know what problem may happen, code change recently
Knowledge cont.
Common Problems
- Different versions of same library
- mvn dependency:tree
- mvn dependency:tree -Dverbose -Dincludes=com.amazonaws:aws-java-sdk-core
Tools - Eclipse
- Use Conditional Breakpoint to Execute Arbitrary Code (and automatically)
- Use Display View to Execute Arbitrary Code
- Find which jar containing the class and the application is using
MediaType.class.getProtectionDomain().getCodeSource().getLocation()
- Breakpoint doesn’t work
- Multiple versions of same class or library
Practice - Connect to the remote data in local dev
- Create a tunnel to zookeeper and solr nodes
- Add a conditional breakpoint at
CloudSolrClient.sendRequest(SolrRequest, String)
- before
LBHttpSolrClient.Req req = new LBHttpSolrClient.Req(request, theUrlList);
- before
Tools - Decompiler
- GUI: Bytecode-Viewer
- alias decom=“java -jar /Users/jyuan/apple/tools/misc/Bytecode-Viewer-2.9.11.jar”
- CFR
- Best, Support java8
Tools - Java
- jcmd: One JDK Command-Line Tool to Rule Them All
- jcmd
Thread.print - jcmd
GC.heap_dump - jcmd
- jcmd
- Thread dump Analyzer
- Heap dump Analyzer
- VisualVM
Tools - Splunk
- Syntax
- Expand messages to show all fields
- Click
Format
on the top and selectAll lines
for theMax Lines
setting
- Click
- After search and find the problem, use nearby Events +/- x seconds to show context
Tools - Misc
Search Contents of .jar Files for Specific String
gfind . -iname '*.jar' -printf "unzip -c %p | grep -q 'string_to_search' && echo %p\n" | s
- nc -zv, lsof, df, find, grep
Fiddler
Problem Solving in Practice
Example: Redis cache.put Hangs
- Get thread dump, figure out what’s happening when read from cache
- Read related code to figure out how Spring implements
@Cacheable(sync=true RedisCache$RedisCachePutCallback
- Check whether there’s cacheName~lock in redis
- When use some feature, know how it’s implemented.
Practice - Iterator vs Iterable
What’s the problem in the following Code?
Practice - Iterator vs Iterable
- How to find the root cause
- Symptoms: The function only works once in a while: when cache is refreshed
- Difference between Iterator vs Iterable
- Don’t use Iterator when need traverse multiple times
- Don’t use Iterator as cache value
Practice 2 - Spring Cacheable Not Working
- The class using cache annotation inited too early
- Add a breakpoint at the default constructor of the bean, then from the stack trace we can figure out why and which bean (or configuration class) causes this bean to be created
- Understand how spring cache works internally, spring proxy
- CacheAspectSupport
Post Mortem
Reflection: Lesson Learned
- How we find the root causeWhy it takes so long
- What we have learned
- What’s the root cause
- Why we made the mistake
- How we can prevent this happens again
- Share the knowledge in the team via wiki, quip, email etc.
- Take time to solve problem, find the root cause, but only (take time to) solve it once
Think More
- Think over the code/problem, try to find better solution even the issue’s solved
- Everything that stops you from working effectively is a problem
- Fix them
Bonus
Building Troubleshooting-Friendly Application
- Return meaningful error code and response
- Return debug info that can check from response when requested
- Should be safe to use or protected
- Add requested_id in client (automatically when provide client application)
- Mock User Feature
- Preview Feature
How to write test efficiently
- Learn and Use
Hamcrest, Mockito, JUnit, TestNG, REST Assured
- Add Static Import in Eclipse
- Preferences > Java > Editor > Content Assist > Favorites, then add:
org.hamcrest
org.hamcrest.Matchers.*
org.hamcrest.CoreMatchers.*
org.junit.*
org.junit.Assert.*
org.junit.Assume.*
org.junit.matchers.JUnitMatchers.*
org.mockito.Mockito
org.mockito.Matchers
org.mockito.ArgumentMatchers
io.restassured.RestAssured