How to Troubleshoot and Debug - Programmer Skills


Related: How to Solve Problems
Troubleshooting is fun
We can learn something new, avoid making same mistake in future.
it brings us accomplishment and satisfaction.

How to Troubleshoot and Debug

Check and save current state(logging)
We may not recreate the problem easily, so analyze the problem and save logging first.

Understand the problem first
-- Especially when trouble shoot problem for others(rookie).
First take some time to understand the symptom, what the symptom means, whether it's really a problem
Don't just make some change blindly without understanding the symptom
Check error code, understand how the user is writing the code

Understand the environment first
-- Especially asked to help trouble shoot in other's project which you are not familiar at all
-- Don't blindly think others made some simple/stupid mistake, or same mistake you made before
  -- You din't find some configuration file - it may be because you find it in wrong folder
-- Know how to quickly search/debug in linux
-- Most time, the problem/root cause is simple, or just need some configuration change or there is no problem at all

List/think about what may be the root cause
- trouble shooting problem is thinking what may go wrong
- how it's likely caused by this?
- how easily we can verify it?

- if not likely and not easily to verify it, try/list other possible root cause/approaches first

Read/Check the Log
Change the logging level to know more abut the internal
For example, if we are using spring security library: then change level of org.springframework.security to debug in dev environment.

Check/Use IDE
- The IDE can help to detect problems.
- Eclipse' problems view may tell problems that cause project not build, or mistake in configuration files such as web.xml.

Reproduce the problem (faster)
Try to reproduce the problem locally in our own setup
-- It's much easier to trouble shooting in our local setup, we can do all tricks add breakpoint, change variable value, force return, drop frames to rerun etc.
-- Sometimes the problem is related with data, it only happens with the data in x line with its data. We can change local setup to point to remote data (via creating tunnel etc). - This is much easier to remote debug code in remote server.

Simplify the suspect code
Create simple test cases to reproduce it

If possible, debug and test locally
For example, we can change local setup to use data in remote setup(test or sometimes production lines).

Remote debug if can't debug locally 
- but last resort, slow and sometimes impossible

Use condition breakpoint to change behavior dynamically
Use display view/force return to change behavior
Force the execution of suspect path
-- throw exception, change value etc

Source code is always the ultimate truth
- Don't assume or think ..., check the code to verify it.
- Always find the related code and understand it
  - how it works, how it gets called, how to configure it
  - You can't troubleshoot without know understanding the code first
- We can find examples/working code
- We can understand how/why the code works by running and debug the code

Check the code and log together

Change the Code to Help Debug
-- Add logging


Construct and test hypotheses

- First verify the most-likely, easy-to-verify ones


Google Search using the error message with library, class and method name
Search source code in Github/Eclipse
Search log: use linux command:(f|e)grep

Compare the difference between the code and the failed code
-- Compare different version in git(hub).
-- Search company's code base and find similar code
-- Others may already fixed same problems

If still gets stuck:
Take time to learn the framework/feature etc
Sleep on it, try to solve it later.

Scripting
Write test code using scripting or other languages

Ask in StackOverflow, product's forum
Read the Doc/JavaDoc

Solve problems quickly but find root cause slowly
- Solve problems quickly so others can move on
- But take time to find root cause or reflect if it matters

Collaboration
Ask questions
-- When help solve problems for others or debug other's code.
Get more information as much as possible - when help others
Provide more information when ask help

When work on urgent issues with others
-- Collaborate with others timely
-- Let others know what you are testing, what's your progress, so there is no overlap.

Reflection: Lesson Learned
How we find the root cause, why it takes so long

What we learned
What's the root cause

Why we made the mistake

How we can prevent this happens again

Share the knowledge in the team

Testing with TestNG



Fail Fast Using TestListenerAdapter
Sometimes, we want to all suites stops immediately if one test fails, TestListenerAdapter can help on this. onTestFailure will be called when any test fails, it will set static boolean varible hasFailedTest to true.
public class MyTestNGListener extends TestListenerAdapter {
    private static boolean hasFailedTest;
    @Override
    public void beforeConfiguration(final ITestResult result) {
        if (hasFailedTest) {
            throw new SkipException("Skipping beforeConfiguration");
        }
        super.beforeConfiguration(result);
    }

    @Override
    public void onStart(final ITestContext testContext) {
        if (hasFailedTest) {
            throw new SkipException("Skipping onStart");
        }
        super.onStart(testContext);
    }

    @Override
    public void onTestStart(final ITestResult result) {
        if (hasFailedTest) {
            throw new SkipException("Skipping onTestStart");
        }
        super.onTestStart(result);
    }

    @Override
    public void onTestFailure(final ITestResult tr) {
        hasFailedTest = true;
    }
}

@Listeners({MyTestNGListener.class})
public abstract class BaseTest {}

@BeforeSuite + @BeforeClass
To test admin api which requires login first, we login in the @BeforeSuite and share the session in all test in the suite.
-- We use rest-assured to do end-2-end rest api testing.

To make the child class easier to use the sessionFilter, we use @BeforeClass to store it as instance variable.
public abstract class BaseTest {
  @BeforeSuite(alwaysRun = true)
  @Parameters({"adminBaseUrl", "clientBaseUrl", "username", "password"})
  public void setup(final String adminBaseUrl, final String username, final String password,
          final ITestContext context) {
      final SessionFilter sessionFilter = TestUtil.configureAdminRestAssured(adminBaseUrl, username, password);
      context.setAttribute(FILTER_NAME, sessionFilter);
  }

  protected RequestSpecification restRequestSpec, adminRequestSpec;
  protected SessionFilter sessionFilter;

  @BeforeClass(alwaysRun = true)
  @Parameters({"adminBaseUrl", "clientBaseUrl"})
  public void beforeClass(final String adminBaseUrl, final String clientBaseUrl, final ITestContext context) {
      restRequestSpec = new RequestSpecBuilder().setBaseUri(clientBaseUrl).setContentType(ContentType.JSON)
              .setRelaxedHTTPSValidation().build();

      adminRequestSpec = new RequestSpecBuilder().setBaseUri(adminBaseUrl).setContentType(ContentType.JSON)
              .setRelaxedHTTPSValidation().build();

      sessionFilter = (SessionFilter) context.getAttribute(FILTER_NAME);
  }
}

public static SessionFilter configureAdminRestAssured(final String baseUrl, final String username,
        final String password) {
    RestAssured.reset();
    RestAssured.enableLoggingOfRequestAndResponseIfValidationFails();
    RestAssured.config = config().logConfig(new LogConfig());
    RestAssured.config.getHttpClientConfig().reuseHttpClientInstance();
    RestAssured.config = config().sslConfig(sslConfig().allowAllHostnames());
    RestAssured.useRelaxedHTTPSValidation();
    
    final SessionFilter sessionFilter = new SessionFilter();
    final Map<string object> parameters = new HashMap<>();
    parameters.put("j_username", username);
    parameters.put("j_password", password);

    // if login successful, it will redirect it to index.html, check whether its status is 302,
    // the new location ends with index.html
    final ExtractableResponse<response> response =
            given().contentType(ContentType.URLENC).formParameters(parameters).filter(sessionFilter)
            .post(baseUrl + "/j_spring_security_check").then().statusCode(302).extract();
    assertThat(response.header("Location"), endsWith("index.html"));
    RestAssured.basePath = "/v1";
    return sessionFilter;
}


Building Chrome Extension to Find All On Sale Products in Google Express


The Problem
Google express does have a page to show on-sale products from one store, like this one https://www.google.com/express/u/0/search?oc=on_sale&m=9090995, but for unknown reason, it only lists a very small part of all on-sale products.

I like on-sale products, so I build one chrome extension that lists all on-sale products from one store.

How to Use
We go to one store, then run "Find on-sale products in Express" -> "Find in all pages", or choose one category(such as grocery), then choose "Find in all pages".

How it Works
In Google Express, when we go to the end of the page, it will automatically load more data in current page. So the extension continues to scroll to end of page until there is no more data to load, then remove all not on-sale products.

How to Build it We use extensionizr to create one chrome extension, then change the manifest.json as below:
{
  "name": "Find on-sales products in Express",
  "version": "0.0.1",
  "manifest_version": 2,
  "homepage_url": "http://lifelongprogrammer.blogspot.com",
  "default_locale": "en",
  "background": {
    "page": "src/bg/background.html",
    "persistent": true
  },
  "permissions": [
    "contextMenus",
    "tabs",
    "activeTab",    
    "https://www.google.com/express/*"
  ],
  "content_scripts": [
    {
      "matches": [
        "https://www.google.com/express/*",
      ],
      "js": [
        "src/inject/inject.js"
      ],
    }
  ]
}
background.js
We create context menu in background.js. When user clicks one content menu, it uses message passing to call function defined in content scripts.

Here we use "documentUrlPatterns":showForPages, to only create the context menus in google express websites.
'use strict';
var showForPages = ["https://www.google.com/express/*"];
function findOnSalesProducts(info, tab, pageCount) {
    console.log("pageCount: " + pageCount);
    chrome.tabs.query({
        "active": true,
        "currentWindow": true
    }, function (tabs) {
        chrome.tabs.sendMessage(tabs[0].id, {
            "functiontoInvoke": "findOnSalesProducts",
            "pageCount":  pageCount
        });
    });
}

function findAllSales(info, tab) {
    chrome.tabs.query({
        "active": true,
        "currentWindow": true
    }, function (tabs) {
        chrome.tabs.sendMessage(tabs[0].id, {
            "functiontoInvoke": "findAllSales"
        });
    });
}

chrome.contextMenus.create(
{
"title": "Find in all pages",
"contexts" : ["all"],
"documentUrlPatterns":showForPages,
"onclick": function(info, tab) {
    findAllSales(info, tab)
  }
});
createSeperatorMenu();

chrome.contextMenus.create(
{
"title": "Find in current page",
"contexts" : ["all"],
"documentUrlPatterns":showForPages,
"onclick": function(info, tab) {
    findOnSalesProducts(info, tab, 0)
  }
});

createSeperatorMenu();

var pageCounts = ["5", "10", "20", "40", "60"];
for (let pageCount of pageCounts) {
  chrome.contextMenus.create(
  {
  "title": "Find in next " + pageCount + " pages",
  "contexts" : ["all"],
  "documentUrlPatterns":showForPages,
  "onclick": function(info, tab) {
      findOnSalesProducts(info, tab, pageCount)
    }
  });
}

function createSeperatorMenu()
{
  chrome.contextMenus.create(
   {
     "title": "-------------------------------------",
     "contexts" : ["all"],
     "documentUrlPatterns":showForPages     
   }
  ); 
}
content_scripts.js
Main function is defined in content_scripts.js.
It uses setInterval to scroll to end of page multiple times, and use css selector to find not-on-sale products, and then remove them.
'use strict';
chrome.extension.onMessage.addListener(function (message, sender, callback) {
  console.log("message: " + JSON.stringify(message))
  if(message.functiontoInvoke =="findOnSalesProducts")
  {    
    findOnSalesProducts(message.pageCount);
  } else if(message.functiontoInvoke == "findAllSales")
  {
    findAllSales();
  }
  return;
});
function findOnSalesProducts(times)
{
  if(times == 0)
  {
    removeAllNonSales();
    return;
  }
  let timesRun = 0;
  var interval = setInterval(function(){
      timesRun += 1;
      if(timesRun > times){
          console.log("clearInterval and removeAllNonSales");
          clearInterval(interval);
          removeAllNonSales();
      }
      console.log("scroll")
      window.scrollTo(0,document.body.scrollHeight);
  }, 500); 
}

function findAllSales()
{
  let lastProductsCount = 0, currentProductsCount = 0;
  console.log(lastProductsCount +" last1, current: "+currentProductsCount );  
  var interval = setInterval(function(){
      currentProductsCount = getProductsCount();    
      console.log(lastProductsCount +" last2, current: "+currentProductsCount );  
      if(currentProductsCount == lastProductsCount){
          console.log("clearInterval and removeAllNonSales");
          clearInterval(interval);
          removeAllNonSales();
          return;
      }
      console.log("scroll")
      window.scrollTo(0,document.body.scrollHeight);
      lastProductsCount = currentProductsCount;
      console.log(lastProductsCount +" last3, current: "+currentProductsCount );
  }, 1500); 
}

function getProductsCount()
{
  return document.querySelectorAll("body div.contentArea>div.right ul>li").length;
}

function removeAllNonSales()
{
  var allProducts = document.querySelectorAll("body div.contentArea>div.right ul>li")
  Array.prototype.forEach.call( allProducts, function( node ) {
      if(node.querySelectorAll("span.priceSale").length==0)
      {
        node.parentNode.removeChild( node );    
      }    
  });
}
Resources
Chrome Extension Overview

Java Lambda-Enabled Concurrency - Don't Use Mutable Variables


What's wrong in following code?
The following code tries to submit all query task to threadpool, but it fails to do that - It only reads part of data.
private List<Future<List<T>>> findAllAsyncImpl(final SolrParams params, final int readSize,
        final int totalCount) {
    final ModifiableSolrParams query = new ModifiableSolrParams(params);
    query.set(CommonParams.ROWS, readSize);
    final List<Future<List<T>>> futures = new ArrayList<>(totalCount);
    int start = 0;
    while (start < totalCount) {
        query.set(CommonParams.START, start);
        futures.add(executor.submit(() -> querySolr(new ModifiableSolrParams(query))));
        start += readSize;
    }
    return futures;
}

I checked the log - what I expected is that it runs query like start=0, 1000, 2000, until totalCount-1000 (readSize is 1000)-  but it's not, it runs query with some start value (here start=45000) multiple times, some start value 0 time.
AbstractSolrRepository querySolr: q=omitted...&rows=1000&start=45000
AbstractSolrRepository querySolr: q=omitted...&rows=1000&start=45000
AbstractSolrRepository querySolr: q=omitted...&rows=1000&start=45000
AbstractSolrRepository querySolr: q=omitted...&rows=1000&start=45000

What the code is doing
Now let's change the lambda expression to the anonymous class() - Eclipse quick fix(Command +1) then select "convert to anonymous class creation".
The second part code creates explicit callable.


while (start < totalCount) {
    query.set(CommonParams.START, start);
    futures.add(executor.submit(new Callable<List<T>>() {
        @Override
        public List<T> call() throws Exception {
            return querySolr(new ModifiableSolrParams(query));
        }
    }));
    start += readSize;
}


while (start < totalCount) {
    query.set(CommonParams.START, start);
    futures.add(executor.submit(new SolrQueryTask(query)));
    start += readSize;
}

private class SolrQueryTask implements Callable<List<T>> {
    private final ModifiableSolrParams query;
    public SolrQueryTask(final ModifiableSolrParams query) {
        this.query = query;
    }
    public List<T> call() throws Exception {
        return querySolr(new ModifiableSolrParams(query));
    }
}
Now the root cause is kind of clear: the task is running in a different thread(for example at that time start=0), it refers variable whose state is changed later,  when the task is actually running, the query variable is already changed.

The solution
private List<Future<List<T>>> findAllAsync
    (final SolrParams params, final int readSize, final int totalCount) {
    final List<Future<List<T>>> futures = new ArrayList<>(totalCount);
    int start = 0;
    while (start < totalCount) {
        final ModifiableSolrParams targetParams =
        new ModifiableSolrParams(params).set(CommonParams.ROWS, readSize).set(CommonParams.START, start);

        futures.add(executor.submit(() -> querySolr(targetParams)));
        start += readSize;
    }
    return futures;
}

Java Lambda-Enabled Concurrency - Variables may be Changed Unexpectedly


In my last post, I explained that Don't Use Mutable Variables in Java Lambda Concurrency, but sometimes, variables may be changed unexpectedly.

The Problem
When optimize performance of function that exports solr data as csv, I add the logger.debug statements, then find out the test throws ConcurrentModificationException:
Caused by: java.util.ConcurrentModificationException: null
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:711) ~[na:1.8.0_40]
at java.util.LinkedHashMap$LinkedEntryIterator.next(LinkedHashMap.java:744) ~[na:1.8.0_40]
at java.util.LinkedHashMap$LinkedEntryIterator.next(LinkedHashMap.java:742) ~[na:1.8.0_40]
at org.apache.solr.common.params.ModifiableSolrParams.toString(ModifiableSolrParams.java:201) ~[solr-solrj-5.2.1.jar:5.2.1
at java.text.MessageFormat.subformat(MessageFormat.java:1280) ~[na:1.8.0_40]
at java.text.MessageFormat.format(MessageFormat.java:865) ~[na:1.8.0_40]
at java.text.Format.format(Format.java:157) ~[na:1.8.0_40]
at java.text.MessageFormat.format(MessageFormat.java:841) ~[na:1.8.0_40]
at AbstractSolrRepository.findAllAsyncImpl(AbstractSolrRepository.java:236)

The code looks like below:
@Nonnull private List<Future<List<T>>> findAllAsync(final SolrParams originalParams, final int readSize,
        final long totalCount) {
    final ModifiableSolrParams query = new ModifiableSolrParams(originalParams);
    query.set(CommonParams.ROWS, readSize);
    final List<Future<List<T>>> futures = new ArrayList<>();
    int start = 0;
    while (start < totalCount) {
        final ModifiableSolrParams solrParams = new ModifiableSolrParams(query);
        solrParams.set(CommonParams.START, start);
        final Stopwatch stopWatch = Stopwatch.createStarted();

        futures.add(executor.submit(() -> querySolr(solrParams)));
        if (logger.isDebugEnabled()) {
            logger.debug(MessageFormat.format("executor.submit took {0}, params: {1}",
                    stopWatch.elapsed(TimeUnit.MILLISECONDS), solrParams));
        }
        start += readSize;
    }
    return futures;
}
-- the code is much simplified later. Check final code here.
We submit the query task into thread pool, and want to log how long the submit takes - when all threads in the thread pool are busy, and its queue is full, the submit will block. I want to figure out whether it happens, whether I need change the thread pool settings.
-- Check AsyncConfigurer configuration code here.

It feels strange, so I decide to figure out why.

The Root Cause
ModifiableSolrParams uses LinkedHashMap to store data, its toString loops the LinkedHashMap. 
From the exception, it means that the ModifiableSolrParams is changed during the for-loop in toString() method.

So I check what changed in ModifiableSolrParams, find out solrClient.query adds _stateVer_=cobradb.survey_response:2903.

Then I go to solr github repos and search _stateVer_, and find CloudSolrClient and public static final String STATE_VERSION = "_stateVer_";.

Then I add a break point to places where STATE_VERSION is used and find the following code:
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(SolrRequest, int, String)
    if (request.getParams() instanceof ModifiableSolrParams) {
      ModifiableSolrParams params = (ModifiableSolrParams) request.getParams();
      if (stateVerParam != null) {
        params.set(STATE_VERSION, stateVerParam);
      } else {
        params.remove(STATE_VERSION);
      }
    }

The ConcurrentModificationException will happen when the code above and the for-loop in ModifiableSolrParams.toString runs at same time.

Search - Fix same problem
Next I search logger.*query and logger.*param to check whether we made same mistake.

Writing High-Performance Application


The Problem
Some features/functions are time consuming or time sensitive, we want to make it as fast as possible.
The following are some tips that I learned form my experience and past mistakes.

Does performance really matter?
Whether the function is time consuming or time sensitive?
If so, do take time to measure performance and optimize it. 

Know the function
What it does? What expensive (db/remote)APIs/Services it calls? How many times it calls?
Can we cache results?

Measure the performance
Use Guava StopWatch to measure time consuming functions and analyze where it spends most of the time. 
Use AspectJ to log slow calls.

Test with real data
If the function handles a lot of data, then test it with a lot of data.

Test with real data can also uncover other issues/bugs in the code.
It would be good if we can get backup of production data to do test; if not, use JMeter, postman runner, newman or write test code to inject data.

Postman already generates code(use OK Http or unirest in java) for request, just introduce some randomness, use threadpool and run it x times.

Change/Optimize only one thing each step, then compare the performance

Use Cache aggressively
For example, to cache query from solr or db, we can split query to non-user related and cache the results, and do user related post filtering in the code.

Parallelization - Avoid Putting things in Series
Use Spring ThreadPoolTaskExecutor
-- Thread creation is expensive, don't create a new thread each time, instead use thread pool.
-- Use common shared thread pool, instead create a new thread pool and kill it after all tasks are done.

ThreadPool Configuration
Know what tasks the pool runs, are they CPU bound or I/O bound?Use different settings for different cases.
Understand how Java ThreadPool works
-- Rules of a ThreadPoolExecutor pool size

  1. If the number of threads is less than the corePoolSize, create a new Thread to run a new task.
  2. If the number of threads is equal (or greater than) the corePoolSize, put the task into the queue.
  3. If the queue is full, and the number of threads is less than the maxPoolSize, create a new thread to run tasks in.
  4. If the queue is full, and the number of threads is greater than or equal to maxPoolSize, reject the task.
new threads are only created when the queue fills up, so if you’re using an unbounded queue then the number of threads will not exceed corePoolSize.

Sample Code
@Configuration
public class TaskExecutorConfig implements AsyncConfigurer {
  @Bean @Override
  public Executor getAsyncExecutor() {
      final ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();

      final int cpus = Runtime.getRuntime().availableProcessors();
      final int corePoolSize = cpus * defaultScaleFactor;
      executor.setCorePoolSize(corePoolSize);
      final int maxPoolSize = cpus * maxScaleFactor;
      executor.setMaxPoolSize(maxPoolSize);
      executor.setQueueCapacity(queueCapacity);
      executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());

      executor.setThreadNamePrefix(ASYNC_EXECUTOR_PREFIX);
      executor.initialize();
      return executor;
  }
}

Use CompletableFuture/RX-Java

Use Hystrix to defend failures from third-party services and Fail Fast

Use profile tools(VisualVM)
-- Monitor memory usage and profile cpu
-- Use the method name filter to only show method we care.
This can give us a high level of the code, what methods it calls and how many times etc.

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)