Problem Solving Practice - Redis cache.put Hangs


The Issue
After deployed the change: Multi Tiered Caching - Using in-process EhCache in front of Distributed Redis to test environment (with some other change and someone did some change in the server like restart), we found out that cache.put hangs when save data to redis.

Troubleshooting Process
First we tried to reproduce the issue in my local setup, it always works. But we can easily reproduce it in test environment.

This mde me think this maybe something related with the test environment.

Then I used kill -8 processId to generate several thread dumps when reproduce the issue in test machine. I found out some suspect:
"ajp-nio-8009-exec-10" #91 daemon prio=5 os_prio=0 tid=0x00007f49c400a800 nid=0x75db waiting on condition [0x00007f495333e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at  RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).waitForLock(RedisConnection) line: 600
RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).doInRedis(RedisConnection) line: 564
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:207)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:157)
at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:226)
at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:194)
at com.lifelong.example.MultiTieredCache.lambda$put$40(MultiTieredCache.java:130)
at com.lifelong.example.MultiTieredCache$$Lambda$18/1283186866.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at com.lifelong.example.MultiTieredCache.put(MultiTieredCache.java:128)
at org.springframework.cache.interceptor.AbstractCacheInvoker.doPut(AbstractCacheInvoker.java:85)
at org.springframework.cache.interceptor.CacheAspectSupport$CachePutRequest.apply(CacheAspectSupport.java:784)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:417)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327)
at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)

Check the code at RedisCache$AbstractRedisCacheCallback to understand how it works:
for operations like put/putIfAbsent/evict/clear, @cacheable with sync =true(RedisWriteThroughCallback), it check whether there is a key like cacheName~lock in redis, if exist, it will wait until it's gone.

This lock is created and deleted for @Cacheable with sync =true in RedisWriteThroughCallback which calls lock and unlock methods.

This made me check the settings in redis: after created the tunnel to redis, ran command: key cacheName~lock, I found out that it's indeed there.

Now everything make sense:
- we did set sync=true and run performance test, then restarted the server and removed it. The cacheName~lock was left there may be due to server restart. Due to the cacheName~lock, now all resid update api would not work.

After removed cacheName~lock in redis, everything works fine.

Take away
- When use some feature (@Cacheable(sync=true) in this case), know how it's implemented.

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)