Google: Help Sharpen Our Professional Skills

We all strive to improve our professional skills.
It's not easy, but we can never give up: these skills are how we earn a living and support our families.

It's a lifelong and difficult task, and we would appreciate Google's help with it.

As developers/photographers/teachers, we want to sharpen our programming/photography/teaching skills.
As students, we want to know what to learn, find problems similar to the ones we just solved, improve our problem-solving skills, and so on.

So it would be great if Google could help us in our mission to sharpen our professional skills.

Recommend technical and professional-skill content on YouTube, Google+, and Google Now

YouTube
Recommended for You (Professional Skills)
YouTube is a great site for listening to music (the recently added music tab is awesome) and watching casual videos.

But there are also a lot of professional-skill videos (programming, teaching, etc.) on YouTube. However, many people don't even know these videos exist, even though they would like to watch them.

These videos rarely make the popular or top-video lists, which hurts both the passion and the revenue of the people who produce them.

Many people are eager to watch them but simply don't know they exist, so the view counts for such videos are usually very small.

Therefore, it would be great if YouTube helped us find these videos so we know about them and can watch them.
There could be two kinds of professional videos: short, interesting, easy-to-understand ones we can watch in spare moments (eating, traveling, commuting), and long, profound ones we have to sit down and take time to understand.

In addition, this feature would benefit both the audience and YouTube: people would watch more on YouTube, which means more revenue for YouTube.

Watch Offline if Eligible
Allow us to watch professional-skill videos offline on our phones or tablets if the producer doesn't run ads on them.

Open Courses on YouTube, or Video for Education
There are a lot of popular open-course sites like Coursera and edX, which demonstrates that educational video is appealing to people. So why doesn't YouTube build its own platform?

Google+
Recommended for You (Professional Skills) - recommend professional-skill posts to us
I'm not a very social person, so I don't use Facebook or Twitter.
As a fan of Google, I mainly use G+, especially its What's Hot and Explore (Technology category).

Like most people, I am always working hard to improve my professional skills; for me, those are programming, algorithms, and (recently) coding interviews.
I am sure there are a lot of useful and interesting posts on these topics in G+.

So it would be great if G+ could find these related posts and recommend them to us, so we can read them when we check G+ in our spare time (waiting for a train, shopping with family, etc.).

We could have fun and sharpen our professional skills at the same time.

"what's hot" and "recommend to you" in community
What's hot, explore and communities are great features.
Even though I join multiple programming related communities,I don't check them often.
In these communities, there are tons of useful posts has been posted and updated everyday these communities. It is difficult for me to catch them all at once. 

So if G+ can do us a favor, in each community, it can have a category "what's hot" or "recommend to you". It will be  a great gift to us!

Allow users to create groups that aggregate content from similar communities in G+.
It would be awesome if we could create a group that aggregates content from multiple communities, so we could check all updates in one place: our group.
The group would list the "What's Hot" and "Recommended for You" posts from all of its communities.


Group similar topics in Play Newsstand
Allow us to create our own topic or group to aggregate posts from multiple topics or feeds.
Add more professional-skill topics; for me, that would be coding skills and algorithms.

Google Now
Google Now has done a great job on this; I hope it does even better in the future.

Google Alerts: Job Recommendations
At some point, we may decide to move on or look for a new challenge.

LinkedIn is great, but there are also a lot of other job-search sites: Lever, Jobvite, etc. None of them covers everything.

Google can actually help with this, since Google Search knows all the jobs recently posted on companies' sites.

Users could use Google Alerts to define what kinds of jobs they are interested in, like:
Location: New York or Bay Area
Keyword: Lucene Solr Hadoop

Then Google Alerts could notify us whenever matching jobs are posted online.

Java: Using classmexer MemoryUtil to Get an Object's Deep Memory Usage

The Problem
In some cases, we may want to get the deep memory usage of an object.
For example, in a recent project I developed a Solr request handler that copies docs from a remote Solr to a local Solr.

The request looks like this: /solr/core/pulldocs?remoteSolr=solrurl&q=query&fl=fields&rows=ROWS&start=START
Internally, it fetches 100 docs at a time: first START to START+100, then START+100 to START+200, and so on; five threads pull docs and insert them into the local Solr concurrently.
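The paging scheme described above can be sketched as a small helper. This is a hypothetical illustration (class and method names are mine, not the real handler's code), and it omits the five-thread distribution:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the paging logic: split [start, start + rows) into 100-doc windows,
// one per request to the remote Solr.
public class PagePlanner {
    static final int PAGE_SIZE = 100;

    // Returns [from, to) pairs covering the requested range.
    static List<int[]> pages(int start, int rows) {
        List<int[]> windows = new ArrayList<>();
        for (int from = start; from < start + rows; from += PAGE_SIZE) {
            windows.add(new int[] { from, Math.min(from + PAGE_SIZE, start + rows) });
        }
        return windows;
    }

    public static void main(String[] args) {
        for (int[] w : pages(4000, 250)) {
            System.out.println(w[0] + " to " + w[1]);
        }
    }
}
```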

But in one test environment, the tester reported that the 100-doc requests got slower and slower. My guess was that the requests weren't uniformly slowing down, but rather that some batches of 100 docs were abnormal and huge.

So I needed to find out and prove it: I wanted to log each request's execution time and the size of the Solr response from the remote server.

Solution: Use classmexer MemoryUtil to Get Deep Memory Usage
So, how do we get the deep memory usage of a Java object?
Via a Google search, I found we can use Java Instrumentation to get an object's size (Instrumentation.getObjectSize), but that only gives the shallow size of the object.

Then I found MemoryUtil from classmexer, which can get the deep memory usage of an object:
MemoryUtil.deepMemoryUsageOf(object)
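To illustrate what "deep" means here, the idea can be sketched with a reflective walker that follows references from a root object. This is only a rough sketch of my own (it counts reachable objects rather than real bytes, and skips fields the module system blocks on newer JDKs); classmexer itself uses Instrumentation to report accurate sizes:

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Rough illustration of "deep" traversal: count every object reachable from a
// root, following references but visiting each object only once (handles cycles).
public class DeepReachSketch {
    public static long countReachable(Object root) throws IllegalAccessException {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return walk(root, seen);
    }

    private static long walk(Object o, Set<Object> seen) throws IllegalAccessException {
        if (o == null || !seen.add(o)) return 0; // null or already counted
        long count = 1;
        Class<?> c = o.getClass();
        if (c.isArray()) {
            if (!c.getComponentType().isPrimitive()) {
                for (int i = 0; i < Array.getLength(o); i++)
                    count += walk(Array.get(o, i), seen);
            }
            return count;
        }
        for (; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive())
                    continue;
                try {
                    f.setAccessible(true);
                    count += walk(f.get(o), seen);
                } catch (RuntimeException blocked) {
                    // Skip fields the module system won't let us read (JDK 16+).
                }
            }
        }
        return count;
    }

    static class Node { int v; Node next; Node(int v) { this.v = v; } }

    public static void main(String[] args) throws Exception {
        Node a = new Node(1), b = new Node(2);
        a.next = b; b.next = a; // cycle
        System.out.println(countReachable(a)); // 2
    }
}
```

A shallow measurement would count only the root node itself; the deep traversal is what lets MemoryUtil.deepMemoryUsageOf catch a response object dragging hundreds of megabytes behind it.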

Integrate classmexer MemoryUtil into a Web Application
To use MemoryUtil in our Solr application, I added -javaagent:C:\mysolrapp\extra\classmexer.jar to the Java startup parameters.

Then change the code like below:
QueryResponse rsp = solrServer.query(fetchQuery);
logger.info("start: " + fetchQuery.getStart() + ", deep size: "
  + MemoryUtil.deepMemoryUsageOf(rsp));
Copy the newly built class to WEB-INF/classes, restart the server, and rerun the test. From the log, I could easily spot the huge Solr responses from the remote Solr, like below:
INFO: start: 4000, deep size: 714,778,104 ==> roughly 700 MB; in the normal case it should be between 1 and 10 MB.
INFO: Added 100, start: 4000, took 1195796

Then clean the data and rerun the test with start=4000&rows=100.

Checking the Solr index, its size was more than 5 GB. Using Luke to analyze it, I found that 99.99% of it was the content field, which had more than 41 million terms.

The real root cause was on the server side: when the server extracts text from a file and the file is corrupted, it grabs the binary data and adds it to the content field, which is huge. We fixed the server-side code, and everything works fine now.

Scala & Java: Merge K Sorted Lists

Recently I started to learn Scala, and the best way to learn a new language is to write code that solves real problems.

So here is my Scala code for the classic algorithm question: merge K sorted lists.
The code works, but as I am just a beginner in Scala, it doesn't use Scala's full power or features; I simply translated my Java version to Scala.

Scala Code: Merge K Sorted Lists
Since the lists can be ArrayLists or LinkedLists, we use an Iterator to check whether a list still has elements and to get the next element.
package org.lifelongprogrammer.scala.algorithms

import scala.collection.mutable
import scala.collection.mutable.PriorityQueue
object MergeKArrays {
  case class Element[E <: Comparable[E]](var value: E, iterator: Iterator[E]) extends Ordered[Element[E]] {
    def compare(that: Element[E]) = that.value.compareTo(this.value)
  }
  def merge[E <: Comparable[E]](lists: List[List[E]]): List[E] =
    {
      if (lists == null || lists.isEmpty)
        return List[E]();

      val pq = new PriorityQueue[Element[E]]()

      for (list <- lists) {
        if (list != null && !list.isEmpty) {
          val it = list.iterator;
          pq.enqueue(Element(it.next, it))
        }
      }

      val result = mutable.ListBuffer[E]();
      while (pq.size > 1) {
        val first = pq.dequeue;
        result.append(first.value)

        val it = first.iterator;
        if (it.hasNext) {
          // reuse first element
          first.value = it.next;
          pq.enqueue(first)
        }
      }

      if (!pq.isEmpty) {
        val first = pq.dequeue;
        result.append(first.value);

        val it = first.iterator;
        while (it.hasNext) {
          result.append(it.next);
        }
      }
      return result.toList;
    }

  def main(args: Array[String]) {
    val lists: List[List[Integer]] = List(List(1, 3, 5), List(2, 4, 6, 8))
    val result = merge(lists);
    print(result)
  }
}
Java: Merge K Sorted Lists
Here is my Java code:
package org.codeexample.algorithms;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeKArray {

 private static class Element<E extends Comparable<E>> implements
   Comparable<Element<E>> {
  private E value;
  private Iterator<E> iterator;
  @Override
  public int compareTo(Element<E> o) {
   return this.value.compareTo(o.value);
  }
 }
 /**
  * Preconditions: List can't contain null
  */
 public static <E extends Comparable<E>> List<E> merge(List<List<E>> lists) {
  if (lists == null || lists.isEmpty())
   return new ArrayList<>();

  PriorityQueue<Element<E>> pq = new PriorityQueue<>(lists.size()
  // ,new ElementComparator<E>() // or use external comparator
  );

  int allSize = 0;
  for (List<E> list : lists) {
   if (list != null && !list.isEmpty()) {
    Element<E> e = new Element<>();
    e.iterator = list.iterator();
    e.value = e.iterator.next();
    assert e.value != null;
    pq.add(e);

    allSize += list.size();
   }
  }
  List<E> result = new ArrayList<>(allSize);

  while (pq.size() > 1) {
   Element<E> e = pq.poll();
   assert e.value != null;
   result.add(e.value);
   Iterator<E> iterator = e.iterator;
   if (iterator.hasNext()) {
    e.value = iterator.next();
    assert e.value != null;
    pq.add(e);
   }
  }

  if (!pq.isEmpty()) {
   Element<E> e = pq.poll();
   result.add(e.value);
   while (e.iterator.hasNext()) {
    result.add(e.iterator.next());
   }
  }
  return result;
 }

 private static class ElementComparator<E extends Comparable<E>> implements
   Comparator<Element<E>> {
  public int compare(Element<E> o1, Element<E> o2) {
   return o1.value.compareTo(o2.value);
  }
 }

 public static void main(String[] args) {
  List<List<Integer>> lists = new ArrayList<>();
  lists.add(Arrays.asList(1, 3, 5));
  lists.add(Arrays.asList(2, 4, 6, 8));
  lists.add(Arrays.asList(0, 10, 13, 15));
  System.out.println(merge(lists));
 }
}

Eclipse Debugging Tips: Find Which Jar Contains the Class the Application Is Using

The problem:
Today I was adding Solr Cell (Tika) to our Solr application; during testing, it threw the following exception:
Caused by: java.lang.NoSuchMethodError: org.apache.tika.mime.MediaType.set([Lorg/apache/tika/mime/MediaType;)Ljava/util/Set;
        at org.apache.tika.parser.crypto.Pkcs7Parser.getSupportedTypes(Pkcs7Parser.java:52)
        at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)

This looks like a conflicting tika-core jar.
I used the Java decompiler JD-GUI to check the jar I added, solr\contrib\extraction\lib\tika-core-1.3.jar: its MediaType class does contain the method set(MediaType[] types).

So it seemed some other jar also contained the MediaType class. I checked solr.war\WEB-INF\lib, but found no obvious hint.

Using the Eclipse Display View to Check Which Jar Contains the Class
I enabled remote debugging, added a breakpoint at Pkcs7Parser.getSupportedTypes (Pkcs7Parser.java:52), and reran the Solr Cell request; it hit and stopped at the breakpoint. In the Display view, I evaluated:
java.security.CodeSource src = org.apache.tika.mime.MediaType.class.getProtectionDomain().getCodeSource();
return src.getLocation();
The output:
(java.net.URL) file:omitted/webapps/server/WEB-INF/lib/crawler4j-dependency.jar

Checking crawler4j-dependency.jar made the root cause obvious.
Someone had added crawler4j to the Solr application and bundled all of its dependencies into crawler4j-dependency.jar. It uses tika-core-1.0.jar, whose MediaType class doesn't contain the method set(MediaType[] types).

We can use the following code to list all the methods in MediaType:
java.lang.reflect.Method[] methods = org.apache.tika.mime.MediaType.class.getMethods();
return methods;

The Problem: Breakpoints Don't Work
When we step through classes from some jars in Eclipse, we may find that the code doesn't match or that breakpoints don't work at all.

This usually means there are multiple versions of the same class or library in the application, and the one the JVM actually uses is not the one Eclipse loads for debugging.
You can check what the JVM is using with this.getClass().getProtectionDomain().getCodeSource().getLocation() (or the same call on XClass.class).

You can check which jar Eclipse is loading in the Package Explorer, if "Link with Editor" is enabled.
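The check above can be wrapped in a small helper for reuse (my own sketch, not an Eclipse or JDK tool; note that bootstrap classes such as java.lang.String have no CodeSource at all):

```java
import java.security.CodeSource;

// Print where a class was loaded from, to spot duplicate or conflicting jars.
public class JarLocator {
    public static String locate(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap classpath (e.g. java.lang.String) have no CodeSource.
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locate(JarLocator.class));
        System.out.println(locate(String.class)); // "(bootstrap classpath)"
    }
}
```

Evaluating JarLocator.locate(org.apache.tika.mime.MediaType.class) in the Display view would have printed the crawler4j-dependency.jar path directly.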

References
Uploading Data with Solr Cell using Apache Tika
Solr ExtractingRequestHandler
