Improve Apache Commons ArrayUtils removeElements to O(n)


Recently, I read apache commons ArrayUtils removeElements source code and found that its implementation is not efficient when try to get all indexes that needed to be removed.

It's O(m*n): n is the length of origin array, m is the unique count of toBeRemoved.
    final BitSet toRemove = new BitSet();
    for (final Map.Entry<Byte, MutableInt> e : occurrences.entrySet()) {
        final Byte v = e.getKey();
        int found = 0;
        for (int i = 0, ct = e.getValue().intValue(); i < ct; i++) {
            found = indexOf(array, v.byteValue(), found);
            if (found < 0) {
                break;
            }
            toRemove.set(found++);
        }
    }

We can just scan the origin array once to get all indexes that needed to be removed.
    final BitSet toRemove = new BitSet();

    // the only change here, time complexity changed to O(n)
    for (int i = 0; i < array.length; i++) {
        final int key = array[i];

        final MutableInt count = occurrences.get(key);
        if (count != null) {
            count.decrement();
            if (count.intValue() == 0) {
                occurrences.remove(key);
            }
            toRemove.set(i);
        }
    }

I created one JIRA: Improve ArrayUtils removeElements time complexity to O(n)
The complete code
The implementation is quite elegant and robust.
 
    public static int[] removeElements(final int[] array, final int... values) {
        if (isEmpty(array) || isEmpty(values)) {
            return clone(array); // can't just return origin array
        }
        final HashMap<Integer, MutableInt> occurrences = new HashMap<Integer, MutableInt>(values.length);
        for (final int v : values) {
            final Integer boxed = Integer.valueOf(v);
            final MutableInt count = occurrences.get(boxed);
            if (count == null) {
                occurrences.put(boxed, new MutableInt(1));
            } else {
                count.increment();
            }
        }
        final BitSet toRemove = new BitSet();

        // My change here, time complexity changed to O(n)
        for (int i = 0; i < array.length; i++) {
            final int key = array[i];

            final MutableInt count = occurrences.get(key);
            if (count != null) {
                count.decrement();
                if (count.intValue() == 0) {
                    occurrences.remove(key);
                }
                toRemove.set(i);
            }
        }

        return (int[]) removeAll(array, toRemove);
    }

    static Object removeAll(final Object array, final BitSet indices) {
        final int srcLength = ArrayUtils.getLength(array);
        // No need to check maxIndex here, because method only currently called from removeElements()
        // which guarantee to generate on;y valid bit entries.
//        final int maxIndex = indices.length();
//        if (maxIndex > srcLength) { 
//            throw new IndexOutOfBoundsException("Index: " + (maxIndex-1) + ", Length: " + srcLength);
//        }
        final int removals = indices.cardinality(); // true bits are items to remove
        final Object result = Array.newInstance(array.getClass().getComponentType(), srcLength - removals);
        int srcIndex=0;
        int destIndex=0;
        int count;
        int set;
        while((set = indices.nextSetBit(srcIndex)) != -1){
            count = set - srcIndex;
            if (count > 0) {
                System.arraycopy(array, srcIndex, result, destIndex, count);
                destIndex += count;
            }
            srcIndex = indices.nextClearBit(set);
        }
        count = srcLength - srcIndex;
        if (count > 0) {
            System.arraycopy(array, srcIndex, result, destIndex, count);            
        }
        return result;
    } 

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)