Using in-memory Embedded Solr

The problem
We store data in SolrCloud. As our application evolved from a single-purpose message application into a multi-tenant message application, its traffic grew considerably.

The traffic is high, but the data set is small, because there are not many active messages at any given time.

To boost search performance, we decided to serve searches from an in-memory embedded Solr.


The Solution
The admin application still uses CloudSolrRepository to write data into SolrCloud.

The client-facing application periodically deletes expired data from embedded Solr and copies only new or updated data from SolrCloud into it.

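We changed the server code of the client-facing application to use EmbeddedSolrRepository. Both repositories share the same base class and differ only in which SolrClient they return, so the change to existing code is small.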

DataSyncService.copyMessagesFromSolrCloudToEmbeddedSolr first deletes expired data from embedded Solr and then copies only new or updated data from SolrCloud into it. Documents that already exist in embedded Solr with the same id and _version_ values are skipped.
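The decision each sync pass makes can be modelled without Solr at all. Below is a minimal, stdlib-only sketch, assuming each document is reduced to an id, its SolrCloud _version_ and an expiry timestamp; the class SyncPass, the Doc type and the expiresAt field are hypothetical and not part of the application, and two maps stand in for the two indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, stdlib-only model of one sync pass from SolrCloud to embedded Solr. */
class SyncPass {
    // Hypothetical document shape: only the fields the sync decision needs.
    static final class Doc {
        final String id;
        final long version;   // SolrCloud _version_ (kept as "version from SolrCloud" in embedded)
        final long expiresAt; // epoch millis after which the message is no longer active

        Doc(String id, long version, long expiresAt) {
            this.id = id;
            this.version = version;
            this.expiresAt = expiresAt;
        }
    }

    /**
     * Runs one pass against an in-memory stand-in for the embedded index
     * (id -> Doc) and returns the ids copied from the "cloud" list.
     */
    static List<String> sync(Map<String, Doc> embedded, List<Doc> cloud, long now) {
        // 1. delete expired data from the embedded index
        embedded.values().removeIf(doc -> doc.expiresAt <= now);

        // 2. copy only active documents that are new or whose _version_ changed
        List<String> copied = new ArrayList<>();
        for (Doc doc : cloud) {
            if (doc.expiresAt <= now) {
                continue; // the real query only selects active data
            }
            Doc existing = embedded.get(doc.id);
            if (existing != null && existing.version == doc.version) {
                continue; // same id and _version_: already up to date
            }
            embedded.put(doc.id, doc);
            copied.add(doc.id);
        }
        return copied;
    }
}
```

In the real service the "skip" branch is not a loop over fetched documents: ignoreExistingData turns it into a filter on the SolrCloud query, so unchanged documents are never transferred.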

It is also called from the ContextLoaderListener, so all active data is copied from SolrCloud to embedded Solr before application startup finishes.

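It is also registered as a scheduled task, so it runs periodically.
Here we use SchedulingConfigurer rather than @Scheduled, because we want the interval to be configurable and changeable at runtime. @Scheduled can read its interval from a property file, but the value is resolved only once, when the task is registered. A Trigger registered through SchedulingConfigurer is asked for the next execution time after every run, so it can read the current interval from a bean each time.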

The admin application can also change the configuration to enable or disable embedded Solr and to change the sync frequency.
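Why re-reading the interval matters can be illustrated with plain java.util.concurrent. This is a stdlib-only sketch, not Spring's implementation: ReschedulingTask is hypothetical, and its two suppliers stand in for the IConfigService reads (the sync interval and the enable switch). After every run it asks for the interval again, so a value changed by the admin application applies to the next run without a restart.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a task that re-reads its interval and on/off switch on every run. */
class ReschedulingTask implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final Runnable task;
    private final LongSupplier intervalMillis; // stands in for reading the sync interval from config
    private final BooleanSupplier enabled;     // stands in for isEmbeddedMessageSolrEnabled()

    ReschedulingTask(ScheduledExecutorService scheduler, Runnable task,
                     LongSupplier intervalMillis, BooleanSupplier enabled) {
        this.scheduler = scheduler;
        this.task = task;
        this.intervalMillis = intervalMillis;
        this.enabled = enabled;
    }

    /** Schedules the first run one interval from now. */
    void start() {
        scheduleNext();
    }

    @Override
    public void run() {
        try {
            if (enabled.getAsBoolean()) {
                task.run();
            }
        } catch (RuntimeException e) {
            // one failed sync must not end the schedule
            System.err.println("sync failed: " + e);
        } finally {
            scheduleNext();
        }
    }

    private void scheduleNext() {
        try {
            // the interval is read again here, so a value changed at runtime is used for the next run
            scheduler.schedule(this, intervalMillis.getAsLong(), TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // the scheduler has been shut down: stop rescheduling
        }
    }
}
```

One difference from the Spring version: this sketch measures the delay from when a run finishes (fixed-delay style), while the Trigger in ScheduledTaskConfig measures it from lastActualExecutionTime, that is, from when the previous run started.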

Talk is cheap. Show me the code.
@Service(MessageCloudRepository.NAME)
public class MessageCloudRepository extends AbstractMessageRepository {
    public static final String NAME = "MessageCloudRepository";
    @Autowired
    @Qualifier(RestCommonsAppConfig.BEAN_SOLR_CLOUD)
    private SolrClient cloudSolrServer;
    @Override
    public SolrClient getSolrServer() {
        return cloudSolrServer;
    }
}

@Service(MessageEmbeddedRepository.NAME)
public class MessageEmbeddedRepository extends AbstractMessageRepository {
    public static final String NAME = "MessageEmbeddedRepository";
    @Autowired
    @Qualifier(RestCommonsAppConfig.BEAN_EMBEDDED_MESSAGE_)
    private SolrClient embeddedSolrServer;

    @Override
    public SolrClient getSolrServer() {
        return embeddedSolrServer;
    }
}

@Service
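public class DataSyncService {
    // assumes SLF4J (org.slf4j.Logger/LoggerFactory); the original snippet used logger without declaring it
    private static final Logger logger = LoggerFactory.getLogger(DataSyncService.class);
    @Autowired
    @Qualifier(MessageEmbeddedRepository.NAME)
    private IMessageRepository embeddedRepository;
    @Autowired
    @Qualifier(MessageCloudRepository.NAME)
    private IMessageRepository cloudRepository;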
    @Autowired
    private IConfigService configService;

    public void copyMessagesFromSolrCloudToEmbeddedSolr() {
        if (!configService.isEmbeddedMessageSolrEnabled()) {
            return;
        }
        deleteExpiredDataFromEmbeddedSolr();

        final String query = ""; // the query to get new active data
        final SolrQuery solrQuery = new SolrQuery(query);
        // but add filter to ignore data already in embeddedSolr with same id and _version_
        ignoreExistingData(solrQuery);

        final List<Future<List<Message>>> messagesFutures = cloudRepository.findAllAsync(solrQuery);
        if (CollectionUtils.isNotEmpty(messagesFutures)) {
            for (final Future<List<Message>> messagesFuture : messagesFutures) {
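                try {
                    final List<Message> messages = messagesFuture.get();
                    // the embedded core may assign its own _version_, so keep the SolrCloud
                    // _version_ in a separate field; ignoreExistingData compares against it
                    messages.forEach(message -> message.setVersionFromSolrCloud(message.getVersion()));
                    embeddedRepository.saveWithoutCommit(messages);
                } catch (final InterruptedException e) {
                    // restore the interrupt flag so the scheduler thread can stop; still commit what was saved
                    Thread.currentThread().interrupt();
                    logger.error("Interrupted while copying data from SolrCloud to embedded Solr.", e);
                    break;
                } catch (final ExecutionException e) {
                    logger.error("Failed to copy data from SolrCloud to embedded Solr.", e);
                }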
            }

            embeddedRepository.hardCommit();
        }
    }
    /**
     * Adds a filter query that excludes documents already in embedded Solr
     * with the same id and _version_ (the version copied from SolrCloud).
     */
    protected void ignoreExistingData(final SolrQuery solrQuery) {
        final SolrQuery existingDataQuery = new SolrQuery("*:*").setFields(Abstract.FIELD_ID,
                Message.FIELD_VERSION_FROM_SOLR_CLOUD);

        final Iterable<Message> existingMessages = embeddedRepository.findAllSync(existingDataQuery);
        // NOT ((id:id1 AND _version_:v1) OR (id:id2 AND _version_:v2))
        final Iterator<Message> it = existingMessages.iterator();
        if (it.hasNext()) {
            final StringBuilder sb = new StringBuilder();
            while (it.hasNext()) {
                final Message message = it.next();
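                // escape the id (ClientUtils is org.apache.solr.client.solrj.util.ClientUtils) so characters
                // such as ':' or spaces don't break the query; "{3,number,#}" prints the version without
                // grouping separators
                sb.append(MessageFormat.format("({0}:{1} AND {2}:{3,number,#})", Abstract.FIELD_ID,
                        ClientUtils.escapeQueryChars(String.valueOf(message.getId())),
                        AbstractSolrDocument.FIELD_VERSION_, message.getVersionFromSolrCloud()));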

                if (it.hasNext()) {
                    sb.append(SolrUtil.SEPERATOR_OR);
                }
            }
            solrQuery.addFilterQuery(MessageFormat.format("{0}({1})", SolrUtil.NOT, sb.toString()));
        }
    }
}

@Configuration
@EnableScheduling
public class ScheduledTaskConfig implements SchedulingConfigurer {
    @Autowired
    private DataSyncService dataSyncService;
    @Autowired
    private IConfigService configService;
    @Bean(destroyMethod = "shutdown")
    public ScheduledExecutorService taskExecutor() {
        return Executors.newScheduledThreadPool(10);
    }
    @Override
    public void configureTasks(final ScheduledTaskRegistrar taskRegistrar) {
        taskRegistrar.setScheduler(taskExecutor());
        taskRegistrar.addTriggerTask(dataSyncService::copyMessagesFromSolrCloudToEmbeddedSolr, new Trigger() {
            // Spring asks the trigger for the next time before the first run and after each run,
            // so an interval changed by the admin application takes effect without a restart
            @Override
            public Date nextExecutionTime(final TriggerContext triggerContext) {
                final Date lastActualExecutionTime = triggerContext.lastActualExecutionTime();
                final Date base = lastActualExecutionTime != null ? lastActualExecutionTime : new Date();
                final long interval = configService.getSimpleConfig()
                        .extractSyncMessageToEmbeddedSolrIntervalInMill();
                return new Date(base.getTime() + interval);
            }
        });
    }
}
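The filter that ignoreExistingData adds can be checked in isolation. This stdlib-only sketch assumes the fields are literally named id and _version_ and that SolrUtil.NOT and SolrUtil.SEPERATOR_OR expand to NOT and OR; ExistingDataFilter is hypothetical, and its escape method is a stand-in for SolrJ's ClientUtils.escapeQueryChars so the example runs without SolrJ.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical stdlib-only sketch of the filter query built by ignoreExistingData. */
class ExistingDataFilter {
    /** Stand-in for SolrJ's ClientUtils.escapeQueryChars: backslash-escapes query syntax characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds NOT ((id:a AND _version_:1) OR ...) from id -> SolrCloud _version_,
     * or returns null when the embedded index is empty and no filter is needed.
     */
    static String build(Map<String, Long> existing) {
        if (existing.isEmpty()) {
            return null;
        }
        StringJoiner clauses = new StringJoiner(" OR ", "NOT (", ")");
        for (Map.Entry<String, Long> e : existing.entrySet()) {
            // concatenating a Long never adds grouping separators such as "1,234"
            clauses.add("(id:" + escape(e.getKey()) + " AND _version_:" + e.getValue() + ")");
        }
        return clauses.toString();
    }
}
```

For two existing documents, a with version 1 and msg:2 with version 1234 (in that order), it produces NOT ((id:a AND _version_:1) OR (id:msg\:2 AND _version_:1234)). The number of clauses grows with the size of the embedded index, so a large index could run into Solr's maxBooleanClauses limit; with the small, active-only data set described above that is less likely to matter.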