ReadWriteCloud

New Twitter Gets New Search

As part of its recent UI redesign, Twitter has also made some significant changes to its backend, and today Michael Busch updated the Twitter Engineering Blog with some details about how Twitter has revised search.

Initially, Twitter's real-time search engine was based on the technology of Summize, a company Twitter acquired in 2008. Since then, Twitter has seen phenomenal growth: over 1,000 Tweets per second and 12,000 queries per second, which works out to more than 1 billion queries per day. The Twitter Engineering Team has been looking for alternatives because "scaling the old MySQL-based system had become increasingly challenging."

So Twitter has moved to a new search architecture built on Lucene, the open source search library.

Despite its strengths, Lucene has shortcomings when it comes to real-time search. So Twitter has rewritten parts of it while still supporting Lucene's APIs. These changes include:

  • significantly improved garbage collection performance
  • lock-free data structures and algorithms
  • posting lists that are traversable in reverse order
  • efficient early query termination
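
To make the last two items concrete, here is a minimal sketch in Python of how a reverse-traversable posting list pairs with early query termination. It is an illustration only, not Twitter's or Lucene's actual code: the class `TinyRealtimeIndex` and its methods are hypothetical names, and the sketch assumes that documents are indexed in arrival order, so the end of each posting list holds the newest matches.

```python
from collections import defaultdict


class TinyRealtimeIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> list of doc IDs, appended in arrival (chronological) order
        self.postings = defaultdict(list)
        self.docs = []

    def add(self, text):
        """Index one document and return its ID (IDs grow with time)."""
        doc_id = len(self.docs)
        self.docs.append(text)
        # set() so a term repeated in one document is posted only once
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term, k):
        """Return up to k matching doc IDs, newest first."""
        hits = []
        # Walk the posting list from its end so the newest documents come first.
        for doc_id in reversed(self.postings.get(term.lower(), [])):
            hits.append(doc_id)
            if len(hits) == k:
                break  # early termination: older postings are never visited
        return hits


index = TinyRealtimeIndex()
for tweet in ["new search", "lucene search", "hello world", "search again"]:
    index.add(tweet)

print(index.search("search", 2))  # [3, 1]
```

Because new entries are only ever appended, a query for the most recent results can stop as soon as it has enough hits, however long the posting list has grown.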

This new search architecture is faster and more scalable, and uses only about 5% of the available backend resources, moving towards the engineering team's goal of building search "to support at least an order of magnitude more load."

For more information on how Twitter handles other big data challenges, see the slides from Twitter engineer Kevin Weil's talk at Web 2.0 last month.

