Because Twitter is getting more popular, every glitch in the service is now felt more acutely. Going without Twitter for many people is even harder than going without email, and so outages lead to complaints.
Complaints pile up and become debates, asking questions like: should Twitter be converted into a protocol and become decentralized? Is that the way to scale Twitter and make it more reliable? If not, how can that goal be accomplished?
To me, the answer to decentralization is a firm no. First of all technically it won't solve the problem. At least not in any way that Twitter folks can't solve it themselves. The whole question actually misses the main point. We love Twitter as an application, and its strength is the fact that people know where to find it, people love the What are you doing now? question. Amongst a sea of copycats, it was Twitter that took off and that's why we know and love it. Twitter is Twitter and it should not be anything else.
The question that people should be asking, though, is how to properly scale Twitter and, for that matter, the whole slew of other life streaming applications. Clearly these applications are trying to break new ground by merging streams of your friends' activity together and presenting you with a single view of that information. All of these applications are facing similar challenges and they could all architecturally benefit from the same pattern. So what is it? How should these applications be designed so that they scale to meet the demand?
To understand the challenge facing Twitter and other life streaming applications look at the diagram below. A node in the center is an individual user and the the nodes around it are the users that follow or subscribe to this user. The blue nodes are subscribers and red ones, U1 and U2, are publishers in this diagram. The yellow node, R1, is a receiver that gets updates from both U1 and U2. The yellow note shows that the stream of R1 contains updates from both U1 and U2.

How to manage all of this is pretty obvious until one starts thinking about how to generate the stream for R1. The answer that comes to mind right away is - on demand. That is, when the user R1 checks his Twitter page or via one of the many Twitter clients, the stream is computed and delivered on the fly. Unfortunately, and this is true with distributed large-scale systems in general, the first answer is likely to be wrong. If the user subscribes to say 100 other people, to pull together 100 streams and merge them so that they are in the right time order is likely to take seconds if not minutes. Who is going to want to wait this long?
If the user view can not be computed on the fly, then the only other answer is that it is pre-generated. That is, whether the user checks or does not check the stream, it is there and available as soon as possible. This answer also seems wrong, because this approach is quite wasteful. Pre-generating all these views without anyone looking at them is going to cost a lot of wasted space and compute power. Yet, of the two approaches, this is the one that delivers a better user experience and so this is the one that is likely to succeed.
We wrote about scalability challenges with relational databases in our post about Amazon Dynamo. Yet relational databases should not be dismissed. First, lets look at how relational database can be used to engineer a solution. A simple approach is to have three tables USER, FRIENDS, and MESSAGES; the tables would look like this:

This is very simple, and of course it does not scale. As the tables grow, even if everything is indexed, doing all the look ups via this set of tables can not work. What kills this solution is that all information of all users is sitting in these tables. What if there was a way to split all this information into many databases? If this was possible then the system would scale. So lets say we took all the users that start with letter 'a' and put them into one database and then all the users that start with letter 'b' into another and so on. There are systems that can benefit from such partitioning. For example, MyBlogLog can be partitioned so that all data is stored around individual blogs.
Unfortunately, in case of Twitter and other life streaming applications such partitioning does not work. The reason is that we can not predict who is going to follow who. So there is no way to cluster the data so that all the necessary messages end up in the same database. And cross-database queries are very costly - not a way to go. So the reason a relational database does not work is because a single database system does not scale and the data is not amicable to partitioning into multiple databases.
Increasingly more and more companies, particularly in the consumer space, are turning to cloud computing. The fundamental difference that the cloud approach takes vs. the relational database approach, is that with the cloud you split the data and split the computation on a massive grid and then make trade-offs. For example, in the relational database data duplication is a big no. Things need to be normalized. In the cloud world data duplication is okay, because a lot of views are pre-generated and a lot of them contain duplicate information. For example, a message from one user is duplicated into bins for everyone who subscribes.
The second trade-off is consistency, which is kind of related to duplication. When we are talking about mission critical applications on Wall Street, data consistency is a top priority at all times. But does it matter if user A gets a message from user B and then it takes a few minutes to deliver the same message to user C? Of course not. Even in our real-time hungry culture this delay is acceptable. So the trade-off is to forgo consistency across the entire system but gain flexibility instead.

The cloud version of the system is depicted above. For simplicity we have 26 machines on the grid - one for each letter. Each machine is identical and focused on users whose id starts with the corresponding letter. All information about a specific user is stored on the machine, including profile, list of people who follow this user, etc. The key thing is that the system maintains not only a list of messages generated by this user, it also immediately creates a list of messages for the user. It does that by processing updates from other machines. The whole thing is a peer-to-peer, completely connected ring system, where each machine is connected to all others.
When a user types an update into the UI it is first sent to the machine that handles this user. That machine immediately updates the user's messages and then broadcasts those messages to other machines. For each other user who follows the one that just updated, the system constructs the message and sends it precisely to the machine that hosts that other user (it is known because users are split based on letters). So for each update the system sends out exactly the correct number of messages. Then as soon as the message is received on the other end, it is written directly into the stream of the receiving user.
What is nice about the cloud solution is that it scales as the user set grows. The partitioning mechanism does not need to be based on letters, in fact it should really be based on a quick, uniform hash which would ensure that the users are distributed evenly around the grid. And there are many other details, of course, because what we described here is not the solution, but just the basic idea.
Software engineering as a discipline has evolved a set of design patterns. These are solutions that work across languages and across different systems. What we need is new design patterns for life streaming applications like Twitter. Instead of talking about breaking the service into distributed systems (which likely would perform worse and have more outages), we need to work through the best ways of building this kind of software. Google scales, Amazon scales, and as Stowe Boyd pointed out, so does AIM (although it has considerably less connections). And certainly there is a right architecture for Twitter, which would make it scale to meet the growing demand.
So lets start right now! Please tell us what you think are the best ways to design life streaming applications? What trade-offs do you think should be made? And please tell us about your specific experiences building these systems.
Comments
Subscribe to comments for this post OR Subscribe to comments for all ReadWriteWeb posts
Wow! Now I know how Twitter actually works... and understand it's scalability issues a bit more.
I won't tweet so loudly next time it's down... or should I?
Alex,
The distributed architecture you explain sounds like an excellent solution, and is very similar to how DNS works if I remember correctly. Oddly enough, it is also very typical of database replication scenarios. Some of the concepts of data warehousing and OLAP applications can be applied as well, because they deal in high volume and near real-time scenarios.
Nicely done.
Out of curiosity, what method does twitter use, and when something does break, what part of the system is most likely at fault?
Your assertion that Twitter can't be decentralised conflates the notion of discovery and service. Discovery can be centralised while the service provision can be decentralised. DNS is a perfect example of a centralised namespace with a decentralised service that maps names to ip addresses. Bit Torrent is another. And to over simplify, what you describe in terms of an architecture is typical of many P2P systems - it is just P2P inside the wall.
In a perfect world there is no technical reason why, for example, a twitter type discovery and friend management service couldn't map all message delivery on to a network of decentralised jabber end-points.
The scaling problem comes because Twitter is trying to "capture" all our tweets, control the UX and API, control the namespace to insert themselves as a centralised message utility in order to extract value. i.e. it is the constrains of the business model that makes Twitter have a bundled service and therefore means Twitter (the business/service) can't be decentralised.
Warning, this is going to be long, so sorry ahead of time.
Your example of how twitter can do it would work in some scenarios, but not twitter's. The reason is that twitter actually runs off of ID numbers, not the username. As a user, I can change my username at anytime, and doing that would screw up the methodology in your system (you can verify this if you'd like - you username will change, but the ID won't).
Your system also has a flaw of only being about to scale out 26 servers. While you could actually turn it into 26 clusters of servers, or go by say the first two letters instead of the first letter giving you 17,576 locations instead of 26, it is not ideal. Mainly because some letters are more common than others. Those servers with the popular letters would have more users.
What twitter probably has is a sharded database table that has the userid, username and password (say at one million users per shard). You would want this table to just be those three columns since you want to have an index on the userid and username columns and not be slowed down by other columns. The password would be in this table so that they could easily do authentications.
Then they would have another table that says what server they are stored on by storing just the userid and the server id. This could be stored in the username table too, but for scalability reasons, it'd probably be on its own server instance.
What is nice about this is that you can setup empty servers and then as your app starts to scale, it can determine when it needs to move people to new servers automatically. Depending on your preferences, you could make it to where when you add additional server it would automatically move users around to use the new capacity. You would just need to update the one table that says where your stuff is located and move all the data there.
Systems designed like this can scale quite easily if you build it right. The only issue is that you have to know what your hardware limits are and when to add capacity.
The real issue with twitters scale-out is not the storage of user profiles like everyone thinks it is though. It is scaling their jabber back-end and the systems that listen to it. That is the system that always goes down and can't keep up with the tweets when big events happen. What I don't get is ejabberd has clustering built into it. You can easily add new servers and expand the resources. Maybe they aren't using ejabberd though. More than likely the systems that listen the the Jabber servers are crapping out with the high volumes, in which case those are the culprits of all of our downtime.
Sorry this comment got so long... Hopefully it was somewhat informative or at least thought provoking.
Alex,
Another great post. I wish other bloggers offered this kind of analysis not the drive by news commentary.
RWW rocks!
A sea of copy cats, eh?
Why is it so bad for Twitter to become decentralized, and live the same life as email? It's a feasible, and could lead to perhaps even faster innovation.
Great post!
Nice to see somebody not actually laying all the blame on Rails. It is due to how Rails does things, because it expects one database for everything, but Rails was never designed to scale to Twitter usage without adjustments. It would be asking the Digg clone 'Pligg' to scale up to Digg usage, it just isn't going to happen.
Alex,
Wonderful post, it quenched my distributed computing appetite.
The relational database model is completely unnecessary for Twitter, something like BerkleyDB with key-value pairs would do be much more suitable. High availability and quick access is all that matters.
As well I would guess you could use some type of dynamic load balancing that would place extremely active users with large numbers of followers on servers with users that do not have as many as opposed to just doing it based on some predetermined hash or id. It could be configured so that messages eventually get removed from this key-value database after a relative period of time and put into a relational database for archival.
Ugh I'm going to dream about this tonight.
SMTP + POP/IMAP?
Usenet/NNTP ?
-- MV
This topic is useless as we do not know the *real* software architecture they have chosen (I do not speak about the language or environment), the quality/reusability/rigidity of their code, etc.
The only thing for sure, the bottleneck is the database... Same issue for 99% of applications.
It's funny how twitter is basically 'cool email'. It's just one big public email inbox network!
Just make all the tweets little emails that get sent to everyone who is 'following' you and let people see what emails youve received or sent.
"It would be asking the Digg clone 'Pligg' to scale up to Digg usage, it just isn't going to happen."
huh?
you know that both pligg and digg run on php+mysql, right?
m3mnoch.
@pwb LOL - you are so right!
Hmm, this approach seems to ignore all sorts of practical considerations, and likely would run into a bunch of issues.
First off, the idea of distributing users based on name naively assumes an equal distribution of users across the alphabet -- which isn't the case.
Second, this approach assumes a relatively equal "twitter load" for all users in the system -- which definitely isn't the case.
Thirdly, this approach ignores all "cross-talk" issues related to users subscribing to other users tweets, making no optimizations for this whatsoever. This could lead to an exponential explosion of network traffic due to bad distribution of users.
There are many other problems with this approach, which I don't have time to go into. Those interested in this problem should read up on Enterprise Integration Patterns as many of these problems are well researched and already solved.
Alex says "does it matter if user A gets a message from user B and then it takes a few minutes to deliver the same message to user C ? Of course not."
In mission critical systems, even on Wallstreet, messaging is fine tuned to milliseconds. It matters to Wallstreet - most algo's are looking for price variance.
Where else does it matter? Earthquakes? Maybe. Tsunami's? Tornadoes? A few days ago, via Twitter pub timeline, we tracked the path of the Picher Tornado. We sat here knowing it's position, and trajectory, and knew people were dead, and were about to die.
Does speed of messaging matter? Not to everyone. It's relative to the value a society puts on a few (or 100,000) lives.
Whatever the ultimate system design of Twitter, they would be smart to seek government funding for an Emergency Dispatch channel.
hth :))
@Joel Strellner and @Billy:
I think you both missed this part of the article:
"The partitioning mechanism does not need to be based on letters, in fact it should really be based on a quick, uniform hash which would ensure that the users are distributed evenly around the grid. And there are many other details, of course, because what we described here is not the solution, but just the basic idea."
@Kip Exactly! :)
@Kip You are correct - about 2/3 through my post I realized that, however I felt I should still post it.
From my experience with twitter I can tell you that their systems are based solely on ID's and are relationally tied to everything else. I can also tell you that nearly all of their problems have to do with the back-end systems that listen to the Jabber servers that they have in place.
Twitter is dammenly nerd. The users should hang out instead.
Alex -
You missed cache. Twitter only interested in the last message, therefore it can be cached. Only pull from DB when you are trying to pull all tweets from an account. If Twitter is written in Java there are distributed caches such as ehcache. With RoR, it's limited to memcache is a client/server cache.
With decentralized architecture accounts can be serviced by different nodes. You can still able to follow someone in another node. Of course, you will need streaming between nodes. Furthermore, tasks can be decentralized to different node.
Alex,
A partitioned cache like our Cacheonix could help to reduce the load and increase reliability. Twitter would run on a set of hardware nodes (cluster) with state spread across the cluster. Any node dieing would not affect operations.
Hope this helps.
Regards,
Slava Imeshev
This post is very great and very interesting.
The solution that we have adopted with the our project is "on demand" and we use relational DB.
Now we scaled only image storage but not the DB.
Databases are hard to scale. This is not because they are bad. It just the services that databases provide such as relational read-write access and transactions support are not well suited for clustering. That's why most of the fully optimized applications end up with the DB as a bottleneck. Caching, paticiluarly partitioned caching, helps to solve this problem once and for all because most of or all frequently accessed data is in memory, and no trip to the database is required. Once the cache is in place, you can scale the application just by adding more hardware nodes.
Twitter is an ideal subject for caching because the data access pattern access is read-mostly (write once/read always to be precise).
Alex,
Before moving to cloud computing, I guess implementing Cache can improve the system's performance to a great extent.Of course, cache must be implemented in the correct manner, i.e. right objects should be cached. There is one major benefit of implementing cache, first it will not shake the basic architecture of the system as you are leaving basic architecture intact and second is that implementing cache is cheaper than shifting the whole backend architecture to new one. So, I think implementing cache is definately worth testing and trying before shifting the architecture of the system. Whats other's take on it ??
i still predict that Amazon will one day own twitter and it will all run on AWS and be THE flagship messaging service.
GREAT article! I really want more of those :D