Even if you overestimate ridiculously, it seems perfectly reasonable to keep the entire dataset in RAM: 1 trillion tweets at 500 bytes each is a dataset of about half a petabyte (500 TB). A quick search shows I can get 4 GB of RAM for $66 USD. Assuming no redundancy, no bulk discount, and that all other hardware is free, that works out to 125,000 modules, or a cost of $8M or so (about half of their last round of funding, iirc).
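For anyone who wants to check the arithmetic, here is a rough sketch in Python. The tweet count, tweet size, and RAM price are the figures from above; the decimal units (1 GB = 10^9 bytes) and the function name `ram_cost_usd` are my own assumptions, and it ignores everything except the memory modules themselves.

```python
# Back-of-envelope RAM cost, using the figures quoted above.
# Assumption: decimal units, i.e. 1 GB = 10**9 bytes.
TWEETS = 10**12                 # deliberately huge overestimate
BYTES_PER_TWEET = 500
MODULE_BYTES = 4 * 10**9        # one 4 GB module
MODULE_PRICE_USD = 66           # retail price, no bulk discount


def ram_cost_usd(tweets, bytes_per_tweet, module_bytes, module_price):
    """Return (total bytes, modules needed, total price in USD)."""
    total_bytes = tweets * bytes_per_tweet
    modules = -(-total_bytes // module_bytes)  # ceiling division
    return total_bytes, modules, modules * module_price


total_bytes, modules, cost = ram_cost_usd(
    TWEETS, BYTES_PER_TWEET, MODULE_BYTES, MODULE_PRICE_USD
)
print(f"{total_bytes / 10**12:.0f} TB in {modules:,} modules: ${cost:,}")
```

Swapping in a smaller tweet count or a lower per-module price shows how quickly the total drops.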
Now consider that you don't need to keep non-recent tweets in RAM, that bulk buyers can get RAM significantly cheaper than individuals, and that the real dataset is far smaller than that. Throwing hardware at the problem then seems far less impossible. I'd imagine that they could trivially keep the last month in RAM.