Opinions on using MySQL for your Applications
1. When tables grow too big to fit on a single machine, we partition them, so MySQL scalability isn’t an issue for the job.
Facebook reported running 1800 MySQL servers with just two DBAs in 2008. [1]
You can’t do joins across partitions, but NoSQL databases don’t allow this anyway.
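To make the partitioning trade-off concrete, here is a minimal sketch in Python. It assumes in-memory SQLite databases standing in for separate MySQL servers; the `users` table, `NUM_SHARDS`, and the function names are hypothetical. Rows are routed to a partition by key, and a query that spans partitions has to be merged in application code because no single server can answer it.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical partition count

# Each in-memory SQLite database stands in for one MySQL server.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")


def shard_for(user_id):
    """Pick a partition with a stable hash of the key."""
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS


def insert_user(user_id, name, city):
    """Write the row to the one partition that owns this key."""
    shards[shard_for(user_id)].execute(
        "INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (user_id, name, city)
    )


def users_in_city(city):
    """Ask every partition, then merge the results in the application."""
    rows = []
    for db in shards:
        rows.extend(db.execute("SELECT id, name FROM users WHERE city = ?", (city,)))
    return sorted(rows)


insert_user(1, "ann", "Oslo")
insert_user(2, "bob", "Oslo")
insert_user(3, "cy", "Rome")
oslo_users = users_in_city("Oslo")
```

Lookups by `id` touch a single partition, while `users_in_city` has to fan out to all of them. That fan-out is the price of giving up cross-partition joins.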
Facebook hasn’t confirmed using Cassandra as the primary source for any data, and it seems like inbox search might be their only use of it. [2]
2. Stability with Scalability
Distributed databases like Cassandra, MongoDB, and CouchDB aren’t actually very scalable or stable. [3]
Twitter apparently has been trying to move from MySQL to Cassandra for over a year.
foursquare reported an 11-hour downtime because of MongoDB. [4]
Twitter gave up on the Cassandra migration. [5]
Facebook is moving away from Cassandra. [6]
HBase is getting better but is still risky if you don’t have people around with a deep understanding of it. [7]
3. Partitions
You can actually get pretty far on a single MySQL database and not even have to worry about partitioning at the application level.
You can “scale up” to a machine with lots of cores and tons of RAM, plus a replica.
If you put a layer of memcached servers (which are easy to scale out) in front of the database, then the database basically only has to worry about writes.
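The caching pattern can be sketched roughly as follows. This is only an illustration: a plain dictionary stands in for the memcached layer, an in-memory SQLite database stands in for MySQL, and the `profiles` table and function names are made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the MySQL primary
db.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, bio TEXT)")
db.execute("INSERT INTO profiles VALUES (1, 'hello')")

cache = {}                # stands in for a memcached cluster
stats = {"db_reads": 0}   # counts how often a read reaches the database


def get_bio(profile_id):
    """Read through the cache; only a miss touches the database."""
    key = f"profile:{profile_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT bio FROM profiles WHERE id = ?", (profile_id,)).fetchone()
    bio = row[0] if row else None
    if bio is not None:
        cache[key] = bio
    return bio


def set_bio(profile_id, bio):
    """Writes always go to the database; the stale cache entry is dropped."""
    db.execute("INSERT OR REPLACE INTO profiles VALUES (?, ?)", (profile_id, bio))
    cache.pop(f"profile:{profile_id}", None)
```

Repeated calls to `get_bio(1)` hit the database once. After that, only `set_bio` generates database traffic, which is the sense in which the database mostly worries about writes.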
You can also use S3 or some other distributed hash table to take the largest objects out of rows in the database.
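One way to picture this, as a sketch under stated assumptions: the large bytes go into an object store and the row keeps only a short key. Here a dictionary stands in for S3, SQLite stands in for MySQL, and the `photos` table and helper names are hypothetical.

```python
import hashlib
import sqlite3

blob_store = {}  # stands in for S3 or another key-value object store

db = sqlite3.connect(":memory:")  # stands in for MySQL
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, owner TEXT, blob_key TEXT)")


def save_photo(photo_id, owner, data):
    """Put the bytes in the object store and only a short key in the row."""
    blob_key = hashlib.sha256(data).hexdigest()
    blob_store[blob_key] = data
    db.execute("INSERT INTO photos VALUES (?, ?, ?)", (photo_id, owner, blob_key))
    return blob_key


def load_photo(photo_id):
    """Look up the key in the database, then fetch the bytes from the store."""
    row = db.execute("SELECT blob_key FROM photos WHERE id = ?", (photo_id,)).fetchone()
    return blob_store[row[0]] if row else None
```

The row stays a few dozen bytes no matter how large the object is, so the table remains cheap to scan, cache, and replicate.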
There’s no need to burden yourself with making a system scale more than 10x further than it needs to, as long as you’re confident that you’ll be able to scale it as you grow.
4. Distributions
Many of the problems created by manually partitioning the data over a large number of MySQL machines can be mitigated by creating a layer below the application and above MySQL that automatically distributes data.
FriendFeed described a good example implementation of this [8].
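To give a feel for what such a layer might look like, here is a small sketch. It is not FriendFeed's design; it is a hypothetical `DistributedStore` class that keeps a directory of where each entity lives and places new entities on the least-loaded database. In-memory SQLite databases stand in for the MySQL machines.

```python
import sqlite3


class DistributedStore:
    """Hypothetical layer between the application and several databases."""

    def __init__(self, num_shards=3):
        # In-memory SQLite databases stand in for MySQL servers.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT)")
        self.directory = {}             # entity id -> shard index
        self.counts = [0] * num_shards  # entities placed on each shard

    def put(self, entity_id, body):
        if entity_id not in self.directory:
            # Place new entities on the least-loaded shard.
            target = self.counts.index(min(self.counts))
            self.directory[entity_id] = target
            self.counts[target] += 1
        db = self.shards[self.directory[entity_id]]
        db.execute("INSERT OR REPLACE INTO entities VALUES (?, ?)", (entity_id, body))

    def get(self, entity_id):
        shard = self.directory.get(entity_id)
        if shard is None:
            return None
        row = self.shards[shard].execute(
            "SELECT body FROM entities WHERE id = ?", (entity_id,)
        ).fetchone()
        return row[0] if row else None
```

The application only ever calls `put` and `get`, and never learns which machine holds a given entity. In a real deployment the directory itself would need durable storage; it is kept in memory here only to keep the sketch short.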
- Personally, I believe the relational data model is the “right” way to structure most of the data for an application.
- Schemas let the data persist in a typed manner across many versions of the application as it’s developed, serve as documentation, and prevent a lot of bugs. SQL also lets you move the computation to the data when necessary, rather than fetching a ton of data and post-processing it in the application everywhere.
- The “NoSQL” fad will end when someone finally implements a distributed relational database with relaxed semantics.
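The point about moving the computation to the data can be illustrated with a short sketch. SQLite stands in for MySQL here, and the `orders` table and its rows are invented for the example. Both functions produce the same per-customer totals, but the first ships one row per customer back to the application while the second ships every row.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for MySQL
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total INTEGER NOT NULL)"
)
db.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 30), ("bob", 5), ("ann", 20), ("cy", 70), ("bob", 10)],
)


def totals_in_database():
    """The database aggregates; one row per customer comes back."""
    query = "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    return db.execute(query).fetchall()


def totals_in_application():
    """Every row is fetched, then post-processed in application code."""
    sums = {}
    for customer, total in db.execute("SELECT customer, total FROM orders"):
        sums[customer] = sums.get(customer, 0) + total
    return sorted(sums.items())
```

With five rows the difference is invisible. With millions of rows, the second version pays for transferring all of them before it can do any work.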
References
Vivek Gupta: https://www.linkedin.com/in/vivekg-/