Have you ever wondered how Facebook stores everything, right from the likes on our photos, our friends' photos, our comments, and the pictures we are tagged in? It is quite surprising to learn that Facebook runs around 30,000 servers to support these operations.
25 Terabytes of Log Data Daily – The amount of log data amassed in Facebook's operations is staggering. Facebook manages more than 25 terabytes of logging data per day, which is the equivalent of about 1,000 times the volume of mail delivered daily by the U.S. Postal Service.
“Everything we do here is a big data problem,” says Jay Parikh, Facebook’s vice president of infrastructure engineering. “There is nothing here that is small. It’s either big or ginormous. Everything is at an order of magnitude where there is not a packaged solution that exists out there in the world, generally speaking.”
As the majority of the analytics is performed with Hive, the data is stored on HDFS (seen in our last post), the Hadoop Distributed File System. In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB.
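To make this a little more concrete, here is a minimal sketch (not Facebook's actual code) of how a client writes a log file into HDFS using Hadoop's Java FileSystem API. The NameNode address and the file path are placeholders invented for illustration.

```java
// Minimal sketch of writing a log file to HDFS with the Hadoop FileSystem API.
// The cluster address and paths below are placeholders, not Facebook's actual setup.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path logFile = new Path("/logs/2011-03-01/clicks.log");       // hypothetical log path

        // Create the file and write a few lines; HDFS takes care of splitting
        // the data into blocks and placing the replicas across the cluster.
        try (FSDataOutputStream out = fs.create(logFile, true)) {
            out.writeBytes("user=42\taction=like\tts=1299000000\n");
        }

        System.out.println("Wrote " + fs.getFileStatus(logFile).getLen() + " bytes to " + logFile);
        fs.close();
    }
}
```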
Facebook puts a lot of thought into handling server failures. In Facebook's Hadoop clusters, there are always three copies of every file. Copy A and Copy B usually live within a single rack (a rack consists of 20 to 40 servers, each housing 18 to 36 terabytes of data). Copy C always lives in another rack.
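In HDFS this "three copies" rule is just the replication factor, and it is set per file. Here is a small sketch of how that looks from the Java API; the path is hypothetical.

```java
// Sketch: HDFS replication is a per-file setting; the default of 3 copies matches
// the A/B/C layout described above. The path is illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3"); // default replication for newly created files

        FileSystem fs = FileSystem.get(conf);
        Path existing = new Path("/logs/2011-03-01/clicks.log"); // hypothetical path

        // Replication can also be changed after the fact; the Namenode schedules
        // extra copies (or deletions) until the file reaches the new target.
        fs.setReplication(existing, (short) 3);

        short current = fs.getFileStatus(existing).getReplication();
        System.out.println(existing + " is stored with " + current + " replicas");
        fs.close();
    }
}
```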
A mini-database called the Namenode keeps track of the locations of these files. If the rack holding A and B, or the switch controlling that rack, fails for any reason, the Namenode automatically reroutes incoming data requests to Copy C and creates new A and B copies on a third rack.
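You can actually ask the Namenode where each block of a file lives. The sketch below uses Hadoop's getFileBlockLocations call to print the hosts holding each replica; this is exactly the metadata the Namenode consults when it reroutes reads after a rack failure. The path is again a placeholder.

```java
// Sketch: asking the Namenode where each block of a file lives.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/logs/2011-03-01/clicks.log"); // hypothetical path

        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        // Each block reports the datanodes (and hence racks) holding its replicas.
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```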
The machine running the Namenode is the single point of failure in any Hadoop cluster. If it goes down, the whole cluster goes offline. To cover that contingency, Facebook invented yet another mechanism. It’s called Avatarnode, and it’s already been given back to the Hadoop community as an open-source tool.
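AvatarNode itself was Facebook's own patch, so I won't pretend to show its internals here. Stock Hadoop later gained a similar NameNode high-availability mechanism, and as an illustration of the same idea, here is a hedged sketch of how a client is pointed at a pair of NameNodes so requests fail over when the active one dies. The nameservice "mycluster" and the hostnames are placeholders.

```java
// Sketch: client-side view of standard HDFS NameNode HA (a similar idea to AvatarNode).
// All names and hosts are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");                 // logical nameservice, not a single host
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client talks to the logical "mycluster" service; if the active
        // NameNode goes down, requests are retried against the standby.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/logs")) ? "/logs exists" : "/logs missing");
        fs.close();
    }
}
```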
Well, there is far more to say on this topic than can be covered in a single post. Stay tuned for the next one, and till then, keep asking!