I was wondering how Google stores the data
about my blogs, i.e. which articles I wrote, when I posted each one, its content, all the comments, and so on. After a little searching, the answer I found is "BigTable".
Google BigTable (also frequently spelled
Bigtable) is a distributed, column-oriented data store created by Google Inc.
to handle very large amounts of structured data associated with the company's
Internet search and Web services operations.
BigTable development began in 2004,
and it is now used by a number of Google applications, such as web indexing, MapReduce
(which is often used for generating and modifying data stored in BigTable), Google Maps, Google Book Search, "My Search
History", Google Earth, Blogger.com, Google Code hosting, Orkut, YouTube, and Gmail. Google's reasons for developing its
own database include scalability and better control of performance
characteristics.
In April 2008, Google announced that it was
making Bigtable available to outside developers as part of Google App Engine,
the company's cloud-computing platform. At the time of that announcement, the only other large company
offering a database for cloud computing was Amazon.com Inc., so Google's entry
into the market was a pretty big deal.
Understanding Bigtable's architecture in full is a job
for Ph.D.s, but here's a small
look at it.
Basic Architecture of BigTable
Bigtable is described as a fast and extremely scalable DBMS
(database management system). It is based on the proprietary Google File System, which gives Bigtable the
ability to scale across hundreds or thousands of commodity servers that
collectively can store petabytes of data.
Each table is a multidimensional sparse map. The table consists of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than … "
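To make the "sparse map with time stamps" idea concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the data layout described above, not Bigtable's actual API: the class name `SparseTable` and its methods are made up for illustration.

```python
import time
from collections import defaultdict


class SparseTable:
    """Toy model: (row, column) -> list of (timestamp, value), newest first.

    Hypothetical names; this is not Bigtable's real interface.
    """

    def __init__(self):
        # Only cells that were actually written take up space (sparse map).
        self._cells = defaultdict(list)

    def put(self, row, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        versions = self._cells[(row, column)]
        versions.append((ts, value))
        # Keep the newest version at the front of the list.
        versions.sort(key=lambda v: v[0], reverse=True)

    def get_versions(self, row, column, n=1):
        """Return up to n most recent (timestamp, value) pairs of a cell."""
        return self._cells.get((row, column), [])[:n]

    def delete_older_than(self, cutoff):
        """Drop every cell version whose timestamp is below cutoff."""
        for key in list(self._cells):
            kept = [v for v in self._cells[key] if v[0] >= cutoff]
            if kept:
                self._cells[key] = kept
            else:
                del self._cells[key]
```

Here `get_versions(row, column, n)` plays the role of "select 'n' versions of this Web page", and `delete_older_than(cutoff)` plays the role of deleting cells older than a given time stamp.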
In order to manage the huge tables, Bigtable splits tables at row boundaries and saves the pieces as tablets. Each tablet is around 200MB, and each server holds about 100 tablets. This setup allows tablets from a single table to be spread among many machines. It also allows for fine-grained load balancing: if one tablet is receiving many queries, its server can shed other tablets, or the busy tablet can be moved to another machine that is not so busy. Also, if a machine goes down, its tablets can be spread across many other machines so that the performance impact on any given machine is minimal.
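The splitting and rebalancing idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: tablets are cut by row count rather than by size in MB, and the helper names (`split_into_tablets`, `assign`, `fail_server`) are hypothetical.

```python
def split_into_tablets(sorted_rows, max_rows_per_tablet):
    """Cut a sorted list of row keys into contiguous row ranges (tablets)."""
    return [
        sorted_rows[i:i + max_rows_per_tablet]
        for i in range(0, len(sorted_rows), max_rows_per_tablet)
    ]


def least_loaded(placement):
    """Pick the server that currently holds the fewest tablets."""
    return min(placement, key=lambda server: len(placement[server]))


def assign(tablets, servers):
    """Place each tablet on whichever server is least loaded right now."""
    placement = {server: [] for server in servers}
    for tablet in tablets:
        placement[least_loaded(placement)].append(tablet)
    return placement


def fail_server(placement, dead):
    """Spread a dead server's tablets over the surviving servers."""
    orphans = placement.pop(dead)
    for tablet in orphans:
        placement[least_loaded(placement)].append(tablet)
    return placement
```

Because the orphaned tablets are handed out one at a time to the least-loaded survivor, no single machine absorbs the whole load of the failed one.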
Tablets are stored as immutable SSTables plus a tail of logs (one log per machine). When a machine's system memory is full, it compacts some tablets, and the stored data can be compressed using Google's proprietary compression techniques such as BMDiff and Zippy. Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.
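A rough sketch of the "immutable SSTables plus a log" layout is shown below. It assumes a small in-memory write buffer (called `memtable` here) that is frozen into a sorted, read-only table once it fills up. The class `TabletStore` and its size limit are illustrative assumptions, and compression (BMDiff, Zippy) is left out entirely.

```python
class TabletStore:
    """Toy log + memtable + SSTable store (hypothetical, not Google's code)."""

    def __init__(self, memtable_limit=2):
        self.log = []          # append-only log of writes not yet in an SSTable
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted tuples, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Freeze the memtable into a new immutable, sorted SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.log.clear()  # those writes now live in an SSTable

    def major_compaction(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so later tables win
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None
```

The major compaction is where space is recovered: overwritten values that were still sitting in older SSTables disappear once everything is merged into a single table.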
The locations of Bigtable tablets are themselves stored in cells. The lookup of any particular tablet is handled by a three-tiered system. Clients get a pointer to the META0 table, of which there is only one. The META0 table keeps track of many META1 tablets, which contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
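The three-tier lookup can be illustrated with a small Python sketch. The assumptions here are mine: each index maps the last row key of a range to its target, the client cache is keyed by row key, and `RangeIndex` and `Client` are hypothetical names rather than anything from Bigtable itself.

```python
import bisect


class RangeIndex:
    """Maps a row key to the entry whose range ends at or after that key."""

    def __init__(self, entries):
        # entries: list of (last_row_key_in_range, target), sorted by key
        self.end_keys = [key for key, _ in entries]
        self.targets = [target for _, target in entries]

    def lookup(self, row_key):
        i = bisect.bisect_left(self.end_keys, row_key)
        if i == len(self.end_keys):
            raise KeyError(row_key)
        return self.targets[i]


class Client:
    """Resolves row keys via META0 -> META1 -> tablet location, with a cache."""

    def __init__(self, meta0):
        self.meta0 = meta0
        self.cache = {}       # row key -> tablet location
        self.meta_reads = 0   # number of META lookups actually performed

    def locate(self, row_key):
        if row_key not in self.cache:
            meta1 = self.meta0.lookup(row_key)            # tier 1: META0
            self.cache[row_key] = meta1.lookup(row_key)   # tier 2: META1
            self.meta_reads += 2
        return self.cache[row_key]                        # tier 3: the tablet
```

A usage example: build two META1 indexes, hang them under one META0 index, and resolve a few keys. Repeated lookups of the same key are served from the cache without touching either META tier.

```python
meta1_a = RangeIndex([("g", "server-1"), ("m", "server-2")])
meta1_b = RangeIndex([("t", "server-3"), ("zzzz", "server-4")])
meta0 = RangeIndex([("m", meta1_a), ("zzzz", meta1_b)])

client = Client(meta0)
print(client.locate("apple"))
print(client.locate("pear"))
```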
Wow, nice info.
Cool.
Yes, finally got the info.
Good one, buddy.
Awesome, buddy.
Can you tell me something more about BigTable?
Nice one.
Nice info, asktechi.