Tao: The Power Of The Graph

     
I will be covering the architecture and key design principles outlined in the paper that came out of Facebook on graph databases. This is an attempt to summarize the architecture of a highly scalable graph database that can support objects and their associations, for a read-heavy workload on the order of a billion reads and millions of writes per second. In Facebook's case, reads make up more than 99% of requests and writes less than one percent.


Background

Facebook has billions of users, and most of them consume content far more often than they create it, so the workload is read-heavy. Facebook initially implemented a distributed look-aside cache using memcached, which this paper references a lot. In that design, the look-aside cache serves the reads, while writes go to the database. A good cache-hit rate ensures good performance and keeps the database from being overloaded. The following figure shows how a memcache-based look-aside cache is used at Facebook to optimize reads.


[Figure: Look-aside cache, from the Memcache paper]
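The look-aside pattern described above can be sketched in a few lines. This is a minimal illustration, not Facebook's implementation: the class name LookasideClient is hypothetical, plain dicts stand in for memcached and the database, and invalidate-on-write is the assumed update policy.

```python
class LookasideClient:
    """Look-aside caching: the client, not the cache, talks to the database."""

    def __init__(self, cache, database):
        self.cache = cache        # stand-in for memcached (a plain dict here)
        self.database = database  # stand-in for the backing database (also a dict)

    def read(self, key):
        value = self.cache.get(key)
        if value is None:                  # cache miss: fall back to the database
            value = self.database.get(key)
            if value is not None:
                self.cache[key] = value    # populate the cache for later readers
        return value

    def write(self, key, value):
        self.database[key] = value         # writes always go to the database
        self.cache.pop(key, None)          # invalidate so the next read refetches


db = {"user:1": "Alice"}
client = LookasideClient(cache={}, database=db)
client.read("user:1")                 # miss: loaded from db, then cached
client.write("user:1", "Alice B.")    # db updated, cached copy dropped
print(client.read("user:1"))          # Alice B.
```

Note that all of the caching logic lives in the client here, which is exactly the complexity that TAO later moves into the storage system itself.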

While this is immensely useful, most information at Facebook is best represented as a social graph, and the content rendered on a page is highly customized: it depends on users' privacy settings and is personalized for every viewer. This means that the data needs to be stored as-is and then filtered when it is viewed or rendered.

Consider the social graph generated by a typical, simple activity such as: “someone visited the Golden Gate Bridge with someone else, and then a few folks commented on it”


[Figure: Social graph between users]

Representing this information in a key-value store like a look-aside cache becomes very tricky and cumbersome. Some of the key motivations for having a native graph-based store are:

One possible implementation is to use a formatted list of edges as a single value. But that means every access requires loading the entire edge list, and the same goes for any modification of an edge list. One could introduce native list types that can be associated with a key, but that only solves the problem of efficient edge-list access. In a social graph, many objects are interlinked, and coordinating such updates via edge lists is tricky.

In the memcache implementation at Facebook, memcache issues leases that tell clients to wait for some time, which prevents thundering herds (reads and writes on the same popular objects causing cache misses that then go to the database). This moves control logic to the clients, and since clients don't communicate with each other, it adds more complexity there. In the model of Objects and Associations, everything is controlled by the TAO system, which can implement this efficiently, so clients are free to iterate quickly.

Using graph semantics, it becomes more efficient to implement a read-after-write consistency model.

So TAO instead provides Objects and Associations as the basic units of access in the system. It is optimized for heavy reads and is consistent most of the time, but in failure cases it provides only eventual consistency.

Data Model

The data model consists of two main entities:

Object: It maps an “id” to an “ObjectType” plus a set of “Key, Value” pairs.

In the example above, Alice is an Object of type User. The comment added by Cathy is also an Object, of type Comment, with the text “wish we were there”. Objects are the better fit for things that are repeatable, like comments.

Association: It maps “Object1, AssociationType, Object2” to “time, Key, Value”. Associations represent relationships that happen at most once: two friends are connected at most once by a given association. The usefulness of the time field becomes clearer in the following sections on how queries work.

In the example above, Alice and Cathy are associated with each other using an association of type friend. The checkin object and the Golden Gate location object are also connected to each other, but the type of association is different in each direction. The Golden Gate location object is connected to the checkin object by an association of type checkin, while the checkin object connects to the Golden Gate location object by an association of type location.
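To make the two entities concrete, here is a small sketch of the example graph. The class names TaoObject and Association, the field names, the numeric ids and the timestamps are all assumptions made for illustration; the paper does not prescribe this representation.

```python
from dataclasses import dataclass, field


@dataclass
class TaoObject:
    id: int
    otype: str                                  # e.g. "USER", "COMMENT"
    data: dict = field(default_factory=dict)    # key/value pairs


@dataclass
class Association:
    id1: int      # originating object
    atype: str    # association type
    id2: int      # target object
    time: int     # used later to order query results
    data: dict = field(default_factory=dict)


alice = TaoObject(1, "USER", {"name": "Alice"})
cathy = TaoObject(2, "USER", {"name": "Cathy"})
checkin = TaoObject(3, "CHECKIN")
bridge = TaoObject(4, "LOCATION", {"name": "Golden Gate Bridge"})
comment = TaoObject(5, "COMMENT", {"text": "wish we were there"})

edges = [
    Association(alice.id, "FRIEND", cathy.id, time=100),
    Association(cathy.id, "FRIEND", alice.id, time=100),
    # The two directions between checkin and location use different types.
    Association(bridge.id, "CHECKIN", checkin.id, time=200),
    Association(checkin.id, "LOCATION", bridge.id, time=200),
    Association(checkin.id, "COMMENT", comment.id, time=300),
]

# Keying by (id1, atype, id2) captures the "at most once" rule:
# adding the same triple again overwrites rather than duplicates.
assoc_index = {(e.id1, e.atype, e.id2): e for e in edges}
print(len(assoc_index))   # 5
```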

APIs on objects and associations

Object APIs are straightforward: they allow creation, modification, deletion and retrieval of objects by their ids.


Association creation, modification and deletion APIs mutate the link of the given association type between two object ids.

More interesting are the association query APIs. This is where the power of graph semantics comes into play. Consider queries such as:

“Give me the most recent 10 comments about a checkin by Alice”

This can be modeled as assoc_range(CHECKIN_ID, COMMENT, 0, 10). This is also where the time field attached to associations comes in handy: it can be used to sort the results of queries like this easily.

“How many likes did the comment from Cathy have?”

assoc_count(COMMENT_ID, LIKED_BY) This query returns the number of “likes” associated with the comment.
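The two queries above can be illustrated with a toy in-memory store. This is only a sketch under assumptions: the AssocStore class, its assoc_add helper and the ids and timestamps are hypothetical, and sorting on every call is done for brevity where a real system would keep a time-ordered index.

```python
from collections import defaultdict


class AssocStore:
    def __init__(self):
        # (id1, atype) -> {id2: time}; the inner dict keeps each pair at most once
        self._lists = defaultdict(dict)

    def assoc_add(self, id1, atype, id2, time):
        self._lists[(id1, atype)][id2] = time

    def assoc_range(self, id1, atype, pos, limit):
        """Return up to `limit` (id2, time) pairs, newest first, from offset `pos`."""
        entries = self._lists.get((id1, atype), {})
        newest_first = sorted(entries.items(), key=lambda kv: kv[1], reverse=True)
        return newest_first[pos:pos + limit]

    def assoc_count(self, id1, atype):
        return len(self._lists.get((id1, atype), {}))


CHECKIN_ID, COMMENT_ID = 3, 5
store = AssocStore()
for i in range(12):                       # twelve comments on the checkin
    store.assoc_add(CHECKIN_ID, "COMMENT", 100 + i, time=1000 + i)
store.assoc_add(COMMENT_ID, "LIKED_BY", 1, time=2000)
store.assoc_add(COMMENT_ID, "LIKED_BY", 2, time=2001)

latest = store.assoc_range(CHECKIN_ID, "COMMENT", 0, 10)
print(latest[0])                                   # (111, 1011)
print(store.assoc_count(COMMENT_ID, "LIKED_BY"))   # 2
```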

TAO Architecture

Persistent Storage

At a high level, TAO uses MySQL as the persistent store for objects and associations. This way it gets database features such as replication, backups and migrations, an area where other systems like LevelDB didn't fit Facebook's needs.

The overall contents of the system are divided into shards. Each object_id contains a shard_id, reflecting the logical location of that object, and this translates to locating the host for the object. Associations are stored on the same shard as their originating object (remember that an association is defined as Object1, AssociationType, Object2). This ensures better locality and helps with retrieving an object and its associations from the same host. There are far more shards in the system than there are hosts running the MySQL servers, so many shards are mapped onto a single host.

All the data belonging to an object is serialized and stored against its id, which makes the object table design in MySQL pretty straightforward. Associations are stored similarly, with the id as the key and the data serialized and stored in one column. Given the queries mentioned above, further indices are built on the association tables for the originating id (Object1), time-based sorting, and the type of association.
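One way to picture the id-to-host lookup is the sketch below. The bit layout (shard_id in the low 16 bits), the modulo shard-to-host assignment and all function and host names are assumptions made for illustration; the text only states that the shard_id is embedded in the object_id and that many shards map to one host.

```python
SHARD_BITS = 16                 # assumed width of the embedded shard_id
NUM_SHARDS = 1 << SHARD_BITS


def make_object_id(shard_id, local_id):
    """Pack a shard_id into the low bits of a new object id (hypothetical layout)."""
    if not 0 <= shard_id < NUM_SHARDS:
        raise ValueError("shard_id out of range")
    return (local_id << SHARD_BITS) | shard_id


def shard_of(object_id):
    return object_id & (NUM_SHARDS - 1)


def host_of(shard_id, hosts):
    # Far more shards than hosts, so several shards land on each host.
    return hosts[shard_id % len(hosts)]


def shard_of_association(id1, atype, id2):
    # An association lives on the shard of its originating object (id1).
    return shard_of(id1)


hosts = ["db-a", "db-b", "db-c"]
alice = make_object_id(shard_id=7, local_id=42)
cathy = make_object_id(shard_id=9, local_id=43)

print(shard_of(alice))                               # 7
print(host_of(shard_of(alice), hosts))               # db-b
print(shard_of_association(alice, "FRIEND", cathy))  # 7
```

Because the association follows id1, fetching Alice and her outgoing edges touches a single host in this sketch.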
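The table layout described above might look roughly like the following. This is a guess at a plausible schema, not the one from the paper: the table and column names are hypothetical, JSON stands in for the serialization format, and SQLite (via Python's standard sqlite3 module) stands in for MySQL so the example runs anywhere.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,
    otype TEXT NOT NULL,
    data  TEXT NOT NULL             -- serialized key/value pairs
);
CREATE TABLE assocs (
    id1   INTEGER NOT NULL,         -- originating object
    atype TEXT    NOT NULL,
    id2   INTEGER NOT NULL,
    time  INTEGER NOT NULL,
    data  TEXT    NOT NULL,         -- serialized key/value pairs
    PRIMARY KEY (id1, atype, id2)   -- at most one association per triple
);
-- Index covering originating id, association type and time-based sorting.
CREATE INDEX assoc_by_time ON assocs (id1, atype, time);
""")

conn.execute("INSERT INTO objects VALUES (?, ?, ?)", (3, "CHECKIN", json.dumps({})))
rows = [(3, "COMMENT", 100 + i, 1000 + i, json.dumps({})) for i in range(12)]
conn.executemany("INSERT INTO assocs VALUES (?, ?, ?, ?, ?)", rows)

# Rough SQL equivalent of assoc_range(3, COMMENT, 0, 10): newest ten comments.
recent = conn.execute(
    "SELECT id2 FROM assocs WHERE id1 = ? AND atype = ? "
    "ORDER BY time DESC LIMIT ? OFFSET ?",
    (3, "COMMENT", 10, 0),
).fetchall()
print(recent[0])   # (111,)
```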

Caching Layer

As in the memcache paper, it is still very important to offload database work using a caching layer. A client requesting information connects to a cache first. This cache belongs to a tier consisting of multiple such caches and the database, which are collectively responsible for serving objects and associations.

On a read miss, a cache can contact nearby caches or go to the database. On a write, the cache goes to the database for a synchronous update. This helps with read-after-write consistency in most cases; more details on this in the following sections.
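The read and write paths just described can be sketched as follows. This is a simplified illustration under assumptions: the CacheServer class is hypothetical, a dict stands in for the database, and peers are not invalidated after a write, which a real tier would have to handle.

```python
class CacheServer:
    def __init__(self, database, peers=()):
        self.database = database    # stand-in for the database (a dict)
        self.peers = list(peers)    # nearby caches in the same tier
        self.entries = {}

    def read(self, key):
        if key in self.entries:             # cache hit
            return self.entries[key]
        for peer in self.peers:             # read miss: try nearby caches first
            if key in peer.entries:
                value = peer.entries[key]
                break
        else:                               # no peer had it: go to the database
            value = self.database.get(key)
        if value is not None:
            self.entries[key] = value
        return value

    def write(self, key, value):
        self.database[key] = value          # synchronous database update
        self.entries[key] = value           # cached only after the write completed


db = {"obj:1": "v1"}
cache_a = CacheServer(db)
cache_b = CacheServer(db, peers=[cache_a])

cache_a.read("obj:1")              # loaded from the database
print(cache_b.read("obj:1"))       # v1, served by the peer cache
cache_b.write("obj:1", "v2")
print(cache_b.read("obj:1"))       # v2: a client of cache_b reads its own write
```

In this sketch cache_a still holds the old value after the write, which hints at why read-after-write consistency holds only in most cases rather than all.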