Sampling the Social Graph using Facebook Graph API
The recently introduced Facebook Graph API is an interesting source of data with a nice, easy-to-appreciate context (everyone loves social). To motivate some of the examples on this blog, I have written a simple, quick-and-dirty Graph API client in Java that provides a trivial-to-use interface for graph data processing:
```java
import java.util.LinkedList;

// fbToken is a Graph API access token for the user whose "neighborhood" is sampled
SimpleGraphAPIClient client = new SimpleGraphAPIClient(fbToken);
LinkedList<FacebookUser> friends = client.getFriends();
for (FacebookUser friend : friends) {
    // Fetch likes, photos and groups for each friend by user ID
    LinkedList<LikedEntry> likes = client.getLikes(friend.getID());
    LinkedList<PhotoEntry> photos = client.getPhotos(friend.getID());
    LinkedList<GroupEntry> groups = client.getGroups(friend.getID());
    // arbitrary dataset creation logic
}
```

From a pure tool perspective (without an active Facebook app), the dataset that can be generated is quite limited: it is bounded to the "neighborhood" of a single user. Even so, a lot of interesting "play" data can be derived. At minimum, we can get a (num_likes, num_photos, num_groups) triple for every friend, and to that we can add derived metrics such as average group size, photo age, etc. Modeling this data alone can motivate some very interesting problems.
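As a minimal sketch of the "arbitrary dataset creation logic" step, the code below turns one friend's lists into a (num_likes, num_photos, num_groups) row plus two derived metrics, average group size and average photo age. The records `Group`, `Photo` and `FriendFeatures` are hypothetical stand-ins: the post does not show the fields of `GroupEntry` or `PhotoEntry`, so the member count and creation time carried here are assumptions about what the Graph API data provides. Records require Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical minimal stand-ins for the client's entry types; the real
// GroupEntry/PhotoEntry classes may expose different fields.
record Group(String name, int memberCount) {}
record Photo(Instant createdTime) {}

// One row of the dataset: the three raw counts plus two derived metrics.
record FriendFeatures(int numLikes, int numPhotos, int numGroups,
                      double avgGroupSize, double avgPhotoAgeDays) {

    static FriendFeatures from(List<?> likes, List<Photo> photos,
                               List<Group> groups, Instant now) {
        // Average group size; 0.0 when the friend belongs to no groups.
        double avgGroupSize = groups.stream()
                .mapToInt(Group::memberCount)
                .average()
                .orElse(0.0);
        // Average photo age in whole days relative to 'now'; 0.0 when there are no photos.
        double avgPhotoAgeDays = photos.stream()
                .mapToLong(p -> Duration.between(p.createdTime(), now).toDays())
                .average()
                .orElse(0.0);
        return new FriendFeatures(likes.size(), photos.size(), groups.size(),
                avgGroupSize, avgPhotoAgeDays);
    }
}
```

Inside the loop above, one would map each `GroupEntry` and `PhotoEntry` onto these hypothetical records and call `FriendFeatures.from(likes, photos, groups, Instant.now())`. Passing `now` explicitly keeps photo ages consistent across all friends in a single run.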
Here is a simple plot of the (num_likes, num_photos, num_groups) dataset for 170 anonymous users obtained this way:

(Note: some of the data the Graph API returns occasionally doesn't match the actual state on the site, so some outliers might simply be missing data on Facebook's side. However, this systematic bias is what might make the dataset especially interesting.)
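One way to act on the note above is to separate rows that look like missing data from genuine outliers before modeling. The sketch below uses two hypothetical heuristics, not anything from the original post: a friend with zero likes, photos and groups is flagged as possibly missing, and the values of a single column are scored with the modified (robust) z-score 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation. A common rule of thumb treats |z| > 3.5 as an outlier; the median-based score is used because a few extreme users would distort a mean/standard-deviation z-score.

```java
import java.util.Arrays;

// Hypothetical data-quality heuristics for the (num_likes, num_photos, num_groups) rows.
final class DataQuality {

    // All-zero rows may be genuinely inactive users or data the API failed to return.
    static boolean possiblyMissing(int numLikes, int numPhotos, int numGroups) {
        return numLikes == 0 && numPhotos == 0 && numGroups == 0;
    }

    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Modified z-score: 0.6745 * (x - median) / MAD. Returns all zeros when MAD is 0.
    static double[] robustZScores(double[] values) {
        double med = median(values);
        double[] absDev = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            absDev[i] = Math.abs(values[i] - med);
        }
        double mad = median(absDev);
        double[] z = new double[values.length];
        if (mad == 0.0) {
            return z; // no spread to measure against
        }
        for (int i = 0; i < values.length; i++) {
            z[i] = 0.6745 * (values[i] - med) / mad;
        }
        return z;
    }
}
```

Applied per column (for example to all num_photos values), rows flagged by `possiblyMissing` can be inspected separately instead of being treated as ordinary outliers.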
April 25th, 2010 at 6:04 pm
[...] Unlike standard linear regression, where the goal is determining the parameters of an assumed linear function, in nonparametric regression the goal is estimating the entire regression function directly. Depending on the assumptions about the structure of the underlying data, a number of methods exist that achieve optimal estimation. We give an overview of several methods and explain their practical usage in R. In doing so, we make use of the social graph data described in a recent post. [...]
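To make "estimating the entire regression function directly" concrete, here is a minimal Java sketch of one classic nonparametric method, the Nadaraya-Watson kernel estimator m(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h) with a Gaussian kernel K. It is an illustration only; the referenced post works in R and this page does not show which methods it covers. The class name `KernelRegression` and the bandwidth `h` are assumptions chosen for the example.

```java
// Nadaraya-Watson kernel regression with a Gaussian kernel: the estimate at x is a
// weighted average of the observed y values, with weights decaying with |x - x_i| / h.
final class KernelRegression {
    private final double[] xs;
    private final double[] ys;
    private final double bandwidth;

    KernelRegression(double[] xs, double[] ys, double bandwidth) {
        if (xs.length != ys.length || xs.length == 0) {
            throw new IllegalArgumentException("xs and ys must be non-empty and of equal length");
        }
        if (bandwidth <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
        this.bandwidth = bandwidth;
    }

    // Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio below.
    private static double gaussian(double u) {
        return Math.exp(-0.5 * u * u);
    }

    double predict(double x) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double w = gaussian((x - xs[i]) / bandwidth);
            weightedSum += w * ys[i];
            weightTotal += w;
        }
        // If every weight underflows to 0 (x far from all data), there is no estimate.
        return weightTotal > 0.0 ? weightedSum / weightTotal : Double.NaN;
    }
}
```

With the social graph data, one could, for example, regress num_photos on num_likes. The bandwidth controls the bias/variance trade-off: a small h follows individual users closely, while a large h approaches the global mean.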
April 25th, 2010 at 11:17 pm
[...] This post was mentioned on Twitter by Edward J. Yoon, Aleksandar Bradic. Aleksandar Bradic said: Sampling the Social Graph using Facebook Graph API | http://tinyurl.com/2fqyrur [...]
May 3rd, 2010 at 12:00 am
[...] However, getting up to speed with Avro for simple local serialization might not be as straightforward (mostly due to the lack of examples). We give an example of using Avro with Java for simple local serialization and discuss some potential pitfalls. We consider a trivial example of serializing the social graph dataset mentioned in the previous post to disk. [...]