| d2r diego's weblog |
social software: automatic relationship clustering[See also this post and a follow-up here] Regarding my post on tuesday on social software: representing relationships, my mind kept coming back to one of the things I wrote: People don't always agree on what the relationship means to each other. This to me points to the need to let each person define their own relationship/trust structures and then let the software mesh them seamlessly if possible.The reason I kept thinking about it is that I didn't really explain properly what I meant by that. So what did I have in mind when I said that? Well, lots of things :), but let's start with the basics. The first thing that the software should be able to do is infer what groups are there, rather than be told what the groups are. With this in hand, if you simply define relationships to your friends, and you take into account their friends and how they relate to you, you should be able to create a graph of probable relationship clusters, that is, groups formed around strong interpersonal relationships. Sounds farfetched? Read on... Well, as it happens while I was at Drexel I did research on automatic graph clustering, applying genetic algorithms to techniques developed by my advisor at the time, Spiros Mancoridis, except back then it was applied to software systems. But what I realized a couple of days ago is that the same technique, maybe with a few mods, it would work just as well (if not better, because the graphs are smaller). The technique is described here, but to make a long story short, there's basically a set of equations that can be used to provide an objective measure of how well the clustering is done on a certain graph. The graph must define clusters to which nodes belong, along with the relationships between the nodes. The equation system favors loosely coupled clusters with dense inter-cluster relationships between the nodes (cluster coupling is determined by the edges that connect different nodes across clusters). The objective value is called the modularization quality of the graph, which is calculated by using an inter-cluster connectivity measure and an intra-cluster connectivity measure . To make things more concrete, let's look at the simplest type of MQ, one for a directed graph with unweighted edges. Don't panic, it's not as complex as it looks at first glance! :) The intra-cluster measure is calculated as follows: ![]() Where Ai is the intra-connectivity measure for cluster i, Ni is the number of nodes in the cluster, and mi is the number of edges in the cluster. The inter-connectivity measure Eij is calculated as: ![]() These two measures are combined to calculate the MQ of the whole graph: ![]() With k the number of clusters in the graph. (Btw, this is all much better explained in the paper, but this is good enough to get an idea of what's going on). Now, the problem with this calculation is that it depends on a particular graph clustering, which is precisely what we want to find out since we are assuming a set of relationships with no clustering. We have one advantage though, we know that MQ function has the property of being higher the "better" the clustering is (according to the measures just described). So what we need to do is treat this as an optimization problem. There are a number of sub-optimal techniques to traverse a space of values, including hill climbing, genetic algorithms, and so on. We just need to choose one, with the caveat that the larger the space the larger the probability that we are hitting a local maximum (rather than the overall maximum) of the space. This is not a problem for relatively small graphs (<50 nodes), with that size we can even do an exhaustive (ie., optimal) search on the space. If all of this sounds a bit iffy, let me demonstrate with an example. Let's say that we've got the following relationship set from the, um, "real world" (heh):
With the relationship file ready, but no clusters defined in the file, we can now process the graph and see what the optimization process discovers as "clusters". Here is the result (graph visualized using AT&T's dotty tool--ignore the labels for each cluster, they're automatically generated IDs): ![]() It is easy to underestimate the significance of obtaining this graph automatically, since the "clusters" that we see are for us obvious (if you've watched The Simpsons, that is :)), but keep in mind: the software has no knowledge of the actual groups, just of the relationships between nodes, ie., individuals. Additionally, there are more complex MQ calculations that involves weights on the node relationships; using different dimensions for different target groups allows creating different clustered views based on them. It wouldn't be hard to adapt this to parse, say, FOAF files and do some pretty interesting things. This is clearly only a first step, but once reasonable clusters are obtained the software can begin doing more interesting things, such as suggesting which people you should meet (e.g., when someone belongs to the same cluster as you but you don't know them), defining levels of warning for requests (exchanges between individuals in the same cluster would have less friction), etc. Cool eh? :) PS: Speaking of clustering, check this out. The clustering in this case is fairly obvious, but for more complicated sets of relationships the technique I describe would also apply in this space. [via Political Wire] Categories: soft.devPosted by diego on February 5 2004 at 1:53 AM Comments (please see the comments & trackback policy).
Excellent Simpsons example! Here is another view of your Simpsons' data More typical clusters we find in real organizations look like this... Hello, my name is Anne, and I'm a social anthropologist ;) This means that I need some sort of Rosetta Stone to understand those hieroglyphics above, and so I thank you for your use of The Simpsons. (Incidentally, I am lost in all kinds of scenarios without Simpsons' analogies :)) Anyway, a few quick observations from someone not intimately familiar with such things - so please be gentle with me. I understand the diagram above, as well as the one provided by Valdis. But my concern is with the original relationship set. I also understand it to be correct - in the sense that those relationships actually exist in the Simpsons - but so do a bunch of others between the same people that would change your diagrams. So here's my question: how can diagrams like this represent (shifting) context? Do we delineate a new set of relations, and draw a new diagram, for each time/space at hand? Is there a way to connect these diagrams so that they can flow into and out of each other, in ways similar to how relationships actually change through lived space/time? Posted by: Anne at February 6, 2004 3:29 PMValdis, thanks for the additional examples! They look good. I would note that the second (much more realistic) example you point to represents (if I understood it correctly) the view of an organization as a whole. I think that in general (there are always exceptions) personal networks will be much more limited in size. Anne, regarding your question, I actually skimmed over the answer in the post (didn't quite explain what I meant). The example above is actually *mixing* dimensions, because we've got friendship, "works with", family, etc. So it's a good starting point but in the end you need weight-dimensions to define a relationship level across. On that concept I said "Additionally, there are more complex MQ calculations that involve weights on the node relationships; using different dimensions for different target groups allows creating different clustered views based on them. [...]" So my current thinking is that when you define your relationship with someone you'd say "Friend: 8, Coworker: 0" or "Friend 5: Coworker 10" on a scale of one to ten for someone who's a good friend (first case) and a co-worker you're just friendly with (second case). These numbers could be defined visually and intuitively using the Don Park's friendship circle for example. Then you can create the diagram based on each weight-dimension which would give you the cluster view for each. Makes sense? Posted by: Diego at February 6, 2004 3:56 PMThanks Diego. At the risk of revealing further ignorance, I didn't understand the bit you quote above when I read it, but your explanation helps :) Still, and I don't mean to pester you, what happens when Friend X changes from a 2 to a 6 to a 9 and then maybe back to a 3, depending on where I am and who I am with, or what has happened between us? For example, when Nila, Julie and I are together at work, we would each rank each other with a "closeness" of 2 (1 being the greatest) and all my other co-workers as 8. But when I run into Nila and Julie while I am out with Mike, Natasha and Nigel, I would reassign their "closeness" as 4, because they are not as close friends as Mike (2) and Natasha (3) but still closer than Nigel (5). Six months later, when I fall in love with Nigel, he becomes the only person ranked with a 1. Six months after that, Nigel and I are still in love, but like all my friends, his "closeness" shifts according to context. Two years later, when I've broken his heart, I still consider him a 4, but he considers me a 10. To make this long story short, I'm afraid I still don't understand how diagrams like these, or "automatic clustering" can help me manoeuver relationships that shift like that... Would there just be *lots* of separate clusters for each social encounter? I'm sorry if I'm taking you way off topic - but I'm genuinely trying to understand :) Thanks. Anne, good points. While I haven't thought about these issues in great detail, I'd say that as relationships shift you'd have to adapt the values you define in the software... while not ideal it would be a good solution. Plus, constantly shifting relationship patterns, as they occur in real life, make more of a case for automatically derived clusters which the users can then leverage. However, it's not obvious to me what happens to the clusters as the weight of the relationships shift. It's very possible that this particular type of clustering would not do the job by itself. To clarify these points I'll write up an entry later today or tomorrow, with some examples--it will be interesting to see the evolution of the clusters with shifting asymmetric relationship patterns as those you describe. Posted by: Diego at February 7, 2004 11:31 AMHere's the book to read, if you want to really get into this... (I posted this comment also on Anne's blog) Anne, for visualization of communities over time you could want to check CiteSpace Read about it at http://www.pmbrowser.info/hublog/archives/000696.html Ann, here are two examples of showing changing networks you might be interested, although they are not strictly social networks: Copyright © Diego Doval 2002-2007.
|




