McKinsey Interview Questions | 10 Million Data Points
Question
How would you perform clustering on a million unique keywords, assuming you have 10 million data points, each consisting of two keywords and a metric measuring how similar the two keywords are? And how would you create this table of 10 million data points in the first place?
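One common way to produce such a pairwise table is to compute a co-occurrence-based similarity, such as Jaccard similarity over the sets of documents or search sessions each keyword appears in. A minimal sketch, using illustrative toy data (the session contents and keyword names are assumptions, not from the question):

```python
from itertools import combinations
from collections import defaultdict

# Toy corpus: each "session" is the set of keywords that co-occurred in it
# (illustrative data standing in for real search logs).
sessions = [
    {"cheap flights", "flight deals", "budget airlines"},
    {"cheap flights", "flight deals"},
    {"python tutorial", "learn python"},
    {"python tutorial", "learn python", "python course"},
]

# Invert the corpus: keyword -> set of session ids it appears in.
occurs = defaultdict(set)
for sid, kws in enumerate(sessions):
    for kw in kws:
        occurs[kw].add(sid)

def jaccard(a, b):
    # |intersection| / |union| of the two keywords' session sets.
    sa, sb = occurs[a], occurs[b]
    return len(sa & sb) / len(sa | sb)

# Emit one row per pair that actually co-occurs. At scale this is the
# 10M-row table; pairs that never co-occur are simply not emitted,
# which keeps the table sparse (far fewer than the ~5e11 possible pairs).
table = [
    (a, b, jaccard(a, b))
    for a, b in combinations(sorted(occurs), 2)
    if occurs[a] & occurs[b]
]
```

At production scale the inversion and pair-emission steps are themselves natural MapReduce jobs: map each session to its keyword pairs, then reduce by pair to compute the counts the similarity needs.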
Answer
We apply the standard K-means clustering algorithm implemented on Hadoop MapReduce, which scales to datasets of this size: the map step assigns each keyword to its nearest centroid, and the reduce step recomputes each centroid from its assigned members. Since K-means operates on vectors, each keyword can first be represented by its sparse vector of similarities to other keywords. The similarity metric should be chosen so that the resulting distances exhibit high intra-cluster similarity and low inter-cluster similarity.
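The map/reduce structure of that K-means loop can be sketched in plain Python. This is an illustrative single-machine sketch, not the answerer's actual Hadoop pipeline: the toy pairwise rows, the sparse-dict vector representation, and the deterministic centroid seeding are all assumptions made for the example.

```python
from collections import defaultdict

# Toy pairwise similarity rows — stands in for the 10M-row table.
pairs = [
    ("cheap flights", "flight deals", 0.9),
    ("cheap flights", "budget airlines", 0.7),
    ("flight deals", "budget airlines", 0.8),
    ("python tutorial", "learn python", 0.95),
    ("python tutorial", "python course", 0.85),
    ("learn python", "python course", 0.9),
]

# Represent each keyword as a sparse vector of its similarities.
vectors = defaultdict(dict)
for a, b, s in pairs:
    vectors[a][b] = s
    vectors[b][a] = s

def dist(v, c):
    # Squared Euclidean distance between two sparse dicts.
    return sum((v.get(k, 0.0) - c.get(k, 0.0)) ** 2 for k in set(v) | set(c))

def kmeans(vectors, k=2, iters=10):
    keywords = list(vectors)
    # Deterministic seeding for the sketch; a real job would use
    # random or k-means++ style initialization.
    centroids = [dict(vectors[keywords[i * len(keywords) // k]]) for i in range(k)]
    buckets = {}
    for _ in range(iters):
        # "Map" step: assign each keyword to its nearest centroid.
        buckets = defaultdict(list)
        for w in keywords:
            i = min(range(k), key=lambda i: dist(vectors[w], centroids[i]))
            buckets[i].append(w)
        # "Reduce" step: recompute each centroid as the mean of its members.
        for i, members in buckets.items():
            total = defaultdict(float)
            for w in members:
                for key, val in vectors[w].items():
                    total[key] += val
            centroids[i] = {key: val / len(members) for key, val in total.items()}
    return buckets

clusters = kmeans(vectors, k=2)
```

On Hadoop, the assignment loop becomes the mapper (emitting `centroid_id -> keyword vector`) and the averaging loop becomes the reducer; the driver re-broadcasts the updated centroids between iterations.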