McKinsey Interview Questions | 10 Million Data Points

Question

How would you cluster one million unique keywords, given 10 million data points, each consisting of two keywords and a metric measuring how similar those two keywords are? How would you create this table of 10 million data points in the first place?

Asked by Dhruv2301

Answer (1)

  1. One approach is to run the standard K-means clustering algorithm on Hadoop MapReduce so that it scales to a dataset of this size, and to define a similarity metric for the keywords under which each cluster shows high intra-cluster similarity (keywords in the same cluster are close to one another) and low inter-cluster similarity (keywords in different clusters are far apart).
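
To make the MapReduce formulation concrete, here is a minimal single-machine sketch of K-means written as a map step and a reduce step. It assumes each keyword has already been turned into a numeric vector (for example, from its row of similarity scores), which the answer does not specify. The function names `map_assign`, `reduce_mean`, `kmeans_iteration` and `kmeans` are hypothetical, and Euclidean distance stands in for whatever similarity metric is actually chosen.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that were assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One K-means pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # Assumption: an empty cluster simply keeps its previous centroid.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20):
    """Repeat map/reduce passes until the centroids stop changing."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In an actual Hadoop job the map and reduce functions would run as distributed tasks, with the current centroids shared with every mapper at the start of each pass. The sketch only shows how the work splits between the two steps.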
