McKinsey Interview Questions | 10 Million Data Points
Question
How would you perform clustering on a million unique keywords, assuming you have 10 million data points, each consisting of two keywords and a metric measuring how similar the two keywords are? And how would you create this table of 10 million data points in the first place?
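One common way to produce such a pairwise table is to compute a co-occurrence-based similarity, such as Jaccard similarity over the sets of documents or search sessions each keyword appears in. A minimal sketch, using illustrative toy data (the session contents and keyword names are assumptions, not from the question):

```python
from itertools import combinations
from collections import defaultdict

# Toy corpus: each "session" is the set of keywords that co-occurred in it
# (illustrative data standing in for real search logs).
sessions = [
    {"cheap flights", "flight deals", "budget airlines"},
    {"cheap flights", "flight deals"},
    {"python tutorial", "learn python"},
    {"python tutorial", "learn python", "python course"},
]

# Invert the corpus: keyword -> set of session ids it appears in.
occurs = defaultdict(set)
for sid, kws in enumerate(sessions):
    for kw in kws:
        occurs[kw].add(sid)

def jaccard(a, b):
    # |intersection| / |union| of the two keywords' session sets.
    sa, sb = occurs[a], occurs[b]
    return len(sa & sb) / len(sa | sb)

# Emit one row per pair that actually co-occurs. At scale this is the
# 10M-row table; pairs that never co-occur are simply not emitted,
# which keeps the table sparse (far fewer than the ~5e11 possible pairs).
table = [
    (a, b, jaccard(a, b))
    for a, b in combinations(sorted(occurs), 2)
    if occurs[a] & occurs[b]
]
```

At production scale the inversion and pair-emission steps are themselves natural MapReduce jobs: map each session to its keyword pairs, then reduce by pair to compute the counts the similarity needs.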
Answer
We apply the standard K-means clustering algorithm implemented on Hadoop MapReduce, which scales to datasets of this size: the map step assigns each keyword to its nearest centroid, and the reduce step recomputes each centroid from its assigned members. Since K-means operates on vectors, each keyword can first be represented by its sparse vector of similarities to other keywords. The similarity metric should be chosen so that the resulting distances exhibit high intra-cluster similarity and low inter-cluster similarity.
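The map/reduce structure of that K-means loop can be sketched in plain Python. This is an illustrative single-machine sketch, not the answerer's actual Hadoop pipeline: the toy pairwise rows, the sparse-dict vector representation, and the deterministic centroid seeding are all assumptions made for the example.

```python
from collections import defaultdict

# Toy pairwise similarity rows — stands in for the 10M-row table.
pairs = [
    ("cheap flights", "flight deals", 0.9),
    ("cheap flights", "budget airlines", 0.7),
    ("flight deals", "budget airlines", 0.8),
    ("python tutorial", "learn python", 0.95),
    ("python tutorial", "python course", 0.85),
    ("learn python", "python course", 0.9),
]

# Represent each keyword as a sparse vector of its similarities.
vectors = defaultdict(dict)
for a, b, s in pairs:
    vectors[a][b] = s
    vectors[b][a] = s

def dist(v, c):
    # Squared Euclidean distance between two sparse dicts.
    return sum((v.get(k, 0.0) - c.get(k, 0.0)) ** 2 for k in set(v) | set(c))

def kmeans(vectors, k=2, iters=10):
    keywords = list(vectors)
    # Deterministic seeding for the sketch; a real job would use
    # random or k-means++ style initialization.
    centroids = [dict(vectors[keywords[i * len(keywords) // k]]) for i in range(k)]
    buckets = {}
    for _ in range(iters):
        # "Map" step: assign each keyword to its nearest centroid.
        buckets = defaultdict(list)
        for w in keywords:
            i = min(range(k), key=lambda i: dist(vectors[w], centroids[i]))
            buckets[i].append(w)
        # "Reduce" step: recompute each centroid as the mean of its members.
        for i, members in buckets.items():
            total = defaultdict(float)
            for w in members:
                for key, val in vectors[w].items():
                    total[key] += val
            centroids[i] = {key: val / len(members) for key, val in total.items()}
    return buckets

clusters = kmeans(vectors, k=2)
```

On Hadoop, the assignment loop becomes the mapper (emitting `centroid_id -> keyword vector`) and the averaging loop becomes the reducer; the driver re-broadcasts the updated centroids between iterations.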