Moonfrog Interview Questions | Queries

Question

Consider the following database schema for an e-commerce website: Given the following schema of tables:

Event table (no index/key) – contains event data recorded when an event occurs:
event_name [VARCHAR],
user_id [VARCHAR],
time [TIMESTAMP]Transaction table (no index / key) – contains transaction data for item_id [VARCHAR],
quantity [INTEGER],
price [NUMERIC],
user_id [VARCHAR],
time [TIMESTAMP]


Provide detailed queries to retrieve the answers to the following questions:-

-The average time between first visit and first purchase across users. [consider event name ‘visit’]
-Median and 80 percentile time between first visit and first purchase across users. [consider event name ‘visit’]
– The average time between first visit and purchase per item_id.   
– Median and 80 percentile time between first visit and purchase item per item_id.
– Given an item_id [consider item_id ‘shoe_01’], retrieve an ordered list of items that are most likely to be purchased by a user who purchased the item. 

in progress 4
Dhruv2301 55 years 5 Answers 2501 views Great Grand Master 0

Answers ( 5 )

  1. 1.
    SELECT t.user_id, AVG(timestampdiff(second, e.mintime, t.mintime))
    FROM (SELECT user_id, min(timestamp) AS mintime
    FROM event e
    where event_name= ‘visit’
    GROUP BY user_id
    ) e JOIN
    (SELECT user_id, min(time) AS mintime
    FROM trans t
    GROUP BY user_id
    ) t
    ON e.user_id = t.user_id
    GROUP BY t.user_id;

    2.
    select t.user_id,
    percentile_cont(0.80) within group(order by timestampdiff(second, e.mintime, t.mintime) over (partition by t.user_id) as percentile_cont_80,
    percentile_cont(0.50) within group(order by timestampdiff(second, e.mintime, t.mintime) over (partition by t.user_id) as median
    FROM (SELECT user_id, min(timestamp) AS mintime
    FROM event e
    where event_name= ‘visit’
    GROUP BY user_id
    ) e JOIN
    (SELECT user_id, min(time) AS mintime
    FROM trans t
    GROUP BY user_id
    ) t
    ON e.user_id = t.user_id
    GROUP BY t.user_id;

    3. Same codes will be for the rest of the problems (instead of select t.user_id, it will be t.item_id)

    • Hi Ognish
      Don’t you think the question is asking for Avg, Median and 80 Percentile as a whole and not for each user.
      I was confused regarding the same but seeing the granularity of data I feel it should be at an overall level.
      One other thing to support this claim is that in your case, all three (Avg, Median and 80th Percentile) are anyways going to be same for each user.

      Here are my answers to first 4 questions. Please note, I have combined answers 1,2 and 3,4 together as they had almost identical queries and same logic.

      Answers 1 & 2.

      WITH first_visit
      AS (SELECT user_id
      ,MIN(time) AS first_visit
      FROM event
      WHERE event = ‘visit’
      GROUP BY user_id
      )
      ,first_purchase
      AS (SELECT user_id
      ,MIN(time) AS first_purchase
      FROM transaction
      GROUP BY user_id
      )
      SELECT
      AVG( DATEDIFF(‘hour’, v.first_visit, p.first_purchase) / 24) AS avg_time
      ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ( DATEDIFF(‘hour’, v.first_visit, p.first_purchase) / 24)) AS median
      ,PERCENTILE_DISC(0.8) WITHIN GROUP (ORDER BY ( DATEDIFF(‘hour’, v.first_visit, p.first_purchase) / 24)) AS 80_percentile
      FROM first_purchase p JOIN first_visit v ON p.user_id = v.first_visit

      3 & 4.

      WITH first_visit
      AS (SELECT user_id
      ,MIN(time) AS first_visit
      FROM event
      WHERE event = ‘vist’
      GROUP BY user_id
      )
      ,first_purchase
      AS (SELECT user_id
      ,item_id
      ,MIN(time) AS first_purchase_per_item
      FROM transaction
      GROUP BY user_id ,item_id
      )
      SELECT
      AVG( DATEDIFF(‘hour’, v.first_visit, p.first_purchase_per_item) / 24) AS avg_time
      ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ( DATEDIFF(‘hour’, v.first_visit, p.first_purchase_per_item) / 24)) AS median
      ,PERCENTILE_DISC(0.8) WITHIN GROUP (ORDER BY ( DATEDIFF(‘hour’, v.first_visit, p.first_purchase_per_item) / 24)) AS 80_percentile
      FROM first_purchase p JOIN first_visit v ON p.user_id = v.first_visit

      • Hi Himanshu, even I thought the same, but in the question, it is “The average time between first visit and first purchase across users” – across users? Can you please shed some light on it? I am a bit confused I am thinking across users as individual users, that’s why wrote the query that way

      • In the 3rd and 4th part we need to calculate percentile at user id and item id level. We need to partition the result on user id and item id

        1 and 2
        select avg(diff), percentile_disc(0.5) within group (order by diff asc),
        percentile_disc(0.8) within group (order by diff asc)
        from
        (
        select id, (min(ttime)-min(etime)) diff
        from
        (select t.user_id id , t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id
        )a
        group by id
        ) b

        3
        select id, itemid, avg(diff)
        from
        (
        select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
        from
        (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id, itemid
        )a
        group by id, itemid
        order by id, itemid
        ) b

        4 and 5
        select id, itemid, percentile_disc(0.5) within group (order by diff asc) ,
        percentile_disc(0.8) within group (order by diff asc)
        from
        (
        select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
        from
        (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id, itemid
        )a
        group by id, itemid
        order by id, itemid
        ) b

      • For 3rd and 4th I think we need metrics at item level
        3)select itemid, avg(diff)
        from
        (
        select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
        from
        (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id, itemid
        )a
        group by itemid
        order by itemid
        ) b

        4)select distinct itemid, percentile_disc(0.5) within group (order by diff asc) over (partition by itemid),
        percentile_disc(0.8) within group (order by diff asc) over (partition by itemid)
        from
        (
        select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
        from
        (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id, itemid
        )a
        group by id, itemid
        order by id, itemid
        ) b

        5)
        with a as
        (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
        from trans t join event e on t.user_id = e.user_id
        order by id, itemid
        )
        select a1.itemid
        from a a1
        where a1.id in (select distinct id from a where a.itemid =’shoe_01′)
        group by a.itemid
        order by count(*) desc

  2. 1 and 2
    select avg(diff), percentile_disc(0.5) within group (order by diff asc),
    percentile_disc(0.8) within group (order by diff asc)
    from
    (
    select id, (min(ttime)-min(etime)) diff
    from
    (select t.user_id id , t.time ttime, e.time etime
    from trans t join event e on t.user_id = e.user_id
    order by id
    )a
    group by id
    ) b

    3
    select id, itemid, avg(diff)
    from
    (
    select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
    from
    (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
    from trans t join event e on t.user_id = e.user_id
    order by id, itemid
    )a
    group by id, itemid
    order by id, itemid
    ) b

    4 and 5
    select id, itemid, percentile_disc(0.5) within group (order by diff asc) ,
    percentile_disc(0.8) within group (order by diff asc)
    from
    (
    select id, itemid, min(ttime), min(etime), (min(ttime)-min(etime)) diff
    from
    (select t.user_id id , t.item_id itemid, t.time ttime, e.time etime
    from trans t join event e on t.user_id = e.user_id
    order by id, itemid
    )a
    group by id, itemid
    order by id, itemid
    ) b

Leave an answer

Browse
Browse