How to clear Data Analyst Interview round 1: part 2
In the last article, I shared the questions that I have been asked in the data analyst interviews first round, I wanted to write a second part since I have given a few more interviews in the last few days and was confounded by some really smart questions that I wanted to share/document.
If you are planning to pursue your career in any field even remotely close to data science, you must be strong with your SQL skills. I would personally recommend you to practice all these questions. If you can solve these questions and from the last article, you will be 100 percent selected in the first round of your interview.
Statement: How many rows will be present in output after using the following joins
- Left join
- right join
- inner join
- outer join
I will be using pandas to show tables and determine the number of rows, since I find it easier to operate rather than creating new tables in a database.
This was a tricky question since we had a value “x” with a lot of duplicates, I got completely confused while answering this and answered all of them wrong. The trick is that every “x” of table 1 will be joined with 3 “x” of table 2. which is why left join has 7 rows.
Statement: What is the difference between UNION and UNION ALL, also give an output of the following tables after applying union operation on them?
Answer: Union will join two tables row-wise but only unique row values are added while union all will add all the rows.
Except the id 7 of table 2, all ids will be displayed.
Statement: Find out the students whose marks are not increasing over time. (I'm not sure if this was the exact problem statement, but I hope you understood what was asked in this) the table name is “marks”
select distinct answer
when diff<0 then "id"
end as answer
from (select *,
over(partition by id) as diff
from marks)a) b
Explanation: We used window function lag in this. First of all, I will try to find the difference between marks and lag values. lag values are row values of one row above, so I can subtract the value of 1 row above to evaluate that marks are increasing progressively. For illustration check the below photo.
then we will use the “case when” to find if there are any values with diff as less than 0(negative value), when we find the value, we will add the value “id” in the column.
and next, we will output all the distinct id values from the case-when column.
Next few questions that were asked in the interviews were available on coding platforms so i solved them there. All the questions are self-explnatory and test your knowledge of SQL functions.
select floor(avg(salary)), floor(max(salary))
select name, count(name)
group by name
order by name;
select a.personid, a.name, count(b.personid), sum(c.score)
from person a
inner join friend b
inner join person c
group by a.personid
order by a.personid;
This was a question of self-join, which is one of the most important topics of SQL for data analysis. I would recommend you to thoroughly practice self-joins.
In all the interviews some of the common questions were related to output after joins. The second highest salary question has been asked like 5 times. The inner join is also extremely important. Make sure you read these topics and practice them regularly.