Solutions to the problems below are available at this link:
https://drive.google.com/drive/folders/1RMjGmbUTC9i8QUNDC9n5buRvY0rAcTWy?usp=sharing
1. Word count using both RDD and DataFrame (practice both)
Input text file:
hi hello fine god hi hello ji ji ji ji
f fg df sw sw
fine god
Output RDD or DF:
RDD: [('fine', 2), ('fg', 1), ('sw', 2), ('hello', 2), ('f', 1), ('god', 2), ('df', 1), ('ji', 4), ('hi', 2)]
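The expected counts can be checked with a plain-Python sketch of the same logic (this is not Spark code). The list comprehension plays the role of `flatMap`, and `Counter` stands in for the `map` plus `reduceByKey` step of an RDD solution. The variable names are illustrative.

```python
from collections import Counter

# Sample lines copied from the input text file above.
lines = [
    "hi hello fine god hi hello ji ji ji ji",
    "f fg df sw sw",
    "fine god",
]

# Equivalent of flatMap: split every line into words and flatten.
words = [word for line in lines for word in line.split()]

# Equivalent of map + reduceByKey: count occurrences per word.
word_counts = Counter(words)

print(sorted(word_counts.items()))
```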
2. Get the desired DF after cleaning
Input text file:
Name~|Age
Azarudeen, Shahul~|25
Michel, Clarke~|26
Virat, Kohli~|28
Andrew, Simond~|137
Geogre, Bush~|159
Flintoff, David~|12
Adam, James~|20
Output:
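The desired output is not shown here, so the exact cleaning rules are unknown. As a partial sketch in plain Python, assuming the first cleaning step is to split each record on the two-character delimiter `~|`, the parsing could look like this. The helper `parse_line` is a hypothetical name used only for illustration.

```python
# Records copied from the input text file above.
raw_lines = [
    "Name~|Age",
    "Azarudeen, Shahul~|25",
    "Michel, Clarke~|26",
    "Virat, Kohli~|28",
    "Andrew, Simond~|137",
    "Geogre, Bush~|159",
    "Flintoff, David~|12",
    "Adam, James~|20",
]

def parse_line(line):
    """Split one record on the two-character delimiter '~|'."""
    name, age = line.split("~|")
    return name.strip(), age.strip()

header = parse_line(raw_lines[0])
# Convert the age column to an integer for every data row.
rows = [(name, int(age)) for name, age in map(parse_line, raw_lines[1:])]

print(header)
print(rows)
```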
3. Get the desired DF
Input text file:
Name|Age|Education
Azar|25|MBA,BE,HSC
Hari|32|
Kumar|35|ME,BE,Diploma
Output DF:
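The desired DF is not shown here. One common reading of this kind of problem, and it is only an assumption, is that the comma-separated Education column should be expanded to one row per qualification, with an empty value kept as a single row holding None. In PySpark this roughly corresponds to `split` followed by `explode_outer`; the plain-Python sketch below shows the same idea.

```python
# Records copied from the input text file above.
raw_lines = [
    "Name|Age|Education",
    "Azar|25|MBA,BE,HSC",
    "Hari|32|",
    "Kumar|35|ME,BE,Diploma",
]

rows = []
for line in raw_lines[1:]:
    name, age, education = line.split("|")
    # Assumed behaviour: an empty Education field yields one row with None.
    qualifications = education.split(",") if education else [None]
    for qualification in qualifications:
        rows.append((name, int(age), qualification))

for row in rows:
    print(row)
```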
4. Find the total account balance for each Customer_No, calculated as (total credit - total debit) for that Customer_No.
Input file:
Customer_No,Card_type,Date,Category,Transaction Type,Amount
1000501,Platinum Card,1/1/2018,Shopping,debit,11.11
1000501,Checking,1/2/2018,Mortgage & Rent,debit,1247.44
1000501,Silver Card,1/2/2018,Restaurants,debit,24.22
1000501,Platinum Card,1/3/2018,Credit Card Payment,credit,2298.09
1000501,Platinum Card,1/4/2018,Movies & DVDs,debit,11.76
1000501,Silver Card,1/5/2018,Restaurants,debit,25.85
1000501,Silver Card,1/6/2018,Home Improvement,debit,18.45
1000501,Checking,1/8/2018,Utilities,debit,45
1000501,Silver Card,1/8/2018,Home Improvement,debit,15.38
1000501,Platinum Card,1/9/2018,Music,debit,10.69
1000501,Checking,1/10/2018,Mobile Phone,debit,89.46
1000501,Platinum Card,1/11/2018,Gas & Fuel,debit,34.87
1000501,Platinum Card,1/11/2018,Groceries,debit,43.54
1000501,Checking,1/12/2018,Paycheck,credit,2000
1000531,Platinum Card,1/13/2018,Fast Food,debit,32.91
1000531,Platinum Card,1/13/2018,Shopping,debit,39.05
1000531,Silver Card,1/15/2018,Groceries,debit,44.19
1000531,Silver Card,1/15/2018,Restaurants,debit,64.11
1000531,Checking,1/16/2018,Utilities,debit,35
1000531,Checking,1/16/2018,Utilities,debit,60
1000531,Checking,1/19/2018,Paycheck,credit,2000
1000531,Platinum Card,1/20/2018,Shopping,debit,50.21
1000531,Platinum Card,1/22/2018,Credit Card Payment,credit,554.99
1000531,Silver Card,1/22/2018,Credit Card Payment,credit,309.81
1000531,Checking,1/22/2018,Credit Card Payment,debit,554.99
1000531,Silver Card,1/22/2018,Home Improvement,debit,17.38
1000531,Checking,1/23/2018,Credit Card Payment,debit,309.81
1000654,Platinum Card,1/24/2018,Coffee Shops,debit,3
1000654,Checking,1/25/2018,Internet,debit,69.99
1000654,Silver Card,1/29/2018,Gas & Fuel,debit,30.42
1000654,Silver Card,1/29/2018,Restaurants,debit,25
1000654,Platinum Card,1/29/2018,Restaurants,debit,17.62
1000654,Platinum Card,2/1/2018,Groceries,debit,27.79
1000654,Platinum Card,2/1/2018,Shopping,debit,11.11
1000654,Checking,2/2/2018,Mortgage & Rent,debit,1247.44
1000654,Checking,2/2/2018,Paycheck,credit,2000
1000654,Platinum Card,2/3/2018,Restaurants,debit,57.02
1000654,Platinum Card,2/4/2018,Movies & DVDs,debit,11.76
1000654,Platinum Card,2/5/2018,Credit Card Payment,credit,145.14
1000654,Silver Card,2/6/2018,Credit Card Payment,credit,154.13
1001863,Checking,2/7/2018,Credit Card Payment,debit,154.13
1001863,Checking,2/7/2018,Utilities,debit,65
1001863,Platinum Card,2/9/2018,Haircut,debit,30
1001863,Platinum Card,2/9/2018,Music,debit,10.69
1001863,Platinum Card,2/10/2018,Fast Food,debit,10.66
1001863,Platinum Card,2/11/2018,Restaurants,debit,106.8
1001863,Silver Card,2/12/2018,Gas & Fuel,debit,36.47
1001863,Checking,2/12/2018,Mobile Phone,debit,89.52
1001863,Silver Card,2/14/2018,Alcohol & Bars,debit,14
1001863,Platinum Card,2/15/2018,Restaurants,debit,10
1001863,Checking,2/15/2018,Utilities,debit,60
1001863,Checking,2/16/2018,Paycheck,credit,2000
1001863,Silver Card,2/16/2018,Restaurants,debit,8
1001863,Checking,2/16/2018,Utilities,debit,35
1001863,Silver Card,2/20/2018,Groceries,debit,35.95
1001863,Silver Card,2/20/2018,Restaurants,debit,23.51
1001863,Platinum Card,2/21/2018,Coffee Shops,debit,2
1001863,Silver Card,2/22/2018,Coffee Shops,debit,4
1001863,Platinum Card,2/26/2018,Credit Card Payment,credit,765.37
1001368,Silver Card,2/26/2018,Credit Card Payment,credit,156.11
1001368,Checking,2/26/2018,Credit Card Payment,debit,765.37
1001368,Checking,2/26/2018,Internet,debit,74.99
1001368,Silver Card,2/26/2018,Restaurants,debit,85.52
1001368,Silver Card,2/26/2018,Gas & Fuel,debit,32.21
1001368,Checking,2/27/2018,Credit Card Payment,debit,156.11
1001368,Silver Card,3/1/2018,Groceries,debit,32.07
1001368,Platinum Card,3/1/2018,Shopping,debit,13.13
1001368,Checking,3/2/2018,Paycheck,credit,1247.44
1001368,Checking,3/2/2018,Paycheck,credit,2000
1001368,Silver Card,3/3/2018,Groceries,debit,23.74
1001368,Platinum Card,3/4/2018,Groceries,debit,10.69
1001368,Platinum Card,3/4/2018,Movies & DVDs,debit,11.76
1001368,Platinum Card,3/4/2018,Restaurants,debit,42.24
1002324,Platinum Card,3/5/2018,Coffee Shops,debit,3
1002324,Silver Card,3/5/2018,Credit Card Payment,credit,761.59
1002324,Checking,3/5/2018,Credit Card Payment,debit,761.59
1002324,Platinum Card,3/7/2018,Coffee Shops,debit,3.5
1002324,Platinum Card,3/8/2018,Gas & Fuel,debit,34.9
1002324,Checking,3/8/2018,Utilities,debit,52
1002324,Platinum Card,3/9/2018,Groceries,debit,20.72
1002324,Platinum Card,3/9/2018,Groceries,debit,5.09
1002324,Platinum Card,3/9/2018,Music,debit,10.69
1002324,Platinum Card,3/12/2018,Groceries,debit,19.35
1002324,Checking,3/12/2018,Mobile Phone,debit,89.52
1002324,Platinum Card,3/13/2018,Shopping,debit,45.75
1002324,Platinum Card,3/14/2018,Groceries,debit,22.5
1002324,Platinum Card,3/14/2018,Restaurants,debit,8.49
1002324,Platinum Card,3/15/2018,Coffee Shops,debit,3.5
1002324,Checking,3/15/2018,Utilities,debit,60
1002324,Checking,3/16/2018,Paycheck,credit,2000
1002324,Silver Card,3/17/2018,Alcohol & Bars,debit,19.5
1000210,Platinum Card,3/17/2018,Fast Food,debit,23.34
1000210,Silver Card,3/19/2018,Restaurants,debit,36.48
1000210,Checking,3/19/2018,Utilities,debit,35
1000210,Platinum Card,3/20/2018,Shopping,debit,14.97
1000210,Silver Card,3/22/2018,Gas & Fuel,debit,30.55
1000210,Platinum Card,3/23/2018,Credit Card Payment,credit,559.91
1000210,Checking,3/23/2018,Credit Card Payment,debit,559.91
1000210,Silver Card,3/23/2018,Groceries,debit,11.76
1000210,Checking,3/26/2018,Internet,debit,74.99
1000210,Silver Card,3/28/2018,Groceries,debit,16.06
1000210,Silver Card,3/28/2018,Restaurants,debit,24.98
1000210,Silver Card,3/29/2018,Restaurants,debit,17.64
1000210,Silver Card,3/30/2018,Groceries,debit,9.09
1000210,Checking,3/30/2018,Paycheck,credit,2000
Output DF:
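The expected output is not shown here. As a rough sketch of the credit-minus-debit calculation in plain Python (not Spark code), credits can be added and debits subtracted while grouping by Customer_No. The function name `balance_per_customer` is hypothetical, and `sample` is only a four-row excerpt of the input file, so its totals are not the answer for the full data set.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

def balance_per_customer(csv_text):
    """Return {Customer_No: total credit - total debit}."""
    balances = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row["Amount"])
        # Credits add to the balance, debits subtract from it.
        sign = 1 if row["Transaction Type"] == "credit" else -1
        balances[row["Customer_No"]] += sign * amount
    return dict(balances)

# Four rows excerpted from the input file above, for illustration only.
sample = """Customer_No,Card_type,Date,Category,Transaction Type,Amount
1000501,Platinum Card,1/1/2018,Shopping,debit,11.11
1000501,Platinum Card,1/3/2018,Credit Card Payment,credit,2298.09
1000654,Platinum Card,1/24/2018,Coffee Shops,debit,3
1000654,Checking,2/2/2018,Paycheck,credit,2000
"""

result = balance_per_customer(sample)
print(result)
```

`Decimal` is used instead of `float` so that amounts such as 11.11 are summed without binary rounding error.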
5. For each location, if the name property is 'state', return its value property.
E.g.: for {"name": "state", "value": "IL"}, 'IL' is returned because the value of name is 'state'.
Input data dictionary:
dataDictionary = [
('12345',{"addressAttributes": [{"name": "houseNumber", "value": "718"}, {"name": "streetName", "value": "VIENNA"}, {"name": "streetSuffix", "value": "ST"}, {"name": "city", "value": "METROPOLIS"}, {"name": "state", "value": "IL"}, {"name": "zip5", "value": "62960"}, {"name": "zip4", "value": "1642"}, {"name": "country", "value": "USA"}]}),
('678910',{"addressAttributes": [{"name": "houseNumber", "value": "245"}, {"name": "streetName", "value": "LONGVIEW"}, {"name": "streetSuffix", "value": "DR"}, {"name": "city", "value": "PADUCAH"}, {"name": "state", "value": "KY"}, {"name": "zip5", "value": "42001"}, {"name": "zip4", "value": "5968"}, {"name": "country", "value": "USA"}]})
]
Output DF:
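The expected output is not shown here. The lookup rule itself can be sketched in plain Python: scan each location's `addressAttributes` and return the `value` of the entry whose `name` is `state`. The helper `state_of` is a hypothetical name, and the dictionary below is shortened to two attributes per location for brevity.

```python
# Shortened version of dataDictionary above (two attributes per location).
data_dictionary = [
    ("12345", {"addressAttributes": [
        {"name": "city", "value": "METROPOLIS"},
        {"name": "state", "value": "IL"},
    ]}),
    ("678910", {"addressAttributes": [
        {"name": "city", "value": "PADUCAH"},
        {"name": "state", "value": "KY"},
    ]}),
]

def state_of(attributes):
    """Return the value of the attribute named 'state', or None if absent."""
    for attribute in attributes:
        if attribute["name"] == "state":
            return attribute["value"]
    return None

states = [
    (location_id, state_of(payload["addressAttributes"]))
    for location_id, payload in data_dictionary
]
print(states)
```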
6. Find the students who got the highest mark for each subject
Input DF:
Sub|Name|Marks
Eng|John|85
Math|John|76
Science|John|89
Eng|Maria|91
Math|Maria|74
Science|Maria|82
Eng|Karthik|91
Math|Karthik|100
Science|Karthik|76
Output DF:
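The expected output is not shown here. Note that Eng has a tie (Maria and Karthik both scored 91), so the sketch below assumes tied students should all be returned, which in PySpark would correspond to a `rank` or `dense_rank` window rather than `row_number`. It is plain Python, not Spark code.

```python
# Rows copied from the input DF above.
rows = [
    ("Eng", "John", 85), ("Math", "John", 76), ("Science", "John", 89),
    ("Eng", "Maria", 91), ("Math", "Maria", 74), ("Science", "Maria", 82),
    ("Eng", "Karthik", 91), ("Math", "Karthik", 100), ("Science", "Karthik", 76),
]

# First pass: highest mark per subject.
top_marks = {}
for subject, _, marks in rows:
    top_marks[subject] = max(marks, top_marks.get(subject, marks))

# Second pass: keep every student whose mark equals the subject maximum.
toppers = [
    (subject, name, marks)
    for subject, name, marks in rows
    if marks == top_marks[subject]
]
print(sorted(toppers))
```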
7. Find whether the current day's temperature is greater than the previous day's temperature
Input Data:
# Create a DataFrame with the provided data
data = [
(1, '2023-06-01', 10),
(2, '2023-06-02', 25),
(3, '2023-06-03', 20),
(4, '2023-06-04', 30)
]
Output DF:
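The expected output is not shown here. The comparison can be sketched in plain Python by sorting on the date and carrying the previous day's temperature forward, which mirrors what `lag` over a `Window` ordered by date does in PySpark. The first day has no previous day, so the sketch assumes its flag should be None.

```python
# Rows copied from the input data above: (id, date, temperature).
data = [
    (1, '2023-06-01', 10),
    (2, '2023-06-02', 25),
    (3, '2023-06-03', 20),
    (4, '2023-06-04', 30),
]

# ISO-formatted dates sort correctly as plain strings.
ordered = sorted(data, key=lambda row: row[1])

result = []
previous_temp = None
for day_id, date, temp in ordered:
    # None for the first day, otherwise True/False for "warmer than yesterday".
    is_warmer = None if previous_temp is None else temp > previous_temp
    result.append((day_id, date, temp, is_warmer))
    previous_temp = temp

for row in result:
    print(row)
```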
8. Unnest the JSON and create a DataFrame
Read the below JSON file.
{
"sensorName": "snx001",
"sensorDate": "2020-01-01",
"sensorReadings": [
{
"sensorChannel": 1,
"sensorReading": 3.7465084060850105,
"datetime": "2020-01-01 00:00:00"
},
{
"sensorChannel": 2,
"sensorReading": 3.8465084060850105,
"datetime": "2021-01-01 00:00:00"
}]
}
Output DF:
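The expected output is not shown here. Assuming the goal is one flat row per entry in `sensorReadings`, with the top-level fields repeated on each row, the unnesting can be sketched in plain Python as below. In PySpark this roughly corresponds to `explode` on the array column followed by selecting the struct fields.

```python
import json

# JSON document copied from the problem statement above.
raw = """
{
  "sensorName": "snx001",
  "sensorDate": "2020-01-01",
  "sensorReadings": [
    {"sensorChannel": 1, "sensorReading": 3.7465084060850105,
     "datetime": "2020-01-01 00:00:00"},
    {"sensorChannel": 2, "sensorReading": 3.8465084060850105,
     "datetime": "2021-01-01 00:00:00"}
  ]
}
"""

doc = json.loads(raw)

# One output row per reading, repeating the top-level sensor fields.
rows = [
    (
        doc["sensorName"],
        doc["sensorDate"],
        reading["sensorChannel"],
        reading["sensorReading"],
        reading["datetime"],
    )
    for reading in doc["sensorReadings"]
]

for row in rows:
    print(row)
```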