Welcome to our comprehensive project designed to guide you through the exciting world of SQL learning! Whether you're a beginner eager to explore the foundations of querying databases or a data enthusiast seeking to refine your data analysis skills, this project is your ultimate resource.
In this project, we've curated a wealth of resources and learning paths that cater to various learning styles and skill levels. Our goal is to empower you with the knowledge and techniques required to effectively use SQL for data analysis.
I used a dummy clinical trial dataset. Here I have started from the beginning by learning: To design a database, create a table, add data to the table, Update the column name, Altering the data in the table, Define the questions that need to be answered, Clean the dataset, and Performing Exploratory Data analysis.
Tutorial
For a step-by-step tutorial, you can see my:
YouTube playlist: https://youtube.com/playlist?list=PLWcqrtRKXifUaCVQoUBp-ypMK2d5ZiJBk
This project is for study purposes, and all the data shown and used is dummy. You can also use this for practice and educational purposes.
STEPS:
Step-1: Creating Table
Create Table Trial_1_Data
(
PatientID INT,
Age Int,
Gender Varchar (6),
Treatment Varchar (11),
EfficacyScore Decimal (4,2),
SideEffect Varchar (3),
BloodPressure Varchar (10),
DateVisited Date,
BMI Decimal (5,2),
Cholesterol Int,
SmokingStatus Varchar (10),
DiabetesStatus Varchar (3),
FamilyHistory Varchar (3),
Ethnicity Varchar (20),
EducationLevel Varchar (20),
Occupation Varchar (20)
);
Step-2: Inserting Data into the table
Insert into Trial_1_Data (PatientID, Age, Gender, Treatment, EfficacyScore, SideEffect, BloodPressure, DateVisited, BMI, Cholesterol, SmokingStatus, DiabetesStatus, FamilyHistory, Ethnicity, EducationLevel, Occupation)
Values
(1, 45, 'M', 'Treatment A', 8.2, 'No', '120/80', '2022-01-05', 25.3, 180, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
(2, 37, 'F', 'Treatment B', 6.5, 'Yes', '130/90', '2022-01-10', 29.1, 220, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
(3, 52, 'F', 'Treatment A', 7.8, 'No', '140/95', '2022-01-15', 31.2, 190, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
(4, 63, 'M', 'Treatment B', 5.3, 'Yes', '125/82', '2022-01-20', 27.6, 250, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
(5, 41, 'M', 'Treatment A', 8.7, 'No', '118/75', '2022-01-25', 23.9, 170, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
(6, 56, 'F', 'Treatment B', 6.1, 'Yes', '135/88', '2022-01-30', 28.3, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Education'),
(7, 49, 'F', 'Treatment A', 7.2, 'No', '130/85', '2022-02-04', 30.1, 210, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
(8, 39, 'M', 'Treatment B', 5.8, 'Yes', '123/78', '2022-02-09', 26.8, 185, 'Smoker', 'No', 'No', 'African', 'High School', 'Sales'),
(9, 58, 'M', 'Treatment A', 8.5, 'No', '122/80', '2022-02-14', 29.7, 230, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
(10, 44, 'F', 'Treatment B', 6.9, 'Yes', '128/83', '2022-02-19', 27.4, 195, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Education'),
(11, 60, 'M', 'Treatment A', 9.1, 'No', '115/78', '2022-02-24', 24.5, 175, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
(12, 34, 'F', 'Treatment B', 7.6, 'Yes', '133/86', '2022-02-28', 26.9, 180, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
(13, 47, 'M', 'Treatment A', 8.4, 'No', '125/80', '2022-03-05', 29.8, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
(14, 51, 'F', 'Treatment B', 6.3, 'Yes', '140/95', '2022-03-10', 28.1, 220, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
(15, 43, 'M', 'Treatment A', 8.9, 'No', '120/78', '2022-03-15', 25.2, 165, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
(16, 55, 'M', 'Treatment B', 6.2, 'Yes', '130/85', '2022-03-20', 29.5, 190, 'Non-smoker', 'No', 'Yes', 'Asian', 'High School', 'Education'),
(17, 48, 'F', 'Treatment A', 7.5, 'No', '125/82', '2022-03-25', 26.7, 195, 'Smoker', 'Yes', 'No', 'Caucasian', 'Graduate', 'Healthcare'),
(18, 42, 'M', 'Treatment B', 5.7, 'Yes', '123/80', '2022-03-30', 28.9, 200, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Sales'),
(19, 57, 'M', 'Treatment A', 8.3, 'No', '118/75', '2022-04-04', 25.8, 180, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
(20, 38, 'F', 'Treatment B', 6.7, 'Yes', '135/88', '2022-04-09', 30.3, 210, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
(21, 49, 'M', 'Treatment A', 7.9, 'No', '130/85', '2022-04-14', 27.6, 185, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
(22, 53, 'F', 'Treatment B', 6.4, 'Yes', '140/95', '2022-04-19', 28.4, 195, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Education'),
(23, 41, 'F', 'Treatment A', 8.1, 'No', '130/85', '2022-04-24', 29.1, 200, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Healthcare'),
(24, 36, 'M', 'Treatment B', 5.9, 'Yes', '123/78', '2022-04-29', 26.3, 175, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
(25, 59, 'M', 'Treatment A', 8.6, 'No', '122/80', '2022-05-04', 30.2, 220, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'IT'),
(26, 45, 'F', 'Treatment B', 6.6, 'Yes', '128/83', '2022-05-09', 27.8, 190, 'Non-smoker', 'Yes', 'No', 'African', 'High School', 'Retired'),
(27, 54, 'F', 'Treatment A', 7.7, 'No', '125/80', '2022-05-14', 29.9, 210, 'Smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Education'),
(28, 40, 'M', 'Treatment B', 5.6, 'Yes', '133/86', '2022-05-19', 26.7, 195, 'Non-smoker', 'Yes', 'No', 'Asian', 'Graduate', 'Healthcare'),
(29, 47, 'M', 'Treatment A', 8.3, 'No', '120/78', '2022-05-24', 27.2, 180, 'Non-smoker', 'No', 'Yes', 'African', 'Graduate', 'Sales'),
(30, 52, 'F', 'Treatment B', 6.5, 'Yes', '130/85', '2022-05-29', 28.3, 200, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'IT'),
(31, 39, 'F', 'Treatment A', 7.8, 'No', '140/95', '2022-06-03', 29.5, 195, 'Non-smoker', 'Yes', 'No', 'Asian', 'Graduate', 'Engineering'),
(32, 57, 'M', 'Treatment B', 5.3, 'Yes', '118/75', '2022-06-08', 26.1, 175, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Education'),
(33, 42, 'M', 'Treatment A', 8.7, 'No', '125/82', '2022-06-13', 27.9, 190, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Healthcare'),
(34, 38, 'F', 'Treatment B', 6.1, 'Yes', '133/86', '2022-06-18', 28.4, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
(35, 49, 'M', 'Treatment A', 7.2, 'No', '130/85', '2022-06-23', 29.8, 205, 'Smoker', 'No', 'No', 'Caucasian', 'High School', 'Sales'),
(36, 43, 'M', 'Treatment B', 5.8, 'Yes', '123/80', '2022-06-28', 27.3, 180, 'Non-smoker', 'Yes', 'Yes', 'African', 'Graduate', 'Retired'),
(37, 55, 'F', 'Treatment A', 8.5, 'No', '122/80', '2022-07-03', 30.1, 220, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'High School', 'Education'),
(38, 41, 'F', 'Treatment B', 6.9, 'Yes', '128/83', '2022-07-08', 28.6, 190, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Healthcare'),
(39, 56, 'M', 'Treatment A', 7.2, 'No', '125/80', '2022-07-13', 26.8, 185, 'Smoker', 'Yes', 'Yes', 'African', 'High School', 'IT'),
(40, 48, 'M', 'Treatment B', 6.5, 'Yes', '140/95', '2022-07-18', 29.3, 210, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'High School', 'Sales'),
(41, 37, 'F', 'Treatment A', 8.8, 'No', '120/78', '2022-07-23', 25.9, 175, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Retired'),
(42, 53, 'F', 'Treatment B', 6.3, 'Yes', '130/85', '2022-07-28', 28.7, 195, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'Engineering'),
(43, 39, 'M', 'Treatment A', 7.6, 'No', '140/95', '2022-08-02', 29.9, 200, 'Non-smoker', 'Yes', 'No', 'African', 'High School', 'IT'),
(44, 46, 'F', 'Treatment B', 5.5, 'Yes', '118/75', '2022-08-07', 27.2, 180, 'Non-smoker', 'No', 'No', 'Caucasian', 'Graduate', 'Healthcare'),
(45, 50, 'M', 'Treatment A', 8.4, 'No', '125/82', '2022-08-12', 29.7, 210, 'Smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Sales'),
(46, 44, 'M', 'Treatment B', 6.7, 'Yes', '133/86', '2022-08-17', 28.1, 195, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Retired'),
(47, 51, 'F', 'Treatment A', 7.9, 'No', '120/78', '2022-08-22', 30.3, 220, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'Graduate', 'Engineering'),
(48, 40, 'F', 'Treatment B', 6.4, 'Yes', '128/83', '2022-08-27', 26.9, 190, 'Smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Education'),
(49, 54, 'M', 'Treatment A', 7.3, 'No', '125/80', '2022-09-01', 28.2, 185, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Healthcare'),
(50, 42, 'M', 'Treatment B', 5.9, 'Yes', '133/86', '2022-09-06', 27.7, 200, 'Non-smoker', 'Yes', 'No', 'African', 'Graduate', 'IT');
Step-3: Update column name
EXEC sp_rename 'Trial_1_Data.Gender', 'Sex';
Step-4: Updating/altering data from the table
--- Using When Condition
Update Trial_1_Data
Set SideEffect = Case
When
PatientID in (3, 22, 47) Then 'Yes'
Else 'No'
End;
---Using Where Condition
Update Trial_1_Data
Set Cholesterol = 195
Where
PatientID = 5;
Step-5: EXPLORATORY DATA ANALYSIS
QUESTION-1: Is there a significant difference in the efficacy scores between Treatment A and Treatment B?
Select
Treatment,
Avg (EfficacyScore) as AvgEfficacyScore,
Count (*) as SampleSize
From
Trial_1_Data
Group by
Treatment;
QUESTION-2: What is the distribution of education levels among patients in different treatments?
select
Treatment,
EducationLevel,
count (*) as number_of_patients
from
Trial_1_Data
group by
Treatment,
EducationLevel
order by
Treatment, number_of_patients DESC;
QUESTION-3: Does smoking status have an impact on treatment efficacy?
select
SmokingStatus,
avg (EfficacyScore) as AvgEfficacyScore,
count (*) as Number_Subjects
from
Trial_1_Data
group by
SmokingStatus;
QUESTION-4: Are there differences in treatment outcomes based on sex?
Select
Sex,
avg (EfficacyScore) as AvgEfficacyScore,
count (*) as Number_Subjects
from
Trial_1_Data
group by
Sex;
QUESTION-5: Does the presence of diabetes affect treatment outcomes?
select
DiabetesStatus,
avg (EfficacyScore) as TreatmentOutcome,
count (*) as Number_Subject
from
Trial_1_Data
group by
DiabetesStatus
order by
DiabetesStatus DESC
QUESTION-6: Are there any ethnic disparities in treatment responses and outcomes?
SELECT
Ethnicity,
Treatment,
AVG(EfficacyScore) AS AverageOutcome,
count (*) as SampleSize
FROM
Trial_1_Data
GROUP BY
Ethnicity,
Treatment
Order by
AverageOutcome DESC
QUESTION-7: What is the relationship between age and treatment preference?
select
Age,
Treatment,
count (*) as Frequency
from Trial_1_Data
group by
Age,
Treatment
order by
Age, Frequency DESC;
Home Work:
QUESTION-8: What is the relationship between BMI and blood pressure readings?
QUESTION-9: Does the occurrence of side effects vary based on treatment and age group?
QUESTION-10: Is there a correlation between cholesterol levels and treatment efficacy? Get the source code here
SOCIAL CONNECT:
1. YouTube Channel: https://www.youtube.com/@trinath-panda
2. Instagram: https://www.instagram.com/im_trinath/
3. LinkedIn: https://www.linkedin.com/in/trinath-panda/
4. Website: https://trinathp.wixsite.com/trinath-p
5. Twitter: https://twitter.com/im_trinath
Comments