top of page
Writer's pictureTrinath Panda

Clinical Trial SQL-EDA Analysis

Updated: Oct 29, 2024

Welcome to our comprehensive project designed to guide you through the exciting world of SQL learning! Whether you're a beginner eager to explore the foundations of querying databases or a data enthusiast seeking to refine your data analysis skills, this project is your ultimate resource.


In this project, we've curated a wealth of resources and learning paths that cater to various learning styles and skill levels. Our goal is to empower you with the knowledge and techniques required to effectively use SQL for data analysis.

clinical trial SQL exploratory data analysis

I used a dummy clinical trial dataset. Here I have started from the beginning by learning: To design a database, create a table, add data to the table, Update the column name, Altering the data in the table, Define the questions that need to be answered, Clean the dataset, and Performing Exploratory Data analysis.


Tutorial

For a step-by-step tutorial, you can see my:

This project is for study purposes, and all the data shown and used is dummy. You can also use this for practice and educational purposes.


STEPS:


Step-1: Creating Table

Create Table Trial_1_Data
(
PatientID INT,
Age Int,
Gender Varchar (6),
Treatment Varchar (11),
EfficacyScore Decimal (4,2),
SideEffect Varchar (3),
BloodPressure Varchar (10),
DateVisited Date,
BMI Decimal (5,2),
Cholesterol Int,
SmokingStatus Varchar (10),
DiabetesStatus Varchar (3),
FamilyHistory Varchar (3),
Ethnicity Varchar (20),
EducationLevel Varchar (20),
Occupation Varchar (20)
);

Step-2: Inserting Data into the table

Insert into Trial_1_Data (PatientID, Age, Gender, Treatment, EfficacyScore, SideEffect, BloodPressure, DateVisited, BMI, Cholesterol, SmokingStatus, DiabetesStatus, FamilyHistory, Ethnicity, EducationLevel, Occupation)
Values
(1, 45, 'M', 'Treatment A', 8.2, 'No', '120/80', '2022-01-05', 25.3, 180, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
    (2, 37, 'F', 'Treatment B', 6.5, 'Yes', '130/90', '2022-01-10', 29.1, 220, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
    (3, 52, 'F', 'Treatment A', 7.8, 'No', '140/95', '2022-01-15', 31.2, 190, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
    (4, 63, 'M', 'Treatment B', 5.3, 'Yes', '125/82', '2022-01-20', 27.6, 250, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
    (5, 41, 'M', 'Treatment A', 8.7, 'No', '118/75', '2022-01-25', 23.9, 170, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
    (6, 56, 'F', 'Treatment B', 6.1, 'Yes', '135/88', '2022-01-30', 28.3, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Education'),
    (7, 49, 'F', 'Treatment A', 7.2, 'No', '130/85', '2022-02-04', 30.1, 210, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
    (8, 39, 'M', 'Treatment B', 5.8, 'Yes', '123/78', '2022-02-09', 26.8, 185, 'Smoker', 'No', 'No', 'African', 'High School', 'Sales'),
    (9, 58, 'M', 'Treatment A', 8.5, 'No', '122/80', '2022-02-14', 29.7, 230, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
    (10, 44, 'F', 'Treatment B', 6.9, 'Yes', '128/83', '2022-02-19', 27.4, 195, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Education'),
	(11, 60, 'M', 'Treatment A', 9.1, 'No', '115/78', '2022-02-24', 24.5, 175, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'Graduate', 'Healthcare'),
    (12, 34, 'F', 'Treatment B', 7.6, 'Yes', '133/86', '2022-02-28', 26.9, 180, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
    (13, 47, 'M', 'Treatment A', 8.4, 'No', '125/80', '2022-03-05', 29.8, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
    (14, 51, 'F', 'Treatment B', 6.3, 'Yes', '140/95', '2022-03-10', 28.1, 220, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
    (15, 43, 'M', 'Treatment A', 8.9, 'No', '120/78', '2022-03-15', 25.2, 165, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
    (16, 55, 'M', 'Treatment B', 6.2, 'Yes', '130/85', '2022-03-20', 29.5, 190, 'Non-smoker', 'No', 'Yes', 'Asian', 'High School', 'Education'),
    (17, 48, 'F', 'Treatment A', 7.5, 'No', '125/82', '2022-03-25', 26.7, 195, 'Smoker', 'Yes', 'No', 'Caucasian', 'Graduate', 'Healthcare'),
    (18, 42, 'M', 'Treatment B', 5.7, 'Yes', '123/80', '2022-03-30', 28.9, 200, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Sales'),
    (19, 57, 'M', 'Treatment A', 8.3, 'No', '118/75', '2022-04-04', 25.8, 180, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
    (20, 38, 'F', 'Treatment B', 6.7, 'Yes', '135/88', '2022-04-09', 30.3, 210, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Retired'),
	(21, 49, 'M', 'Treatment A', 7.9, 'No', '130/85', '2022-04-14', 27.6, 185, 'Smoker', 'Yes', 'No', 'African', 'Graduate', 'Engineering'),
    (22, 53, 'F', 'Treatment B', 6.4, 'Yes', '140/95', '2022-04-19', 28.4, 195, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Education'),
    (23, 41, 'F', 'Treatment A', 8.1, 'No', '130/85', '2022-04-24', 29.1, 200, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Healthcare'),
    (24, 36, 'M', 'Treatment B', 5.9, 'Yes', '123/78', '2022-04-29', 26.3, 175, 'Smoker', 'Yes', 'No', 'African', 'High School', 'Sales'),
    (25, 59, 'M', 'Treatment A', 8.6, 'No', '122/80', '2022-05-04', 30.2, 220, 'Non-smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'IT'),
    (26, 45, 'F', 'Treatment B', 6.6, 'Yes', '128/83', '2022-05-09', 27.8, 190, 'Non-smoker', 'Yes', 'No', 'African', 'High School', 'Retired'),
    (27, 54, 'F', 'Treatment A', 7.7, 'No', '125/80', '2022-05-14', 29.9, 210, 'Smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Education'),
    (28, 40, 'M', 'Treatment B', 5.6, 'Yes', '133/86', '2022-05-19', 26.7, 195, 'Non-smoker', 'Yes', 'No', 'Asian', 'Graduate', 'Healthcare'),
    (29, 47, 'M', 'Treatment A', 8.3, 'No', '120/78', '2022-05-24', 27.2, 180, 'Non-smoker', 'No', 'Yes', 'African', 'Graduate', 'Sales'),
    (30, 52, 'F', 'Treatment B', 6.5, 'Yes', '130/85', '2022-05-29', 28.3, 200, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'IT'),
	(31, 39, 'F', 'Treatment A', 7.8, 'No', '140/95', '2022-06-03', 29.5, 195, 'Non-smoker', 'Yes', 'No', 'Asian', 'Graduate', 'Engineering'),
    (32, 57, 'M', 'Treatment B', 5.3, 'Yes', '118/75', '2022-06-08', 26.1, 175, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Education'),
    (33, 42, 'M', 'Treatment A', 8.7, 'No', '125/82', '2022-06-13', 27.9, 190, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'High School', 'Healthcare'),
    (34, 38, 'F', 'Treatment B', 6.1, 'Yes', '133/86', '2022-06-18', 28.4, 200, 'Non-smoker', 'Yes', 'Yes', 'Asian', 'Graduate', 'IT'),
    (35, 49, 'M', 'Treatment A', 7.2, 'No', '130/85', '2022-06-23', 29.8, 205, 'Smoker', 'No', 'No', 'Caucasian', 'High School', 'Sales'),
    (36, 43, 'M', 'Treatment B', 5.8, 'Yes', '123/80', '2022-06-28', 27.3, 180, 'Non-smoker', 'Yes', 'Yes', 'African', 'Graduate', 'Retired'),
    (37, 55, 'F', 'Treatment A', 8.5, 'No', '122/80', '2022-07-03', 30.1, 220, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'High School', 'Education'),
    (38, 41, 'F', 'Treatment B', 6.9, 'Yes', '128/83', '2022-07-08', 28.6, 190, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Healthcare'),
    (39, 56, 'M', 'Treatment A', 7.2, 'No', '125/80', '2022-07-13', 26.8, 185, 'Smoker', 'Yes', 'Yes', 'African', 'High School', 'IT'),
    (40, 48, 'M', 'Treatment B', 6.5, 'Yes', '140/95', '2022-07-18', 29.3, 210, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'High School', 'Sales'),
    (41, 37, 'F', 'Treatment A', 8.8, 'No', '120/78', '2022-07-23', 25.9, 175, 'Non-smoker', 'No', 'Yes', 'Asian', 'Graduate', 'Retired'),
	(42, 53, 'F', 'Treatment B', 6.3, 'Yes', '130/85', '2022-07-28', 28.7, 195, 'Smoker', 'Yes', 'Yes', 'Caucasian', 'Graduate', 'Engineering'),
    (43, 39, 'M', 'Treatment A', 7.6, 'No', '140/95', '2022-08-02', 29.9, 200, 'Non-smoker', 'Yes', 'No', 'African', 'High School', 'IT'),
    (44, 46, 'F', 'Treatment B', 5.5, 'Yes', '118/75', '2022-08-07', 27.2, 180, 'Non-smoker', 'No', 'No', 'Caucasian', 'Graduate', 'Healthcare'),
    (45, 50, 'M', 'Treatment A', 8.4, 'No', '125/82', '2022-08-12', 29.7, 210, 'Smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Sales'),
    (46, 44, 'M', 'Treatment B', 6.7, 'Yes', '133/86', '2022-08-17', 28.1, 195, 'Non-smoker', 'No', 'Yes', 'African', 'High School', 'Retired'),
    (47, 51, 'F', 'Treatment A', 7.9, 'No', '120/78', '2022-08-22', 30.3, 220, 'Non-smoker', 'Yes', 'No', 'Caucasian', 'Graduate', 'Engineering'),
    (48, 40, 'F', 'Treatment B', 6.4, 'Yes', '128/83', '2022-08-27', 26.9, 190, 'Smoker', 'Yes', 'Yes', 'Asian', 'High School', 'Education'),
    (49, 54, 'M', 'Treatment A', 7.3, 'No', '125/80', '2022-09-01', 28.2, 185, 'Non-smoker', 'No', 'Yes', 'Caucasian', 'High School', 'Healthcare'),
    (50, 42, 'M', 'Treatment B', 5.9, 'Yes', '133/86', '2022-09-06', 27.7, 200, 'Non-smoker', 'Yes', 'No', 'African', 'Graduate', 'IT');

Step-3: Update column name

EXEC sp_rename 'Trial_1_Data.Gender', 'Sex';

Step-4: Updating/altering data from the table


--- Using When Condition

Update Trial_1_Data
Set SideEffect = Case
When
  PatientID in (3, 22, 47) Then 'Yes'
Else 'No'
End;

---Using Where Condition

Update Trial_1_Data
Set Cholesterol = 195
Where
  PatientID = 5;

Step-5: EXPLORATORY DATA ANALYSIS


QUESTION-1: Is there a significant difference in the efficacy scores between Treatment A and Treatment B?

Select 
 Treatment,
 Avg (EfficacyScore) as AvgEfficacyScore,
 Count (*) as SampleSize
From
 Trial_1_Data
Group by
 Treatment;
Result

QUESTION-2: What is the distribution of education levels among patients in different treatments?

select
   Treatment,
   EducationLevel,
   count (*) as number_of_patients
from 
  Trial_1_Data
group by 
  Treatment,
  EducationLevel
order by
  Treatment, number_of_patients DESC;
Result

QUESTION-3: Does smoking status have an impact on treatment efficacy?

select
 SmokingStatus,
 avg (EfficacyScore) as AvgEfficacyScore,
 count (*) as Number_Subjects
from
 Trial_1_Data
group by
 SmokingStatus;
Result

QUESTION-4: Are there differences in treatment outcomes based on sex?

 Select
  Sex,
  avg (EfficacyScore) as AvgEfficacyScore,
  count (*) as Number_Subjects
from
 Trial_1_Data
group by
 Sex;
Result

QUESTION-5: Does the presence of diabetes affect treatment outcomes?

select
  DiabetesStatus,
  avg (EfficacyScore) as TreatmentOutcome,
  count (*) as Number_Subject
from 
  Trial_1_Data
group by
  DiabetesStatus
order by
  DiabetesStatus DESC
Result

QUESTION-6: Are there any ethnic disparities in treatment responses and outcomes?

 SELECT
    Ethnicity,
    Treatment,
    AVG(EfficacyScore) AS AverageOutcome,
	count (*) as SampleSize
FROM
    Trial_1_Data
GROUP BY
    Ethnicity,
	Treatment
Order by
    AverageOutcome DESC
Result

QUESTION-7: What is the relationship between age and treatment preference?

 select
  Age,
  Treatment,
  count (*) as Frequency
from Trial_1_Data
group by
  Age,
  Treatment
order by
  Age, Frequency DESC;
Result

Home Work:

QUESTION-8: What is the relationship between BMI and blood pressure readings?

QUESTION-9: Does the occurrence of side effects vary based on treatment and age group?

QUESTION-10: Is there a correlation between cholesterol levels and treatment efficacy? Get the source code here


SOCIAL CONNECT:


Recent Posts

See All

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
bottom of page