Best stroke prediction dataset GitHub
There were 5110 rows and 12 columns in this dataset.
This code demonstrates the development of a stroke prediction model using machine learning and the deployment of the model as a FastAPI web service.

Output of df.isnull().sum():
id              0
gender          0
age             0
hypertension    0
heart_disease   0
ever_married    0
work_type       0
Residence…

This project uses six machine learning models (XGBoost, Random Forest Classifier, Support Vector Machine, Logistic Regression, Single Decision Tree Classifier, and TabNet) to make stroke predictions.

baisali14/Hypertension-Heart-Disease-and-Stroke-Prediction-using-SVM: This repository holds a machine learning model trained using SVM to predict whether a person has hypertension, whether the person has heart disease, and whether the person has had a stroke.

This model was created with the following data in mind: patient data, which includes medical history and demographic information.

This dataset is used to predict whether a patient is likely to get a stroke based on input parameters such as gender, age, various diseases, and smoking status.

Stroke Prediction Dataset. Context: According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths.

GitHub repository for a stroke prediction project. A dataset containing all the required fields to build robust AI/ML models to detect stroke.

app.py has the main function and contains all the required functions for the Flask app.

Predicting whether a patient is likely to get a stroke or not: terickk/stroke-prediction-dataset
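The missing-value check above can be reproduced with pandas. This is a minimal sketch that uses a small, invented DataFrame borrowing a few column names from the Kaggle file. With the real data you would instead start from pd.read_csv on the downloaded CSV.

```python
import numpy as np
import pandas as pd

# Invented rows standing in for healthcare-dataset-stroke-data.csv.
# With the real file: df = pd.read_csv("healthcare-dataset-stroke-data.csv")
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [67.0, 61.0, 80.0, 49.0],
    "bmi": [36.6, np.nan, 32.5, np.nan],
    "stroke": [1, 1, 0, 0],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = df.isnull().sum()
print(missing)
```

A non-zero count, as for bmi here, tells you which columns need imputation before modelling.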
Contribute to Rasha-A21/Stroke-Prediction-Dataset development by creating an account on GitHub.

Analysis of the Stroke Prediction Dataset provided on Kaggle. This notebook, 2-model.ipynb, selects a model across many different classifiers and tunes the best selected classifiers using cross-validation.

Contribute to manop-ph/stroke-prediction-dataset development by creating an account on GitHub.

- Visualize the relationship between stroke and the other features using a pandas crosstab and a seaborn heatmap.

Machine learning models were evaluated with pandas in Jupyter notebooks using a stroke prediction dataset.

The system uses Logistic Regression, a regression model in which the response (dependent) variable takes categorical values such as True/False or 0/1.

The dataset has 4 numerical variables: "id", "age", "avg_glucose_level", and "bmi".

… and choosing the best one (for this case): the …

Contribute to HemantKumarRathore/STROKE-PREDICTION-using-multiple-ML-algorithem-and-comparing-best-accuracy-based-on-given-dataset development by creating an account on GitHub.

Hi all, this is the capstone project on the stroke prediction dataset.

The dataset consists of over 5000 individuals and 10 different input variables that we will use to predict the risk of stroke. The analysis includes linear and logistic regression models, univariate descriptive analysis, ANOVA, and chi-square tests, among others.

Marital status and presence of heart disease have no significant effect on stroke. Older age, hypertension, higher glucose level, and higher BMI increase the risk of stroke.

At the conclusion of segment 1 of this project, we had tried several different machine learning models with this dataset (RandomForestClassifier, BalancedRandomForestClassifier, LogisticRegression, and a neural network).

Among the records, 1.5% belong to stroke patients and the remaining 98.5% to non-stroke patients.

For this purpose, I used the "healthcare-dataset-stroke-data" from Kaggle.
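The logistic regression description above can be made concrete. For a binary target, the predicted probability of class 1 is the sigmoid of a linear score w·x + b.

This sketch assumes scikit-learn and uses synthetic data with two hypothetical standardized features, not the stroke dataset. It only checks that predict_proba agrees with the sigmoid computed by hand.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two hypothetical standardized features (think age and glucose); synthetic labels.
X = rng.normal(size=(200, 2))
true_logits = 1.5 * X[:, 0] + 0.8 * X[:, 1] - 1.0
y = (rng.random(200) < 1 / (1 + np.exp(-true_logits))).astype(int)

model = LogisticRegression().fit(X, y)

# Linear score, then the sigmoid maps it to a probability in (0, 1).
scores = X @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1 / (1 + np.exp(-scores))
sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))  # True
```

Thresholding this probability (0.5 by default in predict) turns the regression output into the 0/1 class label.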
Data Set Information: This dataset is used to predict whether a patient is likely to get a stroke based on input parameters such as gender, age, various diseases, and smoking status.

Created March 22, 2023 21:03.

The "Cerebral Stroke Prediction" dataset is a real-world dataset used for predicting the occurrence of cerebral strokes in individuals. These features are selected based on our earlier discussions.

Using visualization libraries, various plots were drawn, such as pie charts, count plots, and curves.

machine-learning random-forest svm jupyter-notebook logistic-regression lda knn baysian stroke-prediction

If the notebook is not available on GitHub, it can be accessed on nbviewer, or alternatively on Kaggle.

ajspurr/stroke_prediction

We conclude that age, hypertension, and the self-employed work type affect the likelihood of having a stroke.

Comparing 10 different ML classifiers and using the one with the best accuracy to predict the user's stroke risk. To determine which model is best for making stroke predictions, I plotte…

Stroke Prediction Analysis Project: This project explores a dataset on stroke occurrences, focusing on factors like age, BMI, and gender. Using SQL and Power BI, it aims to identify trends and correlations.

Recall is very useful when you have to…
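One way to run the "compare classifiers and keep the best" step is cross-validation over a dictionary of candidates. This is a sketch under several assumptions:

- scikit-learn is available;
- a synthetic imbalanced dataset from make_classification stands in for the Kaggle data;
- only three of the many possible classifiers are included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with roughly 5% positives; not the stroke dataset.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95], random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Stratified folds keep the rare class present in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, clf in candidates.items():
    # "accuracy" mirrors the text; "recall" or "balanced_accuracy" suit rare positives better.
    results[name] = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()

best = max(results, key=results.get)
print(results, best)
```

On data this skewed, a model that always predicts "no stroke" already reaches about 95% accuracy. That is why the scoring argument is worth changing when recall on stroke cases matters.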
Libraries used: pandas, scikit-learn, Keras, TensorFlow, Matplotlib, Seaborn, and NumPy.

Dataset description: The Kaggle stroke prediction dataset contains over 5 thousand samples with 11 total features (3 continuous), including age, BMI, average glucose level, and more.

NVM2209/Cerebral-Stroke-Prediction

Performing various classification algorithms with GridSearchCV to find the tuned parameters: Akshay672/STROKE_PREDICTION_DATASET

Contribute to KhaledFadi/Stroke-Prediction development by creating an account on GitHub.

A subset of the original training data is taken using a filtering method for machine learning and data visualization purposes.

This project builds a classifier for stroke prediction. It predicts the probability of a person having a stroke, along with the key factors that play a major role in causing a stroke.

In this project, we will attempt to classify stroke patients using a dataset provided on Kaggle: Kaggle Stroke Dataset.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4908 entries, 0 to 5109
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   id              4908 non-null   int64
 1   gender          4908 non-null   object
 2   age             4908 non-null   float64
 3   hypertension    4908 non-null   int64
 4   heart_disease   4908 non-null   int64
 5   ever_married    4908 non-null   object
 6   work_type       4908 non-null   object
 7   Residence…

Project introduction: My project is titled "Cerebral-Stroke-Prediction", with the goal of predicting whether a patient will suffer a stroke so that timely interventions can be provided.

hridaybasa/Stroke-Prediction-Using-Data-Science-And-Machine-Learning

Project title: "Cerebral-Stroke-Prediction", for predicting whether a patient will suffer a stroke, in order to provide timely interventions.

There are more females than males in the dataset. The input variables are both numerical and categorical and will be explained below.
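The GridSearchCV tuning mentioned above follows a standard scikit-learn pattern. This sketch assumes a decision tree, a hypothetical parameter grid, ROC-AUC scoring, and synthetic data. None of these choices come from the referenced repository.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 10% positives) in place of the stroke dataset.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9], random_state=1)

# Hypothetical grid: every combination is fitted and scored with 5-fold CV.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # threshold-free metric, less misleading than accuracy here
    cv=5,               # for classifiers an integer cv means stratified folds
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, search.best_estimator_ is refit on all the data with the winning parameters and can be used directly for prediction.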
The input data is sourced from Kaggle. This dataset is severely imbalanced, so we need to apply techniques like undersampling to balance the data.

Leveraged skills in data preprocessing, balancing with SMOTE, and hyperparameter optimization using KNN and Optuna for model tuning.

The system uses a 70-30 training-testing split.

- Scale the values of avg_glucose_level, bmi, and age using StandardScaler in sklearn.
- Replace the outlier values with the mode.
- Convert categorical variables to numbers using LabelEncoder in sklearn.

heroku scikit-learn prediction stroke-prediction

Stroke Prediction for Preventive Intervention: Developed a machine learning model to predict strokes using demographic and health data.

Contribute to arturnovais/Stroke-Prediction-Dataset development by creating an account on GitHub.

#Create two tables: stroke patients, non-stroke patients
#At 99% CI, the BMI of stroke patients is higher than that of non-stroke patients by 0.47 to 2.82

The value of the output column stroke is either 1 or 0.

Input data is preprocessed and given to over 7 models, where a maximum accuracy of 99.4% is achieved.

Kaggle is a crowd-sourced platform to attract, nurture, train, and challenge data scientists from all around the world to solve data science, machine learning, and predictive analytics problems.

In this project, I use the Heart Stroke Prediction dataset from WHO to predict heart stroke.

This project aims to predict the likelihood of a stroke based on various health parameters using machine learning models.

erma0x/stroke-prediction-model: Data exploration, preprocessing, analysis, and building a stroke prediction model over the life of the patient.

As said above, there are 12 columns: one target feature (the response variable, stroke) and 11 explanatory variables.
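The preprocessing steps listed above (mean imputation, LabelEncoder, StandardScaler on age/avg_glucose_level/bmi, and a 70-30 split) can be chained as below. This is a sketch on ten made-up rows that reuse the Kaggle column names.

One choice here is ours rather than the source's: the split is done first, and the imputation mean and scaler are fitted on the training part only, so no test information leaks into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows reusing column names from healthcare-dataset-stroke-data.csv.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female",
               "Male", "Female", "Male", "Female", "Male"],
    "age": [67.0, 61.0, 80.0, 49.0, 79.0, 81.0, 74.0, 69.0, 59.0, 78.0],
    "avg_glucose_level": [228.7, 202.2, 105.9, 171.2, 174.1,
                          186.2, 70.1, 94.4, 76.2, 58.6],
    "bmi": [36.6, np.nan, 32.5, 34.4, 24.0, 29.0, 27.4, 22.8, np.nan, 24.2],
    "stroke": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

X = df.drop(columns="stroke")
y = df["stroke"]

# 70-30 split, stratified so both parts keep the same stroke ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
X_train, X_test = X_train.copy(), X_test.copy()

# Mean imputation for bmi, using the training mean only.
bmi_mean = X_train["bmi"].mean()
X_train["bmi"] = X_train["bmi"].fillna(bmi_mean)
X_test["bmi"] = X_test["bmi"].fillna(bmi_mean)

# LabelEncoder maps the gender strings to integers (Female=0, Male=1).
encoder = LabelEncoder().fit(df["gender"])
X_train["gender"] = encoder.transform(X_train["gender"])
X_test["gender"] = encoder.transform(X_test["gender"])

# Standardize the numeric columns with statistics learned on the training split.
num_cols = ["age", "avg_glucose_level", "bmi"]
scaler = StandardScaler().fit(X_train[num_cols])
X_train[num_cols] = scaler.transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```

LabelEncoder mirrors the text. For input features, OneHotEncoder or OrdinalEncoder is the more conventional scikit-learn choice.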
By developing a predictive model, we aim to reduce the incidence of stroke through early intervention.

#Hypothesis: people who had a stroke have a higher BMI than people who had no stroke.

The chosen model was connected to an interactive Tableau dashboard that predicts a user's stroke risk using a TabPy server.

Part I (see "Stroke prediction using Logistic regression").

Columns (name, data type, description):
- id (Integer): unique identifier
- gender (Object): "Male", "Female", or "Other"
- age (Float): age of the patient
- hypertension (Integer): 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

11 clinical features for predicting stroke events.

Analysis based on 4 different machine learning models.

NIRMAL1508/STROKE-DISEASE-PREDICTION

In this project, we used logistic regression to discover the relationship between stroke and the other input features. Stroke Disease Prediction classifies a person as having stroke disease or as healthy, based on the input dataset.

BhanuMotupalli / Heart Stroke Prediction Dataset

The objective is to predict brain stroke from patients' records, such as age, BMI score, heart problems, hypertension, and smoking practice.

Input features: id: a unique identifier for each patient in the dataset.

The output attribute is a…

The dataset used in the development of the method was the open-access Stroke Prediction dataset.

Dataset source: Healthcare Dataset Stroke Data from Kaggle.
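The BMI hypothesis can be checked with a one-sided two-sample t-test. This sketch makes several assumptions:

- SciPy 1.6 or later (for the alternative argument);
- synthetic BMI samples, so the printed numbers say nothing about the real dataset;
- Welch's variant (equal_var=False), chosen here because the two groups differ greatly in size. The source does not specify which variant was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic BMI values for the two tables (stroke vs. no stroke); not real data.
bmi_stroke = rng.normal(loc=30.5, scale=6.0, size=200)
bmi_no_stroke = rng.normal(loc=28.8, scale=7.5, size=4700)

# H0: equal mean BMI; H1: mean BMI is higher in the stroke group.
t_stat, p_value = stats.ttest_ind(
    bmi_stroke, bmi_no_stroke, equal_var=False, alternative="greater"
)
reject_at_1_percent = p_value < 0.01  # 1% significance level
print(f"t = {t_stat:.2f}, p = {p_value:.2g}, reject H0: {reject_at_1_percent}")
```

On the real data, the two arrays would be the bmi column split by the stroke value, after missing BMI values are handled.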
Perform extensive exploratory data analysis, apply three clustering algorithms and three classification algorithms to the given stroke prediction dataset, and report the best findings.

Contact info: Please direct all communications to Henry Tsai at hawkeyedatatsai@gmail.com.

I used Logistic Regression with manual class weights since the dataset is imbalanced.

JuanS286/StrokeClassifier: This project creates a stroke classifier to predict the likelihood of a patient having a stroke. It includes data preprocessing (label encoding, KNN imputation, SMOTE for balancing) and trains models such as Naive Bayes, Decision Tree, SVM, and Logistic Regression.

Deployment and API: The stroke prediction model is deployed as an easy-to-use API, allowing users to input relevant health data and obtain real-time stroke risk predictions.

Predicting whether a patient is likely to get a stroke or not: stroke-prediction-dataset/code.ipynb at main · terickk/stroke-prediction-dataset

I have taken this dataset from Kaggle. The dataset consists of 11 clinical features that contribute to stroke occurrence.

Check for missing values: # let's check for null values with df.isnull().sum()

In addition to the features, we also show results for stroke prediction when principal components are used as the input.

The number 0 indicates that no stroke risk was identified, while the value 1 indicates that a stroke risk was detected.

Feature engineering:
- Substituting the missing values with the mean.

Brain Stroke Prediction: a project on predicting brain stroke on an imbalanced dataset with various ML and DL algorithms, to find the optimal model for use in medical applications.
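"Logistic Regression with manual class weights" means passing a class_weight mapping so that errors on the rare stroke class cost more during training. This sketch assumes scikit-learn and synthetic data with about 5% positives. The 1:19 weighting is a hypothetical choice (roughly the inverse of a 95/5 split), not the author's actual weights.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly 5% positives; not the Kaggle data.
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical manual weights: a misclassified stroke case (class 1) costs 19x more.
weighted = LogisticRegression(class_weight={0: 1, 1: 19}, max_iter=1000).fit(X_tr, y_tr)
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
print(f"recall without weights: {recall_plain:.2f}, with weights: {recall_weighted:.2f}")
```

Weighting usually trades precision for recall on the minority class. class_weight="balanced" computes comparable weights automatically from the class frequencies.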
Initially, an EDA was done to understand the features, and later…

Stroke Prediction Using Machine Learning (Classification use case). Topics: machine-learning model logistic-regression decision-tree-classifier random-forest-classifier knn-classifier stroke-prediction

Contribute to enot9910/Stroke-Prediction-Dataset development by creating an account on GitHub. The following approach is used:

The dataset used to build our model is the Stroke Prediction Dataset, which is available on Kaggle.

Dataset: Stroke Prediction Dataset. This project predicts stroke occurrences using machine learning on a healthcare dataset.

Kaggle is an AirBnB for Data Scientists.

This dataset has been used to predict stroke with 566 different model algorithms.

Optimized the dataset, applied feature engineering, and implemented various algorithms.

KSwaviman/EDA-Clustering-Classification-on-Stroke-Prediction-Dataset

This project describes a step-by-step procedure for building a machine learning (ML) model for stroke prediction and for analysing which features are most useful for the prediction.

We did the following tasks: performance comparison using machine learning classification algorithms on a stroke prediction dataset.

msn2106/Stroke-Prediction-Using-Machine-Learning

Their objectives encompassed the creation of ML prediction models for stroke disease, tackling the severe class imbalance presented by stroke patients while also delving into the model's decision-making process. However, they achieved low accuracy (73.52%) and a high FP rate (26.57%) using Logistic Regression on the Kaggle dataset.
We used the "Stroke Prediction Dataset" from Kaggle as our dataset.

Divide the data randomly into training and testing sets.

3) What does the dataset contain? This dataset contains 5110 entries and 12 attributes related to brain health.

Summary without implementation details: This dataset contains a total of 5110 data points. Each describes a patient: whether they have had a stroke, as well as 10 other variables, ranging from gender and age to type of work…

The dataset is preprocessed and analyzed, and multiple models are trained to achieve the best prediction accuracy. Later, the model was tuned by selecting variables with a high coefficient > 0.…

Synthetically generated dataset containing stroke prediction metrics.

…to make predictions of stroke cases based on simple health…

The model.predict() method takes input from the request (once the 'compute' button from index.html is pressed) and converts it into an array.

The high mortality and long-term care requirements impose a significant burden on healthcare systems and families.

Predicted stroke risk with 92% accuracy by applying logistic regression, random forests, and deep learning to health data. Achieved high recall for stroke cases.

age: The age…

In our project, we want to predict stroke using machine learning classification algorithms, and then evaluate and compare their results.
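The "converts it into an array" step can be isolated from the web framework. This sketch assumes NumPy only; form_to_features and FEATURE_ORDER are hypothetical names, and the field list is illustrative. In a Flask view the mapping would be request.form, and the result would be passed to model.predict.

```python
import numpy as np

# Hypothetical field names and order; they must match the training column order.
FEATURE_ORDER = ["age", "hypertension", "heart_disease", "avg_glucose_level", "bmi"]


def form_to_features(form):
    """Turn a mapping of submitted form strings into a 1 x n float array."""
    values = [float(form[name]) for name in FEATURE_ORDER]
    return np.array(values, dtype=float).reshape(1, -1)


# A plain dict standing in for request.form.
features = form_to_features({
    "age": "67", "hypertension": "0", "heart_disease": "1",
    "avg_glucose_level": "228.69", "bmi": "36.6",
})
print(features.shape)  # (1, 5)
# A fitted scikit-learn model would then be called as model.predict(features).
```

Keeping the conversion in a plain function makes it testable without starting the server. The reshape matters because scikit-learn expects a 2-D array even for a single row.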
I performed EDA using the pandas, seaborn, and matplotlib libraries. I used machine learning algorithms for categorical output, such as logistic regression, decision tree, random forest, KNN, AdaBoost, gradient boosting, and XGBoost, with and without hyperparameter tuning. I concluded, the…

This prediction model was developed to predict stroke cases in patients, given the increase in overall cases across the world.

Selected features using SelectKBest and f_classif.

Analysis of the Stroke Prediction Dataset to provide insights for the hospital.

In the Heart Stroke dataset, the two classes are highly imbalanced, so the heart stroke data points are easily ignored compared with the no-heart-stroke data points.

Tools: Jupyter Notebook, Visual Studio Code, Python, pandas, NumPy, Seaborn, Matplotlib, a supervised machine learning binary classification model, PostgreSQL, and Tableau.

It is used to predict whether a patient is likely to get a stroke based on input parameters such as age, various diseases, BMI, average glucose level, and smoking status.

Model comparison techniques are employed to determine the best-performing model for stroke prediction.

This package can be imported into any application for adding security features.

Brain stroke poses a critical challenge to global healthcare systems due to its high prevalence and significant socioeconomic impact.

victorjongsoon/stroke-prediction

Download the Stroke Prediction Dataset from Kaggle and extract the file healthcare-dataset-stroke-data.csv.

Data source: the healthcare-dataset-stroke-data.csv file from the Kaggle website; credit to the author of the dataset, fedesoriano.

We aim to identify the factors that con…

sa-diq/Stroke-Prediction: Prediction of stroke in patients using machine learning algorithms.
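Feature selection with SelectKBest and f_classif scores each feature with an ANOVA F-test against the class label and keeps the k highest-scoring ones. This sketch assumes scikit-learn, synthetic data, and k=4, an arbitrary illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which 3 are informative.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

# Score every feature with the ANOVA F statistic and keep the top 4.
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)  # column indices of the kept features
print(X_selected.shape, kept)
```

To avoid leaking test information, fit the selector on the training split only, ideally inside a Pipeline together with the model.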
Contribute to 9amomaru/Stroke-Prediction-Dataset development by creating an account on GitHub.

The stroke prediction dataset was used to perform the study. The Stroke Prediction dataset is taken from Kaggle.

15,000 records and 22 fields of stroke prediction data, containing: 'Patient ID', 'Patient Name', 'Age', 'Gender', 'Hypertension', 'Heart Disease', 'Marital Status', 'Work Type', …

The aim of this project is to determine the best model for predicting brain stroke on the given dataset. This would enable early intervention and preventive measures that reduce the incidence and impact of strokes, improving patient outcomes and overall healthcare.

Prediction of brain stroke based on an imbalanced dataset in…

- Use SMOTE from…

Stroke is a leading cause of death and disability worldwide.

This dataset has:
- 5110 samples or rows;
- 11 features or columns;
- 1 target column (stroke).

gender: The gender of the patient, which can be "Male" or "Female".

4) Which type of ML model is it, and what approach was used to build it? This is a classification type of ML model.

#Conclusion: Reject the null hypothesis, finding that a higher BMI level is likely…

The objective is to use the best machine learning model, come back to study the correct predictions, and find out more about the characteristics of stroke patients.
This project utilizes ML models to predict stroke occurrence based on patient demographic, medical, and lifestyle data.

You need to download the 'Stroke Prediction Dataset' (it is distributed on Kaggle, not bundled with scikit-learn); the reference is given below.

Contribute to CTrouton/Stroke-Prediction-Dataset development by creating an account on GitHub.

ankitlehra/Stroke-Prediction-Dataset---Exploratory-Data-Analysis

Here we present results for stroke prediction when all the features are used and when only 4 features (A, HD, AG, and HT) are used.

I have done EDA, visualisation, encoding, scaling, and modelling of the dataset.

cayelsie/Stroke-prediction

Contribute to Aftabbs/Stroke-Prediction-using-Machine-Learning development by creating an account on GitHub.

This project predicts stroke disease using three ML algorithms: fmspecial/Stroke_Prediction

Machine learning project using the Kaggle Stroke Dataset, where I perform exploratory data analysis, data preprocessing, classification model training (Logistic Regression, Random Forest, SVM, XGBoost, KNN), hyperparameter tuning, stroke prediction, and model evaluation.

Data preprocessing: This includes handling missing values, encoding categorical variables, dealing with outliers, and normalizing the data to prepare it for modeling.

Dependencies: Python (v3.…)

This study uses the "healthcare-dataset-stroke-data" from Kaggle, which includes 5110 observations and 12 attributes, to predict stroke occurrence.
An exploratory data analysis (EDA) and various statistical tests performed on a dataset focused on stroke prediction.

Easy Ensemble AdaBoost Classifier
Balanced accuracy score: 0.7162480376766092

Confusion matrix:
                   Predicted No Stroke   Predicted Stroke
Actual No Stroke   780                   396
Actual Stroke      12                    40

Classification report:
             pre   rec   spe   f1    geo   iba   sup
0            0.98  0.66  0.77  0.79  0.71  0.50  1176
1            0.09  0.77  0.66  0.16  0.71  0.52    52
avg / total  0.95  0.67  0.76  0.77  0.71  0.51  1228

Take it to the real world: We need to use our model to make predictions on unseen data to see how it performs.

The dataset includes 100k patient records.

mmaghanem/ML_Stroke_Prediction

We analyze a stroke dataset and formulate various statistical models for predicting whether a patient has had a stroke based on measurable predictors.

Stroke prediction with machine learning and the SHAP algorithm using a Kaggle dataset: Silvano315/Stroke_Prediction

98% accurate: This stroke risk prediction machine learning model uses ensemble machine learning (Random Forest, Gradient Boosting, XGBoost) combined via a voting classifier.

In the code, we have created an instance of Flask() and loaded the model.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4088 entries, 25283 to 31836
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   gender          4088 non-null   object
 1   age             4088 non-null   float64
 2   hypertension    4088 non-null   int64
 3   heart_disease   4088 non-null   int64
 4   ever_married    4088 non-null   object
 5   work_type       4088 non-null   object
 6   Residence_type  4088 non-null   …

In this stroke prediction model, we have implemented Logistic Regression, Random Forest, and LightGBM.

Strokes are becoming more common among females than males. A person's type of residence has no bearing on whether or not they have a stroke.
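The reported balanced accuracy follows directly from the confusion matrix above. It is the mean of the recall on each class. This NumPy sketch recomputes it and contrasts it with plain accuracy; the only inputs are the four counts shown in the matrix.

```python
import numpy as np

# Confusion matrix reported above (rows: actual, columns: predicted; 0 = no stroke, 1 = stroke).
cm = np.array([[780, 396],
               [12, 40]])

recall_no_stroke = cm[0, 0] / cm[0].sum()  # 780 / 1176
recall_stroke = cm[1, 1] / cm[1].sum()     # 40 / 52
balanced_accuracy = (recall_no_stroke + recall_stroke) / 2
accuracy = np.trace(cm) / cm.sum()         # (780 + 40) / 1228
print(round(balanced_accuracy, 4), round(accuracy, 4))  # 0.7162 0.6678
```

Balanced accuracy weights the 52 stroke cases as heavily as the 1176 non-stroke cases. That makes it a more informative single number than accuracy on data this imbalanced.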
The model is trained on a dataset with various health-related features to predict the likelihood of a stroke occurrence.

To predict which factors influence a person's stroke, I will use the stroke variable as the dependent variable.

Factors such as age, body mass index, smoking status, average glucose level, hypertension, and heart disease are critical risk factors for stroke.

Contribute to kushal3877/Stroke-Prediction-Dataset development by creating an account on GitHub.

bpalia/StrokePrediction

With a relatively small dataset (although quite big for a single healthcare facility), every possible effort was made to minimize or eliminate overfitting. This ranged from k-fold cross-validation to hyperparameter optimization (using grid search CV) to find the best value for each parameter in a model.

Timely prediction and prevention are key to reducing its burden. Alleviate healthcare costs associated with long-term stroke care.

Incorporate more data: To improve our dataset in the next iterations, we need to include more data points of people with stroke so that we can balance the target before modeling.

Handling class imbalance: Since stroke cases are rare in the dataset, we applied SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples of the minority class and balance the dataset.

The dataset used to predict stroke is a dataset from Kaggle.

So I used a sampling technique to solve that problem.

We tune parameters with stratified k-fold cross-validation, ROC-AUC, precision-recall curves, and feature importance analysis.

Contribute to renjinirv/Stroke-prediction-dataset development by creating an account on GitHub.
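To show what SMOTE does, here is a minimal from-scratch sketch of its core interpolation step, using NumPy and scikit-learn's NearestNeighbors. It is a simplified illustration, not the imbalanced-learn implementation that projects typically import as imblearn.over_sampling.SMOTE. smote_like is a hypothetical helper name and the data are synthetic.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point's nearest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[neighbour] - X_min[base])


rng = np.random.default_rng(1)
X_major = rng.normal(0, 1, size=(95, 2))  # "no stroke" class
X_minor = rng.normal(3, 1, size=(5, 2))   # rare "stroke" class
synthetic = smote_like(X_minor, n_new=90, k=3)

X_bal = np.vstack([X_major, X_minor, synthetic])
y_bal = np.array([0] * 95 + [1] * (5 + 90))
print(np.bincount(y_bal))  # [95 95]
```

Oversampling should be applied only to the training split. Otherwise synthetic points derived from test rows inflate the evaluation.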
The goal is to predict whether a person will suffer a stroke with the help of several easily measurable predictors, such as smoking, hypertension, and age.

This dataset was created by fedesoriano and was last updated 9 months ago.

The goal here is to get the best accuracy on a larger dataset.

Fetching user details through a web app hosted on Heroku.

This dataset is imbalanced.

PREDICTION-STROKE/
├── data/
│   ├── models/
│   │   ├── best_stroke_model.joblib
│   │   ├── model_metadata.joblib
│   │   └── optimized_stroke_model.joblib
│   ├── processed/
│   │   ├── processed_stroke_data.csv
│   │   ├── stroke_data_engineered.csv
│   │   └── stroke_data_final.csv
│   └── raw/
│       └── healthcare-dataset…

The project aims to display charts/plots of the number of people affected by stroke based on input parameters such as smoking status, high blood pressure level, cholesterol level, and obesity level in some countries.

Working with a dataset of lifestyle and physical data to build a model for predicting strokes: R-C-McDermott/Stroke-prediction-dataset

The system uses data preprocessing to handle character values as well as null values.

Each row in the data provides relevant information about the patient.

The API can be integrated seamlessly into existing healthcare systems.

Using the "Stroke Prediction Dataset" available on Kaggle, our primary goal for this project is to delve deeper into the risk factors associated with stroke. We have also done hyperparameter tuning for each model.
The dataset for the project has the following columns:
- id: unique identifier
- gender: "Male", "Female", or "Other"
- age: age of the patient
- hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

With the help of the Kaggle stroke prediction dataset, identify patients with a stroke.