The aim of this thesis is to identify dangerous driver behavior using large-scale data from intelligent sensor systems together with machine learning techniques. The data were drawn from a large database created through a simulation experiment conducted within a European H2020 project. Three categories of driving were extracted from the data: normal driving, dangerous driving and avoidable accident. The categories were defined using maximum speed as the variable of interest, checking whether drivers exceeded the speed limit. As in the majority of related studies, the data were imbalanced, with far fewer samples of dangerous driving conditions than of safe driving conditions. The SMOTE resampling method was therefore used to balance the data across the safety levels and to keep the models from favoring the majority class. Ridge classifier, support-vector machine, random forest and XGBoost models were developed for the analysis. The random forest and XGBoost models gave the most reliable predictions, reaching 95% accuracy across the three driver categories with a lower probability of prediction error than the Ridge classifier and the support-vector machine. To better understand these models, Shapley values were computed, identifying the most important variables affecting each model. Finally, suggestions are made for using the results and for further research on the subject.
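To illustrate the resampling step, the sketch below shows the core idea behind SMOTE: new minority-class samples are generated by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a simplified, standard-library-only illustration and not the implementation used in the thesis; the function name `smote_oversample`, the two features and all data values are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: (maximum speed in km/h, mean acceleration).
# The values are illustrative only and do not come from the thesis.
normal = [(48.0, 0.8), (50.0, 1.0), (52.0, 0.9), (47.0, 0.7), (49.0, 1.1), (51.0, 0.8)]
dangerous = [(78.0, 2.1), (82.0, 2.4), (85.0, 2.0)]

# Generate enough synthetic "dangerous" samples to match the majority class.
extra = smote_oversample(dangerous, n_new=len(normal) - len(dangerous))
balanced_dangerous = dangerous + extra
print(len(normal), len(balanced_dangerous))
```

Because each synthetic point lies on a segment between two real minority samples, the resampled class stays inside the region already occupied by that class. In practice, resampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic samples.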
ID | ad128 |