Talk Title: A Novel Penalized Log-likelihood Objective Function for the Class Imbalance Problem
Speaker: 张丽丽
The log-likelihood function is the optimization objective in the maximum likelihood method for estimating model coefficients. However, its underlying assumption is that overall accuracy should be maximized, which does not hold for the imbalanced data found in many real-world problems (e.g., fraud detection, defective production detection, customer conversion prediction, predictive maintenance, cybersecurity, and rare disease diagnosis). The resulting models tend to be biased toward the majority class (i.e., the non-event class), which can cause substantial losses in practice. One strategy for mitigating this bias is to penalize the misclassification of different observations differently in the log-likelihood objective function during learning. However, existing penalized log-likelihood functions require difficult hyperparameter estimation or incur high computational complexity. In the present work, we propose a novel penalized log-likelihood function that includes penalty weights for minority-class (i.e., event) observations as decision variables and learns them from the data along with the model coefficients. In the experiments, we compared models trained with the proposed log-likelihood function against models trained with existing ones, in terms of the statistics of the area under the ROC curve (AUC) over 100 runs of 10-fold stratified cross-validation on 10 public datasets, including the 95% confidence interval, mean, and standard deviation, as well as the training time. A more detailed analysis was conducted on an imbalanced credit dataset to examine the estimated probability distributions and additional performance measures (Type I error, Type II error, and accuracy). The results demonstrate that both the discrimination ability and the computational efficiency of the models improve when the proposed log-likelihood function is used as the learning objective.
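To make the central idea concrete, the following is a minimal illustrative sketch, not the speaker's actual formulation: a logistic regression whose minority-class observations carry penalty weights that are optimized jointly with the coefficients. The parameterization here (a softmax over weight logits, rescaled so the minority weights keep a fixed total mass and cannot all collapse to zero) is an assumption made purely for this sketch; the data, the constraint, and all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch only. One possible way to jointly learn logistic
# regression coefficients and per-observation penalty weights for the
# minority class. The talk's exact objective may differ; here the minority
# weights are parameterized through a softmax so their total mass stays
# fixed at the minority count, preventing the trivial solution of
# shrinking every weight to zero.

rng = np.random.default_rng(0)

# Synthetic imbalanced data: 200 majority (y=0) vs. 20 minority (y=1).
n_maj, n_min, d = 200, 20, 3
X = np.vstack([rng.normal(0.0, 1.0, (n_maj, d)),
               rng.normal(1.0, 1.0, (n_min, d))])
y = np.concatenate([np.zeros(n_maj), np.ones(n_min)])
X = np.hstack([np.ones((len(y), 1)), X])   # prepend intercept column
min_idx = y == 1

def neg_penalized_loglik(params):
    """Negative penalized log-likelihood in coefficients AND weight logits."""
    beta, logits = params[:d + 1], params[d + 1:]
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)
    w = np.ones(len(y))                    # majority observations keep weight 1
    e = np.exp(logits - logits.max())      # softmax over minority logits,
    w[min_idx] = n_min * e / e.sum()       # rescaled to sum to n_min
    return -np.sum(w * ll)

# Both coefficients and minority weight logits are decision variables.
x0 = np.zeros(d + 1 + n_min)
res = minimize(neg_penalized_loglik, x0, method="L-BFGS-B")
beta_hat = res.x[:d + 1]
print("converged:", res.success)
print("coefficients:", np.round(beta_hat, 2))
```

In this toy setup the weight logits let the optimizer redistribute emphasis among minority observations while the fixed total mass keeps the event class from being down-weighted overall; how the actual method constrains or regularizes the learned weights is precisely the subject of the talk.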