Youngjin Lee, Associate Professor

Postdoc, Massachusetts Institute of Technology, 2005-2007

  • Educational Data Mining

Postdoc, National Center for Supercomputing Applications, 2003-2005

  • Automated Learning

PhD, University of Illinois at Urbana-Champaign, 2003

  • Educational Computing

MS, Seoul National University, 1996

  • Earth Science (focusing on astronomy)

BSc, Seoul National University, 1994

  • Earth Science

Research Interests

  •  Learning analytics
  •  Educational data mining
  •  Quantitative data analysis
  •  Information visualization

Notable Journal Publications

  • Effect of uninterrupted time-on-task on students' success in Massive Open Online Courses (MOOCs) (2018). Published by Elsevier. 
    Abstract: This study investigated the relationship between uninterrupted time-on-task and academic success of students enrolled in a Massive Open Online Course (MOOC). The variables representing uninterrupted time-on-task, such as number and duration of uninterrupted consecutive learning activities, were mined from the log files capturing how 4286 students tried to learn Newtonian mechanics concepts in a MOOC. These variables were used as predictors in the logistic regression model estimating the likelihood of students getting a course certificate at the end of the semester. The analysis results indicate that the predictive power of the logistic regression model, which was assessed by Area Under the Precision-Recall Curve (AUPRC), depends on the value of off-task threshold time, and the likelihood of students getting a course certificate increases as students were doing more uninterrupted learning activities over a longer period of time. The findings from this study suggest that a simple count of learning activities, which has been used as a proxy for time-on-task in previous studies, may not accurately describe student learning in the computer-based learning environment because it does not reflect the quality, such as uninterrupted durations, of those learning activities.
  • Using Self-Organizing Map and Clustering to Investigate Problem-Solving Patterns in the Massive Open Online Course: An Exploratory Study (2018). Published by the Journal of Educational Computing Research. 
    Abstract: This study investigated whether clustering can identify different groups of students enrolled in a massive open online course (MOOC). This study applied self-organizing map and hierarchical clustering algorithms to the log files of a physics MOOC capturing how students solved weekly homework and quiz problems to identify clusters of students showing similar problem-solving patterns. The usefulness of the identified clusters was verified by examining various characteristics of students such as the number of problems students attempted to solve, weekly and daily problem completion percentages, and whether they earned a course certificate. The findings of this study suggest that the clustering technique utilizing self-organizing map and hierarchical clustering algorithms in tandem can be a useful exploratory data analysis tool that can help MOOC instructors identify similar students based on a large number of variables that examine their characteristics from multiple perspectives. 
  • Estimating student ability and problem difficulty using item response theory (IRT) and TrueSkill (2019). Published by EmeraldInsight. 
    Abstract: Purpose – The purpose of this paper is to investigate an efficient means of estimating the ability of students solving problems in the computer-based learning environment. Design/methodology/approach – Item response theory (IRT) and TrueSkill were applied to simulated and real problem-solving data to estimate the ability of students solving homework problems in the massive open online course (MOOC). Based on the estimated ability, data mining models predicting whether students can correctly solve homework and quiz problems in the MOOC were developed. The predictive power of IRT- and TrueSkill-based data mining models was compared in terms of Area Under the receiver operating characteristic Curve. Findings – The correlation between students’ ability estimated from IRT and TrueSkill was strong. In addition, IRT- and TrueSkill-based data mining models showed a comparable predictive power when the data included a large number of students. While IRT failed to estimate students’ ability and could not predict their problem-solving performance when the data included a small number of students, TrueSkill did not experience such problems. Originality/value – Estimating students’ ability is critical to determine the most appropriate time for providing instructional scaffolding in the computer-based learning environment. The findings of this study suggest that TrueSkill can be an efficient means for estimating the ability of students solving problems in the computer-based learning environment regardless of the number of students.
  • Mathematical learning models that depend on prior knowledge and instructional strategies (2008). Published by Physical Review Physics Education Research. 
    Abstract: We present mathematical learning models—predictions of student’s knowledge vs amount of instruction—that are based on assumptions motivated by various theories of learning: tabula rasa, constructivist, and tutoring. These models predict the improvement (on the post-test) as a function of the pretest score due to intervening instruction and also depend on the type of instruction. We introduce a connectedness model whose connectedness parameter measures the degree to which the rate of learning is proportional to prior knowledge. Over a wide range of pretest scores on standard tests of introductory physics concepts, it fits high-quality data nearly within error. We suggest that data from MIT have low connectedness (indicating memory-based learning) because the test used the same context and representation as the instruction and that more connected data from the University of Minnesota resulted from instruction in a different representation from the test.
  • Measuring student learning with item response theory (2008). Published by Physical Review Physics Education Research. 
    Abstract: We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics course. We show that, after tutoring, a shifted logistic item response function with lower discrimination fits the students’ second responses to an item previously answered incorrectly. Student skill decreased by 1.0 standard deviation when students used no tutoring between their incorrect first and second attempts, which we attribute to “item-wrong bias.” On average, using hints or feedback increased students’ skill by 0.8 standard deviation. A skill increase of 1.9 standard deviation was observed when hints were requested after viewing, but prior to attempting to answer, a particular item. The skill changes measured in this way will enable the use of IRT to assess students based on their second attempt in a tutoring environment.