Carnegie Mellon University

17-601 Data Driven Software Engineering

Data Driven Software Engineering is a 12-unit, full-semester course.

Software engineers and developers play a more critical role than ever in the value chain of today’s technology-driven enterprises, such as Amazon, Google, Facebook, Apple, Netflix, Tesla, and Uber. In recent years, several technology transformations have resulted from self-contained, self-directed software engineering teams that compete to deliver on ideas that merit investment. Further, continuous deployment places new demands on DevOps teams to rapidly deliver highly available, defect-free software that features intuitive and responsive UI design, relevant content, and an optimal user experience that fosters efficiency and speed. This is the world to which customers have become increasingly accustomed. Digitization of information-intensive processes gives DevOps teams large amounts of data that can be statistically analyzed to make predictions or draw inferences in support of software engineering investments. To this end, the course aims to equip software engineers with data analysis skills that help them develop a more holistic, data-driven view of their function, leading to more informed decisions about the software features they prioritize and develop. Students will learn and apply statistical learning techniques such as logistic regression, clustering, and decision trees. The techniques will be applied to the provided DevOps data sets to visualize, analyze, and interpret the data and to drive software engineering decisions. The course will also use a practical corporate case study to set the stage and context before the project commences.

Target Audience

The course is aimed at graduate students and undergraduate seniors who want to learn and use data analytics to create measurable impact on business outcomes through software engineering activities. Those who want to understand how data is used in the decision-making process to transform engineering operations will also benefit. Formal knowledge of database management systems, and formal software engineering and/or software development skills or experience, will be beneficial. A basic course in statistics will be a plus.

After completing this course, students will be able to:

  • Use DevOps data sets to measure software engineering activities and visualize performance
  • Perform hypothesis testing on feature experimentation (A/B or split testing) data and determine the best feature option
  • Analyze performance in terms of integration, build, deployment and post-production support metrics, and identify the main drivers of variance and the actions needed to address them
  • Make decisions for improvement of software engineering activities including the classification of defect data and prediction of quality
  • Predict user-centric value offerings and measure business value
  • Recognize how these concepts can be applied in contexts outside software engineering, should the need arise

More course details can be found in the Data-driven Software Engineering syllabus.