The MSBA program curriculum focuses on three core areas: data engineering, data analytics, and data interpretation. The program exposes students to both open-source and commercial software to reflect the variety of software used in different companies. These include, but are not limited to, Stata, R, Python, and Tableau.
Pre-Program runs from January until the start of Pre-Module 1 in March. During this time, the MSBA program will provide matriculated students with access to the Admitted Students Website, where they can become familiar with the program's administration, format, and logistics, and can access optional materials to brush up on skills before the Pre-Module period.
Pre-Module 1 is the period from March through April, before students arrive for the in-class residential period of the module. During this time, the MSBA program will distribute pre-readings, assignments, group work, and online sessions so that students can complete their Module 1 coursework before the first in-class session. This period is also an opportunity for the cohort to meet virtually before the program convenes on campus for Module 1.
Foundations of Statistics Using R
Previously co-taught by: Kristen Sosulski, Peter Lakner
Course description: The purpose of this course is to ensure that students are prepared to use R as a statistical tool and understand the fundamental statistical concepts. This course is divided into two parts: 1) Getting Started with R and 2) Statistics and R.
Part 1: Getting started with R: The R portion of the course will equip students with the skills needed to work with data using the R statistical computing application. It begins with a basic understanding of the R working environment. Next, students will learn to use R while being introduced to the necessary arithmetic and logical operators and the key functions for manipulating data. Students will then be introduced to the common data structures, variables, and data types used in R. They will learn to write their own R scripts and to use the many packages available in R for visualization, data manipulation, and statistical analysis. Students will learn how to import datasets and transform them for various analytical purposes, such as dealing with missing data. Finally, students will learn how to create control structures, such as loops and conditional statements, to traverse, sort, merge, and evaluate data.
Part 2: Statistics and R: The second part of the class introduces basic concepts of probability and statistics. We shall study the concepts of population and sample, discuss the difference between population parameters and sample statistics, and draw inferences from known sample statistics about usually unknown population parameters. We shall study discrete distributions along with their means and standard deviations, paying particular attention to the binomial distribution. We shall also study continuous distributions and their probability density functions, paying special attention to the most central of the continuous distributions, the normal distribution. The Central Limit Theorem will be introduced, and confidence intervals and statistical tests will be discussed. We shall then study simple and multiple linear regression and their applications to prediction and forecasting.
Getting started with R (commands, arithmetic operators, logical operators, functions)
Data structures and types
Working with and manipulating data sets in R
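Part 2 of Foundations of Statistics Using R introduces the Central Limit Theorem and confidence intervals. As a minimal sketch (written in Python for illustration, although the course itself uses R), the function below computes an approximate confidence interval for a population mean from a sample, assuming the sample is large enough for the CLT normal approximation to hold. The function name and sample values are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean using the CLT normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 denominator) estimates the population sd.
    sd = statistics.stdev(sample)
    # Two-sided critical value from the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean is sd / sqrt(n); the interval is mean +/- z * SE.
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

sample = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
low, high = mean_confidence_interval(sample)
```

For small samples, a critical value from the t distribution is the usual choice; the normal value here is a simplification that the CLT justifies only as n grows.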
Digital Marketing Analytics
Previously taught by: Anindya Ghose
Course description: The emergence of the Internet has drastically changed marketing. Some traditional marketing strategies are now completely outdated, others have been deeply transformed, and new digital marketing strategies are continuously emerging based on unprecedented access to vast amounts of information about products, firms, and consumer behavior. The Internet is now encroaching on core business activities such as new product design, advertising, marketing and sales, creation of word-of-mouth, new start-up funding, and customer service. Our goal in this class is to discuss the new business models in electronic commerce that have been enabled by Internet-based social media and advertising technologies, and to analyze the impact these technologies and business models have on industries, firms, and people. We will inform our discussions with insights from data and with metrics that can guide measurement. To understand how businesses can successfully leverage these technologies, we will go beyond the technology itself and investigate some key questions.
Econometric regression modeling
Omitted variable problems
Introduction to Business Analytics
Previously taught by: Foster Provost; Alex Tuzhilin
Course description: This course will change the way you think about data and its role in business. Businesses, governments, and individuals create massive collections of data as a byproduct of their activity. Increasingly, decision-makers and systems rely on intelligent technology to analyze data systematically to improve decision-making. In many cases, automating analytical and decision-making processes is necessary because of the volume of data and the speed with which new data are generated. We will examine how data analysis technologies can be used to improve decision-making. We will study the fundamental principles and techniques of data mining, and we will examine real-world examples and cases to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science. In addition, we will work “hands-on” with data mining software.
Data mining and data mining processes
Introduction to predictive modeling
Data fitting and overfitting
Cross-validation and learning curves
Model performance analytics
Unsupervised learning and clustering
Bayesian reasoning and text classification
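The overfitting and cross-validation topics above can be illustrated with a minimal k-fold cross-validation sketch. It assumes a one-variable linear model fit by ordinary least squares and uses plain Python rather than any particular data mining package; the helper names `fit_line` and `k_fold_mse` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Partition the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in indices if i not in held_out]
        train_y = [ys[i] for i in indices if i not in held_out]
        a, b = fit_line(train_x, train_y)
        # Score only on the fold the model never saw during fitting.
        mse = statistics.fmean((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
        errors.append(mse)
    return statistics.fmean(errors)
```

Because every observation is scored exactly once by a model that was not trained on it, the averaged error estimates out-of-sample performance, which is precisely what training-set error overstates when a model overfits.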
Dealing with Data
Previously taught by: Panos Ipeirotis
All analytics projects rely on data. A crucial step in a business analytics process is creating the dataset that will be analyzed. Unfortunately, most stakeholders do not pay serious attention to this step, even though preparing and understanding the data can take as much as 90% of the effort and time of a data analytics project. Furthermore, because most people do not know how the dataset was created, they miss important details and assumptions that were part of the data gathering and handling process, leading to serious problems down the road. This class is designed to teach students to handle data programmatically without needing to be software engineers. It guides students through the whole data management process, from initial data acquisition to final data analysis. On the tools side, we cover the Python ecosystem: Python is a strong general-purpose programming language for a wide variety of data management tasks, and it is commonly used as the “glue” that brings together the different parts of the analytics process.
Decision Models
Previously taught by: Jiawei Zhang; Ilan Lobel
Course description: This course trains students to turn real-world problems into mathematical and spreadsheet models and to use those models to make better managerial decisions. This is a hands-on course that focuses on modeling business problems as Excel spreadsheet models and using tools such as Solver and Crystal Ball to solve them. The course focuses on two classes of models: optimization and simulation. The applications are diverse, drawing on problems in finance, marketing, and operations. We cover problems such as how to optimize a supply chain, how to price products under demand uncertainty, and how to price exotic financial options using Monte Carlo simulation.
Linear and linear integer programming
Nonlinear programming and evolutionary solver
Simulation and optimization
Multi-period linear programming
Monte Carlo simulation
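The course description mentions pricing exotic financial options with Monte Carlo simulation. The course itself works in Excel with Crystal Ball; the Python below is only an illustration, assuming a geometric Brownian motion price model under risk-neutral pricing and an arithmetic-average (Asian) call as the example exotic option. The function name and all parameter values are hypothetical.

```python
import math
import random

def asian_call_price(s0, strike, rate, sigma, maturity, steps, paths, seed=42):
    """Monte Carlo estimate of an arithmetic-average Asian call under GBM."""
    rng = random.Random(seed)
    dt = maturity / steps
    # Risk-neutral log-price drift and volatility per time step.
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    total_payoff = 0.0
    for _ in range(paths):
        price = s0
        running_sum = 0.0
        for _ in range(steps):
            # One GBM step driven by a standard normal shock.
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            running_sum += price
        average = running_sum / steps
        # The Asian call pays off on the path average, not the final price.
        total_payoff += max(average - strike, 0.0)
    # Discount the mean payoff back to today at the risk-free rate.
    return math.exp(-rate * maturity) * total_payoff / paths

estimate = asian_call_price(100, 100, 0.05, 0.2, 1.0, steps=50, paths=5000)
```

Averaging over many simulated paths is what makes the method general: swapping in a different payoff rule prices a different exotic option without any new closed-form derivation, at the cost of sampling error that shrinks roughly with the square root of the number of paths.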
Previously taught by: Norm White; Ramesh Shankar
Course description: This course offers an in-depth, hands-on exploration of cutting-edge information technologies used for big data analytics. It includes background readings on the theoretical foundations of Hadoop and MapReduce, as well as business articles on how companies use Hadoop and related technologies. The course also covers the basics of navigating Google Cloud Platform (GCP) to upload and analyze data using Google Cloud Storage, BigQuery, and PySpark on Dataproc (a managed Hadoop cluster). Students get hands-on experience with Hadoop, specifically Linux, the Hadoop Distributed File System (HDFS), Apache Sqoop, Apache Pig, and Apache Hive, for data management and extract-transform-load (ETL) operations. Students will also learn about cloud file storage using GCP, querying and visualizing cloud data, and using PySpark to run analytics on cloud data.
System architecture and ecosystems
Data management and extract-transform-load operations
Linux, Hadoop, MapReduce, Apache Sqoop, Apache Pig, Apache Hive
Databases for Business Analytics
Databases are ubiquitous in business and hold a significant amount of information about the business. Most data analyses and reports start with an SQL query, as SQL is the lingua franca of relational database systems. SQL is therefore a necessity for anyone who analyzes data as part of their job, and many tech companies consider knowledge of SQL a prerequisite for all their analysts and managers.
This database class is designed for absolute beginners and teaches students how databases are structured and how to write SQL queries that retrieve data from a database. The class is heavily hands-on, with a focus on developing the necessary skills for writing SQL queries. We will cover the following topics:
- Basics of the Entity-Relationship model and its connection to databases
- USE and DESCRIBE statements, to understand the structure of a database
- Selection queries: *, column, column AS, DISTINCT, ORDER BY, LIMIT
- Filtering data using WHERE: Boolean conditions, IN, BETWEEN, LIKE
- Join queries: inner and outer joins, self joins
- Aggregation queries: GROUP BY, SUM, AVG, MAX, MIN, etc.
At the completion of this course, students will be able to navigate relational databases, issue queries against databases in an organization, and generate data that can be used for analyses and reports.
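The query types listed for this course can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in; this is an assumption, since the course's own database system is not named here, and SQLite does not support `USE` or `DESCRIBE` (its rough equivalent of `DESCRIBE` is `PRAGMA table_info(...)`). The two-table schema and its rows are hypothetical.

```python
import sqlite3

def run_examples():
    """Build a tiny in-memory database and run representative queries."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical schema: each customer can place many orders (one-to-many).
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Ana", "New York"), (2, "Ben", "Boston"), (3, "Cy", "Newark")])
    cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0)])

    # Selection with an alias, LIKE filtering, ORDER BY, and LIMIT.
    new_cities = cur.execute(
        "SELECT name AS customer FROM customers "
        "WHERE city LIKE 'New%' ORDER BY name LIMIT 10").fetchall()

    # Filtering with IN and BETWEEN.
    mid_orders = cur.execute(
        "SELECT id FROM orders WHERE customer_id IN (1, 2) "
        "AND amount BETWEEN 60 AND 150 ORDER BY id").fetchall()

    # LEFT (outer) JOIN plus GROUP BY: customers without orders still appear.
    totals = cur.execute(
        "SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY total DESC").fetchall()
    conn.close()
    return new_cities, mid_orders, totals
```

Here `totals` comes back as `[('Ana', 2, 200.0), ('Ben', 1, 50.0), ('Cy', 0, 0)]`: the LEFT JOIN keeps Cy even though Cy has no orders, whereas an inner join would drop that row.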
Data Mining in R
Previously taught by: Luis Torgo; Ravi Bapna
Course description: The goal of this course is to provide hands-on experience with key data mining technologies using one particular tool, the R environment. R is a fast-growing technology that has gained widespread acceptance in both academia and industry. Surveys have even placed it at the top in usage among professional data miners (Rexer Analytics survey, 2013). Many factors contribute to this acceptance, but they clearly include its price (free), its open-source nature (trustworthy software that can easily be inspected for flaws), the breadth of available methods (an exponentially growing set of methods for different application areas), and the support available from the community (an extremely large community of knowledgeable experts providing top-notch support for free). This course illustrates the use of R for several key data mining processes, driven by concrete case studies that we will “solve” using R. The course can be regarded as a hands-on complement to the Data Science for Business Analytics course.
Data pre-processing (dealing with unknown values)
Defining the data mining task
Performance estimation for time series models
Modeling and performance estimation
Model outcomes and model selection
Data Driven Decision Making
Previously taught by: Vishal Singh; Rob Seamans
Course description: In every aspect of our daily lives, from the way we work, shop, communicate, or socialize, we are both consuming and creating vast amounts of information. More often than not, these daily activities create a trail of digitized data that is stored, mined, and analyzed by firms hoping to create valuable business intelligence. With technological advances and developments in customer databases, firms have access to vast amounts of high-quality data that allow them to understand customer behavior and tailor business tactics to increasingly fine segments, or even to segments of one. However, much of the promise of such data-driven policies has failed to materialize because managers find it difficult to translate customer data into actionable policies. The general objective of this course is to fill this gap by providing students with tools and techniques for making business decisions. Note that this is not a statistics or mathematics course. The emphasis of the class will be on applications and on interpreting results to make real-life business decisions.
Regression-based model development
Capturing non-linear effects: dummy variables & log transformations
Estimating & interpreting log demand models
Using log regressions to understand the competitive marketplace
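The log demand models listed above rest on one interpretive fact: in a regression of ln(quantity) on ln(price), the slope coefficient is the price elasticity of demand. A minimal Python sketch, assuming a single price regressor and no other controls (a realistic demand model would typically include more variables); the function name and data are hypothetical.

```python
import math
import statistics

def price_elasticity(prices, quantities):
    """Slope of ln(quantity) on ln(price), estimated by ordinary least squares."""
    log_p = [math.log(p) for p in prices]
    log_q = [math.log(q) for q in quantities]
    mp, mq = statistics.fmean(log_p), statistics.fmean(log_q)
    sxy = sum((x - mp) * (y - mq) for x, y in zip(log_p, log_q))
    sxx = sum((x - mp) ** 2 for x in log_p)
    # In a log-log model the slope is the (constant) price elasticity of demand.
    return sxy / sxx

prices = [1, 2, 4, 5, 8]
quantities = [100 * p ** -2 for p in prices]  # synthetic demand with elasticity -2
elasticity = price_elasticity(prices, quantities)
```

An estimate of -2 reads as: a 1% price increase is associated with roughly a 2% drop in quantity demanded. The log transformation is what turns a multiplicative demand curve into a straight line that ordinary regression can fit.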
Previously taught by: Harry Chernoff
Course description: This course is an introduction to the principles and techniques of operations analytics. Operations and supply management is defined as the design, operation, and improvement of the systems that create and deliver the firm's primary products and services. In this course, students will learn operations models and techniques that work with large data sources. Operations management has applied analytics for many years. Recently, however, the scale of big data has left many older models and software packages unable to run the required analyses. This course will demonstrate operations models that are currently used in industry and that incorporate big data.
Process design and analysis
Quality, value and cost
Previously taught by: Kristen Sosulski
Course description: This course is an introduction to the principles and techniques of data visualization. Visualizations are graphical depictions of data that can improve comprehension, communication, and decision making. In this course, students will learn visual representation methods and techniques that increase the understanding of complex data and models. Emphasis is placed on the identification of patterns, trends and differences from datasets across categories, space, and time. This is a hands-on course. Students will use several tools to refine their data and create visualizations. These include: R/RStudio, Python, Tableau, ThinkCell for PowerPoint, Geocodio, and Excel.
Design principles for charts and graphs
Creating data displays
Designing effective digital presentations
Visualizing categorical data
Time series data, multiple variables, and geospatial data
Previously taught by: Arun Sundararajan
Course description: Social media and mobile commerce create massive connected data sets that contain a wealth of business and social insights. This course will translate cutting-edge network science research into actionable analytics strategies for dealing with big data that is networked, text-intensive, and unstructured, with applications in viral marketing, A/B testing, and media planning.
Strength and trust in social networks
Measuring and interpreting network position
Community structure in networks
Identifying and measuring contagion in networks
Decision Under Risk
Previously taught by: Gustavo Vulcano; Ilan Lobel
Course description: Analytics is “the scientific process of transforming data into insight for making better decisions.” For example, sales data can help us understand consumer purchase behavior as well as demand patterns. These insights can be used to make sales forecasts, which in turn can inform assortment and production planning decisions. Optimization models have played a very important role in turning “insights” into “decisions” for companies in various industries: advertising, airlines, energy, investment and finance, marketing, manufacturing, retailing, etc. This course aims to broaden students' exposure to business analytics techniques. It has two main parts. The first part covers sensitivity analysis, which follows up on the linear programming topic covered in the Decision Models course and concerns how changes in a model's parameters affect the optimal solution. This part is carried out using Excel Solver. The second part, which spans most of the course, covers decision making under uncertainty. Students will learn how to build optimization models that incorporate random parameters (e.g., stochastic demand or price).
Sensitivity analysis for linear programming
Two-stage stochastic optimization with recourse
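The classic newsvendor problem is a simple instance of the two-stage stochastic optimization with recourse listed above: the order quantity is chosen first, and sales and salvage (the recourse) follow once demand is known. The Python sketch below is an illustration, not course material (the course description mentions Excel Solver); it assumes equally likely demand scenarios, a selling price above unit cost, and a salvage value below unit cost. The function name and numbers are hypothetical.

```python
import statistics

def best_order_quantity(demand_scenarios, price, cost, salvage):
    """Sample-average newsvendor: pick the order maximizing mean profit over scenarios."""
    def expected_profit(q):
        # Second stage (recourse): sell min(q, d) and salvage any leftover units.
        return statistics.fmean(
            price * min(q, d) + salvage * max(q - d, 0) - cost * q
            for d in demand_scenarios)
    # Expected profit is piecewise linear and concave in q, with kinks at the
    # scenario demands, so (given price > cost > salvage) an optimum lies at one of them.
    candidates = sorted(set(demand_scenarios))
    best_q = max(candidates, key=expected_profit)
    return best_q, expected_profit(best_q)

order, profit = best_order_quantity([80, 100, 120], price=10, cost=6, salvage=2)
```

With these inputs the best first-stage order is 100 units, the median scenario, with an average profit of 1040/3. That agrees with the critical-ratio rule (price - cost) / (price - salvage) = 0.5, which points to the 50th percentile of demand.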
Revenue Management and Pricing
Previously taught by: Rene Caldentey; Gustavo Vulcano
Course description: Revenue Management and Pricing (RMP) focuses on how firms should manage their pricing and product availability policies across different selling channels in order to maximize performance and profitability. One of the best-known applications of RMP is yield management, whereby airlines, hotels, and other companies seek to maximize operating contribution by dynamically managing capacity over time. Building on a combination of lectures and case studies, the course develops a set of methodologies that students can use to identify and develop opportunities for revenue optimization in different business contexts, including transportation and hospitality, retail, media and entertainment, financial services, health care, manufacturing, and others. The course places particular emphasis on the quantitative models needed to tackle a number of important business problems, including capacity allocation, markdown management, dynamic pricing for e-commerce, customized pricing, and demand forecasting under market uncertainty.
Marginal value of capacity
Network revenue management
Pricing policies in action
Demand forecasting and data analysis
Data Privacy and Ethics
Previously taught by: Solon Barocas; Aaron Martin; Michael Veale
Course description: There is a growing sense of urgency around the ethics of analytics. Harvard Business Review, for instance, declared that “oversight for algorithms” and “data privacy” would be among the top trends that business professionals could not ignore in 2016. Our class will tackle these topics head-on. Together, we will explore what it means to use analytics ethically and how to think about its ethical implications. We will approach these matters from the perspective of professionals who lead analytics-oriented teams or organizations, and whose success depends on their ability to recognize the ethical issues at stake and resolve these to the satisfaction of multiple stakeholders.
Understanding sources of unfairness in analytics
Unique challenges that analytics poses for privacy
Policy responses and proposals
Conducting experiments ethically
Strategy, Change and Analytics
Previously taught by: JP Eggers
Course description: This course focuses on significant strategic decisions—such as the introduction of new products or the acquisition of another firm—and explores how data-driven and analytical approaches can be used to inform these decisions from a senior management perspective. A case-based approach allows us to discuss details of significant strategic decisions. We will cover some core aspects of business strategy, including external analysis, competitor analysis, and opportunity analysis. We will also look more deeply at different aspects of the decision-making process within organizations, both to understand the process and to think about implementation. The goal is to understand the role of analytics and analytical approaches in the broader organization.
Value creation and capture
Firm positioning versus competitors
Ratio analysis in strategy
Flexibility and commitment in strategy
Leading organizational change; causality and interpretation of analytical results
Modern Artificial Intelligence
Previously taught by: Alex Tuzhilin; Xi Chen
Course description: The purpose of this course is to give students a systematic introduction to recent developments in AI through coverage of fundamental AI concepts, practical business applications, and hands-on experience with modern AI frameworks, such as Weka.
AI's fundamental concepts and methods
Deep Learning frameworks
Learning how to apply AI-based methods to solve practical business problems
Understanding the future of AI technologies over the next few years
The Capstone project, which students work on throughout the year, is presented at the culmination of the program. This integrative exercise gives students an opportunity to review and interpret data through statistical and operational analysis with the use of predictive models and the application of optimization techniques. The result is a unified and practical case presentation on a topic of the group's choosing. This is a team-based project with approximately 4-6 students per group. The integrative projects should not take the form of formal dissertations or narrative papers. Rather, they should take the form of “reports to management,” emphasizing substance over length and the forest over the trees. Where possible, they should be action-oriented and framed in terms of business policy and competitive strategy. Given this format, they should be easily convertible into PowerPoint presentations.
For more information on past projects, visit our Capstone page.