# What You’ll Learn in Tufts' Data Science Master’s Program—And Why

Are you a good candidate for a Master of Science in Data Science (MSDS) program? If you're interested in changing the world through the power of data and you're not squeamish about statistics, computer science, mathematics, and analytical reasoning, the answer is yes.

You may be thinking, "Yes, but do I need a master's degree to succeed in data science?" Most data scientists think so. According to a Burtch Works study, 90 percent of all data scientists have advanced to either the master's (49 percent) or PhD (41 percent) level. While you may be able to get a data science job with just a bachelor's degree, it likely won't be a top job, and you'll be competing for that job against candidates with more advanced academic credentials.

**Employers seek data scientists with advanced degrees for a reason:** it takes time, expertise, and training to develop the skills necessary to excel in this field. A master's in data science program teaches you how to:

- Formulate a problem in terms of data science tools and practices
- Collect and clean data to solve a problem
- Apply data analysis techniques to get an accurate and comprehensive answer
- Conceive of and express that answer in terms of risks and potential errors—whether those errors be human or machine

## Who typically earns an MSDS?

Most students in Tufts’ online MSDS program have a bachelor's degree in a STEM field related to data science, such as:

- Business analytics
- Computer engineering
- Computer science
- Data science
- Engineering
- Health informatics
- Mathematics
- Physics
- Statistics

Additionally, MSDS candidates often have some professional experience in computing. Job titles most frequently listed by MSDS applicants include:

- Computer and information research scientist
- Computer and information systems manager
- Computer network architect
- Database administrator
- Information security analyst
- Network architect
- Programmer
- Software developer
- Systems analyst
- Web developer

Some students seek to advance in their current fields; others hope to broaden their skill sets to develop new opportunities in emerging disciplines. Some plan to continue to a PhD program after completing their master's degree.

They share an ambition to advance and succeed in data science professions. They are problem solvers capable of working independently on complex problems, confident of their capability to manage, tame, and divine meaning from massive amounts of data.

**Tufts’ online MSDS program favors candidates with drive, enthusiasm, and exceptional analytical skills.** The School of Engineering admissions committee reviews applications holistically when considering applicants to the online MS in Data Science, weighing applicants’ professional experience and academic history.

## What you’ll learn in Tufts’ MSDS program—and why it’s important

The interdisciplinary data science curriculum at Tufts' online MSDS program trains agile data scientists who can apply data analysis methodologies and problem-solving skills across various enterprises. The degree opens career opportunities in business, education, healthcare and health management, public policy and government, nonprofit management, and any other endeavor that benefits from big data analysis—which is to say, almost any modern undertaking.

**The rigorous curriculum is delivered 100 percent online, and students can complete their degree in under two years.** Course work facilitates extensive contact between researchers and students, exposing them to the real-world problems and considerable obstacles they’ll face as leaders of their generation of data science professionals. Faculty members include pioneers in experimental data science practices and applications, affording students invaluable exposure to tomorrow's standard practices.

What, specifically, will you learn at Tufts' online MSDS program? We've summarized the curriculum below.

### Principles of Data Science in Python

According to the Python Software Foundation, Python is “an interpreted, object-oriented, high-level programming language with dynamic semantics.” Dynamic typing and binding, coupled with Python's built-in data structures, make it an optimal glue language for connecting existing data components. Python's straightforward syntax, emphasizing readability and reducing program maintenance cost, explains why it is the programming language of choice for so many data scientists.

Python has a variety of applications across the field of data science, including:

- Web and internet development using frameworks such as Django and Pyramid
- Micro-frameworks such as Flask and Bottle
- Advanced content management systems such as Plone and django CMS

Python’s standard library supports internet data formats such as HTML, XML, and JSON, as well as email processing; it also supports FTP, IMAP, and other internet protocols.
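
To make the standard-library support mentioned above concrete, here is a minimal sketch that reads JSON, XML, and HTML using only modules that ship with Python. The sample payloads and the `LinkCollector` helper are hypothetical and invented for illustration; they are not taken from any Tufts course material.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample payloads, invented for this illustration.
json_text = '{"city": "Medford", "readings": [3, 5, 8]}'
xml_text = "<station><name>Medford</name><reading>8</reading></station>"
html_text = "<p>See <a href='/a'>A</a> and <a href='/b'>B</a></p>"

# JSON: parse into ordinary Python dicts and lists.
record = json.loads(json_text)
total = sum(record["readings"])

# XML: parse into an element tree and read the text of one child element.
root = ET.fromstring(xml_text)
station_name = root.findtext("name")


class LinkCollector(HTMLParser):
    """Hypothetical helper that collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


# HTML: feed the markup to the parser and inspect what it collected.
collector = LinkCollector()
collector.feed(html_text)

print(total, station_name, collector.links)
```

No third-party packages are needed here, which is part of why Python works well as a glue language between data sources.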

Python offers a broad range of specialized applications. SciPy, Python's collection of packages for mathematics, science, and engineering, is used in scientific and numeric computing. Pandas is a data analysis and modeling library, while IPython is a robust interactive shell that supports visualizations and parallel computing. Python is also used to teach programming, as a support language for software developers, and to build ERP and e-commerce systems.

In the Tufts online data science graduate program, you’ll learn the foundations of Python programming for data analysis, including commonly used Python data structures and algorithms, Python program design, the use and creation of software libraries, and coding standards and practices. In labs, students learn to use IPython and the Jupyter data analysis workflow framework. Tufts’ online MSDS Python course provides numerous examples drawn from data preparation and transformation, statistical data analysis, machine learning, deep learning, and deep data science, including recommendation systems and trend analysis.

### Probabilistic Systems Analysis

How do you model randomness and uncertainty? Probabilistic systems and network analysis address this complex challenge to provide essential data for decision-making processes on the operational level. Load flow analysis, reliability analyses, voltage sag assessment, and generic scenario analysis all use probabilistic systems analysis—and so do risk analysis engineers, safety analysis engineers, licensing engineers, reliability engineers, and simulation engineers. Healthcare analysts use probabilistic systems analysis to capture diagnostic and therapeutic errors and to optimize care delivery.

In this course, you'll develop foundational analytical methodologies to model and analyze stochastic phenomena. You'll also learn to apply these tools to various problems arising in engineering, manufacturing, operations research, and academia. The first portion of the class covers introductory probability theory, including sample spaces and probability, discrete and continuous random variables, expectations and conditional expectations, conditional probability, and derived distributions. The rest of the class focuses on statistical analysis methods like hypothesis testing, confidence intervals, regression modeling, and nonparametric methods.
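
For a feel of what conditional probability and conditional expectation look like in practice, consider this small sketch. It assumes a single roll of a fair six-sided die (a hypothetical setup chosen for simplicity) and uses exact fractions so that no rounding is involved. The helper names `probability` and `expectation` are illustrative only.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
prob = {face: Fraction(1, 6) for face in range(1, 7)}


def probability(event):
    """P(event), where event is a predicate on outcomes."""
    return sum(p for x, p in prob.items() if event(x))


def expectation(value, given=lambda x: True):
    """E[value(X) | given(X)], computed by renormalising over the condition."""
    norm = probability(given)
    return sum(value(x) * p for x, p in prob.items() if given(x)) / norm


def is_even(x):
    return x % 2 == 0


def at_least_four(x):
    return x >= 4


# Conditional probability: P(even | roll >= 4) = P(even and >= 4) / P(>= 4).
p_even_given_high = (
    probability(lambda x: is_even(x) and at_least_four(x))
    / probability(at_least_four)
)

mean_roll = expectation(lambda x: x)                       # E[X]
mean_given_even = expectation(lambda x: x, given=is_even)  # E[X | X even]

print(p_even_given_high, mean_roll, mean_given_even)
```

Of the three rolls that are at least four (4, 5, 6), two are even, so the conditional probability is 2/3; the unconditional mean is 7/2, and conditioning on an even roll raises it to 4.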

### Introduction to Machine Learning

Machine learning algorithms detect patterns in enormous quantities of data and use those patterns to power recommendation systems like those used by Netflix and Spotify, search engines like Google, social media feeds like Facebook and Twitter, and voice assistants like Siri.

Because machine learning has so many applications, careers in the field are among the most popular career choices for data science and computer science professionals. Machine learning engineers, natural language processing scientists, and business intelligence developers all use machine learning in performing their core roles. Machine learning engineers examine tasks that humans perform and figure out how to automate them.

Introduction to Machine Learning provides an overview of how computers learn from data and experience to produce problem-solving decisions. Tufts MSDS students gain extensive knowledge of numerous machine learning components, including supervised learning, unsupervised learning, reinforcement learning, and knowledge extraction from massive databases with science, engineering, and medical applications. Students also learn how to examine problems and determine whether those problems are appropriate for machine learning solutions and, subsequently, how to solve those problems using applicable techniques.
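
As one simple illustration of supervised learning, the sketch below fits a straight line to labelled examples with ordinary least squares and then predicts an unseen case. The data (hours studied versus score) is hypothetical and constructed to follow `y = 2x + 1` exactly; real data would be noisy, and the course itself may use different methods and libraries.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that lies exactly on y = 2x + 1.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict the label of an unseen input.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

The "learning" here is the estimation of two parameters from labelled examples; more flexible models follow the same fit-then-predict pattern.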

### Statistics

Statistics is the science of collecting, analyzing, presenting, and interpreting data. Statisticians work in every industry that requires collection and analysis of large quantities of data. As our world becomes more data-driven, the ability to translate data into meaningful insights is a skill that most employers covet. Statistics are used by everyone, from financial risk analysts, investment analysts, and operational researchers to chartered accountants, economists, and insurance underwriters.

Course work emphasizes theory, with a secondary focus on computations. Students gain extensive experience analyzing problems of estimating, predicting, and inferring, given limited data. You’ll learn about parameter estimation, the convergence of random variables, properties of estimators, statistical tests and confidence intervals, and non-parametric statistics.
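
To illustrate parameter estimation and confidence intervals, here is a minimal sketch that estimates a population mean from a small sample and builds an approximate interval around it. The sample values are hypothetical, and the interval uses a normal approximation; for a sample this small, a t-distribution would usually be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements, invented for this illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]


def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(data)
    center = mean(data)                # point estimate of the mean
    std_err = stdev(data) / sqrt(n)    # estimated standard error of that estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return center - z * std_err, center + z * std_err


low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, confidence=0.99)
print((low, high), (low99, high99))
```

Asking for more confidence widens the interval: the 99 percent interval always contains the 95 percent one.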

### Reinforcement Learning

Reinforcement learning is a branch of machine learning. It involves training machine learning models to execute a sequence of decisions. Real-world applications include autonomous cars, learning-based robots that can perform tasks too dangerous for humans, text summarization, question answering, and machine translation.

Reinforcement Learning focuses on agents responsible for learning, planning, and acting in complex, non-deterministic environments. Course work covers the foundational theory of and approaches to reinforcement learning and commonly used software libraries and packages used to implement and test reinforcement learning algorithms. This course is a graduate seminar with assigned readings and discussions; the course content is guided in part by the students' interests. Options include the advanced study of transfer learning and deep reinforcement learning.
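
To give a sense of what an agent that learns by acting looks like in code, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm. The five-state chain environment, the reward of 1 at the goal, and the hyperparameters are all assumptions made for illustration; the course may use different algorithms, environments, and libraries.

```python
import random

N_STATES = 5                  # states 0..4; state 4 is the terminal goal
ACTIONS = ("left", "right")


def step(state, action):
    """Hypothetical chain environment: move left or right, reward 1 at the goal."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=1000, seed=0):
    """Learn action values from experience gathered by a random behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)          # explore uniformly at random
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


q_values = q_learning()
policy = greedy_policy(q_values)
print(policy)
```

Even though the agent behaves randomly while learning, the values it learns imply a policy of always moving right toward the goal. Learning about one policy while following another is what makes Q-learning an off-policy method.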

### Stochastic Processes, Detection, and Estimation

A stochastic process is a collection of random variables defined on a common probability space, taking values in a common set S (the state space) and indexed by a set T (thought of as time, either discrete or continuous). Mathematicians, physicists, and engineers are among those who use stochastic processes to model stock prices, drive option pricing theory, and solve differential geometry problems.
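
One of the simplest concrete examples is a discrete-time Markov chain. In the sketch below, the state space S has two states, the index set T is the step count 0, 1, 2, and so on, and each random variable is the state at a given step. The weather labels and transition probabilities are hypothetical values chosen for illustration.

```python
import random

STATES = ("sunny", "rainy")   # the state space S

# Hypothetical transition probabilities P[current][next]; each row sums to 1.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def sample_path(start, n_steps, seed=0):
    """Draw one realisation X_0, X_1, ..., X_n of the process."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        weights = [P[current][s] for s in STATES]
        path.append(rng.choices(STATES, weights=weights)[0])
    return path


def distribution_after(start, n_steps):
    """Exact distribution of X_n, obtained by repeatedly applying P."""
    dist = {s: 1.0 if s == start else 0.0 for s in STATES}
    for _ in range(n_steps):
        dist = {t: sum(dist[s] * P[s][t] for s in STATES) for t in STATES}
    return dist


path = sample_path("sunny", 10)
long_run = distribution_after("rainy", 200)
print(path)
print(long_run)
```

A single sample path is random, but the distribution of the state settles down: for these transition probabilities the chain spends 5/6 of its time in the first state in the long run, regardless of where it starts.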

Students develop essential analytical tools for modeling and analyzing stochastic (random) phenomena. The course explores how to apply these tools to various engineering, manufacturing, and operations research problems. The course covers introductory probability theory, including sample spaces and probability, discrete and continuous random variables, conditional probability, expectations and conditional expectations, and derived distributions. Students also investigate statistical analysis methods, including hypothesis testing, confidence intervals, and nonparametric methods.

### Aspects of Data Analysis

Data analytics applies statistics, business intelligence, and information science principles to collect, arrange, and examine data. Data analysts organize their discoveries into dashboards, reports, and visualizations for stakeholders; they often make actionable recommendations based on those findings. Data analysis professionals are generally domain experts with solid database querying skills in SQL and statistical programming languages such as R and Python.

Aspects of Data Analysis covers mathematical data science, emphasizing theory. Students learn about essential applications of data analytics on a small and large scale and program standard algorithms. They learn foundational and crucial topics, including principal component analysis, algorithms in numerical linear algebra, unsupervised clustering and density methods, nearest neighbor classifiers, supervised methods (such as support vector machines and neural networks), and spectral graph theory, with applications in image processing, network analysis, and other areas.
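
Of the topics listed above, the nearest neighbor classifier is the easiest to show in a few lines. The sketch below is a minimal k-nearest-neighbors implementation using Euclidean distance and a majority vote. The labelled points and the `knn_predict` name are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled training points: (feature vector, class label).
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"), ((4.3, 3.9), "B"), ((3.8, 4.1), "B"),
]


def knn_predict(point, data, k=3):
    """Classify point by majority vote among its k nearest training points."""
    neighbours = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict((1.1, 1.0), training))
print(knn_predict((4.1, 4.0), training))
```

There is no training step at all: the classifier simply stores the data and defers all work to prediction time, which is why efficient neighbor search matters at larger scales.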

### Big Data

The term “big data” describes the large volume of structured and unstructured data that floods an organization on a day-to-day basis. Most data science professionals—data scientists, data engineers, data analysts, security engineers, database managers, data architects, and technical recruiters—use big data. These professionals collect large sets of structured and unstructured data, clean and validate that data, and apply models and algorithms to data mining.

The Big Data course combines theory, algorithms, and practical, hands-on work. It covers collecting, processing, analyzing, and acting on data with unprecedented speed, scale, and complexity. Students dive into the most recent big data techniques and infrastructures, including parallel and distributed database systems, map-reduce infrastructures, scalable platforms for complex data types, stream processing systems, and cloud-based computing. Students learn to apply common statistical and machine learning techniques to large data sets.
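
The map-reduce idea mentioned above can be sketched on a single machine. The example below counts words in three phases (map, shuffle, reduce). It is a toy, in-memory illustration with hypothetical input documents; real map-reduce infrastructures distribute these phases across many machines and handle failures, which this sketch does not attempt.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input documents, invented for this illustration.
documents = ["big data is big", "data moves fast", "big systems scale"]


def map_phase(doc):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel, which is what lets the pattern scale.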

### Artificial Intelligence

Artificial Intelligence (AI) technologies rely on deep learning and natural language processing to train machines to learn from experience, adjust to new inputs, and perform human-like tasks. Chess-playing computers and self-driving cars can learn to accomplish specific tasks by processing large datasets and recognizing patterns in those datasets.

The Artificial Intelligence course focuses on history, theory, and computational methods. Students review foundational concepts, including representation of knowledge and computational methods for reasoning. Other potential application areas of study include expert systems, robotics, computer vision, natural language understanding, and planning.
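
As a small taste of knowledge representation and reasoning, the sketch below stores knowledge as if-then rules and derives new facts by forward chaining, a technique associated with classic expert systems. The facts, rules, and the `forward_chain` name are hypothetical and are not drawn from the course syllabus.

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base: (set of premises, conclusion).
rules = [
    (frozenset({"bird"}), "has_wings"),
    (frozenset({"has_wings", "healthy"}), "can_fly"),
]

derived = forward_chain({"bird", "healthy"}, rules)
print(sorted(derived))
```

Starting from two facts, the first rule adds `has_wings`, which in turn lets the second rule conclude `can_fly`. Without the `healthy` fact, the second rule never fires.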

## Should you get a master's in data science?

Tufts' MSDS program immerses students in data analysis principles, methods, and practices. It cultivates analytics skills and develops the ability to guide exhaustive, data-driven decision-making processes and methodologies. It won't be easy to accomplish this on your own, and it will be even more difficult to convince employers you have the same skill set as someone who holds an advanced degree in the field. That's why most data scientists opt for graduate study.