We live in a data-driven age. The digitization of our lives generates vast amounts of information, commonly known as big data. One study forecasts global data volume will grow to 175 zettabytes by 2025. If we copied all that data onto DVDs, the number of discs needed would circle the Earth 222 times. Much of this information enters the datasphere when we shop online, peruse social media, or use smart devices connected to the Internet of Things (IoT). New scientific instruments also contribute heaps of information to the global datasphere. Researchers believe there will be almost 350 petabytes of climate data by 2030, collected from weather instruments, satellites, radar, and climate models.
The sheer volume and variety of big data are overwhelming—even the most skilled spreadsheet wranglers can’t derive insights from the modern datasphere. But data scientists can tackle big data with advanced mathematical skills and cutting-edge technologies, such as machine learning and artificial intelligence. They reveal patterns in otherwise indecipherable data sets and provide actionable information to decision-makers in business, nonprofit organizations, and government.
Demand for data scientists is significant. The U.S. Bureau of Labor Statistics projects that jobs in this field will increase by 36 percent by 2031. However, data science is a challenging field, and data scientists need specialized training. They develop their expertise in programs such as Tufts University School of Engineering’s online Master of Science in Data Science (MSDS).
Data scientists can use their skills to influence change in almost every field, and data science work in medicine, business, agriculture, and transportation already makes headlines. As a consequence, leaders across all sectors are paying close attention and considering the future of data science.
Data Science Facilitates Efficiency and Effectiveness in Healthcare
One of the most striking applications of data science in healthcare is medical imaging. X-ray machines, CT scanners, and MRI machines generate irregular imagery, and doctors spend years honing their skills to interpret it consistently and accurately. Historically, machine-driven analysis of these types of images was impossible. Machines thrive on structured data, meaning data that fits into highly organized spreadsheets, but images are unstructured data and not well organized for computer consumption.
Data scientists develop algorithms to analyze vast amounts of this unstructured information, such as images, news feeds, or social media posts, and deliver concrete insights for people to act on. In medicine, machine learning algorithms assist doctors in evaluating medical images and identifying potential diagnostic indicators for further investigation, such as tumors and arterial plaque. Some modern algorithms can perform as well as the best doctors in diagnostic settings. Karen Panetta, dean of graduate education and professor of electrical and computer engineering at the Tufts School of Engineering, knows the impact machine learning can have on medical imaging. Recently, her work helped diagnose pneumonia in COVID-19 patients with 99 percent accuracy. “We had already developed all these tools for image processing, machine learning, and AI methods for cancer, so COVID-19 was just a more timely application of the same technology,” Panetta said. “We’re just tuning the software for a different use case.”
Data science also assists healthcare administrators. The Portuguese company Glintt uses artificial intelligence and predictive analytics—two tools in the data science trade—to streamline hospital bed assignments. The company’s algorithm predicts the length of each patient’s admission, and hospital administrators use this information to assign beds. With the help of accurate predictions, they avoid turning away patients or delaying treatment. Researchers utilized a similar machine learning algorithm to predict hospital stays for COVID-19 patients when bed space was in high demand.
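To illustrate how length-of-stay predictions could feed into bed planning, here is a minimal Python sketch. It does not reproduce Glintt's algorithm, which is not described here. It assumes a model has already produced a predicted stay for each patient, and the function name, the patient list, and the four-bed ward are all hypothetical.

```python
def free_beds_by_day(capacity, patients, horizon_days):
    """Project how many beds are free on each day of the planning horizon.

    patients is a list of (admit_day, predicted_stay_days) pairs. The stay
    values are assumed to come from a separate predictive model. A patient
    occupies a bed on days admit_day through admit_day + stay - 1.
    """
    free = []
    for day in range(horizon_days):
        occupied = sum(
            1 for admit_day, stay in patients
            if admit_day <= day < admit_day + stay
        )
        free.append(capacity - occupied)
    return free


# Hypothetical ward with 4 beds and four current or scheduled patients.
patients = [(0, 3), (0, 1), (1, 4), (2, 2)]
projection = free_beds_by_day(capacity=4, patients=patients, horizon_days=5)
print(projection)  # [2, 2, 1, 2, 3]
```

In this made-up example, day 2 is the tightest day, with one free bed, so an administrator would know in advance when a new admission is hardest to place.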
Data Science Accelerates the Development of New Pharmaceuticals
Traditional drug development is a trial-and-error process where human researchers test many molecular compounds to find successful treatments. The process can lead down many false avenues before making progress, and some landmark medical advances, such as the discovery of penicillin, happened by accident.
Artificial intelligence and data science offer a new way forward for drug developers. AI can examine vast pharmaceutical data sets and “learn patterns that might be too subtle or complex for humans to recognize,” identifying promising new molecules for further study. AI-assisted drug development has already yielded success stories. The German biotechnology company Evotec began phase 1 clinical trials of an anticancer molecule in 2021. The traditional discovery process likely would have taken four or five years to find this molecule, but the company’s AI-driven work trimmed initial discovery to eight months.
Researchers at Tufts are also contributing to these efforts. In a paper published in Cell Reports Medicine, researchers created machine learning tools to help drug developers identify the best therapeutic combinations. Under ordinary conditions, pharmacists may have hundreds or thousands of promising four-drug combinations available, resulting in lengthy trials to find the best combinations. Data science and machine learning can speed up that process. “Our framework creates accurate predictions of how effective treatments will be when we move from testing in a lab to testing in mouse models, which is an important step in choosing which treatments progress to human clinical trials,” says Bree Aldridge, associate professor of biomedical engineering at the School of Engineering.
AI’s contributions to pharmaceutical breakthroughs aren’t limited to discovery. Consulting firm Deloitte notes that, in addition to discovery, clinical drug trials can take between five and seven years, which means new therapeutics can take over a decade to reach patients. Data science speeds up this process. Machine learning can illuminate obscure relationships within clinical data, accelerating the data analysis stage. Artificial intelligence can also analyze previous studies and offer cost- and efficiency-improving insights as researchers develop the next generation of clinical trials.
Human biology poses another challenge for pharmaceutical researchers. Therapeutics influence the body’s functioning in various ways, but many of the underlying mechanisms, such as the three-dimensional structures of the proteins encoded in the genome, have remained obscure for years. Until recently, determining a single protein structure could take “months to years of painstaking effort.”
New advancements in data science promise to speed up the process dramatically. Researchers at Alphabet’s DeepMind developed AlphaFold, a machine learning system that predicts a protein’s structure from its amino acid sequence. In 2021 they released AlphaFold alongside roughly 350,000 predicted protein structures, including nearly the entire human proteome. In 2022 that number jumped to 200 million, covering nearly every protein known to science across plants, animals, bacteria, and fungi.
AlphaFold’s predictions aren’t always perfect, but they still offer incredible new insight for scientists. “We hope that AlphaFold, and computational approaches that apply its techniques for other biophysical problems, will become essential tools of modern biology,” said a DeepMind researcher.
Data Science Helps Stabilize the Supply Chain
Efficient supply chain operation relies heavily on accurate predictions of demand for products. Perishable goods such as fresh fruits and vegetables rot if stores receive excess supply at a given time. Even non-perishables such as T-shirts may go to waste if there is no storage space for them in stores. Businesses employ traditional data analytics to estimate demand and order goods, but supply chains are highly complex. “In typical [supply chain management] problems, it is assumed that capacity, demand, and cost are known parameters,” researchers noted in the Journal of Big Data. However, uncertainties in demand for goods, transportation mishaps, organizational risks, and lead times complicate supply chain analysis and drive demand for more advanced data tools.
It’s not enough for businesses to make their best guess at how much of a particular good to order. Only advanced data science techniques can render accurate demand forecasts. The researchers write, “big data analytics (BDA) has emerged as a means of arriving at more precise predictions that better reflect customer needs.” Data science’s forecasting advantage minimizes product loss, reduces unnecessary spending, and ensures businesses can meet customer demand without creating waste.
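As a rough illustration of what a demand forecast involves, the Python sketch below applies simple exponential smoothing, one of the most basic forecasting techniques, to a made-up series of weekly sales. It is not the method described in the Journal of Big Data paper. The data, the function name, and the smoothing factor are assumptions chosen for illustration only.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """Return a one-step-ahead demand forecast.

    Each new observation updates a running level:
        level = alpha * actual + (1 - alpha) * previous_level
    A higher alpha reacts faster to recent demand; a lower alpha smooths more.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = history[0]  # seed the level with the first observation
    for actual in history[1:]:
        level = alpha * actual + (1 - alpha) * level
    return level


# Hypothetical weekly sales, in crates of fresh produce.
weekly_units_sold = [100, 120, 110, 130]
forecast = exponential_smoothing_forecast(weekly_units_sold, alpha=0.5)
print(forecast)  # 120.0
```

A store could order against a forecast like this rather than simply repeating last week's order. The big data approaches the researchers describe go further by drawing on many more signals than past sales alone.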
Data Science Detects Online Fraud
Identity theft and fraud are significant threats to both consumers and businesses. American consumers lost $56 billion to identity theft in 2020, and fraud topped the list of complaints to the FTC in 2021 with 2.8 million reports. In the vast sea of consumer information, human analysts can’t pinpoint all potential instances of fraud on their own. Fortunately, machine learning and artificial intelligence offer new tools to combat fraudulent purchases and catch criminals.
Data scientists are training machines to review credit card transactions and other avenues for fraud. Their algorithms check for suspicious behavior faster than any human analyst. Machines scan millions of transactions and spot unusual markers, such as online credit card orders suddenly sent to brand-new addresses. Researchers are still working to find the most effective fraud detection algorithms and refine existing ones, but many promising techniques are already in use. An artificial neural network (ANN) is a machine learning method loosely modeled on the way the human brain processes information. In reported results, it detects nearly 94 percent of fraudulent transactions and mislabels legitimate transactions as fraud only six percent of the time. Another widely used method, K-Nearest Neighbors (KNN), classifies each transaction by comparing it with the most similar past transactions and detects over 97 percent of fraudulent ones.
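To make the detection-rate and mislabeling figures above concrete, here is a minimal Python sketch of a k-nearest-neighbor classifier run on a handful of invented transactions. It then computes the detection rate (the share of fraudulent transactions flagged) and the false positive rate (the share of legitimate transactions mislabeled as fraud). The features, data, and function names are hypothetical and are not taken from the studies behind the percentages quoted. Real systems train on far larger data sets with tuned libraries.

```python
from collections import Counter
from math import dist

# Hypothetical labeled history. Each feature pair is
# (order amount in hundreds of dollars, 1.0 if shipped to a brand-new address).
# Label 1 means fraud, label 0 means legitimate.
TRAIN = [
    ((0.2, 0.0), 0), ((0.5, 0.0), 0), ((0.8, 0.0), 0), ((1.1, 1.0), 0),
    ((9.0, 1.0), 1), ((7.5, 1.0), 1), ((8.2, 1.0), 1), ((6.9, 0.0), 1),
]


def knn_predict(point, train=TRAIN, k=3):
    """Label a transaction by majority vote of its k closest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def detection_and_false_positive_rate(actual, predicted):
    """Return (detection rate, false positive rate).

    Assumes the labels include at least one fraud and one legitimate case.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # fraud caught
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # fraud missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # legitimate flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # legitimate passed
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical new transactions with their true labels.
TEST = [
    ((0.4, 0.0), 0), ((1.0, 1.0), 0), ((8.0, 1.0), 1),
    ((7.0, 0.0), 1), ((6.0, 1.0), 0),
]
actual = [label for _, label in TEST]
predicted = [knn_predict(features) for features, _ in TEST]
detection_rate, false_positive_rate = detection_and_false_positive_rate(
    actual, predicted
)
print(predicted)  # [0, 0, 1, 1, 1]
print(detection_rate, round(false_positive_rate, 2))  # 1.0 0.33
```

In this toy data, both fraudulent orders are caught, but one large legitimate order sent to a new address is flagged as well. That is the trade-off the two percentages describe.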
The speed and accuracy of these algorithms can provide significant protection for financial institutions and their customers. Banks, credit card companies, and online retailers are among the top data scientist employers—companies like Wells Fargo and Amazon hire whole teams of specialists to combat fraud.
Data Science Improves Agricultural Yields
Data science offers farmers new tools to become more efficient and effective. Remote sensors and images from satellites and drones provide a wealth of data for farmers—and data science offers tools to interpret that data and guide new agricultural practices.
A review of machine learning applications in agriculture found opportunities for data science to transform agricultural management and enhance older farming techniques. Researchers used computer vision techniques and historical data to predict yields for crops ranging from coffee beans to cherries. These accurate yield estimates help farmers allocate labor more efficiently. In one instance, researchers developed a system that analyzes images of cherry trees to estimate the number of cherries still on a tree, even when some weren’t visible in the picture. Tools such as this eliminate the need for workers to count cherries by hand and produce estimates much faster and more accurately.
Data science also contributes to weed and species detection in crops. Researchers used aerial images to detect weeds that can harm crop yields. Weed detection algorithms work with robots that destroy invasive plants, presenting an alternative to herbicides or manual labor.
Researchers have also applied machine learning models to data from cattle collars and optical sensors to detect changes in livestock health. These tools monitor herds and call attention to sick cattle, alerting human workers to problems before they become too severe. In addition, data scientists have analyzed chemical structures in cows’ milk to evaluate diets and identify which are the most productive.
Data science also offers solutions for higher-level challenges, such as water and soil management. In multiple studies, data scientists have used historical weather information to determine how much water evaporates into the atmosphere versus how much crops absorb. This information is critical in semi-arid regions.
Soil is another critical resource in agriculture, but measuring soil moisture and temperature is time-consuming and expensive. Machine learning and data science provide new techniques for estimating soil properties based on spectral imaging, weather patterns, and other data that are easier to obtain. Farmers can use these tools to assist with agricultural planning and management.
Data Science Supports Transportation Innovation
Data science is contributing to sustainability in transportation. The United Nations’ Data for Climate Action initiative challenged data scientists worldwide to find new ways of using existing data to help address, mitigate, and adapt to climate change in several areas, including transportation. The winning team used traffic data from Mexico City, the most congested city in the world, to identify ideal locations for electric vehicle charging stations. The team also recommended that the city electrify its taxi fleet, public transit buses, and all light-duty vehicles. These policies would reduce harmful emissions and help prevent over 10,000 emissions-related deaths each year.
Government agencies are also starting to utilize data science’s potential for improving public transportation. A 2022 report from the Federal Transit Administration notes that agencies in the U.S. have a glut of public transportation system data available. The report highlighted opportunities for data science to improve public transit by monitoring vehicle health and predicting maintenance needs, analyzing existing and potential community public transit demand, improving operational efficiency, and optimizing bus, light rail, and subway schedules.
Top MSDS Programs Train Data Scientists to Change the World
Data science stands ready to revolutionize multiple industries using big data and advances in cutting-edge computation and artificial intelligence. The World Economic Forum predicts a data-driven future for workers in which professionals with data skills are a prized commodity on the labor market and have titles as diverse as data mining engineer and data warehousing specialist.
To some extent, that future is already here. According to Glassdoor, data scientist was the third-best job in America in 2022. Other data-related positions, such as machine learning engineer and data engineer, also cracked the top 10. In addition to overall job satisfaction and the opportunity to do world-changing work, data scientists earn high compensation. Top employers will pay over $200,000 for experienced data scientists with the right credentials.
Tufts’ online Master of Science in Data Science program teaches students how to harness the transformative power of data science and prepares them for impactful careers that rely on data skills. Tufts’ industry-leading faculty work on research projects with huge implications in their fields. They include Associate Teaching Professor Martin Allen, who researches machine learning applications for biological systems, and Assistant Professor Jivko Sinapov, who studies robotics, computational perception, and human-robot interaction.
At Tufts, students work closely with faculty in live online classes, learning skills directly relevant to their future data science careers. The MSDS curriculum covers machine learning, artificial intelligence, advanced statistics, and big data, all in-demand skills that prepare students for world-changing careers in present-day data science and the future of the field.