Applying big data to real-world applications

I recently had the opportunity to meet with Seth DeLand, a data analytics product manager for The MathWorks. Here are the results of that conversation.

 

What are the industries where you're seeing a strong desire for simulation and design tools? And what can be done to help designers in this space?

Interest in simulation and design tools spans multiple industries as companies look to make sense of, and put to use, the vast amount of data now available. Engineers are looking for an ecosystem of tools to help them aggregate data from a variety of sources, analyze that data, and then share the results. These engineers exist in every industry sector, from companies in the automotive and aerospace industries building the next generation of vehicles and aircraft, to medical device companies looking to improve quality of life.

For example, BuildingIQ is using MATLAB to build a hosted system that optimizes a building's HVAC system. They bring in data from external sources such as a temperature forecast and an electricity price forecast, and then use this data to minimize the building's energy consumption. With this approach, they're reducing the energy used by HVAC systems by 10 percent to 25 percent.

I've also noticed interest from the automotive industry. They're collecting large amounts of data from real-world driving situations (think millions of miles of driving), recording things such as video, radar, and other signals on the vehicles' controller area network (CAN bus). This data is used to measure important metrics such as fuel economy and performance at the fleet level. Engineering teams are using this data to design and test systems based on real-world situations. This data is also important to the development of Advanced Driver Assistance Systems (ADAS). One of the big challenges with ADAS is validating that the algorithms perform as expected over a testing data set that is often several terabytes in size. Distributing this validation step over several machines is crucial, as it enables teams to quickly try out different architectures and approaches for their ADAS designs. Customers are trying to scale validation, using both MATLAB Distributed Computing Server and MATLAB integration with Hadoop.

How can industries address the shortage of data scientists?

Rather than hire a data scientist, who may have the technical skills but lack domain expertise, companies want their existing domain experts working with big data. These experts have a good understanding of how the systems they work with behave, but might not be familiar with the tools and techniques of the data scientist. Companies are looking for tools and training materials that lower the barrier to entry and help employees ramp up in this new area.

How are old-school brick-and-mortar companies reaping value from big data assets? Is that different from online retailers?

While big data has its roots in the large software and Internet companies that pioneered the technology, it has matured to the point that it's branching out into a wide array of applications. Among brick-and-mortar companies, retailers are using analytics to optimize how they price and sell perishable foods. By analyzing historical sales data, these companies have figured out how to allocate products throughout their supply chain in a way that meets consumer demand while reducing costs.

We're on the cusp of Big Data 3.0, where in addition to text, we can draw insights from video, audio, and environmental phenomena, such as temperature and humidity.

The rise of big data has resulted in a variety of new data sources being made available. It makes complete sense technically to incorporate a weather forecast into a model of electricity use (Figure 1), but until recently, working out the implementation details was quite difficult.

Figure 1: Pictured is a web app that's used to forecast the load on the electrical grid. To generate this forecast, a neural network is used with historical load and weather data, and the 24-hour-ahead weather forecast.

One reason people use our tools is the ability to bring in data regardless of what form it's in. For example, with MATLAB R2014b's "webread" function, users can access data from online sources via RESTful APIs, so they could bring that weather forecast directly from the web into MATLAB. For many of these data types, specific toolboxes provide a library of functions as well as point-and-click apps for popular workflows. Image data can be processed with the Image Processing Toolbox, and audio data with a corresponding signal processing toolbox.

As data becomes bigger and faster, users are looking to adopt stream-processing methods for these data types. With the Computer Vision System Toolbox and related toolboxes, users are designing streaming algorithms that can keep up with this data in a memory-efficient way.
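To make the idea of memory-efficient stream processing concrete, here is a minimal sketch in Python (not MATLAB, and not the API of any MathWorks toolbox). It keeps a running mean and variance using Welford's method, so only three numbers are stored no matter how long the stream runs. The class and function names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Mean and variance of a stream, updated one sample at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; returns 0.0 when fewer than two samples have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for sample in stream:  # only the current sample is held in memory
        stats.update(sample)
    return stats
```

Because `summarize` accepts any iterable, it can consume a generator that reads samples from a file or sensor as they arrive, rather than a list that has already been loaded in full.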

What implications does the increasing number of data sources have for data centers?

There's been a big move lately to data storage systems such as Hadoop, which are based on commodity hardware rather than expensive customized hardware. At a high level, these systems provide two things: a way to store data distributed across machines and a way to compute on the distributed data (without having to transfer large amounts of that data over the network, which is slow). To support this move, MathWorks provides a connection to data stored on Hadoop.
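The "compute where the data lives" idea can be sketched in a few lines of Python. This is a single-process illustration of the map-and-reduce pattern, not Hadoop's or MATLAB's actual API: each partition is reduced locally to a small per-key summary, and only those summaries would need to cross the network. The record layout (sensor id, reading) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float]               # (sensor id, reading); hypothetical schema
Partial = Dict[str, Tuple[float, int]]   # sensor id -> (sum, count)


def map_partition(records: Iterable[Record]) -> Partial:
    """Would run on the machine holding the partition: reduce it to per-key (sum, count)."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, value in records:
        total, n = acc[key]
        acc[key] = (total + value, n + 1)
    return dict(acc)


def reduce_partials(partials: Iterable[Partial]) -> Dict[str, float]:
    """Merge the small per-partition summaries into a mean per key."""
    merged = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for key, (total, n) in partial.items():
            t, c = merged[key]
            merged[key] = (t + total, c + n)
    return {key: total / n for key, (total, n) in merged.items()}
```

For example, calling `reduce_partials(map_partition(p) for p in partitions)` on a list of partitions yields one mean per sensor, while each partition contributes only one (sum, count) pair per key instead of its raw records.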

A lot of this data will come from Internet of Things (IoT) devices, where you have a large number of embedded devices (things) connected to the Internet. These devices can communicate with people and other things to gain insights and provide new product experiences.

Lately, designers want to move more of the processing down to the edge nodes (as close to the sensors as possible). This has the benefit of shrinking the amount of data that's transferred over the network, which reduces transmission costs and the devices' power consumption.
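As a rough illustration of why edge processing shrinks network traffic, the Python sketch below summarizes a window of sensor readings on the device and compares the size of that summary with the size of the raw window. The summary fields, the JSON encoding, and the function names are assumptions made for the example; a real device would pick statistics and a wire format suited to its application.

```python
import json
from statistics import mean
from typing import Dict, List


def summarize_window(readings: List[float]) -> Dict[str, float]:
    """Reduce a non-empty window of raw readings to a few summary statistics."""
    return {
        "n": len(readings),
        "min": min(readings),
        "mean": round(mean(readings), 3),
        "max": max(readings),
    }


def payload_bytes(obj) -> int:
    """Size of the object when serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))
```

Comparing `payload_bytes(summarize_window(readings))` with `payload_bytes(readings)` for a window of a few hundred readings shows the summary is much smaller, at the cost of discarding the individual samples.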

As more devices come online with IoT, there are going to be opportunities for people to use this data to develop products and services that don't exist today. It should be an industry-wide goal to make the engineers working on these systems as productive as possible, so they can quickly bring new ideas to life and get products to market faster.