High-performance storage for the engine business
To make machine data more useful, data analysts must put gigantic amounts of data into context. This calls for a completely new kind of storage.
05.2021 | Text: Thorsten Rienth
Thorsten Rienth writes as a freelance journalist for AEROREPORT. In addition to the aerospace industry, his technical writing focuses on rail traffic and the transportation industry.
If you wanted to illustrate the abstract concept of data overload, you could use a common milling machine of the kind that MTU Aero Engines in Munich operates to manufacture, say, the blisks used in engine compressors. At a frequency of 250 hertz, i.e. 250 times a second, the machine’s sensors pick up about 70 different signals: torque, temperature, axis positions, and cooling lubricant quality values. A complex component can easily keep the milling machine busy for ten hours or more. The resulting data set then has over ten million rows—for just this one work step.
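The arithmetic behind those numbers can be sketched in a few lines. This is a back-of-the-envelope check, assuming one row per sampling instant and a 12-hour job to illustrate a run that exceeds ten hours:

```python
# Back-of-the-envelope data volume for one milling job
# (assumption: one row per sampling instant, 12-hour job).
sample_rate_hz = 250
signals_per_sample = 70
job_hours = 12

rows = sample_rate_hz * 3600 * job_hours    # 10,800,000 rows
values = rows * signals_per_sample          # 756,000,000 individual readings
print(rows, values)
```

A full year of such jobs pushes the row count from millions into the billions, which is where the storage question below begins.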
Even advanced databases have limits
However, to be able to extract relevant information from this raw data later on and derive knowledge and an understanding of products and processes from it, ten hours of data recordings aren’t enough. If unknown patterns and relationships between different variables are to come to light, the requirement for machine operation data quickly expands to a complete year. Millions of rows become billions of rows. Yet at some point, even advanced databases reach their limits: as the volume of stored data grows, their speed falls off rapidly. That’s why Dr. Galina Baader and Dr. Sonja Hecht, IT experts at MTU, are working on a “turbocharger” solution.
“If we want to continue handling this data, we have to completely rethink the structure for storing it,” Baader says. “We need a new structure that can be accessed in a powerful way and ensures data takes up as little storage space as possible.” The road to this goal is long—but there is a shortcut: instead of thinking in terms of rows, the developers want to start thinking in columns. “A column usually contains homogeneous data that can be significantly compressed using a suitable algorithm,” Baader explains. The result is very compact files; depending on the use case, these are four to ten times smaller than previous storage solutions.
Columns and parallelization make data storage efficient
Saving the data in columns has other advantages as well. If, for example, a data analyst wants to look at the temperature curve, the computer has to read only the relevant column and can ignore all the others. “The computer is quick to home in on precisely where the data required for the task in question is located,” Baader says.
Storing the data this way also allows very large volumes to be processed simultaneously by means of parallelization. While one process is working on data from one time period, a second process is already checking the data from another. Theoretically, a large number of such processes can run simultaneously and accelerate the processing speed many times over.
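The period-by-period scheme described above can be sketched with Python’s standard concurrency tools. The analysis function and the readings are hypothetical stand-ins; a production workload would more likely use process-based parallelism for CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-period analysis: flag readings outside a tolerance band.
def out_of_band(readings, low=80.0, high=90.0):
    return [r for r in readings if not (low <= r <= high)]

# One chunk of data per time period (tiny synthetic samples).
periods = [
    [85.1, 85.3, 91.2],   # period 1
    [84.9, 79.5, 85.0],   # period 2
]

# Each period is analyzed by its own worker; results keep the input order.
with ThreadPoolExecutor() as pool:
    flagged = list(pool.map(out_of_band, periods))
print(flagged)  # → [[91.2], [79.5]]
```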
This column-based approach is implemented in the Parquet format. “Record shredding and assembly” is the name of the algorithm that disassembles the nested data structures into columns and reassembles them in fractions of a millisecond. This makes use of MTU computing power highly efficient. “Depending on the use case, we can reduce an algorithm’s runtime from several hours to one hour,” Baader says.
Columns instead of rows
Getting there faster: Saving data in columns is particularly useful in data analysis. The computer can access the required location in a faster, targeted manner, and parallelization makes it possible for several processes to run simultaneously, shortening runtimes even further.
Recognizing patterns in machine data, statistical assumptions and empirical values
What makes this stage so important is that it allows Baader’s and Hecht’s colleagues in data analysis to step out into a new world. Suddenly, they are able to analyze the proverbial “big data” from machine operations, and are no longer limited to smaller slices of the whole. “For genuine pattern recognition, we rely on readings taken over longer periods of time,” Hecht says. In extreme cases, these periods span several years.
As engine components become increasingly complex, so too does their production. “Ultimately, the quality of a component depends in no small part on the interactions between its individual production steps,” Hecht explains. What if machine data, statistical assumptions and empirical values could be linked to form reliable forecasts? That would create a data-driven prediction of product quality—true predictive quality.
Predictive capability gives production engineers insight into hitherto invisible relationships
“What if there’s a certain dependency among pressure, torque and temperature at a particular manufacturing step, but it doesn’t set off a quality warning until several steps later?” Hecht asks. Snipping relevant readings from the data sets as if with digital scissors, extracting them as patterns and visualizing them in dashboards—this gives the production engineers an insight into previously invisible relationships directly on their line. “This might allow them to counteract a potential mistake well before they would otherwise have even suspected it might arise,” Hecht points out.
Similarly, forecasts of tool wear become conceivable. Engine components are made of highly resilient materials, which unfortunately also wear down the milling tools that machine them. To avoid jeopardizing the tight manufacturing tolerances of engine components, tool runtimes always include a certain material buffer. Making better use of a tool’s remaining service life could noticeably improve the production flow by making time-consuming tool changes less frequent. In addition, tooling is a real cost factor in the engine business.
Maintaining an overview: With a special way of storing data, the two IT experts make it easier for their colleagues in data analysis and manufacturing to discern patterns and relationships in the data.
A hardware “backbone”: High-performance computing (HPC) at the MTU data center
When the two IT specialists set up the new data storage and analysis tools, one MTU-specific feature played right into their hands. “Engine developers make extensive use of simulations, especially when it comes to aerodynamics and structural mechanics,” Hecht says. This requires enormous computing power in MTU’s data center.
Data analysis algorithms can now also run on the high-performance computing cluster available there without having to set up a second computing environment. “That, of course, is incredibly appealing from an IT infrastructure perspective,” Hecht says.