High-performance storage for the engine business
To make machine data more useful, data analysts must put gigantic amounts of data into context. This calls for a completely new kind of storage.
05.2021 | Text: Thorsten Rienth
Thorsten Rienth writes as a freelance journalist for AEROREPORT. In addition to the aerospace industry, his technical writing focuses on rail traffic and the transportation industry.
If you wanted to illustrate the abstract concept of data overload, you could use a common milling machine of the kind that MTU Aero Engines in Munich operates to manufacture, say, the blisks used in engine compressors. At a frequency of 250 hertz, i.e. 250 times a second, the machine’s sensors pick up about 70 different signals: torque, temperature, axis positions, and cooling lubricant quality values. A complex component can easily keep the milling machine busy for ten hours or more. The resulting data set then has over ten million rows—for just this one work step.
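The arithmetic behind those numbers can be sketched in a few lines. This is a back-of-the-envelope check, assuming one row per sampling instant and a 12-hour job to illustrate a run that exceeds ten hours:

```python
# Back-of-the-envelope data volume for one milling job
# (assumption: one row per sampling instant, 12-hour job).
sample_rate_hz = 250
signals_per_sample = 70
job_hours = 12

rows = sample_rate_hz * 3600 * job_hours    # 10,800,000 rows
values = rows * signals_per_sample          # 756,000,000 individual readings
print(rows, values)
```

A full year of such jobs pushes the row count from millions into the billions, which is where the storage question below begins.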
Even advanced databases have limits
However, to be able to extract relevant information from this raw data later on and derive knowledge and an understanding of products and processes from it, ten hours of data recordings aren’t enough. If unknown patterns and relationships between different variables are to come to light, the requirement for machine operation data quickly expands to a complete year. Millions of rows become billions of rows. Yet at some point, even advanced databases reach their limits: as the volume of stored data grows, their speed falls off rapidly. That’s why Dr. Galina Baader and Dr. Sonja Hecht, IT experts at MTU, are working on a “turbocharger” solution.
“If we want to continue handling this data, we have to completely rethink the structure for storing it,” Baader says. “We need a new structure that can be accessed in a powerful way and ensures data takes up as little storage space as possible.” The road to this goal is long—but there is a shortcut: instead of thinking in terms of rows, the developers want to start thinking in columns. “A column usually contains homogeneous data that can be significantly compressed using a suitable algorithm,” Baader explains. The result is very compact files; depending on the use case, these are four to ten times smaller than previous storage solutions.
Columns and parallelization make data storage efficient
Saving the data in columns has other advantages as well. If, for example, a data analyst wants to look at the temperature curve, the computer has to read only the relevant column and can ignore all the others. “The computer is quick to home in on precisely where the data required for the task in question is located,” Baader says.
Storing the data this way also allows very large volumes to be processed simultaneously by means of parallelization. While one process is working on data from one time period, a second process is already checking the data from another. Theoretically, a large number of such processes can run simultaneously and accelerate the processing speed many times over.
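The period-by-period scheme described above can be sketched with Python’s standard concurrency tools. The analysis function and the readings are hypothetical stand-ins; a production workload would more likely use process-based parallelism for CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-period analysis: flag readings outside a tolerance band.
def out_of_band(readings, low=80.0, high=90.0):
    return [r for r in readings if not (low <= r <= high)]

# One chunk of data per time period (tiny synthetic samples).
periods = [
    [85.1, 85.3, 91.2],   # period 1
    [84.9, 79.5, 85.0],   # period 2
]

# Each period is analyzed by its own worker; results keep the input order.
with ThreadPoolExecutor() as pool:
    flagged = list(pool.map(out_of_band, periods))
print(flagged)  # → [[91.2], [79.5]]
```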
This column-based approach is implemented in the Parquet format. “Record shredding and assembly” is the name of the algorithm that disassembles the nested data structures into columns and reassembles them in fractions of a millisecond. This makes use of MTU computing power highly efficient. “Depending on the use case, we can reduce an algorithm’s runtime from several hours to one hour,” Baader says.
Columns instead of rows
Getting there faster: Saving data in columns is particularly useful in data analysis. The computer can access the required location in a faster, targeted manner, and parallelization makes it possible for several processes to run simultaneously, shortening runtimes even further.
Recognizing patterns in machine data, statistical assumptions and empirical values
What makes this stage so important is that it allows Baader’s and Hecht’s colleagues in data analysis to step out into a new world. Suddenly, they are able to analyze the proverbial “big data” from machine operations, and are no longer limited to smaller slices of the whole. “For genuine pattern recognition, we rely on readings taken over longer periods of time,” Hecht says. In extreme cases, these periods span several years.
As engine components become increasingly complex, so too does their production. “Ultimately, the quality of a component depends in no small part on the interactions between its individual production steps,” Hecht explains. What if machine data, statistical assumptions and empirical values could be linked to form reliable forecasts? That would create a data-driven prediction of product quality—true predictive quality.
Predictive capability gives production engineers insight into hitherto invisible relationships
“What if there’s a certain dependency among pressure, torque and temperature at a particular manufacturing step, but it doesn’t set off a quality warning until several steps later?” Hecht asks. Snipping relevant readings from the data sets as if with digital scissors, extracting them as patterns and visualizing them in dashboards—this gives the production engineers an insight into previously invisible relationships directly on their line. “This might allow them to counteract a potential mistake well before they would otherwise have even suspected it might arise,” Hecht points out.
Similarly, forecasts of tool wear become conceivable. Engine components are made of highly resilient materials, which unfortunately also wear down the milling tools that machine them. To avoid jeopardizing the tight manufacturing tolerances of engine components, tool runtimes always include a certain material buffer. Making better use of a tool’s remaining service life could noticeably improve the production flow by making time-consuming tool changes less frequent. In addition, tooling is a real cost factor in the engine business.
Maintaining an overview: With a special way of storing data, the two IT experts make it easier for their colleagues in data analysis and manufacturing to discern patterns and relationships in the data.
A hardware “backbone”: High-performance computing (HPC) at the MTU data center
When the two IT specialists set up the new data storage and analysis tools, one MTU-specific feature played right into their hands. “Engine developers make extensive use of simulations, especially when it comes to aerodynamics and structural mechanics,” Hecht says. This requires enormous computing power in MTU’s data center.
Data analysis algorithms can now also run on the high-performance computing cluster available there without having to set up a second computing environment. “That, of course, is incredibly appealing from an IT infrastructure perspective,” Hecht says.