To train a machine learning model for the recognition of "topics", I used approximately one year of production data from three companies belonging to Industry 4.0 and operating in the manufacturing sector, in particular in the textile sector.
The companies store their production data in relational databases. For each record, the timestamp, the value produced by the sensor and the name of the topic are stored.
The values obtained directly from the sensors of the industrial machines are stored in the relational database as they are, so no encoding of the stored data is required. From these data, all status messages that are not needed for the correct training of the model were excluded.
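As an illustration, the extraction and filtering step can be sketched as follows. The connection string, the table and column names and the rule used to recognise status messages are assumptions, since the actual database schemas of the companies are not reported here.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table/column names; the real schemas
# used by the three companies are not shown in this work.
engine = create_engine("postgresql://user:password@host/production_db")

# Load the raw sensor records: one row per reading with its timestamp,
# the value produced by the sensor and the topic name.
raw = pd.read_sql(
    "SELECT timestamp, value, topic FROM production_data",
    engine,
    parse_dates=["timestamp"],
)

# Exclude status messages, which are not useful for training the model.
# Here they are recognised by a naming convention on the topic; the actual
# filtering criterion depends on how each company labels its topics.
data = raw[~raw["topic"].str.contains("status", case=False)]
```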
Table 1 gives a summary of the customers, and Table 2 lists the kinds of topic obtained for each customer.
Customer | Number of data | Number of topics |
---|---|---|
Customer 1 | 5788 | 7 |
Customer 2 | 1291 | 1 |
Customer 3 | 51045 | 12 |
Customer 4 | 51049 | 12 |
Topic | Customer 1 | Customer 2 | Customer 3 |
---|---|---|---|
topic 1 | X | X | |
topic 2 | X | X | |
topic 3 | X | X | |
topic 4 | X | X | |
topic 5 | X | X | |
topic 6 | X | X | |
topic 7 | X | ||
topic 8 | X | ||
topic 9 | X | ||
topic 10 | X | ||
topic 11 | X | ||
topic 12 | X | ||
topic 13 | X |
To provide a better dataset for training the machine learning model, the original data were grouped into ten-minute windows by timestamp and topic. For each topic group, the minimum, the maximum, the average and the standard deviation of the values were computed.
For each ten-minute topic group, the start timestamp, the end timestamp and the topic name were also saved.
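A minimal sketch of this aggregation, assuming the raw records are available in a pandas DataFrame `data` with columns `timestamp`, `value` and `topic`, could look like the following. Here the windows are aligned to the clock for simplicity, while in Table 3 consecutive windows start at the first record of each group, so the exact boundaries may differ.

```python
import pandas as pd

# Assign each record to a ten-minute window per topic.
data = data.sort_values("timestamp")
data["window"] = data["timestamp"].dt.floor("10min")

# For every (topic, window) group compute the start and end timestamp,
# the number of records, min, max, average and standard deviation.
grouped = (
    data.groupby(["topic", "window"])
        .agg(
            datetime_start=("timestamp", "min"),
            datetime_end=("timestamp", "max"),
            occurs=("value", "count"),
            min=("value", "min"),
            max=("value", "max"),
            average=("value", "mean"),
            dev_stand=("value", "std"),
        )
        .reset_index()
        .drop(columns="window")
)
```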
Table 3 shows the first rows of an example extract of the dataset grouped for the machine learning model.
topic | datetime start | datetime end | occurs | min | max | average | std dev |
---|---|---|---|---|---|---|---|
topic 1 | 2021-12-17 10:37:01.764000 | 2021-12-17 10:48:58.493000 | 35 | 0 | 2 | 0.114286 | 0.403288 |
topic 1 | 2021-12-17 10:48:58.493000 | 2021-12-17 10:58:58.987000 | 86 | 0 | 1 | 0.0581395 | 0.234007 |
topic 1 | 2022-02-22 07:29:31.331000 | 2022-02-22 07:39:43.586000 | 193 | 0 | 1 | 0.217617 | 0.416156 |
topic 1 | 2022-02-22 07:39:43.586000 | 2022-02-22 07:50:46.690000 | 145 | 0 | 1 | 0.172414 | 0.37774 |
topic 2 | 2022-02-22 08:41:36.819000 | 2022-02-22 08:51:48.124000 | 259 | 0 | 3 | 1.03089 | 0.842301 |
topic 2 | 2022-02-22 08:51:48.124000 | 2022-02-22 09:01:50.280000 | 325 | 0 | 4 | 1.11077 | 0.960079 |
topic 2 | 2022-02-22 09:01:50.280000 | 2022-02-22 11:10:12.621000 | 165 | 0 | 5 | 1.4 | 1.42063 |
topic 2 | 2022-02-22 11:10:12.621000 | 2022-02-22 11:53:42.168000 | 7 | 0 | 1 | 0.285714 | 0.515079 |
topic 2 | 2022-02-22 11:53:42.168000 | 2022-02-22 12:03:43.412000 | 229 | 0 | 25 | 2.24891 | 4.77293 |
From the production data of the three clients mentioned above, we obtained four datasets to train and test the machine learning model; from the data of the third client we obtained two datasets.
Table 4 summarizes the four datasets. Datasets 3 and 4, built from the production data of client 3, are distributed equally by topic (a possible way to obtain such a split is sketched after Table 4).
Dataset | Number of data | Number of topics |
---|---|---|
Dataset 1 | 05/11/1915 | 7 |
Dataset 2 | 14/07/1903 | 1 |
Dataset 3 | 02/10/2039 | 12 |
Dataset 4 | 06/10/2039 | 12 |
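One possible way to obtain two datasets that are distributed equally by topic, as for datasets 3 and 4, is a stratified split of the aggregated rows of client 3. The following sketch uses scikit-learn's train_test_split; the function choice and the variable names are assumptions, not the procedure actually used.

```python
from sklearn.model_selection import train_test_split

# `grouped_client3` is a hypothetical DataFrame holding the ten-minute
# aggregated rows of client 3, with the columns shown in Table 3.
dataset_3, dataset_4 = train_test_split(
    grouped_client3,
    test_size=0.5,                      # two datasets of comparable size
    stratify=grouped_client3["topic"],  # keep the same topic distribution
    random_state=42,
)
```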