To train a machine learning model for the recognition of "topics", I used approximately one year of production data from three companies belonging to Industry 4.0 and operating in the manufacturing sector, in particular in the textile sector.
The companies store their production data in relational databases. For each record, the timestamp, the value produced by the sensor and the name of the topic are stored.
The values obtained directly from the sensors of the industrial machines are stored in the relational database as they are, so no encoding of the stored data is required. From these data, all status messages that are not needed for the correct training of the model were excluded.
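As an illustration, the extraction and filtering step can be sketched as follows. The connection string, the table and column names and the rule used to recognise status messages are assumptions, since the actual database schemas of the companies are not reported here.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table/column names; the real schemas
# used by the three companies are not shown in this work.
engine = create_engine("postgresql://user:password@host/production_db")

# Load the raw sensor records: one row per reading with its timestamp,
# the value produced by the sensor and the topic name.
raw = pd.read_sql(
    "SELECT timestamp, value, topic FROM production_data",
    engine,
    parse_dates=["timestamp"],
)

# Exclude status messages, which are not useful for training the model.
# Here they are recognised by a naming convention on the topic; the actual
# filtering criterion depends on how each company labels its topics.
data = raw[~raw["topic"].str.contains("status", case=False)]
```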
Table 1 gives a summary of the customers, and Table 2 lists the kinds of topic obtained for each customer.
Customer | Number of data | Number of topics |
---|---|---|
Customer 1 | 5788 | 7 |
Customer 2 | 1291 | 1 |
Customer 3 | 51045 | 12 |
Customer 4 | 51049 | 12 |
Topic | Customer 1 | Customer 2 | Customer 3 |
---|---|---|---|
topic 1 | X | X | |
topic 2 | X | X | |
topic 3 | X | X | |
topic 4 | X | X | |
topic 5 | X | X | |
topic 6 | X | X | |
topic 7 | X | ||
topic 8 | X | ||
topic 9 | X | ||
topic 10 | X | ||
topic 11 | X | ||
topic 12 | X | ||
topic 13 | X |
To provide a better dataset for training the machine learning model, the original data were grouped into ten-minute windows by timestamp and topic. For each topic group, the minimum, the maximum, the average and the standard deviation of the values were computed.
For each ten-minute topic group, the start timestamp, the end timestamp and the topic name were also saved.
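A minimal sketch of this aggregation, assuming the raw records are available in a pandas DataFrame `data` with columns `timestamp`, `value` and `topic`, could look like the following. Here the windows are aligned to the clock for simplicity, while in Table 3 consecutive windows start at the first record of each group, so the exact boundaries may differ.

```python
import pandas as pd

# Assign each record to a ten-minute window per topic.
data = data.sort_values("timestamp")
data["window"] = data["timestamp"].dt.floor("10min")

# For every (topic, window) group compute the start and end timestamp,
# the number of records, min, max, average and standard deviation.
grouped = (
    data.groupby(["topic", "window"])
        .agg(
            datetime_start=("timestamp", "min"),
            datetime_end=("timestamp", "max"),
            occurs=("value", "count"),
            min=("value", "min"),
            max=("value", "max"),
            average=("value", "mean"),
            dev_stand=("value", "std"),
        )
        .reset_index()
        .drop(columns="window")
)
```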
Table 3 shows the first rows of an example extract of the dataset grouped for the machine learning model.
topic | datetime start | datetime end | occurs | min | max | average | std dev |
---|---|---|---|---|---|---|---|
topic 1 | 2021-12-17 10:37:01.764000 | 2021-12-17 10:48:58.493000 | 35 | 0 | 2 | 0.114286 | 0.403288 |
topic 1 | 2021-12-17 10:48:58.493000 | 2021-12-17 10:58:58.987000 | 86 | 0 | 1 | 0.0581395 | 0.234007 |
topic 1 | 2022-02-22 07:29:31.331000 | 2022-02-22 07:39:43.586000 | 193 | 0 | 1 | 0.217617 | 0.416156 |
topic 1 | 2022-02-22 07:39:43.586000 | 2022-02-22 07:50:46.690000 | 145 | 0 | 1 | 0.172414 | 0.37774 |
topic 2 | 2022-02-22 08:41:36.819000 | 2022-02-22 08:51:48.124000 | 259 | 0 | 3 | 1.03089 | 0.842301 |
topic 2 | 2022-02-22 08:51:48.124000 | 2022-02-22 09:01:50.280000 | 325 | 0 | 4 | 1.11077 | 0.960079 |
topic 2 | 2022-02-22 09:01:50.280000 | 2022-02-22 11:10:12.621000 | 165 | 0 | 5 | 1.4 | 1.42063 |
topic 2 | 2022-02-22 11:10:12.621000 | 2022-02-22 11:53:42.168000 | 7 | 0 | 1 | 0.285714 | 0.515079 |
topic 2 | 2022-02-22 11:53:42.168000 | 2022-02-22 12:03:43.412000 | 229 | 0 | 25 | 2.24891 | 4.77293 |
From the production data of the three clients mentioned above, we obtained four datasets to train and test the machine learning model; from the data of the third client we obtained two datasets.
Table 4 summarizes the four datasets. Datasets 3 and 4, built from the production data of client 3, are distributed equally by topic (a possible way to obtain such a split is sketched after Table 4).
Dataset | Number of data | Number of topics |
---|---|---|
Dataset 1 | 05/11/1915 | 7 |
Dataset 2 | 14/07/1903 | 1 |
Dataset 3 | 02/10/2039 | 12 |
Dataset 4 | 06/10/2039 | 12 |
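One possible way to obtain two datasets that are distributed equally by topic, as for datasets 3 and 4, is a stratified split of the aggregated rows of client 3. The following sketch uses scikit-learn's train_test_split; the function choice and the variable names are assumptions, not the procedure actually used.

```python
from sklearn.model_selection import train_test_split

# `grouped_client3` is a hypothetical DataFrame holding the ten-minute
# aggregated rows of client 3, with the columns shown in Table 3.
dataset_3, dataset_4 = train_test_split(
    grouped_client3,
    test_size=0.5,                      # two datasets of comparable size
    stratify=grouped_client3["topic"],  # keep the same topic distribution
    random_state=42,
)
```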