Presentations

Automation of the audit process for data analysis

Martinovich E.N., Popov E.V., Shutkin A.S.

Sirius University of Science and Technology

To automate the work of audit companies, minimize the human factor, and reduce resource costs, we designed and implemented a graph-based algorithm for analyzing XML documents. Our review of the literature found no comparable published work, so we consider the development novel.

As input we use a subset of the XML markup language, because reporting is most often submitted in this form. First, the XML files are parsed, and lists and dictionaries of the tags they contain, together with their values and hierarchy, are compiled. From these, a text document is generated with a detailed list of links for each individual element.
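The parsing step can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `index_xml`, the shape of the returned structures, and the sample document are assumptions made for illustration; the abstract does not specify the actual data layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_xml(xml_text):
    """Return (tags, values, hierarchy) for one XML document.

    tags:      sorted list of distinct tag names
    values:    dict mapping tag -> list of non-empty text values
    hierarchy: dict mapping parent tag -> sorted list of child tags
    (This layout is a hypothetical choice, not the authors' format.)
    """
    root = ET.fromstring(xml_text)
    tags = set()
    values = defaultdict(list)
    children = defaultdict(set)
    # iter() visits the root and every descendant in document order
    for element in root.iter():
        tags.add(element.tag)
        text = (element.text or "").strip()
        if text:
            values[element.tag].append(text)
        for child in element:
            children[element.tag].add(child.tag)
    hierarchy = {parent: sorted(kids) for parent, kids in children.items()}
    return sorted(tags), dict(values), hierarchy


# Hypothetical sample report, invented for this example
sample = (
    "<report><company><name>Acme</name><inn>7700000000</inn></company>"
    "<total>1500</total></report>"
)
tags, values, hierarchy = index_xml(sample)
```

Here `hierarchy` records only which tags occur directly under which parents, which is enough to reconstruct the nesting of a simple report.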

Next, a graph of the first type is constructed, showing how the files are connected through individual tags. The vertices of the graph are the markup elements of the files, and the edges are the connections between them. Each element has a tooltip giving the location of that element in the text file.
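One possible reading of this graph is sketched below with plain Python sets and dictionaries. It assumes that a vertex is a (file, tag) pair, that edges join parent and child elements within a file and same-named tags across files, and that the tooltip is simply the element's ordinal position in its file. All three are assumptions for illustration, and `build_first_type_graph` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def build_first_type_graph(files):
    """files: dict mapping file name -> XML text.

    Returns (vertices, edges):
      vertices: dict mapping (file, tag) -> tooltip string
      edges:    set of vertex pairs, each stored as a sorted tuple
    """
    vertices = {}
    edges = set()
    for name, xml_text in files.items():
        root = ET.fromstring(xml_text)
        for position, element in enumerate(root.iter(), start=1):
            vertex = (name, element.tag)
            # Keep the first position at which the tag occurs in the file
            vertices.setdefault(vertex, f"{name}: element #{position}")
            # Parent-child edges inside one file
            for child in element:
                edges.add(tuple(sorted([vertex, (name, child.tag)])))
    # Cross-file edges: the same tag name found in two different files
    for a, b in combinations(sorted(vertices), 2):
        if a[1] == b[1] and a[0] != b[0]:
            edges.add((a, b))
    return vertices, edges


# Two hypothetical files that share the tag "inn"
files = {
    "a.xml": "<r><inn>1</inn></r>",
    "b.xml": "<doc><inn>1</inn></doc>",
}
vertices, edges = build_first_type_graph(files)
```

Rendering such a structure to SVG with tooltips would need a drawing library; that part is left out here because the abstract does not name one.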

Based on the resulting hierarchy, we calculate the weights of tags and their values and the strength of interaction between files, and create a table of file relationships by fields and values. A table of link strengths between files is then built and normalized, and its dimensionality is reduced using principal component analysis. The resulting clustering is diagonalized, and the distance between each file's initial and final position is calculated. The files are then placed on the canvas according to their original position on the diagonal and an angle proportional to the calculated distance. Finally, a graph of the second type is constructed and visualized in three variants: with low, medium, and high levels of detail.
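The table-building, normalization, and dimensionality-reduction steps might look roughly like the following pure-Python sketch. It makes several assumptions that are not in the abstract: link strength is taken to be the number of shared tags, normalization divides by the largest entry, and the first principal component is found by power iteration on the covariance matrix. The diagonalization and canvas-layout steps are not covered, and all function names are hypothetical.

```python
import math


def link_table(file_tags):
    """file_tags: dict mapping file name -> set of tags.
    Assumed weight: number of tags two files have in common."""
    names = sorted(file_tags)
    table = [
        [len(file_tags[a] & file_tags[b]) if a != b else 0 for b in names]
        for a in names
    ]
    return names, table


def normalize(table):
    """Scale all entries into [0, 1] by dividing by the largest entry."""
    peak = max(max(row) for row in table)
    if peak == 0:
        return [[0.0 for _ in row] for row in table]
    return [[value / peak for value in row] for row in table]


def first_principal_component(rows, iterations=200):
    """Project each row onto the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Sample covariance matrix of the columns (m x m)
    cov = [
        [sum(r[i] * r[j] for r in centered) / max(n - 1, 1) for j in range(m)]
        for i in range(m)
    ]
    # Deterministic, non-uniform starting vector for power iteration
    vector = [1.0 / (k + 1) for k in range(m)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # no variance in the data; keep the current vector
        vector = [x / norm for x in product]
    return [sum(c[j] * vector[j] for j in range(m)) for c in centered]


# Hypothetical tag sets for three files
file_tags = {
    "a.xml": {"inn", "name", "total"},
    "b.xml": {"inn", "name"},
    "c.xml": {"total"},
}
names, table = link_table(file_tags)
normalized = normalize(table)
scores = first_principal_component(normalized)
```

The sign of a principal component is arbitrary, so only the relative positions of the files along `scores` are meaningful. In this toy input, `a.xml` (linked to both other files) ends up on the opposite side from `b.xml` and `c.xml`.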

Applying our development produces the following outputs:

• relations.txt – a text file with a detailed list of links for each individual element.

• result.csv – a table of relationships of graph objects through fields.

• test.svg – a graph of the first type. It shows how the XML files are related through shared tags and gives the positions of the described elements in the text document.

• low.png – a graph of the second type with a low level of detail, showing how the XML files relate to one another.

• middle.png – a graph of the second type with a medium level of detail, showing how the XML files relate to one another.

• high.png – a graph of the second type with a high level of detail, showing how the XML files relate to one another.
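As an illustration of how a relationship table such as result.csv could be serialized, here is a minimal sketch using Python's standard `csv` module. The square file-by-file layout, the header row, and the function name `relations_to_csv` are assumptions; the actual column structure of result.csv is not described in the abstract.

```python
import csv
import io


def relations_to_csv(names, table):
    """Serialize a square relationship table to CSV text.

    names: list of file names (used as both row and column labels)
    table: list of rows, each a list of link strengths
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file"] + names)
    for name, row in zip(names, table):
        writer.writerow([name] + row)
    return buffer.getvalue()


# Hypothetical two-file table, invented for this example
csv_text = relations_to_csv(["a.xml", "b.xml"], [[0, 2], [2, 0]])
```

Writing the returned text to disk is then a single `open(...).write(csv_text)` call; building the string first keeps the function easy to test.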


© 2004 Designed by Lyceum of Informational Technologies №1533