LEGAT – Advanced computer system based on artificial intelligence (AI) for identifying and extracting entities from unstructured data collections
Project
- Project ID: PN-IV-P6-6.3-SOL-2024-0090
- Consortium: UB (coordinator), ATM, Nextgen Software SRL
- Team: 30 positions (3 still open at UB)
- Funder: UEFISCDI
- Budget: 2.122.787 lei (~424.557 euro)
- Duration: 06 June 2024 - 05 June 2026
Main Objective
The main objective of this project is the creation of a hardware-software IT system, called LEGAT, based on artificial intelligence. Using training data sets, LEGAT will semi-automatically structure the historical data collected at the MAI/DGPI level, relying on a number of essential components:
- extracting data from unstructured data sets;
- characterizing the entities and the links between them;
- identifying patterns and retrieving the information of the entities in focus.
End Result
Prototype hardware-software platform delivered to the Beneficiary at the end of the project.
Team
Paul Irofti -- Project Coordinator
University of Bucharest:
Paul Irofti -- Principal Investigator
Radu Ionescu -- Senior Researcher
Marius Popescu -- Senior Researcher
Iulia Timofte -- Researcher
Roxana Voicu -- Researcher
Eduard Poesina -- Assistant Researcher
Ana Cristina Rogoz -- Assistant Researcher
Silviu Gheorghe -- Master Student
Open Positions: 1 Researcher position and 2 PhD or Master's student positions.
Contact me if interested!
Military Technical Academy:
Luciana Morogan -- Principal Investigator
Ion Bica -- Senior Researcher
Ștefan-Adrian Toma -- Senior Researcher
Mihai Coca -- Researcher
Iulian Tiță -- Assistant Researcher
Mirabela Medvei -- Assistant Researcher
George Hariga -- Assistant Researcher
Alexandra Buzățoiu -- Master Student
Paul-Florinel Căsăndroiu -- Master Student
Ilie-Cosmin Bilțan -- Master Student
Florina Conchințoiu -- L1 Technician
Andrei Brînzea -- L1 Technician
Nextgen Software SRL:
Bogdan Legănaru -- Principal Investigator
Vlad Gladin -- Senior Researcher
Daniel Tache -- Researcher
Alexandru Cocosila -- Researcher
Viorel Tiganescu -- L2 Technology Engineer
Bonciu Emilian Cristian -- L2 Technology Engineer
Adrian Bogdan Sandu -- L2 Technician
About
LEGAT aims to create a hardware-software computer system based on artificial intelligence that, using training data sets, will semi-automatically structure the historical data collected at the MAI/DGPI level. It relies on a series of essential components:
(i) extracting data from unstructured data sets
We will train large language models (LLMs) for a high degree of accuracy and efficiency, starting from pre-trained models that we will adapt to our data sets using effective training techniques such as Low-Rank Adaptation (LoRA), Direct Preference Optimization (DPO), or combinations thereof. To maximize performance, we will manually annotate the data and fine-tune the LLMs in a supervised manner.
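To make the LoRA idea concrete, here is a minimal sketch of the low-rank update in plain Python. It is an illustration only, not the project's training code: the function names, the toy matrices and the `alpha` value are all assumptions introduced for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the adapted weight of one layer.

    w: frozen pre-trained weight, shape (d_out, d_in)
    b: trainable matrix, shape (d_out, r)
    a: trainable matrix, shape (r, d_in)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wi + scale * di for wi, di in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def parameter_counts(d_out, d_in, rank):
    """Return (full fine-tuning parameters, LoRA trainable parameters) for one layer."""
    return d_out * d_in, rank * (d_out + d_in)


# Toy example with made-up numbers: a 3x3 frozen weight and a rank-1 update.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0]]             # shape (1, 3)
B_INIT = [[0.0], [0.0], [0.0]]     # B starts at zero, so training starts exactly from W
B_TRAINED = [[2.0], [0.0], [0.0]]  # hypothetical values after some training steps

adapted = lora_weight(W, A, B_TRAINED, alpha=1.0)
```

Only `A` and `B` are trained while `W` stays frozen. For a single 4096 x 4096 layer with rank 8, `parameter_counts(4096, 4096, 8)` gives 16,777,216 weights for full fine-tuning against 65,536 trainable LoRA weights. In practice this would likely be done with an existing library rather than by hand.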
(ii) characterization of the entities and the links between them
Starting from the latent representation produced by the language model trained by our team, we will add a neural module for entity extraction. Its classification layer assigns each token either a class representing an entity type or a class representing ordinary words (non-entities). A second module with a similar architecture will identify the attributes of entities, and a third neural module will extract the relationships between them.
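The per-token classification step can be sketched as a linear layer followed by an argmax over class scores. The label set, the weights and the helper names below are hypothetical and chosen only to keep the example small; the actual module would sit on top of the language model's hidden states and be trained on annotated data.

```python
# Hypothetical label set: "O" marks ordinary words (non-entities).
LABELS = ["O", "PERSON", "ORG", "LOC"]


def classify_tokens(hidden_states, weights, bias):
    """Apply a linear classification layer to each token vector and pick the top class."""
    predictions = []
    for h in hidden_states:
        logits = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(weights, bias)]
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(LABELS[best])
    return predictions


def group_entities(tokens, labels):
    """Merge consecutive tokens that share the same entity class into one entity."""
    entities = []
    current_tokens, current_label = [], None
    for token, label in zip(tokens, labels):
        if label == current_label and label != "O":
            current_tokens.append(token)
            continue
        if current_label not in (None, "O"):
            entities.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [token], label
    if current_label not in (None, "O"):
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Toy input: one-hot "hidden states" and identity weights, so the class is easy to read off.
tokens = ["Ana", "Pop", "visited", "Cluj"]
hidden = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

labels = classify_tokens(hidden, identity, bias=[0, 0, 0, 0])
entities = group_entities(tokens, labels)
```

Here `labels` is one class per token and `entities` collapses adjacent tokens of the same class into a single named entity. The attribute and relation modules described above would follow the same pattern with their own label sets.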
(iii) identifying patterns and retrieving the information of the entities in focus
The resulting data will be inserted into dedicated database tables, together with the metadata of the processed file and any other information needed to create and identify predefined topics. The web interface module will expose the entities stored in the database through multiple query criteria, such as links or property values. Users can thus investigate entities, the links between them, and a map of those links.
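As a rough illustration of this storage and query step, the sketch below uses Python's built-in `sqlite3` module. The table layout, column names and demo rows are assumptions made up for this example; the project's actual schema and database engine are not described here.

```python
import sqlite3

# Hypothetical schema: processed files, extracted entities, and links between entities.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    processed_at TEXT
);
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE links (
    source_id INTEGER NOT NULL REFERENCES entities(id),
    target_id INTEGER NOT NULL REFERENCES entities(id),
    relation TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO documents VALUES (1, 'report.txt', '2024-06-06')")
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?)",
        [(1, 1, "Ana Pop", "PERSON"), (2, 1, "Acme SRL", "ORG"), (3, 1, "Cluj", "LOC")],
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?)",
        [(1, 2, "works_for"), (2, 3, "located_in")],
    )
    conn.commit()
    return conn


def entities_by_type(conn, entity_type):
    """Query by property value: names of all entities of a given type."""
    rows = conn.execute(
        "SELECT name FROM entities WHERE type = ? ORDER BY name", (entity_type,)
    )
    return [name for (name,) in rows]


def linked_entities(conn, name):
    """Query by link: (relation, target name) pairs that start at the named entity."""
    rows = conn.execute(
        """
        SELECT l.relation, t.name
        FROM links AS l
        JOIN entities AS s ON s.id = l.source_id
        JOIN entities AS t ON t.id = l.target_id
        WHERE s.name = ?
        ORDER BY t.name
        """,
        (name,),
    )
    return rows.fetchall()


conn = build_demo_db()
people = entities_by_type(conn, "PERSON")
neighbours = linked_entities(conn, "Ana Pop")
```

Repeating `linked_entities` from each result walks the graph one hop at a time, which is one simple way a link map could be assembled for display.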