Space Lexicon Generator

Introduction - The Design Engineering Assistant project

The Design Engineering Assistant (DEA) project is run at the Intelligent Computational Engineering Lab, University of Strathclyde, Glasgow, UK, by Audrey Berquand, PhD student, under the supervision of Annalisa Riccardi. It is carried out in cooperation with ESA, RHEA, Airbus and satsearch, within the framework of an ESA Networking/Partnership Initiative (NPI).

The goal is to develop an Expert System (ES) to support decision making at the early stages of space mission design (e.g., during feasibility studies). Quick and efficient Information Retrieval (IR) has become essential to reduce the time engineers spend searching for information in previous mission reports, books, online databases, etc.

This study focused on an innovative application of the first two layers of the Ontology Learning (OL) Layer Cake to a space mission design corpus. OL is an active field of research seeking to automatically generate an ontology, a key element to organise and combine unstructured data from several sources. The OL Layer Cake describes the consecutive steps to automatically create an ontology. The first two layers are 'Terms' and 'Synonyms'.

More about the project
All Publications

Generating a Space Lexicon

The code in this repository semi-automatically extracts a domain-specific lexicon, which lays the basis of an ontology, from unstructured data. Results obtained on a 'space mission design' corpus are reported in the paper "Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis", presented at the 2020 AIAA SciTech Forum in Orlando, USA.

An NLP pipeline based on the Python NLTK library and adapted to a space mission design corpus (including ECSS acronyms and terms, and 'space missions' stopwords) is provided. The domain-specific lexica are generated by a frequency analysis, followed by additional filtering that relies either on TF-IDF or on the Weirdness Index. The contexts of the lexicon items are embedded with word2vec, and cosine similarity is then applied to identify similar concepts. The word2vec models are generated with the Python Gensim library.
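As an illustration of the frequency analysis with Weirdness Index filtering, here is a minimal sketch in plain Python. It is not the repository's implementation: the function names, the thresholds `min_count` and `min_weirdness`, the add-one smoothing and the toy corpora are all assumptions made for this example. The Weirdness Index of a term is taken here as its relative frequency in the domain corpus divided by its relative frequency in a general-language corpus.

```python
from collections import Counter


def weirdness_index(domain_tokens, general_tokens):
    """Relative frequency in the domain corpus over relative frequency
    in a general corpus, for every term of the domain corpus."""
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    n_domain = len(domain_tokens)
    n_general = len(general_tokens)

    scores = {}
    for term, count in domain_counts.items():
        domain_rel = count / n_domain
        # Assumption: add-one smoothing, so that terms absent from the
        # general corpus do not cause a division by zero.
        general_rel = (general_counts[term] + 1) / (n_general + 1)
        scores[term] = domain_rel / general_rel
    return scores


def build_lexicon(domain_tokens, general_tokens, min_count=2, min_weirdness=2.0):
    """Keep terms that are frequent enough AND markedly more typical of
    the domain corpus than of general language (hypothetical thresholds)."""
    counts = Counter(domain_tokens)
    scores = weirdness_index(domain_tokens, general_tokens)
    return sorted(
        term
        for term in counts
        if counts[term] >= min_count and scores[term] >= min_weirdness
    )


# Toy corpora, already tokenised and lower-cased (illustrative only).
domain = "the spacecraft payload uses the thruster and the spacecraft bus".split()
general = "the cat sat on the mat and the dog ate the food".split()

lexicon = build_lexicon(domain, general)
print(lexicon)
```

On these toy corpora, "the" is frequent but about as common in general language, so it is filtered out, while "spacecraft" passes both thresholds. A TF-IDF filter would replace the general-corpus ratio with a per-document weighting.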

Getting Started

This code was run with Python 3.7.

Start by running DEA_init.py.

Parsed Wikipedia pages and books are already provided, in .json format, in the Corpora folder.

Select which building blocks to run in main.py: re-running the NLP pipeline, identifying new entities (generating new domain-specific lexica), and/or identifying similar entities.
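To make the "identifying similar entities" step more concrete, the sketch below ranks terms by the cosine similarity of their embedding vectors. It is an assumption-laden illustration rather than the code in main.py: in the repository the vectors come from word2vec models trained with Gensim, whereas here hand-written toy vectors and the hypothetical helpers `cosine_similarity` and `most_similar` stand in so that the example runs without a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(term, embeddings, top_n=2):
    """Return the top_n other terms, ranked by cosine similarity to `term`."""
    target = embeddings[term]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 3-dimensional vectors (illustrative only; real word2vec
# embeddings typically have a hundred or more dimensions).
toy_embeddings = {
    "thruster": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "antenna": [0.1, 0.9, 0.2],
    "transponder": [0.0, 0.8, 0.3],
}

neighbours = most_similar("thruster", toy_embeddings)
for name, score in neighbours:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, "engine" ranks as the closest neighbour of "thruster", well ahead of "antenna".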

Citation:

If you use this code, we kindly request that you cite our research. You may use the following BibTeX entry or an equivalent:

@inproceedings{berquand2020,
title = {{Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis}},
author = {Audrey Berquand and Yashar Moshfeghi and Annalisa Riccardi},
booktitle = {{Proceedings of the 2020 AIAA SciTech Forum}},
year = 2020,
month = jan,
publisher = {AIAA},
address = {Orlando, USA},
note = {\url{https://arc.aiaa.org/doi/10.2514/6.2020-2253}},
language = {English}
}

License

This code is licensed under Version 2.0 of the Mozilla Public License.

Contact

Open an 'issue' or contact Audrey Berquand.