Requirements and Design Consistency: A Bi-directional Traceability and Natural Language Processing Assisted Approach

Requirements Engineering (RE) is a crucial phase in the Software Development Life Cycle (SDLC), and the design phase follows it. Traceability is one of the core concepts in software engineering; it is used to follow updates so that related artifacts stay consistent. This paper addresses consistency through bi-directional traceability between the requirements and design phases in a semi-automatic way. Natural Language Processing (NLP) is used to analyze the requirements text and generate a class diagram; the generated items can then be traced back to the requirements. We developed a novel process to support consistency and bi-directional traceability. To demonstrate the practical applicability of the proposed process, we implemented a tool named Requirements and Design Bi-directional Traceability (RDBT). RDBT receives requirements in textual format, performs NLP tasks (tokenization, Part-of-Speech tagging, etc.), generates a UML class diagram, and finally performs traceability management to ensure consistency between the requirements and the UML class diagram. The evaluation shows good results, which indicates that the approach can be used as a guide to generate UML class diagrams semi-automatically and to manage traceability.

I. INTRODUCTION
Completing software projects successfully within their approved plans is a critical challenge. According to [1], around 69% of projects did not deliver due to improper requirements management, lack of suitable user input, inadequate staff training, incomplete requirements, rapid change, ambiguity, and poor contract management. Clearly, poor requirements management has the lion's share in the failure of software projects.
There are many phases in software development, and each phase depends on the previous ones. Requirements engineering is the first phase. In this phase, the requirements are gathered, analyzed, validated, and documented. The final output is a Software Requirements Specification (SRS). Commonly, requirements in the SRS are written in Natural Language (NL).
Requirements engineers can write requirements in their own sentences. During system analysis and design, it is crucial to ensure that the design covers the requirements, so that the design is consistent and the resulting software is of high quality.
Using NL for requirement descriptions causes several issues, such as ambiguity, incompleteness, and inconsistency. Hence, the requirements engineer's primary duty is detecting, handling, and fixing these issues [2]. Requirements specifications are the formal description of business processes, and business processes are dynamic by nature, i.e., they change continuously. Therefore, requirements traceability is crucial to establish and maintain consistency between heterogeneous models through the system development lifecycle [27].
On the other hand, applying NLP to requirements can be used to verify the completeness and consistency of the SRS document [7]. Building traceability is essential to achieve consistency between models. Consistent requirements should be linked to other requirements [22]. Based on these observations, we address requirements consistency between the SRS and the Software Design Document (SDD) through a novel methodology. The approach takes NL text as input and produces a UML class diagram, and a traceability matrix is built. The traceability matrix can then be used to ensure consistency [27]. There are three types of inconsistency in UML models: redundancy, conformance to constraints and standards, and change [14]. In this paper, we consider change inconsistency.
Automation supports consistency and helps avoid errors in early stages. One way to build traceability between requirements and models is to use Natural Language Processing (NLP) techniques to process the requirements text and obtain UML diagrams using different methodologies [2].
This work is an extension of our previous study [34], which covers the auto-generation of UML diagrams from requirements without going in depth into inconsistency types. Here, the framework is enhanced so that, in addition to auto-generating UML diagrams, it applies consistency between requirements and design. The paper covers moving from the SRS to the Software Architecture Design Description (SADD) semi-automatically, with bi-directional traceability applied in a consistent manner to ensure that the requirements are consistent with the design. In the literature review section, many studies are reviewed. Some cover the generation of design diagrams from requirements without considering consistency and traceability [7]-[13], [15]. Others cover traceability but do not deal with consistency management [16]-[25]. Still others deal with consistency but not with UML auto-generation, requirements processing, or how traceability can be managed [14], [29]-[31]. We developed a methodology to keep requirements and UML class diagrams consistent through class diagram auto-generation and bi-directional traceability. As shown in Fig. 1, the research covers the first two stages, SRS and SADD, with both forward traceability (SRS to SADD) and backward traceability (SADD to SRS).
The rest of the paper is organized as follows. Section 2 investigates the background and related work, while Section 3 details the methodology. Section 4 covers our case study, while in Section 5 we evaluate our proposed methodology. Finally, Section 6 concludes the work and gives insights regarding future work.

II. BACKGROUND AND RELATED WORK
Requirements traceability and Natural Language Processing (NLP) are the two main terms of this research; they are defined in the Preliminaries subsection. The rest of this section analyzes the related works by highlighting their strengths and weaknesses.

A. Preliminaries
Requirements Traceability: Based on the international standard ISO/IEC/IEEE 29148 [3], requirements traceability is defined as the identification and documentation of the derivation path (upward) and allocation/flow-down path (downward) of requirements in the requirements set. It should be established to maintain the SRS document and trace it through the lifecycle of the system. Requirements traceability is frequently used as a single point of accountability for tracing a requirement back to its source and forward through the life cycle to assess that the requirement has been met [3]. Tracing the requirements specification during software development in all SDLC phases is vitally important [17]. According to [18], traceability management was established to cope with dramatically changing requirements caused by adding, deleting, or modifying requirement features.
Traceability management is concerned with ensuring consistency between the requirements documents and the design phase, for instance, between the SRS and the class diagram [19]. Involving traceability management in the software development process demonstrates that requirements have been satisfied [20]. Bi-directional traceability provides a means of documenting and reviewing the relationships between layers of requirements that capture certain aspects of the design [3]. To maintain and manage traceability in a project, it is common to set up traceability matrices [18].
Natural Language Processing (NLP): Natural language processing can be defined as processing human language in an automatic (or semi-automatic) way [5]. The scientific goal of NLP is to identify the computational machinery needed for an agent to exhibit various forms of linguistic behavior. Its engineering goal is to design, implement, and test systems that process natural languages for practical applications [6].

B. Analysis of Related Works
The related works are classified by their output into two groups: the first group includes research that generates UML diagrams from the SRS, and the second group includes research that proposes mechanisms for requirements traceability.

From Requirements to UML Transformation
Several studies deal with the generation of UML diagrams from requirement specifications. Although these studies share the final aim of facilitating UML generation from requirements, they vary in methodology and in final output. Some produce use case diagrams, while others produce class, sequence, package, or activity diagrams. The works that handle the transformation of requirements into UML are discussed in the following.
The work in [7] suggested managing textual requirements through application-specific ontologies and NLP. The study in [8] proposed a method for automating the generation of a package diagram from requirements to support the design process. In [9] and [11], activity and sequence diagrams were generated, whereas in [10] a use case diagram was generated using heuristic rules. The work in [12] proposed a tool to generate a class diagram from requirements and then generate code from the classes. Chamitha et al. [13] proposed a workflow for generating use case and class diagrams. They used a web browser to enter the text, processed it using NLP techniques, and then generated candidate diagram elements. Keletso [15] developed a tool for automatically constructing an analysis model from natural language requirements (NLR). Imane et al. [16] developed an approach to generate a use case diagram from Semantic Business Vocabulary and Business Rules (SBVR), which represents requirements in a structured language instead of natural language. In [26], an analysis-based approach was used to generate a class diagram from natural language requirements through dependencies. However, their model neglected consistency between requirements and class diagrams. The authors of [32] proposed a tool named CM-Builder to support the automatic transformation from requirements to UML diagrams. However, a detailed description of consistency between the requirements and the generated diagrams is missing. The studies mentioned above concentrate on the generation of UML diagrams without considering bi-directional traceability to follow changes on both sides. Table I summarizes the previous discussion by highlighting the goal and limitation of each work.

Requirements Traceability
In the following, works that deal with requirements traceability management are explored and discussed.
In [18], a framework for modeling requirements, traceability, and UML patterns was introduced as a communication tool between requirements and other software artifacts. The work in [19] proposed a formal methodology that uses a context-free grammar to denote use case, activity, and class diagrams. Traceability rules were developed in [19] to ensure that the use case is mapped to activity and class diagrams. However, this traceability is uni-directional, as the methodology lacks traceability in the opposite direction, i.e., from the SDD to the SRS. In [21], guidelines for supporting traceability were introduced, with the UML diagram generated manually. In [22], a model labeled Requirement Interchange Format (ReqIF) was proposed to exchange requirements traceability information between different Requirement Management Tools (RMT) and design tools; it is limited to XML only. Qualitative data analysis was used in [23] to improve the traceability of requirements, but it focused only on elicitation methods. In [24], a meta-model was proposed to provide explicit links with requirements to enable traceability. However, the description of the designed system is ignored in [24]. The authors of [1] proposed a framework for requirements management that supports requirements traceability. It consists of planning, execution, and management phases. The work in [1] disregards the design details.
In [14], three types of UML model inconsistency were introduced using a rule-based approach. The rules detect inconsistencies between UML diagrams; however, inconsistencies between the UML diagrams and the SRS are not detected. In [28], formal methods for checking five types of consistency in use case diagrams were presented. The work in [28] neglects to check for inconsistency between the use case diagram and the SRS text. The work in [29] provides an intuitive approach to analyzing the consistency of UML diagrams using Web Ontology reasoners. The consistency analysis is implemented exclusively on UML diagrams. In [30], integration between UML and Service Refinement (SR) was introduced to ensure consistency. However, consistency between the SRS and UML diagrams was not considered. Table II summarizes the most relevant studies that deal with requirements traceability.
In summary, the most crucial research gap in the works mentioned above is the lack of a model that can simultaneously auto-generate UML diagrams, provide bi-directional requirements traceability between SRS documents and UML, and produce a traceability matrix.

TABLE I
Ref. | Generated diagram | Goal | Limitation
[7] | No diagram generated | Scan and tag a set of requirements and allow system engineers to maintain a comprehensive, valid, and accurate body of requirements | Missing a clear description of consistency and traceability
[8] | Package diagram | To have a systematic solution that recommends design choices based on system requirements | Does not cover traceability and consistency
[9] | Activity and sequence diagrams | To minimize the errors that arise in a traditional system | Does not cover traceability and consistency
[10] | Use case diagram | Facilitate the requirements analysis process | Does not cover traceability and consistency
[11] | Activity and sequence diagrams | Manual transformation from requirements to UML | Does not cover traceability and consistency; it also lacks automation
[12] | Class and sequence diagrams | Supporting the analysis stage of software development | Does not cover traceability and consistency
[13] | Use case and class diagrams | To build quick, accurate, and intelligent software for generating UML | Does not cover traceability and consistency
[15] | No

TABLE II
Ref. | Goal | Limitation
[17] | Proposed an architecture for traceability management systems for the product life cycle. | Traceability is covered in general through the product life cycle without considering the Requirements Traceability Matrix.
[18] | Build a framework suitable for modeling traceability patterns that allows communicating best practices to other software projects. | Traceability is done between requirements without covering consistency between requirements and design.
[19] | Ensure traceability and verify consistency among three phases: use case, activity, and class diagrams. | No processing of the requirements was performed, and no traceability matrix was defined.
[20] | Apply traceability information models in practice. | No processing of the requirements was performed, and consistency issues were not considered.
[21] | Address the issue of traceability mapping between requirements and UML that may cause inconsistency. It provides traceability guidelines that include a meta-model and process steps. | No processing of the requirements was performed. The UML diagram is built manually.
[22] | Use the standardized ReqIF format to interchange traceability information between requirements tools and UML tools. | Did not generate a UML diagram; no consistency information and no traceability matrix defined.
[23] | Use qualitative data analysis to enhance requirements traceability. | Concentrated only on elicitation methods and did not take consistency into account.
[24] | Provide an explicit bridge between the different requirements and design decisions. | Did not offer details about the UML diagrams that represent the design.
[25] | Built a framework to support requirements management and traceability management. | The framework is general and did not show the design in detail.
[14] | Proposed a rule-based approach to detect and trace inconsistency in UML models. They introduced three types of inconsistency. | Covers inconsistency in UML models without considering consistency with the SRS.
[28] | How to check the consistency of requirements that are modeled by UML use cases and constraints. | No requirement text processing.
[29] | Use the Web Ontology Language for automatic analysis and handling of inconsistency between three UML diagrams. | Did not cover consistency between requirements and UML diagrams.
[30] | Utilize a synthetic approach to support formal requirements modeling and analysis as well as formal verification. | Covered some UML diagrams without handling and processing.

III. METHODOLOGY
In this section, the methodology for developing the proposed model is explained in detail. The methodology consists of two main phases, as shown in Fig. 3. The first phase is requirements processing, whose output is a set of tagged text. It consists of the following steps: rewriting the requirement text according to specific rules, filtering out stop words, tokenizing the text, and applying POS tagging. The output of this phase is the prepared SRS. The second phase is the auto-generation of a class diagram from the prepared SRS. The two phases are described in detail below.

A. Phase 1: Requirements Processing
The following steps are used for preparing requirements.

Writing requirements in specific rules/patterns:
In this step, the SRS is revised and rewritten by classifying requirements into related groups. Each group should be documented in a single form. Redundancies and unnecessary information should be removed.

Removing stop-words
A stop word is a word with little significance, such as the, at, which, and on. Removing stop words facilitates text processing. If some of these words are needed, they can be exempted from removal programmatically. For example, we should keep (is a) between classes in a generalization relationship rather than remove it.
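
A minimal sketch of this step is given below in Python (the tool itself is implemented in Java). The stop-word list, the protected phrases, and the function name remove_stop_words are illustrative assumptions rather than the lists used in RDBT; the sketch only shows how a phrase such as (is a) can be exempted from filtering.

```python
import re

# Hypothetical stop-word list; the list used by the tool is not given here.
STOP_WORDS = {"the", "a", "an", "at", "which", "on", "is", "of", "to"}

# Two-word phrases that must survive filtering because later rules rely on
# them, e.g. "is a" signals a generalization relationship between classes.
PROTECTED_PHRASES = [("is", "a"), ("is", "an")]


def remove_stop_words(sentence: str) -> list[str]:
    """Drop stop words from `sentence`, keeping protected phrases intact."""
    words = re.findall(r"[A-Za-z]+", sentence)
    kept = []
    i = 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if pair in PROTECTED_PHRASES:
            kept.extend(words[i:i + 2])  # keep the whole phrase
            i += 2
        elif words[i].lower() in STOP_WORDS:
            i += 1                       # drop an ordinary stop word
        else:
            kept.append(words[i])
            i += 1
    return kept
```

For example, "The cashier is an employee of the store" keeps the words cashier, is, an, employee, and store, so the generalization cue remains available to later rules.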

Requirements text tokenization (lexical analysis)
After removing stop words, the requirements are divided into tokens, as it is easier to process a single word than a set of words. To implement tokenization, we used the OpenNLP Tokenizer.

Part-of-Speech (POS) Tagging
The final step in phase 1 is POS tagging, which maps each word in a corpus to its part-of-speech tag based on its context and definition [25]. POS tagging is applied to the tokens to mark up nouns, verbs, etc. The OpenNLP POS Tagger was used to generate the POS tags. A pattern can then be applied using a regular expression to extract the candidate class elements. Table III summarizes phase 1, defining the input, output, and tool used in each step.
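
The idea of applying a regular expression to tagged text can be sketched as follows. This is an illustration in Python, not the RDBT implementation: the tagged string is written by hand in a word_TAG form with Penn Treebank tags (an assumption about the tagger output format), and the three patterns are hypothetical examples rather than the rule set used in the tool.

```python
import re

# Hand-written stand-in for tagger output (assumed word_TAG format).
TAGGED = "customer_NN pays_VBZ bill_NN cashier_NN prints_VBZ receipt_NN"

# Illustrative patterns only:
NOUN = re.compile(r"(\w+)_NNS?\b")        # common nouns: candidate classes or attributes
VERB = re.compile(r"(\w+)_VB[DGNPZ]?\b")  # verbs: candidate methods
# noun-verb-noun sequences: candidate associations between two classes
TRIPLE = re.compile(r"(\w+)_NNS?\s+(\w+)_VB[DGNPZ]?\s+(\w+)_NNS?\b")


def extract_candidates(tagged: str) -> dict:
    """Return candidate class-diagram elements found in POS-tagged text."""
    return {
        # dict.fromkeys removes duplicates while preserving order
        "classes": list(dict.fromkeys(NOUN.findall(tagged))),
        "methods": list(dict.fromkeys(VERB.findall(tagged))),
        "associations": TRIPLE.findall(tagged),
    }


candidates = extract_candidates(TAGGED)
```

The result is only a list of candidates; as in the proposed process, an analyst would still decide which nouns become classes and which become attributes.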

B. Phase 2: Class Diagram Generation, Traceability, and Consistency Management
The following steps illustrate phase 2.

Rules Management
We propose to save predefined rules in a database. When chunking is needed, the pattern is retrieved from the database and the text is chunked using a regular expression.

Rule Engine and UML Database
This module works as a database that stores patterns and UML class diagram elements. All classes, attributes, methods, and relationships are saved in this database. The database is central: it saves all traceability information, which is used later to generate reports.

Apply Rules/Patterns on Requirements (Chunking)
In this step, rules are retrieved from the database and applied to the POS-tagged text using regular expressions to obtain specific elements, in our case candidate classes, attributes, operations (methods), and relationships.
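
A possible shape for database-driven chunking is sketched below in Python with an in-memory SQLite database. The table name rules, its columns, and the two stored patterns are assumptions made for illustration; the schema used by RDBT is not described in the text.

```python
import re
import sqlite3

# Hypothetical rule store: one row per pattern, tagged with the element
# type that the pattern is meant to extract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (id INTEGER PRIMARY KEY, element_type TEXT, pattern TEXT)"
)
conn.executemany(
    "INSERT INTO rules (element_type, pattern) VALUES (?, ?)",
    [
        ("class", r"(\w+)_NN\b"),
        # "X is a Y" kept by stop-word filtering: X is a subclass of Y
        ("generalization", r"(\w+)_NN is_VBZ an?_DT (\w+)_NN\b"),
    ],
)


def load_rules(connection):
    """Retrieve (element_type, pattern) pairs in insertion order."""
    return connection.execute(
        "SELECT element_type, pattern FROM rules ORDER BY id"
    ).fetchall()


def chunk(tagged_text, rules):
    """Apply every rule to the tagged text and group matches by element type."""
    found = {}
    for element_type, pattern in rules:
        for match in re.finditer(pattern, tagged_text):
            found.setdefault(element_type, []).append(match.groups())
    return found


found = chunk("cashier_NN is_VBZ an_DT employee_NN", load_rules(conn))
```

Keeping the patterns in a table means that a new rule can be added without changing the chunking code, which is one plausible motivation for storing rules in a database.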

Select Class Elements
After the requirements are chunked, the candidate elements can be selected and saved in the database. The class diagram may contain classes, attributes, methods, and relationships with other classes. The SQL database can be exported and imported into many models.

Class Diagram Generation
In this step, the diagram is displayed based on the elements selected in the previous step. The class diagram is generated semi-automatically, and its elements can be managed, edited, or removed. We obtained a diagram whose elements (classes, attributes, methods, and relationships) fully matched the manually generated diagrams.

Edit Requirements / Design
We may need to trace changes in requirements or class diagram elements. When the requirements change, the change must be tracked and reflected in the design. Likewise, when the design changes, the change must be traced and reflected in the requirements.
If a change to the requirements is not reflected in the class diagram, or a change in the class diagram is not reflected in the requirements, the result is an inconsistency. We manage change inconsistency through bi-directional traceability to ensure that both phases are consistent. All information is saved in the database, including the update, the source of the change, and the inconsistency.
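
The notion of change inconsistency described above can be sketched as follows. This Python fragment is an assumption-laden illustration, not the RDBT data model: the Change class, its fields, and the function inconsistency_report are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

PHASES = ("requirements", "design")


@dataclass
class Change:
    trace_id: int
    source: str              # phase in which the change was made
    description: str
    reflected: bool = False  # True once the other phase has been updated

    @property
    def target(self) -> str:
        """The phase that must be updated: the opposite of the source."""
        return PHASES[1 - PHASES.index(self.source)]


def inconsistency_report(changes):
    """List every change that has not yet been reflected in its target phase."""
    return [
        f"Trace {c.trace_id}: change in {c.source} "
        f"('{c.description}') not yet reflected in {c.target}"
        for c in changes
        if not c.reflected
    ]


changes = [
    Change(1, "requirements", "rename Customer to Client"),
    Change(2, "design", "add method pay", reflected=True),
]
report = inconsistency_report(changes)
```

In this sketch an empty report means that the two phases are consistent with respect to the recorded changes.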

Display Inconsistency in Requirements and Design Reports
An inconsistency report can be generated when there is an inconsistency between requirements and design (updates not traced and reflected from one phase to the other). This report helps us know which changes happened and whether they were applied in both the requirements and the class diagram, so that we can handle and manage inconsistency between these phases.

Generate Traceability Matrix
This is a crucial step; the traceability matrix links requirements with design elements. It shows all the information needed for traceability and inconsistency management, and it is one of the final outputs of our methodology. It shows the requirements ID, the traceability ID, and the source of change. As shown in Fig. 2, the requirements in the SRS document are transformed into a UML class diagram semi-automatically; this speeds up design generation and reduces the expected number of errors. The traceability matrix handles both the design and the SRS. When any change happens, it can be traced from source to destination. By doing so, we can ensure consistency between the requirements and design stages.

IV. CASE STUDY
In this section, the proposed model is illustrated using a case study. Eclipse was selected as the IDE, Java as the programming language, and Apache OpenNLP as the toolkit to process natural language text.
The case study is borrowed from [31], with some changes to sentence syntax. As mentioned in phase 1, all requirements were written in a single form. The critical change applied to the requirement text is rewriting the requirements based on our proposed rules, so that each sentence follows a specific format. In this section, we also describe the implementation of the case study. Fig. 4 shows the main form used to enter the SRS, which should follow the defined rules, with each statement written in a specific way. Figs. 4-19 are snapshots of our proposed tool.
The stop words are then removed; Fig. 5 shows the SRS after removing stop words. The next step is applying the POS tags; Fig. 6 shows the SRS after POS tagging. Preparing the SRS as in Fig. 6 completes phase 1. Phase 2 then starts by applying rules to the tagged text to extract candidate class elements. Regular expressions are used to apply these rules and extract the needed elements.
As shown in Fig. 7, after applying the rules to the POS-tagged text, the candidate classes, methods, attributes, and relationships are shown in a list. System analysts can select one or more methods and attributes; a class can also be passive. In this step, the class diagram elements are linked to a requirement, and the traceability matrix is generated automatically and implicitly.
In Fig. 8, the relationships between classes can be selected. The proposed tool supports three types of relationships: association, aggregation, and generalization. The relationship between classes can be determined automatically, but we select it using a list and a form for more flexibility.
In Fig. 9, the requirements that need some change are selected. Requirements can be chosen in two ways in our system, either by requirements ID or by trace ID. In the traceability matrix, each row is given a unique ID, known as the trace ID, which is used to manage traceability between a specific class element and the requirements elements stored in that table. When a trace ID is selected, only that specific element (one class element or one requirement element) is shown, as in Fig. 10.
In Fig. 9, the proposed change for requirements elements is entered and waits for approval (the proposed change is either approved or ignored). A change record consists of trace ID, Req ID, source, source description, target, and target description. The source is the phase where the change originates, either the design phase or the requirements phase. If the change is made in the requirements phase, the target is the design phase, because it should be updated to match the requirements phase, and vice versa. The source description refers to the requirement element after applying the POS tag, or to the name of the design element. The target is the phase in which the change should be reflected. Finally, the target description gives the element name, and the element field specifies the element type (class, attribute, method, or relationship).
Fig. 11 shows the traceability matrix. It consists of a unique traceability ID (trace ID) for each requirement/design element. The requirements ID identifies the requirement to which the element belongs. In our case, we assume that the requirements are written as a single text for better processing. The source determines the source of the requirement or proposed change; in our case, the default source is the requirements phase, and the target is the design phase. The source description shows the requirement element in the requirements phase, obtained from the processed original text. The target shows whether the target is the requirements phase or the design phase. The design description is the requirement element as mapped to the design phase, and the design element field specifies whether the element is a class, attribute, method, or relationship. This matrix holds the information needed for traceability, and many traceability reports can be obtained from it. When UML class elements are selected, all the needed traceability information is saved in the traceability table.
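
To make the structure of the matrix concrete, the following Python sketch models one row per requirement/design element and shows forward and backward lookups. The field names follow the columns described above, while the sample rows, the requirement IDs R1 and R2, and the function names are hypothetical.

```python
from typing import NamedTuple


class TraceRow(NamedTuple):
    trace_id: int
    req_id: str
    source: str              # phase where the element or change originates
    source_description: str  # requirement element after POS tagging
    target: str              # phase in which it must be reflected
    target_description: str  # name of the design element
    element_type: str        # class, attribute, method, or relationship


# Hypothetical sample rows.
MATRIX = [
    TraceRow(1, "R1", "requirements", "customer_NN", "design", "Customer", "class"),
    TraceRow(2, "R1", "requirements", "pays_VBZ", "design", "pay", "method"),
    TraceRow(3, "R2", "requirements", "cashier_NN", "design", "Cashier", "class"),
]


def forward(matrix, req_id):
    """Forward traceability: design elements derived from a requirement."""
    return [(r.element_type, r.target_description) for r in matrix if r.req_id == req_id]


def backward(matrix, element_name):
    """Backward traceability: requirements that a design element came from."""
    return sorted({r.req_id for r in matrix if r.target_description == element_name})


def by_trace_id(matrix, trace_id):
    """Select the single row identified by a trace ID, or None if absent."""
    return next((r for r in matrix if r.trace_id == trace_id), None)
```

Selecting by requirement ID returns every element of that requirement, whereas selecting by trace ID returns exactly one row, which mirrors the two selection modes of the tool.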

V. EVALUATION
To assess the quality and applicability of the developed framework, we performed two evaluations: a questionnaire completed by specialists and a mathematical evaluation.
1. Questionnaire: A survey was distributed globally to specialists from LAAS in France, PSAU in Saudi Arabia, SUST in Sudan, and NCTR in Sudan under our direct supervision to ensure that we obtained a real evaluation of our work. The survey discusses the framework in detail, the forms of writing requirements used to generate the class diagram elements, how traceability is managed, and the different reports. Twenty-two persons filled in the questionnaire; 59.1% are academics in computer science, 31.8% are software engineers, and 9.1% work in other computer science fields. 54.5% are Ph.D. holders, 31.8% have an M.Sc. in computer science or related fields, and 13.6% have a B.Sc. in computer science or related fields. Thus, most of the respondents are Ph.D. holders in related fields. We also found that around 81.8% use a Software Requirements Specification (SRS) and Software Design Documents (SDD), while 18.2% do not, so the SRS and SDD are used in most cases for requirements specification and design. It is crucial to have consistency between requirements and design, and we asked about problems in this regard. One of the significant problems is SRS and SDD consistency: 9.5% strongly agreed that there is a problem, 57.1% agreed, 19% were neutral, and 14.3% disagreed. We can therefore say that there is a consistency problem between requirements and design, as shown in Fig. 12. Requirements can be traced to the design phase to check consistency. Traceability can be applied manually, as a Requirements Traceability Matrix (RTM), or automatically, and it is not easy to maintain full requirements traceability. We found that 4.8% perform traceability automatically, 38.1% perform it manually, and 57.1% perform it semi-automatically (Fig. 13).
We proposed some rules for writing requirements that facilitate the auto-generation of UML diagrams and allow traceability between requirements and design. As shown in Fig. 14, 15% strongly agree that the given rules are easy to understand, 55% agree, and 30% are neutral. This means that the given rules obtained an acceptable level of agreement regarding understandability and ease of use. The framework was implemented in practice using the Eclipse IDE, and a payment system was used as a case study in our developed system. In the survey, the methodology was discussed in detail, and relevant questions were asked. Does the developed methodology facilitate the process of moving from SRS to SDD consistently? We found that 10% strongly agree, 65% agree, and 25% are neutral. This indicates that the developed framework is acceptable and facilitates the process of traceability (Fig. 15). As shown in Fig. 16, our framework was developed to support bi-directional traceability, i.e., both backward and forward traceability; 15% strongly agree that the framework supports bi-directional traceability, while 75% agree, and 10% are neutral.
2. Mathematical Evaluation: Two main measures are used to assess accuracy and completeness [26]: a. Precision: reflects the accuracy of the produced results, i.e., how much of the produced information is correct. b. Recall: reflects the completeness of the produced results. As shown in Table IV, the class diagram generation process was evaluated and good results were obtained, with high precision and recall for all class elements, because our rule engine is built entirely from patterns/rules that facilitate the chunking process, compared with the studies [27] and [32], which include a comparison of many previous works. Regarding traceability accuracy, we obtained 100%: all changes are bi-directionally traced and reflected, whereas [33] reported 80%. Thus, all changes are applied in both the requirements and the UML class diagram.
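
For reference, precision and recall are conventionally defined as below. We assume these standard forms are the ones intended here; the symbols are our own naming, with N_correct the number of correctly generated elements, N_incorrect the number of generated elements that are wrong, and N_missing the number of expected elements that were not generated.

```latex
\[
\text{Precision} = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{incorrect}}},
\qquad
\text{Recall} = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{missing}}}
\]
```

Under these definitions, a precision of 100% means that no incorrect element was generated, and a recall of 100% means that no expected element was missed.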

VI. CONCLUSION AND FUTURE WORK
Traceability is one of the crucial issues in software engineering, especially in requirements. Moving from the SRS to the design phase in the traditional way takes time and may introduce inconsistency. To address these issues, we proposed using NLP to generate a UML class diagram semi-automatically from requirements and to establish bi-directional traceability between these two stages, making them consistent and leading to higher-quality software. The methodology was applied in practice. A case study was carried out and evaluated through a well-designed questionnaire approved by two professors, and a mathematical evaluation was conducted. The evaluation results indicate that the work is suitable and achieves the research aim. Machine learning now plays a vital role in artificial intelligence. As an extension to this work, we propose to use machine learning and text mining to facilitate pattern writing and to generate patterns through mining techniques.