Summary

This page outlines how we currently use language models and what we plan to add in the future.


Model Generation Process

Our document generation process breaks down a system into a high-level overview, mid-level documents that summarize specific functionality, and low-level summaries of individual code files or segments.

Project Summarization

SAFA starts at the lowest level by summarizing individual blocks of code. It then uses these summaries to create a high-level overview of the system, and enriches the code-level summaries with additional detail drawn from that overview.
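The three passes described above can be sketched as follows. This is a minimal illustration, not SAFA's implementation: `summarize_project`, the `summarize(text, context)` callable, and the stand-in `fake_summarize` are all hypothetical names, and a real system would call a language model where the stand-in simply echoes the first line.

```python
def summarize_project(files, summarize):
    """Three-pass summarization: per-file, project overview, enriched per-file.

    `files` maps path -> source text. `summarize(text, context)` is a
    placeholder for a language-model call and returns a string.
    """
    # Pass 1: summarize each code block on its own.
    first_pass = {path: summarize(code, context="") for path, code in files.items()}

    # Pass 2: build the system overview from the individual summaries.
    overview = summarize("\n".join(first_pass.values()), context="")

    # Pass 3: summarize each block again, now with the overview as context.
    enriched = {path: summarize(code, context=overview) for path, code in files.items()}
    return overview, enriched


def fake_summarize(text, context):
    """Stand-in for a model call: echoes the first line, tagged with its context."""
    first_line = text.splitlines()[0]
    return f"{first_line} [{context}]" if context else first_line


files = {"auth.py": "def login(): ...", "db.py": "def connect(): ..."}
overview, enriched = summarize_project(files, fake_summarize)
```

Passing the summarizer in as a parameter keeps the pipeline testable without a model behind it.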

Document Generation

Next, SAFA uses an ensemble of clustering methods to group code with similar functionality. It generates a document for each group so that all system functionality is summarized. During this process, SAFA identifies and merges documents that are potentially duplicated.
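This page does not say which clustering methods SAFA uses or how their results are combined. One common way to combine several clusterings is a co-association vote, where two files share a group when most methods agree. The sketch below assumes that approach; the function name, threshold, and file names are hypothetical.

```python
from itertools import combinations


def consensus_groups(labelings, threshold=0.5):
    """Group items that are co-clustered by more than `threshold` of the labelings.

    `labelings` is a list of dicts mapping item -> cluster label, one dict per
    clustering method. Labels only need to be consistent within a single dict.
    """
    items = sorted(labelings[0])
    parent = {item: item for item in items}  # union-find forest

    def find(item):
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for a, b in combinations(items, 2):
        votes = sum(1 for labels in labelings if labels[a] == labels[b])
        if votes / len(labelings) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Hypothetical output of three clustering methods over four code files.
labelings = [
    {"auth.py": 0, "login.py": 0, "db.py": 1, "models.py": 1},
    {"auth.py": "a", "login.py": "a", "db.py": "b", "models.py": "a"},
    {"auth.py": 2, "login.py": 2, "db.py": 5, "models.py": 5},
]
groups = consensus_groups(labelings)
```

Here `auth.py` and `login.py` are grouped by all three methods, and `db.py` and `models.py` by two of three, so the vote yields two groups even though the second method disagrees about `models.py`.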

To generate successive levels of documentation (such as Code → Functional Requirements → Features), SAFA applies similar clustering methods to derive the next layer up within the context of the system.

Trace Link Generation

While generating documentation, SAFA also generates trace links between all layers of documentation, allowing you to visualize and traverse the code related to a specific document. Each trace link includes a confidence score and a detailed explanation of why the linked artifacts may be related.
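As an illustration of how such links could be represented and traversed, here is a minimal sketch. The `TraceLink` fields, the assumption that confidence lies in [0, 1], and the artifact names are hypothetical and do not reflect SAFA's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceLink:
    source: str        # lower-level artifact, e.g. a code file
    target: str        # higher-level artifact, e.g. a generated document
    confidence: float  # assumed to lie in [0, 1]
    explanation: str   # why the two artifacts may be related


def code_for_document(links, document, min_confidence=0.0):
    """Return the leaf artifacts reachable from `document` by following links downward."""
    children = {}
    for link in links:
        if link.confidence >= min_confidence:
            children.setdefault(link.target, []).append(link.source)

    reached, stack = set(), [document]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in reached:
                reached.add(child)
                stack.append(child)
    # Nodes with no children of their own are treated as code-level artifacts.
    return sorted(node for node in reached if node not in children)


links = [
    TraceLink("auth.py", "FR-1 User login", 0.92, "Implements credential checks"),
    TraceLink("session.py", "FR-1 User login", 0.41, "Creates a session after login"),
    TraceLink("FR-1 User login", "Feature: Accounts", 0.88, "Login is part of account management"),
]
all_code = code_for_document(links, "Feature: Accounts")
confident_code = code_for_document(links, "Feature: Accounts", min_confidence=0.5)
```

Filtering by `min_confidence` before traversal lets a viewer hide weaker links without changing the stored data.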

Additional Features

We are continuously working to expand the capabilities of SAFA’s document generation. Some features we are currently exploring include change summarization, continuous updates to documentation, and detailed technical documentation such as endpoint data flows.

To summarize changes, SAFA will compare the existing system against a new version (such as a pull request or new release) and collect all modified code files. Each file is first summarized to document what was added, removed, or modified. Using these summaries, SAFA then generates two overviews of the changes: a granular, technical one and a non-technical one. SAFA also generates an impact analysis report covering the other code and documentation affected by the changed files.
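The first step, collecting what changed between two versions, might look like the sketch below. It assumes each version is available as a mapping from file path to file contents; the function name and sample files are hypothetical, and the line counts come from Python's standard `difflib.SequenceMatcher`.

```python
import difflib


def collect_changes(old_files, new_files):
    """Classify each path as added, removed, or modified between two versions."""
    changes = {}
    for path in sorted(set(old_files) | set(new_files)):
        if path not in old_files:
            changes[path] = {"status": "added"}
        elif path not in new_files:
            changes[path] = {"status": "removed"}
        elif old_files[path] != new_files[path]:
            old_lines = old_files[path].splitlines()
            new_lines = new_files[path].splitlines()
            added = removed = 0
            matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag in ("replace", "delete"):
                    removed += i2 - i1  # lines present only in the old version
                if tag in ("replace", "insert"):
                    added += j2 - j1    # lines present only in the new version
            changes[path] = {
                "status": "modified",
                "lines_added": added,
                "lines_removed": removed,
            }
    return changes  # unchanged files are omitted


old = {"auth.py": "def login():\n    return True\n", "legacy.py": "x = 1\n"}
new = {"auth.py": "def login():\n    return check()\n", "tokens.py": "TTL = 60\n"}
changes = collect_changes(old, new)
```

The resulting per-file records are the kind of input a summarization step could then turn into technical and non-technical overviews.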

Building upon the ability to summarize changes to a system, SAFA can use those changes to keep its generated documentation continually updated. SAFA will search for existing documents that need to be modified based on the changes, and generate additional documents to encapsulate new functionality.
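A minimal sketch of that planning step, assuming the changed files and the existing code-to-document links are already known. The function name, status strings, and document ids are hypothetical.

```python
def plan_updates(changes, doc_links):
    """Split changed files into documents to revise and files needing new documents.

    `changes` maps path -> status ("added", "removed", or "modified").
    `doc_links` maps path -> list of ids of documents that trace to that file.
    """
    stale_docs = set()
    undocumented = []
    for path, status in changes.items():
        linked = doc_links.get(path, [])
        if linked:
            # Any change to a linked file makes its documents candidates for revision.
            stale_docs.update(linked)
        elif status == "added":
            # New code with no existing document needs one generated.
            undocumented.append(path)
    return sorted(stale_docs), sorted(undocumented)


file_changes = {"auth.py": "modified", "tokens.py": "added", "legacy.py": "removed"}
doc_links = {"auth.py": ["FR-1"], "legacy.py": ["FR-1", "FR-7"]}
to_revise, to_document = plan_updates(file_changes, doc_links)
```

In this example `FR-1` and `FR-7` are flagged for revision, and `tokens.py` is flagged as new functionality without a document.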

We have also been exploring how to use abstract syntax tree (AST) parsing to improve the detail of the documentation we generate and to support more specific types of technical documentation. SAFA will first use the AST to decompose code files into smaller blocks such as functions, classes and interfaces, object fields, and type definitions. This information can then be used to trace the relationships between segments of code. With these more granular blocks, SAFA can more easily identify and document database entities and their fields, API endpoints, or the steps that a function performs as it calls other functions. SAFA will also be able to filter out unused code files and commented-out blocks of code.
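For Python source, this kind of decomposition can be sketched with the standard-library `ast` module. The example is illustrative only: the `decompose` function and its output shape are assumptions, it handles only top-level definitions and direct calls by name, and other languages would need their own parsers.

```python
import ast


def decompose(source):
    """Split a Python module into top-level blocks and record each function's direct calls."""
    tree = ast.parse(source)
    blocks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect names of functions called directly, e.g. `fetch(...)`.
            calls = sorted({
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
            blocks.append({"kind": "function", "name": node.name, "calls": calls})
        elif isinstance(node, ast.ClassDef):
            # Annotated class attributes stand in for object fields.
            fields = [
                stmt.target.id
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            ]
            blocks.append({"kind": "class", "name": node.name, "fields": fields})
    return blocks


source = '''
class User:
    id: int
    email: str

def load_user(user_id):
    return fetch(user_id)

def handler(request):
    user = load_user(request)
    # audit(user)
    return render(user)
'''
blocks = decompose(source)
```

Because comments never reach the AST, the commented-out `audit(user)` call does not appear in `handler`'s call list, and the recorded calls give a starting point for tracing the steps a function performs.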