Kai - Generative AI Applied to Application Modernization

Overview

Konveyor AI or “Kai”, is an early effort of Generative AI applied to Application Modernization being explored under the Konveyor Ecosystem at konveyor-ecosystem/kai.

Kai implements a Retrieval Augmented Generation (RAG) approach that leverages data from Konveyor to help generate code suggestions to aid migrating legacy code bases to a different technology. The intent of this RAG approach is to shape the code suggestions to be similar to how an organization has solved problems in the past, without additional fine-tuning of the model.

Demonstration Video

The team has explored a use case of a Java EE application migrating to Quarkus for it’s first example. Kai can be applied to other domains beyond Java EE -> Quarkus, the only requirement is that analyzer-lsp supports the language and there are sufficient rulesets defined for the target.

Kai Demo Loop

Below is a high level view of where Kai fits into a typical large scale modernization engagement.

What does Kai provide?

Konveyor extended with Kai allows developers to work in their IDE to see analysis information and request code suggestions to resolve those migration issues.

Kai’s basic workflow from an IDE is:

Discover migrations issues via static code analysis
Generate a fix for migration issues

Access static code analysis information from Konveyor’s analzyer-lsp inside the IDE

For a given issue, ask Kai to generate a code suggestion to solve the migration issue

How does Kai work?

Kai’s basic workflow is:

Identify migration ‘issues’ via static code analysis
Look to see how the organization has solved similar problems in past, we call this a ‘Solved Example’
Extract enough contextual info from a ‘Solved Example’ that we can provide the LLM with guidance of how we want this current problem solved
Work with a LLM to generate a code suggestion, giving it both the Analysis Information and any extra contextual info from how the organization has solved this problem in the past
Surface the code suggestion either via an API call or in a developers IDE

RAG Approach

Kai uses 2 types of information to help inform the LLM of additional context to improve a response.

Static Code Analysis Information
Solved Examples

Static Code Analysis Information

Analysis information is the result of running analyzer-lsp with a set of rules that help discover points of interest when migrating an application to a new technology. These rules may leverage the ~2400+ community contributed rules at konveyor/rulesets or may be organization specific custom rules covering information on proprietary corporate frameworks.

We use the analysis information for:

Identifying what areas of code need to be updated to move to a new technology
Guidance to inform a developer what the issue is and hints of how they may resolve the problem

For example we can look at the below snippet to see an example of the YAML data from a specific analysis issue informing the developer about a concern when moving from JMS to Quarkus’ reactive messaging:

incidents:
  - uri: file:///src/main/....../service/InventoryNotificationMDB.java
    message: "JMS `Topic`s should be replaced with Micrometer `Emitter`s feeding a Channel. See the following example of migrating\n a Topic to an Emitter:\n \n Before:\n ```\n @Resource(lookup = \"java:/topic/HELLOWORLDMDBTopic\")\n private Topic topic;\n ```\n \n After:\n ```\n @Inject\n @Channel(\"HELLOWORLDMDBTopic\")\n Emitter<String> topicEmitter;\n ```"
    lineNumber: 60

Additionally, this analysis information is available in a more human friendly WebUI view Analysis Report

Solved Examples

What do we mean by ‘Solved Examples’

Application Modernization engagements inside a large organization typically encompass ~100s of applications that need to be migrated to a new technology. Often these applications share a large number of similar issues. As an organization successfully migrates a handful of applications they begin to encounter repeated patterns of issues they have already solved. One of the big goals with Kai is helping an organization tap into these prior solutions and leverage them for new migration needs, hence what we refer to as a ‘Solved Example’.

How do we determine if we have a ‘Solved Example’ for a given Analysis Incident

Kai will help address similar migration problems by using the data inside of Konveyor which has an organization wide view of the entire application portfolio.

Kai is able to find occurrences of when an application previously had the same problem and then was successfully migrated.

Use of ‘Solved Examples’ in a Few Shot Prompt

Once Kai has found how a similar problem has been solved elsewhere in the organization it extracts part of that information and gives it to the LLM to help consider as a ‘few shot’ example.

LLM Specific Concerns

As the team began working with LLMs we identified a few concerns that influenced our approach with Kai.

Limited Context Size in LLMs
Need to handle changes that cascade throughout a repository
Desire to work with knowledge not in the model (an organization’s custom frameworks)
Model capabilities are quickly improving
Iterate on responses with a LLM to improve the solution

#1 Limited Context Size in LLMs

LLMs have a limitation on the size of data they will consider when forming a response, this is called their context size. This limitation makes it impractical to ingest an entire source code repository into each request for most models. Kai approaches this limitation by leveraging Konveyor’s static code analysis to discover migration issues and use those identified migration issues as the natural boundaries for scoping the problem to smaller subsets.

#2 How to handle changes that cascade throughout a repository

The first iteration of Kai is focused on splitting work into impacted files as identified from source code analysis. These impacted files are then run through Kai and an updated file is produced. We quickly identified that this approach was lacking the ability to understand changes that will ripple or cascade throughout a repository, such as changing the signature of a method and needing to update each place it is called in external files.

This area of handling cascaded changes is what we refer to as ‘Phase 2’ and is the next set of development efforts the team is undertaking. Phase 2, inspired by the work of Microsoft in their CodePlan: Repository-level Coding using LLMs and Planning paper involves cascading changes throughout a repository of code. We can detect what the changed code “touches” by leveraging the LSP server (similar to what we already use in analyzer-lsp). Each of those changes is then fed back into the algorithm, over and over, until no more changes are necessary.

#3 Desire to work with knowledge not in the model

The nature of modernization engagements in organizations is that a large number of the issues faced deal with internal propietary frameworks. It is unlikely that an existing model will have data on these frameworks. We are leveraging the RAG pattern described previously to help mitigate this concern by supplying few-shot prompt examples with extra contextual information of how the organization has solved this problem in the past.

Going beyond this RAG pattern we see future work paths to ease the integration of exposing Konveyor’s data into on-premise AI platforms such as Open Data Hub. This integration would support an organization’s ability for fine tuning a local model on the data they have collected in Konveyor.

#4 Model capabilities are quickly improving

We’ve built Kai to be model agnostic so organizations may experiment with new models as they are released.

Kai may be configured with coordinates to a LLM provider, allowing for use with public AI providers, internally hosted LLMs, or even local LLMs. We introduced a concept of being able to tweak generated prompts based on the ‘family’ of LLM being used to help facilitate freedom with exploring various models.

#5 Iterate on responses with a LLM to improve the solution

We are building the notion of an Agent into Kai to help improve the quality of a LLM’s final solution by working with the LLM to iterate on a smaller scoped solution and then running that solution through several tools to help check the validity and go back to the LLM to address problems found.

Next Steps?

The team is working towards integration with Konveyor in Summer 2024, yet plans to remain in the konveyor-ecosystem for a few more months as the solution is implemented and improved, we expect to pursue being an official Konveyor component later in 2024.

Repositories:

If you are interested to collaborate or have questions, consider:

Joining our biweekly community calls
Filing a GitHub Issue - https://github.com/konveyor-ecosystem/kai/issues
Send an email to the konveyor-dev@googlegroups.com mailing list
Chat with us at kubernetes.slack.com #konveyor
- For slack invites - Join Kubernetes on Slack