An AI Companion for Developers to Tune Complex Technologies
(Co-advisor Yunus Durmuş)
In this project, we aim to create an AI companion that helps developers tune complex technologies such as Apache Spark. Apache Spark is a distributed data processing engine with many features: it has hundreds of configuration settings of its own, and the platform it runs on adds further properties to tune. Many developers struggle to follow best practices; they write underperforming code that uses more resources than needed and takes longer to finish. Companies often end up hiring external experts for troubleshooting.
Our goal is to create an AI companion to guide the developer. The AI companion does not write code from scratch. Instead, after an application is deployed, it helps with troubleshooting and tuning. We foresee the following steps:
- The companion collects statistics and details of jobs. In the case of Spark, the Spark UI provides all of these details.
- Based on these statistics and the query plans, the AI companion should highlight pain points, e.g., a join statement shuffles large amounts of data when a broadcast join could be used instead, or the data is poorly partitioned and hence there is not enough parallelism.
- It should read the code and match the pain points identified from the statistics and job plans to the exact location in the code.
- Together with the input data sources and the code, the AI companion should first show the problematic parts and then offer fixes.
- We can provide best practices manually, or the companion can pull them from the internet.
- Finally, when an error is thrown in the logs, the AI companion should explain to the developer what the problem is and what the possible solutions are.
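As a sketch of the first two steps, the companion could pull per-stage metrics from the Spark UI's monitoring REST API (served by the driver, by default on port 4040 under /api/v1) and flag stages that shuffle large amounts of data. The endpoint path and the `shuffleReadBytes`/`shuffleWriteBytes` field names come from Spark's documented REST API; the threshold, helper names, and sample values are our own illustrative choices.

```python
import json
from urllib.request import urlopen

def fetch_stages(driver_url, app_id):
    """Fetch per-stage metrics from the Spark UI REST API."""
    with urlopen(f"{driver_url}/api/v1/applications/{app_id}/stages") as resp:
        return json.load(resp)

def flag_heavy_shuffle(stages, threshold_bytes=1 << 30):
    """Return (stage name, shuffled bytes) for stages whose total shuffle
    traffic exceeds the threshold, heaviest first."""
    flagged = []
    for stage in stages:
        shuffled = stage.get("shuffleReadBytes", 0) + stage.get("shuffleWriteBytes", 0)
        if shuffled > threshold_bytes:
            flagged.append((stage.get("name", "unknown"), shuffled))
    return sorted(flagged, key=lambda pair: -pair[1])

# Example with the JSON shape the /stages endpoint returns (values invented):
sample = [
    {"name": "join at etl.py:42", "shuffleReadBytes": 8 << 30, "shuffleWriteBytes": 0},
    {"name": "map at etl.py:10", "shuffleReadBytes": 0, "shuffleWriteBytes": 0},
]
print(flag_heavy_shuffle(sample))  # only the join stage is flagged
```

The stage `name` field already embeds a call site (e.g., `join at etl.py:42`), which is one handle the companion can use to map a pain point back to the exact location in the code.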
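For the broadcast-join pain point above, a crude first heuristic could scan the physical plan text (what `df.explain()` prints) for a shuffle-based sort-merge join and suggest a broadcast hint when one input is known to be small. The 10 MB default mirrors Spark's `spark.sql.autoBroadcastJoinThreshold`; everything else here is an illustrative sketch, and a real companion would walk Spark's structured plan rather than match strings.

```python
def suggest_broadcast(plan_text, small_side_bytes, threshold_bytes=10 * 1024 * 1024):
    """Suggest a broadcast join when the plan uses a sort-merge join
    and the smaller input fits under the (assumed) broadcast threshold."""
    if "SortMergeJoin" in plan_text and small_side_bytes < threshold_bytes:
        return ("Smaller join input fits in memory: consider "
                "df_large.join(broadcast(df_small), key) to avoid the shuffle.")
    return None

# A fragment resembling PySpark's df.explain() output (invented for illustration):
plan = "== Physical Plan ==\n*(5) SortMergeJoin [id], [id], Inner"
print(suggest_broadcast(plan, small_side_bytes=2 * 1024 * 1024))
```

In PySpark the fix itself is a one-line hint, `df_large.join(broadcast(df_small), "id")` with `broadcast` from `pyspark.sql.functions`, which replaces the shuffle with a copy of the small table on every executor.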
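For the last step, one simple design is to collect the failing log excerpt and the matched code location and assemble a prompt for an LLM. The template and helper below are illustrative; the actual model call (whichever LLM API is used) is left out.

```python
def build_error_prompt(error_excerpt, code_snippet, max_log_chars=4000):
    """Assemble an LLM prompt asking for an explanation of a Spark error
    and possible fixes, keeping only the tail of a long log."""
    return (
        "You are a Spark tuning assistant. A job failed with the log excerpt below.\n"
        "Explain what the problem is and list possible solutions.\n\n"
        f"Log excerpt:\n{error_excerpt[-max_log_chars:]}\n\n"
        f"Relevant code:\n{code_snippet}\n"
    )

prompt = build_error_prompt(
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "df.groupBy('user').agg(collect_list('event'))",
)
print(prompt)
```

Truncating from the tail keeps the stack trace, which in Spark logs usually follows the error message; best-practice documents pulled from the internet could later be prepended as additional context.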
This is a demanding project in which the students will learn about LLMs, Apache Spark, and cloud technologies.
It is best tackled by two students working together. Apache Spark is an example technology; if the students have experience with another technology, we can consider that too.