Any idea if we can use state-of-the-art code-LLM dense embeddings along with a trainable classifier to identify the different code quality metrics? We wouldn't have to worry about manually identifying the code metric thresholds. However, we would need a good labeled dataset. Your thoughts?
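Roughly, the pipeline I have in mind is something like the sketch below: embed each snippet with a pretrained code model, then train a small classifier on top. The model choice ("microsoft/codebert-base"), the binary good/bad labels and the tiny training set are just placeholders, not a recommendation.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(snippets):
    """Mean-pool the last hidden state to get one dense vector per snippet."""
    inputs = tokenizer(snippets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state      # (batch, tokens, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Hypothetical labeled data: 1 = "good" code, 0 = "needs work".
train_snippets = [
    "def add(a, b):\n    return a + b",
    "def f(a, b, c, d, e, g, h, i, j, k, m):\n    pass",
]
train_labels = [1, 0]

clf = LogisticRegression().fit(embed(train_snippets), train_labels)
print(clf.predict(embed(["def mul(x, y):\n    return x * y"])))
```

The open question is where the labels come from, which is exactly what I'm asking about.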
@valentinapakhomova7833 · a year ago
Hi! I see three possible cases here:

1. You prepare a big dataset of 'good' code and give it to the AI. The question here is: how do YOU know that the code you put in the dataset is good? You based your judgement on some factors, and it would be more efficient to write those factors down and analyse them yourself. That way you end up with a list of metrics & threshold values without spending the time & resources to build a dataset and train the AI.

2. You could use your whole codebase as the dataset and tell the AI that this code is good (since it has passed review and been merged to master). The AI would then try to define code quality metrics & threshold values specific to your company, but most likely these metrics will not be strict. If it finds a lot of places where you have 10+ arguments in a function, it will consider that to be OK and therefore set a higher error reference value. If you're fine with that, then this approach could work for you. Additionally, if we still consider all code that gets merged to master to be 'good', we can set up continuous training by sending the AI everything that gets merged and telling it that this is what good code looks like. But can you be sure the reviewers will always be strict and attentive? I highly doubt it. Because of that, the AI's standards would keep degrading, and in the end it would work as if you weren't trying to control code quality at all.

3. Let's let our imagination run wild and expand on the previous case. Imagine that sometime in the future it becomes possible to track how much time a developer spent reading and understanding a piece of code, how many times it has been refactored, and how many bugs were found in it (we would also need to distinguish bugs that appeared because the code was poorly written from bugs that appeared because the developer simply did not understand the requirements). We would have to come up with an optimal formula (for example: the number of bugs should be 0, the number of refactors should also be 0, and the code should be readable at 0.5 seconds per line). If we give the AI the entire codebase plus this data as an initial dataset, then in theory it could surface interesting metrics that we would not have thought of ourselves. But I believe such possibilities are still too far in the future :)

Bottom line: given the current state of LLM dense embeddings & trainable classifiers, I believe it's better to define the metrics & thresholds yourself, since that gives you the flexibility to make them as strict as you want them to be.
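To make the "define the metrics & thresholds yourself" route concrete, here is a minimal sketch of one such check: counting function arguments against a threshold using Python's ast module. The limit of 5 is an arbitrary illustrative value; in practice you would pick the thresholds that match your own standards.

```python
import ast

MAX_ARGS = 5  # illustrative threshold, tune to your own standards

def check_argument_count(source: str):
    """Yield (function name, argument count) for functions over the threshold."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            n_args = len(node.args.args) + len(node.args.kwonlyargs)
            if n_args > MAX_ARGS:
                yield node.name, n_args

code = "def report(a, b, c, d, e, f, g, h, i, j, k):\n    pass"
for name, count in check_argument_count(code):
    print(f"{name} takes {count} arguments (limit {MAX_ARGS})")
```

A handful of small, explicit checks like this one are cheap to run in CI and, unlike a trained classifier, they stay exactly as strict as you configure them to be.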