@rubyaftermidnight Hi! That's great feedback, thanks so much for the detailed comment!!
To be clear, I think the version you were trying was one of the earliest iterations. What I wanted to show at that point was just the idea of "tagging" the reasons why the system thinks something is / is not AI — I didn't expect it to perform well, since it was a super early experiment. My bad for not being clearer about that. I've since added a disclaimer to the website that I think helps, would that work for you?
> This system is a very early experimental prototype. Do NOT trust the model's predictions for real-world decisions; we've trained it for very few steps. Also keep in mind that we iterate frequently, so any outputs you see here may change significantly as development continues. Our goal is to showcase how a real system could work — for example, by showing the annotated reasons behind each prediction — but you should not expect it to perform well, at least for now.
Since then, I've been working on better training — including longer runs — and I think the system is much better (and more nuanced) now. Would you have some time to test it out and let me know if you think this iteration is better? (I added some examples to the website if you want inspiration.) :)