Ruby M.

@rubyaftermidnight

llm tamer

https://feli.fyi
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I'm ruby, an ex-language analyst currently working as a high-flow community manager in an affiliate organization. Over the past few years I learned that my real passion is language models: how they work, how they interact with people, and how they impact us and their own development. My experience studying language and history, along with my time as a cryptologic language analyst translating Pashto in the USAF, has geared me toward a unique, very valuable niche within this field that lets me engage in novel ways without ever losing my own voice.

Comments

TELL: The AI Detector You *Don't* Have To Trust

Ruby M.

4 days ago

I fear your demonstration puts too much weight on whether a piece of text sounds like it came from an assistant, which suggests a narrow training foundation of mostly assistant-prompted generated text. It is quite good at telling when something sounds like stock-standard AI slop, but I feel anyone can do that. When I change just a few characters (em dashes to hyphens, curly quotes to standard quotation marks, one period removed from an ellipsis) and replace the speakers' names with Speaker 1/Speaker 2, the estimate drops from -0.15 human (wrong) to -0.95 human (wrong).
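
To be concrete, the surface tweaks I mean look roughly like this; a minimal sketch in Python that I wrote for illustration, not anything from TELL, and the speaker list is just whatever names appear in the transcript:

    import re

    def perturb(text: str, speakers: list[str]) -> str:
        # em dashes -> plain hyphens
        text = text.replace("\u2014", "-")
        # curly ("sided") quotes -> standard straight quotation marks
        text = text.replace("\u201c", '"').replace("\u201d", '"')
        text = text.replace("\u2018", "'").replace("\u2019", "'")
        # drop one period from the first ellipsis
        text = text.replace("...", "..", 1)
        # swap real speaker names for Speaker 1 / Speaker 2 / ...
        for i, name in enumerate(speakers, start=1):
            text = re.sub(rf"\b{re.escape(name)}\b", f"Speaker {i}", text)
        return text

Run the original and the perturbed version through the detector and compare the two scores; that is all I did to get the swing above.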

I threw in a few passages from rather nuanced, heavy conversations I had, as well as some conversations I had with a smaller model. It got perfect marks against the local Llama model that fits on my GPU. But at the moment it seems to be worse than a coin flip for ambiguous text (which is exactly the gap these detectors are supposed to fill; obvious AI text is obvious), which is confusing. Pangram's research paper on their method seems to describe what you are doing, except done in a way that doesn't rely on AI to rate AI work. Instead, it attacks the false positive rate by continually training on false positives, so the model learns what a false positive actually looks like, and that has improved Pangram's results explosively in my opinion. I thought it was useless before, but their methodology is worth examining. Here, the explanations the models gave, both on the experimental page and the other ones I saw in the network requests (as one broke), massively underestimated what an LLM can do and made sweeping assumptions about the depth or philosophical difficulty of a passage.
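
My rough reading of that false-positive idea, sketched in Python just to make the loop concrete; classify and train here are hypothetical placeholders I made up, not Pangram's actual pipeline or code:

    def false_positive_loop(model, human_texts, ai_texts, rounds=3):
        for _ in range(rounds):
            # collect human-written texts the current model wrongly flags as AI
            false_positives = [t for t in human_texts if model.classify(t) == "ai"]
            if not false_positives:
                break
            # fold those mistakes back into training so the model learns
            # what its own false positives look like
            model.train(ai_examples=ai_texts,
                        human_examples=human_texts + false_positives)
        return model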

At times, it seems to rate a passage purely on whether it sounds empathetic enough to be human and doesn't sound fake. Unfortunately, we live in a time where nuanced, well-spoken AI will be commonplace soon enough, so that's not good enough either. I struggle to know whether more training would solve this, because if the problem is distinguishing genuinely meaningful text from fake-meaningful text that an AI writes, and the judge is itself an AI, how can it know what is meaningful and what isn't?

I submitted all my entries with correct/incorrect feedback so you can review them if you want. It's an interesting idea for an alternative solution to this problem, with an uncommon training style for the purpose, but I fail to see it demonstrating the things it's intended to show off.