Can a Traffic-Light Test Catch False AI Claims About Ancient Scripts?

Project summary:

AI can sound confident before a claim has been tested. This project builds a simple public way to test AI-related interpretation claims against evidence before people believe them.

Egyptian hieroglyphics are the starting point because the script is already understood. That makes Egyptian the right control case: AI-related claims can be tested against known answers instead of guesses. If the method works here, it can later be extended to more uncertain scripts, including Meroitic, and eventually to harder undeciphered-script claims.

AI does not “figure it out.” Humans make claims. This pilot tests those claims with evidence, reviewer checks, and AI traffic logic logged as input — not authority.

Comic summary: One claim. One evidence trail. Two independent reviews. One recorded outcome.

Video overview

I made a short video explaining the project in plain language: why Egyptian hieroglyphics are the control case, how one claim becomes one evidence trail, how reviewer checks and AI comments are logged, and why each case receives a Green, Yellow, or Red outcome.

Watch the video overview here: https://drive.google.com/file/d/1zblJXrUnLx54IZSduhIe7eixNhsGzNFk/view?usp=drivesdk

What this pilot will produce

Minimum funding ($12,000) will produce:

10 completed Egyptian test cases
reviewer checks where possible
AI traffic-light logic labels
a public dataset

A short report showing which claims held, weakened, remained unresolved, or failed.

Full goal ($15,000) will expand the pilot to:

12–15 cases
stronger reviewer coverage
cleaner dataset preparation
expanded final documentation across a four-month timeline

What I built

I created a one-page prototype called the Falsifiability Sheet: a structured record for testing one interpretation claim.

Each sheet captures:
the claim being tested
the object, image, or inscription
the evidence used
reviewer notes
AI comments
one final traffic-light outcome

The traffic-light outcome is simple:

Green = supported

Yellow = uncertain or unresolved

Red = unsupported, overconfident, or failed

The purpose is not to force agreement. The purpose is to make the reasoning visible, testable, and public.

Example: one Egyptian claim tested through evidence, reviewer checks, AI comments, traffic-light logic, and one recorded result.

Why Egypt first

This pilot starts with Egyptian hieroglyphics because the script is already understood. I am not trying to re-decipher Egyptian. I am using it as a stable testing ground to evaluate whether the method can test claims clearly, repeatably, and publicly.

Starting with a known script reduces risk. The method can be tested against known answers before it is applied to more uncertain materials.

How this project works

This project works at the claim level:

one claim → one evidence trail → reviewer checks where possible → one AI record → one public result

A case means one claim tested against one object, image, or inscription.

1. Build the case record

Each case gets one Falsifiability Sheet containing the claim, evidence, uncertainty labels, reviewer notes, and final outcome.

2. Test the claim against evidence

AI will not guess freely. It will review only a fixed evidence packet: the image, claim, evidence, notes, and uncertainty labels.

3. Add reviewer checks where possible

R1/R2 reviewers — two independent readers where available — check the evidence trail and flag disagreements, contradictions, or weak links.

4. Record the result publicly

Each case receives one traffic-light result:

Green = supported
Yellow = uncertain or unresolved
Red = unsupported, overconfident, or failed

The final output will be a public dataset and short report showing which claims held, weakened, remained unresolved, or failed.

Goals

Test whether the Falsifiability Sheet can evaluate interpretation claims using Egyptian hieroglyphics as the control case
Produce 10 completed Egyptian test sheets
Publish a public dataset showing which claims held, weakened, remained unresolved, or failed
Show whether the method can scale from a known control case toward more uncertain ancient-script claims

Why this can scale

The method is not tied to one script. It is tied to a repeatable process:
one claim, one evidence trail, restrained AI review, reviewer checks where possible, and one recorded outcome.
Egyptian hieroglyphics provide the known baseline. Meroitic or other partially understood scripts would provide the next test of uncertainty. Undeciphered materials would come later, only after the method has been tested on easier ground first.
The broader aim is not to solve every ancient script. It is to build a reusable way to test interpretation claims as uncertainty increases.

How funding will be used

$7,000 — Project lead: analysis, case preparation, documentation
$3,000–$4,000 — Independent reviewers: R1/R2 checks where possible
$1,000–$2,000 — Publishing, dataset preparation, and materials
Minimum ($12,000): complete a 10-case pilot with full dataset and report.
Goal ($15,000): expand to 12–15 cases, increase reviewer coverage, and strengthen dataset quality across a four-month timeline.

Who is leading this project?

This project is led by Michael Grasa, an independent researcher working at the intersection of ancient-script interpretation, AI verification, and public falsifiability tools.

This work builds on prior support from Emergent Ventures at the Mercatus Center, George Mason University, and conference-stage research presented in New Delhi, India, in 2025, with proceedings forthcoming. That conference-stage work began as Version 1 of the method and has since developed into the current Version 5 Falsifiability Sheet. The conference abstract appears on page 43 of the Book of Abstracts.

The framework has been developed through multiple iterations (V1–V5) and includes structured peer review (R1/R2) and AI accountability tracking.

Previous work and public outputs

Emergent Ventures reference: Marginal Revolution search result
Conference abstract: Book of Abstracts, page 43
Zenodo: DOI-linked research archive
LinkedIn: public-facing updates and engagement
GitHub: open project materials

The project is built around a simple public method: one claim, one evidence trail, reviewer checks where possible, AI comments, and one recorded traffic-light result.

Nonprofit and fiscal support is also available through Echoes of the Script for partners or donors who prefer a formal route.

Risks and outcomes if unsuccessful:

Independent reviewers may disagree
Some cases may remain unresolved
The method may need revision before wider use

Even if the pilot is unsuccessful, it will still produce a public record of what worked, what failed, and where the method needs revision. The goal is not to force success, but to produce transparent, testable outcomes.

Prior funding

Emergent Ventures at the Mercatus Center, George Mason University: approximately $12,000 received in the last 6 months.