Manifund

@e4ee50fa-e4a5-4e08-83ba-059e8ef22d02

$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

Projects

Token-level interpretability of long-chain reasoning in transformer models (pending grant agreement signature)
Toward an Equivalence-Level Theory of Transformer Models

Comments

Token-level interpretability of long-chain reasoning in transformer models

3 days ago

Research update — June 2026

Since posting this project, I have continued the literature review and theoretical formulation. The project has narrowed into a more concrete technical question:

Can chain-of-thought reasoning be modelled as a learned local transition over hidden representation states, repeatedly applied across decoding time, and validated through token-level long-chain representation trajectories?

This is a sharpening of the original proposal rather than a pivot. The original project focused on token-level interpretability of long-chain reasoning in transformer models. The current formulation makes that object more precise: the aim is to understand the local transition by which a model moves through representation states as it generates each token in a chain of thought.
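
One possible way to write this object down, offered only as an illustrative sketch and not as the project's settled formalism: let s_t stand for the representation state after t generated tokens, y_t for the token emitted at step t, and θ for the model weights. All three symbols, and the names T and π, are assumptions introduced here for illustration.

```latex
% Illustrative notation only; s_t, y_t, T_\theta, \pi_\theta are assumed names.
\begin{align}
  y_t     &\sim \pi_\theta(\,\cdot \mid s_t), \\
  s_{t+1} &= T_\theta(s_t, y_t), \qquad t = 0, 1, \dots, N-1, \\
  s_N     &= \bigl( T_\theta(\cdot, y_{N-1}) \circ \cdots \circ T_\theta(\cdot, y_0) \bigr)(s_0).
\end{align}
```

In this reading, "local" means each step depends only on the current state and the token just emitted, and the same θ appears at every step, so a chain of N tokens is an N-fold composition of one learned map.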

The relevant literature already covers many adjacent ingredients: CoT as intermediate computation, CoT as extra serial depth, CoT expressivity, sample-efficiency gains from CoT, test-time compute, latent/non-natural-language reasoning substrates, mechanistic state tracking, and representation trajectories during reasoning. The gap I am now targeting is the synthesis: a predictive or mechanistic account of CoT as a weight-tied, prefix-recursive local transition system over representation states.
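
To make "weight-tied local transition over representation states" concrete, here is a minimal toy sketch. It assumes the simplest possible case: a single linear map shared across all steps and chains, fitted by least squares to synthetic trajectories. The linear form, the synthetic data, and every function name below are illustrative assumptions, not the project's method; real trajectories would come from a model's hidden states.

```python
import numpy as np


def stack_pairs(trajectories):
    """Collect (h[t], h[t+1]) pairs from every chain into two matrices."""
    X = np.concatenate([traj[:-1] for traj in trajectories], axis=0)
    Y = np.concatenate([traj[1:] for traj in trajectories], axis=0)
    return X, Y


def fit_shared_transition(trajectories):
    """Fit one matrix A such that h[t+1] ~= h[t] @ A for all steps (weight-tied)."""
    X, Y = stack_pairs(trajectories)
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A


def one_step_error(A, trajectories):
    """Mean squared one-step prediction error of the transition A."""
    X, Y = stack_pairs(trajectories)
    return float(np.mean(np.sum((X @ A - Y) ** 2, axis=1)))


def simulate(A_true, n_chains, n_steps, noise, rng):
    """Synthetic stand-in for hidden-state trajectories (assumption: linear + noise)."""
    d = A_true.shape[0]
    chains = []
    for _ in range(n_chains):
        h = rng.normal(size=d)
        traj = [h]
        for _ in range(n_steps - 1):
            h = h @ A_true + noise * rng.normal(size=d)
            traj.append(h)
        chains.append(np.stack(traj))
    return chains


rng = np.random.default_rng(0)
d = 4
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
A_true = 0.95 * Q  # a stable ground-truth transition for the toy data

train = simulate(A_true, n_chains=20, n_steps=30, noise=0.01, rng=rng)
test = simulate(A_true, n_chains=5, n_steps=30, noise=0.01, rng=rng)

A_hat = fit_shared_transition(train)
err = one_step_error(A_hat, test)
baseline = one_step_error(np.eye(d), test)  # "state does not change" baseline
print(f"fitted: {err:.5f}  identity baseline: {baseline:.5f}")
```

The comparison against the identity baseline is the point of the sketch: a transition account is only interesting if a single shared map predicts held-out trajectory steps better than trivial alternatives. A nonlinear or token-conditioned transition would replace `fit_shared_transition` without changing the evaluation loop.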

This gives the project a clearer near-term timeline. My current working plan is:

1. finish the literature-grounded reformulation by the end of June;

2. spend July deriving and stress-testing the central technical insight;

3. spend August setting up and running experiments, drafting the manuscript, and preparing a publishable result if the direction continues to hold.

The target remains an ICLR-oriented technical result if the work produces sufficiently strong evidence. If the strongest version of the hypothesis fails, the fallback output would be a narrower methodological paper or useful negative result clarifying the limits of token-level representation-trajectory modelling.