Base class for any model producing P(correct) over (subject, item) cells.
Subclasses implement predict(), which accepts a long-form query
(a dict of 1-D index tensors) and returns one probability per row.
forward() is a thin wrapper that delegates to predict(),
so model(query) works via nn.Module.__call__().
Each subclass declares the keys it consumes via expected_keys.
The default ("subject_idx","item_idx") covers every IRT-style
model; condition-aware or trial-aware models extend it.
Although the forward pass is defined in forward(), call the Module
instance (model(query)) rather than forward() directly:
nn.Module.__call__() runs any registered hooks, whereas a direct
forward() call silently skips them.
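A minimal sketch of the Predictor contract, assuming only PyTorch. The class names `ToyPredictor` and `ConstantPredictor` are hypothetical stand-ins, not the library's classes, and the missing-key check in forward() is an assumption added for illustration. It shows predict() consuming a dict of 1-D index tensors, forward() delegating to it, and a forward hook firing only when the module instance is called.

```python
import torch
from torch import nn


class ToyPredictor(nn.Module):
    """Hypothetical stand-in for the Predictor base class described above."""

    # Keys this model reads from the long-form query dict.
    expected_keys = ("subject_idx", "item_idx")

    def predict(self, query: dict) -> torch.Tensor:
        raise NotImplementedError

    def forward(self, query: dict) -> torch.Tensor:
        # Assumed check, added for illustration: fail early on missing columns.
        missing = [k for k in self.expected_keys if k not in query]
        if missing:
            raise KeyError(f"query is missing keys: {missing}")
        # Thin wrapper: model(query) -> nn.Module.__call__ -> forward -> predict.
        return self.predict(query)


class ConstantPredictor(ToyPredictor):
    """Hypothetical subclass that returns the same probability for every row."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def predict(self, query: dict) -> torch.Tensor:
        n = query["subject_idx"].shape[0]
        return torch.full((n,), self.p)


model = ConstantPredictor(0.7)
query = {"subject_idx": torch.tensor([0, 0, 1]), "item_idx": torch.tensor([0, 2, 1])}

calls = []
model.register_forward_hook(lambda module, args, output: calls.append(output.shape))
probs = model(query)   # goes through __call__, so the hook fires
model.forward(query)   # direct call, so the hook is skipped
```

The hook list ends up with a single entry, which is the practical reason to prefer model(query) over model.forward(query).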
Abstract base for factor-based Item Response Theory models.
Specialises Predictor for models with explicit ability and
difficulty parameters that compose into a per-cell probability via
a logistic link. Subclasses implement predict() (inherited from
Predictor) by gathering parameters at the query indices and
applying the IRT formula — see _irt_probability().
For non-factor predictors (TabPFN-style, neural baselines), inherit
Predictor directly instead.
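A sketch of how a factor-based subclass might implement predict() by gathering parameters at the query indices and applying a logistic link. The real _irt_probability() signature is not shown here, so `irt_probability` and `ToyRasch` below are hypothetical; the helper covers the Rasch case and the 3PL form c + (1 - c) * sigmoid(a * (theta - b)) used elsewhere in these docs.

```python
import torch
from torch import nn


def irt_probability(theta, b, a=None, c=None):
    """Hypothetical helper: c + (1 - c) * sigmoid(a * (theta - b))."""
    logit = theta - b if a is None else a * (theta - b)
    p = torch.sigmoid(logit)
    return p if c is None else c + (1 - c) * p


class ToyRasch(nn.Module):
    """Hypothetical Rasch-style model: one ability per subject, one difficulty per item."""

    expected_keys = ("subject_idx", "item_idx")

    def __init__(self, n_subjects: int, n_items: int):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(n_subjects))  # abilities
        self.b = nn.Parameter(torch.zeros(n_items))          # difficulties

    def predict(self, query: dict) -> torch.Tensor:
        # Gather parameters at the query indices: one value per long-form row.
        theta = self.theta[query["subject_idx"]]
        b = self.b[query["item_idx"]]
        return irt_probability(theta, b)

    def forward(self, query: dict) -> torch.Tensor:
        return self.predict(query)


model = ToyRasch(n_subjects=2, n_items=1)
with torch.no_grad():
    model.theta.copy_(torch.tensor([1.0, -1.0]))
p = model({"subject_idx": torch.tensor([0, 1]), "item_idx": torch.tensor([0, 0])})
```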
Parameters:
data (LongFormData | torch.Tensor) – Either a LongFormData (canonical
long-form input, one row per observation) or a wide-form
response tensor of shape (n_subjects, n_items). For wide-form input,
missing entries may be encoded as NaN or -1.
mask (torch.Tensor | None) – Only used when data is a wide-form tensor — boolean mask of
entries to use for fitting. Inferred from NaN/-1 when None.
Ignored for long-form input (absent rows are absent observations).
method (str) – Fitting method: "mle", "em", "jml", or "svi".
max_epochs (int) – Maximum number of optimization epochs.
Parameters:
data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form response tensor of
shape (n_subjects, n_items). For wide-form input, missing entries
may be encoded as NaN or -1.
mask (torch.Tensor | None) – Only used with wide-form input — boolean mask of entries to use.
Inferred from NaNs if None.
device (str or torch.device or None) – Device for the returned tensors. None uses the torch default.
Returns:
{"subject_idx": LongTensor(n_subjects * n_items,), "item_idx": LongTensor(n_subjects * n_items,)}.
Row order is subject-major: all of subject 0’s items first, then subject 1’s items, and so on.
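The subject-major ordering can be reproduced with two tensor ops. This is a sketch: `cartesian_query_sketch` is hypothetical and takes the grid sizes directly, whereas the documented cartesian_query() derives them from `data`.

```python
import torch


def cartesian_query_sketch(n_subjects: int, n_items: int, device=None) -> dict:
    """Hypothetical re-implementation of the subject-major Cartesian query."""
    # Subject index repeats each value n_items times: 0,0,0,1,1,1,...
    subject_idx = torch.arange(n_subjects, device=device).repeat_interleave(n_items)
    # Item index cycles through all items once per subject: 0,1,2,0,1,2,...
    item_idx = torch.arange(n_items, device=device).repeat(n_subjects)
    return {"subject_idx": subject_idx, "item_idx": item_idx}


q = cartesian_query_sketch(2, 3)
```

Because rows are subject-major, reshaping any per-row output to (n_subjects, n_items) puts subject s in row s.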
Predict over the full (n_subjects,n_items) Cartesian grid.
Convenience wrapper that builds the query with cartesian_query(), passes
it to model.predict(), and reshapes the result back to an
(n_subjects, n_items) matrix. Use it for visualization, EM quadrature,
and other callers that genuinely need the dense view.
Parameters:
model (Predictor) – Any predictor with a (n_subjects,n_items) universe.
**extra_keys (torch.Tensor) – Additional query columns required by the model’s expected_keys
beyond subject_idx / item_idx. Each must be 1-D of length
n_subjects*n_items.
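A sketch of the dense-grid wrapper, assuming `predict` is any callable that maps a query dict to one probability per row. The function name and the `condition_idx` column in the usage are hypothetical; the length check on extra columns mirrors the documented requirement.

```python
import torch


def predict_grid_sketch(predict, n_subjects: int, n_items: int, **extra_keys) -> torch.Tensor:
    """Hypothetical: evaluate `predict` on every (subject, item) cell, return a matrix."""
    query = {
        "subject_idx": torch.arange(n_subjects).repeat_interleave(n_items),
        "item_idx": torch.arange(n_items).repeat(n_subjects),
    }
    for name, column in extra_keys.items():
        # Extra columns must line up with the subject-major rows.
        if column.shape != (n_subjects * n_items,):
            raise ValueError(f"{name} must be 1-D of length n_subjects*n_items")
        query[name] = column
    probs = predict(query)
    # Subject-major rows mean row s of the result holds subject s's items.
    return probs.reshape(n_subjects, n_items)


theta = torch.tensor([0.0, 2.0])
b = torch.tensor([0.0, 1.0, 2.0])
rasch = lambda q: torch.sigmoid(theta[q["subject_idx"]] - b[q["item_idx"]])
P = predict_grid_sketch(rasch, 2, 3)  # equals sigmoid(theta[:, None] - b[None, :])
```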
Fit the Ising model via Maximum Pseudo-Likelihood Estimation.
Minimises, over all observed (subject, item) pairs, the summed binary
cross-entropy of each item's response given the same subject's responses to all other items.
Parameters:
data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form binary response tensor
of shape (n_subjects,n_items). NaN or -1 marks missing.
mask (torch.Tensor | None) – Only used with wide-form input — boolean mask of observed
entries. Inferred from NaNs if None.
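A minimal pseudo-likelihood sketch, assuming 0/1 response coding, fields `h`, and a symmetric coupling matrix `J` with zero diagonal, so that logit P(x_ij = 1 | rest) = h_j + sum over k != j of J_jk x_ik. The parameterisation, the zero-filling of missing neighbours, and the Adam settings are assumptions, not the library's implementation.

```python
import torch
import torch.nn.functional as F


def symmetric_coupling(J: torch.Tensor) -> torch.Tensor:
    """Symmetrise J and zero its diagonal (no self-coupling)."""
    J_sym = (J + J.T) / 2
    return J_sym - torch.diag(torch.diag(J_sym))


def pseudo_nll(X, mask, h, J):
    """Summed BCE of each observed item given the subject's other items.

    X: (n_subjects, n_items) in {0, 1}, missing cells assumed pre-filled with 0.
    mask: bool tensor, True where X is observed.
    """
    logits = h + X @ symmetric_coupling(J)  # logits[i, j] = h_j + sum_k J_jk x_ik
    losses = F.binary_cross_entropy_with_logits(logits, X, reduction="none")
    return losses[mask].sum()


def fit_ising_mple(X, mask, epochs=200, lr=0.05):
    n_items = X.shape[1]
    h = torch.zeros(n_items, requires_grad=True)
    J = torch.zeros(n_items, n_items, requires_grad=True)
    opt = torch.optim.Adam([h, J], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = pseudo_nll(X, mask, h, J)
        loss.backward()
        opt.step()
    return h.detach(), symmetric_coupling(J).detach()
```

In the check below, items 0 and 1 are identical copies, so their fitted coupling should come out strongly positive.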
Gaussian Graphical Model for continuous response data.
Estimates a sparse precision matrix K via the GraphicalLasso objective,
optimised with Adam using a Cholesky parameterisation to ensure K remains
positive definite.
Parameters:
n_items (int) – Number of items (nodes in the network).
lam (float) – L1 regularisation strength on off-diagonal precision entries.
Larger values produce sparser networks.
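Minimises
\[-\log\det K + \operatorname{tr}(SK) + \lambda \sum_{i \neq j} |K_{ij}|,\]
where S is the empirical covariance of the responses, with K
constrained to be positive definite via a Cholesky parameterisation.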
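A sketch of the objective above under one assumed Cholesky parameterisation: K = L Lᵀ, where L has free strictly-lower entries and a diagonal stored on the log scale, so K is positive definite by construction and log det K = 2 Σᵢ log Lᵢᵢ. The function name, step count, and learning rate are illustrative assumptions.

```python
import torch


def fit_glasso_sketch(S: torch.Tensor, lam: float, steps: int = 500, lr: float = 0.05) -> torch.Tensor:
    """Minimise -logdet K + tr(S K) + lam * sum_{i != j} |K_ij| with K = L L^T."""
    p = S.shape[0]
    raw = torch.zeros(p, p, requires_grad=True)  # upper triangle is unused
    off_diag = 1.0 - torch.eye(p)                # weights: penalise off-diagonal only
    opt = torch.optim.Adam([raw], lr=lr)

    def cholesky_factor():
        # Strictly-lower part is free; exp() keeps the diagonal positive.
        return torch.tril(raw, diagonal=-1) + torch.diag(torch.exp(torch.diagonal(raw)))

    for _ in range(steps):
        opt.zero_grad()
        L = cholesky_factor()
        K = L @ L.T
        logdet = 2 * torch.diagonal(raw).sum()   # log det K = 2 * sum(log diag L)
        loss = -logdet + torch.trace(S @ K) + lam * (K.abs() * off_diag).sum()
        loss.backward()
        opt.step()
    with torch.no_grad():
        L = cholesky_factor()
        return L @ L.T


S = torch.tensor([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
K_dense = fit_glasso_sketch(S, lam=0.0)   # approaches inverse(S)
K_sparse = fit_glasso_sketch(S, lam=1.0)  # off-diagonal entries shrink toward 0
```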
Parameters:
data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form continuous response
tensor of shape (n_subjects,n_items). NaN or -1 marks missing.
mask (torch.Tensor | None) – Only used with wide-form input — boolean mask. Inferred from
NaNs if None.
Models the probability that subject a beats subject b as:
\[P(a > b) = \sigma(\theta_a - \theta_b)\]
Mathematically equivalent to Rasch, but the “item” axis is itself a
subject — so predict(query) consumes subject_idx (the A-side)
and item_idx (the B-side).
Parameters:
n_subjects (int) – Number of subjects (e.g., LLMs).
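Because both query columns index subjects, one ability vector is gathered twice. A minimal sketch (the helper name is hypothetical):

```python
import torch


def bt_probability(theta: torch.Tensor, subject_idx: torch.Tensor, item_idx: torch.Tensor) -> torch.Tensor:
    """P(a beats b) = sigmoid(theta_a - theta_b); item_idx indexes the B-side subject."""
    return torch.sigmoid(theta[subject_idx] - theta[item_idx])


theta = torch.tensor([1.0, 0.0, -1.0])
# Rows: (0 vs 1), (1 vs 0), (2 vs 2).
p = bt_probability(theta, torch.tensor([0, 1, 2]), torch.tensor([1, 0, 2]))
```

Swapping the two sides gives complementary probabilities, and a subject paired with itself gets exactly 0.5.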
Identical to Rasch in prediction: mu=sigmoid(theta-b).
Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing
continuous responses in (0, 1) such as empirical probabilities.
Parameters:
n_subjects (int) – Number of subjects (test-takers / models).
n_items (int) – Number of items (test questions / benchmark tasks).
phi (float) – Beta distribution precision parameter. Higher values mean
tighter concentration around the predicted mean. Default 10.0.
response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items).
Values must lie strictly between 0 and 1; the endpoints are excluded.
mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.
method (str) – Fitting method: “mle”, “em”, or “jml”.
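A sketch of the fitting loss, assuming the usual mean-precision form Beta(mu * phi, (1 - mu) * phi), whose mean is mu; the function names are hypothetical. torch.distributions.Beta validates its support by default, which is one reason responses of exactly 0 or 1 are rejected.

```python
import torch
from torch.distributions import Beta


def beta_nll(mu: torch.Tensor, y: torch.Tensor, phi: float = 10.0) -> torch.Tensor:
    """Summed NLL of y under Beta(mu * phi, (1 - mu) * phi), whose mean is mu."""
    dist = Beta(mu * phi, (1 - mu) * phi)
    return -dist.log_prob(y).sum()


def rasch_mean(theta, b, subject_idx, item_idx):
    # Same prediction as Rasch: mu = sigmoid(theta - b).
    return torch.sigmoid(theta[subject_idx] - b[item_idx])


mu = torch.tensor([0.7])
near, far = beta_nll(mu, torch.tensor([0.7])), beta_nll(mu, torch.tensor([0.1]))
```

Raising phi concentrates the density around mu, so a response equal to mu costs less under phi = 50 than under phi = 10.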
Identical to TwoPL in prediction: mu=sigmoid(a*(theta-b)).
Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing
continuous responses in (0, 1) such as empirical probabilities.
Parameters:
n_subjects (int) – Number of subjects (test-takers / models).
n_items (int) – Number of items (test questions / benchmark tasks).
phi (float) – Beta distribution precision parameter. Higher values mean
tighter concentration around the predicted mean. Default 10.0.
response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items).
Values must lie strictly between 0 and 1; the endpoints are excluded.
mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.
method (str) – Fitting method: “mle”, “em”, or “jml”.
Instead of learning independent parameters for each item, this model
learns a mapping from item embeddings to item parameters (difficulty,
discrimination, guessing). This enables zero-shot prediction on new
items given their embeddings.
P(correct) = c + (1-c) * sigmoid(a * (theta - b))
where b, a, c = f(embedding) are predicted by a neural network.
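A sketch of f(embedding), assuming a small MLP with three outputs; the architecture, the softplus constraint keeping a positive, and the sigmoid keeping c in (0, 1) are illustrative choices, not the model's documented design. `ItemParamNet` and `p_correct` are hypothetical names.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ItemParamNet(nn.Module):
    """Hypothetical f(embedding) -> (b, a, c) head for amortized 3PL parameters."""

    def __init__(self, embed_dim: int, hidden: int = 32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, item_embeddings: torch.Tensor):
        raw = self.body(item_embeddings)
        b = raw[:, 0]                 # difficulty: unconstrained
        a = F.softplus(raw[:, 1])     # discrimination: kept positive
        c = torch.sigmoid(raw[:, 2])  # guessing: kept in (0, 1)
        return b, a, c


def p_correct(theta: torch.Tensor, item_embeddings: torch.Tensor, net: ItemParamNet) -> torch.Tensor:
    # P(correct) = c + (1 - c) * sigmoid(a * (theta - b)), one value per row.
    b, a, c = net(item_embeddings)
    return c + (1 - c) * torch.sigmoid(a * (theta - b))


torch.manual_seed(0)
net = ItemParamNet(embed_dim=8)
new_items = torch.randn(5, 8)  # embeddings of items never seen during fitting
p = p_correct(torch.zeros(5), new_items, net)
```

Zero-shot prediction falls out of the design: a new item only needs an embedding, not fitted parameters of its own.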
Compute P(correct) at query rows for the given facet level(s).
Query must contain subject_idx and item_idx (1-D, length N).
Optionally facet_idx (1-D of length N, or a scalar). When omitted, it
defaults to facet level 0, matching the earlier behavior in which
fitting did not expose facet information.
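The three accepted forms of facet_idx can be normalised to one 1-D index per row. A sketch with a hypothetical helper name:

```python
import torch


def resolve_facet_idx(query: dict) -> torch.Tensor:
    """Hypothetical helper: return a 1-D facet index of length N for the query."""
    n = query["subject_idx"].shape[0]
    facet = query.get("facet_idx")
    if facet is None:
        # Default: every row uses facet level 0.
        return torch.zeros(n, dtype=torch.long)
    facet = torch.as_tensor(facet, dtype=torch.long)
    if facet.dim() == 0:
        # Scalar: broadcast one level to all N rows.
        return facet.expand(n)
    if facet.shape != (n,):
        raise ValueError("facet_idx must be a scalar or 1-D of length N")
    return facet


q = {"subject_idx": torch.tensor([0, 1, 2]), "item_idx": torch.tensor([0, 0, 1])}
```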
Anchor a facet level to zero (e.g., English baseline).
Forces gamma[level_idx]=0 and tau[:,level_idx]=0 at both
fit and predict time. Also zeros delta[:,level_idx] (the subject
intercept under the reference facet is absorbed by ability).
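One way to enforce the anchor at both fit and predict time is to multiply the parameters by a fixed 0/1 mask instead of editing them in place. A sketch under assumed shapes: gamma is (n_levels,), delta is (n_subjects, n_levels), and tau is taken to be (n_items, n_levels), although its first axis is not stated above. `AnchoredFacetParams` is a hypothetical container, not the model's class.

```python
import torch
from torch import nn


class AnchoredFacetParams(nn.Module):
    """Hypothetical container whose reference facet level is pinned to zero."""

    def __init__(self, n_items: int, n_subjects: int, n_levels: int, ref_level: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.randn(n_levels))
        self.tau = nn.Parameter(torch.randn(n_items, n_levels))       # assumed shape
        self.delta = nn.Parameter(torch.randn(n_subjects, n_levels))
        keep = torch.ones(n_levels)
        keep[ref_level] = 0.0
        self.register_buffer("keep", keep)  # moves with .to(device), not trained

    def anchored(self):
        # The raw entries at ref_level still exist, but they are multiplied by 0:
        # the model only ever sees zeros there and their gradients are zero.
        return self.gamma * self.keep, self.tau * self.keep, self.delta * self.keep


torch.manual_seed(0)
params = AnchoredFacetParams(n_items=4, n_subjects=3, n_levels=3, ref_level=0)
gamma, tau, delta = params.anchored()
```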
A constrained factor model with one general factor and multiple
group-specific factors. The general factor loads on all items,
while group factors load only on items in their cluster.
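The bifactor constraint is a fixed loading pattern: column 0 (general factor) is free for every item, and each item has exactly one free group column. A sketch assuming a slope-intercept logistic link P = sigmoid(theta · a_j - b_j); both function names are hypothetical.

```python
import torch


def bifactor_loading_mask(testlet_map: torch.Tensor, n_groups: int) -> torch.Tensor:
    """Hypothetical: (n_items, 1 + n_groups) 0/1 mask of which loadings are free."""
    n_items = testlet_map.shape[0]
    mask = torch.zeros(n_items, 1 + n_groups)
    mask[:, 0] = 1.0                                    # general factor: every item
    mask[torch.arange(n_items), 1 + testlet_map] = 1.0  # one group factor per item
    return mask


def bifactor_probability(theta, raw_loadings, b, mask):
    # theta: (n_subjects, 1 + n_groups); loadings outside the pattern are zeroed.
    loadings = raw_loadings * mask
    return torch.sigmoid(theta @ loadings.T - b)  # (n_subjects, n_items)


m = bifactor_loading_mask(torch.tensor([0, 0, 1]), n_groups=2)
```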
Build a testlet mapping from hierarchical item identifiers.
Parameters:
item_ids (list[str]) – Item identifiers with testlet structure, e.g.
["task_1:0", "task_1:1", "task_2:0", ...].
The prefix before the separator identifies the testlet.
separator (str) – Delimiter between testlet name and sub-item index.
Returns:
testlet_map (torch.Tensor) – Integer tensor of shape (n_items,) mapping each item to
its testlet index.
testlet_names (list[str]) – Ordered list of unique testlet names (first-seen order).
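A sketch of the mapping, assuming the testlet name is everything before the last occurrence of the separator (the sub-item index sits at the end) and that the separator defaults to ":". The function name is hypothetical; the real default separator is not stated above.

```python
import torch


def build_testlet_map_sketch(item_ids: list[str], separator: str = ":"):
    """Hypothetical re-implementation: group items by the prefix before `separator`."""
    testlet_names: list[str] = []
    name_to_idx: dict[str, int] = {}
    indices: list[int] = []
    for item_id in item_ids:
        # rsplit keeps any separator inside the testlet name intact.
        prefix = item_id.rsplit(separator, 1)[0]
        if prefix not in name_to_idx:
            # First time we see this testlet: assign the next index.
            name_to_idx[prefix] = len(testlet_names)
            testlet_names.append(prefix)
        indices.append(name_to_idx[prefix])
    return torch.tensor(indices, dtype=torch.long), testlet_names


testlet_map, names = build_testlet_map_sketch(["task_1:0", "task_1:1", "task_2:0", "task_1:2"])
```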
Uses the next-token yes/no log-probability ratio from a causal language
model to predict whether a subject would answer an item correctly.
Optionally prepends same-subject in-context examples from labeled.
Compute response probability P(subject passes item).
Parameters:
data (dict) – Dictionary with keys "subject_content", "item_content",
"benchmark", and "condition".
labeled (list[dict] or None) – Previously observed subject-item-response records with keys
"subject_content", "item_content", "benchmark",
"condition", and "label" (float in [0, 1]). Same-subject
records are prepended as in-context examples.
Returns:
Predicted probability that the subject passes the item, clipped to
[1e-7, 1 - 1e-7].
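Renormalising the two next-token probabilities gives P(yes) / (P(yes) + P(no)) = sigmoid(log P(yes) - log P(no)). The sketch below assumes the log-probabilities have already been read from the language model; the helper names are hypothetical and prompt construction is omitted because its format is not specified here.

```python
import math


def yes_no_probability(logp_yes: float, logp_no: float, eps: float = 1e-7) -> float:
    """P(pass) = exp(lp_yes) / (exp(lp_yes) + exp(lp_no)) = sigmoid(lp_yes - lp_no)."""
    d = logp_yes - logp_no
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if d >= 0:
        p = 1.0 / (1.0 + math.exp(-d))
    else:
        e = math.exp(d)
        p = e / (1.0 + e)
    # Clip to [eps, 1 - eps], matching the documented output range.
    return min(max(p, eps), 1.0 - eps)


def same_subject_examples(data: dict, labeled):
    """Hypothetical filter: labeled records that share data's subject_content."""
    if not labeled:
        return []
    return [r for r in labeled if r["subject_content"] == data["subject_content"]]
```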