Models

Measurement models: IRT, factor models, network models, and rotation utilities.

class torch_measure.models.Predictor(n_subjects, n_items, device='cpu')

Base class for any model producing P(correct) over (subject, item) cells.

Subclasses implement predict(), which accepts a long-form query (a dict of 1-D index tensors) and returns one probability per row. forward() is a thin wrapper that delegates to predict(), so model(query) works via nn.Module.__call__().

Each subclass declares the keys it consumes via expected_keys. The default ("subject_idx", "item_idx") covers every IRT-style model; condition-aware or trial-aware models extend it.

abstractmethod predict(query)

Predict P(correct) for each row of query.

Parameters:

query (dict[str, torch.Tensor]) – Must contain a 1-D tensor for each name in expected_keys, all of equal length N. Extra keys are ignored.

Returns:

Predicted probabilities, shape (N,) on the model’s device.

Return type:

torch.Tensor

forward(query)

Delegate to predict(query), so that calling the model as model(query) goes through nn.Module.__call__().

Subclasses implement predict() rather than overriding this method.

Note

Call the module instance (model(query)) rather than forward() directly: nn.Module.__call__() runs any registered hooks, whereas calling forward() directly silently skips them.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
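A minimal sketch of the predict()/forward() contract, assuming only PyTorch. ToyRasch is a hypothetical stand-in written for illustration: it mirrors the documented behaviour (an expected_keys tuple, one probability per query row, extra keys ignored, forward() delegating to predict()) but does not import or subclass the library's Predictor.

```python
from __future__ import annotations

import torch
from torch import nn


class ToyRasch(nn.Module):
    """Hypothetical stand-in mirroring the Predictor contract (not the library class)."""

    expected_keys = ("subject_idx", "item_idx")

    def __init__(self, n_subjects: int, n_items: int):
        super().__init__()
        self.ability = nn.Parameter(torch.zeros(n_subjects))
        self.difficulty = nn.Parameter(torch.zeros(n_items))

    def predict(self, query: dict[str, torch.Tensor]) -> torch.Tensor:
        # Gather parameters at the query indices; keys outside expected_keys are ignored.
        s, i = (query[k] for k in self.expected_keys)
        return torch.sigmoid(self.ability[s] - self.difficulty[i])

    def forward(self, query: dict[str, torch.Tensor]) -> torch.Tensor:
        # Thin wrapper: model(query) runs hooks, then lands here and delegates.
        return self.predict(query)


model = ToyRasch(n_subjects=3, n_items=4)
query = {
    "subject_idx": torch.tensor([0, 2, 1]),
    "item_idx": torch.tensor([3, 0, 0]),
    "note": torch.tensor([9, 9, 9]),  # extra key, ignored
}
probs = model(query)  # shape (3,); every entry is 0.5 at initialisation
```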

class torch_measure.models.IRTModel(n_subjects, n_items, device='cpu')

Abstract base for factor-based Item Response Theory models.

Specialises Predictor for models with explicit ability and difficulty parameters that compose into a per-cell probability via a logistic link. Subclasses implement predict() (inherited from Predictor) by gathering parameters at the query indices and applying the IRT formula — see _irt_probability().

For non-factor predictors (TabPFN-style, neural baselines), inherit Predictor directly instead.

fit(data, mask=None, method='mle', max_epochs=1000, lr=0.01, verbose=True, **kwargs)

Fit the model.

Parameters:
  • data (LongFormData | torch.Tensor) – Either a LongFormData (canonical long-form input — every observation is one row) or a wide-form response tensor of shape (n_subjects, n_items). For wide-form, missing entries may be encoded as NaN or -1.

  • mask (torch.Tensor | None) – Only used when data is a wide-form tensor — boolean mask of entries to use for fitting. Inferred from NaN/-1 when None. Ignored for long-form input (absent rows are absent observations).

  • method (str) – Fitting method: "mle", "em", "jml", or "svi".

  • max_epochs (int) – Maximum number of optimization epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Whether to show a progress bar.

Returns:

Training history with loss values.

Return type:

dict
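For wide-form input, missing entries may be NaN or -1 and the mask is inferred when mask=None. A minimal sketch of that inference and of the equivalent long-form view (one row per observed cell), assuming a plain float tensor; wide_to_long is a hypothetical helper, not a library function.

```python
from __future__ import annotations

import torch


def wide_to_long(responses: torch.Tensor, mask: torch.Tensor | None = None) -> dict[str, torch.Tensor]:
    """Hypothetical helper: wide (n_subjects, n_items) -> one row per observed cell."""
    if mask is None:
        # NaN and -1 both mark missing entries in wide-form input.
        mask = ~torch.isnan(responses) & (responses != -1)
    subject_idx, item_idx = mask.nonzero(as_tuple=True)  # row-major order
    return {
        "subject_idx": subject_idx,
        "item_idx": item_idx,
        "response": responses[subject_idx, item_idx],
    }


wide = torch.tensor([[1.0, 0.0, float("nan")],
                     [-1.0, 1.0, 1.0]])
long = wide_to_long(wide)
# long["subject_idx"] -> tensor([0, 0, 1, 1])
# long["item_idx"]    -> tensor([0, 1, 1, 2])
```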

class torch_measure.models.NetworkModel(n_items, device='cpu')

Abstract base class for network psychometric models.

Network models characterize the conditional dependence structure among items rather than estimating per-subject latent traits. They expose:

  • .fit(data, …) to estimate network parameters

  • .adjacency to access the estimated edge weight matrix

  • .centrality(measure) for common node centrality metrics

Unlike IRTModel, there is no notion of subjects or per-subject ability — the model is defined entirely over items.

abstract property adjacency: Tensor

Edge weight matrix of shape (n_items, n_items).

Symmetric with zero diagonal. Positive values indicate positive conditional dependence; negative values indicate negative dependence.

Returns:

Detached weight matrix, shape (n_items, n_items).

Return type:

torch.Tensor

abstractmethod fit(data, mask=None, max_epochs=1000, lr=0.01, verbose=True, **kwargs)

Estimate network parameters.

Parameters:
  • data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form response tensor of shape (n_subjects, n_items). For wide-form, missing entries may be encoded as NaN or -1.

  • mask (torch.Tensor | None) – Only used with wide-form input — boolean mask of entries to use. Inferred from NaNs if None.

  • max_epochs (int) – Maximum optimisation epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Show progress bar.

Returns:

Training history with "losses" key.

Return type:

dict

centrality(measure='strength')

Compute node centrality from the estimated adjacency matrix.

Parameters:

measure (str) – One of "strength", "expected_influence", "closeness", or "betweenness".

Returns:

Centrality scores per item, shape (n_items,).

Return type:

torch.Tensor
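Strength (sum of absolute edge weights) and expected influence (signed sum) have standard definitions in network psychometrics. A sketch assuming those definitions on a zero-diagonal adjacency matrix; the library may normalise differently, and closeness and betweenness (which need shortest-path computations) are omitted.

```python
import torch


def strength(adjacency: torch.Tensor) -> torch.Tensor:
    # Sum of absolute edge weights per node (diagonal assumed zero).
    return adjacency.abs().sum(dim=1)


def expected_influence(adjacency: torch.Tensor) -> torch.Tensor:
    # Signed sum: negative edges reduce a node's influence.
    return adjacency.sum(dim=1)


W = torch.tensor([[0.0, 0.4, -0.2],
                  [0.4, 0.0, 0.1],
                  [-0.2, 0.1, 0.0]])
s = strength(W)             # approximately [0.6, 0.5, 0.3]
ei = expected_influence(W)  # approximately [0.2, 0.5, -0.1]
```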

torch_measure.models.cartesian_query(n_subjects, n_items, device=None)

Build the (subject, item) Cartesian-product query of size n_subjects * n_items.

Useful when a caller wants the dense matrix of predictions; see predict_dense() for the common shortcut.

Parameters:
  • n_subjects (int) – Number of subjects in the universe.

  • n_items (int) – Number of items in the universe.

  • device (str or torch.device or None) – Device for the returned tensors. None uses the torch default.

Returns:

{"subject_idx": LongTensor (n_subjects*n_items,), "item_idx": LongTensor (n_subjects*n_items,)}. Row order is subject-major: all of subject 0’s items first, then subject 1’s items, etc.

Return type:

dict[str, torch.Tensor]
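The documented subject-major order is the row-major flattening of an (n_subjects, n_items) grid. A sketch that reproduces it with plain PyTorch indexing; cartesian_query_sketch is a hypothetical name, not the library function.

```python
import torch


def cartesian_query_sketch(n_subjects: int, n_items: int, device=None) -> dict:
    """Hypothetical re-implementation of the documented subject-major ordering."""
    # Each subject index repeated n_items times: 0,0,0,1,1,1,...
    subject_idx = torch.arange(n_subjects, device=device).repeat_interleave(n_items)
    # The full item range tiled once per subject: 0,1,2,0,1,2,...
    item_idx = torch.arange(n_items, device=device).repeat(n_subjects)
    return {"subject_idx": subject_idx, "item_idx": item_idx}


q = cartesian_query_sketch(2, 3)
# q["subject_idx"] -> tensor([0, 0, 0, 1, 1, 1])
# q["item_idx"]    -> tensor([0, 1, 2, 0, 1, 2])
```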

torch_measure.models.predict_dense(model, **extra_keys)

Predict over the full (n_subjects, n_items) Cartesian grid.

Convenience wrapper around cartesian_query() + model.predict, reshaped back to a (n_subjects, n_items) matrix. Use this for visualization, EM quadrature, and other callers that genuinely want the dense view.

Parameters:
  • model (Predictor) – Any predictor with a (n_subjects, n_items) universe.

  • **extra_keys (torch.Tensor) – Additional query columns required by the model’s expected_keys beyond subject_idx / item_idx. Each must be 1-D of length n_subjects * n_items.

Returns:

Probability matrix of shape (n_subjects, n_items).

Return type:

torch.Tensor
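Because the query is subject-major, reshaping the flat prediction vector with view(n_subjects, n_items) puts subject s in row s and item i in column i. A sketch with a Rasch-style stand-in for model.predict (toy parameters, not a fitted model), building the grid with torch.meshgrid(..., indexing="ij"), which flattens to the same order cartesian_query() is documented to produce:

```python
import torch

n_subjects, n_items = 3, 4
ability = torch.tensor([-1.0, 0.0, 1.0])            # toy subject parameters
difficulty = torch.tensor([0.5, 0.0, -0.5, 1.0])   # toy item parameters

# "ij" indexing + flatten gives subject-major rows, like cartesian_query().
s_grid, i_grid = torch.meshgrid(torch.arange(n_subjects), torch.arange(n_items), indexing="ij")
query = {"subject_idx": s_grid.reshape(-1), "item_idx": i_grid.reshape(-1)}

# Stand-in for model.predict(query): one probability per query row.
flat = torch.sigmoid(ability[query["subject_idx"]] - difficulty[query["item_idx"]])
dense = flat.view(n_subjects, n_items)  # row = subject, column = item

# Same matrix as broadcasting the parameters directly.
assert torch.allclose(dense, torch.sigmoid(ability[:, None] - difficulty[None, :]))
```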

class torch_measure.models.IsingModel(n_items, device='cpu')

Ising model for binary response data.

Estimates the pairwise conditional dependence structure between items via Maximum Pseudo-Likelihood Estimation (MPLE).

Parameters:
  • n_items (int) – Number of items (nodes in the network).

  • device (str) – Device to place parameters on.

thresholds

Node threshold parameters τ, shape (n_items,).

Type:

nn.Parameter

adjacency

Estimated symmetric edge weight matrix Θ, shape (n_items, n_items), zero diagonal.

Type:

torch.Tensor

Examples

>>> model = IsingModel(n_items=10)
>>> history = model.fit(binary_responses, max_epochs=500, verbose=False)
>>> W = model.adjacency          # (10, 10) edge weights
>>> s = model.centrality("strength")  # (10,) strength centrality

property adjacency: Tensor

Symmetric edge weight matrix (n_items, n_items), zero diagonal.

conditional_probs(response_matrix)

Compute item-conditional response probabilities given all other items.

For each subject i and item j:

P(Xᵢⱼ = 1 | Xᵢ,₋ⱼ) = sigmoid(τⱼ + Σₖ≠ⱼ Θⱼₖ Xᵢₖ)

Parameters:

response_matrix (torch.Tensor) – Binary response matrix (n_subjects, n_items).

Returns:

Conditional probabilities (n_subjects, n_items).

Return type:

torch.Tensor
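Because Θ is symmetric with a zero diagonal, the sum over k ≠ j is a single matrix product, so every conditional probability can be computed at once. A sketch assuming a fully observed binary matrix and plain tensors tau and theta (hypothetical names standing in for the fitted thresholds and adjacency):

```python
import torch


def ising_conditional_probs(X: torch.Tensor, tau: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    # Zero the diagonal so item j never conditions on itself.
    theta = theta - torch.diag(torch.diagonal(theta))
    # (X @ theta)[i, j] = sum_k X[i, k] * theta[k, j] = sum_{k != j} theta[j, k] * X[i, k]
    return torch.sigmoid(tau + X @ theta)


X = torch.tensor([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
tau = torch.tensor([-0.5, 0.0, 0.2])
theta = torch.tensor([[0.0, 0.8, -0.3],
                      [0.8, 0.0, 0.5],
                      [-0.3, 0.5, 0.0]])
P = ising_conditional_probs(X, tau, theta)  # shape (2, 3)
# P[0, 0] = sigmoid(-0.5 + 0.8 * 0 - 0.3 * 1) = sigmoid(-0.8)
```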

fit(data, mask=None, max_epochs=1000, lr=0.01, verbose=True, convergence_tol=1e-06, **kwargs)

Fit the Ising model via Maximum Pseudo-Likelihood Estimation.

Minimises the summed binary cross-entropy of each item given all other items across all observed (subject, item) pairs.

Parameters:
  • data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form binary response tensor of shape (n_subjects, n_items). NaN or -1 marks missing.

  • mask (torch.Tensor | None) – Only used with wide-form input — boolean mask of observed entries. Inferred from NaNs if None.

  • max_epochs (int) – Maximum optimisation epochs.

  • lr (float) – Adam learning rate.

  • verbose (bool) – Show tqdm progress bar.

  • convergence_tol (float) – Stop early if |Δloss| < tol.

Returns:

{"losses": [float, ...]}.

Return type:

dict
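A toy version of the pseudo-likelihood fit, assuming a fully observed binary matrix (the mask only selects which cells enter the loss), Adam as stated above, and a symmetrised, zero-diagonal parameterisation of Θ built from an unconstrained tensor raw (an assumption, not necessarily the library's internals). The simulated data, learning rate, and epoch count are arbitrary, and early stopping on convergence_tol is omitted.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_subjects, n_items = 200, 4
X = (torch.rand(n_subjects, n_items) < 0.5).float()
X[:, 1] = X[:, 0]  # make items 0 and 1 strongly dependent
mask = torch.ones_like(X, dtype=torch.bool)  # everything observed in this toy example

tau = torch.zeros(n_items, requires_grad=True)
raw = torch.zeros(n_items, n_items, requires_grad=True)
opt = torch.optim.Adam([tau, raw], lr=0.05)


def pseudo_nll() -> torch.Tensor:
    # Symmetrise and zero the diagonal to get a valid edge-weight matrix.
    theta = (raw + raw.T) / 2
    theta = theta - torch.diag(torch.diagonal(theta))
    logits = tau + X @ theta  # logit of P(X_ij = 1 | all other items)
    # Summed BCE of each item given the others, over observed cells only.
    return F.binary_cross_entropy_with_logits(logits[mask], X[mask], reduction="sum")


losses = []
for _ in range(300):
    opt.zero_grad()
    loss = pseudo_nll()
    loss.backward()
    opt.step()
    losses.append(loss.item())

theta_hat = ((raw + raw.T) / 2).detach()  # edge 0-1 should dominate
```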

class torch_measure.models.GaussianGraphicalModel(n_items, lam=0.1, device='cpu')

Gaussian Graphical Model for continuous response data.

Estimates a sparse precision matrix K via the GraphicalLasso objective, optimised with Adam using a Cholesky parameterisation to ensure K remains positive definite.

Parameters:
  • n_items (int) – Number of items (nodes in the network).

  • lam (float) – L1 regularisation strength on off-diagonal precision entries. Larger values produce sparser networks.

  • device (str) – Device to place parameters on.

precision

Estimated precision matrix K = LLᵀ, shape (n_items, n_items).

Type:

torch.Tensor

partial_correlations

Partial correlation matrix, shape (n_items, n_items). Diagonal is 1.

Type:

torch.Tensor

adjacency

Partial correlations with zero diagonal (edge weights).

Type:

torch.Tensor

Examples

>>> model = GaussianGraphicalModel(n_items=10, lam=0.1)
>>> history = model.fit(continuous_responses, max_epochs=500, verbose=False)
>>> pcor = model.partial_correlations   # (10, 10)
>>> s = model.centrality("strength")    # (10,) strength centrality

property precision: Tensor

Estimated precision matrix K = LLᵀ, shape (n_items, n_items).

property partial_correlations: Tensor

Partial correlation matrix derived from the precision matrix.

pcorᵢⱼ = −Kᵢⱼ / √(Kᵢᵢ · Kⱼⱼ) for i ≠ j, 1 on the diagonal.

Returns:

Shape (n_items, n_items). Values in [−1, 1].

Return type:

torch.Tensor
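The formula above vectorises directly: divide K by the outer product of √Kᵢᵢ, flip the sign, and set the diagonal to 1. A sketch with a hand-picked precision matrix describing a chain 0 - 1 - 2; partial_correlations here is a standalone function, not the library property.

```python
import torch


def partial_correlations(K: torch.Tensor) -> torch.Tensor:
    d = torch.sqrt(torch.diagonal(K))
    pcor = -K / (d[:, None] * d[None, :])  # -K_ij / sqrt(K_ii * K_jj)
    pcor.fill_diagonal_(1.0)               # diagonal is 1 by convention
    return pcor


K = torch.tensor([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
pcor = partial_correlations(K)
# pcor[0, 1] = 1 / sqrt(2 * 2) = 0.5; pcor[0, 2] = 0, so no edge between items 0 and 2
adjacency = pcor - torch.eye(3)  # zero diagonal, matching the adjacency property
```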

property adjacency: Tensor

Partial correlations with zero diagonal (edge weight matrix).

fit(data, mask=None, max_epochs=1000, lr=0.01, lam=None, verbose=True, convergence_tol=1e-06, **kwargs)

Fit the GGM via the GraphicalLasso objective.

Minimises −log det K + tr(S K) + λ · Σᵢ≠ⱼ |Kᵢⱼ| with K constrained to be positive definite via Cholesky parameterisation.

Parameters:
  • data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form continuous response tensor of shape (n_subjects, n_items). NaN or -1 marks missing.

  • mask (torch.Tensor | None) – Only used with wide-form input — boolean mask. Inferred from NaNs if None.

  • max_epochs (int) – Maximum optimisation epochs.

  • lr (float) – Adam learning rate.

  • lam (float | None) – Override the instance-level L1 regularisation strength.

  • verbose (bool) – Show tqdm progress bar.

  • convergence_tol (float) – Stop early if |Δloss| < tol.

Returns:

{"losses": [float, ...]}.

Return type:

dict
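The objective above can be minimised directly with autograd. A sketch assuming a fully observed data matrix and a Cholesky factor whose diagonal is exp-transformed (one common way to keep K positive definite; the library's exact parameterisation is not documented here), optimised with Adam as the class description states. The names raw, build_K, and objective are hypothetical.

```python
import torch

torch.manual_seed(0)
n, p, lam = 500, 3, 0.1
X = torch.randn(n, p, dtype=torch.float64)
X[:, 1] += 0.8 * X[:, 0]  # make items 0 and 1 conditionally dependent
Xc = X - X.mean(dim=0)
S = Xc.T @ Xc / n  # sample covariance

# Unconstrained parameters: strictly-lower part of raw plus exp(diagonal) form L.
raw = torch.zeros(p, p, dtype=torch.float64, requires_grad=True)
off_diag = ~torch.eye(p, dtype=torch.bool)
opt = torch.optim.Adam([raw], lr=0.05)


def build_K() -> torch.Tensor:
    L = torch.tril(raw, diagonal=-1) + torch.diag(torch.exp(torch.diagonal(raw)))
    return L @ L.T  # positive definite because diag(L) > 0


def objective() -> torch.Tensor:
    K = build_K()
    logdet = 2.0 * torch.diagonal(raw).sum()  # log det(L L^T) = 2 * sum(log diag L)
    return -logdet + torch.trace(S @ K) + lam * K[off_diag].abs().sum()


start = objective().item()
for _ in range(500):
    opt.zero_grad()
    loss = objective()
    loss.backward()
    opt.step()
K_hat = build_K().detach()  # K_hat[0, 1] < 0 means a positive partial correlation
```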

class torch_measure.models.BradleyTerry(n_subjects, device='cpu')

Bradley-Terry model for pairwise comparison data.

Models the probability that subject a beats subject b as:

\[P(a > b) = \sigma(\theta_a - \theta_b)\]

Mathematically equivalent to Rasch, but the “item” axis is itself a subject — so predict(query) consumes subject_idx (the A-side) and item_idx (the B-side).

Parameters:
  • n_subjects (int) – Number of subjects (e.g., LLMs).

  • device (str) – Device to place parameters on.

Examples

>>> from torch_measure.models import BradleyTerry, predict_dense
>>> model = BradleyTerry(n_subjects=3)
>>> predict_dense(model)  # (3, 3) win probability matrix

predict(query)

Compute P(a beats b) at query rows.

query["subject_idx"] is the A-side; query["item_idx"] is the B-side (also a subject index).

Parameters:

query (dict[str, Tensor])

Return type:

Tensor

predict_pairwise(subject_a, subject_b)

Domain-named convenience: P(a beats b) for explicit pair tensors.

Equivalent to self.predict({"subject_idx": subject_a, "item_idx": subject_b}).

Return type:

Tensor
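The full win-probability matrix can be sketched by broadcasting θₐ − θ_b over all pairs, assuming hypothetical strength values rather than a fitted model. P[a, b] + P[b, a] = 1 follows from σ(x) + σ(−x) = 1, and each diagonal entry is 0.5.

```python
import torch

theta = torch.tensor([1.0, 0.0, -0.5])  # hypothetical subject strengths

# P[a, b] = sigmoid(theta_a - theta_b) for every ordered pair, via broadcasting.
P = torch.sigmoid(theta[:, None] - theta[None, :])
# P[0, 2] = sigmoid(1.5); the matrix is "complementary": P + P.T is all ones.
```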

fit(comparisons, method='mle', max_epochs=1000, lr=0.01, regularization=0.01, convergence_tol=1e-06, verbose=True)

Fit the model to pairwise comparison data.

Parameters:
  • comparisons (PairwiseComparisons) – Pairwise comparison data with subject_a, subject_b, and outcome tensors.

  • method (str) – Fitting method: "mle" (Adam optimizer) or "jml" (LBFGS with L2 regularization).

  • max_epochs (int) – Maximum number of optimization epochs.

  • lr (float) – Learning rate.

  • regularization (float) – L2 regularization weight (only used for method="jml").

  • convergence_tol (float) – Stop if loss change is below this threshold.

  • verbose (bool) – Show progress bar.

Returns:

Training history with 'losses' key.

Return type:

dict
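A sketch of maximum-likelihood fitting with Adam on toy comparison tensors shaped like the subject_a, subject_b, and outcome fields above (plain tensors here, not a PairwiseComparisons object). Subject 0 never loses in this data, so the unpenalised MLE would diverge; a small L2 penalty is added to keep the estimates finite. Note that the documented API applies regularization only for method="jml"; it appears here purely to keep the toy example well posed.

```python
import torch
import torch.nn.functional as F

# Toy comparisons; outcome 1.0 means subject_a won.
subject_a = torch.tensor([0, 0, 1, 1, 2, 0])
subject_b = torch.tensor([1, 2, 2, 0, 0, 1])
outcome = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0, 1.0])

theta = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.1)
regularization = 0.01  # L2 penalty; keeps theta finite because subject 0 never loses

for _ in range(300):
    opt.zero_grad()
    logits = theta[subject_a] - theta[subject_b]  # log-odds that a beats b
    loss = F.binary_cross_entropy_with_logits(logits, outcome, reduction="sum")
    loss = loss + regularization * theta.pow(2).sum()
    loss.backward()
    opt.step()

strengths = theta.detach().tolist()  # expected order: subject 0 > 1 > 2
```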

class torch_measure.models.Rasch(n_subjects, n_items, device='cpu')

Rasch (1-Parameter Logistic) IRT model.

The simplest IRT model, where P(correct) = sigmoid(theta - b):

  • theta: subject ability (one per subject)

  • b: item difficulty (one per item)

No discrimination or guessing parameters.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • device (str) – Device to place parameters on.

predict(query)

Compute P(correct) = sigmoid(ability - difficulty) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
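Because the Rasch logit is simply theta - b, the log-odds gap between two subjects is the same on every item (the property Rasch called specific objectivity). A quick numeric check in plain Python; rasch_p and logit are illustrative helpers, not library functions.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    # P(correct) = sigmoid(theta - b)
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


theta_1, theta_2 = 1.2, -0.3
gaps = [logit(rasch_p(theta_1, b)) - logit(rasch_p(theta_2, b)) for b in (-1.0, 0.0, 2.5)]
# Every entry of gaps equals theta_1 - theta_2 = 1.5 (up to rounding), whatever the item.
```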

class torch_measure.models.TwoPL(n_subjects, n_items, device='cpu')

2-Parameter Logistic IRT model.

P(correct) = sigmoid(a * (theta - b)), where:

  • theta: subject ability

  • b: item difficulty

  • a: item discrimination (how well the item differentiates abilities)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • device (str) – Device to place parameters on.

property discrimination: Tensor

Item discrimination parameters (constrained to be positive).

predict(query)

Compute P(correct) = sigmoid(a * (theta - b)) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
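Discrimination controls how steep the item curve is at the item's difficulty: at theta = b the 2PL gives P = 0.5 and slope a / 4. A sketch that checks this with autograd, assuming positivity is enforced through exp of an unconstrained parameter (the documentation only says the parameter is constrained positive; exp is an assumption here).

```python
import torch

# Positivity via exp of an unconstrained parameter (assumed parameterisation).
log_a = torch.tensor(0.4)
a = torch.exp(log_a)
b = torch.tensor(0.3)

theta = b.clone().requires_grad_(True)  # evaluate the curve at theta = b
p = torch.sigmoid(a * (theta - b))
p.backward()
# At theta = b: p = 0.5 and dp/dtheta = a * 0.5 * (1 - 0.5) = a / 4,
# so a larger discrimination means a steeper curve at the item's difficulty.
slope = theta.grad
```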

class torch_measure.models.ThreePL(n_subjects, n_items, device='cpu')

3-Parameter Logistic IRT model.

P(correct) = c + (1 - c) * sigmoid(a * (theta - b)), where:

  • theta: subject ability

  • b: item difficulty

  • a: item discrimination

  • c: guessing parameter (lower asymptote)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • device (str) – Device to place parameters on.

property discrimination: Tensor

Item discrimination parameters (constrained positive).

property guessing: Tensor

Item guessing parameters (constrained to [0, 1]).

predict(query)

Compute P(correct) = c + (1-c) * sigmoid(a * (theta - b)) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
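A worked check of the 3PL curve in plain Python: c is the floor that very low abilities approach, and at theta = b the probability is c + (1 - c) / 2. Setting c = 0 gives the 2PL, and additionally setting a = 1 gives the Rasch model. three_pl is an illustrative helper, not a library function.

```python
import math


def three_pl(theta: float, a: float, b: float, c: float) -> float:
    # P(correct) = c + (1 - c) * sigmoid(a * (theta - b))
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


a, b, c = 1.5, 0.0, 0.25
low = three_pl(-20.0, a, b, c)   # close to the guessing floor c = 0.25
high = three_pl(20.0, a, b, c)   # close to 1
mid = three_pl(b, a, b, c)       # c + (1 - c) / 2 = 0.625
```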

class torch_measure.models.BetaRasch(n_subjects, n_items, phi=10.0, device='cpu')

Beta-Rasch (1PL) IRT model.

Identical to Rasch in prediction: mu = sigmoid(theta - b). Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing continuous responses in (0, 1) such as empirical probabilities.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • phi (float) – Beta distribution precision parameter. Higher values mean tighter concentration around the predicted mean. Default 10.0.

  • device (str) – Device to place parameters on.

fit(response_matrix, mask=None, method='mle', max_epochs=1000, lr=0.01, verbose=True, **kwargs)

Fit the Beta-Rasch model using Beta NLL loss.

Parameters:
  • response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items), with every value strictly between 0 and 1.

  • mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.

  • method (str) – Fitting method: "mle", "em", or "jml".

  • max_epochs (int) – Maximum optimization epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict
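A sketch of the Beta negative log-likelihood for a single cell, assuming the common mean-precision parameterisation alpha = mu * phi, beta = (1 - mu) * phi (consistent with phi being described as a precision around the predicted mean, but not stated explicitly). beta_nll is a hypothetical helper; torch.distributions.Beta is the standard PyTorch distribution.

```python
import torch
from torch.distributions import Beta


def beta_nll(y: torch.Tensor, mu: torch.Tensor, phi: float = 10.0) -> torch.Tensor:
    # Mean-precision parameterisation (assumed): alpha = mu * phi, beta = (1 - mu) * phi.
    return -Beta(mu * phi, (1.0 - mu) * phi).log_prob(y).sum()


mu = torch.sigmoid(torch.tensor([0.5]))  # Rasch mean with theta - b = 0.5, about 0.62
near = beta_nll(torch.tensor([0.62]), mu)
far = beta_nll(torch.tensor([0.10]), mu)
# near < far: responses close to mu cost less. At y = 0 or y = 1 the Beta log-density is
# not finite for these parameters, which is why responses must lie strictly inside (0, 1).
```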

class torch_measure.models.BetaTwoPL(n_subjects, n_items, phi=10.0, device='cpu')

Beta-2PL IRT model.

Identical to TwoPL in prediction: mu = sigmoid(a * (theta - b)). Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing continuous responses in (0, 1) such as empirical probabilities.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • phi (float) – Beta distribution precision parameter. Higher values mean tighter concentration around the predicted mean. Default 10.0.

  • device (str) – Device to place parameters on.

fit(response_matrix, mask=None, method='mle', max_epochs=1000, lr=0.01, verbose=True, **kwargs)

Fit the Beta-2PL model using Beta NLL loss.

Parameters:
  • response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items), with every value strictly between 0 and 1.

  • mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.

  • method (str) – Fitting method: "mle", "em", or "jml".

  • max_epochs (int) – Maximum optimization epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict

class torch_measure.models.AmortizedIRT(n_subjects, n_items, embedding_dim, hidden_dim=256, n_layers=3, pl=2, dropout=0.1, device='cpu')

Amortized IRT model.

Instead of learning independent parameters for each item, this model learns a mapping from item embeddings to item parameters (difficulty, discrimination, guessing). This enables zero-shot prediction on new items given their embeddings.

P(correct) = c + (1-c) * sigmoid(a * (theta - b))

where b, a, c = f(embedding) are predicted by a neural network.

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • embedding_dim (int) – Dimension of item embeddings.

  • hidden_dim (int) – Hidden dimension for the embedding projection network.

  • n_layers (int) – Number of layers in the projection network.

  • pl (int) – Number of IRT parameters: 1 (Rasch), 2 (+discrimination), 3 (+guessing).

  • dropout (float) – Dropout rate in the projection network.

  • device (str) – Device to place parameters on.

set_embeddings(embeddings)

Set item embeddings for parameter prediction.

Parameters:

embeddings (torch.Tensor) – Item embeddings of shape (n_items, embedding_dim).

Return type:

None

property difficulty: Tensor

Predicted item difficulties from embeddings.

property discrimination: Tensor | None

Predicted item discriminations from embeddings (2PL/3PL only).

property guessing: Tensor | None

Predicted item guessing parameters from embeddings (3PL only).

predict(query)

Compute P(correct) at query rows using amortized item parameters.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor

fit(data, embeddings, mask=None, max_epochs=1000, lr=0.001, weight_decay=0.0001, verbose=True, **kwargs)

Fit the amortized IRT model.

Parameters:
  • data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form response tensor.

  • embeddings (torch.Tensor) – Item embeddings (n_items, embedding_dim).

  • mask (torch.Tensor | None) – Boolean mask for observed entries (only used with wide-form input).

  • max_epochs (int) – Maximum training epochs.

  • lr (float) – Learning rate.

  • weight_decay (float) – Weight decay for Adam optimizer.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict
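A sketch of the amortisation idea: a small network maps each item embedding to (b, a, c), so a new item needs only an embedding, not its own fitted parameters. The layer sizes, softplus for positive discrimination, and sigmoid for a guessing value in (0, 1) are assumptions for illustration, not the library's projection network.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
embedding_dim, hidden_dim = 16, 32

# Hypothetical projection network: embedding -> (difficulty, raw discrimination, raw guessing).
head = nn.Sequential(
    nn.Linear(embedding_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(hidden_dim, 3),
)
head.eval()  # disable dropout for deterministic inference

embeddings = torch.randn(5, embedding_dim)  # 5 items, possibly never seen in training
with torch.no_grad():
    out = head(embeddings)
b = out[:, 0]
a = F.softplus(out[:, 1])     # positive discrimination (assumed transform)
c = torch.sigmoid(out[:, 2])  # guessing in (0, 1) (assumed transform)

ability = torch.tensor([0.2, -1.0])  # two existing subjects
# 3PL with amortised item parameters, broadcast to (n_subjects, n_items).
P = c + (1 - c) * torch.sigmoid(a * (ability[:, None] - b))
```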

class torch_measure.models.MultiFacetRasch(n_subjects, n_items, n_facet_levels, device='cpu')

Many-Facet Rasch Model.

Extends the standard Rasch model with an additional facet (e.g., language) to model systematic sources of variation beyond ability and difficulty.

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_facet_levels (int) – Number of levels in the additional facet (e.g., number of languages).

  • device (str) – Device to place parameters on.

set_reference_level(level_idx)

Set a facet level as the reference (anchor to zero).

Parameters:

level_idx (int) – Index of the reference level (e.g., 0 for English).

Return type:

None

predict(query)

Compute P(correct) at query rows for the given facet level(s).

Query must contain subject_idx and item_idx (1-D, length N) and may optionally contain facet_idx (1-D of length N, or a scalar). When facet_idx is omitted, facet level 0 is used, matching the earlier behavior in which fitting did not surface facet information.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor

fit(response_matrix, mask=None, method='mle', **kwargs)

Fit the model.

Supports all fitting methods: "mle", "em", "jml", and "svi".

class torch_measure.models.MultiFacet2PL(n_subjects, n_items, n_facet_levels, device='cpu')

Many-Facet 2PL IRT Model with anchoring (Bayesian SVI only).

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_facet_levels (int) – Number of levels in the additional facet (e.g., number of languages).

  • device (str) – Device to place parameters on.

Notes

Estimation is Bayesian SVI via Pyro. Install with: pip install torch_measure[bayesian]

property discrimination: Tensor

Per-item discrimination, constrained positive via exp.

set_reference_level(level_idx)

Anchor a facet level to zero (e.g., English baseline).

Forces gamma[level_idx] = 0 and tau[:, level_idx] = 0 at both fit and predict time. Also zeros delta[:, level_idx] (the subject intercept under the reference facet is absorbed by ability).

Parameters:

level_idx (int)

Return type:

None

set_anchor_items(item_indices)

Mark items whose tau should be near zero across all facet levels.

Anchor items get a tight (sd=0.01) Student-t prior on tau, encoding the assumption that their difficulty is invariant across the facet.

Parameters:

item_indices (Sequence[int] | Tensor)

Return type:

None

predict(facet_indices=None)

Compute response probabilities for one facet level.

Parameters:

facet_indices (torch.Tensor | None) – Single facet level index. If None, uses level 0.

Returns:

Probability matrix of shape (n_subjects, n_items).

Return type:

torch.Tensor

fit(subject_idx, item_idx, facet_idx, response, max_epochs=4000, lr=0.01, clip_norm=10.0, verbose=True, num_posterior_samples=500)

Fit via Bayesian SVI (Pyro).

Long-form quadruple input: each row is one observation (subject_idx[k], item_idx[k], facet_idx[k]) -> response[k].

Priors:

  • ability ~ Normal(0, 1)

  • difficulty ~ Normal(0, 1)

  • discrimination ~ LogNormal(0.5, 0.5) (positive)

  • gamma_raw ~ Normal(0, 1), then gamma = gamma_raw * gamma_mask

  • tau_scale ~ HalfCauchy(1); tau_raw ~ StudentT(1, 0, scale) with scale=0.01 at anchor cells, tau_scale elsewhere; then tau = tau_raw * tau_mask

  • delta_raw ~ Normal(0, 0.5), then delta = delta_raw * gamma_mask

Parameters:
  • subject_idx (torch.LongTensor) – Subject index of each observation, shape (n_obs,).

  • item_idx (torch.LongTensor) – Item index of each observation, shape (n_obs,).

  • facet_idx (torch.LongTensor) – Facet level of each observation, shape (n_obs,).

  • response (torch.Tensor) – Binary observations, shape (n_obs,).

  • max_epochs (int) – Number of SVI steps.

  • lr (float) – Learning rate for ClippedAdam.

  • clip_norm (float) – Gradient clipping norm.

  • verbose (bool) – Show tqdm progress bar if available.

  • num_posterior_samples (int) – Posterior samples for parameter extraction.

Returns:

{"losses": list[float], "posterior": {param_name: Tensor}} where posterior holds the posterior means used to populate the model’s parameter slots.

Return type:

dict
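The masks in the priors above are what make set_reference_level() and set_anchor_items() work: multiplying raw draws by a 0/1 mask pins the reference level to exactly zero, and anchor items get a tiny Student-t scale. A sketch of the mask arithmetic only (no Pyro model), assuming gamma has shape (n_facet_levels,), tau has shape (n_items, n_facet_levels), and delta has shape (n_subjects, n_facet_levels); these shapes are consistent with the indexing shown for set_reference_level() but are not stated explicitly, and the random draws merely stand in for prior samples.

```python
import torch

torch.manual_seed(0)
n_subjects, n_items, n_levels, ref = 4, 3, 3, 0

# 0/1 masks: the reference level (e.g. English) is pinned to zero.
gamma_mask = torch.ones(n_levels)
gamma_mask[ref] = 0.0
tau_mask = torch.ones(n_items, n_levels)
tau_mask[:, ref] = 0.0

# Stand-ins for prior draws of the raw parameters.
gamma_raw = torch.randn(n_levels)
tau_raw = torch.randn(n_items, n_levels)
delta_raw = 0.5 * torch.randn(n_subjects, n_levels)

gamma = gamma_raw * gamma_mask
tau = tau_raw * tau_mask
delta = delta_raw * gamma_mask  # same level mask, broadcast over subjects

# Per-cell Student-t scale for tau_raw: 0.01 for anchor items, tau_scale elsewhere.
anchor_items = torch.tensor([1])
tau_scale = 1.0  # stand-in for a HalfCauchy(1) draw
scale = torch.full((n_items, n_levels), tau_scale)
scale[anchor_items] = 0.01
```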

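class torch_measure.models.TestletRasch(n_subjects, n_items, testlet_map, device='cpu')

Rasch model with testlet random effects. testlet_map assigns each item to a testlet index, as produced by build_testlet_map().
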
property testlet_scale: Tensor

Empirical standard deviation of testlet effects per testlet.

Returns:

Shape (n_testlets,).

Return type:

torch.Tensor

predict(query)

Compute P(correct) at query rows, including testlet random effects.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor

class torch_measure.models.LogisticFM(n_subjects, n_items, n_factors=2, device='cpu')

K-factor Logistic Factor Model.

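P(correct) = sigmoid(U @ V^T + Z), with Z broadcast across subjects, where:

  • U: (n_subjects, K) latent ability factors

  • V: (n_items, K) item loadings on factors

  • Z: (n_items,) item intercepts (easiness)

When K=1 and all loadings are fixed to 1, this is equivalent to the Rasch model; with free loadings, K=1 gives a 2PL model in slope-intercept form.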

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_factors (int) – Number of latent factors (K).

  • device (str) – Device to place parameters on.

property ability: Tensor

Subject ability factors (n_subjects, K).

property difficulty: Tensor

Item intercepts (n_items,). Negative Z = harder items.

property loadings: Tensor

Item factor loadings (n_items, K).

predict(query)

Compute P(correct) = sigmoid(U_s · V_i + Z_i) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
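Evaluating U_s · V_i + Z_i only at the query rows costs O(N · K) and never builds the full (n_subjects, n_items) matrix. A sketch with random toy factors (not a fitted model), cross-checked against the dense form:

```python
import torch

torch.manual_seed(0)
n_subjects, n_items, K = 4, 5, 2
U = torch.randn(n_subjects, K)  # subject ability factors
V = torch.randn(n_items, K)     # item loadings
Z = torch.randn(n_items)        # item intercepts

s = torch.tensor([0, 3, 2])  # plays the role of query["subject_idx"]
i = torch.tensor([1, 1, 4])  # plays the role of query["item_idx"]

# Row-wise dot product U_s · V_i + Z_i.
p_rows = torch.sigmoid((U[s] * V[i]).sum(dim=-1) + Z[i])

# Dense form sigmoid(U @ V^T + Z), gathered at the same cells, for comparison.
p_dense = torch.sigmoid(U @ V.T + Z)
```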

class torch_measure.models.Bifactor(n_subjects, n_items, n_groups, item_groups, device='cpu')

Bifactor Model.

A constrained factor model with one general factor and multiple group-specific factors. The general factor loads on all items, while group factors load only on items in their cluster.

P(correct) = sigmoid(g_n * lambda_g_j + sum_k(s_nk * lambda_sk_j) + z_j)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_groups (int) – Number of group-specific factors.

  • item_groups (torch.Tensor) – Group assignment for each item, shape (n_items,). Values in [0, n_groups).

  • device (str) – Device to place parameters on.

predict(query)

Compute P(correct) at query rows using general + group factors.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
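The bifactor constraint can be expressed as a 0/1 loading mask: one column for the general factor (all items) and one column per group factor (only that group's items), so every row has exactly two free loadings. A sketch assuming this masked-loading representation (the library may store the general and group loadings separately); scores and loadings are random stand-ins.

```python
import torch
import torch.nn.functional as F

item_groups = torch.tensor([0, 0, 1, 2, 1])  # group of each item
n_groups = 3
n_items = len(item_groups)

# Column 0: general factor. Columns 1..n_groups: one group factor each.
loading_mask = torch.cat(
    [torch.ones(n_items, 1), F.one_hot(item_groups, n_groups).float()],
    dim=1,
)  # shape (n_items, 1 + n_groups)

torch.manual_seed(0)
loadings = torch.randn(n_items, 1 + n_groups) * loading_mask  # zero outside the pattern
scores = torch.randn(2, 1 + n_groups)  # [g_n, s_n1, ..., s_nK] for 2 subjects
z = torch.zeros(n_items)
P = torch.sigmoid(scores @ loadings.T + z)  # (2, n_items)
```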

torch_measure.models.build_testlet_map(item_ids, separator=':')

Build a testlet mapping from hierarchical item identifiers.

Parameters:
  • item_ids (list[str]) – Item identifiers with testlet structure, e.g. ["task_1:0", "task_1:1", "task_2:0", ...]. The prefix before separator identifies the testlet.

  • separator (str) – Delimiter between testlet name and sub-item index.

Returns:

  • testlet_map (torch.Tensor) – Integer tensor of shape (n_items,) mapping each item to its testlet index.

  • testlet_names (list[str]) – Ordered list of unique testlet names (first-seen order).

Return type:

tuple[Tensor, list[str]]
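A sketch of the mapping logic with first-seen ordering of testlet names. build_testlet_map_sketch is a hypothetical re-implementation; splitting on the last separator (rsplit) is an assumption that only matters when an identifier contains the separator more than once.

```python
import torch


def build_testlet_map_sketch(item_ids: list, separator: str = ":"):
    names = []   # unique testlet names, first-seen order
    index = {}   # testlet name -> testlet index
    ids = []
    for item_id in item_ids:
        # Testlet name = everything before the last separator (assumption).
        testlet = item_id.rsplit(separator, 1)[0]
        if testlet not in index:
            index[testlet] = len(names)
            names.append(testlet)
        ids.append(index[testlet])
    return torch.tensor(ids, dtype=torch.long), names


testlet_map, testlet_names = build_testlet_map_sketch(["task_1:0", "task_1:1", "task_2:0", "task_1:2"])
# testlet_map -> tensor([0, 0, 1, 0]); testlet_names -> ["task_1", "task_2"]
```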

torch_measure.models.varimax_rotation(loadings, max_iter=100, tol=1e-06)

Apply Varimax rotation to factor loadings.

Varimax maximizes the variance of squared loadings within each factor, producing a simpler structure.

Parameters:
  • loadings (torch.Tensor) – Factor loading matrix (n_items, n_factors).

  • max_iter (int) – Maximum iterations.

  • tol (float) – Convergence tolerance.

Returns:

  • rotated_loadings (torch.Tensor) – Rotated loading matrix (n_items, n_factors).

  • rotation_matrix (torch.Tensor) – Rotation matrix (n_factors, n_factors).

Return type:

tuple[Tensor, Tensor]
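A sketch of the common SVD-based Varimax iteration (without Kaiser row normalisation): each step builds the gradient-like matrix of the Varimax criterion and replaces the rotation with its nearest orthogonal matrix. varimax_sketch is a hypothetical function; the library may differ in normalisation or stopping rule. Because the rotation is orthogonal, each item's communality (row sum of squared loadings) is unchanged.

```python
import torch


def varimax_sketch(loadings: torch.Tensor, max_iter: int = 100, tol: float = 1e-6):
    p, k = loadings.shape
    R = torch.eye(k, dtype=loadings.dtype)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like matrix of the Varimax criterion (gamma = 1).
        B = loadings.T @ (L**3 - L @ torch.diag((L**2).sum(dim=0)) / p)
        u, s, vh = torch.linalg.svd(B)
        R = u @ vh  # nearest orthogonal matrix to B
        d_old, d = d, s.sum().item()
        if d_old != 0.0 and d < d_old * (1 + tol):
            break  # criterion stopped improving
    return loadings @ R, R


torch.manual_seed(0)
A = torch.randn(6, 2, dtype=torch.float64)
rotated, R = varimax_sketch(A)
```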

torch_measure.models.promax_rotation(loadings, power=4, **kwargs)

Apply Promax (oblique) rotation to factor loadings.

Promax starts with Varimax and then applies a power transformation to achieve simple structure while allowing correlated factors.

Parameters:
  • loadings (torch.Tensor) – Factor loading matrix (n_items, n_factors).

  • power (int) – Power parameter for the Promax transformation.

Returns:

  • rotated_loadings (torch.Tensor) – Promax-rotated loadings.

  • rotation_matrix (torch.Tensor) – Rotation matrix.

Return type:

tuple[Tensor, Tensor]
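A sketch of Promax's oblique step, assuming the input has already been Varimax-rotated. promax_step is a hypothetical helper following the classical recipe: raise the loadings to power while keeping their signs, regress that target on the loadings by least squares, and rescale the transformation so the implied factor correlation matrix inv(Uᵀ U) has a unit diagonal. The library's rotation_matrix may additionally fold in the Varimax rotation.

```python
import torch


def promax_step(varimax_loadings: torch.Tensor, power: int = 4):
    """Oblique step of Promax applied to loadings that are already Varimax-rotated."""
    L = varimax_loadings
    # Target: sharpened loadings with signs kept, sign(L) * |L| ** power.
    target = L * L.abs() ** (power - 1)
    # Least-squares transformation U with L @ U close to target.
    U = torch.linalg.lstsq(L, target).solution
    # Rescale columns so the implied factor correlations have a unit diagonal.
    d = torch.diagonal(torch.linalg.inv(U.T @ U))
    U = U @ torch.diag(torch.sqrt(d))
    return L @ U, U


torch.manual_seed(1)
L_vmx = torch.randn(6, 2, dtype=torch.float64)  # stand-in for Varimax output
rotated, U = promax_step(L_vmx, power=4)
phi = torch.linalg.inv(U.T @ U)  # implied factor correlations
```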

torch_measure.models.bifactor_rotation(U, V, Z)

Apply bifactor rotation: whiten, then Varimax, then separate general factor.

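Parameters:
  • U (torch.Tensor) – Subject ability factors, shape (n_subjects, K).

  • V (torch.Tensor) – Item factor loadings, shape (n_items, K).

  • Z (torch.Tensor) – Item intercepts, shape (n_items,).
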
Returns:

  • U_rot (torch.Tensor) – Rotated abilities.

  • V_rot (torch.Tensor) – Rotated loadings.

  • Z (torch.Tensor) – Unchanged intercepts.

Return type:

tuple[Tensor, Tensor, Tensor]

class torch_measure.models.NCF(encoder, embedding_dim, encode_batch_size=256, hidden_dim=256, n_layers=3, dropout=0.1, device='cpu')

Neural Collaborative Filter predictive model.

A neural network model to predict response matrix entries.

Architecture:

  • Sentence embeddings for both subject and item content

  • Small MLP head trained offline on training data

Parameters:
  • encoder (SentenceTransformer) – Pre-trained transformer model used to embed subject and item content.

  • embedding_dim (int) – Output dimension of the encoder model.

  • encode_batch_size (int) – Batch size used to embed subject and item content.

  • hidden_dim (int) – Dimension of hidden layers.

  • n_layers (int) – Number of layers (minimum 1).

  • dropout (float) – Dropout rate between layers.

  • device (str) – Device to place parameters on.

encode_batch(subjects, items)

Encode a batch of subject-item pairs.

Return type:

tuple[Tensor, Tensor]

load_head(path)

Load pre-trained NCFHead weights from a state dict file.

Parameters:

path (str)

Return type:

None

load_embeddings(path)

Load pre-computed subject and item embeddings from a checkpoint file.

Parameters:

path (str) – Path to the embeddings checkpoint saved by torch.save with keys "subject_embeddings" and "item_embeddings".

Returns:

Subject embeddings and item embeddings, respectively.

Return type:

tuple[torch.Tensor, torch.Tensor]

predict(data, labeled)

Compute response probability P(subject passes item).

  1. Compute the raw NCF probability.

  2. On the first call of a round for which labels are available, fit a Platt scaler.

  3. Apply the calibrated scaling and return the result.

Parameters:
  • data (dict) – Dictionary with keys "subject_content" (str) and "item_content" (str) containing the raw text for the subject and item to score.

  • labeled (list[dict]) – Previously observed subject-item-response records.

Returns:

Predicted probability that the subject passes the item, clipped to [1e-7, 1 - 1e-7].

Return type:

float
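Step 2 above fits a Platt scaler. A sketch of Platt scaling on the logit of the raw probability, p_cal = sigmoid(A · logit(p_raw) + B), fitted on labeled examples and clipped to the documented range. Using the logit of the raw NCF probability as the score, Adam as the optimiser, and the helper names fit_platt and calibrate are all assumptions; the per-round bookkeeping is not reproduced.

```python
import torch
import torch.nn.functional as F

EPS = 1e-7


def fit_platt(raw_probs: torch.Tensor, labels: torch.Tensor, steps: int = 500):
    """Hypothetical Platt scaler: p_cal = sigmoid(A * logit(p_raw) + B)."""
    x = torch.logit(raw_probs.clamp(EPS, 1 - EPS))
    A = torch.ones((), requires_grad=True)
    B = torch.zeros((), requires_grad=True)
    opt = torch.optim.Adam([A, B], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(A * x + B, labels)
        loss.backward()
        opt.step()
    return A.detach(), B.detach()


def calibrate(p_raw: float, A: torch.Tensor, B: torch.Tensor) -> float:
    x = torch.logit(torch.tensor(p_raw).clamp(EPS, 1 - EPS))
    p = torch.sigmoid(A * x + B).item()
    return min(max(p, EPS), 1 - EPS)  # clip to [1e-7, 1 - 1e-7] as documented


# Overconfident raw scores: 0.9 / 0.1 predictions that are right only 3 times in 4.
raw = torch.tensor([0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])
labels = torch.tensor([1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
A, B = fit_platt(raw, labels)
p_cal = calibrate(0.9, A, B)  # pulled from 0.9 toward the observed 0.75
```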

class torch_measure.models.LLMJudge(model_id='Qwen/Qwen2-7B-Instruct', max_icl=5, batch_size=32, device='auto')

LLM-as-judge predictive model.

Uses the next-token yes/no log-probability ratio from a causal language model to predict whether a subject would answer an item correctly. Optionally prepends same-subject in-context examples from labeled.

Parameters:
  • model_id (str) – HuggingFace model identifier.

  • max_icl (int) – Maximum number of same-subject labeled examples to prepend as in-context demonstrations.

  • batch_size (int) – Batch size for LLM inference.

  • device (str) – Device passed to device_map. Use "auto" for multi-GPU.

predict(data, labeled=None)

Compute response probability P(subject passes item).

Parameters:
  • data (dict) – Dictionary with keys "subject_content", "item_content", "benchmark", and "condition".

  • labeled (list[dict] or None) – Previously observed subject-item-response records with keys "subject_content", "item_content", "benchmark", "condition", and "label" (float in [0, 1]). Same-subject records are prepended as in-context examples.

Returns:

Predicted probability that the subject passes the item, clipped to [1e-7, 1 - 1e-7].

Return type:

float
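Renormalising the next-token distribution over just the two answers turns the yes/no log-probabilities into a probability: p = e^ly / (e^ly + e^ln) = sigmoid(ly − ln). A minimal sketch, assuming the two log-probabilities have already been read off the model (how the prompt is built and which token strings count as yes and no are not specified here); yes_no_probability is a hypothetical helper.

```python
import math

EPS = 1e-7


def yes_no_probability(logprob_yes: float, logprob_no: float) -> float:
    """p = exp(lp_yes) / (exp(lp_yes) + exp(lp_no)) = sigmoid(lp_yes - lp_no)."""
    d = logprob_yes - logprob_no
    # Branch on the sign of d so math.exp never overflows.
    if d >= 0:
        p = 1.0 / (1.0 + math.exp(-d))
    else:
        e = math.exp(d)
        p = e / (1.0 + e)
    return min(max(p, EPS), 1.0 - EPS)  # clip to [1e-7, 1 - 1e-7] as documented


p = yes_no_probability(math.log(0.6), math.log(0.2))  # 0.6 / (0.6 + 0.2) = 0.75
```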

IRT Models

class torch_measure.models.Rasch(n_subjects, n_items, device='cpu')[source]

Rasch (1-Parameter Logistic) IRT model.

The simplest IRT model where P(correct) = sigmoid(theta - b): - theta: subject ability (one per subject) - b: item difficulty (one per item)

No discrimination or guessing parameters.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • device (str) – Device to place parameters on.

predict(query)[source]

Compute P(correct) = sigmoid(ability - difficulty) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
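The long-form query contract can be illustrated with a minimal sketch of Rasch prediction in plain PyTorch. The raw tensors theta and b and the function rasch_predict are hypothetical stand-ins for the model's internal parameters, not torch_measure's implementation. Each query row gathers one ability and one difficulty and applies the logistic link.

```python
import torch


def rasch_predict(theta, b, query):
    """Sketch of Rasch prediction on long-form query rows."""
    s = query["subject_idx"]  # (N,) subject index per row
    i = query["item_idx"]     # (N,) item index per row
    # Gather one ability and one difficulty per row, then apply the link.
    return torch.sigmoid(theta[s] - b[i])


theta = torch.tensor([0.0, 1.0])  # hypothetical abilities for 2 subjects
b = torch.tensor([0.0, 1.0])      # hypothetical difficulties for 2 items
query = {
    "subject_idx": torch.tensor([0, 1, 1]),
    "item_idx": torch.tensor([0, 0, 1]),
}
p = rasch_predict(theta, b, query)  # one probability per query row
```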

class torch_measure.models.TwoPL(n_subjects, n_items, device='cpu')[source]

2-Parameter Logistic IRT model.

P(correct) = sigmoid(a * (theta - b)), where:

  • theta: subject ability

  • b: item difficulty

  • a: item discrimination (how well the item differentiates abilities)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • device (str) – Device to place parameters on.

property discrimination: Tensor

Item discrimination parameters (constrained to be positive).

predict(query)[source]

Compute P(correct) = sigmoid(a * (theta - b)) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
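One common way to keep the discrimination positive is to optimize an unconstrained parameter and exponentiate it. The sketch below assumes that choice; the documentation only says the parameter is constrained to be positive and does not state how. The names twopl_predict and log_a are hypothetical.

```python
import torch


def twopl_predict(theta, b, log_a, query):
    """Sketch of 2PL prediction with discrimination a = exp(log_a) > 0."""
    s, i = query["subject_idx"], query["item_idx"]
    # Assumption: positivity is enforced via exp of an unconstrained tensor.
    a = torch.exp(log_a)
    return torch.sigmoid(a[i] * (theta[s] - b[i]))


theta = torch.tensor([0.0, 1.0])               # hypothetical abilities
b = torch.tensor([0.0, 0.0])                   # hypothetical difficulties
log_a = torch.log(torch.tensor([1.0, 2.0]))    # discriminations 1 and 2
query = {"subject_idx": torch.tensor([1, 1]), "item_idx": torch.tensor([0, 1])}
p = twopl_predict(theta, b, log_a, query)      # sigmoid(1), sigmoid(2)
```

The more discriminating item (a = 2) gives a steeper response curve, so the same ability gap of 1 yields a higher probability.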

class torch_measure.models.ThreePL(n_subjects, n_items, device='cpu')[source]

3-Parameter Logistic IRT model.

P(correct) = c + (1 - c) * sigmoid(a * (theta - b)), where:

  • theta: subject ability

  • b: item difficulty

  • a: item discrimination

  • c: guessing parameter (lower asymptote)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • device (str) – Device to place parameters on.

property discrimination: Tensor

Item discrimination parameters (constrained to be positive).

property guessing: Tensor

Item guessing parameters (constrained to [0, 1]).

predict(query)[source]

Compute P(correct) = c + (1-c) * sigmoid(a * (theta - b)) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
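The role of c as a lower asymptote is easiest to see numerically. The following minimal sketch assumes exp for the discrimination and sigmoid for the guessing parameter; these reparameterizations, and the names threepl_predict, log_a, and logit_c, are hypothetical. Very low ability drives P(correct) down to c, and very high ability drives it up to 1.

```python
import torch


def threepl_predict(theta, b, log_a, logit_c, query):
    """Sketch of 3PL prediction: c + (1 - c) * sigmoid(a * (theta - b))."""
    s, i = query["subject_idx"], query["item_idx"]
    a = torch.exp(log_a[i])         # assumption: discrimination > 0 via exp
    c = torch.sigmoid(logit_c[i])   # assumption: guessing in (0, 1) via sigmoid
    return c + (1 - c) * torch.sigmoid(a * (theta[s] - b[i]))


theta = torch.tensor([-50.0, 0.0, 50.0])   # very low, average, very high ability
b = torch.tensor([0.0])
log_a = torch.tensor([0.0])                # a = 1
logit_c = torch.logit(torch.tensor([0.25]))  # c = 0.25, e.g. 4-option guessing
query = {"subject_idx": torch.tensor([0, 1, 2]), "item_idx": torch.tensor([0, 0, 0])}
p = threepl_predict(theta, b, log_a, logit_c, query)
```

At theta = b the probability is c + (1 - c) / 2 = 0.625, not 0.5, because the curve is lifted by the guessing floor.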

class torch_measure.models.AmortizedIRT(n_subjects, n_items, embedding_dim, hidden_dim=256, n_layers=3, pl=2, dropout=0.1, device='cpu')[source]

Amortized IRT model.

Instead of learning independent parameters for each item, this model learns a mapping from item embeddings to item parameters (difficulty, discrimination, guessing). This enables zero-shot prediction on new items given their embeddings.

P(correct) = c + (1-c) * sigmoid(a * (theta - b))

where b, a, c = f(embedding) are predicted by a neural network.
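The amortization idea can be sketched with a small MLP that maps each item embedding to (b, a, c), which are then plugged into the 3PL formula above. This is a minimal sketch under stated assumptions, not the library's network. ItemParamNet and amortized_predict are hypothetical names. The sketch also assumes that n_layers counts Linear layers, that hidden layers use ReLU and dropout, that softplus keeps a positive, and that sigmoid keeps c in (0, 1). With pl=1 it fixes a = 1 and c = 0, and with pl=2 it fixes c = 0.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ItemParamNet(nn.Module):
    """Hypothetical projection network: item embedding -> (b, a, c)."""

    def __init__(self, embedding_dim, hidden_dim=256, n_layers=3, pl=2, dropout=0.1):
        super().__init__()
        layers, d = [], embedding_dim
        # Assumption: n_layers counts Linear layers; hidden ones use ReLU + dropout.
        for _ in range(n_layers - 1):
            layers += [nn.Linear(d, hidden_dim), nn.ReLU(), nn.Dropout(dropout)]
            d = hidden_dim
        layers.append(nn.Linear(d, pl))  # one output per IRT parameter
        self.net = nn.Sequential(*layers)
        self.pl = pl

    def forward(self, emb):
        out = self.net(emb)
        b = out[:, 0]
        # 1PL: a fixed at 1; 2PL/3PL: positive a via softplus.
        a = F.softplus(out[:, 1]) if self.pl >= 2 else torch.ones_like(b)
        # 3PL only: guessing in (0, 1) via sigmoid; otherwise c = 0.
        c = torch.sigmoid(out[:, 2]) if self.pl >= 3 else torch.zeros_like(b)
        return b, a, c


def amortized_predict(theta, net, emb, query):
    b, a, c = net(emb)  # per-item parameters predicted from embeddings
    s, i = query["subject_idx"], query["item_idx"]
    return c[i] + (1 - c[i]) * torch.sigmoid(a[i] * (theta[s] - b[i]))


torch.manual_seed(0)
net = ItemParamNet(embedding_dim=16, hidden_dim=32, pl=3).eval()
theta = torch.randn(4)                # hypothetical fitted abilities
new_items = torch.randn(5, 16)        # embeddings of items unseen in training
query = {"subject_idx": torch.tensor([0, 3]), "item_idx": torch.tensor([1, 4])}
p = amortized_predict(theta, net, new_items, query)
```

Because item parameters are a function of the embedding rather than a free lookup table, a new item only needs an embedding to receive parameters, which is what makes zero-shot prediction possible.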

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • embedding_dim (int) – Dimension of item embeddings.

  • hidden_dim (int) – Hidden dimension for the embedding projection network.

  • n_layers (int) – Number of layers in the projection network.

  • pl (int) – Number of IRT parameters: 1 (Rasch), 2 (+discrimination), 3 (+guessing).

  • dropout (float) – Dropout rate in the projection network.

  • device (str) – Device to place parameters on.

set_embeddings(embeddings)[source]

Set item embeddings for parameter prediction.

Parameters:

embeddings (torch.Tensor) – Item embeddings of shape (n_items, embedding_dim).

Return type:

None

property difficulty: Tensor

Predicted item difficulties from embeddings.

property discrimination: Tensor | None

Predicted item discriminations from embeddings (2PL/3PL only).

property guessing: Tensor | None

Predicted item guessing parameters from embeddings (3PL only).

predict(query)[source]

Compute P(correct) at query rows using amortized item parameters.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor

fit(data, embeddings, mask=None, max_epochs=1000, lr=0.001, weight_decay=0.0001, verbose=True, **kwargs)[source]

Fit the amortized IRT model.

Parameters:
  • data (LongFormData | torch.Tensor) – Long-form dataset (preferred) or wide-form response tensor.

  • embeddings (torch.Tensor) – Item embeddings (n_items, embedding_dim).

  • mask (torch.Tensor | None) – Boolean mask for observed entries (only used with wide-form input).

  • max_epochs (int) – Maximum training epochs.

  • lr (float) – Learning rate.

  • weight_decay (float) – Weight decay for the Adam optimizer.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict

class torch_measure.models.MultiFacetRasch(n_subjects, n_items, n_facet_levels, device='cpu')[source]

Many-Facet Rasch Model.

Extends the standard Rasch model with an additional facet (for example, the language an item is presented in) to model systematic sources of variation beyond subject ability and item difficulty.

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_facet_levels (int) – Number of levels in the additional facet (e.g., number of languages).

  • device (str) – Device to place parameters on.

set_reference_level(level_idx)[source]

Set a facet level as the reference (anchor to zero).

Parameters:

level_idx (int) – Index of the reference level (e.g., 0 for English).

Return type:

None

predict(query)[source]

Compute P(correct) at query rows for the given facet level(s).

The query must contain subject_idx and item_idx (1-D, length N) and may optionally contain facet_idx (1-D of length N, or a scalar). When facet_idx is omitted, every row uses facet level 0, which matches the earlier behavior in which fitting did not use facet information.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
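The formula is not stated above, so the sketch below assumes the common additive many-facet form logit = theta_s - b_i - f_k. It shows the three accepted shapes of facet_idx (omitted, scalar, or length N) and why anchoring a reference level is needed. A constant can be moved between the facet effects and the item difficulties without changing any prediction, so fixing one level at zero pins the scale. The names facet_rasch_predict and anchor_reference_level are hypothetical, and the compensation through b is an assumption about how anchoring could be done.

```python
import torch


def facet_rasch_predict(theta, b, f, query):
    """Sketch: P = sigmoid(theta_s - b_i - f_k), additive facet term assumed."""
    s, i = query["subject_idx"], query["item_idx"]
    # facet_idx may be missing (-> level 0), a scalar, or a length-N tensor.
    k = torch.as_tensor(query.get("facet_idx", 0)).expand_as(s)
    return torch.sigmoid(theta[s] - b[i] - f[k])


def anchor_reference_level(b, f, level_idx):
    """Shift facet effects so f[level_idx] == 0, compensating in b."""
    shift = f[level_idx]
    return b + shift, f - shift


theta = torch.tensor([0.5, -0.2])      # hypothetical abilities
b = torch.tensor([0.0, 1.0, -1.0])     # hypothetical difficulties
f = torch.tensor([0.3, -0.4])          # hypothetical effects, e.g. 2 languages
query = {
    "subject_idx": torch.tensor([0, 1, 1]),
    "item_idx": torch.tensor([0, 1, 2]),
    "facet_idx": torch.tensor([1, 0, 1]),
}
p = facet_rasch_predict(theta, b, f, query)
b_anchored, f_anchored = anchor_reference_level(b, f, level_idx=0)
p_anchored = facet_rasch_predict(theta, b_anchored, f_anchored, query)
```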

fit(response_matrix, mask=None, method='mle', **kwargs)[source]

Fit the model.

Supports all fitting methods: ‘mle’, ‘em’, ‘jml’, ‘svi’.

Beta IRT Models

class torch_measure.models.BetaRasch(n_subjects, n_items, phi=10.0, device='cpu')[source]

Beta-Rasch (1PL) IRT model.

Identical to Rasch in prediction: mu = sigmoid(theta - b). Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing continuous responses in (0, 1) such as empirical probabilities.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • phi (float) – Beta distribution precision parameter. Higher values mean tighter concentration around the predicted mean. Default 10.0.

  • device (str) – Device to place parameters on.

fit(response_matrix, mask=None, method='mle', max_epochs=1000, lr=0.01, verbose=True, **kwargs)[source]

Fit the Beta-Rasch model using Beta NLL loss.

Parameters:
  • response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items). Observed values must lie strictly inside the open interval (0, 1).

  • mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.

  • method (str) – Fitting method: “mle”, “em”, or “jml”.

  • max_epochs (int) – Maximum optimization epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict
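A Beta NLL with a precision parameter phi is usually written in the mean-precision parameterization, alpha = mu * phi and beta = (1 - mu) * phi, so that the mean is mu and larger phi concentrates the density around it. The sketch below assumes that parameterization and averages over non-NaN entries. beta_nll is a hypothetical helper, not torch_measure's loss; torch.distributions.Beta is the standard PyTorch class.

```python
import torch
from torch.distributions import Beta


def beta_nll(mu, y, phi=10.0, mask=None, eps=1e-6):
    """Mean Beta NLL of responses y under mean mu and precision phi."""
    if mask is None:
        mask = ~torch.isnan(y)  # default: every non-NaN entry
    mu_obs = mu[mask].clamp(eps, 1 - eps)
    y_obs = y[mask]
    # Mean-precision parameterization: alpha = mu * phi, beta = (1 - mu) * phi.
    dist = Beta(mu_obs * phi, (1 - mu_obs) * phi)
    return -dist.log_prob(y_obs).mean()


theta = torch.tensor([0.0, 1.0])                    # hypothetical abilities
b = torch.tensor([0.0, 0.5, -0.5])                  # hypothetical difficulties
mu = torch.sigmoid(theta[:, None] - b[None, :])     # Rasch mean, (n_subjects, n_items)
y = torch.tensor([[0.5, 0.4, float("nan")],         # NaN marks a missing cell
                  [0.7, 0.6, 0.8]])
loss = beta_nll(mu, y)
```

Prediction is unchanged from the Bernoulli model: only the loss changes, so fractional targets such as empirical pass rates are scored by how close they sit to mu, with phi controlling how sharply deviations are penalized.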

class torch_measure.models.BetaTwoPL(n_subjects, n_items, phi=10.0, device='cpu')[source]

Beta-2PL IRT model.

Identical to TwoPL in prediction: mu = sigmoid(a * (theta - b)). Uses Beta NLL loss instead of Bernoulli NLL for fitting, allowing continuous responses in (0, 1) such as empirical probabilities.

Parameters:
  • n_subjects (int) – Number of subjects (test-takers / models).

  • n_items (int) – Number of items (test questions / benchmark tasks).

  • phi (float) – Beta distribution precision parameter. Higher values mean tighter concentration around the predicted mean. Default 10.0.

  • device (str) – Device to place parameters on.

fit(response_matrix, mask=None, method='mle', max_epochs=1000, lr=0.01, verbose=True, **kwargs)[source]

Fit the Beta-2PL model using Beta NLL loss.

Parameters:
  • response_matrix (torch.Tensor) – Continuous response matrix of shape (n_subjects, n_items). Observed values must lie strictly inside the open interval (0, 1).

  • mask (torch.Tensor | None) – Boolean mask of entries to use. If None, uses all non-NaN entries.

  • method (str) – Fitting method: “mle”, “em”, or “jml”.

  • max_epochs (int) – Maximum optimization epochs.

  • lr (float) – Learning rate.

  • verbose (bool) – Show progress bar.

Returns:

Training history.

Return type:

dict

Factor Models

class torch_measure.models.LogisticFM(n_subjects, n_items, n_factors=2, device='cpu')[source]

K-factor Logistic Factor Model.

P(correct) = sigmoid(U @ V^T + Z^T), where:

  • U: (n_subjects, K) latent ability factors

  • V: (n_items, K) item loadings on factors

  • Z: (n_items,) item intercepts (easiness)

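When K=1 and every loading is fixed to 1, this reduces to the Rasch model. With free loadings, K=1 gives the 2PL model in slope-intercept form, with discrimination a_i = V_i and difficulty b_i = -Z_i / V_i.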

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_factors (int) – Number of latent factors (K).

  • device (str) – Device to place parameters on.

property ability: Tensor

Subject ability factors (n_subjects, K).

property difficulty: Tensor

Item intercepts Z, shape (n_items,). Z measures easiness, so lower (more negative) values indicate harder items.

property loadings: Tensor

Item factor loadings (n_items, K).

predict(query)[source]

Compute P(correct) = sigmoid(U_s · V_i + Z_i) at query rows.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
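Evaluated at query rows, the matrix formula sigmoid(U @ V^T + Z^T) becomes a row-wise dot product over the K factors plus the item intercept. The following minimal sketch uses hypothetical raw tensors U, V, Z and a hypothetical function logistic_fm_predict, not the model's internals. It checks that querying every cell reproduces the full matrix.

```python
import torch


def logistic_fm_predict(U, V, Z, query):
    """Sketch: P = sigmoid(U_s . V_i + Z_i) for each query row."""
    s, i = query["subject_idx"], query["item_idx"]
    # Row-wise dot product over the K factors, plus the item intercept.
    return torch.sigmoid((U[s] * V[i]).sum(dim=-1) + Z[i])


torch.manual_seed(0)
n_subjects, n_items, K = 3, 4, 2
U = torch.randn(n_subjects, K)   # hypothetical ability factors
V = torch.randn(n_items, K)      # hypothetical loadings
Z = torch.randn(n_items)         # hypothetical intercepts

# Query every (subject, item) cell in row-major order.
query = {
    "subject_idx": torch.arange(n_subjects).repeat_interleave(n_items),
    "item_idx": torch.arange(n_items).repeat(n_subjects),
}
p = logistic_fm_predict(U, V, Z, query)
full = torch.sigmoid(U @ V.T + Z)  # Z broadcasts across subjects (rows)
```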

class torch_measure.models.Bifactor(n_subjects, n_items, n_groups, item_groups, device='cpu')[source]

Bifactor Model.

A constrained factor model with one general factor and multiple group-specific factors. The general factor loads on all items, while group factors load only on items in their cluster.

P(correct) = sigmoid(g_n * lambda_g_j + sum_k(s_nk * lambda_sk_j) + z_j)

Parameters:
  • n_subjects (int) – Number of subjects.

  • n_items (int) – Number of items.

  • n_groups (int) – Number of group-specific factors.

  • item_groups (torch.Tensor) – Group assignment for each item, shape (n_items,). Values in [0, n_groups).

  • device (str) – Device to place parameters on.

property ability: Tensor

property difficulty: Tensor

predict(query)[source]

Compute P(correct) at query rows using general + group factors.

Parameters:

query (dict[str, Tensor])

Return type:

Tensor
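Because group factors load only on items in their own cluster, the sum over k in the bifactor formula has a single nonzero term per item. The sketch below exploits that by storing one group loading per item and selecting the subject's score on that item's group through item_groups. This storage layout, and the names bifactor_predict, g, lam_g, S, lam_s, and z, are hypothetical.

```python
import torch


def bifactor_predict(g, lam_g, S, lam_s, z, item_groups, query):
    """Sketch: general factor + the single group factor each item belongs to."""
    s, i = query["subject_idx"], query["item_idx"]
    k = item_groups[i]              # group of each queried item
    general = g[s] * lam_g[i]       # g_n * lambda_g_j
    specific = S[s, k] * lam_s[i]   # only the item's own group contributes
    return torch.sigmoid(general + specific + z[i])


torch.manual_seed(0)
n_subjects, n_items, n_groups = 3, 5, 2
item_groups = torch.tensor([0, 0, 1, 1, 1])   # values in [0, n_groups)
g = torch.randn(n_subjects)                   # general-factor scores
S = torch.randn(n_subjects, n_groups)         # group-factor scores
lam_g = torch.randn(n_items)                  # general loadings
lam_s = torch.randn(n_items)                  # one group loading per item
z = torch.randn(n_items)                      # intercepts
query = {"subject_idx": torch.tensor([0, 2, 1]), "item_idx": torch.tensor([1, 3, 4])}
p = bifactor_predict(g, lam_g, S, lam_s, z, item_groups, query)
```

Equivalently, the group loadings could be stored as an (n_items, n_groups) matrix masked by a one-hot encoding of item_groups. The indexed form avoids materializing the zeros.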

Rotation Utilities

torch_measure.models.varimax_rotation(loadings, max_iter=100, tol=1e-06)[source]

Apply Varimax rotation to factor loadings.

Varimax is an orthogonal rotation that maximizes the variance of the squared loadings within each factor, so each factor ends up with a few large loadings and many near-zero ones (simple structure).

Parameters:
  • loadings (torch.Tensor) – Factor loading matrix (n_items, n_factors).

  • max_iter (int) – Maximum iterations.

  • tol (float) – Convergence tolerance.

Returns:

  • rotated_loadings (torch.Tensor) – Rotated loading matrix (n_items, n_factors).

  • rotation_matrix (torch.Tensor) – Rotation matrix (n_factors, n_factors).

Return type:

tuple[Tensor, Tensor]
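A common way to compute Varimax is an SVD iteration. At each step, form the gradient of the criterion at the current rotation, then replace the rotation by the nearest orthogonal matrix. The following is a minimal sketch of that standard algorithm (gamma = 1, no Kaiser row normalization), not necessarily how varimax_rotation is implemented. varimax_sketch is a hypothetical name.

```python
import torch


def varimax_sketch(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation via the standard SVD iteration (gamma = 1)."""
    p, k = loadings.shape
    R = torch.eye(k, dtype=loadings.dtype, device=loadings.device)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the Varimax criterion at the current rotation.
        col_ss = (L ** 2).sum(dim=0)  # per-factor sum of squared loadings
        G = loadings.T @ (L ** 3 - L * col_ss / p)
        # The nearest orthogonal matrix to G (its polar factor) is the next rotation.
        u, s, vh = torch.linalg.svd(G)
        R = u @ vh
        d_new = s.sum().item()
        # Stop once the criterion proxy improves by less than a factor of (1 + tol).
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ R, R


torch.manual_seed(0)
A = torch.randn(10, 3, dtype=torch.float64)  # hypothetical unrotated loadings
rotated, R = varimax_sketch(A)
```

Since R is orthogonal, each item's communality (row sum of squared loadings) is unchanged. The rotation only redistributes variance across factors.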

torch_measure.models.promax_rotation(loadings, power=4, **kwargs)[source]

Apply Promax (oblique) rotation to factor loadings.

Promax starts with Varimax and then applies a power transformation to achieve simple structure while allowing correlated factors.

Parameters:
  • loadings (torch.Tensor) – Factor loading matrix (n_items, n_factors).

  • power (int) – Power parameter for the Promax transformation.

Returns:

  • rotated_loadings (torch.Tensor) – Promax-rotated loadings.

  • rotation_matrix (torch.Tensor) – Rotation matrix.

Return type:

tuple[Tensor, Tensor]
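The power transformation step can be sketched as in the common recipe (it mirrors the steps of R's stats::promax). Start from Varimax-rotated loadings and build a target by raising each loading's magnitude to the given power while keeping its sign. Fit a least-squares transform toward that target, then rescale its columns so that the implied factor correlation matrix (T^T T)^-1 has a unit diagonal. This is a hypothetical sketch, not promax_rotation's implementation. It takes already Varimax-rotated loadings as input, and promax_step is an invented name.

```python
import torch


def promax_step(varimax_loadings, power=4):
    """Promax step applied to loadings that are already Varimax-rotated."""
    L = varimax_loadings
    # Target: raise each loading's magnitude to `power`, keeping its sign.
    target = L * L.abs() ** (power - 1)
    # Least-squares transform T minimizing ||L @ T - target||.
    T = torch.linalg.lstsq(L, target).solution
    # Rescale columns so the implied factor correlations (T^T T)^-1 have unit diagonal.
    d = torch.diagonal(torch.linalg.inv(T.T @ T))
    T = T @ torch.diag(torch.sqrt(d))
    return L @ T, T


torch.manual_seed(0)
L_vm = torch.randn(12, 3, dtype=torch.float64)  # stand-in for Varimax output
rotated, T = promax_step(L_vm, power=4)
```

Unlike the Varimax rotation matrix, T is generally not orthogonal, which is what lets the factors become correlated.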

torch_measure.models.bifactor_rotation(U, V, Z)[source]

Apply bifactor rotation: whiten, then Varimax, then separate general factor.

Parameters:
Returns:

  • U_rot (torch.Tensor) – Rotated abilities.

  • V_rot (torch.Tensor) – Rotated loadings.

  • Z (torch.Tensor) – Unchanged intercepts.

Return type:

tuple[Tensor, Tensor, Tensor]