Psychometric Metrics

Psychometric metrics for measurement analysis.

torch_measure.metrics.tetrachoric_correlation(data, min_pairs=5)

Compute the tetrachoric correlation matrix for binary data.

Uses the cosine-pi approximation

r = cos(pi / (1 + sqrt(AD / BC)))

where, in the 2x2 contingency table for each pair of items, A and D are the concordant counts (both items correct, both items incorrect) and B and C are the discordant counts.

Parameters:

data (torch.Tensor) – Binary response matrix (n_subjects, n_items) with values 0, 1, or NaN.
min_pairs (int) – Minimum number of subjects with non-missing responses on both items. Item pairs with fewer jointly observed subjects are assigned a correlation of 0.

Returns:

Tetrachoric correlation matrix of shape (n_items, n_items).

Return type:

torch.Tensor
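To make the cosine-pi formula concrete, here is a minimal pure-Python sketch for a single pair of items. It is not the library's implementation: the helper name `tetrachoric_pair`, the use of `None` for missing responses, and the handling of empty cells (BC = 0 returns 1.0, AD = BC = 0 returns 0.0) are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def tetrachoric_pair(
    x: Sequence[Optional[int]],
    y: Sequence[Optional[int]],
    min_pairs: int = 5,
) -> float:
    """Cosine-pi tetrachoric approximation for two binary items (hypothetical helper)."""
    a = b = c = d = 0  # 2x2 contingency counts
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue  # skip subjects missing either item
        if xi == 1 and yi == 1:
            a += 1  # concordant: both correct
        elif xi == 1 and yi == 0:
            b += 1  # discordant
        elif xi == 0 and yi == 1:
            c += 1  # discordant
        else:
            d += 1  # concordant: both incorrect
    if a + b + c + d < min_pairs:
        return 0.0  # too few jointly observed subjects
    ad, bc = a * d, b * c
    if bc == 0:
        # Assumption: AD/BC -> infinity gives cos(0) = 1; the undefined 0/0 case gives 0.
        return 1.0 if ad > 0 else 0.0
    return math.cos(math.pi / (1.0 + math.sqrt(ad / bc)))


x = [1, 1, 0, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(tetrachoric_pair(x, y), 4))  # A=3, B=1, C=1, D=3 -> cos(pi/4) = 0.7071
```

Because AD/BC = 9 here, the formula reduces to cos(pi / 4).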

torch_measure.metrics.point_biserial_correlation(continuous, binary)

Compute point-biserial correlation between continuous and binary variables.

Parameters:

continuous (torch.Tensor) – Continuous variable (e.g., total score) of shape (N,).
binary (torch.Tensor) – Binary variable (e.g., item response) of shape (N,) or (N, M).

Returns:

Correlation(s). Scalar if binary is 1D, shape (M,) if 2D.

Return type:

torch.Tensor
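The point-biserial coefficient is Pearson's r with one dichotomous variable, which gives the closed form r_pb = (M1 − M0) / s · sqrt(p·q), where M1 and M0 are the mean continuous values in the two groups, p and q are the group proportions, and s is the population (divide-by-N) standard deviation of the continuous variable. The pure-Python sketch below (hypothetical helper `point_biserial`, not the library's code) evaluates that form for a single binary column.

```python
import math
from typing import Sequence


def point_biserial(continuous: Sequence[float], binary: Sequence[int]) -> float:
    """r = (M1 - M0) / s * sqrt(p * q); assumes both groups are non-empty."""
    n = len(continuous)
    ones = [v for v, b in zip(continuous, binary) if b == 1]
    zeros = [v for v, b in zip(continuous, binary) if b == 0]
    p = len(ones) / n
    q = 1.0 - p
    mean_all = sum(continuous) / n
    # Population SD (divide by N), which makes the result identical to Pearson's r.
    s = math.sqrt(sum((v - mean_all) ** 2 for v in continuous) / n)
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / s * math.sqrt(p * q)


scores = [2.0, 4.0, 6.0, 8.0, 10.0]
item = [0, 0, 1, 1, 1]
print(round(point_biserial(scores, item), 4))  # 0.866 (= sqrt(3) / 2)
```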

torch_measure.metrics.infit_statistics(predicted, observed, mask=None)

Compute Rasch infit (information-weighted) mean square statistics per item.

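Infit is most sensitive to unexpected responses from subjects whose ability is close to the item's difficulty. Values near 1.0 indicate good fit; by a common rule of thumb, values above 1.3 indicate underfit (more noise than the model predicts) and values below 0.7 indicate overfit (responses more deterministic than predicted, approaching a Guttman pattern).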

Parameters:

predicted (torch.Tensor) – Predicted probabilities (n_subjects, n_items).
observed (torch.Tensor) – Observed binary responses (n_subjects, n_items).
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Infit statistics per item, shape (n_items,).

Return type:

torch.Tensor

torch_measure.metrics.outfit_statistics(predicted, observed, mask=None)

Compute Rasch outfit (unweighted) mean square statistics per item.

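Outfit is most sensitive to unexpected responses from subjects whose ability is far from the item's difficulty (outliers), such as a high-ability subject failing a very easy item.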

Parameters:

predicted (torch.Tensor) – Predicted probabilities (n_subjects, n_items).
observed (torch.Tensor) – Observed binary responses (n_subjects, n_items).
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Outfit statistics per item, shape (n_items,).

Return type:

torch.Tensor
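Both statistics are built from squared residuals. Under the usual Rasch definitions, with W = P(1 − P) the model variance of a response, outfit is the unweighted mean of (x − P)² / W over subjects, while infit divides the summed squared residuals by the summed variances, so responses carrying little information (P near 0 or 1) count for less. The sketch below applies these textbook formulas to one item column; the helper name and the list-based mask are illustrative assumptions, and the library may treat edge cases differently.

```python
from typing import Optional, Sequence, Tuple


def rasch_fit_item(
    predicted: Sequence[float],
    observed: Sequence[int],
    mask: Optional[Sequence[bool]] = None,
) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item; assumes 0 < P < 1."""
    if mask is None:
        mask = [True] * len(predicted)
    sq_resid_sum = 0.0  # sum of (x - P)^2
    variance_sum = 0.0  # sum of W = P(1 - P)
    z_sq_sum = 0.0      # sum of squared standardized residuals (x - P)^2 / W
    count = 0
    for p, x, keep in zip(predicted, observed, mask):
        if not keep:
            continue
        w = p * (1.0 - p)
        sq_resid = (x - p) ** 2
        sq_resid_sum += sq_resid
        variance_sum += w
        z_sq_sum += sq_resid / w
        count += 1
    infit = sq_resid_sum / variance_sum  # information-weighted
    outfit = z_sq_sum / count            # unweighted mean of z^2
    return infit, outfit


# A confident miss (P = 0.9, x = 0) inflates outfit more than infit.
infit, outfit = rasch_fit_item([0.9, 0.5], [0, 1])
print(round(infit, 3), round(outfit, 3))  # 3.118 5.0
```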

torch_measure.metrics.item_total_correlation(data, mask=None)

Compute corrected item-total correlation for each item.

For each item, computes the Pearson correlation between the item responses and the total score excluding that item.

Parameters:

data (torch.Tensor) – Binary response matrix (n_subjects, n_items).
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Item-total correlations, shape (n_items,).

Return type:

torch.Tensor
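A minimal pure-Python sketch of the corrected item-total correlation: for item j it correlates the item column with the rest score (total minus item j). The helper names are hypothetical, masking is omitted, and returning NaN for a zero-variance column is an assumption.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[int]]) -> List[float]:
    totals = [sum(row) for row in data]
    result = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [t - x for t, x in zip(totals, item)]  # total score excluding item j
        result.append(pearson(item, rest))
    return result


data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print([round(r, 3) for r in corrected_item_total(data)])  # [0.522, 0.707, 0.522]
```

Excluding the item from the total avoids the inflation that comes from correlating an item with a score that already contains it.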

torch_measure.metrics.cronbach_alpha(data, mask=None)

Compute Cronbach’s alpha reliability coefficient.

Parameters:

data (torch.Tensor) – Response matrix (n_subjects, n_items).
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Cronbach’s alpha.

Return type:

float
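Cronbach's alpha is α = k / (k − 1) · (1 − Σ_j σ²_j / σ²_X), where k is the number of items, σ²_j the variance of item j and σ²_X the variance of the total score. A pure-Python sketch follows (hypothetical helper, complete data only, no mask); whether variances divide by N or N − 1 does not matter as long as both terms use the same convention, because the factor cancels in the ratio.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha_py(data: Sequence[Sequence[float]]) -> float:
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Two identical items: perfectly consistent, so alpha = 1.
print(cronbach_alpha_py([[1, 1], [0, 0], [1, 1], [0, 0]]))  # 1.0
# Two uncorrelated items: total variance equals the sum of item variances, so alpha = 0.
print(cronbach_alpha_py([[1, 1], [1, 0], [0, 1], [0, 0]]))  # 0.0
```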

torch_measure.metrics.mokken_scalability(data, mask=None)

Compute Mokken scalability coefficients.

Mokken scaling is a non-parametric IRT approach that tests whether items form a unidimensional scale. The H coefficient measures how well item pairs conform to the Guttman pattern.

Rules of thumb for H:

- H >= 0.5: strong scale
- 0.4 <= H < 0.5: medium scale
- 0.3 <= H < 0.4: weak scale
- H < 0.3: not a scale

Parameters:

data (torch.Tensor) – Binary response matrix (n_subjects, n_items).
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Dictionary with:

- 'H': overall scalability coefficient
- 'H_items': per-item scalability coefficients, shape (n_items,)
- 'H_pairs': pairwise scalability matrix, shape (n_items, n_items)

Return type:

dict
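Loevinger's H, on which Mokken scaling is based, compares each observed inter-item covariance with the largest covariance the two items could have given their proportions correct: cov_max(i, j) = min(p_i, p_j) − p_i·p_j. H_ij is the pair ratio, H_i pools numerators and denominators over all partners of item i, and H pools them over all pairs. The pure-Python sketch below follows these standard definitions for complete binary data; the helper name, the zero diagonal of `H_pairs`, and returning 0.0 for degenerate denominators are assumptions, and the library's handling of masks may differ.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def loevinger_h(data: Sequence[Sequence[int]]) -> Dict[str, object]:
    n, k = len(data), len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]  # proportion correct per item
    cov = [[0.0] * k for _ in range(k)]
    cov_max = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n  # proportion passing both items
        cov[i][j] = cov[j][i] = p_ij - p[i] * p[j]
        cov_max[i][j] = cov_max[j][i] = min(p[i], p[j]) - p[i] * p[j]
    h_pairs: List[List[float]] = [
        [cov[i][j] / cov_max[i][j] if i != j and cov_max[i][j] > 0 else 0.0 for j in range(k)]
        for i in range(k)
    ]
    h_items = []
    for i in range(k):
        num = sum(cov[i][j] for j in range(k) if j != i)
        den = sum(cov_max[i][j] for j in range(k) if j != i)
        h_items.append(num / den if den > 0 else 0.0)
    pairs = list(combinations(range(k), 2))
    num = sum(cov[i][j] for i, j in pairs)
    den = sum(cov_max[i][j] for i, j in pairs)
    return {"H": num / den if den > 0 else 0.0, "H_items": h_items, "H_pairs": h_pairs}


# Perfect Guttman pattern: nobody passes item 1 without passing item 0, so H = 1.
guttman = [[1, 1], [1, 0], [0, 0]]
print(loevinger_h(guttman)["H"])  # 1.0
```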

torch_measure.metrics.expected_calibration_error(predicted, observed, mask=None, n_bins=15)

Compute Expected Calibration Error (ECE).

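Measures how closely predicted probabilities match observed frequencies. Predictions are grouped into n_bins probability bins, and ECE is the average, weighted by the share of entries in each bin, of the absolute gap between the bin's mean predicted probability and its observed proportion of positive outcomes. An ECE of 0 means perfect calibration.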

Parameters:

predicted (torch.Tensor) – Predicted probabilities.
observed (torch.Tensor) – Observed binary outcomes.
mask (torch.Tensor | None) – Boolean mask of entries to evaluate.
n_bins (int) – Number of calibration bins.

Returns:

ECE value in [0, 1].

Return type:

float
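A minimal pure-Python sketch of binned ECE follows. It assumes equal-width bins over [0, 1], with a prediction of exactly 1.0 placed in the last bin; this binning scheme and the helper name are assumptions, since the documentation only states the number of bins.

```python
from typing import Sequence


def expected_calibration_error_py(
    predicted: Sequence[float], observed: Sequence[int], n_bins: int = 15
) -> float:
    n = len(predicted)
    bin_pred = [0.0] * n_bins  # sum of predicted probabilities per bin
    bin_obs = [0.0] * n_bins   # sum of observed outcomes per bin
    bin_count = [0] * n_bins
    for p, y in zip(predicted, observed):
        b = min(int(p * n_bins), n_bins - 1)  # equal-width bin index
        bin_pred[b] += p
        bin_obs[b] += y
        bin_count[b] += 1
    ece = 0.0
    for s_p, s_o, c in zip(bin_pred, bin_obs, bin_count):
        if c:
            # |observed rate - mean confidence|, weighted by the bin's share of entries
            ece += (c / n) * abs(s_o / c - s_p / c)
    return ece


print(expected_calibration_error_py([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.0
print(expected_calibration_error_py([0.9, 0.9], [0, 0]))  # 0.9 (overconfident)
```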

torch_measure.metrics.brier_score(predicted, observed, mask=None)

Compute the Brier score (mean squared error of probabilities).

Parameters:

predicted (torch.Tensor) – Predicted probabilities.
observed (torch.Tensor) – Observed binary outcomes.
mask (torch.Tensor | None) – Boolean mask of entries to include.

Returns:

Brier score in [0, 1]. Lower is better.

Return type:

float
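The Brier score is the mean of (p − y)² over the evaluated entries. A pure-Python sketch, with a hypothetical helper name and an optional list of booleans standing in for the tensor mask:

```python
from typing import Optional, Sequence


def brier_score_py(
    predicted: Sequence[float],
    observed: Sequence[int],
    mask: Optional[Sequence[bool]] = None,
) -> float:
    if mask is None:
        mask = [True] * len(predicted)
    sq_errors = [(p - y) ** 2 for p, y, keep in zip(predicted, observed, mask) if keep]
    return sum(sq_errors) / len(sq_errors)


print(round(brier_score_py([0.9, 0.2], [1, 0]), 4))  # (0.01 + 0.04) / 2 = 0.025
```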

torch_measure.metrics.differential_item_functioning(data, group, mask=None, method='mh')

Detect Differential Item Functioning (DIF).

DIF occurs when subjects of equal ability from different groups have different probabilities of answering an item correctly.

Parameters:

data (torch.Tensor) – Binary response matrix (n_subjects, n_items).
group (torch.Tensor) – Binary (0/1) group membership for each subject, shape (n_subjects,).
mask (torch.Tensor | None) – Boolean mask of entries to include.
method (str) – DIF detection method. Currently supports “mh” (Mantel-Haenszel).

Returns:

Dictionary with:

- 'mh_statistic': Mantel-Haenszel chi-square per item, shape (n_items,)
- 'effect_size': MH odds ratio (Delta-MH) per item, shape (n_items,)
- 'flagged': Boolean mask of items flagged for DIF

Return type:

dict
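The Mantel-Haenszel procedure stratifies subjects by a matching variable (commonly the total score), builds a group-by-correctness 2x2 table within each stratum, and pools the tables. The sketch below computes the standard MH common odds ratio α_MH, the ETS delta transform Δ_MH = −2.35·ln(α_MH), and the continuity-corrected MH chi-square for a single item. It is not the library's code: treating group 0 as the reference group, passing the strata explicitly, and the helper name are assumptions, and no flagging rule is applied because the documentation does not state one.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def mantel_haenszel_item(
    item: Sequence[int], group: Sequence[int], strata: Sequence[int]
) -> Dict[str, float]:
    # tables[k] = [A, B, C, D]: reference correct/incorrect, focal correct/incorrect
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, k in zip(item, group, strata):
        idx = (0 if x == 1 else 1) + (0 if g == 0 else 2)
        tables[k][idx] += 1

    num_or = den_or = 0.0            # pieces of the common odds ratio
    sum_a = sum_ea = sum_var = 0.0   # pieces of the chi-square
    for a, b, c, d in tables.values():
        t = a + b + c + d
        if t < 2:
            continue  # a single subject carries no information
        num_or += a * d / t
        den_or += b * c / t
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d  # correct / incorrect totals in the stratum
        sum_a += a
        sum_ea += n_ref * m1 / t
        sum_var += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))

    alpha = num_or / den_or if den_or > 0 else math.inf
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var if sum_var > 0 else 0.0
    delta = -2.35 * math.log(alpha) if 0 < alpha < math.inf else math.nan
    return {"alpha_mh": alpha, "delta_mh": delta, "chi2": chi2}


# One stratum: reference group 3/4 correct, focal group 1/4 correct.
item = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(mantel_haenszel_item(item, group, strata=[0] * 8))
```

A negative Δ_MH indicates that the item favours the reference group after matching.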

torch_measure.metrics.ability_standard_errors(ability, difficulty, discrimination=None)

Compute standard errors for ability estimates.

SE(theta_i) = 1 / sqrt(sum_j I_j(theta_i)), where I_j is the Fisher information of item j evaluated at theta_i.

Parameters:

ability (torch.Tensor) – Subject ability values, shape (N,).
difficulty (torch.Tensor) – Item difficulty values, shape (M,).
discrimination (torch.Tensor | None) – Item discrimination values, shape (M,). If None, every item uses a discrimination of 1 (Rasch model).

Returns:

Standard errors per subject, shape (N,).

Return type:

torch.Tensor
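The formula can be checked with a short pure-Python sketch. It assumes the logistic 2PL item response function P_ij = 1 / (1 + exp(−a_j(θ_i − b_j))) without a 1.7 scaling constant, under which I_j(θ) = a_j²·P·(1 − P); the helper name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def ability_se_py(
    ability: Sequence[float],
    difficulty: Sequence[float],
    discrimination: Optional[Sequence[float]] = None,
) -> List[float]:
    if discrimination is None:
        discrimination = [1.0] * len(difficulty)  # Rasch: all slopes equal 1
    ses = []
    for theta in ability:
        info = 0.0
        for a, b in zip(discrimination, difficulty):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            info += a * a * p * (1.0 - p)  # Fisher information of this item at theta
        ses.append(1.0 / math.sqrt(info))
    return ses


# Four Rasch items located at theta = 0: each contributes 0.25, so SE = 1.
print(ability_se_py([0.0], [0.0, 0.0, 0.0, 0.0]))  # [1.0]
```

Subjects far from every item difficulty get larger standard errors, because P(1 − P) shrinks toward 0 in the tails.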

torch_measure.metrics.difficulty_standard_errors(ability, difficulty, response_matrix, discrimination=None, mask=None)

Compute standard errors for difficulty estimates.

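SE(b_j) = 1 / sqrt(sum_i I_j(theta_i)) over observed subjects for each item, where I_j(theta_i) = a_j^2 * P_ij * Q_ij, P_ij is the model-predicted probability that subject i answers item j correctly, and Q_ij = 1 - P_ij.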

Parameters:

ability (torch.Tensor) – Subject ability values, shape (N,).
difficulty (torch.Tensor) – Item difficulty values, shape (M,).
response_matrix (torch.Tensor) – Response matrix, shape (N, M). Used only for determining observed entries.
discrimination (torch.Tensor | None) – Item discrimination values, shape (M,). If None, every item uses a discrimination of 1 (Rasch model).
mask (torch.Tensor | None) – Boolean mask of observed entries, shape (N, M). If None, all non-NaN entries are treated as observed.

Returns:

Standard errors per item, shape (M,).

Return type:

torch.Tensor

torch_measure.metrics.discrimination_standard_errors(ability, difficulty, discrimination, response_matrix, mask=None)

Compute standard errors for discrimination estimates.

I(a_j) = sum_i (theta_i - b_j)^2 * P_ij * Q_ij over observed subjects. SE(a_j) = 1 / sqrt(I(a_j)).

Parameters:

ability (torch.Tensor) – Subject ability values, shape (N,).
difficulty (torch.Tensor) – Item difficulty values, shape (M,).
discrimination (torch.Tensor) – Item discrimination values, shape (M,).
response_matrix (torch.Tensor) – Response matrix, shape (N, M). Used only for determining observed entries.
mask (torch.Tensor | None) – Boolean mask of observed entries, shape (N, M). If None, all non-NaN entries are treated as observed.

Returns:

Standard errors per item, shape (M,).

Return type:

torch.Tensor
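The two item-level standard errors can be sketched together, since they differ only in the derivative of the logit with respect to the parameter: −a_j for b_j and (θ_i − b_j) for a_j. The pure-Python sketch below assumes the logistic 2PL function P_ij = 1 / (1 + exp(−a_j(θ_i − b_j))) and treats `None` or NaN entries of a hypothetical nested-list response matrix as unobserved; it is an illustration, not the library's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def item_se_py(
    ability: Sequence[float],
    difficulty: Sequence[float],
    discrimination: Sequence[float],
    responses: Sequence[Sequence[Optional[float]]],
) -> Tuple[List[float], List[float]]:
    """Return (SE of difficulty, SE of discrimination) per item (hypothetical helper)."""
    se_b, se_a = [], []
    for j, (b, a) in enumerate(zip(difficulty, discrimination)):
        info_b = info_a = 0.0
        for i, theta in enumerate(ability):
            x = responses[i][j]
            if x is None or math.isnan(x):
                continue  # only observed entries contribute information
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            pq = p * (1.0 - p)
            info_b += a * a * pq             # I(b_j) term
            info_a += (theta - b) ** 2 * pq  # I(a_j) term
        se_b.append(1.0 / math.sqrt(info_b) if info_b > 0 else math.inf)
        se_a.append(1.0 / math.sqrt(info_a) if info_a > 0 else math.inf)
    return se_b, se_a


nan = float("nan")
thetas = [1.0, -1.0, 1.0, -1.0]
responses = [[1.0], [nan], [0.0], [1.0]]  # the second subject did not answer
se_b, se_a = item_se_py(thetas, [0.0], [1.0], responses)
print(se_b, se_a)
```

The discrimination information vanishes when every observed subject sits exactly at the item's difficulty (θ_i = b_j); reporting an infinite standard error in that case is an assumption of this sketch.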

torch_measure.metrics.strength_centrality(adjacency)

Node strength: sum of absolute edge weights.

Strength is the most widely used centrality measure in network psychometrics. A high-strength node has strong connections (in absolute value) with many other nodes.

Parameters:

adjacency (torch.Tensor) – Symmetric edge-weight matrix (n_items, n_items), zero diagonal.

Returns:

Strength per node, shape (n_items,).

Return type:

torch.Tensor

torch_measure.metrics.expected_influence(adjacency)

Expected influence: signed sum of edge weights.

Unlike strength, this is sensitive to the polarity of edges and can be negative for nodes connected primarily by negative edges. Proposed by Robinaugh et al. (2016) for signed networks (e.g., symptom networks).

Parameters:

adjacency (torch.Tensor) – Symmetric edge-weight matrix (n_items, n_items), zero diagonal.

Returns:

Expected influence per node, shape (n_items,).

Return type:

torch.Tensor
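Strength and expected influence differ only in whether each edge weight is taken in absolute value before summing a node's row. A pure-Python sketch over a hypothetical nested-list adjacency matrix (not the library's tensor code):

```python
from typing import List, Sequence


def strength_py(adjacency: Sequence[Sequence[float]]) -> List[float]:
    # Sum of absolute edge weights per node (zero diagonal contributes nothing).
    return [sum(abs(w) for w in row) for row in adjacency]


def expected_influence_py(adjacency: Sequence[Sequence[float]]) -> List[float]:
    # Signed sum of edge weights per node.
    return [sum(row) for row in adjacency]


adj = [
    [0.0, 0.5, -0.3],
    [0.5, 0.0, 0.2],
    [-0.3, 0.2, 0.0],
]
print([round(s, 2) for s in strength_py(adj)])            # [0.8, 0.7, 0.5]
print([round(e, 2) for e in expected_influence_py(adj)])  # [0.2, 0.7, -0.1]
```

Node 2 has positive strength but negative expected influence because its strongest edge is negative.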


torch_measure.metrics.closeness_centrality(adjacency)

Closeness centrality: normalised reciprocal of mean shortest-path distance.

Defined as (reachable − 1) / Σ dist(i, j) over all reachable j ≠ i, where reachable is the number of nodes reachable from i (counting i itself), matching the Wasserman-Faust normalisation for possibly disconnected graphs.

Parameters:

adjacency (torch.Tensor) – Symmetric edge-weight matrix (n_items, n_items), zero diagonal.

Returns:

Closeness scores per node, shape (n_items,). Zero for isolated nodes.

Return type:

torch.Tensor

torch_measure.metrics.betweenness_centrality(adjacency)

Node betweenness centrality.

For each node v, computes the fraction of (s, t) pairs (s < t, s ≠ v, t ≠ v) for which v lies on at least one shortest path from s to t. Node v lies on a shortest path when

dist(s, v) + dist(v, t) ≈ dist(s, t),

with equality checked up to a numerical tolerance.

The count is normalised by (n−1)(n−2)/2, the number of source–target pairs that exclude v.

Parameters:

adjacency (torch.Tensor) – Symmetric edge-weight matrix (n_items, n_items), zero diagonal.

Returns:

Betweenness per node in [0, 1], shape (n_items,).

Return type:

torch.Tensor
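Both path-based measures need all-pairs shortest-path distances. The pure-Python sketch below converts edge weights to lengths as 1/|w| (a common convention in network psychometrics, assumed here because the documentation does not state the conversion), runs Floyd-Warshall, and then applies the closeness and betweenness definitions given above. Helper names and the 1e-9 tolerance are hypothetical choices, not the library's.

```python
import math
from itertools import combinations
from typing import List, Sequence


def shortest_paths(adjacency: Sequence[Sequence[float]]) -> List[List[float]]:
    n = len(adjacency)
    # Edge length = 1/|w|; absent edges (w == 0) are not directly connected.
    dist = [
        [0.0 if i == j else (1.0 / abs(adjacency[i][j]) if adjacency[i][j] != 0 else math.inf)
         for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):  # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def closeness_py(adjacency: Sequence[Sequence[float]]) -> List[float]:
    dist = shortest_paths(adjacency)
    scores = []
    for i, row in enumerate(dist):
        # Distances to the other reachable nodes; len(others) equals reachable - 1.
        others = [d for j, d in enumerate(row) if j != i and d < math.inf]
        total = sum(others)
        scores.append(len(others) / total if total > 0 else 0.0)  # isolated node -> 0
    return scores


def betweenness_py(adjacency: Sequence[Sequence[float]], tol: float = 1e-9) -> List[float]:
    dist = shortest_paths(adjacency)
    n = len(dist)
    n_pairs = (n - 1) * (n - 2) / 2  # (s, t) pairs that exclude v
    scores = []
    for v in range(n):
        count = 0
        for s, t in combinations([u for u in range(n) if u != v], 2):
            if dist[s][t] == math.inf:
                continue  # no path between s and t
            if abs(dist[s][v] + dist[v][t] - dist[s][t]) <= tol:
                count += 1  # v lies on some shortest s-t path
        scores.append(count / n_pairs if n_pairs > 0 else 0.0)
    return scores


# Path graph 0 - 1 - 2 with unit weights: node 1 is the only broker.
path = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print([round(c, 3) for c in closeness_py(path)])  # [0.667, 1.0, 0.667]
print(betweenness_py(path))                        # [0.0, 1.0, 0.0]
```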
