dataFormat()'s id argument, and longdataFormat()'s Sid/Qid/Resp/w
arguments, now accept a column name (character string) in addition to a
column number. Previously only numeric column indices were accepted;
dataFormat() even raised an explicit error when a character value was
passed. Column names can now be supplied directly, e.g.
dataFormat(data, id = "StudentID") or
longdataFormat(data, Sid = "Sid", Qid = "Qid", Resp = "Resp"). Specifying
an unknown or ambiguous (duplicated) column name now raises a clear error
listing the available columns (R/01_dataFormat.R).An API audit across all exported model functions surfaced a number of gratuitous inconsistencies in argument names, argument order, and defaults. They are unified in this release; the changes below alter default behavior or (rarely) positional-argument calls, so please review if you relied on the old defaults.
The verbose argument now defaults to FALSE in every function that has
one. Previously IRT(), GRM(), LCA(), GridSearch(), all
Biclustering() methods, all Biclustering_IRM() methods, BNM_GA(),
and BNM_PBIL() defaulted to TRUE while LRA(), LDLRA(), LDB(),
BINET(), LDLRA_PBIL(), Glasso(), and chatterjee_matrix() defaulted
to FALSE. Estimation results are unaffected; pass verbose = TRUE to
restore progress messages. Several roxygen pages that claimed
"default is TRUE" for functions whose default was already FALSE were
corrected as well.
Biclustering_IRM()'s iteration-limit argument is renamed from
max_iter to maxiter, matching Biclustering(), LCA(), LRA(), and
the rest of the package. Because the IRM methods accept ..., a
maxiter = argument passed under the old assumption was silently
swallowed with no effect — an easy trap when switching between
Biclustering() and Biclustering_IRM(). The old name is still honored
(with a deprecation warning) and will be removed in v2.0.0.
Biclustering_IRM() for ordinal data now defaults to mic = FALSE and
EM_limit = 20, matching LRA()/Biclustering() (ordinal) and the
other Biclustering_IRM() methods respectively. It previously defaulted
to mic = TRUE and EM_limit = 100 with no documented rationale. Pass
mic = TRUE explicitly if you want monotonically increasing Field
Reference Profiles.
BNM(), LDLRA(), LDB(), BINET(), BNM_GA(), BNM_PBIL(),
LDLRA_PBIL(), and Biclustering_IRM() (binary) now declare their
missing-data arguments in the order na, Z, w — the order used by
CTT(), IRT(), GRM(), LCA(), LRA(), and Biclustering() — where
they previously used Z, w, na. Only calls that passed these three
arguments positionally are affected; named usage (Z = ...) is
unchanged.
Biclustering() for rated data now exposes conf_class in its signature
and documentation. It was already forwarded to the internal nominal
engine via ..., but never appeared on the help page. Also documented
that the rated method defaults to method = "R" (Ranklustering, classes
sorted by correct response rate) while binary/ordinal default to
method = "B".
LDLRA()'s beta1/beta2 defaults remain 2 (the other network models
use 1); the help page now notes this follows the original Mathematica
implementation of LDLRA rather than being an accident.
longdataFormat() falsely reported "Duplicated IDs found" whenever a
student answered more than one item. The duplicate check ran on the raw
student-ID column, which is expected to repeat once per item in long
format; it now checks for duplicated (student, item) pairs instead, so
ordinary long-format data (one row per student-item response) no longer
triggers a spurious error. A genuine duplicate — the same student
answering the same item twice — is still caught (R/01_dataFormat.R).
IRT(), Biclustering(), BNM(), LDLRA(), LDB(), and BINET() silently
treated missing responses as incorrect when computing item-total correlation
or correct response rate (crr()). In each case, a response matrix that
had already been stripped of its exametrika class (with missing values
recoded to 0 via tmp$U * tmp$Z) was passed to ItemTotalCorr(),
ItemThreshold(), or crr(). Those functions re-run dataFormat()
internally when given an unclassed matrix, but without the original missing
-value mask, so missing responses were counted as incorrect answers. This
affected IRT()'s initial parameter values (rho/tau, which only seed
the EM algorithm — final estimates were unaffected), Biclustering()'s
FieldAnalysis$CRR column, and the $crr field returned by BNM(),
LDLRA(), LDB(), and BINET() (including their BNM_GA()/BNM_PBIL()/
LDLRA_PBIL() wrappers). Fixed by passing the already-formatted object
through instead of the stripped matrix (R/04C_ParameterEstimation.R,
R/07_Biclustering.R, R/08A_BNM.R, R/09_LDLRA.R, R/10_LDB.R,
R/11_BINET.R).
CTT(), BNM(), LDLRA(), LDB(), and BINET() crashed when passed raw
(unformatted) matrix/data.frame input instead of a pre-built
exametrika object, even though this is documented as supported. The
response-type check in each function read U$response.type from the raw
input argument instead of from the dataFormat()-formatted object, which
is NULL for raw input and made the check error out before model fitting
ever started. CTT() had a more severe variant of the same bug: its
inherits(U, "exametrika") branch was inverted, so raw input skipped
dataFormat() entirely and crashed immediately. Fixed to consistently
check the formatted object (R/03_CTT.R, R/08A_BNM.R, R/09_LDLRA.R,
R/10_LDB.R, R/11_BINET.R).
BINET() could crash with a cryptic 'dimnames' length error (via NaN
values silently produced by log()) on data with missing responses. Two
compounding issues were found. First, Ccj (the per-class correct-response
count used throughout the RISP estimation) was computed as
t(clsmemb) %*% tmp$U, which does not mask out missing cells (encoded as
-1 in tmp$U) the way the corresponding incorrect-response count Fcj
already did (t(clsmemb) %*% (tmp$Z * (1 - tmp$U))); missing responses
therefore contributed -1 to the correct-response count instead of 0,
corrupting the conditional correct-response-rate estimates. Second, the
smoothing constant that keeps those conditional rates away from 0/0 when
a class-by-field cell has no non-missing observations was hardcoded to
gamp <- 1 (a flat Beta(1,1) prior, under which 0/0 is undefined) with
no way for the caller to change it, unlike the equivalent beta1/beta2
arguments already exposed by BNM(), LDLRA(), and LDB(). Fixed the
Ccj computation to mask missing cells via tmp$Z, added beta1 = 1 /
beta2 = 1 arguments to BINET() so the prior can be strengthened when
needed, and replaced the eventual downstream crash with an immediate,
actionable error when a class-by-field cell is still undefined under the
requested smoothing (R/11_BINET.R).
0-indexed or non-contiguous polytomous category codes were silently
dropped in Biclustering()/Biclustering_IRM() for nominal and ordinal
data, with no error or warning. Biclustering.nominal(),
Biclustering.ordinal(), Biclustering_IRM.nominal(), and
Biclustering_IRM.ordinal() (and, via delegation, the rated/rated_IRM
variants) built a one-hot response array by indexing directly with each
item's raw category code; a raw code of 0 is a no-op under R's
array-indexing rules, so every response in category 0 vanished from the
likelihood and classification with no diagnostic. On the package's own
J15S3810 sample data (categories coded 0..3), this silently discarded
3,854 observed responses. GRM() already remapped each item's categories
to contiguous 1-based indices to avoid exactly this; that remap is now
shared (remap_category_codes(), R/00_BiclucterUtils.R) and applied in
all four functions.
Biclustering_IRM.nominal()/Biclustering_IRM.ordinal() corrupted
results whenever the data had missing responses. The one-hot response
array was built by writing directly at Uq[s, j, tmp$Q[s, j]] <- 1
without checking for missing cells; a missing cell's -1 sentinel, used
as a raw array index, sets every category except the first to 1 under
R's negative-indexing rules instead of leaving the cell at 0. This fed
through into the Gibbs sampler's small-class-adjustment step and the final
class/field assignments. With 8% missing data injected into J20S600,
7% of final class assignments differed from the correct (missing-masked)
result. Fixed by building the one-hot array with the same
missing-value-masked construction already used in the binary/nominal/
ordinal Biclustering() path.
LDB() crashed with "object 'conf_mat' not found" when conf was
passed as a matrix or data.frame (only the vector form of conf worked).
This is the same class of bug already fixed in Biclustering.ordinal()/
Biclustering.nominal() on 2026-04-27, which LDB() was missed by at the
time (R/10_LDB.R).
LDB()'s beta1/beta2 smoothing arguments had their roles swapped
relative to BNM()/LDLRA()/BINET(). All four functions compute a
posterior-mode correct-response rate of the form
(count + α - 1) / (N + α + β - 2) under a Beta(α, β) prior; BNM(),
LD_param_est() (used by LDLRA()), and BINET() consistently use
beta1 as α (the "success" pseudo-count, matching the count in the
numerator), but LDB() used beta2. At the package default
beta1 = beta2 = 1 this made no numerical difference, but a caller
following BINET()'s own advice to set an asymmetric prior (e.g.
beta1 = 2, beta2 = 2 to avoid a degenerate 0/0 cell) would see the
opposite effect in LDB() versus its sibling functions. Traced to the
original Mathematica Chapter 10 module
(develop/mtmk15forVer13/mod/Module_LDB.nb), which uses beta2 here —
this is not an R porting bug but an inconsistency inherited from the
reference implementation. Changed LDB() to use beta1, matching the
mathematically conventional Beta(α, β) parameterization and the rest of
the package (R/10_LDB.R).
LDLRA(..., method = "C") (local dependence latent Class model)
mislabeled its output as the Rank model. A hardcoded model <- 2
right before building the output tables silently overrode the model
value that had correctly been set from method earlier in the function,
so result$model, result$msg, and the "Rank"/"Class" column headers in
Estimation_table/CCRR_table always read "Rank" regardless of the
requested method. Numerical estimates were unaffected — only the labeling
(R/09_LDLRA.R).
BNM()'s and BINET()'s graph-validity checks (acyclicFLG,
connectedFLG) never actually detected cycles or disconnection. Both
used adj^i in a loop intending to count length-i walks, but ^ on a
matrix in R is elementwise, not matrix power, so the loop only ever
re-tested self-loops and "any edge present." BINET()'s
stop("Your graph is not a DAG") guard was consequently a no-op for any
real cycle of length ≥ 2, and BNM() silently fit cyclic "Bayesian
networks" with no warning. (BINET()'s connectedFLG was further
computing off the student response matrix, not a graph object, so it
could not have tested connectivity at all.) Replaced both with
igraph::is_dag() / igraph::is_connected(mode = "weak") on the actual
adjacency matrix being validated (R/08A_BNM.R, R/11_BINET.R).
Biclustering_IRM.binary()'s small-class-reduction loop tracked
bestclass/class, which resolve to base R's class() function rather
than any local variable, so the intended "roll back to the best-BIC
class assignment" step silently did nothing. In practice this had no
effect on the final returned model (the class/field assignments are
recomputed from scratch after the loop regardless), but the code read as
doing something it did not; fixed to track the actual cls vector
(R/07_IRM.R).
CCRR() returned JCRR()'s result when called on non-exametrika
input — a copy-paste bug in CCRR.default()'s fallback branch called
JCRR(U) instead of recursing into CCRR(U) (R/02_TestItemFunctions.R).
MutualInformation()'s base argument was ignored for binary data;
MutualInformation.binary() always computed log base 2 regardless of the
requested base, unlike the ordinal/nominal/rated path which already
respected it (R/02_TestItemFunctions.R).
JSR() and CSR() crashed instead of falling back cleanly when given
binary data. Both message "using X instead" and call the binary
equivalent, but never return()ed its result, so execution fell through
into polytomous-only code (NCOL(U$Q), U$Z == 0 on nonexistent U$Q)
and crashed (R/02_TestItemFunctions.R).
ItemTotalCorr.ordinal() did not mask missing responses before summing
total scores. total <- rowSums(U$Q) included the -1 missing
sentinel directly in the sum, so any student with even one missing item
got a wrong total score, corrupting their item-total correlation with
every item, not just the missing one. Fixed to mask by Z before
summing, matching ItemThreshold.ordinal()'s existing pattern
(R/02_TestItemFunctions.R).
ITBiserial(), ScoreReport(), and ItemReport() crashed when passed
raw (unformatted) data — each called dataFormat() but stored the
result in a variable that was never used, leaving the original
unformatted argument in place for the subsequent $response.type check
and computation (R/02_TestItemFunctions.R, R/02_QitemFunctions.R).
dataFormat() silently skipped value validation for ordinal/rated/
nominal data. The branch was guarded by
response.type == "polytgomous" (a typo — no response.type value is
ever literally "polytomous", since the actual values are "ordinal"/
"rated"/"nominal"), so the "must be non-negative integers" check
never ran for any polytomous type. Fixed to check
response.type %in% c("ordinal", "rated") (deliberately excluding
"nominal", whose category codes are arbitrary labels rather than
ordered counts and may legitimately be negative) (R/01_dataFormat.R).
longdataFormat() misaligned ID/response-matrix rows whenever
numeric student or item IDs were not an exact 1..N sequence (e.g.
student IDs 10, 20, 30). Numeric IDs were used directly as row/column
indices instead of being remapped to a dense index, so nrow(result$U)
could be far larger than the actual number of students, with most rows
entirely empty. Both Sid/Qid are now always remapped through
as.factor() to a dense, correctly-ordered 1-based index, regardless of
whether the raw IDs are numeric or character (R/01_dataFormat.R).
longdataFormat()'s w (item weight) argument was misused as a row
index. w_vec[unique(Qid_num)] indexed the long-format weight column
by item-ID values rather than by the row where each item first
appears, so weights were essentially always wrong unless item IDs
happened to be an exact 1..N sequence starting from the first item
column. Fixed to look up each item's weight via its first occurrence in
the long-format data (R/01_dataFormat.R).
print(GridSearch(...)) always crashed. print.exametrika()'s
internal switch() had no case for the "GridSearch" class, and a
trailing comma after the last real case created an empty unnamed
fallback branch that errors when evaluated. GridSearch() is a
top-level function whose result is auto-printed at the console in normal
interactive use, so this reliably crashed on first use. Added a proper
GridSearch print case summarizing the fit-index grid and optimal
setting (R/00_exametrikaPrint.R).
plot(x) without an explicit type argument crashed with a confusing
"the condition has length > 1" error instead of the intended
"The 'type' argument must be specified." message. The missing(type)
check ran after type had already been evaluated (forcing its
multi-element default) for an unrelated uses_layout computation, so by
the time missing() was checked, type was no longer missing in the
technical sense needed for the check to matter — and the multi-element
default itself broke the preceding if first. Reordered so the
missing(type) check runs first (R/00_exametrikaPlot.R).
plot(grm_result, type = "ICC") failed even though the identical call
works for IRT models. GRM's plot-type alias handling for "ICC" was
implemented but never reachable, because valid_types$GRM in the
upstream validation list omitted "ICC" (only "IIC"/"TIC" were
listed). Added "ICC" to valid_types$GRM (R/00_exametrikaPlot.R).
print(bnm_result, digits = ...)/print(ldlra_result, digits = ...)
ignored the requested digits for the "Conditional Correct Response
Rate" table. A typo (x$CCRR_tabje instead of x$CCRR_table, in two
places) wrote the formatted string into a nonexistent list field instead
of the one actually printed immediately afterward. Also propagated the
existing digits argument to two related tables (IRPIndex/FRPIndex
in the LDLRA/LDB print sections) that were missing it while equivalent
tables elsewhere already had it (R/00_exametrikaPrint.R).
print(InterItemAnalysis(ordinal_data)) never displayed the
Conditional Selection Ratio — it printed x$JSR (Joint Selection
Ratio, already shown just above) a second time instead of x$CSR
(R/00_exametrikaPrint.R).
GridSearch() could report the wrong optimal (ncls, nfld) when
multiple grid cells tied for the best fit index. which(ret == ..., arr.ind = TRUE) returns one row per tied match; optimal_idx[1]/
optimal_idx[2] read the matrix's column-major-flattened first two
values instead of the first tie's row/col pair, so with ≥ 2 ties the
reported optimum could combine the row of one tie with the column of
another — a combination that was not actually the best (or even
necessarily tied). Fixed to read optimal_idx[1, "row"]/
optimal_idx[1, "col"] (R/00_GridSearch.R).
GridSearch(..., max_ncls = 1) silently tested ncls = 2 anyway.
2:max_ncls is R's descending-sequence trap: with max_ncls = 1 it
evaluates to c(2, 1) rather than an empty range, so a value the caller
explicitly excluded was tested regardless. Added an explicit
max_ncls/max_nfld < 2 guard that stops with a clear message instead
(R/00_GridSearch.R).
Glasso()'s edge_tol argument was documented as controlling edge
detection but did not affect model selection. The EBIC computation
used internally to choose lambda_opt had its own hardcoded 1e-6
edge-count threshold, so changing edge_tol only affected the
reported n_edge, not which lambda was actually selected. Threaded
edge_tol through to compute_EBIC_glasso() so it consistently governs
both (R/22_GlassoUnit.R).
dataFormat() could silently miss items with no real variance whenever
the item also had missing responses, letting them through into IRT(),
which then crashed with the uninformative "object 'result' not found"
(the EM optimizer's initial parameter, seeded from an item-total
correlation of NaN, made every retry of optim() fail). The existing
zero-variance check computed sd() on the raw response column, which
still contains the -1 missing-value sentinel; an item that only a
handful of students answered, all identically, therefore looked
non-constant (-1 mixed with the real value) even though it truly had
zero variance among actual respondents. sd.check is now computed on
each item's valid (Z == 1) responses only. Additionally, items with no
variance among valid responses are now excluded from the data (with a
message identifying them) instead of merely warned about and left in,
matching the original Mathematica dataformat[], which drops such items
before any analysis; an item with CA specified (rated data) has its
CA entry dropped along with it so the two stay aligned. All-missing
items (no valid responses at all) are unaffected and continue to be kept
with a separate warning (R/01_dataFormat.R).
DistractorAnalysis() counted raw category codes against columns
1..maxQ of its frequency tables. For rated data coded from 0 (or with
gaps in the codes), every category-0 response silently vanished from the
frequency/proportion tables and the remaining categories were shifted by
one column — inconsistent with the fitted model itself, whose category
probabilities are computed on codes remapped to 1..K per item since the
category-code fix earlier in this release. The stored Q and correct
answers are now remapped the same way before tabulation
(R/21_DistractorAnalysis.R).
A conf given as a data.frame was wrongly rejected ("The conf matrix
should only contain 0s and 1s") by Biclustering() and LDB(), because
the 0/1 validation ran before coercion and %in% compares whole columns
on a data.frame. The input is now coerced with as.matrix() first
(R/07_Biclustering.R).
GridSearch()'s help-page example called a nonexistent function
grid_serch(data_matrix, ...); it now calls GridSearch(J35S515, ...)
(R/00_GridSearch.R).
LDLRA() now stops with an informative error when a
rank-by-parent-pattern cell has no (smoothed) observations under
beta1 = beta2 = 1 (0/0), instead of silently propagating NaN through
the log-likelihood, matching the diagnostic added to BINET() earlier in
this release (R/09_LDLRA.R).
LRA() for rated data has a minFreqRatio argument that is documented to
pool response categories rarer than nobs * minFreqRatio into a single
bucket, but the feature was doubly broken: the collapse test compared each
category's frequency against the correct-answer code (catfreq != CA,
almost always true), so no category was ever pooled; and had it fired, the
report assembly indexed the full category-label set and would have stopped
on a length mismatch. The condition now reads "keep a category if it is
frequent enough or it is the correct answer" (the correct answer must
never be pooled), and the Item-Category Reference Profile labels the
pooled bucket "CatX", using only the labels of the categories actually
estimated. The default minFreqRatio = 0 keeps every category and is
unaffected (R/13_LRA_rated.R).
Fixed user-visible typos in printed output: "Dimensionality Analyeis" →
"Dimensionality Analysis", "Cummurative Percentage" → "Cumulative
Percentage", and "Conditonal" → "Conditional" (in "Conditional Correct
Response Ratio" and "Conditional Selection Ratio" headings)
(R/00_print_ctt_irt.R).
LLtheta_mat()/EAP_PSD() (R/04B_AbilityEstimation.R) and
the corresponding dead print.exametrika() "IRT_EAP_PSD" case.
Neither function was exported or called anywhere in the package; IRT()
computes ability EAP/PSD inline instead. Also removed a handful of other
now-dead assignments left over from the same tmp$U * tmp$Z
class-stripping pattern fixed above, and one duplicated-then-discarded
array/loop in LDLRA()'s internal LD_param_est(), none of which were
ever read after being written (R/06_LRA.R, R/08C_BNM_GA.R,
R/09B_LDLRA_GA.R, R/09_LDLRA.R, R/11_BINET.R).Follow-up cleanup after the audit above, consolidating logic that had drifted out of sync across near-identical call sites (the same root cause behind several of the bugs fixed above):
R/00_EMclus.R, R/04C_ParameterEstimation.R,
R/07_Biclustering.R, R/08C_BNM_GA.R, R/09B_LDLRA_GA.R,
R/15_Biclustering_nominal.R, and R/16_Biclustering_ordinal.R used a
carriage-return (\r) "overwrite in place" style, which reads as one
long run-on line when logged to a file or captured non-interactively
rather than the intended per-iteration progress. Switched to \n.beta_posterior_mode() (R/08A_BNM.R) for the
(count + beta1 - 1) / (total + beta1 + beta2 - 2) formula duplicated
across BNM(), LD_param_est() (used by LDLRA()), and BINET() — the
same formula whose numerator/denominator role LDB() had swapped, above.build_conf_mat() (R/07_Biclustering.R) for the conf
(vector/matrix/data.frame) parsing and validation duplicated across
Biclustering(), Biclustering.nominal(), Biclustering.ordinal(), and
LDB() — the same parsing whose missing as.matrix(conf) step caused
the LDB() crash fixed above.stop_if_all_grid_failed(), report_failed_grid_settings(),
and select_optimal_grid_index() (R/00_GridSearch.R) for the
all-cells-failed check, failed-settings warning, and optimal-index
selection duplicated between GridSearch()'s Biclustering (2-D) and
LCA/LRA (1-D) branches — the same duplication in which both the tie-break
bug and the max_ncls < 2 bug above were independently present.tests/testthat/test-regression-1150.R) and smoke tests for
AlphaCoefficient(), OmegaCoefficient(), and BiserialCorrelation(),
which previously had no direct tests
(tests/testthat/test-smoke-reliability.R).R/00_BiclucterUtils.R to R/00_BiclusterUtils.R (filename
typo) and fixed a "Bicluter" comment typo; removed a stray .png file
accidentally committed to the repository root in 2024, and broadened the
.Rbuildignore rule for Rplots.pdf so a leftover
tests/testthat/Rplots.pdf can no longer slip into the CRAN tarball.plot.exametrika() now forwards graphical parameters supplied via ...
to every plot type (R Journal review request). Previously the ...
argument documented on the plot.exametrika() help page was captured but
never passed on: the internal dispatch functions (plot_irt_model(),
plot_grm_model(), plot_common_profiles(), the polytomous Biclustering
plotters, the array plot, and the network plotters) did not accept or
propagate it, so standard graphical parameters were silently ignored. For
example, plot(result.LRA, type = "IRP", items = 1:4, nc = 2, nr = 2, las = 2, pch = 16) changed neither the axis-label orientation (las)
nor the plotting symbol (pch).
User-supplied parameters are now forwarded consistently to the underlying
base R plotting calls (plot, barplot, image, lines, curve) for
all plot types, including manually drawn axes so that las,
cex.axis, and similar parameters take effect on every axis. Standard
parameters such as pch, las, cex, col, lty, and lwd work as
expected, and user values take precedence over the package defaults, so
xlab, ylab, main, etc. can be overridden. This is achieved without
adding any dependency: the package remains lightweight and base-graphics
only.
Internally this is implemented via small helpers (merge_plot_dots(),
call_plot(), draw_curve()) that merge the package defaults with the
user's ... (user wins) and dispatch through do.call(); the IRT/GRM
curves are drawn by grid evaluation through draw_curve() instead of
curve() so that the non-standard evaluation of curve()'s first
argument does not block parameter forwarding.
IRT Test Information Function plot title typo. The type = "TIF"
plot for IRT models had main = "Test Informaiton Function"; it now reads
"Test Information Function" (R/00_plot_irt.R).
User-facing message typos and missing-word fixes. Four cosmetic but
user-visible string fixes following a family-wide audit (exametrika,
ggExametrika, shinyExametrika) prompted by the Clusterd -> Clustered
rename. No behaviour change; messages now read correctly.
R/04C_ParameterEstimation.R: IRT slope warning had a missing space
after the period ("... exceeds 10.Please exercise caution ..."
-> "... exceeds 10. Please exercise caution ...").R/02_TestItemFunctions.R: CCRR.nominal() info message was missing
the verb "Using" ("... binary data only. Conditional Selection Rate for your polytomous data instead." -> "... binary data only. Using Conditional Selection Rate for your polytomous data instead."), now
matching the parallel messages in CCRR.ordinal/CCRR.rated/CCRR.binary.R/09_LDLRA.R and R/10_LDB.R: max-parents warning had a comma
where a period belongs ("... is N, Please check." -> "... is N. Please check.").plot.exametrika() Biclustering panel: typo "Clusterd" -> "Clustered".
Fixed a long-standing misspelling in the base-R Biclustering array plot
(R/00_plot_biclustering.R): the panel title "Clusterd Data" is now
"Clustered Data", the inline comment ## Clusterd Plot is corrected to
## Clustered Plot, and the internal variable clusterd_data is renamed
to clustered_data. No user-facing API change. The downstream
ggExametrika::plotArray_gg() had inherited and propagated the typo into
its argument names (Clusterd, Clusterd_lines, Clusterd_lines_color);
that is being corrected in ggExametrika 1.1.1 as a coordinated release.
Binary IRM: crash on real data with missing values. Biclustering_IRM
on a binary exametrika data object containing missing cells
(tmp$Z[s, j] == 0) crashed with "確率ベクトル中にNAがあります"
(NA in probability vector) inside rmultinom(), after log() produced
NaN in the Gibbs sampler. The cause was that tmp$U retains the
-1 missingness marker from dataFormat() and the IRM aggregated it
directly (tmp$U %*% fld01), producing negative counts in CcfPlus
whose log was NaN. The pre-existing line U <- tmp$U * tmp$Z was a
dead variable. R/07_IRM.R now assigns the masked matrix back as
tmp$U <- tmp$U * tmp$Z, so all downstream tmp$U references see
zero in missing cells. Reproduced on HCI (651×20, 1.5% missing) and
SAT12 (600×32 binarised, 0.4% missing).
Graphical Lasso: graceful handling of numerical divergence. When the
inner block coordinate descent (BCD) of Glasso() produced non-finite
values — typically with high-dimensional, near-singular polychoric
correlation matrices (N <= p regime) and very small lambdas — the
previous implementation crashed in R's if (diff < eps) test with
"missing value where TRUE/FALSE needed", discarding all of the
intermediate work along the lambda grid.
glasso_one() now detects non-finite values in the BCD update of
beta, in the inner convergence statistic diff, and in the working
covariance W, and returns converged = FALSE instead of crashing.Glasso() now reacts to converged = FALSE (or a non-finite EBIC)
by emitting a warning() that names the offending lambda and the
best lambda found so far, and breaks the lambda search, returning
the best solution encountered up to that point. If divergence happens
on the very first lambda the call still errors out, since no solution
exists to return.path element of the result preserves NA for any lambda values
skipped by the early break, making the breakpoint visible to the user.Observed concretely on a 98-item / 143-respondent ordinal dataset
(polychoric condition number ~1.4e5): the old Glasso() crashed at
lambda = 0.0096 after ~25 minutes of work; the new implementation
returns the same lambda_opt = 0.1003 solution with a clear warning
and a path whose two trailing entries are NA.
Resubmission of the 1.13.0 release. The 1.13.0 submission was rejected
by the CRAN auto-check service for a single Overall checktime NOTE on
r-devel-windows-x86_64 (11 min vs. the 10 min limit). The package itself
passed cleanly on both Windows and Debian (Status: OK / OK).
R CMD check comfortably under the 10-minute Windows limit. The
same tests continue to run locally and on R-hub / win-devel via the
NOT_CRAN environment variable that testthat sets.
test-grm.R: the J15S3810 regression (a 15-item / 3,810-respondent
fit, ~60s) and the nitems >= 8 underflow regression (~8s) are
skipped on CRAN. Default-coverage GRM tests on small data continue
to run on CRAN.test-irm.R: the J35S515 shared-fixture Gibbs run (~23s) and the
two reproducibility tests that re-run Gibbs (~22s combined) are
skipped on CRAN. The Default seed is 123 structural check still
runs on CRAN.This release also incorporates the development changes from 1.12.0–1.12.2 (never published to CRAN). Their entries are retained below for traceability.
Graphical Lasso (Glasso): New function for sparse precision matrix
estimation from ordinal item response data. The polychoric correlation
matrix is computed internally and the optimal regularization parameter
is selected by Extended Bayesian Information Criterion (EBIC; Foygel
and Drton 2010). Implements block coordinate descent (Friedman, Hastie,
Tibshirani 2008; Algorithm 17.2 of Hastie, Tibshirani, Friedman 2009)
with cyclical coordinate descent for the inner lasso step. Warm-starting
across the lambda grid accelerates the search.
Internal helpers glasso_one() (single-lambda solver) and
compute_EBIC_glasso() (EBIC computation) are available but not
exported.
Returns a list with theta (selected precision matrix), lambda_opt
(selected lambda), ebic_opt, n_edge, and path (data frame of
lambda, ebic, n_edge over the search grid).
print.exametrika Glasso method: Added a Glasso branch to the
shared print.exametrika() dispatcher to summarize the estimated
model (optimal lambda, EBIC value, edge count, precision matrix).
Chatterjee's xi correlation (chatterjee_xi, xi_stable,
chatterjee_matrix): New family of functions implementing
Chatterjee's (2021) rank-based correlation coefficient.
chatterjee_xi() computes the single-shot value with random
tie-breaking. xi_stable() averages B replications (default
B = 1000) to stabilize against tie-induced variability and returns
a list with the mean, standard deviation, standard error, and B.
chatterjee_matrix() produces the p x p asymmetric pairwise xi
matrix from ordinal data with pairwise-complete handling of
missing values; the asymmetry between xi(j, k) and xi(k, j)
enables direction detection in graphical-model construction.
Glasso() and chatterjee_matrix() in
backticks (e.g., `[0, 1]`, `[j, k]`, `Q[i, j]`) so
that roxygen2 no longer rewrites them as \link{} cross-references.
Resolves R CMD check WARNINGs about missing link targets.Glasso() ... documented: Added @param ... to Glasso() so
that R CMD check's "Undocumented arguments" WARNING is cleared.compute_EBIC_glasso() partial-match fix: Replaced
determinant(Theta, log = TRUE) with the full argument name
logarithm = TRUE, removing the partial-argument-match NOTE.dataFormat(response.type = "rated") no longer errors when a CA
category is unobserved: previously, if no respondent chose item
j's correct-answer category (e.g., everyone got it wrong on a hard
item, or the sample was small enough that the CA category did not
appear by chance), dataFormat() aborted with
"CA for item j is not a valid response category". This rejected
legitimate test data where an item is uniformly difficult or where
the sample size is small. The check has been relaxed to a
warning(); the affected items are encoded as all-incorrect (U[, j] == 0), which is the correct downstream behavior. This applies to
both the matrix-input path (dataFormat) and the long-format path
(longdataFormat). Mistyped CAs (e.g., CA[j] = 8 when categories
are 1–7) still surface as warnings, so the diagnostic value is
preserved.Biclustering.nominal() and Biclustering.ordinal() no longer abort
with "missing value where TRUE/FALSE needed" when extreme grid
configurations (e.g., very small ncls combined with very large
nfld) leave fields or classes empty during EM. The two
log-likelihood checks inside the EM loop — the relative convergence
test and the monotonicity guard — now treat a non-finite
test_log_lik as a non-converged exit instead of throwing. Affected
cells are returned with converge = FALSE so that GridSearch()
skips them automatically when comparing criteria. This also resolves
the same crash in Biclustering.rated(), which calls
Biclustering.nominal() internally.Class-side Confirmatory Biclustering: Biclustering() now accepts
a conf_class argument that fixes class memberships during EM. Like
conf (which fixes field memberships), conf_class accepts either a
vector of class labels (one per respondent) or a 0/1 membership matrix
(respondents x classes). When supplied, the class-side E-step is
overridden every iteration. For Ranklustering (method = "R"), the
neighbour-smoothing step is skipped (smoothed_memb <- clsmemb) since
smoothing pre-fixed labels would defeat the purpose of fixing them.
Available for binary, ordinal, nominal, and rated data; can be
combined with conf to fix both fields and classes simultaneously.
Note: rated re-orders classes by correct rate after estimation, so
the output class labels may not match the input labels (the
individual-to-class mapping is preserved up to relabeling).
Number of model parameters (nparam) is intentionally not adjusted
when conf or conf_class is in effect: only PiFR / BCRM cell
probabilities count as parameters in this implementation, and
membership matrices are latent posteriors, not parameters.
Confirmatory ordinal Biclustering: field membership now stays fixed
during EM: Biclustering.ordinal() was overwriting fldmemb with the
E-step estimate every iteration, so the conf/conf_mat argument only
affected the initial value. Aligned the implementation with
Biclustering.binary() by re-applying fldmemb <- conf_mat immediately
after the field-side E-step. With this fix result$FieldEstimated
matches the user-supplied assignment exactly.
Confirmatory nominal Biclustering: conf argument is now actually
honored: Biclustering.nominal() validated length(conf) against
NCOL(U), but U is an exametrika list object so NCOL(U) always
returned 1. The check rejected every well-formed conf vector with
"conf vector size does NOT match with data.", making confirmatory
nominal Biclustering unreachable since v1.10.0. Replaced with
NCOL(U$Q) (and the matrix-form NROW(conf) check, plus the
conf_mat allocation, in the same way).
Confirmatory ordinal Biclustering: conf length check actually
validates input: same NCOL(U) issue as nominal, but in
Biclustering.ordinal(). Fixed by switching to NCOL(U$Q).
Confirmatory binary Biclustering: conf size checks now reference the
formatted response matrix explicitly: Biclustering.binary() happened
to work because U is rebound to tmp$U * tmp$Z before the conf block,
so NCOL(U) returned the item count. Switched to NCOL(tmp$U) for
consistency with the ordinal/nominal fixes and to make the intent
obvious to readers.
Confirmatory Biclustering: matrix-form conf is no longer silently
dropped: when conf was supplied as a membership matrix
(items x fields) instead of a vector, all three implementations
(binary, ordinal, nominal) validated the matrix but never copied
it into conf_mat, so the subsequent nfld <- NCOL(conf_mat) failed
with object 'conf_mat' not found. Added conf_mat <- as.matrix(conf)
in the matrix branch.
Biclustering.ordinal() now honors the maxiter argument: The
inner EM iteration cap was hardcoded to 100 (maxemt <- 100) and
ignored the user-supplied maxiter. This mirrors the bug that was
fixed for Biclustering.nominal() in 1.11.0. Callers that pass a
larger maxiter (e.g. 2000 in Monte Carlo studies) now actually
use that ceiling.
GRM() fit indices no longer return NaN for moderate or larger
item counts: The benchmark model used
const <- exp(-nitems * 100) as a log-domain epsilon inside
sum(cat_counts * log(cat_probs + const)). For about 8 or more items
this expression underflows to exactly 0 in IEEE-754 double precision,
so log(0) = -Inf propagates through 0 * -Inf = NaN and poisons
every downstream index (model_Chi_sq, NFI, CFI, RMSEA, AIC, CAIC,
BIC). The benchmark and null loops now skip zero-count categories
explicitly, removing the need for an additive epsilon.
GRM() degrees of freedom were computed incorrectly: The model
df was set to n_pattern * (ncat - 1) + 1 and the null df to
n_pattern * (ncat - 1), which is neither (# bench params) - (# model params) nor (# bench params) - (# null params). The inflated
df then clamped CFI, TLI, IFI, and RMSEA to zero whenever
chi^2 < df (and the previous df_A > df_B inequality was the wrong
direction to begin with). The convention now matches
Biclustering.ordinal():
bench_nparam_j = n_pattern * (ncat_j - 1)null_nparam_j = ncat_j - 1model_nparam_j = ncat_j (one slope plus ncat_j - 1 thresholds)df_A_j = bench_nparam_j - model_nparam_jdf_B_j = bench_nparam_j - null_nparam_jGRM() now accepts any integer-coded ordinal responses, not just
1..K: Category counts were derived from apply(dat, 2, max), which
assumes responses are already 1-indexed. Data coded from 0 (e.g.
0..3) or with gaps (e.g. 1, 2, 4) undercounted ncat[j] by one or
more and then indexed grm_prob()[resp] / v[resp] out of range,
producing either an outright invalid subscript type 'list' error
(on J15S3810-style data) or silently truncated threshold columns.
Each item's responses are now remapped to contiguous 1..K codes via
match() against the sorted unique values on entry, with ncat[j]
derived from the mapped levels; both the R model-fit blocks and the
C++ log-likelihood receive 1-based input regardless of the user's
coding.
GridSearch() now tolerates per-cell fit errors: Previously, a
single Biclustering() (or LCA() / LRA()) call that raised an
error at a grid corner (for example, empty-cluster edge cases at
large ncls/nfld with small nobs/nitems) would propagate out
of GridSearch() and abort the entire grid. The call to the
underlying analysis function is now wrapped in tryCatch, and
errors are handled the same way as non-convergence: the cell is
marked NA in the index matrix and the (ncls, nfld) pair is
recorded in failed_settings. GridSearch() still raises only
when all grid cells fail, preserving the existing "all-failed"
error.
C++ implementation of the IRM Gibbs sampler core: The collapsed
Gibbs sampler shared by Biclustering_IRM.nominal(),
Biclustering_IRM.ordinal(), and Biclustering_IRM.rated() is now
implemented in C++ via Rcpp (src/irm_gibbs_core.cpp). The R
reference implementation in R/00_IRM_Gibbs_CORE.R is preserved
behind irm_gibbs_core(..., use_cpp = FALSE) for cross-checking.
Numerical reproducibility (revised 2026-05-07): RNG calls are
routed through R-level sample.int() and rmultinom() (via
Rcpp::Function) so the unif_rand() consumption order matches
base R exactly. The C-level entry points (Rcpp::sample,
R::rmultinom) consume RNG differently from base R and were
explicitly avoided.
The C++ and R reference paths are bit-identical for the first
Gibbs iterations on the bundled test datasets and on real
polytomous data (verified through iteration 2 on J143S32 ordinal),
but diverge from iteration 3 onward on real data due to
sub-LSB floating-point ordering differences accumulating through
the lmvbeta() calls in the CRP likelihood. Once the divergence
flips a single rmultinom() outcome, the two chains follow
different sample paths.
Empirically the two paths still target the same posterior:
on a 50-seed comparison (M = 66 ordinal items, N = 143), the
marginal distributions of n_field and n_class are
statistically indistinguishable between paths (Wilcoxon
p = 0.56 / 0.81). Use either path for inference; do not rely
on a single seed reproducing the same (n_field, n_class)
across the two paths.
The parity tests in tests/testthat/test-irm-gibbs-cpp.R
cover only short runs (<= 10 iterations) and therefore did not
catch this. A longer-iteration test on real-data-like
configurations is on the roadmap.
Wall-clock: roughly 4x speedup on the inner Gibbs loop on the
bundled test datasets (J20S600 nominal: 82 to 20 ms/iter;
J35S500 ordinal: 76 to 19 ms/iter). The end-to-end
Biclustering_IRM() call also benefits proportionally on larger
iteration counts (e.g. simulation runs).
No new dependencies: Rcpp is already in LinkingTo:; the
implementation uses only Rcpp::Function and <vector>/<cmath>.
Vectorized EM hot path in Biclustering.ordinal(): The per-iteration
cost of the ordinal EM loop has been reduced by rewriting five hot spots
in base R without introducing new package dependencies.
apply(Ufcq_prior, c(1, 2), function(x) rev(cumsum(rev(x))))
with an in-place reverse cumulative sum loop over the category axis.
The previous form dispatched nfld * ncls R-level calls per EM
iteration; the new form performs maxQ - 1 vectorized array
additions.apply(X, 1, min) and apply(X, 1, max) with
do.call(pmin.int, as.data.frame(X)) /
do.call(pmax.int, as.data.frame(X)). These compute the row-wise
reduction in a small number of C-level calls rather than one
R-level call per row. apply(X, 1, which.max) was similarly
replaced by max.col(X, ties.method = "first").log(BBRM[,,q] - BBRM[,,q+1] + const) once per EM
iteration instead of independently in the class-side and field-side
E-steps for every q.Z * Uq[,,q] once per fit (call it ZU) via
column-major recycling (Uq * as.vector(Z)), replacing about eight
separate elementwise products per EM iteration and reusing the
same array in the post-EM fit-index blocks. The replicate(maxQ, Z)
allocation that produced Zrep has been removed as a byproduct.apply(X, c(1, 2), sum) and apply(X, c(2, 3), sum) on
3-D arrays with rowSums(X, dims = 2) / colSums(X, dims = 1),
and apply(M, 2, sum) on matrices with colSums(M).Output is bit-identical to 1.11.0 on every tested configuration
(max(abs(old - new)) == 0 for BCRM, BBRM, ClassMembership,
FieldMembership, TestFitIndices, and the full EM trajectory).
Single-fit wall-clock on typical sizes (ncls, nfld up to 20 by 15 on
J35S500) improves by roughly 1.2 to 1.3 times; the per-EM-iteration
speedup compounds across GridSearch() grids.
Vectorized 3-D apply reductions in Biclustering.nominal(): The
same rowSums/colSums(X, dims = ...) substitutions were applied to
the nominal M-step and null-model blocks, and Zrep was eliminated
in favor of the ZU precomputation. Output is bit-identical to
1.11.0.
Vectorized Uq one-hot encoding in Biclustering.ordinal() and
Biclustering.nominal(): The construction of the one-hot response
array Uq[i, j, tmp$Q[i,j]] = 1 previously used a nobs * nitems
nested R loop. It now writes all ones in one C-level matrix-index
assignment, restricted to non-missing cells (tmp$Z == 1). Single-fit
wall-clock on the validation matrix improves by 1.2-5.3x over 1.11.0
(up to 5x on small cases where the loop overhead dominated). Output
is bit-identical to 1.11.0 at every downstream location (the missing
entries of the old Uq were never read because every consumer
applies the tmp$Z mask).
Biclustering.ordinal() now returns SmoothedMembership: The
smoothed (Ranklustering-filtered) class membership matrix is now part
of the return value, mirroring Biclustering.binary(). This fills a
long-standing gap (only the binary form previously exposed it) and
makes Ranklustering with conf_class introspectable: callers can
verify that the smoothing step is correctly skipped when class labels
are pre-fixed (SmoothedMembership equals ClassMembership in that
case).print() smoke test for Biclustering_IRM.rated() results
(test-irm-rated.R).nrank = 4 LRA.rated regression test for
DistractorAnalysis() to cover varying rank counts
(test-distractor.R).Imports or Depends. The new C++ Gibbs
sampler uses Rcpp, which was already declared in LinkingTo:.conf_class argument on
Biclustering() is additive and defaults to NULL, preserving the
previous behavior for all existing callers.Biclustering_IRM.rated(): Rated IRM (Infinite Relational Model for multiple-choice data): Performs IRM-based biclustering for rated data (items with correct answers and multiple response categories). Internally calls Biclustering_IRM.nominal() for estimation via Dirichlet-Multinomial collapsed Gibbs sampler, then post-processes the results: classes are sorted by correct response rate (ranklustering), and two layers of fit indices are reported — binary (item correct/incorrect, with full benchmark model including CFI/RMSEA) and nominal (category-level, AIC/BIC/CAIC only). Returns TestFitIndices (binary, default) and TestFitIndices_nominal (nominal) in the output.
DistractorAnalysis(): Distractor analysis for rated models: S3 generic function for analyzing response category distributions by rank/class. Computes observed frequency tables, proportion tables, chi-square tests against chance level, and Cramer's V effect sizes for each item-by-rank cell. Works with LRA.rated and Biclustering.rated / Biclustering_IRM.rated results. For Biclustering models, output is grouped by field. Supports items and ranks filtering in both print() and plot() methods.
J21S300: New rated sample dataset (21 items, 300 students, 4 response categories) for testing rated Biclustering and IRM models. Smaller and faster than J35S5000 (35 items, 5000 students).Fixed maxiter parameter ignored in Biclustering.nominal(): The maxiter parameter was accepted but internally overridden by a hardcoded maxemt <- 100. Now correctly uses maxiter as the iteration limit.
Fixed spurious "No ID column detected" message in Biclustering.rated() and Biclustering_IRM.rated(): The crr() function was called with a raw matrix (tmp$Z * tmp$U) instead of an exametrika object, triggering an unnecessary dataFormat() call. Now creates a temporary binary-typed exametrika object to avoid the message.
Biclustering.rated(): Rated (multiple-choice) Biclustering: Performs biclustering for rated data (items with correct answers and multiple response categories). Internally calls Biclustering.nominal() for estimation, then post-processes the results: classes are sorted by correct response rate (ranklustering), and two layers of fit indices are reported — binary (item correct/incorrect, with full benchmark model) and nominal (category-level, AIC/BIC/CAIC only). The binary layer uses item-level correct response rates per class without field constraints (nparam = nitems * ncls). Supports both Biclustering (method = "B") and Ranklustering (method = "R") modes. Returns TestFitIndices (binary) and TestFitIndices_nominal (nominal) in the output. Added 16 tests for rated Biclustering using J35S5000.ItemFRP to quasiFRP in Biclustering.rated() output: The item-level correct response rate matrix (J x C) is computed without field constraints because the field assignments come from nominal analysis and have different meaning in the binary context. The name quasiFRP (quasi Field Reference Profile) clarifies this distinction from the standard field-constrained FRP.subscript out of bounds in nominal/ordinal Biclustering with large nfld: The initial field assignment ceiling(1:nitems / (nitems / nfld)) could exceed nfld due to floating-point rounding. Added pmin(..., nfld) clamping, consistent with the binary Biclustering fix already in place.dataFormat() call in Biclustering.rated(): The function was unnecessarily re-calling dataFormat() on already-formatted data, causing repeated "No ID column detected" messages during GridSearch(). Now reuses the existing exametrika object directly.x$U (binary correctness matrix) was used instead of x$Q (polytomous responses). Now correctly uses x$Q for polytomous models.invalid graphics state error: par(mfrow=...) and layout() conflicted when plotting FCRP/FCBR. Now skips par(mfrow=...) for plot types that use layout() internally.id parameter validation in dataFormat(): When a vector (e.g., id = tmp$ID) was passed instead of a column number, the error message was cryptic (the condition has length > 1). Now validates that id is a single integer and provides a clear error message.ell_B = 0) and chi-square based indices (NFI, RFI, IFI, TLI, CFI, RMSEA) meaningless. These indices are now reported as NA for nominal data. Information criteria (AIC, BIC, CAIC) are computed directly from the model log-likelihood and remain available. Also fixed a length(benchGroup) → length(unique(benchGroup)) bug in Biclustering.nominal() (the unique() call was missing).nfld. Affects Biclustering() (binary/ordinal/nominal), Biclustering_IRM() (binary/ordinal/nominal).field to fld in Biclustering_IRM.binary(): Internal variable name changed from field to fld for consistency with other Biclustering models. No change to the public API (FieldEstimated output name is unchanged).@format field incorrectly stated "nominal responses" but the dataset is ordinal (response.type=ordinal). Also fixed "A ordinal" to "An ordinal" in @description.eval=FALSE to computationally intensive code chunks in vignettes (GridSearch, IRM, LDLRA, LDLRA_PBIL, LDB, BINET, ordinal/nominal Biclustering, ordinal/rated LRA). The Japanese guide (guide-ja) now uses eval=FALSE for all model estimation chunks to avoid duplicating computation from English vignettes. Full rendered output is available on the pkgdown site.log(n + 1) but should be log(n) + 1 per Bozdogan (1987, Psychometrika, 52(3), p.358, Proposition 2, Eq.44). The original Mathematica implementation had this error (Log[nobs + 1]), and the R port inherited it. Both the R version (R/00_ModelFitModule.R) and the Mathematica version (develop/mtmk15forVer13/mod/Module_ModelFit.nb) have been corrected. The numerical difference is approximately 1 (constant), but the corrected formula now matches the published definition: CAIC(k) = -2 log L + k * (log(n) + 1). This affects all models that compute fit indices: IRT, LCA, LRA (binary/ordinal/rated), Biclustering (binary/ordinal/nominal), IRM, BNM, LDLRA, LDB, BINET, and GRM. All 873 tests pass with the corrected formula.\donttest to \dontrun: GRM's multi-panel plot examples caused "invalid graphics state" errors in non-interactive environments (pkgdown, CI). Changed to \dontrun to prevent build failures.CA parameter: When CA (correct answer vector) was provided but response.type was not explicitly specified, dataFormat() incorrectly classified the data as "ordinal" instead of "rated". This caused $U (binary scoring matrix) to be NULL. The auto-detection now checks for CA before ordinal/nominal detection: binary → rated (if CA provided) → ordinal → nominal.id parameter not working for column 1: Changed id default from 1 to NULL. Previously, id = 1 (both default and explicit) triggered auto-detection heuristics, making it impossible to explicitly specify the first column as the ID column. Now id = NULL triggers auto-detection, and any numeric value (including 1) forces that column to be used as ID.1:N) as ID regardless of whether they also look like valid response values. Previously, 1:10 was misclassified as response data because looks_like_response_data() returned TRUE.U matrix: When the response matrix contained missing values (-1), the binary scoring matrix U incorrectly scored them as 0 (incorrect) instead of -1 (missing). Now U[i,j] = -1 when Q[i,j] = -1.drop=FALSE missing in item exclusion: When items were excluded due to invalid variance (e.g., containing Inf), the column subsetting response.matrix[, mask] dropped matrix dimensions if only one item remained, causing a crash. Added drop = FALSE.dataFormat() now reports via message() when it detects problematic data:
g_list / adj_list Input Path Fixg_csv variable undefined error when using g_list or adj_list input: BINET() crashed with an undefined variable error at the graph construction step when the DAG was specified via g_list or adj_list parameters. The internal variable g_csv (used to build the integrated graph object all_g) was only defined in the adj_file code path. Added logic to reconstruct g_csv from adj_list for the g_list and adj_list input paths, ensuring all_g is correctly built with Field edge attributes regardless of input method.g_list / adj_list length validation: The length check for g_list and adj_list incorrectly compared against ncls (number of classes) instead of nfld (number of fields). In BINET, each element of adj_list represents the DAG structure at a specific field, so the list length should equal nfld. Also fixed g_list type check from g_list[[1]] (always checking the first element) to g_list[[j]] (checking each element).... to Biclustering_IRM.binary() signature: The S3 generic Biclustering_IRM(U, ...) requires all methods to include ... in their formal arguments. The .binary method was missing it, causing an R CMD check WARNING. Added ... to match the generic and the .nominal/.ordinal methods.msg Field Fixmsg field to correctly distinguish Rank vs Class models: Binary IRM and Ordinal IRM are Ranklustering models (ordered latent classes), so msg = "Rank" is correct. Nominal IRM has no Ranklustering concept (unordered latent classes), so msg = "Class" is correct. Previously all three methods had inconsistent values; now Biclustering_IRM.binary and Biclustering_IRM.ordinal return msg = "Rank", while Biclustering_IRM.nominal returns msg = "Class". Also updated internal column names of cls01 and FRP from "Class" to "Rank" for binary/ordinal IRM. The shared Gibbs core (irm_gibbs_core()) output column names were also updated.LRD AliasLRD (Latent Rank Distribution) alias to Biclustering_IRM.binary() return value: The binary IRM was the only Biclustering model that did not include LRD in its return value (it only had LCD). Other Biclustering models (binary Biclustering, nominal IRM, ordinal IRM) all return both LRD and LCD. Added LRD = clsdist as an alias for LCD to ensure consistency across all Biclustering-family models. This improves interoperability with downstream packages (ggExametrika) that look up LRD when msg == "Rank".Biclustering_IRM() seed default back to 123: Ensures reproducibility by default.t(apply()) Dimension Drop Fixt(apply(log_S, 1, irm_log_to_prob)) dimension drop in Nominal/Ordinal IRM: When the Gibbs sampler converged to ncls=1 or nfld=1, apply() on a single-column matrix returned a vector instead of a matrix, causing t() to produce a 1-row matrix instead of an N-row matrix. This led to incorrect class/field membership assignment. Replaced all 6 instances of t(apply(mat, 1, fun)) with explicit for-loops in both Biclustering_IRM.nominal() (3 locations: EM E-step, final class membership, final field membership) and Biclustering_IRM.ordinal() (3 locations: same). Consistent with the project's apply() caution policy.Biclustering_IRM() Gibbs sampler: Changed exp(ptab - min(ptab)) to exp(ptab - max(ptab)) for numerical stability. The previous implementation subtracted the minimum log-probability, which could cause overflow (exp(large positive) = Inf) when the range of log-probabilities was large, resulting in NaN after normalization. Subtracting the maximum ensures the largest value becomes exp(0) = 1 and all others underflow harmlessly to near-zero values.Biclustering.nominal() using stale log-likelihood in model fit indices: When the EM algorithm exited due to log-likelihood decrease, BCRM was correctly reverted to the previous iteration's value, but test_log_lik was not. The model fit section recalculated the correct log-likelihood as testell, but chi_A, model_log_like, log_lik, and LogLik all referenced the stale test_log_lik instead of testell. Also removed a duplicated model fit code block (copy-paste error).J20S400 response type from nominal to binary: The J20S400.rda dataset was incorrectly stored with response.type = "nominal" despite being a binary (0/1) dataset with -99 as missing values. Missing values were not properly handled (Z was all 1s, Q contained raw -99 values). Regenerated from the original CSV source (develop/sampleData/J20S400.csv) with dataFormat(..., na = -99), now correctly typed as binary with 86 missing values properly masked in Z.Biclustering_IRM.nominal() for nominal/polytomous data: Extends the Infinite Relational Model (IRM) from binary to nominal scale data using a Dirichlet-Multinomial collapsed Gibbs sampler. The Chinese Restaurant Process (CRP) automatically determines the optimal number of classes and fields. After the Gibbs sampling phase, small classes are consolidated and refined with an EM algorithm. The Dirichlet prior concentration parameter alpha controls smoothing of category probabilities.Biclustering_IRM.ordinal() for ordinal/polytomous data: Extends IRM to ordinal scale data. Shares the same Dirichlet-Multinomial collapsed Gibbs sampler as nominal IRM (Phase 1), then applies ordinal-specific EM refinement (Phase 2) with cumulative normalization to enforce monotonic category boundaries. The mic parameter (default TRUE) enforces monotone increasing class ordering by expected score sum. Reports BFRP (Bicluster Field Reference Profile), FRPIndex, TRP, and Strongly/Weakly Ordinal Alignment Conditions (SOACflg/WOACflg).Biclustering_IRM() is now an S3 generic: Dispatches to Biclustering_IRM.binary (existing binary IRM), Biclustering_IRM.nominal (new), and Biclustering_IRM.ordinal (new). Raw data is automatically formatted and dispatched based on response.type.irm_gibbs_core() (in R/00_IRM_Gibbs_CORE.R), used by both nominal and ordinal IRM. Helper functions (irm_calc_Ufcq, irm_lmvbeta, irm_log_to_prob, irm_bic_calc, etc.) are also shared.U_fcq, computing only the contribution of the target student/item rather than recalculating the full array at each step.LCA() and LRA() now support a conf parameter for confirmatory analysis: The conf argument accepts an IRP matrix (ncls/nrank x testlength) where non-NA values are held fixed throughout EM estimation and NA values are freely estimated. This enables test equating scenarios where anchor items retain their known IRPs while new items are calibrated against them.conf has column names, items are matched by label rather than by position. This allows conf to contain a subset of items (anchor items only) or items in a different order than the data. Items in the data but not in conf are automatically set to freely estimated. Unknown labels in conf produce an informative error.somclus() internal function: The Self-Organizing Maps estimation code previously inlined in LRA.binary() (~100 lines) has been extracted into a standalone internal function somclus() in 00_EMclus.R. This parallels emclus() (GTM/EM) and returns the same structure. Also fixed a pre-existing typo (h_cout → h_count) and another (oldsBIC → oldBIC).tidyverse and readxl for reading Excel-based Mathematica reference data. Replaced with 24 modern test files using base R read.csv() and test_path() to load CSV fixtures from tests/testthat/fixtures/mathematica_reference/. The new test suite has zero external package dependencies beyond testthat.readxl/tidyverse from test dependencies: DESCRIPTION Suggests no longer requires any packages beyond knitr, rmarkdown, and testthat.test-irm-nominal.R): 18 test blocks for Biclustering_IRM.nominal() using J20S600 data — basic execution, dimensions, FRP validity, membership, fit indices, backward compatibility, seed reproducibility, alpha validation.test-irm-ordinal.R): 25 test blocks for Biclustering_IRM.ordinal() using J35S500 data — basic execution, dimensions, FRP validity, expected scores, TRP, BFRP, FRPIndex, SOAC/WOAC flags, mic parameter, fit indices, backward compatibility, seed reproducibility, alpha validation.stem(nrs(dataFormat(J15S500))) to demonstrate score distribution visualization.^tests$ and ^inst$ entries that were incorrectly excluding tests and vignettes from the built package.index Parameter FixesGridSearch() now accepts common aliases for fit indices. "loglik", "log_lik", "LogLik", and "LL" are mapped to "model_log_like". "Chi_sq" and "chi_sq" are mapped to "model_Chi_sq". Previously, using these aliases caused silent NULL extraction and eventual errors.GridSearch(), before the computationally expensive grid search loop runs. The error message lists all valid options including available aliases.model_log_like is now correctly treated as a maximization target (larger log-likelihood = better fit). Previously, it was incorrectly placed in the minimization group, which would have selected the worst-fitting model.tolerance = 1e-2 in expect_equal) to absolute difference comparison (threshold 0.005). Q3 residual correlations include values near zero where relative tolerance comparisons are unreliable.layout and plot.new from graphics to NAMESPACE. These functions are used by the legend strip layout helpers for polytomous Biclustering plots.apply(U$Q, 2, unique) returning matrix instead of list: When all items have the same number of response categories (e.g., all 5-point Likert), apply() returns a matrix rather than a list. The subsequent lapply() then iterates over individual elements instead of per-column vectors, causing ncat to be a vector of 1s and crashing the algorithm. Replaced with lapply(seq_len(nitems), function(j) sort(unique(U$Q[, j]))) which always returns a proper list. Same fix applied to catfreq999 computation. Both LRA.ordinal and LRA.rated were affected.apply(tmp$Q, 2, table) returning matrix instead of list: Same class of bug as the LRA.ordinal/LRA.rated fix above. In GRM(), the null model log-likelihood computation used apply(tmp$Q, 2, table) to compute per-item category frequencies. When all items have the same number of response categories (e.g., all 5-point Likert), apply() returns a matrix instead of a list, causing response_list[[j]] to extract a single number rather than the full frequency table. This produced incorrect null_log_like values and consequently wrong chi-square statistics, RMSEA, TLI, and CFI in ItemFitIndices. Replaced with lapply(seq_len(nitems), function(j) table(tmp$Q[, j])) which always returns a proper list.LRA.ordinal() now raises an informative error when items have different numbers of response categories (e.g., some items with 3 categories and others with 5). The internal matrix algebra uses fixed-stride indexing that assumes uniform category counts. The error message suggests alternatives (LRA.rated, Biclustering.ordinal) that support mixed category counts via list-based designs.plot(lca_result, type = "FRP") passed validation but failed at runtime because the $FRP field does not exist in LCA/LRA return values. Now properly rejected with an informative error message at the validation stage.stat parameter supporting "mean" (default), "median", and "mode".style parameter supporting "line" (default) and "bar" (stacked bar chart).stat parameter.FRPIndex (Field Reference Profile indices) including location parameters (Alpha, Beta), slope parameters (A, B), and monotonicity indices (Gamma, C).plot.exametrika()stat: Controls the summary statistic for FRP and RRV plots ("mean", "median", "mode").style: Controls the display style for FCRP plots ("line", "bar").#808080) to distinguish from white (incorrect) and black (correct). Polytomous data uses black (#000000) to distinguish from the category color palette.print() now displays FRPIndex (Field Reference Profile Indices) with a note that the values are based on normalized expected scores (E[score]-1)/(maxQ-1).?Biclustering, including detailed definitions and polytomous adaptation logic.Systematic unification of return value structures across all analysis functions for consistency and interoperability.
n_class (retaining Nclass for backward compatibility)n_rank, n_field (retaining Nrank, Nfield)n_class, n_field (retaining Nclass, Nfield)n_class, n_field, n_cycle (retaining Nclass, Nfield, N_Cycle)n_class, n_field, n_cycle (retaining Nclass, Nfield, N_Cycle)n_cycle, N_Cycle (retaining em_cycle, EM_Cycle as IRM-specific aliases)log_lik Added to All Functionslog_lik at the top level of the return object, matching TestFitIndices$model_log_like.ModelFit class, matching the format used by IRT/LCA/LRA and other functions. Previously these used bare calcFitIndices() output (9 fields, no class). Added fields: model_log_like, bench_log_like, null_log_like, model_Chi_sq, null_Chi_sq, model_df, null_df.ModelFit class.TestFitIndices added as primary name for multigroup fit indices (previously only MG_FitIndices). MG_FitIndices retained as backward-compatible alias. SM_FitIndices (saturated model) remains unchanged.Estimate column (most probable class assignment) to the Students matrix.Estimate column (most probable class assignment) to the Students matrix.FRPIndex (Field Reference Profile indices) for consistency with Biclustering.binary and LDB.TRP from matrix(1×ncls) to numeric vector, consistent with all other functions.class = c("exametrika", "GridSearch") to return value for method dispatch support.msg field assignment: Fixed msg <- "Class" to msg = "Class" inside structure() call (line 150 of 05_LCA.R). The <- operator was being interpreted as a standalone assignment rather than a named list element, causing the msg field name to be empty.plot(r, type="RMP", students=1)). Added drop = FALSE to prevent matrix-to-vector coercion when extracting a single row from the Students matrix.TestFitIndices log-likelihood fields: Fixed null_log_like which incorrectly stored the saturated model log-likelihood (log_lik_satu) instead of the null model log-likelihood (sum(null_itemll)). Added missing bench_log_like field containing the saturated model log-likelihood (sum(satu_itemll2)). Note: the chi-square-based fit indices (NFI, CFI, TLI, RMSEA, AIC, BIC, etc.) were always computed correctly; only the stored log-likelihood labels were affected.TestFitIndices and ItemFitIndices to the standard 16-field structure with ModelFit class (c("exametrika", "ModelFit")), matching all other analysis functions. Added model_log_like, bench_log_like, null_log_like to ItemFitIndices. Removed ScoreRankCorr / RankQuantCorr from TestFitIndices (already available at the top level of the return object).test-12OLR.R and test-13NLR.R to handle the new ModelFit class (add unclass() before as.data.frame()) and adjusted column/index references to match the unified 16-field structure.layout() with a thin dedicated row (height ratio 0.2) to reduce visual clutter in data panels. Added setup_legend_layout() and draw_legend_strip() internal helper functions.P(Q>=1)=1.0 reference line at the top of FCBR plots for visual completeness.dataFormat() now correctly identifies factor-type ID columns before converting factors to numeric. Previously, the factor-to-numeric conversion occurred before ID detection, causing factor ID columns with many levels (>=20) to trigger a "Too many categories" error instead of being recognized as IDs.is_response_data()) that was defined but never called.obj$Q instead of obj$U for test length calculationpmax(Ufcq_prior, 1e-10) to prevent division by zero and NaN errorsThis release improves naming consistency across the package while maintaining full backward compatibility through a deprecation path.
All analysis functions now return results with snake_case field names for better consistency:
n_class - Number of latent classes (replaces Nclass)n_field - Number of latent fields (replaces Nfield)n_rank - Number of latent ranks (replaces Nrank)n_cycle - Number of EM iterations (replaces N_Cycle)log_lik - Log-likelihood value (replaces LogLik)The old field names continue to work for backward compatibility:
Nclass, Nfield, Nrank, N_Cycle, LogLik - Deprecated but functional# Old code (deprecated but still works)
result <- LCA(data, ncls = 3)
n_classes <- result$Nclass
# New code (recommended)
result <- LCA(data, ncls = 3)
n_classes <- result$n_class
Progress messages now display properly in R Markdown documents:
verbose parameter (default: TRUE)
ncls = 2: nfld=2 nfld=3 nfld=4 ...verbose = FALSE to suppress all progress messagesiter 1: match=0 nfld=15 ncls=30Adjusting classes: BIC=-99592.5 ncls=21 (min size < 20)verbose = TRUE to verbose = FALSE (consistent with binary/rated versions)Saturation Model - iter 1: log_lik=-1.234567Restricted Model - iter 1: log_lik=-0.987654verbose = TRUE to verbose = FALSE (consistent with binary/ordinal versions)All progress messages now use proper line breaks instead of carriage returns, ensuring clean output in R Markdown/knitr documents and web documentation.
sort(unique(...)) to ensure consistent ordering: 0 (white/incorrect) → 1 (black/correct)testEll → test_log_lik)log_lik terminologyconverge variable to all EM-based functions to indicate algorithm convergence status
New function LRA() now supports ordinal response data
Bug fixes and improvements
Standardized terminology: unified the usage of "class" and "rank" throughout the package
Various minor bug fixes