Package 'kfold' reference manual

Title:	Machine Learning for Runoff Prediction
Description:	Machine learning with k-fold cross-validation.
Authors:	Dongdong Kong [aut, cre] (ORCID: <https://orcid.org/0000-0003-1836-8172>)
Maintainer:	Dongdong Kong <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2026-06-06 15:14:45 UTC
Source:	https://github.com/rpkgs/kfold

add_previous

Description

add_previous

Usage

add_previous(d, nlead = 12)

Arguments

d

Data containing the variable Q_obs.

nlead

Number of leads to add.

Stratified k-fold split

Description

Splits observation indices into kfold groups, ensuring each group receives a representative range of the target variable y.

Usage

chunk_stratified(y, kfold = 5, seed = 1)

Arguments

y

Numeric vector of target values used for stratified splitting.

kfold

Number of folds.

seed

Random seed (currently unused; seed is fixed internally).

Value

A named list of length kfold, each element an integer vector of row indices belonging to that fold.
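
As an illustration of the idea, the following base R sketch shows one common way to build stratified folds for a numeric target: sort the indices by y and deal them out round-robin, so every fold spans the full range of y. The helper name stratified_folds and the fold names are hypothetical; this is not the implementation of chunk_stratified(), which may differ in detail.

```r
# Hypothetical helper; an illustrative sketch, not the package's chunk_stratified().
stratified_folds <- function(y, kfold = 5) {
  ord <- order(y)                                  # indices sorted by target value
  fold_id <- factor(rep_len(seq_len(kfold), length(y)),
                    levels = seq_len(kfold))       # 1, 2, ..., kfold, 1, 2, ...
  folds <- split(ord, fold_id)                     # deal sorted indices round-robin
  names(folds) <- paste0("fold", seq_len(kfold))   # assumed naming scheme
  lapply(folds, sort)                              # integer row indices per fold
}

set.seed(1)
y <- rexp(103)                                     # skewed target, like runoff
folds <- stratified_folds(y, kfold = 5)
lengths(folds)                                     # fold sizes differ by at most 1
sapply(folds, function(i) range(y[i]))             # each fold covers low and high y
```

Dealing sorted indices round-robin keeps the fold sizes balanced and needs no random seed, which is one reason a seed argument can go unused in this kind of scheme.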

Build lagged feature matrices for multiple lead times

Description

Build lagged feature matrices for multiple lead times

Usage

feature_leads(data_full, leads = 1:12)

Arguments

data_full

A data.table / data.frame with columns Q_obs, P, PET_Romanenko, and Q_sim.

leads

Integer vector of lead times (in time steps) to build features for.

Value

A named list (one element per lead) of lists with X (feature matrix) and Y (response matrix).

Compute GOF across multiple lead-time kfold objects

Description

Compute GOF across multiple lead-time kfold objects

Usage

GOT_list(list_kfold, list_test, ..., idcol = "lead")

Arguments

...

Ignored.

idcol

Column name for the lead-time id column.

list_kfold

Named list of kfold objects (one per lead time).

list_test

Named list of test datasets (one per lead time), each a list with X and Y.

Value

A data.table of GOF metrics with columns lead and mode.

kfold_calib

Description

Calibrate a model on a single train/validation split.

Usage

kfold_calib(X, Y, FUN = xgboost, index = NULL, ..., ratio_valid = 0.3)

Arguments

X

Feature matrix (rows = observations).

Y

Response matrix (rows = observations).

FUN

Model fitting function with signature FUN(x_train, y_train, ...).

index

Integer vector of validation row indices. If NULL, the first floor(n * ratio_valid) rows are used.

...

Additional arguments forwarded to FUN.

ratio_valid

Fraction of rows used as validation when index = NULL.
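
The default index rule described above can be sketched in base R as follows. The helper split_valid is hypothetical, and a plain linear model (lm.fit) stands in for FUN so that the sketch runs without xgboost; this is an illustration under those assumptions, not the body of kfold_calib().

```r
# Hypothetical helper illustrating the documented default for `index`.
split_valid <- function(X, Y, index = NULL, ratio_valid = 0.3) {
  n <- nrow(X)
  if (is.null(index)) index <- seq_len(floor(n * ratio_valid))  # first rows
  stopifnot(length(index) > 0, length(index) < n)
  list(
    x_train = X[-index, , drop = FALSE], y_train = Y[-index, , drop = FALSE],
    x_valid = X[index, , drop = FALSE],  y_valid = Y[index, , drop = FALSE]
  )
}

set.seed(1)
X <- matrix(rnorm(200), 100, 2)
Y <- as.matrix(X %*% c(1, -2) + rnorm(100) / 10)
s <- split_valid(X, Y, ratio_valid = 0.3)           # 30 validation, 70 training rows

fit <- lm.fit(cbind(1, s$x_train), s$y_train)       # linear model stands in for FUN
pred <- cbind(1, s$x_valid) %*% fit$coefficients    # predict on the validation rows
rmse_valid <- sqrt(mean((s$y_valid - pred)^2))
rmse_valid
```

Because the default takes the first rows rather than a random sample, the split is reproducible but order-dependent; for time series, passing an explicit index gives control over which period is held out.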

kfold machine learning

Description

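Machine learning with k-fold cross-validation. kfold_ml() takes any fitting function FUN; kfold_rf() and kfold_xgboost() default to ranger and xgboost, respectively, and kfold_lm() fits a linear model.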

Usage

kfold_ml(
  X,
  Y,
  kfold = 5,
  FUN,
  ...,
  fn_chunk = chunk_stratified,
  .progress = TRUE
)

kfold_rf(X, Y, kfold = 5, FUN = ranger, ntree = 500, importance = "none", ...)

kfold_xgboost(X, Y, kfold = 5, FUN = xgboost, nrounds = 500, ...)

kfold_lm(X, Y, kfold = 5, ...)

Arguments

X

Feature matrix (rows = observations).

Y

Response matrix (rows = observations).

kfold

Number of folds.

FUN

Model fitting function with signature FUN(x_train, y_train, ...).

...

Additional arguments forwarded to FUN.

fn_chunk

Fold-splitting function; defaults to chunk_stratified().

.progress

Show a progress bar during fold iteration.

ntree

Number of trees for kfold_rf().

importance

Variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification, the variance of the responses for regression and the sum of test statistics (see splitrule) for survival.

nrounds

Number of boosting iterations / rounds.

Note that the number of default boosting rounds here is not automatically tuned, and different problems will have vastly different optimal numbers of boosting rounds.
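
To make the role of FUN and the fold loop concrete, here is a minimal base R sketch of k-fold cross-validation with a fitting function of the form FUN(x_train, y_train, ...). The names fit_lm and kfold_sketch, the simple non-stratified fold assignment, and the convention that the fitted object carries a predict function are all assumptions made for illustration; they do not describe the internals of kfold_ml().

```r
# Hypothetical fitting function with the signature FUN(x_train, y_train, ...).
# It returns a list carrying its own predict function (an assumed convention).
fit_lm <- function(x_train, y_train, ...) {
  beta <- lm.fit(cbind(1, x_train), y_train)$coefficients
  list(predict = function(x_new) drop(cbind(1, x_new) %*% beta))
}

# Hypothetical sketch of a k-fold loop; not the package's kfold_ml().
kfold_sketch <- function(X, Y, kfold = 5, FUN, ...) {
  fold_id <- rep_len(seq_len(kfold), nrow(X))       # simple, non-stratified folds
  ypred <- rep(NA_real_, nrow(X))
  for (k in seq_len(kfold)) {
    test <- which(fold_id == k)                     # rows held out in this fold
    m <- FUN(X[-test, , drop = FALSE], Y[-test, , drop = FALSE], ...)
    ypred[test] <- m$predict(X[test, , drop = FALSE])  # out-of-fold prediction
  }
  ypred
}

set.seed(1)
n <- 100; p <- 2
X <- matrix(rnorm(n * p), n, p)
y <- as.matrix(X %*% c(1, -2) + rnorm(n) / 10)
yhat <- kfold_sketch(X, y, kfold = 5, FUN = fit_lm)
cor(y[, 1], yhat)                                   # out-of-fold skill
```

Every row is predicted exactly once by a model that never saw it, so the resulting vector can be scored directly against the observations.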

Examples

set.seed(1)
n <- 100 ; p <- 2
X <- matrix(rnorm(n * p), n, p) # no intercept!
y <- as.matrix(rnorm(n))

## kfold
r_lm  <- kfold_lm(X, y)
r_xgb <- kfold_xgboost(X, y)
# r_rf  <- kfold_rf(X, y)

## single train/validation split; ratio_valid = 0.7 uses 70% of rows for validation
r <- kfold_calib(X, y, ratio_valid = 0.7, nrounds = 500, verbose = FALSE)
r$gof

GOF

Description

Goodness of fit

Usage

NSE(yobs, ysim, w, ...)

GOF(yobs, ...)

## Default S3 method:
GOF(
  yobs,
  ysim,
  w = NULL,
  include.cv = FALSE,
  include.r = TRUE,
  ...,
  idcol = "kfold",
  mode = "test"
)

## S3 method for class 'kfold'
GOF(yobs, test = NULL, ...)

Arguments

yobs

Numeric vector, observations

ysim

Numeric vector, corresponding simulated values

w

Numeric vector of weights, one per point. If w is supplied, it is taken into account when calculating the mean, Bias, MAE, RMSE and NSE.

...

Ignored.

include.cv

If TRUE, cv is included.

include.r

If TRUE, r and R2 are included.

idcol

Column name for the id column when binding multi-column results.

mode

Label inserted into the mode column when computing multi-column GOF.

test

A list with X and Y for external testing. If NULL, GOF will be calculated

Value

RMSE root mean square error
NSE Nash-Sutcliffe efficiency coefficient
MAE mean absolute error
AI agreement index; only good points (w == 1) are used in the calculation. See Zhang et al. (2015) for details.
Bias bias
Bias_perc bias as a percentage
n_sim number of valid observations
cv coefficient of variation
R2 coefficient of determination
R Pearson correlation coefficient
pvalue p-value of R
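
For orientation, several of these metrics can be computed in a few lines of base R. The sketch below is unweighted and uses textbook definitions; the helper name gof_sketch is hypothetical, and the sign convention for Bias (simulated minus observed) and the choice of R2 as the squared Pearson correlation are assumptions that may not match the package.

```r
# Hypothetical helper; unweighted textbook formulas, not the package's GOF().
gof_sketch <- function(yobs, ysim) {
  ok <- is.finite(yobs) & is.finite(ysim)           # keep valid pairs only
  yobs <- yobs[ok]; ysim <- ysim[ok]
  err <- ysim - yobs                                # assumed sign: sim minus obs
  r <- cor(yobs, ysim)                              # Pearson correlation
  c(
    RMSE  = sqrt(mean(err^2)),
    NSE   = 1 - sum(err^2) / sum((yobs - mean(yobs))^2),
    MAE   = mean(abs(err)),
    Bias  = mean(err),
    R     = r,
    R2    = r^2,                                    # assumed: squared correlation
    n_sim = length(yobs)
  )
}

set.seed(1)
yobs <- rnorm(100)
ysim <- yobs + rnorm(100) / 4
g <- gof_sketch(yobs, ysim)
round(g, 3)
```

NSE equals 1 for a perfect simulation and 0 when the simulation is no better than the mean of the observations, which makes it a natural companion to RMSE when comparing lead times.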

References

https://en.wikipedia.org/wiki/Coefficient_of_determination
https://en.wikipedia.org/wiki/Explained_sum_of_squares
https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient
Zhang Xiaoyang (2015), http://dx.doi.org/10.1016/j.rse.2014.10.012

Examples

yobs <- rnorm(100)
ysim <- yobs + rnorm(100) / 4
GOF(yobs, ysim)

predict for kfold object

Description

Predict method for kfold objects.

Usage

## S3 method for class 'kfold'
predict(object, newdata = NULL, ..., mode = "test")

Arguments

object

A kfold object returned by kfold_ml().

newdata

New feature matrix for prediction. Required when mode = "test".

...

Additional arguments forwarded to the underlying model's predict method.

mode

Prediction mode: "train" (in-sample, hold-out fold masked), "valid" (out-of-fold only), or "test" (full new data).

previous_tn

Description

Build a lagged matrix: column t is the original series, columns t-1, t-2, … are progressively shifted (lagged) copies.

Usage

previous_tn(x, n = 7, prefix = "", ...)

## Default S3 method:
previous_tn(x, n = 7, prefix = "", ...)

## S3 method for class 'data.frame'
previous_tn(x, n = 7, ...)

Arguments

x

Numeric vector (default method) or data.frame to lag.

n

Number of lags to create.

prefix

Character prefix prepended to each column name.

...

Ignored.
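
The lagging scheme in the description can be sketched in base R as below. The helper lag_matrix is hypothetical, and both the column names and the total of n + 1 columns are assumptions for illustration; the actual output of previous_tn() may be laid out differently.

```r
# Hypothetical helper; column naming is an assumption, not the package's output.
lag_matrix <- function(x, n = 7, prefix = "") {
  len <- length(x)
  stopifnot(n >= 0, n < len)
  # Column k holds x shifted down by k steps, padded with NA at the start.
  cols <- lapply(0:n, function(k) c(rep(NA, k), x[seq_len(len - k)]))
  m <- do.call(cbind, cols)
  colnames(m) <- paste0(prefix, ifelse(0:n == 0, "t", paste0("t-", 0:n)))
  m
}

x <- 1:5
m <- lag_matrix(x, n = 2, prefix = "R1_")
m
```

Row i of the result holds the value at time i together with its n predecessors, so rows with NA (the first n) are usually dropped before model fitting.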

Examples

set.seed(1)
x <- rnorm(10)
previous_tn(x, 7, "R1_")
# data.frame
d = data.frame(x)
previous_tn(d)
set.seed(1)
x <- rnorm(10)
previous_tn(x, 7, "R1_")
# data.frame
d = data.frame(x)
previous_tn(d)
