This R package provides a convenient interface to the CISNET Smoking History Generator (SHG). It produces the same outputs as the command-line (CLI) version of the Smoking History Generator and gives modelers direct access to it from within R.
install.packages("SmokingHistoryGenerator")
install.packages("pak")
pak::pak("NCI-CISNET/shg-r")
# OR
pak::pak("NCI-CISNET/shg-r@[optional-branch-of-your-choice]")
Releases ship per-OS binaries from R CMD INSTALL --build. Download the asset in your browser from Releases (no GitHub token), then install from the saved file.
macOS (Apple Silicon) — use the exact downloaded
filename (including (1) if the browser added it). R
4.6+ no longer accepts type = "binary" for macOS
CRAN builds; pass this session’s native binary type (or use
R CMD INSTALL below):
pkg_tgz <- path.expand("~/Downloads/SmokingHistoryGenerator_6.5.2-1.0.0_macos-arm64.tgz")
stopifnot(file.exists(pkg_tgz))
install.packages(pkg_tgz, repos = NULL, type = .Platform$pkgType)
On older R, .Platform$pkgType is still the right choice when it is not "source". Shell install avoids the type argument entirely:
R CMD INSTALL /path/to/SmokingHistoryGenerator_6.5.2-1.0.0_macos-arm64.tgz
Intel Macs use _macos-x64.tgz. Windows and Linux assets use .zip and *_linux-*_R_*.tar.gz, respectively, with the same install.packages(..., repos = NULL, type = .Platform$pkgType) approach when your R build reports a non-source package type.
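The choice between the two install routes described above can be sketched in base R. This is a minimal sketch, not part of the package: choose_install_route() is a hypothetical helper, and falling back to the shell command when the session reports "source" is this sketch's assumption, based on the remark that the shell install avoids the type argument.

```r
# Hypothetical helper (not part of SmokingHistoryGenerator): suggest how to
# install a downloaded release asset, based on this session's package type.
choose_install_route <- function(pkg_type = .Platform$pkgType) {
  if (identical(pkg_type, "source")) {
    # No native binary type reported; the shell route needs no `type` argument.
    "R CMD INSTALL"
  } else {
    # A binary type such as "win.binary" or a "mac.binary" variant.
    "install.packages"
  }
}

route <- choose_install_route()
message("pkgType: ", .Platform$pkgType, " -> suggested route: ", route)
```

The helper only reports a suggestion; the actual install call is still the one shown earlier, with the exact filename you downloaded.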
The SHG needs calibrated input files (initiation, cessation, CPD, and
mortality tables). The package ships a default
CRAN-sized NHIS-1965–2018 csv-partial under
inst/extdata/2018/ (smoking/,
mortality/). Full NHIS-style tables are distributed as
parameter bundles via Zenodo (and GitHub Releases). See
?shg_load_params for bundle URLs, ACM vs OCM mortality,
authentication, and cache behavior.
Use this when you already have a local directory containing
smoking/ and mortality/ files and want to
point SHG directly at those inputs.
library(SmokingHistoryGenerator)
shg <- new(SHGInterface)
shg$input_data_folder <- "/path/to/usa-national@smok-2018-mort-2016"
shg$initiation_filename <- "smoking/initiation.csv"
shg$cessation_filename <- "smoking/cessation.csv"
shg$cpd_filename <- "smoking/cpd.csv"
shg$mortality_filename <- "mortality/acm.csv" # or mortality/ocm-excl-lung-cancer.csv
run_cfg <- list(
individuals = 1e5,
race = 0,
sex = 0,
cohort_year = 1980
)
bundle <- shg$runSim(run_cfg)
sim <- bundle$results
Use a single config list that includes both smoking and mortality bundle sources and run fields.
library(SmokingHistoryGenerator)
shg <- new(SHGInterface)
run_cfg <- list(
smok_params_source = "/path/to/usa-national@smok-NHIS-2022.zip",
mort_params_source = "/path/to/usa-national@mort-v1.0.0.zip",
mort_params_type = "acm", # or "ocm"; alias `mortality = "ocm"` also works
individuals = 1e5,
race = 0,
sex = 0,
cohort_year = 1980
)
# Hydrate tables from bundle metadata in config
shg_apply_config(shg, run_cfg)
# Single run call returns coupled outputs
bundle <- shg$runSim(run_cfg)
sim <- bundle$results
Future Zenodo variant (same pattern; replace xxxx with the published record id):
run_cfg <- list(
smok_params_source = "https://zenodo.org/records/xxxx/files/usa-national@smok-NHIS-2022.zip",
mort_params_source = "https://zenodo.org/records/xxxx/files/usa-national@mort-v1.0.0.zip",
mort_params_type = "acm",
individuals = 1e5,
race = 0,
sex = 0,
cohort_year = 1980
)
The bundle is downloaded/extracted once and cached locally; subsequent calls reuse the cache.
Using a config list that includes a parameter bundle source (recommended), you can launch a smoking history simulation as follows:
library(SmokingHistoryGenerator)
shg <- new(SHGInterface)
N <- 10^5
race <- 0
sex <- 0
cohort_year <- 1940
run_cfg <- list(
smok_params_source = "/path/to/usa-national@smok-NHIS-2022.zip",
mort_params_source = "/path/to/usa-national@mort-v1.0.0.zip",
mort_params_type = "acm",
individuals = N,
race = race,
sex = sex,
cohort_year = cohort_year
)
# Hydrate parameter tables from config bundle metadata
shg_apply_config(shg, run_cfg)
bundle <- shg$runSim(run_cfg)
RNGSTREAM_SIM <- bundle$results
For a single object that couples simulated rows with original_config, repro_config (full snapshot), and run_info (machine/software audit), call the 6-argument method with attach_run_info = TRUE:
bundle <- shg$runSim(run_cfg)
sim <- bundle$results
cfg_intent <- bundle$original_config
cfg_repro <- bundle$repro_config
audit <- bundle$run_info
shg <- new(SHGInterface)
shg_apply_config(shg, list(cohort_year = 1950))
shg_apply_config() resets the instance to factory defaults first, then applies only the keys you supply.
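The reset-then-apply behavior can be illustrated with a toy model in base R. This is only a sketch of the semantics, not the package's implementation: apply_config_toy() and factory_defaults are hypothetical names, the individuals, race, and sex defaults mirror the values quoted elsewhere in this README, and the cohort_year default is a placeholder.

```r
# Hypothetical factory defaults for illustration only.
factory_defaults <- list(individuals = 1000, race = 0, sex = 0, cohort_year = NA)

# Toy model: the earlier state is deliberately ignored (the "reset"), and only
# the supplied keys override the defaults.
apply_config_toy <- function(current, supplied) {
  utils::modifyList(factory_defaults, supplied)
}

state <- list(individuals = 5000, race = 1, sex = 1, cohort_year = 1940)
state <- apply_config_toy(state, list(cohort_year = 1950))
str(state)
```

In this toy model the earlier individuals = 5000 does not survive the second call, which is the practical consequence to keep in mind: pass every key you care about in the same config list.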
# Small hand-editable config snippet
shg_write_config_yaml(bundle$original_config, "intent.yml")
# Full replay config
shg_write_config_yaml(bundle$repro_config, "repro.yml")
The same shg_write_config_yaml(config, path) function handles both.
shg2 <- new(SHGInterface)
shg_apply_config(shg2, bundle$repro_config)
sim2 <- shg2$runSim(bundle$repro_config)
sim2_df <- sim2$results
shg3 <- new(SHGInterface)
base_run <- shg_load_config(shg3, "repro.yml") # applies params + engine settings
# Keep everything else the same, change only cohort year
base_run$cohort_year <- 2000
sim3 <- shg3$runSim(base_run)
sim3_df <- sim3$results
You can also use a pre-generated population instead of using fixed values for race, sex, cohort_year:
If birth_cohort spans many distinct years (as in this
illustration), you need full NHIS-style
inputs—initiation, cessation, CPD, and mortality tables that include
every cohort column your population uses. The trimmed CSVs under
inst/extdata/2018 do not cover that; they
only bundle a few cohorts for CRAN. Use shg_load_params()
or set input_data_folder to a directory with complete
tables.
shg <- new(SHGInterface)
# Full tables required for multi-year cohorts—not system.file("extdata", "2018", ...):
shg$input_data_folder <- "/path/to/NHIS-1965-2018/csv-complete"
N <- 10^5 # Individuals to simulate (REPEAT)
pop <- list(
race = rep(0, N),
sex = sample(x = c(0, 1), size = N, prob = c(0.5, 0.5), replace = TRUE),
birth_cohort = rep(1930:1949, N / 20)
)
# The following are default configuration values; change as needed
shg$rng_strategy <- "RngStream"
shg$number_of_segments <- -1 # -1 = auto, or set explicit value for reproducibility
shg$num_threads <- -1 # -1 = auto (all cores), 1 = single-threaded
RNGSTREAM_SIM_POP <- shg$runSimFromDataFrame(pop)
Note on RNG strategies: with the Mersenne Twister strategy, setting number_of_segments > 1 or num_threads != 1 will result in an error.
To reproduce the results of legacy versions of the SHG command-line tool (v6.3.5 and earlier), you must select the Mersenne Twister strategy:
library(SmokingHistoryGenerator)
shg <- new(SHGInterface)
N <- 10^5 # Individuals to simulate (REPEAT)
# If you want to produce identical results as previous versions of the legacy CLI you must set the following properties:
shg$rng_strategy <- "MersenneTwister"
# Note: MersenneTwister is automatically restricted to 1 segment and non-parallel execution
MT_SIM <- shg$runSimFromFixedValues(N, 0, 0, 1940)
The cpd_format property controls how cigarettes-per-day data is returned:
shg$cpd_format <- "sparse" # Default - fastest with CPD: "20, 20, 10, 3"
shg$cpd_format <- "none" # Fastest - no CPD column returned
shg$cpd_format <- "legacy" # Backwards compatible: "17 (20), 18 (20), 19 (10)"
Note: The sparse format stores only CPD values. The age can be computed as init_age + index since values are sequential from initiation age.
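The init_age + index rule can be sketched in base R as follows. This is a minimal sketch under stated assumptions: decode_sparse_cpd() is a hypothetical helper, the sparse field is assumed to be a comma-separated string as in the example above, and the index is assumed to be zero-based so that the first value belongs to init_age (which matches the legacy example starting at age 17).

```r
# Hypothetical helper: expand a sparse CPD string into (age, cpd) pairs.
# Assumes a zero-based index, so the first value belongs to init_age.
decode_sparse_cpd <- function(cpd_string, init_age) {
  cpd <- as.numeric(strsplit(cpd_string, ",", fixed = TRUE)[[1]])
  data.frame(age = init_age + seq_along(cpd) - 1L, cpd = cpd)
}

decoded <- decode_sparse_cpd("20, 20, 10, 3", init_age = 17)
print(decoded)

# Rebuild a legacy-style "age (cpd)" string from the decoded pairs.
# The integer conversion assumes whole-number CPD values, as in the examples.
legacy <- paste(
  sprintf("%d (%d)", as.integer(decoded$age), as.integer(decoded$cpd)),
  collapse = ", "
)
print(legacy)
```

If your output uses a different index convention, adjust the offset in decode_sparse_cpd() accordingly.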
For CLI-like performance, you can write rows directly to disk. With the bundled return form, the in-memory object still includes configs and audit metadata, but not the full simulated row set (to conserve memory):
library(SmokingHistoryGenerator)
shg <- new(SHGInterface)
run_cfg <- list(
smok_params_source = "/path/to/usa-national@smok-NHIS-2022.zip",
mort_params_source = "/path/to/usa-national@mort-v1.0.0.zip",
mort_params_type = "acm",
cohort_year = 1950,
output_file = "/path/to/output-fixed.csv"
)
# Load parameters from config metadata, then run
bundle <- shg$runSim(run_cfg)
# Same bundle structure; output rows are in output-fixed.csv
# Defaults used here: individuals = 1000, race = 0, sex = 0.
# bundle$original_config / bundle$repro_config / bundle$run_info are returned
File output matches CLI’s data format (semicolon-separated).
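A semicolon-separated file can be read back into R with base functions. The sketch below writes a small stand-in file first so that it runs on its own; the rows and the column names (id, init_age, cpd) are made up for illustration and are not the SHG output layout, and header = FALSE is an assumption that you should check against your own file.

```r
# Made-up stand-in rows; the real SHG column layout is not assumed here.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;17;20, 20, 10, 3",
             "2;21;10, 10"), tmp)

# sep = ";" keeps comma-containing fields (such as a sparse CPD string) intact.
rows <- read.delim(tmp, sep = ";", header = FALSE,
                   col.names = c("id", "init_age", "cpd"),
                   stringsAsFactors = FALSE)
str(rows)
unlink(tmp)
```

Because the separator is a semicolon, commas inside a field do not split it into extra columns.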
Set seeds on the SHGInterface before running (for
example shg$seed_init, shg$seed_cess,
shg$seed_mortality, shg$seed_misc). Use
getReproConfig() after a run to inspect the effective
values used. See ?SHGInterface and
?getReproConfig.
Use shg_apply_config() for intent-oriented updates,
getConfig() / useConfig() to read or replace
settings, and shg_write_config_yaml() /
shg_load_config() to save or reload portable YAML for exact
reruns.
The Smoking History Generator CLI (Command Line Interface) was developed in the early 2000s and has been maintained by several contributors since then.
You can find a complete set of publications about the Smoking History Generator via CISNET and project-specific resource pages linked from there.
Funding for the CISNET Smoking History Generator and the Rcpp wrapper came from the following National Cancer Institute (NCI) grants.
You may not use the Software or Datasets for commercial purposes without prior written consent from the CISNET Lung Working Group and without entering into a separate license agreement regarding such commercial use. Contact: Rafael Meza Rodriguez rmeza@bccrc.ca and Jamie Tam jamie.tam@yale.edu.
The software is released under the GPL-3. The test input tables shipped with the package are released under the CC BY-SA 4.0 license.
© 2026 CISNET Lung Working Group. All rights reserved.