This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.
Usage
gen_tbl(
rows,
cols = NULL,
col_types = NULL,
locale = default_locale(),
missing = 0
)
Arguments
- rows
Number of rows to generate.
- cols
Number of columns to generate. If `NULL`, this is derived from `col_types`.
- col_types
One of `NULL`, a `cols()` specification, or a string. If `NULL`, the type of each column is chosen at random, as in the first example below.
Column specifications created by `list()` or `cols()` must contain one column specification for each column. If you only want to read a subset of the columns, use `cols_only()`.
Alternatively, you can use a compact string representation where each character represents one column:
c = character
i = integer
n = number
d = double
l = logical
f = factor
D = date
T = date time
t = time
? = guess
_ or - = skip
By default, reading a file without a column specification will print a message showing what `readr` guessed they were. To remove this message, set `show_col_types = FALSE` or set `options(readr.show_col_types = FALSE)`.
- locale
The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use `locale()` to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
- missing
The proportion (from 0 to 1) of missing values to introduce.
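A minimal sketch of how `cols` and `col_types` interact, assuming `gen_tbl()` comes from the vroom package: when `cols` is left `NULL`, the number of columns follows the length of the compact type string.

```r
library(vroom)  # assumption: gen_tbl() is provided by vroom

# cols is left NULL, so the column count (4) is derived from the
# compact string: c = character, i = integer, D = date, l = logical
tbl <- gen_tbl(5, col_types = "ciDl")
tbl
```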
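A sketch of the `missing` argument, again assuming vroom provides `gen_tbl()`. It also assumes values are blanked at random, so the observed share of `NA` cells should land near the requested proportion rather than exactly on it.

```r
library(vroom)  # assumption: gen_tbl() is provided by vroom

set.seed(42)
# Request about 20% missing values in a 1000 x 2 table of doubles
tbl <- gen_tbl(1000, col_types = "dd", missing = 0.2)

# Observed share of NA cells across the whole table; expected to be
# close to 0.2 (assumption: values are blanked at random, not in a fixed count)
frac <- mean(is.na(unlist(tbl)))
frac
```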
See also
generators to generate individual vectors.
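The generators can also be called directly to build single vectors. This sketch assumes they ship with vroom alongside `gen_tbl()`, and that `gen_double()` forwards extra arguments to its random function `f`, mirroring the `col_double(f = stats::runif, ...)` example below.

```r
library(vroom)  # assumption: the generators ship with vroom

chr <- gen_character(5)  # 5 random strings
int <- gen_integer(5)    # 5 random integers

# Extra arguments are passed on to the random function f
dbl <- gen_double(5, f = stats::runif, min = 0, max = 1)
```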
Examples
# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#> X1 X2 X3 X4 X5
#> <dbl> <chr> <time> <date> <time>
#> 1 -0.360 Cx8IGGQE0xNQ2gUwi7l4LhWy 23:06:12 2012-07-04 07:27:50
#> 2 2.03 0ppro74Vd6526yNFUJhFCn 05:23:50 2011-12-19 11:35:33
#> 3 0.713 lyGwrIU0Q6RDFnHYXXK9d 14:51:28 2020-01-18 17:30:55
#> 4 0.546 A3LY0Ft0L9bbs 03:54:58 2007-02-25 14:43:31
#> 5 -1.75 PSTHw9hkhGIdgzHFRPnWPTvp 07:34:07 2017-12-07 07:20:46
#> 6 -0.835 o3wRoWS 17:51:21 2006-07-11 21:58:57
#> 7 -0.157 tncO3afldeEK4Xuya 06:17:45 2004-04-27 09:21:06
#> 8 0.459 l10zUfHUXhCkhINeMsG0BbL27 23:14:52 2011-12-29 17:40:19
#> 9 1.34 LvbzaOz2JklmAgXOmy3zi 15:43:46 2007-03-15 13:02:26
#> 10 0.0855 Cfa5vhOl5vaFh7LrPb 03:41:14 2009-01-17 15:49:57
# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#> X1 X2 X3 X4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.346 -1.70 1.40 -0.146
#> 2 0.333 -0.484 -0.311 -0.714
#> 3 0.0764 0.194 0.139 -0.343
#> 4 1.71 0.00121 -0.0497 1.56
#> 5 -0.373 1.82 -1.33 1.14
#> 6 -1.11 0.256 0.797 0.863
#> 7 0.274 -1.64 -1.04 0.966
#> 8 -1.94 -0.768 -1.25 0.311
#> 9 -0.118 -0.0402 -1.31 -1.59
#> 10 0.357 0.893 0.977 1.08
#> # ℹ 15 more rows
# Use the dots in long-form column types to change the random function and its options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
#> [[1]]
#> <collector_double>
#>
#> [[2]]
#> <collector_double>
#>
#> [[3]]
#> <collector_double>
#>
#> [[4]]
#> <collector_double>
#>
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#> X1 X2 X3 X4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -1.96 22.6 5.69 17.1
#> 2 -2.07 -2.78 9.23 13.4
#> 3 12.1 19.2 15.5 8.45
#> 4 24.6 -3.90 -6.85 -6.64
#> 5 -9.80 0.823 11.5 15.4
#> 6 -8.11 0.557 -5.89 14.4
#> 7 5.84 4.34 -5.68 9.90
#> 8 16.0 -2.42 8.95 14.3
#> 9 -9.30 10.3 17.5 19.8
#> 10 7.60 23.8 25.0 18.4
#> # ℹ 15 more rows
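For the bug-report use case mentioned at the top, a sketch of one possible workflow, assuming vroom supplies `gen_tbl()`, `vroom_write()` and `vroom()`: fix the random seed so the fake data can be regenerated exactly, then write it to a file that can be shared in place of the real dataset.

```r
library(vroom)  # assumption: gen_tbl(), vroom_write() and vroom() come from vroom

# The same seed regenerates the same "fake" data
set.seed(2024)
a <- gen_tbl(20, col_types = "dd")
set.seed(2024)
b <- gen_tbl(20, col_types = "dd")
identical(a[[1]], b[[1]])

# Write the table to a temporary CSV that could be attached to a report
path <- tempfile(fileext = ".csv")
vroom_write(a, path, delim = ",")

# Reading it back with explicit types gives the same dimensions
back <- vroom(path, delim = ",", col_types = "dd")
dim(back)
```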