This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.
Usage
gen_tbl(
rows,
cols = NULL,
col_types = NULL,
locale = default_locale(),
missing = 0
)
Arguments
- rows
Number of rows to generate
- cols
Number of columns to generate. If NULL, this is derived from col_types.
- col_types
One of NULL, a cols() specification, or a string.
If NULL, all column types will be imputed from guess_max rows on the input interspersed throughout the file. This is convenient (and fast), but not robust. If the imputation fails, you'll need to increase the guess_max or supply the correct types yourself.
Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
Alternatively, you can use a compact string representation where each character represents one column:
c = character
i = integer
n = number
d = double
l = logical
f = factor
D = date
T = date time
t = time
? = guess
_ or - = skip
By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
- locale
The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
- missing
The proportion (from 0 to 1) of missing data to use.
See also
generators to generate individual vectors.
Examples
# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#> X1 X2 X3 X4 X5
#> <int> <chr> <chr> <int> <dttm>
#> 1 1626445563 YCY0u dknmr23D… 1.83e9 2002-09-07 21:39:20
#> 2 78690202 UC9K9kzPcbj7wFE9ZI9x 2WdVH4Px… 1.62e9 2017-03-23 23:16:12
#> 3 33575456 4sSXW7Gqdpoqv9UovCe eKJCATXY… 1.50e9 2016-10-29 17:09:32
#> 4 334704841 P8ssx WK3s2Xlh… 3.75e7 2004-08-23 07:53:10
#> 5 1117321732 jsCQvWCSSDOqoejW7h H1rO0ia 6.78e8 2016-07-19 02:45:19
#> 6 2076402054 iWvBT5VehwNhsYlwCDhs vqtxDwxo… 2.36e8 2015-08-08 15:22:40
#> 7 979499669 17z1TUnJ8hzh1 bbv3XSCs… 8.52e8 2012-12-05 19:06:57
#> 8 1250661942 DUhH5nD4bxmjyY mFp7qkKg… 3.78e8 2018-10-06 16:47:24
#> 9 29137468 9wFhNsp884M2vlqRR irGuoFyP… 1.30e9 2002-04-04 23:05:53
#> 10 1164232522 Dlb00KwXM29ovOylwp3EgJe 0Swttr6F… 4.08e7 2017-04-03 00:45:28
# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#> X1 X2 X3 X4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.474 0.589 -0.121 -0.791
#> 2 0.205 -0.469 0.0436 -0.0151
#> 3 -1.41 0.455 1.26 1.40
#> 4 0.515 0.0612 0.920 -1.10
#> 5 -1.04 0.610 -0.809 -0.295
#> 6 -0.797 -0.328 0.0333 -0.330
#> 7 1.02 -1.05 0.0877 0.551
#> 8 -1.96 -0.0982 -1.15 0.433
#> 9 -0.465 0.470 -0.495 0.460
#> 10 0.161 0.301 0.175 -0.723
#> # ℹ 15 more rows
# Use the dots in long form column types to change the random function and options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
#> [[1]]
#> <collector_double>
#>
#> [[2]]
#> <collector_double>
#>
#> [[3]]
#> <collector_double>
#>
#> [[4]]
#> <collector_double>
#>
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#> X1 X2 X3 X4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 18.8 24.9 3.72 -1.63
#> 2 9.42 -9.05 21.6 9.58
#> 3 7.37 14.7 10.5 -6.74
#> 4 -7.82 12.3 3.62 9.95
#> 5 -2.18 -5.31 4.31 -6.19
#> 6 0.0869 14.1 -7.04 8.38
#> 7 -9.57 10.2 13.7 0.929
#> 8 19.8 -0.409 4.04 -7.23
#> 9 19.6 5.68 -3.08 4.34
#> 10 13.6 17.8 24.4 3.32
#> # ℹ 15 more rows
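# A minimal sketch of the `missing` argument, which the examples above leave at its default of 0.
# Assumption: `missing = 0.2` makes roughly 20% of the generated values NA.
# `cols` is omitted, so it is derived from the three-character `col_types` string.
# The output is random, so none is shown here.
na_tbl <- gen_tbl(10, col_types = "idc", missing = 0.2)
na_tbl
# Share of missing values in each column
colMeans(is.na(na_tbl))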