
Generating a random tibble is useful for benchmarking, and for bug reports when you cannot share the real dataset.

Usage

gen_tbl(
  rows,
  cols = NULL,
  col_types = NULL,
  locale = default_locale(),
  missing = 0
)

Arguments

rows

Number of rows to generate

cols

Number of columns to generate. If NULL, this is derived from col_types.

col_types

One of NULL, a cols() specification, or a string.

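If NULL, all column types will be imputed from guess_max rows on the input interspersed throughout the file. This is convenient (and fast), but not robust. If the imputation fails, you'll need to increase the guess_max or supply the correct types yourself. When generating data there is no input file to guess from; as the first example below shows, gen_tbl() called without col_types produces columns of random types.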

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing the column types that readr guessed. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
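The compact string form can be tried directly with gen_tbl(). The following is a minimal sketch, assuming gen_tbl() is the function exported by the vroom package (the package name is not stated on this page) and that cols can be left as NULL when col_types is given, as the cols argument describes. The name tbl is only illustrative.

```r
library(vroom)  # assumption: gen_tbl() comes from the vroom package

# One character per column: character, integer, double, logical
tbl <- gen_tbl(5, col_types = "cidl")

# cols was left as NULL, so the column count is derived from the string
ncol(tbl)

# Inspect the class of each generated column
vapply(tbl, function(x) class(x)[[1]], character(1))
```

Each letter in the string maps to one column, so a four-letter string should yield a four-column table.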

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

missing

The proportion (from 0 to 1) of missing data to use.
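A short sketch of the missing argument, again assuming gen_tbl() comes from the vroom package. Because the data are random, the observed share of NA values should be close to the requested value but is not expected to match it exactly. The name na_tbl is only illustrative.

```r
library(vroom)  # assumption: gen_tbl() comes from the vroom package

# Request roughly 20% missing values in a 1000 x 3 table of doubles
na_tbl <- gen_tbl(1000, col_types = "ddd", missing = 0.2)

# Observed share of NA values in each column
colMeans(is.na(na_tbl))
```

A larger number of rows makes the observed share a more stable estimate of the requested one.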

Details

There is also a family of functions to generate individual vectors of each type.
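As a hedged illustration of that family, the sketch below calls a few of the per-type generators directly. It assumes the vroom package and its gen_double(), gen_character(), and gen_logical() functions, whose names follow the gen_<type> pattern. The f, min, and max arguments mirror the col_double(f = stats::runif, ...) call in the Examples. The variable names are only illustrative.

```r
library(vroom)  # assumption: the generators come from the vroom package

# Five random doubles using the generator's default distribution
x_default <- gen_double(5)

# Five random doubles drawn with stats::runif between -10 and 25
x_unif <- gen_double(5, f = stats::runif, min = -10, max = 25)

# Three random strings and three random logical values
x_chr <- gen_character(3)
x_lgl <- gen_logical(3)

x_unif
```

Calling a generator directly is convenient when only a single vector is needed rather than a whole table.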

See also

generators to generate individual vectors.

Examples

# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#>            X1 X2                      X3            X4 X5                 
#>         <int> <chr>                   <chr>      <int> <dttm>             
#>  1 1626445563 YCY0u                   dknmr23D… 1.83e9 2002-09-07 21:39:20
#>  2   78690202 UC9K9kzPcbj7wFE9ZI9x    2WdVH4Px… 1.62e9 2017-03-23 23:16:12
#>  3   33575456 4sSXW7Gqdpoqv9UovCe     eKJCATXY… 1.50e9 2016-10-29 17:09:32
#>  4  334704841 P8ssx                   WK3s2Xlh… 3.75e7 2004-08-23 07:53:10
#>  5 1117321732 jsCQvWCSSDOqoejW7h      H1rO0ia   6.78e8 2016-07-19 02:45:19
#>  6 2076402054 iWvBT5VehwNhsYlwCDhs    vqtxDwxo… 2.36e8 2015-08-08 15:22:40
#>  7  979499669 17z1TUnJ8hzh1           bbv3XSCs… 8.52e8 2012-12-05 19:06:57
#>  8 1250661942 DUhH5nD4bxmjyY          mFp7qkKg… 3.78e8 2018-10-06 16:47:24
#>  9   29137468 9wFhNsp884M2vlqRR       irGuoFyP… 1.30e9 2002-04-04 23:05:53
#> 10 1164232522 Dlb00KwXM29ovOylwp3EgJe 0Swttr6F… 4.08e7 2017-04-03 00:45:28

# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#>        X1      X2      X3      X4
#>     <dbl>   <dbl>   <dbl>   <dbl>
#>  1  0.474  0.589  -0.121  -0.791 
#>  2  0.205 -0.469   0.0436 -0.0151
#>  3 -1.41   0.455   1.26    1.40  
#>  4  0.515  0.0612  0.920  -1.10  
#>  5 -1.04   0.610  -0.809  -0.295 
#>  6 -0.797 -0.328   0.0333 -0.330 
#>  7  1.02  -1.05    0.0877  0.551 
#>  8 -1.96  -0.0982 -1.15    0.433 
#>  9 -0.465  0.470  -0.495   0.460 
#> 10  0.161  0.301   0.175  -0.723 
#> # ℹ 15 more rows

# Use the dots (...) in long-form column types to change the random function and its options
types <- rep(list(col_double(f = stats::runif, min = -10, max = 25)), times = 4)
types
#> [[1]]
#> <collector_double>
#> 
#> [[2]]
#> <collector_double>
#> 
#> [[3]]
#> <collector_double>
#> 
#> [[4]]
#> <collector_double>
#> 
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#>         X1     X2    X3     X4
#>      <dbl>  <dbl> <dbl>  <dbl>
#>  1 18.8    24.9    3.72 -1.63 
#>  2  9.42   -9.05  21.6   9.58 
#>  3  7.37   14.7   10.5  -6.74 
#>  4 -7.82   12.3    3.62  9.95 
#>  5 -2.18   -5.31   4.31 -6.19 
#>  6  0.0869 14.1   -7.04  8.38 
#>  7 -9.57   10.2   13.7   0.929
#>  8 19.8    -0.409  4.04 -7.23 
#>  9 19.6     5.68  -3.08  4.34 
#> 10 13.6    17.8   24.4   3.32 
#> # ℹ 15 more rows