This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.

Usage

gen_tbl(
  rows,
  cols = NULL,
  col_types = NULL,
  locale = default_locale(),
  missing = 0
)

Arguments

rows

Number of rows to generate

cols

Number of columns to generate. If NULL, this is derived from col_types.

col_types

One of NULL, a cols() specification, or a string.

If NULL, gen_tbl() has no input file to guess types from, so each of the cols columns is given a randomly chosen type, as in the first example under Examples. To control the types, supply them yourself in one of the forms described next.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing the column types readr guessed. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
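
The compact string form described above can be tried directly. The following is a minimal sketch that assumes gen_tbl() is provided by the vroom package (this page does not name the package); the variable name tbl is only illustrative.

```r
# Assumption: gen_tbl() is exported by the vroom package.
library(vroom)

# One character per column: character, integer, double, logical, date.
tbl <- gen_tbl(5, col_types = "cidlD")

# cols was left as NULL, so the column count is taken from col_types.
dim(tbl)

# First class of each generated column.
vapply(tbl, function(x) class(x)[1], character(1))
```

The values are random, so they differ from run to run; only the shape and the column types are determined by the call.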

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

missing

The proportion (from 0 to 1) of values to replace with missing data (NA).
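
As a rough illustration of the missing argument, the sketch below generates a table of doubles and measures how many values in each column came out as NA. It assumes gen_tbl() is provided by the vroom package; na_tbl and na_share are illustrative names.

```r
# Assumption: gen_tbl() is exported by the vroom package.
library(vroom)

# 1000 rows of four double columns, asking for about a quarter of values missing.
na_tbl <- gen_tbl(1000, col_types = "dddd", missing = 0.25)

# Share of NA values per column. Missing values are placed at random,
# so each share should be near 0.25 without matching it exactly.
na_share <- colMeans(is.na(na_tbl))
na_share
```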

Details

There is also a family of functions to generate individual vectors of each type.
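
A short sketch of the individual vector generators follows. The function names gen_double(), gen_integer(), and gen_character() and the min and max arguments of gen_integer() do not appear on this page; they are assumed from the vroom package's generators help topic, so check that topic for the exact signatures.

```r
# Assumption: these generators are exported by the vroom package.
library(vroom)

# Five random doubles.
x_dbl <- gen_double(5)

# Five random integers drawn from a small range.
x_int <- gen_integer(5, min = 1L, max = 10L)

# Three random character strings.
x_chr <- gen_character(3)

x_dbl
x_int
x_chr
```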

See also

generators to generate individual vectors.

Examples

# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
#> # A tibble: 10 × 5
#>         X1 X2                        X3       X4         X5      
#>      <dbl> <chr>                     <time>   <date>     <time>  
#>  1 -0.360  Cx8IGGQE0xNQ2gUwi7l4LhWy  23:06:12 2012-07-04 07:27:50
#>  2  2.03   0ppro74Vd6526yNFUJhFCn    05:23:50 2011-12-19 11:35:33
#>  3  0.713  lyGwrIU0Q6RDFnHYXXK9d     14:51:28 2020-01-18 17:30:55
#>  4  0.546  A3LY0Ft0L9bbs             03:54:58 2007-02-25 14:43:31
#>  5 -1.75   PSTHw9hkhGIdgzHFRPnWPTvp  07:34:07 2017-12-07 07:20:46
#>  6 -0.835  o3wRoWS                   17:51:21 2006-07-11 21:58:57
#>  7 -0.157  tncO3afldeEK4Xuya         06:17:45 2004-04-27 09:21:06
#>  8  0.459  l10zUfHUXhCkhINeMsG0BbL27 23:14:52 2011-12-29 17:40:19
#>  9  1.34   LvbzaOz2JklmAgXOmy3zi     15:43:46 2007-03-15 13:02:26
#> 10  0.0855 Cfa5vhOl5vaFh7LrPb        03:41:14 2009-01-17 15:49:57

# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
#> # A tibble: 25 × 4
#>         X1       X2      X3     X4
#>      <dbl>    <dbl>   <dbl>  <dbl>
#>  1  0.346  -1.70     1.40   -0.146
#>  2  0.333  -0.484   -0.311  -0.714
#>  3  0.0764  0.194    0.139  -0.343
#>  4  1.71    0.00121 -0.0497  1.56 
#>  5 -0.373   1.82    -1.33    1.14 
#>  6 -1.11    0.256    0.797   0.863
#>  7  0.274  -1.64    -1.04    0.966
#>  8 -1.94   -0.768   -1.25    0.311
#>  9 -0.118  -0.0402  -1.31   -1.59 
#> 10  0.357   0.893    0.977   1.08 
#> # ℹ 15 more rows

# Use the dots in long form column types to change the random function and options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
#> [[1]]
#> <collector_double>
#> 
#> [[2]]
#> <collector_double>
#> 
#> [[3]]
#> <collector_double>
#> 
#> [[4]]
#> <collector_double>
#> 
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
#> # A tibble: 25 × 4
#>       X1     X2    X3    X4
#>    <dbl>  <dbl> <dbl> <dbl>
#>  1 -1.96 22.6    5.69 17.1 
#>  2 -2.07 -2.78   9.23 13.4 
#>  3 12.1  19.2   15.5   8.45
#>  4 24.6  -3.90  -6.85 -6.64
#>  5 -9.80  0.823 11.5  15.4 
#>  6 -8.11  0.557 -5.89 14.4 
#>  7  5.84  4.34  -5.68  9.90
#>  8 16.0  -2.42   8.95 14.3 
#>  9 -9.30 10.3   17.5  19.8 
#> 10  7.60 23.8   25.0  18.4 
#> # ℹ 15 more rows