Showing content from https://github.com/sl-solution/DLMReader.jl below:

sl-solution/DLMReader.jl: High-performance delimited-file reader and writer for Julia

An efficient multi-threaded package for reading and writing delimited files. It is designed as a file parser for InMemoryDatasets.jl.

DLMReader reads and writes AbstractDatasets types only; other types must be converted to or from AbstractDatasets.

It works very well for huge files (long and/or wide).

DLMReader does not guess the delimiter: if it is anything other than a comma (,), it must be passed via the delimiter keyword argument. By default, DLMReader assumes strings are not quoted; if they are, the user must pass the quote character via the quotechar keyword argument.
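As a minimal sketch of the two keyword arguments named above, the snippet below writes a small semicolon-delimited file with double-quoted strings to a temporary directory and reads it back. The file name, its contents, and the variable names are made up for illustration; only filereader, delimiter, and quotechar come from the text above.

```julia
using DLMReader

# Hypothetical sample file: semicolon-delimited, strings wrapped in double quotes.
path = joinpath(mktempdir(), "quoted.csv")
write(path, "name;city\n\"John Smith\";\"Paris\"\n\"Jane Doe\";\"Rome\"\n")

# Neither the delimiter nor the quote character is guessed, so pass both.
ds = filereader(path, delimiter = ';', quotechar = '"')
```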

DLMReader.jl has some interesting features that distinguish it from other packages for reading delimited files. In what follows, we list a few of them:

See here for some benchmarks.

The following files are used in the examples. They are assumed to be located in the current working directory.

ex1.csv

a, b, c
1,2,NA
2,3,2001-1-2
2,4,2020-4-2
1,2,2000-12-1

ex2.csv

a::b::C::DD
12::1345::15::15
12::13::15::15
12::13::15::15
12::13::15::15
12::13::15::15
12::13::15::15
12::13::15::15
12::13::::15
12::13::15::15
12::13::15::157

ex3.csv

ex4.csv

ex5.csv

x1;x2:x3,x4
1;2;123;3
2;4,4,5

ex6.csv

id1 $2,000,000 3
id2 $34,000 4
id3 $200,000 1

The following code reads them into Julia:

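julia> using DLMReader
julia> using Dates  # provides the dateformat"..." string macro
julia> filereader("ex1.csv", dtformat = Dict(3 => dateformat"y-m-d"))  # date format for column 3
julia> filereader("ex2.csv", dlmstr = "::")  # multi-character delimiter
julia> filereader("ex3.csv", types = [Int, Int, Int], header = false, linebreak = ';', delimiter = '\n')  # ';' ends a line, newline separates values
julia> filereader("ex4.csv", fixed = Dict(1 => 1:4), header = false)  # fixed-width setting for column 1
julia> filereader("ex5.csv", delimiter = [';', ':', ','])  # several alternative delimiters
julia> filereader("ex6.csv", delimiter = ' ', informat = Dict(2=>COMMA!), header = [:ID, :price, :quarter])  # COMMA! informat on column 2, explicit column names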

COMMA! is a built-in informat that removes commas from numbers. If a number contains a dollar or sterling sign, it removes that as well. The trimmed text is then sent to the parser for conversion to a number.
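The cleaning step described above can be sketched in plain Julia. The helper strip_money below is hypothetical and is not DLMReader's implementation; it only mimics the described behaviour on a single field, assuming the cleaned text is an integer.

```julia
# Hypothetical helper, not part of DLMReader: mimic the described effect of
# COMMA! on one field.
function strip_money(text::AbstractString)
    # Drop commas, dollar signs, and sterling signs.
    cleaned = filter(c -> !(c in (',', '$', '£')), text)
    # Hand the trimmed text to the number parser.
    return parse(Int, cleaned)
end

strip_money("\$2,000,000")  # 2000000
strip_money("\$34,000")     # 34000
```

Applied to the second column of ex6.csv, this yields plain integers such as 2000000 and 34000.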

julia> filereader(IOBuffer("1,2,3,4,5\n6,7,8\n10\n"),
                  header = [:x1, :x2],
                  types = [Int, Int],
                  multiple_obs = true)
5×2 Dataset
 Row │ x1        x2       
     │ identity  identity
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
   2 │        3         4
   3 │        5         6
   4 │        7         8
   5 │       10   missing
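The source gives no prose description of multiple_obs, but the output above suggests that values are consumed in reading order across line breaks and packed into rows of two, with the last row padded by missing. The plain-Julia sketch below reproduces that layout under this assumption; fill_rows is a hypothetical helper, not DLMReader's implementation.

```julia
# Hypothetical sketch, not DLMReader code: pack all values, in reading order,
# into rows of `ncols` values each.
function fill_rows(text::AbstractString, ncols::Int)
    vals = Union{Int, Missing}[]
    # Collect every value from every non-empty line.
    for line in split(text, '\n'; keepempty = false)
        append!(vals, parse.(Int, split(line, ',')))
    end
    # Pad the final row with missing when the values run out.
    while length(vals) % ncols != 0
        push!(vals, missing)
    end
    # Cut the flat list into consecutive rows.
    return [vals[i:i+ncols-1] for i in 1:ncols:length(vals)]
end

rows = fill_rows("1,2,3,4,5\n6,7,8\n10\n", 2)
```

Here rows holds five two-element rows, the last being 10 followed by missing, which matches the Dataset printed above.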

julia> filereader(IOBuffer(""" name1 name2 avg1 avg2  y
              0   A   D   75   5    32
              1   A   D   75   5    32
              2   D   L   32   7    12
              3   F   C   99   8    42
              4   F   C   99   8    42
              5   C   A   43   6    39
              6   C   A   43   6    39
              7   L   R   53   3    11
              8   R   F   21   2    25
              9   R   F   21   2    25
              """), delimiter = ' ', ignorerepeated = true, emptycolname = true)
10×6 Dataset
 Row │ NONAME1   name1     name2     avg1      avg2      y        
     │ identity  identity  identity  identity  identity  identity
     │ Int64?    String?   String?   Int64?    Int64?    Int64?   
─────┼────────────────────────────────────────────────────────────
   1 │        0  A         D               75         5        32
   2 │        1  A         D               75         5        32
   3 │        2  D         L               32         7        12
   4 │        3  F         C               99         8        42
   5 │        4  F         C               99         8        42
   6 │        5  C         A               43         6        39
   7 │        6  C         A               43         6        39
   8 │        7  L         R               53         3        11
   9 │        8  R         F               21         2        25
  10 │        9  R         F               21         2        25
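Judging by the keyword names and the output above, ignorerepeated treats a run of consecutive delimiters as one, and emptycolname supplies a placeholder name (NONAME1 here) for a column that has no header. The sketch below uses only Base Julia's split to show why the header line has one name fewer than the data rows; it illustrates that reading and is not DLMReader's parser.

```julia
# Plain-Julia illustration, not DLMReader's parser: collapse runs of spaces
# by discarding the empty pieces that split would otherwise return.
header = " name1 name2 avg1 avg2  y"
row    = "              0   A   D   75   5    32"

header_fields = split(header, ' '; keepempty = false)
row_fields    = split(row, ' '; keepempty = false)

length(header_fields)  # 5 names
length(row_fields)     # 6 values
```

The first column of each row has no matching name in the header, which is the column shown as NONAME1 in the output.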
