
diffobj - Diffs for R Objects

Shortest Edit Script

The output from diffobj is a visual representation of the Shortest Edit Script (SES). An SES is the shortest set of deletion and insertion instructions for converting one sequence of elements into another. In our case, the elements are lines of text. We encode the instructions to convert a to b by deleting lines from a (in yellow) and inserting new ones from b (in blue).
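To make the idea concrete, here is a minimal base R sketch that derives an edit script from a longest-common-subsequence table. The function name ses_sketch and the "-"/"+" prefixes are illustrative assumptions; this is not how diffobj computes or displays its diffs, only one simple way to arrive at a shortest set of deletions and insertions.

```r
# Illustrative sketch (hypothetical helper, not part of diffobj):
# build a shortest edit script from a longest-common-subsequence table.
ses_sketch <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # L[i + 1, j + 1] holds the LCS length of a[1..i] and b[1..j]
  L <- matrix(0L, n + 1L, m + 1L)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (a[i] == b[j]) {
        L[i + 1L, j + 1L] <- L[i, j] + 1L
      } else {
        L[i + 1L, j + 1L] <- max(L[i, j + 1L], L[i + 1L, j])
      }
    }
  }
  # Walk back from the bottom-right corner, recording one operation per step
  ops <- character(0)
  i <- n
  j <- m
  while (i > 0L || j > 0L) {
    if (i > 0L && j > 0L && a[i] == b[j]) {
      ops <- c(paste(" ", a[i]), ops)   # matching line, kept as context
      i <- i - 1L
      j <- j - 1L
    } else if (j > 0L && (i == 0L || L[i + 1L, j] >= L[i, j + 1L])) {
      ops <- c(paste("+", b[j]), ops)   # insert a line from b
      j <- j - 1L
    } else {
      ops <- c(paste("-", a[i]), ops)   # delete a line from a
      i <- i - 1L
    }
  }
  ops
}

a <- c("A", "B", "C", "D")
b <- c("A", "C", "D", "E")
writeLines(ses_sketch(a, b))
```

Here converting a to b takes one deletion ("B") and one insertion ("E"); every other line matches. The quadratic table is fine for short inputs but is meant only to illustrate the concept.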

Diff Structure

The first line of our diff output acts as a legend to the diff by associating the colors and symbols used to represent differences present in each object with the name of the object:

After the legend come the hunks, which are portions of the objects that have differences with nearby matching lines provided for context:

At the top of the hunk is the hunk header: this tells us that the first displayed hunk (including context lines), starts at line 17 and spans 6 lines for a and 7 for b. These are display lines, not object row indices, which is why the first row shown of the matrix is row 16. You might have also noticed that the line after the hunk header is out of place:

This is a special context line that is not technically part of the hunk, but is shown nonetheless because it is useful in helping understand the data. The line is styled differently to highlight that it is not part of the hunk. Since it is not part of the hunk, it is not accounted for in the hunk header. See ?guideLines for more details.

The actual mismatched lines are highlighted in the colors of the legend, with additional visual cues in the gutters:

diffobj uses a line by line diff to identify which portions of each of the objects are mismatches, so even if only part of a line mismatches it will be considered different. diffobj then runs a word diff within the hunks and further highlights mismatching words.

Let's examine the last two lines from the previous hunk more closely:

Here b has an extra line so diffobj adds an empty line to a to maintain the alignment for subsequent matching lines. This additional line is marked with a tilde in the gutter and is shown in a different color to indicate it is not part of the original text.

If you look closely at the next matching line you will notice that the a and b values are not exactly the same. The row indices are different, but diffobj excludes row indices from the diff so that rows that are identical otherwise are shown as matching. diffobj indicates this is happening by showing the portions of a line that are ignored in the diff in grey.

See ?guides and ?trim for details and limitations on guide line detection and the trimming of non-semantic metadata.

Atomic Vectors

Since R can display multiple elements in an atomic vector on the same line, and diffPrint is fundamentally a line diff, we use specialized logic when diffing atomic vectors. Consider:

state.abb2 <- state.abb[-16]
state.abb2[37] <- "Pennsylvania"
diffPrint(state.abb, state.abb2)

 [1] "AL" "AK" "AZ" "AR" "CA" "CO"
 [7] "CT" "DE" "FL" "GA" "HI" "ID"
[13] "IL" "IN" "IA" "KS" "KY" "LA"
[19] "ME" "MD" "MA" "MI" "MN" "MS"
[25] "MO" "MT" "NE" "NV" "NH" "NJ"
[31] "NM" "NY" "NC" "ND" "OH" "OK"
[37] "OR" "PA" "RI" "SC" "SD" "TN"
[43] "TX" "UT" "VT" "VA" "WA" "WV"

Due to the different wrapping frequency, no line in the text display of our two vectors matches. Despite this, diffPrint highlights only the lines that actually contain differences. As a side effect, lines that contain only matching elements are shown as matching even though the actual lines may differ. You can turn off this behavior in favor of a normal line diff with the unwrap.atomic argument to diffPrint.
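As a rough sketch of the two modes, the snippet below runs the same comparison with and without unwrapping. It assumes the diffobj package is installed and that passing unwrap.atomic = FALSE is how the behavior is switched off; check ?diffPrint for the authoritative description.

```r
library(diffobj)  # assumes the diffobj package is installed

state.abb2 <- state.abb[-16]
state.abb2[37] <- "Pennsylvania"

# Default behavior: atomic vectors are unwrapped before diffing
d_unwrapped <- diffPrint(state.abb, state.abb2)

# Assumed usage: FALSE falls back to a plain line diff of the printed output
d_lines <- diffPrint(state.abb, state.abb2, unwrap.atomic = FALSE)

d_unwrapped
d_lines
```

With the plain line diff, expect many more lines to be flagged as different, since removing one element shifts the contents of the printed lines that follow it.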

Currently this only works for unnamed vectors, and even for them some inputs may produce sub-optimal results. Nested vectors inside lists will not be unwrapped. You can also use diffChr (see below) to do a direct element by element comparison.
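For the element by element alternative, a minimal sketch (again assuming the diffobj package is installed) passes the same two vectors to diffChr, so that each element is compared on its own rather than as part of a wrapped display line:

```r
library(diffobj)  # assumes the diffobj package is installed

state.abb2 <- state.abb[-16]
state.abb2[37] <- "Pennsylvania"

# Each vector element is compared directly against its counterpart
d_chr <- diffChr(state.abb, state.abb2)
d_chr
```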
