Wikidiff2 is a native extension for PHP that provides a faster diff engine to MediaWiki. It is partly based on the original wikidiff, and partly on MediaWiki's DifferenceEngine class. It produces diffs from input text (line-based or word-level) and can format these as HTML or JSON.
Wikidiff2 includes support for character-level diffs for text composed of characters from the Japanese and Thai alphabets and the unified Han, and includes support for Thai segmentation for word-level diffs in that language. Japanese, Chinese and Thai do not use spaces to separate words. The input is assumed to be UTF-8 encoded. Invalid UTF-8 may cause undesirable operation, such as truncation of the output, so the input should be validated by the application. The input text should have Unix-style line endings.
On Debian-based systems, install the packaged extension:
apt-get install php-wikidiff2
On older versions of the package, you may need to enable the extension manually after installation.
First, get and compile libthai (it should be available from your OS or distro's packages, e.g. libthai-dev).
You can download wikidiff2 through git (git clone https://gerrit.wikimedia.org/r/mediawiki/php/wikidiff2) or by downloading a tarball from https://releases.wikimedia.org/wikidiff2/.
Then compile wikidiff2. You need phpize (shipped with PHP).
cd wikidiff2
phpize
./configure
make
sudo make install
Make sure that the extension is enabled in your PHP configuration. This is usually set in your "php.ini" file.
The following "php.ini" parameters are supported:
wikidiff2.moved_line_threshold
Wikidiff2 estimates similarity of added and deleted lines based on changed character count. When the similarity of an added and deleted line is greater than this threshold, the lines are displayed as moved.
Range 0.0 .. 1.0. Default 0.4.
wikidiff2.change_threshold
Changed lines with a similarity value below this threshold will be split into a deleted line and added line. This helps matching up moved lines in some cases.
Range 0.0 .. 1.0. Default 0.2.
wikidiff2.moved_paragraph_detection_cutoff
When the number of added and deleted lines in a table diff is greater than this limit, no attempt to detect moved lines will be made.
Default 100.
wikidiff2.max_word_level_diff_complexity
When comparing two lines for changes within the line, a word-level diff will be done unless the product of the LHS word count and the RHS word count exceeds this limit.
Default 40000000.
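To check which values are actually in effect, the four parameters above can be read back at runtime. The following is a minimal sketch: readWikidiff2Settings is a hypothetical helper name, and it relies only on PHP's standard ini_get(), which returns false for an option that is not registered (for example when the extension is not loaded).

```php
<?php
// The four php.ini parameters documented above.
$options = [
    'wikidiff2.moved_line_threshold',
    'wikidiff2.change_threshold',
    'wikidiff2.moved_paragraph_detection_cutoff',
    'wikidiff2.max_word_level_diff_complexity',
];

// Hypothetical helper: map each option name to its current value,
// or null when the option is not registered with PHP.
function readWikidiff2Settings(array $names): array
{
    $settings = [];
    foreach ($names as $name) {
        $value = ini_get($name);
        $settings[$name] = ($value === false) ? null : $value;
    }
    return $settings;
}

foreach (readWikidiff2Settings($options) as $name => $value) {
    echo $name, ' = ', $value ?? '(not registered)', "\n";
}
```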
If the module is installed into PHP, MediaWiki will try to use it. See $wgExternalDiffEngine for configuration options.
The input is assumed to be UTF-8 encoded. Invalid UTF-8 may cause undesirable operation, such as truncation of the output, so the input should be validated by the application. The input text should have Unix-style line endings.
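One way an application might enforce these input requirements before calling any of the diff functions is sketched below. prepareDiffInput is a hypothetical helper, not part of wikidiff2. It uses preg_match() with the /u modifier as a UTF-8 validity check and str_replace() to normalize line endings.

```php
<?php
// Hypothetical helper: return the text with Unix-style line endings,
// or null if the text is not valid UTF-8.
function prepareDiffInput(string $text): ?string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        return null;
    }
    // Convert Windows (\r\n) first, then lone \r, to Unix (\n).
    return str_replace(["\r\n", "\r"], "\n", $text);
}

var_dump(prepareDiffInput("first\r\nsecond\r\n")); // line endings normalized
var_dump(prepareDiffInput("\xC3\x28"));            // invalid UTF-8, so null
```

Rejecting invalid input outright is only one possible policy. An application could instead substitute replacement characters before diffing.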
function wikidiff2_do_diff(string $text1, string $text2, int $numContextLines): string
Compare two strings $text1 and $text2, and produce output formatted as a fragment of an HTML table, that is, a series of <tr> elements.
$numContextLines is the number of copied context lines shown before and after each change. Before each block of context lines and changes, a line number will appear as an HTML comment inside a <tr>/<td>, e.g.
<!--LINE 1-->
This allows the application to localize line numbers.
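As a rough illustration of that localization step, the sketch below replaces each line-number comment with a label produced by a callback. localizeLineNumbers and the sample row are hypothetical. Only the <!--LINE n--> comment format and the wikidiff2_do_diff signature come from the description above.

```php
<?php
// Hypothetical helper: swap each <!--LINE n--> comment for a localized label.
function localizeLineNumbers(string $diffHtml, callable $formatLabel): string
{
    $result = preg_replace_callback(
        '/<!--LINE (\d+)-->/',
        static fn(array $m): string => htmlspecialchars($formatLabel((int)$m[1])),
        $diffHtml
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $diffHtml;
}

// Hand-written sample row standing in for real wikidiff2 output.
$sample = '<tr><td colspan="2"><!--LINE 12--></td></tr>';
echo localizeLineNumbers($sample, fn(int $n): string => "Zeile $n:"), "\n";

// With the extension loaded, the same helper applies to real output.
if (function_exists('wikidiff2_do_diff')) {
    $html = wikidiff2_do_diff("one\ntwo\n", "one\n2\n", 2);
    echo localizeLineNumbers($html, fn(int $n): string => "Line $n:"), "\n";
}
```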
function wikidiff2_inline_diff(string $text1, string $text2, int $numContextLines): string
Compare two strings $text1 and $text2, and produce output formatted as inline HTML.
function wikidiff2_inline_json_diff(string $text1, string $text2, int $numContextLines): string
Compare two strings $text1 and $text2, and produce output formatted as JSON. See the JSON diff format documentation.
function wikidiff2_multi_format_diff(string $text1, string $text2, array $options = []): array
Compare two strings $text1 and $text2 with an associative array of options:
... $text2 which may be considered for a word-level diff against a single line of $text1. Default: 1.
The return value is an associative array of formatted outputs. The key of each element is the format name (table, inline or inlineJSON), and the value is a string.
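A minimal sketch of consuming that return value follows. diffAllFormats is a hypothetical wrapper that returns an empty array when the extension is not loaded. It passes no options, so which formats come back depends on the extension's defaults.

```php
<?php
// Hypothetical wrapper: one string per output format, or [] when
// wikidiff2 is not loaded.
function diffAllFormats(string $text1, string $text2, array $options = []): array
{
    if (!function_exists('wikidiff2_multi_format_diff')) {
        return [];
    }
    return wikidiff2_multi_format_diff($text1, $text2, $options);
}

foreach (diffAllFormats("one\ntwo\n", "one\n2\n") as $format => $output) {
    // $format is the format name (table, inline or inlineJSON);
    // $output is the formatted diff as a string.
    printf("%s: %d bytes\n", $format, strlen($output));
}
```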
function wikidiff2_version(): string
Returns the same value as phpversion('wikidiff2').
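A quick availability check can be built on the same call. This sketch uses only PHP's standard extension_loaded() and phpversion(). The helper name wikidiff2Status is hypothetical.

```php
<?php
// Hypothetical helper: report whether wikidiff2 is available and,
// if so, which version is loaded.
function wikidiff2Status(): string
{
    if (!extension_loaded('wikidiff2')) {
        return 'wikidiff2 is not loaded';
    }
    // phpversion() with an extension name returns that extension's version.
    return 'wikidiff2 ' . phpversion('wikidiff2');
}

echo wikidiff2Status(), "\n";
```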
The HTML diff (a number of HTML table rows with the rest of the document structure omitted) is available as a side-by-side or inline comparison. The characters "<", ">" and "&" will be HTML-escaped in the output. In the Wikidiff2 C++ library, you can access the side-by-side diff using the TableDiff class or the inline diff using the InlineDiff class. Both classes include an execute method that returns the diff of the text passed in as parameters. You can also access these execute methods using the PHP wrapper functions wikidiff2_do_diff (for the side-by-side diff) and wikidiff2_inline_diff (for the inline diff).
The JSON diff provides structured data to compose a visual, line-by-line comparison between two sets of text. In the Wikidiff2 C++ library, you can access the JSON diff using the InlineDiffJSON class, which includes an execute method that returns the diff of the text passed in as parameters. You can also access this execute method using the PHP wrapper function wikidiff2_inline_json_diff.
JSON diff schema
The JSON diff includes properties to identify changes between the two sets of text. For an example of a JSON diff, see the MediaWiki REST API compare revisions endpoint.
property | description

diff
required | array of objects
Each object in the diff array represents a line in a visual, line-by-line comparison between the two revisions.

diff.type
required | integer
The type of change represented by the diff object, either:
  0: A line with the same content in both revisions, included to provide context when viewing the diff. The API returns up to two context lines around each change.
  1: A line included in the to revision but not in the from revision.
  2: A line included in the from revision but not in the to revision.
  3: A line containing text that differs between the two revisions. (For changes to paragraph location as well as content, see type 5.)
  4: When a paragraph's location differs between the two revisions, a type 4 object represents the location in the from revision.
  5: When a paragraph's location differs between the two revisions, a type 5 object represents the location in the to revision. This type can also include word-level differences between the two revisions.

diff.lineNumber
optional | integer
The line number of the change based on the to revision.

diff.text
required | string
The text of the line, including content from both revisions. For a line containing text that differs between the two revisions, you can use highlightRanges to visually indicate added and removed text. For a line containing a new line, the API returns the text as "" (empty string).

diff.highlightRanges
optional | array of objects
An array of objects that indicate where and in what style text should be highlighted to visually represent changes. Each object includes:
  start (integer): Where the highlighted text should start, in the number of bytes from the beginning of the line.
  length (integer): The length of the highlighted section, in bytes.
  type (integer): The type of highlight:
    0 indicates an addition.
    1 indicates a deletion.

diff.moveInfo
optional | object
Visual indicators to use when a paragraph's location differs between the two revisions. moveInfo objects occur in pairs within the diff.
  id (string): The ID of the paragraph described by the diff object.
  linkId (string): The ID of the corresponding paragraph.
    For a type 4 object, linkId represents the location in the to revision.
    For a type 5 object, linkId represents the location in the from revision.
  linkDirection (integer): A visual indicator of the relationship between the two locations. You can use this property to display an arrow icon within the diff.
    0 indicates that the linkId paragraph is lower on the page than the id paragraph.
    1 indicates that the linkId paragraph is higher on the page than the id paragraph.

diff.offset
required | object
The location of the line in bytes from the beginning of the page, including:
  from (integer): The first byte of the line in the from revision. A null value indicates that the line doesn't exist in the from revision.
  to (integer): The first byte of the line in the to revision. A null value indicates that the line doesn't exist in the to revision.
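To show how the highlightRanges byte offsets might be applied, here is a minimal sketch that wraps additions in <ins> and deletions in <del>. renderHighlights and the sample JSON are hand-written for illustration and are not real wikidiff2 output. The sketch assumes that ranges do not overlap, that they fall on character boundaries, and that the text still needs HTML escaping.

```php
<?php
// Hypothetical helper: wrap each highlighted byte range in <ins> (type 0)
// or <del> (type 1) and HTML-escape all text.
function renderHighlights(string $text, array $ranges): string
{
    // Sort by start offset so the line can be walked left to right.
    usort($ranges, fn(array $a, array $b): int => $a['start'] <=> $b['start']);
    $html = '';
    $pos = 0;
    foreach ($ranges as $range) {
        $tag = $range['type'] === 0 ? 'ins' : 'del';
        // substr() counts bytes, matching the byte-based start and length.
        $html .= htmlspecialchars(substr($text, $pos, $range['start'] - $pos));
        $html .= "<$tag>"
            . htmlspecialchars(substr($text, $range['start'], $range['length']))
            . "</$tag>";
        $pos = $range['start'] + $range['length'];
    }
    return $html . htmlspecialchars(substr($text, $pos));
}

// Hand-written sample following the schema above (assumed values).
$json = '{"diff":[{"type":3,"lineNumber":1,"text":"Hello big small world",'
    . '"highlightRanges":[{"start":6,"length":4,"type":1},'
    . '{"start":10,"length":6,"type":0}],"offset":{"from":0,"to":0}}]}';

$data = json_decode($json, true);
foreach ($data['diff'] as $line) {
    // highlightRanges is optional, so default to no highlights.
    echo renderHighlights($line['text'], $line['highlightRanges'] ?? []), "\n";
}
// prints: Hello <del>big </del><ins>small </ins>world
```

Using substr() rather than mb_substr() is deliberate here, because the schema expresses start and length in bytes rather than characters.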