⚠️ Release of 0.60.0 has breaking changes due to re-organization, renaming, clean up and optimization of implementation classes.
In many cases, the extent of the changes and optimizations made it impossible to take the slower route of marking code with the @Deprecated
annotation and removing it later. Existing code will have to be migrated manually.
ℹ️ If you encounter difficulty in migrating some code constructs please open an issue with details of the construct to be migrated.
Major improvements include:
New implementation of SegmentedSequence
using binary offset tree with efficient access, storage and instantiation.
New SequenceBuilder
class, used to create segmented sequences of arbitrary content without concern for segment ordering or whether they share a common base sequence.
A segment which cannot be converted to an offset range from the base sequence will be converted to out of base characters, preserving the expected character sequence result.
The builder will optimize literal characters when they match corresponding base sequence characters with special handling of spaces and EOL characters. This means that adding literal spaces and EOL characters instead of using a subsequence will result in them being efficiently replaced by segments from the original base sequence.
For convenience, an instance of SequenceBuilder
can be obtained from any based sequence through BasedSequence.getBuilder()
method.
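
The literal-matching behaviour described above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the library's implementation: `ToySegmentBuilder`, `appendRange` and `appendLiteral` are hypothetical names, and the matching rule is simplified to "the literal equals the base text immediately after the previous range".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not flexmark-java's SequenceBuilder API.
final class ToySegmentBuilder {
    // A segment is either a [start, end) range of the base or literal out-of-base text.
    static final class Segment {
        final int start, end;   // meaningful only when text == null
        final String text;      // non-null for out-of-base characters

        Segment(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }

        boolean isBase() { return text == null; }

        @Override public String toString() {
            return isBase() ? "[" + start + "," + end + ")" : "'" + text + "'";
        }
    }

    private final String base;
    private final List<Segment> segments = new ArrayList<>();

    ToySegmentBuilder(String base) { this.base = base; }

    // Append a range of the base; extend the previous range when the two are contiguous.
    ToySegmentBuilder appendRange(int start, int end) {
        Segment last = segments.isEmpty() ? null : segments.get(segments.size() - 1);
        if (last != null && last.isBase() && last.end == start) {
            segments.set(segments.size() - 1, new Segment(last.start, end, null));
        } else {
            segments.add(new Segment(start, end, null));
        }
        return this;
    }

    // Append literal text; if it equals the base text right after the previous range,
    // record it as a base range instead of out-of-base characters.
    ToySegmentBuilder appendLiteral(String text) {
        Segment last = segments.isEmpty() ? null : segments.get(segments.size() - 1);
        if (last != null && last.isBase() && base.startsWith(text, last.end)) {
            return appendRange(last.end, last.end + text.length());
        }
        segments.add(new Segment(-1, -1, text));
        return this;
    }

    List<Segment> segments() { return segments; }

    // The character content is the same whichever way a segment was recorded.
    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) sb.append(s.isBase() ? base.substring(s.start, s.end) : s.text);
        return sb.toString();
    }
}
```

With base `"hello world"`, appending range 0..5, a literal space and range 6..11 collapses into one base range in this model, while a literal `"!"` that has no counterpart in the base stays an out-of-base segment.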
New LineAppendable
implementation used for rendering text. Internally, the class builds a list of lines and keeps track of each line's prefix portion, allowing efficient access and manipulation of lines and prefixes in the rendered result.
The generated BasedSequence
result is a SegmentedSequence
with offsets into the source sequence preserved, allowing offsets in the result to be mapped back to the original sequence.
The result lines are stored as separate BasedSequences
to maximize preservation of original sequence offset information when the rendering rearranges the lines of the source, as in the case of formatting with reference definition sorting or MarkdownTable
sorting.
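
The idea of keeping each line's prefix apart from its content can be sketched with a few lines of plain Java. This is an illustrative model only: `ToyLineAppendable` and its methods are hypothetical names and do not reflect the actual LineAppendable interface.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not flexmark-java's LineAppendable API.
final class ToyLineAppendable {
    private final List<String> prefixes = new ArrayList<>();
    private final List<String> texts = new ArrayList<>();
    private String prefix = "";

    // Prefix applied to lines appended from now on.
    ToyLineAppendable setPrefix(String prefix) {
        this.prefix = prefix;
        return this;
    }

    // Store the line content and the prefix in effect as two separate entries.
    ToyLineAppendable line(String text) {
        prefixes.add(prefix);
        texts.add(text);
        return this;
    }

    int lineCount() { return texts.size(); }

    String prefixOf(int line) { return prefixes.get(line); }

    String textOf(int line) { return texts.get(line); }

    // Because prefixes are stored separately, one line can be re-prefixed
    // without touching its content or any other line.
    void setLinePrefix(int line, String newPrefix) { prefixes.set(line, newPrefix); }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.size(); i++) {
            sb.append(prefixes.get(i)).append(texts.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Storing lines individually is what makes reordering cheap in this model: sorting the two lists together rearranges the output while each line keeps its own prefix and content.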
Formatting module is now part of the core library with additional features:
Formatter implementation is now part of core implementation in flexmark
module
Formatter
improved with more options including wrapping text to margins.
TrackedOffset
. Used by MarkdownParagraph
for text wrapping and MarkdownTable
for table formatting and able to handle caret position during typing and backspace editing operations which are immediately followed by formatting of the edited source.
Tests cleaned up to eliminate duplication and hacks
flexmark-test-util
made reusable for other projects. Having markdown as the source code for tests is too convenient for use only in flexmark-java
tests.
Optimized SegmentedSequence
implementation using binary trees for searching segments and byte-efficient segment packing. Parser performance is slightly improved or unaffected, but the new implementation allows using SegmentedSequences
for collecting Formatter
and HtmlRenderer
output to track the source location of all text with minimal overhead and double the performance of the old implementation.
New implementation of LineAppendable
replaces LineFormattingAppendable
used for text generation in rendering:
uses SequenceBuilder
to generate BasedSequence
result with original source offsets for those character segments which come from the source. This allows round trip source tracking from Source -> AST -> Formatted Source -> Source throughout the library.
As an added bonus, formatting to the new appendable is 40% faster than with the previous implementation and 160 times more efficient in memory use. For the tests below, the old implementation allocated 6GB worth of segmented sequences, the new implementation 37MB. The overhead percentage of the new implementation is about four times greater than before, but that is after a 43 fold reduction in total overhead bytes: the old implementation needed 342MB of overhead, the new implementation 8MB.
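
The ratios in the preceding paragraph can be rechecked from the byte totals reported for this test run (6,029,774,526 and 37,663,196 character bytes; 342,351,155 and 8,204,796 overhead bytes). The sketch below only redoes that arithmetic; the class and method names are made up for illustration. Computed from the exact byte counts rather than the rounded megabyte figures, the overhead reduction comes out slightly lower than the rounded figure quoted in the text.

```java
// Recomputes the quoted ratios from the byte totals reported for the test run.
final class OverheadRatios {
    // Overhead expressed as a percentage of the character bytes it describes.
    static double percent(long overheadBytes, long characterBytes) {
        return 100.0 * overheadBytes / characterBytes;
    }

    // How many times smaller the new total is than the old one.
    static double fold(long oldBytes, long newBytes) {
        return (double) oldBytes / newBytes;
    }

    public static void main(String[] args) {
        long oldChars = 6_029_774_526L, newChars = 37_663_196L;
        long oldOverhead = 342_351_155L, newOverhead = 8_204_796L;

        System.out.printf("character bytes reduction: %.1f fold%n", fold(oldChars, newChars));
        System.out.printf("overhead bytes reduction:  %.1f fold%n", fold(oldOverhead, newOverhead));
        System.out.printf("overhead before: %.1f%%%n", percent(oldOverhead, oldChars));
        System.out.printf("overhead after:  %.1f%%%n", percent(newOverhead, newChars));
    }
}
```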
As a result of increased efficiency, two additional files of about 600kB each can be included in the test run and only add 0.6 sec to the formatter run time.
Tests run on 1141 markdown files from GitHub projects and some other user samples. Largest was 256k bytes.

| Description | Old SegmentedSequence | New SegmentedSequence | New LineAppendable |
|---|---|---|---|
| Total wall clock time | 13.896 sec | 9.672 sec | 8.344 sec |
| Parse time | 2.402 sec | 2.335 sec | 2.297 sec |
| Formatter appendable | 0.603 sec | 0.602 sec | 0.831 sec |
| Formatter sequence builder | 7.264 sec | 3.109 sec | 1.772 sec |

The overhead difference is significant. The totals are for all segmented sequences created during the test run of 1141 files. Parser statistics show requirements during parsing and formatting.

| Description | Old Parser | Old Formatter | New Parser | New Formatter | New LineAppendable |
|---|---|---|---|---|---|
| Bytes for characters of all segmented sequences | 917,016 | 6,029,774,526 | 917,016 | 6,029,774,526 | 37,663,196 |
| Bytes for overhead of all segmented sequences | 1,845,048 | 12,060,276,408 | 93,628 | 342,351,155 | 8,204,796 |
| Overhead % | 201.2% | 200.0% | 10.2% | 5.7% | 21.8% |

Break: split out generic AST utilities from flexmark-util
module into separate smaller modules. com.vladsch.flexmark.util
no longer contains any files, only separate utility modules with flexmark-utils
module being an aggregate of all utilities modules, similar to flexmark-all
ast/
classes to flexmark-util-ast
builder/
classes to flexmark-util-builder
collection/
classes to flexmark-util-collection
data/
classes to flexmark-util-data
dependency/
classes to flexmark-util-dependency
format/
classes to flexmark-util-format
html/
classes to flexmark-util-html
mappers/
classes to flexmark-util-sequence
options/
classes to flexmark-util-options
sequence/
classes to flexmark-util-sequence
visitor/
classes to flexmark-util-visitor
Break: delete deprecated properties, methods and classes
Add: org.jetbrains:annotations:15.0
dependency to have @Nullable
/@NotNull
annotations added for all parameters. When using IntelliJ IDEA for development, it helps to have these annotations for analysis of potential problems and makes it easier to use the library with Kotlin.
Break: refactor and cleanup tests to eliminate duplicated code and allow easier reuse of test cases with spec example data.
Break: move formatter tests to flexmark-core-test
module to allow sharing of formatter base classes in extensions without causing dependency cycles in formatter module.
Break: move formatter module into flexmark
core. This module is almost always included anyway because most extensions have a dependency on formatter for their custom formatting implementations. Having it as part of the core allows relying on its functionality in all modules.
Break: move com.vladsch.flexmark.spec
and com.vladsch.flexmark.util
in flexmark-test-util
to com.vladsch.flexmark.test.spec
and com.vladsch.flexmark.test.util
respectively to respect the naming convention between modules and their packages.
Break: NodeVisitor
implementation details have changed. If you were overriding NodeVisitor.visit(Node)
in the previous version, the method is now final
so that a compile-time error is generated. You will need to change your implementation. See the javadoc comment in the NodeVisitor
class for instructions.
ℹ️ com.vladsch.flexmark.util.ast.Visitor
is only needed for implementation of NodeVisitor
and VisitHandler
. If all anonymous implementations of VisitHandler
are converted to lambdas, then imports for Visitor
can be eliminated.
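
The general shape of "final dispatch plus handlers registered as lambdas" can be sketched without the library. Everything in the block below is hypothetical (`ToyNodeVisitor`, `on`, the node classes) and is meant only to show why overriding the dispatch method is no longer the extension point; the NodeVisitor javadoc remains the reference for the real migration steps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: hypothetical types, not flexmark-java's NodeVisitor or VisitHandler.
final class ToyVisitorDemo {
    static class Node {
        final List<Node> children = new ArrayList<>();
    }

    static final class Text extends Node {
        final String chars;
        Text(String chars) { this.chars = chars; }
    }

    static final class Paragraph extends Node { }

    static class ToyNodeVisitor {
        private final Map<Class<?>, Consumer<Node>> handlers = new HashMap<>();

        // Register a handler for one node class; a lambda or method reference fits here.
        <N extends Node> ToyNodeVisitor on(Class<N> type, Consumer<N> handler) {
            handlers.put(type, node -> handler.accept(type.cast(node)));
            return this;
        }

        // Dispatch is final: behaviour is customized through handlers, not by overriding.
        final void visit(Node node) {
            Consumer<Node> handler = handlers.get(node.getClass());
            if (handler != null) handler.accept(node);
            else visitChildren(node);
        }

        final void visitChildren(Node node) {
            for (Node child : node.children) visit(child);
        }
    }

    // Collect the characters of every Text node below root, in document order.
    static String collectText(Node root) {
        StringBuilder out = new StringBuilder();
        ToyNodeVisitor visitor = new ToyNodeVisitor();
        visitor.on(Text.class, text -> out.append(text.chars));
        visitor.visit(root);
        return out.toString();
    }
}
```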
com.vladsch.flexmark.util.ast.NodeAdaptedVisitor
see javadoc for class
com.vladsch.flexmark.util.ast.NodeAdaptingVisitHandler
com.vladsch.flexmark.util.ast.NodeAdaptingVisitor
The IntelliJ IDEA migration "migrate flexmark-java 0_50_x to 0_60_0.xml" can be used to assist in migrating from version 0.50.40 to version 0.60 of the library. It migrates class name and package changes only.
Changes to arguments and methods have to be addressed manually.
LineFormattingAppendable
is renamed to LineAppendable
. Implementations and subclasses are similarly renamed to remove Formatting
from the class name.
All formatting flags are now prefixed with F_
and when present, select the given modification of appended text. Previously, ALLOW_LEADING_WHITESPACE
and ALLOW_LEADING_EOL
were inverted and setting them disabled the text modification.
ALLOW_LEADING_WHITESPACE
is now F_TRIM_LEADING_WHITESPACE
and has inverted meaning.
ALLOW_LEADING_EOL
is now F_TRIM_LEADING_EOL
and has inverted meaning.
CONVERT_TABS
is now F_CONVERT_TABS
COLLAPSE_WHITESPACE
is now F_COLLAPSE_WHITESPACE
TRIM_TRAILING_WHITESPACE
is now F_TRIM_TRAILING_WHITESPACE
PASS_THROUGH
is now F_PASS_THROUGH
TRIM_LEADING_WHITESPACE
is now F_TRIM_LEADING_WHITESPACE
PREFIX_PRE_FORMATTED
is now F_PREFIX_PRE_FORMATTED
FORMAT_ALL
is now F_FORMAT_ALL
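
The two inverted flags are the easy ones to get wrong during migration, so a small sketch of the conversion logic may help. The bit values below are invented for illustration and only four of the flags are modelled; real code should use the constants defined by the library.

```java
// Toy model only: bit values are made up; use the library's constants in real code.
final class ToyFlagMigration {
    // Old-style names (hypothetical values).
    static final int ALLOW_LEADING_WHITESPACE = 1;
    static final int ALLOW_LEADING_EOL = 1 << 1;
    static final int CONVERT_TABS = 1 << 2;
    static final int COLLAPSE_WHITESPACE = 1 << 3;

    // New-style names (hypothetical values).
    static final int F_TRIM_LEADING_WHITESPACE = 1;
    static final int F_TRIM_LEADING_EOL = 1 << 1;
    static final int F_CONVERT_TABS = 1 << 2;
    static final int F_COLLAPSE_WHITESPACE = 1 << 3;

    static int migrate(int oldFlags) {
        int newFlags = 0;
        // Inverted pair: trimming is selected exactly when the old "allow" bit was absent.
        if ((oldFlags & ALLOW_LEADING_WHITESPACE) == 0) newFlags |= F_TRIM_LEADING_WHITESPACE;
        if ((oldFlags & ALLOW_LEADING_EOL) == 0) newFlags |= F_TRIM_LEADING_EOL;
        // Straight renames keep their meaning.
        if ((oldFlags & CONVERT_TABS) != 0) newFlags |= F_CONVERT_TABS;
        if ((oldFlags & COLLAPSE_WHITESPACE) != 0) newFlags |= F_COLLAPSE_WHITESPACE;
        return newFlags;
    }
}
```

In this model an old flag set of `ALLOW_LEADING_WHITESPACE | CONVERT_TABS` becomes `F_TRIM_LEADING_EOL | F_CONVERT_TABS`: leading whitespace is still kept, and leading EOLs are still trimmed because the old set did not allow them.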
This interface and its implementation classes were refactored and reworked for efficient use with SequenceBuilder
.
The CharPredicate
class is now used instead of CharSequence
to specify character sets, giving consistent and efficient character tests. Method arguments of type CharSequence
which were used for selecting character sets are now of type CharPredicate
.
The simplest way to change the method call is to use CharPredicate.anyOf(CharSequence)
to convert a character sequence to predicate.
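
The call-site change amounts to wrapping the old character-set string in a predicate. The sketch below mimics that with `java.util.function.IntPredicate` standing in for CharPredicate; the `anyOf` here is a local stand-in written for this example, and `countLeading` is a hypothetical helper rather than a statement about any particular library method.

```java
import java.util.function.IntPredicate;

// Toy model only: IntPredicate stands in for CharPredicate; countLeading is hypothetical.
final class ToyCharPredicates {
    // Turn a set of characters given as a sequence into a membership test.
    static IntPredicate anyOf(CharSequence chars) {
        String set = chars.toString();
        return c -> set.indexOf(c) >= 0;
    }

    // Number of leading characters of text accepted by the predicate.
    static int countLeading(CharSequence text, IntPredicate predicate) {
        int i = 0;
        while (i < text.length() && predicate.test(text.charAt(i))) i++;
        return i;
    }
}
```

A call that used to pass `" \t"` directly would, in this model, pass `anyOf(" \t")` instead; building the predicate once and reusing it avoids re-scanning the character set on every call.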
Some methods were renamed to better reflect their operation. In these cases the methods with the old names are deprecated and their default implementations invoke the new methods.
The old SegmentedSequence class was renamed to SegmentedSequenceFull
, which contains the old, inefficient implementation. Using the old class is not recommended because its implementation is inefficient and in some cases buggy.
The new SegmentedSequence
is an abstract class with concrete implementation by SegmentedSequenceFull
and SegmentedSequenceTree
. The latter is an efficient implementation using binary search tree.
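
To show why a search structure over segments makes index lookups cheap, here is a simplified stand-in: a sorted array of cumulative segment ends searched with `Arrays.binarySearch`, rather than an actual tree. The class name and layout are assumptions made for this example, and it assumes every range is non-empty.

```java
import java.util.Arrays;

// Toy model only: a sorted array of cumulative ends stands in for the search tree.
final class ToySegmentIndex {
    private final int[] baseStarts;      // base offset where each segment begins
    private final int[] cumulativeEnds;  // index in the result just past each segment

    // ranges[i] = {start, end} offsets into the base; each range must be non-empty.
    ToySegmentIndex(int[][] ranges) {
        baseStarts = new int[ranges.length];
        cumulativeEnds = new int[ranges.length];
        int total = 0;
        for (int i = 0; i < ranges.length; i++) {
            baseStarts[i] = ranges[i][0];
            total += ranges[i][1] - ranges[i][0];
            cumulativeEnds[i] = total;
        }
    }

    int length() {
        return cumulativeEnds.length == 0 ? 0 : cumulativeEnds[cumulativeEnds.length - 1];
    }

    // Map an index in the result to its offset in the base using O(log n) comparisons.
    int baseOffset(int index) {
        if (index < 0 || index >= length()) throw new IndexOutOfBoundsException("index: " + index);
        int pos = Arrays.binarySearch(cumulativeEnds, index + 1);
        int seg = pos >= 0 ? pos : -pos - 1;   // first segment whose end is greater than index
        int segStart = seg == 0 ? 0 : cumulativeEnds[seg - 1];
        return baseStarts[seg] + (index - segStart);
    }
}
```

For ranges `{10,13}`, `{0,4}`, `{20,22}` the cumulative ends are 3, 7 and 9, so result index 3 falls in the second segment and maps to base offset 0.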
The right way to create an instance of SegmentedSequence
is to use an instance of SequenceBuilder
to build a sequence, then call SequenceBuilder.toSequence()
which returns an instance of SegmentedSequenceTree
if the result requires a segmented sequence, or a subsequence of the underlying BasedSequence
if the result is a single segment.
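
The choice between a plain subsequence and a segmented result can be sketched as follows. This is an illustration of the decision only, with hypothetical names (`ToyToSequence`, `Segmented`); it uses a linear scan for `charAt` and says nothing about how the library stores segments.

```java
// Toy model only: hypothetical names, not flexmark-java's SequenceBuilder API.
final class ToyToSequence {
    // Composite view over several base ranges, standing in for a segmented sequence.
    static final class Segmented implements CharSequence {
        private final String base;
        private final int[][] ranges;

        Segmented(String base, int[][] ranges) {
            this.base = base;
            this.ranges = ranges;
        }

        @Override public int length() {
            int total = 0;
            for (int[] r : ranges) total += r[1] - r[0];
            return total;
        }

        @Override public char charAt(int index) {
            if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
            int remaining = index;
            for (int[] r : ranges) {
                int len = r[1] - r[0];
                if (remaining < len) return base.charAt(r[0] + remaining);
                remaining -= len;
            }
            throw new IndexOutOfBoundsException("index: " + index);
        }

        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int[] r : ranges) sb.append(base, r[0], r[1]);
            return sb.toString();
        }
    }

    // One range needs no segment bookkeeping: a subsequence of the base is enough.
    static CharSequence toSequence(String base, int[][] ranges) {
        if (ranges.length == 0) return "";
        if (ranges.length == 1) return base.subSequence(ranges[0][0], ranges[0][1]);
        return new Segmented(base, ranges);
    }
}
```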