Showing content from https://github.com/ClosedXML/ClosedXML/releases/tag/0.100.0 below:

Clean Break

These are the release notes for version 0.100. We skipped a few versions since the last release (0.97) because 0.100 should denote a major change at the very heart of ClosedXML. Not as clean a break as I hoped, but close enough.

The list of everything that changed from 0.97 to 0.100 is in the migration guide at https://closedxml.readthedocs.io/en/latest/migrations/migrate-to-0.100.html

This is more of a list of reasons why you should upgrade despite the breaking changes :)

Memory consumption when saving large workbooks was decreased

Memory consumption during saving of large workbooks was significantly improved. Originally, the ClosedXML workbook representation was converted to a DocumentFormat.OpenXml DOM representation and the DOM was then saved. Instead of creating the whole DOM, sheet data (= cell values) are now streamed directly to the output file and aren't included in the DOM.

To demonstrate the difference, see the before and after memory consumption of a report that generated 30 000 rows and 45 columns. Memory consumption decreased from 2.08 GiB to 0.8 GiB.

Save cells and strings through DOM: 2.08 GiB

Save cells and strings through streaming: 0.8 GiB

The purple area shows the bytes of the uncompressed package zip stream.

Cell value is now strongly typed

IXLCell.Value and IXLCell.CachedValue now have the type XLCellValue. At its core, xlsx consists of addressable cells with functions that transform a set of values in source cells to different values in target cells. It is really important to represent the potential values of cells by a sane type. Everything else (pivot tables, auto filter, graphs) relies on this premise.

A cell value used to be represented as a string text and a value. The string depended on the value, e.g. 0/1 for boolean. That had been the case since the beginning of the ClosedXML project (see the original XLCell). The value was also returned as an Object.
This approach had several drawbacks:

Object is not a suitable representation of a cell value. Users had no idea what kinds of values could be returned as a cell value. Everything could also break down if a new type were returned (e.g. XLError).
The setter could accept different types than the getter returned, e.g. it was possible to set a cell value to an IXLColumn.
Values were always boxed/unboxed. That is not a problem for a small amount of data, but it is not great for large workbooks.
It caused potentially buggy behavior in other places of ClosedXML.

The value of a cell is now represented by an XLCellValue structure. It is basically a union of the possible types that a cell value can have:

blank
boolean
number
text
error
datetime - basically a number representing a serial datetime
duration - basically a number representing a serial datetime

Since datetime and duration are basically numbers in disguise, you can use XLCellValue.GetUnifiedNumber() to get the backing number, no matter whether the type is number, datetime or duration.

The structure contains implicit operators, as well as other methods, to make the transition as seamless as possible:

// Will use an implicit cast operator to convert string to XLCellValue and pass it to the Value setter
ws.Cell("A1").Value = "Text";

There is also a new singleton Blank.Value that represents a blank value of a cell. Null is not blank. An empty string is not a blank value of a cell. Using null instead of blank was considered, but everything is so much easier to work with if blank is represented as a custom singleton type and not as null.

XLCellValue will be able to represent all values of a cell and won't be boxed/unboxed all the time.
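
As a rough illustration of the union described above, the sketch below stores several .NET values and asks each cell which type it holds. It assumes the ClosedXML 0.100 (or later) NuGet package is referenced and that the project allows C# top-level statements; the worksheet name and cell addresses are arbitrary.

```csharp
using System;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
var ws = wb.AddWorksheet("Sheet1");

// Implicit operators convert common .NET types to XLCellValue.
ws.Cell("A1").Value = "Text";
ws.Cell("A2").Value = 42.5;
ws.Cell("A3").Value = true;
ws.Cell("A4").Value = new DateTime(2022, 12, 30);
ws.Cell("A5").Value = TimeSpan.FromHours(36);

foreach (var address in new[] { "A1", "A2", "A3", "A4", "A5" })
{
    XLCellValue value = ws.Cell(address).Value;
    // Type reports which member of the union is stored in the cell.
    Console.WriteLine($"{address}: {value.Type}");
}

// Number, datetime and duration all expose their backing serial number.
Console.WriteLine(ws.Cell("A4").Value.GetUnifiedNumber());
Console.WriteLine(ws.Cell("A5").Value.GetUnifiedNumber());
```

GetUnifiedNumber() is only meaningful for number, datetime and duration values, so check the type first when the content of a cell is not known in advance.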

Cell data type is no longer guessed

ClosedXML used to guess a data type from a value. That caused all sorts of unexpected behavior (e.g. the text value Z12.31 was converted to the date time 12/30/2022 19:00). Dates caused the most problems, but other types sometimes did too (e.g. the text "Infinity" was detected as a number).

This behavior was likely intended to emulate how a user interacts with Excel. Excel guesses the type, but only if the cell Number Format is set to "General" (e.g. if NumberFormat is set to Text, there is no conversion even in Excel). An application is not a human and doesn't have to interact with xlsx in the same way.

This behavior was removed. The type that is set is the type that will be returned. Note that although XLCellValue can represent date and time as different types, in reality that is only presentation logic for the user. They are both just serial date time numbers.
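
A small sketch of what "no guessing" means in practice, assuming ClosedXML 0.100 or later and C# top-level statements. The input strings are made up for illustration; the point is that the caller now decides the type, for example by parsing text explicitly before assigning it.

```csharp
using System;
using System.Globalization;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
var ws = wb.AddWorksheet("Sheet1");

// A string is stored as text, even if it resembles a date or a number.
ws.Cell("A1").Value = "Z12.31";
Console.WriteLine(ws.Cell("A1").Value.IsText);

// To store a number, convert the text yourself and assign the number.
string input = "12.5";
ws.Cell("A2").Value = double.Parse(input, CultureInfo.InvariantCulture);
Console.WriteLine(ws.Cell("A2").Value.IsNumber);
```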

Cell value now can be XLError or Blank

A cell value can now accurately represent an error or a blank value.

ClosedXML used to throw on an error value, and a cell couldn't contain an error. That was a significant problem, especially for formula calculation where a formula referenced a cell that should contain an error value.

ClosedXML used to represent a blank cell as an empty string, but no longer. It uses the Blank.Value singleton, wrapped in XLCellValue. This also brings a significant improvement in the accuracy of CalcEngine evaluation.
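
The following sketch shows an error and a blank being stored and read back, assuming ClosedXML 0.100 or later and C# top-level statements. The cell addresses are arbitrary, and the last line only prints what the library reports for an empty string rather than asserting it.

```csharp
using System;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
var ws = wb.AddWorksheet("Sheet1");

// An error is an ordinary value that can be assigned to a cell.
ws.Cell("A1").Value = XLError.DivisionByZero;
// Blank is represented by a singleton, not by null or an empty string.
ws.Cell("A2").Value = Blank.Value;
ws.Cell("A3").Value = "";

XLCellValue error = ws.Cell("A1").Value;
if (error.IsError)
    Console.WriteLine($"A1 holds the error {error.GetError()}");

Console.WriteLine($"A2 is blank: {ws.Cell("A2").Value.IsBlank}");
// Per the notes above, an empty string is text rather than a blank.
Console.WriteLine($"A3 type: {ws.Cell("A3").Value.Type}");
```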

Text to number coercion

Excel has a pretty complicated, undocumented coercion process from text to number. It can convert fraction text (="1 1/2"*2 is 3), dates (e.g. ="1900-01-05"*2 is 10, though the date format is culture specific), percent (e.g. ="100%"*2), parentheses that imply a negative value (="(100%)"*2 is -2) and many more. That causes significant problems for formula evaluation, especially if the source cell contains a date as text, not as a date.

ClosedXML used to convert only text that looked like a double; it now coerces nearly everything Excel does. Coercion from dates should mostly work, but Excel has its own database of acceptable formats and its own format, while we rely on the .NET Core infrastructure.
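
To try the coercion, the formulas quoted above can be passed to XLWorkbook.Evaluate, as in this sketch (assuming ClosedXML 0.100 or later and C# top-level statements). The date example is left out because its result depends on the current culture.

```csharp
using System;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
wb.AddWorksheet("Sheet1");

// Formulas taken from the text above; each multiplies a text operand by 2,
// so the text has to be coerced to a number first.
var formulas = new[] { "\"1 1/2\"*2", "\"100%\"*2", "\"(100%)\"*2" };

foreach (var formula in formulas)
{
    XLCellValue result = wb.Evaluate(formula);
    Console.WriteLine($"={formula} evaluates to {result} ({result.Type})");
}
```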

CalcEngine doesn't throw exceptions

Thanks to the incorporation of XLError into the core of CalcEngine, the exceptions are no longer necessary and have been removed. An error is a normal value type that is used during formula evaluation (e.g. ISNA accepts it and VLOOKUP returns it).

Technically speaking, CalcEngine can still throw MissingContextException, but only if evaluation is not called from a cell but from a method like XLWorkbook.Evaluate. Functions like ROW just can't work without the context of the cell.

Unimplemented functions now return #NAME?

If you have ever tried to use CalcEngine, you have encountered the dreaded "The function SomeFunction was not recognised." exception.

ClosedXML will no longer throw an exception on an unimplemented function, but will return a #NAME? error instead. There are several reasons:

It brings the behavior of predefined functions in line with user defined functions. ClosedXML doesn't throw anything on =SOME.UNKNOWN.FUN(4), so why should it throw on =LARGE(A1:A5,1)?
By default, ClosedXML doesn't save calculated values. A portion of a workbook that doesn't use an unimplemented function should work correctly, and maybe that is enough for some use cases. Excel (nearly always) recalculates everything on load anyway.

Basically, the exception doesn't bring any benefit and only imposes costs. User can report missing function on #NAME? error just like on exception.
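
A sketch of handling the #NAME? result instead of catching an exception, assuming ClosedXML 0.100 or later and C# top-level statements. The function name is the made-up one used in the text above.

```csharp
using System;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
wb.AddWorksheet("Sheet1");

// An unknown function evaluates to an error value instead of throwing.
XLCellValue result = wb.Evaluate("SOME.UNKNOWN.FUN(4)");

if (result.IsError && result.GetError() == XLError.NameNotRecognized)
    Console.WriteLine("Function is not available, got #NAME?");
else
    Console.WriteLine($"Result: {result}");
```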

Array literal can now be parsed

CalcEngine now can evaluate array literal expressions, so formulas like VLOOKUP(4, {1,2; 3,2; 5,3; 7,4}, 2) now actually work.

Array processing is limited to argument parsing across formulas, and CalcEngine still needs some love to process arrays correctly. Array formulas are still not implemented.
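
The VLOOKUP formula mentioned above can be evaluated directly, as in this sketch (assuming ClosedXML 0.100 or later and C# top-level statements). In Excel semantics the lookup is an approximate match, so 4 selects the row whose first column is 3 and the formula returns the second column of that row, which is 2.

```csharp
using System;
using ClosedXML.Excel;

using var wb = new XLWorkbook();
wb.AddWorksheet("Sheet1");

// The array literal has four rows (separated by ;) and two columns (separated by ,).
XLCellValue result = wb.Evaluate("VLOOKUP(4, {1,2; 3,2; 5,3; 7,4}, 2)");

Console.WriteLine($"{result} ({result.Type})");
```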

Reimplementation of information and lookup functions

Information and lookup functions were reimplemented to take advantage of the other improvements. They should now be compliant with Excel (with the exception of wildcard search for VLOOKUP).

Documentation in the version control

Documentation is being moved from the wiki to ReadTheDocs. It has been there since 2019, but we didn't actually have any documentation. Documentation is super important and ClosedXML is lacking in that area. It is of course WIP, but it should improve over time (see https://closedxml.readthedocs.io/en/latest/features/protect.html, https://closedxml.readthedocs.io/en/latest/features/cell-format.html#number-format or the infamous https://closedxml.readthedocs.io/en/latest/tips/missing-font.html).

The move to ReadTheDocs has significant advantages:

It is in version control. That means every PR can now contain modifications to the documentation.
It is built as part of CI.
It is versioned.
It uses reStructuredText (rst), which has richer style options and even plugins. CommonMark is heavily limited in style application.
It can generate documentation from xml comments.
It can use references and includes. That means all examples can be in separate files and only included in the documentation. Separate example files could simply be compiled and checked for correctness (we are not doing that ATM, but likely will at some point in the future). That would solve the pesky issue of outdated examples in documentation.

Notes about breaking changes

We are not breaking compatibility just because. A break imposes a heavy penalty on users of the library. That makes people less likely to use it, and that is definitely not the goal. Even ClosedXML.Report must be fixed after every release.

That is not a desirable situation. Version 1.0 and semantic versioning is certainly the goal. But it must come with a clear API that can endure some development between minor versions. That is just not the case at the moment.

The API will be reviewed along with the documentation and will be adjusted as necessary. ClosedXML will practice release early, release often. If breaking changes are not acceptable, stay on a version that works and wait for 1.0 (though that will likely take at least a year, likely more... we are in the second decade).

Technically, we have been doing semver forever, since "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable." Initial development for a decade /sigh.

Future plans

Similar to the current release, the general plan is to work on neglected foundational things and bug fixes.

Fix AutoFilter - it doesn't work correctly, the API is a mess and accepts any type. I wanted to have it done for 0.100 ¯_(ツ)_/¯
Finish the CalcEngine redesign with array formulas.
Update XLParser to 1.6.2. I added PRs 162 and 163 to improve speed by about a factor of 3 (the test dataset was parsed in 13 seconds vs 47 originally), but there was not enough time to upgrade the version ¯_(ツ)_/¯
Housekeeping of PRs - some PRs were merged, but most are still there.
Cell sizing is a mess. Clean it up and fix AdjustToContent to be in line with what Excel does (research was done: https://github.com/ClosedXML/ClosedXML/wiki/Cell-Dimensions).
Make a fuzzer for function evaluation that compares the ClosedXML implementation with the result from Excel.

It is likely there will be a 0.100.x to fix whatever bugs XLCellValue caused that weren't covered by tests.

Pivot tables won't get any love in 0.101, but hopefully in the next one. It is one of the distinguishing features of ClosedXML and it has a lot of reported issues.

What's Changed

Add a graphic engine ctor accepting streams by @jahav in #1881
#1887 - System.NotImplementedException in ConvertLegacyCellValueToSca… by @cjundt in #1888
Read properly VML color by @jahav in #1892
Default column width wasn't converted from width to NoC by @jahav in #1893
Reimplement of a VALUE function. by @jahav in #1895
Add a text-to-number coercion by @jahav in #1899
Make AutoFilter filtering slighly less buggy. by @jahav in #1900
Add a read the docs by @jahav in #1907
Update System.IO.Packaging dependency by @jahav in #1912
Ignore indexer in objects for bulk insert by @jahav in #1913
Bump System.Data.SqlClient from 4.3.0 to 4.8.5 in /ClosedXML.Tests by @dependabot in #1918
Remove advice about System.Drawing on Linux by @leotsarev in #1923
Function documentation by @jahav in #1925
Border fluent API doesn't work for styles on non IStylized types. by @jahav in #1928
Check ActiveTab.HasValue before accessing ActiveTab.Value by @0MG-DEN in #1930
Use types values in a cell. by @jahav in #1922
Fix info functions by @jahav in #1933
Use formats only for number-like values by @jahav in #1934
Implement loading/saving of error values. by @jahav in #1935
Unimplemented standard functions should evaluate to #NAME? by @jahav in #1936
Replace calculation engine exceptions with XLError enum by @igitur in #1404
Reimplement VLOOKUP using bisection. by @jahav in #1938
Saving through streams by @jahav in #1937
Add support for parsing array literals by @jahav in #1939
Reimplement logical functions by @jahav in #1943
Adding custom number format to a template could fail with existing non-sequential ids by @jahav in #1940
Document possible solutions of a missing font by @jahav in #1948
Remove obsoleted methods by @igitur in #1830
Renamed XLPivotValues parameters by @0MG-DEN in #1946
Return strongly typed result from Evaluate methods by @jahav in #1952
fix #1896 "Pictures and Shapes display order is not respected after save" by @nakamura2000 in #1944
Add documentation of a "protect workbook data" feature. by @jahav in #1957
Add documentation for graphic engine and move from ctor to factory methods by @jahav in #1958
Fixes issue #1917 not contains filter by @cjundt in #1953
Reorder XLError members by @jahav in #1959

Full Changelog: 0.97.0...0.100.0