Use NFC in copy, makeWord, and export functions, and NFKC for search operations.
NFKC may alter characters when text is copied or exported; for example, ⑥ in a PDF would be pasted as 6. Most uses are therefore replaced with NFC.
To simplify matching during search operations, NFKC is still used.
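A toy sketch of the distinction, not the actual implementation: real code would call a Unicode normalization routine, whereas the hypothetical foldCircledDigit and searchKey below hand-roll a single NFKC compatibility mapping (circled digits to ASCII digits) just to show why folding suits search keys but not copied text.

```cpp
#include <cassert>
#include <string>

// Hypothetical constants: U+2460 is CIRCLED DIGIT ONE, U+2468 is CIRCLED DIGIT NINE.
constexpr char32_t kCircledOne = 0x2460;
constexpr char32_t kCircledNine = 0x2468;

// Toy stand-in for one NFKC compatibility mapping: circled digits fold to
// ASCII '1'..'9'. Every other code point is returned unchanged.
inline char32_t foldCircledDigit(char32_t c)
{
    if (c >= kCircledOne && c <= kCircledNine) {
        return static_cast<char32_t>(U'1' + (c - kCircledOne));
    }
    return c;
}

// Builds the key used for matching only; the original text is not modified,
// so copy and export still see the circled digit.
inline std::u32string searchKey(const std::u32string &text)
{
    std::u32string key;
    key.reserve(text.size());
    for (char32_t c : text) {
        key.push_back(foldCircledDigit(c));
    }
    assert(key.size() == text.size()); // this toy mapping is always 1:1
    return key;
}
```

With this split, a query for 6 finds ⑥ through the folded key, while the text handed to the clipboard keeps ⑥ as it appears in the document.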
BUG: 466521
CCBUG: 473495
After trying to measure the effect of switching TinyTextEntity from a
custom SBO string type to just QString and getting a barely measurable
gain, TinyTextEntity kind of looked like TextEntity, so merge those two.
Also reduce the amount of new/deletes around TextEntities
Implement a string pool for some tiny strings, and various memory optimizations
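A minimal sketch of the string pool idea, assuming a plain std::unordered_set as storage; the StringPool class and its intern method are hypothetical names for illustration and not the types used in this change.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical pool: stores each distinct string once and hands out a
// pointer to the stored copy, so repeated tiny strings share one allocation.
class StringPool
{
public:
    const std::string *intern(const std::string &s)
    {
        // insert() returns the already stored element when s is present.
        // Pointers to unordered_set elements stay valid across rehashing.
        const std::string *stored = &*m_strings.insert(s).first;
        assert(*stored == s);
        return stored;
    }

    std::size_t size() const
    {
        return m_strings.size();
    }

private:
    std::unordered_set<std::string> m_strings;
};
```

Interning the same text twice yields the same pointer, so an entity only needs to hold a pointer instead of its own copy of the string.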
We now definitely have std::as_const available and Qt has started nagging
about converting to std::as_const.
Implementation is the same for both functions, and qAsConst was a
stop-gap measure until std::as_const was sufficiently available.
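A small sketch of what the replacement buys, assuming C++17: std::as_const (from <utility>) yields a const reference, so a range-for picks the const begin()/end() and an implicitly shared container would not detach. The Tracked type and sumConst function are hypothetical stand-ins that only count which overload gets called.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an implicitly shared container: it records
// whether the mutable begin() (the one that would detach) was called.
struct Tracked {
    std::vector<int> data{1, 2, 3};
    mutable int constCalls = 0;
    int mutableCalls = 0;

    std::vector<int>::iterator begin() { ++mutableCalls; return data.begin(); }
    std::vector<int>::iterator end() { return data.end(); }
    std::vector<int>::const_iterator begin() const { ++constCalls; return data.begin(); }
    std::vector<int>::const_iterator end() const { return data.end(); }
};

// Sums the elements while iterating through a const view of the container.
inline int sumConst(Tracked &t)
{
    const int mutableBefore = t.mutableCalls;
    int sum = 0;
    for (int v : std::as_const(t)) {
        sum += v;
    }
    assert(t.mutableCalls == mutableBefore); // the mutable overload was not used
    return sum;
}
```

Call sites change only in the spelling of the wrapper, which is why the conversion is mechanical.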
In order to be able to also do this with Qt6 we can't currently rely on
'random distro' to be able to give us an environment sufficient for
running clang and clazy, so use the CI system setup instead
A side effect is that we get a newer clazy and clang-tidy and thus a
few more issues to either ignore or fix.
Also, ask ninja to continue as long as possible rather than stop at
the first error, to be able to get success in as few runs as possible
The bounding rect is *visual* and it can happen that we have
invisible text and thus the algorithm will get super confused
because there will be text items that are outside the text area
A lot of this code has been commented out for over
a decade and adds no value to the project.
It is only annoying when you look over it ;).
Same for the KNS2 support which was commented out.
Also some of the debug statements didn't even build
anymore, because the properties got removed/refactored.
find . \( -name "*.cpp" -or -name "*.h" -or -name "*.c" -or -name "*.cc" \) -exec clang-format -i {} \;
If you reached this file doing a git blame, please see README.clang-format (added 2 commits in the future of this one)
It happens that sometimes the hyphen is actually "part of the word", like
in one-third, so if there's one- at the end of a line
and third at the beginning of the next, we should still match and not
force the user to type onethird, even though we will also match onethird since
there's no way to know if "hyphen at end of line" is supposed to be part
of the word or not
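A minimal sketch of the matching rule described above, assuming whole-word comparison on plain std::string; matchesHyphenatedBreak is a hypothetical helper and not the function changed here.

```cpp
#include <cassert>
#include <string>

// Returns true when `query` equals the word formed by `lineEnd` (the last
// fragment of a line, which must end in '-') and `nextStart` (the first
// fragment of the next line), either keeping or dropping the hyphen.
inline bool matchesHyphenatedBreak(const std::string &lineEnd, const std::string &nextStart, const std::string &query)
{
    assert(!lineEnd.empty() && lineEnd.back() == '-');

    // Candidate 1: the hyphen is part of the word, e.g. "one-third".
    const std::string withHyphen = lineEnd + nextStart;
    // Candidate 2: the hyphen only marks the line break, e.g. "onethird".
    const std::string withoutHyphen = lineEnd.substr(0, lineEnd.size() - 1) + nextStart;

    return query == withHyphen || query == withoutHyphen;
}
```

Accepting both candidates avoids guessing the author's intent at the cost of one extra comparison per line-ending hyphen.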
BUGS: 418520
The code compiles and okular seems to load and work as before; all unit
tests pass except parttest and epubgeneratortest (but they fail on master
too).
Summary:
This adds some important documentation on TextEntity and other classes, and improves some of the existing documentation.
This includes changing parameter names from ‘rect’ to ‘area’, because I found ‘rect’ misleading.
Test Plan: Run doxygen
Reviewers: #okular, aacid
Reviewed By: #okular, aacid
Subscribers: aacid, yurchor, okular-devel
Tags: #okular
Differential Revision: https://phabricator.kde.org/D21271
text page was storing a pointer to a PagePrivate pointer but those die
after saving so we need to store a Page pointer since those are stable.
BUGS: 387247
It was a simple bug in the XY Cut layout recognition code that made it too eager to see columns everywhere.
Also removed the dependence of the layout analysis algorithms on the display DPI (introduced by the recently added feature of using KScreen) to make their behavior more predictable and reproducible.
BUGS: 326207
BUGS: 331090
FIXED-IN: 4.13.0
REVIEW: 115759
Having functions which are defined but not used serves no purpose. This patch
therefore removes the extra method and updates the comment reference in the
second one to make it standalone.
REVIEW: 114959
Also simplified the code a bit by removing unnecessary calls to toLower in TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward.
I also fixed a small bug: the letter capital I with dot above (U+0130) did not match itself in case-insensitive mode on Qt 4.8.4. U+0130 still does not match lowercase i (U+0069), which could be considered another bug that I didn't fix, although this behavior conforms to the Unicode case folding rules.
(I did not implement the Knuth-Morris-Pratt algorithm that I promised in a comment of Bug 323263 because on second thought I find that the win, if any, would probably be negligible except for some very special documents and special query strings.)
BUGS: 323262
BUGS: 323263
REVIEW: 112135
When searching backwards, end is not actually words.end but words.begin (since the loop goes backwards), hence we can't pass end to stringLengthAdaptedWithHyphen.
I've now renamed end to loop_end to make it a bit more clear.
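A sketch of the pitfall, assuming a plain std::vector of words: when walking backwards the loop terminates at begin(), but any "is there a next word?" lookahead still has to compare against the container's real end(). Only loop_end comes from this change; lastHyphenatedBreak and real_end are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Walks `words` from back to front and returns the index of the last word
// that ends in '-' and is followed by another word, or -1 if there is none.
inline int lastHyphenatedBreak(const std::vector<std::string> &words)
{
    const auto loop_end = words.begin(); // where the backward loop stops
    const auto real_end = words.end();   // needed for the lookahead check

    auto it = words.end();
    while (it != loop_end) {
        --it;
        const auto next = it + 1;
        // Comparing `next` against loop_end here would be the bug:
        // the lookahead bound is the container end, not the loop bound.
        if (!it->empty() && it->back() == '-' && next != real_end) {
            return static_cast<int>(it - words.begin());
        }
    }
    return -1;
}
```

Keeping the two bounds in separately named variables makes it harder to pass the loop bound where the container end is expected.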
BUGS: 309030
FIXED-IN: 4.9.4