Commit graph

70 commits

Author SHA1 Message Date
Alec Murphy da4067a75f AK: Remove tilde from URL::PercentEncodeSet::EncodeURI 2022-12-26 04:51:55 +03:30
Linus Groh 57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Ben Wiederhake 3aeb57ed09 AK+Everywhere: Fix data corruption due to code-point-to-char conversion
In particular, StringView::contains(char) is often used with a u32
code point. When this is done, the compiler will for some reason allow
data corruption to occur silently.

In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.

In the particular case of URLParser, AK::is_url_code_point got confused:
    return /* ... */ || "!$&'()*+,-./:;=?@_~"sv.contains(code_point);
If code_point is a large code point that happens to have the correct
lower bytes, AK::is_url_code_point is then convinced that the given
code point is okay, even if it is actually problematic.

This commit fixes *only* the silent data corruption due to the erroneous
conversion, and does not fully resolve OSS-Fuzz#49184.
2022-10-09 10:37:20 -06:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Karol Kosek 65afa113e5 AK: Make URL ApplicationXWWWFormUrlencoded encoding closer to spec
It was mostly implemented based on a spec note, that described only
allowed characters, but instead of allowing some special characters not
to be escaped, we escaped every special character except those 'new in
this encode set' disallowed characters from the spec definition.
2022-06-10 22:32:29 +01:00
Karol Kosek 9e69a89f8e AK: Append correct number of port characters when serializing a URL
Instead of formatting a port string, it put bytes from stack, using the
port number as a length (so for port 8000 it appended 8000 bytes).
2022-06-10 22:32:29 +01:00
ForLoveOfCats a7fe3183f5 AK: Add URL::create_with_help_scheme helper function 2022-04-21 09:12:37 +04:30
Andreas Kling 79c77debb0 AK: Don't destructively re-encode query strings in the URL parser
We were decoding and then re-encoding the query string in URLs.
This round-trip caused us to lose information about plus ('+')
ASCII characters encoded as "%2B".
2022-04-10 01:37:45 +02:00
Andreas Kling 3724ce765e AK+LibWeb: Encode ' ' as '+' in application/x-www-form-urlencoded
This matches what the URL and HTML specifications ask us to do.
2022-04-10 01:37:45 +02:00
GeekFiftyFive 832920c003 AK+LibHTTP: Revert prior change to percent encode plus signs
A change was made prior to percent encode plus signs in order to fix an
issue with the Google cookie consent page.

Unforunately, this was treating a symptom of a problem and not the root
cause and is incorrect behavior.
2022-04-08 20:44:49 +02:00
GeekFiftyFive 737f5b26b7 AK+LibHTTP: Ensure plus signs are percent encoded in query string
Adds a new optional parameter 'reserved_chars' to
AK::URL::percent_encode. This new optional parameter allows the caller
to specify custom characters to be percent encoded. This is then used
to percent encode plus signs by HttpRequest::to_raw_request.
2022-04-02 18:43:15 +02:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Idan Horowitz d6cfa34667 AK: Make URL::m_port an Optional<u16>, Expose raw port getter
Our current way of signalling a missing port with m_port == 0 was
lacking, as 0 is a valid port number in URLs.
2021-09-14 00:14:45 +02:00
Idan Horowitz 55b67ba7a7 AK: Accept optional url and state override parameters in URLParser
These are required in the specification and used by the web's URL
built-in, this commit also removes the Badge<AK::URL> from URLParser
to allow other classes that need to call the parser directly like the
web's URL built-in to do so.
2021-09-14 00:14:45 +02:00
Idan Horowitz 6fa4fc8353 AK: Add URL::serialize_origin based on HTML's origin definition 2021-09-14 00:14:45 +02:00
Max Wipfli 9b8f35259c AK: Remove the LexicalPath::is_valid() API
Since this is always set to true on the non-default constructor and
subsequently never modified, it is somewhat pointless. Furthermore,
there are arguably no invalid relative paths.
2021-06-30 11:13:54 +02:00
Max Wipfli bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Max Wipfli 2e23954271 AK: Move identity check from URL::operator==() to equals() 2021-06-01 12:23:16 +02:00
Max Wipfli a9114be1b8 AK: Use correct constness in URL class methods
This changes the URL class to use the correct constness for getters,
setters and other methods. It also changes the entire class to use east
const style.
2021-06-01 12:23:16 +02:00
Max Wipfli 5caaa52bee AK: Add hostname parameter to URL::create_with_file_scheme()
This adds a hostname parameter as the third parameter to
URL::create_with_file_scheme(). If the hostname is "localhost", it will
be ignored (as per the URL specification).

This can for example be used by ls(1) to create more conforming file
URLs.
2021-06-01 09:28:05 +02:00
Max Wipfli ebf074ca29 AK: Rewrite URL::compute_validity() to conform to new parser
This rewrites the URL validation check to be more specific, so it can
more accurately detect if a user of URL class constructs invalid URLs by
hand.
2021-06-01 09:28:05 +02:00
Max Wipfli 522ef53b98 AK: Remove deprecated m_path member variable from URL
The m_path member variable has been superseded by m_paths. Thus, it has
been removed. The path() getter will continue to exist as a convenience
method for getting the path joined together as a string.
2021-06-01 09:28:05 +02:00
Max Wipfli b7c6af0a04 AK: Replace URL::to_string() with new serialize() implementation 2021-06-01 09:28:05 +02:00
Max Wipfli 81f03e7a5d AK: Replace old URL parser with new URLParser::parse()
This replaces the old URL::parse() and URL::complete_url() parsing
mechanisms with the new spec-compliant URLParser::parse().
2021-06-01 09:28:05 +02:00
Max Wipfli 1697f3c35b AK: Add spec-compliant URL serialization methods
This adds URL serialization methods which are more in line with the
specification.

The serialize_for_display() method should be used e.g. in the browser
address bar, and as per the spec should not display username and
password. Furthermore, it could decode most percent-encoded code points,
although that is not implemented yet.
2021-06-01 09:28:05 +02:00
Max Wipfli 8a938a3e25 AK: Add helper functions and private data URL constructor to URL
This adds a few helper functions and a private constructor to
instantiate a data URL to the URL class. These will be needed by the
upcoming URL parser.
2021-06-01 09:28:05 +02:00
Max Wipfli dd392dfa03 AK: Add member variables to the URL class
This adds the m_username, m_password, m_paths and m_cannot_be_a_base_url
member variables to the URL class. These are necessary for the upcoming
new URL parser.

The deprecated m_path variable shadows the m_paths variable if it is
non-null. This behavior will be removed once the old URL parser has been
removed.
2021-06-01 09:28:05 +02:00
Max Wipfli a603e69599 AK+Everywhere: Replace usages of URLParser::urlencode() and urldecode()
This replaces all occurrences of those functions with the newly
implemented functions URL::percent_encode() and URL::percent_decode().
The old functions will be removed in a further commit.
2021-06-01 09:28:05 +02:00
Max Wipfli 2a6c9bc5f7 AK: Implement more conforming URL percent encode/decode mechanism
This adds a few new functions to percent encode/decode strings according
to the URL specification. The functions allow specifying a
PercentEncodeSet, which is defined by the specification. It will be used
to replace the current urlencode() and urldecode() functions in a
further commit.

This commit adds a few duplicate helper functions in the URL class, such
as is_digit() and is_ascii_digit(). This will be cleaned up as soon as
the upcoming new URL parser will replace the current one.
2021-06-01 09:28:05 +02:00
Max Wipfli 31f6ba0952 AK: Internally rename protocol to scheme in URL
This renames all references to protocol to scheme, which is the name
used by the URL standard (https://url.spec.whatwg.org/). Externally, all
methods referencing "protocol" were duplicated with "scheme". The old
methods still exist as compatibility.
2021-06-01 09:28:05 +02:00
Max Wipfli d6709ac87d AK: Omit unnecessary function parameter names in URL
This patch removes unnecessary function parameter names in declarations
of the URL class. It also changes parameter types from String to
StringView where applicable.
2021-06-01 09:28:05 +02:00
Brian Gianforcaro 1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
DexesTTP 8e44a0bd46 AK: Add default ports for Websockets to the URL class 2021-04-18 22:42:10 +02:00
speles 50de653cc9 AK: Add optional fragment parameter to create_with_file_protocol()
Now that we use fragment for specifying starting selection in
FileManager we would benefit from providing it as argument instead of
setting it each time separately.
2021-03-07 11:00:36 +01:00
xspager dd198e1a29 AK::URL: Fix setting the port number in the case it was the last element of the URL 2020-12-12 20:09:42 +01:00
Brendan Coles 3e0e84dcd1 AK::URL: Check if URL requires a port set to be considered a valid URL
`AK::URL` will now check if the URL requires a port to be set using
`AK::URL.protocol_requires_port(protocol)`.

If the URL does not specify a port, and no default port for the URL
protocol is found with `AK::URL.default_port_for_protocol(protocol)`,
the URL is considered to be invalid.
2020-11-04 19:34:00 +01:00
asynts 1d96d5eea4 AK: Use new format functions. 2020-10-08 09:59:55 +02:00
AnotherTest 5b5ba91335 AK: Add URL::create_with_data() to create data URLs 2020-08-24 18:21:33 +02:00
Andreas Kling fdfda6dec2 AK: Make string-to-number conversion helpers return Optional
Get rid of the weird old signature:

- int StringType::to_int(bool& ok) const

And replace it with sensible new signature:

- Optional<int> StringType::to_int() const
2020-06-12 21:28:55 +02:00
Andreas Kling ff6de8a169 AK: URL should urldecode data: URL payloads
Otherwise we can end up with percent-encoded nonsense in base64 data
which does not decode correctly.
2020-06-10 17:45:34 +02:00
Andreas Kling ec83555b87 AK: Don't try to complete relative data: URLs 2020-06-07 19:09:03 +02:00
Sergey Bugaev 602c3fdb3a AK: Rename FileSystemPath -> LexicalPath
And move canonicalized_path() to a static method on LexicalPath.

This is to make it clear that FileSystemPath/canonicalized_path() only
perform *lexical* canonicalization.
2020-05-26 14:35:10 +02:00
FalseHonesty b1e4f70173 AK: Fix URL::complete_url behaviour for when a fragment is passed
Previously, passing a fragment string ("#section3") to the complete_url
method would result in a URL that looked like
"file:///home/anon/www/#section3" which was obviously incorrect. Now the
result looks like "file:///home/anon/www/afrag.html#section3".
2020-05-23 11:53:25 +02:00
Conrad Pankoff b5b08fba92 AK: Make sure URL retains trailing slash if present in complete_url 2020-05-17 16:35:42 +02:00
Conrad Pankoff 4214680e76 AK: Set default port in URL to 1965 for gemini protocol 2020-05-17 12:41:38 +02:00
Linus Groh b193670967 AK: Handle "protocol relative URLs" in URL::complete_url() 2020-05-16 21:47:16 +02:00
Andreas Kling 56dbe58bbb AK: Add support for about: URLs 2020-05-10 11:11:48 +02:00
Andreas Kling 3422019e35 AK: Unbreak parsing of file:// URLs with no host
We should still accept file:/// in the URL parser. :^)
2020-05-09 16:47:05 +02:00
Andreas Kling 15601988a4 AK: Allow file:// URLs to have a hostname 2020-05-09 16:14:37 +02:00