Commit graph

86 commits

Author SHA1 Message Date
Kemal Zebari b6b4e59bf7 AK: Implement URL::serialize_path() to spec
This commit also reverts db5ad0c since code outside of the web spec
expects serialized paths to be percent decoded.

Also, there are issues trying to implement the concept "opaque
path". For now, we still use the old cannot_be_a_base_url(), but its
usage needs to be removed in favor of a has_opaque_path() as the spec
has changed since then.
2023-09-15 11:15:43 -06:00
Shannon Booth 23e82114b4 AK: Do not consider port of 0 as a null port
This fixes an issue where if a port number of 0 was given for a non
special scheme the port number was being dropped.
2023-08-31 11:02:18 +02:00
Shannon Booth 582784d0cf AK: Remove unused ApplyPercentDecoding enum from URL class 2023-08-13 15:03:53 -06:00
Shannon Booth 9d60f23abc AK: Port URL::m_fragment from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Shannon Booth 5663a2d3b4 AK+LibWeb: Do not percent encode/decode in URL fragment setter/getters
The web specs do not expect decoding or decoding to happen when calling
these helpers. This allows us to remove the raw_fragment helper function
from the URL class.
2023-08-13 15:03:53 -06:00
Shannon Booth c25485700a AK: Port URL scheme from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Shannon Booth 21fe86d235 AK: Port URL::m_query from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Shannon Booth 55a01e72ca AK: Port URL username/password from DeprecatedString to String
And for cases that just need to check whether the password/username is
empty, add a raw_{password,username} helper to avoid any allocation.
2023-08-13 15:03:53 -06:00
Shannon Booth d2fe657879 AK: Remove unused URL::m_path member 2023-08-13 15:03:53 -06:00
Sam Atkins 28aa4ca767 AK: Add URL::raw_fragment()
This is a little hackish. Part of the algorithm to get the indicated
part of a DOM::Document
https://html.spec.whatwg.org/multipage/browsing-the-web.html#the-indicated-part-of-the-document
wants to first get the URL's fragment, use it, and then later
percent-decode it and use that. So, we need a way to get that
un-decoded fragment.
2023-08-12 08:39:04 +02:00
Shannon Booth db5ad0c2b0 AK: Remove ApplyPercentDecoding from URL
Nowhere was setting this flag from the default.
2023-08-06 08:57:23 +02:00
Shannon Booth 98666b012d AK: Remove URL::ApplyPercentEncoding
Everywhere only ever expects percent encoding to occur, so let's just
remove this flag altogether. At the same time, replace some
DeprecatedString with StringView.
2023-08-06 08:57:23 +02:00
Shannon Booth c4d7be100e AK: Directly append URL paths where applicable
This is a little closer to the spec text, and helps us avoid using
the ApplyPercentEncoding flag.
2023-08-06 08:57:23 +02:00
Karol Kosek eb41f0144b AK: Decode data URLs to separate class (and parse like every other URL)
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.

Because parsing 'data:' didn't use standard fields, running the
following JS code:

    new URL('#a', 'data:text/plain,hello').toString()

not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.

With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
2023-08-01 14:19:05 +02:00
Shannon Booth 8751be09f9 AK: Serialize URL hosts with 'concept-host-serializer'
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.

Making that change resulted in a whole bunch of fallout.

After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
2023-07-31 05:18:51 +02:00
Shannon Booth a1ae701a7d AK: Move URL::cannot_have_a_username_or_password_or_port out of line
This doesn't seem trivial enough to be defining in the header like this,
and should not be a performance critical function anyhow.

Also add spec comments while we are at it, and a FIXME since we do not
seem to exactly align.
2023-07-31 05:18:51 +02:00
Shannon Booth 0c0117fc86 AK: Add typdefs for host URL definitions
And use them where applicable. This will allow us to store the host in
the deserialized format as the spec specifies.

Ideally these typdefs would instead be the existing AK interfaces, but
in the meantime, we can just use this.
2023-07-31 05:18:51 +02:00
Shannon Booth 7b3902e3d5 AK: Remove unused URL::scheme_requires_port
THis function does not seem to be used anywhere, and I cannot find any
spec equivalent for this function.
2023-07-25 06:43:50 -04:00
Shannon Booth 50359567e0 AK: Add spec comments for URL spec defined member variables 2023-07-24 17:07:16 -04:00
Timothy Flynn c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Kenneth Myhra c03c0ec900 AK: Add URL::to_string() returning a String
For convenience wire up a to_string() method which returns a new String.
2023-06-17 20:38:20 +02:00
MacDue 5db1eb9961 AK+Everywhere: Replace URL::paths() with path_segment_at_index()
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
2023-04-15 06:37:04 +02:00
MacDue 35612c6a7f AK+Everywhere: Change URL::path() to serialize_path()
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.

The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
2023-04-15 06:37:04 +02:00
MacDue 5acd40c525 AK+Everywhere: Add ApplyPercentDecoding option to URL getters
The defaults selected for this are based on the behaviour of URL
when it applied percent decoding during parsing. This does mean now
in some cases the getters will allocate, but percent_decode() checks
if there's anything to decode first, so in many cases still won't.
2023-04-15 06:37:04 +02:00
MacDue b9e03071cc AK: Remove unnecessary parameter names in URL.h 2023-04-15 06:37:04 +02:00
MacDue 8283e8b88c AK: Don't store parts of URLs percent decoded
As noted in serval comments doing this goes against the WC3 spec,
and breaks parsing then re-serializing URLs that contain percent
encoded data, that was not encoded using the same character set as
the serializer.

For example, previously if you had a URL like:

https:://foo.com/what%2F%2F (the path is what + '//' percent encoded)

Creating URL("https:://foo.com/what%2F%2F").serialize() would return:

https://foo.com/what//

Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.

Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).

Seems to fix #13477 too
2023-04-12 07:40:22 +02:00
Kenneth Myhra 843c9d6cd7 AK: Add new String constructor to URL 2023-03-01 22:44:20 +00:00
Sam Atkins abc01cc9fe AK+Tests+LibWeb: Make URL::complete_url() take a StringView
All it does is pass this to `URLParser::parse()` which takes a
StringView, so we might as well take one here too.
2023-02-15 12:48:26 -05:00
Linus Groh 57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Nico Weber daeaefad17 Everywhere: Clean up "the the" comment typos 2022-11-03 17:38:32 +00:00
Dennis Bonke ccb5151291 AK: Add support for mlibc in URL 2022-11-02 22:19:12 -06:00
networkException 4230dbbb21 AK+Everywhere: Replace "protocol" with "scheme" url helpers
URL had properly named replacements for protocol(), set_protocol() and
create_with_file_protocol() already. This patch removes these function
and updates all call sites to use the functions named according to the
specification.

See https://url.spec.whatwg.org/#concept-url-scheme
2022-09-29 09:39:04 +01:00
sin-ack 604aac531c AK+Userland+Tests: Remove URL(char const*) constructor
The StringView(char const*) constructor is being removed, and there was
only a few users of this left, which are also cleaned up in this commit.
2022-07-12 23:11:35 +02:00
ForLoveOfCats a7fe3183f5 AK: Add URL::create_with_help_scheme helper function 2022-04-21 09:12:37 +04:30
Andreas Kling 79c77debb0 AK: Don't destructively re-encode query strings in the URL parser
We were decoding and then re-encoding the query string in URLs.
This round-trip caused us to lose information about plus ('+')
ASCII characters encoded as "%2B".
2022-04-10 01:37:45 +02:00
Andreas Kling 3724ce765e AK+LibWeb: Encode ' ' as '+' in application/x-www-form-urlencoded
This matches what the URL and HTML specifications ask us to do.
2022-04-10 01:37:45 +02:00
GeekFiftyFive 832920c003 AK+LibHTTP: Revert prior change to percent encode plus signs
A change was made prior to percent encode plus signs in order to fix an
issue with the Google cookie consent page.

Unforunately, this was treating a symptom of a problem and not the root
cause and is incorrect behavior.
2022-04-08 20:44:49 +02:00
GeekFiftyFive 737f5b26b7 AK+LibHTTP: Ensure plus signs are percent encoded in query string
Adds a new optional parameter 'reserved_chars' to
AK::URL::percent_encode. This new optional parameter allows the caller
to specify custom characters to be percent encoded. This is then used
to percent encode plus signs by HttpRequest::to_raw_request.
2022-04-02 18:43:15 +02:00
Andreas Kling 216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling 5f7d008791 AK+Everywhere: Stop including Vector.h from StringView.h
Preparation for using Error.h from Vector.h. This required moving some
things out of line.
2021-11-10 21:58:58 +01:00
Idan Horowitz 30849b10d5 AK: Move the path argument of URL::append_path instead of copying it 2021-09-14 00:14:45 +02:00
Idan Horowitz d6cfa34667 AK: Make URL::m_port an Optional<u16>, Expose raw port getter
Our current way of signalling a missing port with m_port == 0 was
lacking, as 0 is a valid port number in URLs.
2021-09-14 00:14:45 +02:00
Idan Horowitz 1c9c43785d AK: Add URL::cannot_have_a_username_or_password_or_port
As defined by the URL specification:
https://url.spec.whatwg.org/#cannot-have-a-username-password-port
2021-09-14 00:14:45 +02:00
Idan Horowitz 929af64a67 AK: Change URL::cannot_be_a_base_url, URL::is_valid return type to bool
There's no need to return a const reference (8 bytes) when the value is
always used as a temporary bool (1 byte).
2021-09-14 00:14:45 +02:00
Idan Horowitz 6fa4fc8353 AK: Add URL::serialize_origin based on HTML's origin definition 2021-09-14 00:14:45 +02:00
Max Wipfli 2e23954271 AK: Move identity check from URL::operator==() to equals() 2021-06-01 12:23:16 +02:00
Max Wipfli 33396494f6 AK+LibWeb: Remove URL::to_string_encoded()
This replaces URL::to_string_encoded() with to_string() and removes the
former, since they are now equivalent.
2021-06-01 12:23:16 +02:00
Max Wipfli a9114be1b8 AK: Use correct constness in URL class methods
This changes the URL class to use the correct constness for getters,
setters and other methods. It also changes the entire class to use east
const style.
2021-06-01 12:23:16 +02:00