Class HttpUrl
- java.lang.Object
  - com.squareup.okhttp.HttpUrl
public final class HttpUrl extends Object
A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:

    HttpUrl url = new HttpUrl.Builder()
        .scheme("https")
        .host("www.google.com")
        .addPathSegment("search")
        .addQueryParameter("q", "polar bears")
        .build();
    System.out.println(url);

which prints:

    https://www.google.com/search?q=polar%20bears

As another example, this code prints the human-readable query parameters of a Twitter search:

    HttpUrl url = HttpUrl.parse("https://twitter.com/search?q=cute%20%23puppies&f=images");
    for (int i = 0, size = url.querySize(); i < size; i++) {
      System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
    }

which prints:

    q: cute #puppies
    f: images

In addition to composing URLs from their component parts and decomposing URLs into their component parts, this class implements relative URL resolution: what address you'd reach by clicking a relative link on a specified page. For example:

    HttpUrl base = HttpUrl.parse("https://www.youtube.com/user/WatchTheDaily/videos");
    HttpUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
    System.out.println(link);

which prints:

    https://www.youtube.com/watch?v=cbP2N1BQdYc

What's in a URL?
A URL has several components.

Scheme
Sometimes referred to as protocol, a URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.

Username and Password
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components is popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.

Host
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.

Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.

Port
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.

Path
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486". Each path segment is prefixed with "/". This class offers methods to compose and decompose paths by segment. If a path's last segment is the empty string, then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/", which is a path whose only segment is the empty string.

Query
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as a single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.

Fragment
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query, the fragment is not sent to the webserver: it's private to the client.

Encoding
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.

Percent encoding
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.

Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise they could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.
    HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
        .addPathSegment("_Who?_")
        .query("_Who?_")
        .fragment("_Who?_")
        .build();
    System.out.println(url);

This prints:

    http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_

When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.

IDNA Mapping and Punycode encoding
Hostnames have different requirements and use a different encoding scheme, which consists of IDNA mapping and Punycode encoding.

In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is a similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.

Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.

Why another URL model?
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.

Different URLs should be different
Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:
- http://square.github.io/
- http://google.github.io/

This is because java.net.URL compares hosts by the IP addresses they resolve to, which makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient, because equality may require a DNS lookup, and incorrect, because unequal URLs may compare equal depending on how they are hosted.

Equal URLs should be equal
These two URLs are semantically identical, but java.net.URI disagrees:
- http://host:80/
- http://host

Both the explicit default port (:80) and the absent trailing slash (/) cause URI to bucket the two URLs separately. This harms URI's usefulness in collections. Any application that stores information per URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.

Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.
    String attack = "http://example.com/static/images/../../../../../etc/passwd";
    System.out.println(new URL(attack).getPath());
    System.out.println(new URI(attack).getPath());
    System.out.println(HttpUrl.parse(attack).encodedPath());

This prints:

    /static/images/../../../../../etc/passwd
    /static/images/../../../../../etc/passwd
    /etc/passwd

By not canonicalizing the input paths, java.net.URL and java.net.URI are complicit in directory traversal attacks. Code that checks only the path prefix may suffer!

If it works on the web, it should work in your application
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind of behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.

Paths and Queries should decompose
Neither of the built-in URL models offers direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, this class saves application developers from the hassles of encoding and decoding.

Plus a modern API
The URL (JDK 1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.

Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!

This class has a modern API. It avoids punitive checked exceptions: parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.
Nested Class Summary
Modifier and Type   Class
static class        HttpUrl.Builder
-
Method Summary
Each entry lists the modifier and type, the method, and (indented) its description.

static int defaultPort(String scheme)
    Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise.
String encodedFragment()
String encodedPassword()
    Returns the password, or an empty string if none is set.
String encodedPath()
    Returns the entire path of this URL, encoded for use in HTTP resource resolution.
List<String> encodedPathSegments()
String encodedQuery()
    Returns the query of this URL, encoded for use in HTTP resource resolution.
String encodedUsername()
    Returns the username, or an empty string if none is set.
boolean equals(Object o)
    Compares this instance with the specified object and indicates if they are equal.
String fragment()
static HttpUrl get(URI uri)
static HttpUrl get(URL url)
int hashCode()
    Returns an integer hash code for this object.
String host()
    Returns the host address suitable for use with InetAddress.getAllByName(String).
boolean isHttps()
HttpUrl.Builder newBuilder()
static HttpUrl parse(String url)
    Returns a new HttpUrl representing url if it is a well-formed HTTP or HTTPS URL, or null if it isn't.
String password()
    Returns the decoded password, or an empty string if none is present.
List<String> pathSegments()
int pathSize()
int port()
    Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme.
String query()
String queryParameter(String name)
    Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter.
String queryParameterName(int index)
Set<String> queryParameterNames()
String queryParameterValue(int index)
List<String> queryParameterValues(String name)
int querySize()
HttpUrl resolve(String link)
    Returns the URL that would be retrieved by following link from this URL.
String scheme()
    Returns either "http" or "https".
String toString()
    Returns a string containing a concise, human-readable description of this object.
URI uri()
    Attempt to convert this URL to a java.net.URI.
URL url()
    Returns this URL as a java.net.URL.
String username()
-
-
-
Method Detail
-
url
public URL url()
Returns this URL as a java.net.URL.
-
uri
public URI uri()
Attempt to convert this URL to a java.net.URI. This method throws an unchecked IllegalStateException if the URL it holds isn't valid by URI's overly-stringent standard. For example, URI rejects paths containing the '[' character. Consult that class for the exact rules of what URLs are permitted.
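To see which inputs java.net.URI itself refuses, the following minimal sketch probes the JDK class directly. It does not call HttpUrl; the class name UriStrictnessSketch and the helper acceptedByUri are hypothetical names introduced only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriStrictnessSketch {
  /** Returns true if java.net.URI parses the string, false if it throws. */
  static boolean acceptedByUri(String url) {
    try {
      new URI(url);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // '|' and '[' are not legal path characters for java.net.URI.
    System.out.println(acceptedByUri("http://example.com/abc|def"));
    System.out.println(acceptedByUri("http://example.com/a[b"));
    // The percent-encoded form of '|' is accepted.
    System.out.println(acceptedByUri("http://example.com/abc%7Cdef"));
  }
}
```

A URL whose string form fails this kind of check is presumably one for which uri() would raise the IllegalStateException described above.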
-
scheme
public String scheme()
Returns either "http" or "https".
-
isHttps
public boolean isHttps()
-
encodedUsername
public String encodedUsername()
Returns the username, or an empty string if none is set.
-
username
public String username()
-
encodedPassword
public String encodedPassword()
Returns the password, or an empty string if none is set.
-
password
public String password()
Returns the decoded password, or an empty string if none is present.
-
host
public String host()
Returns the host address suitable for use with InetAddress.getAllByName(String). May be:
- A regular host name, like android.com.
- An IPv4 address, like 127.0.0.1.
- An IPv6 address, like ::1. Note that there are no square braces.
- An encoded IDN, like xn--n3h.net.
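The encoded IDN form can be illustrated with the JDK's own java.net.IDN class. This is only a sketch of the Unicode-to-ASCII conversion; it is an assumption that HttpUrl produces the same result for a given name, and IdnSketch and its two helpers are hypothetical names.

```java
import java.net.IDN;

final class IdnSketch {
  /** Converts a Unicode host name to its ASCII (Punycode) form using the JDK. */
  static String toAsciiHost(String unicodeHost) {
    return IDN.toASCII(unicodeHost);
  }

  /** Converts an ASCII (Punycode) host name back to Unicode using the JDK. */
  static String toUnicodeHost(String asciiHost) {
    return IDN.toUnicode(asciiHost);
  }

  public static void main(String[] args) {
    // "\u03C3" is the Greek small letter sigma used in the overview's example.
    System.out.println(toAsciiHost("\u03C3")); // xn--4xa
    System.out.println(toUnicodeHost("xn--4xa").equals("\u03C3")); // true
  }
}
```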
-
port
public int port()
Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme. For example, this returns 8443 for https://square.com:8443/ and 443 for https://square.com/. The result is in [1..65535].
-
defaultPort
public static int defaultPort(String scheme)
Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise.
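The documented contracts of port() and defaultPort() can be approximated on top of java.net.URI, whose getPort() returns -1 when no port is written in the URL. The sketch below is a re-implementation from the descriptions above, not the library's source; PortSketch and effectivePort are hypothetical names.

```java
import java.net.URI;

final class PortSketch {
  /** Mirrors the documented contract: 80 for http, 443 for https, -1 otherwise. */
  static int defaultPort(String scheme) {
    if ("http".equals(scheme)) return 80;
    if ("https".equals(scheme)) return 443;
    return -1;
  }

  /** Returns the explicit port if present, else the scheme's default. */
  static int effectivePort(URI uri) {
    int explicit = uri.getPort(); // -1 when the URL has no explicit port
    return explicit != -1 ? explicit : defaultPort(uri.getScheme());
  }

  public static void main(String[] args) {
    System.out.println(effectivePort(URI.create("https://square.com:8443/"))); // 8443
    System.out.println(effectivePort(URI.create("https://square.com/")));      // 443
  }
}
```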
-
pathSize
public int pathSize()
-
encodedPath
public String encodedPath()
Returns the entire path of this URL, encoded for use in HTTP resource resolution. The returned path is always nonempty and is prefixed with /.
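As a rough illustration of what "encoded" means for a single path segment, the sketch below percent-encodes every UTF-8 byte that is not an ASCII letter, digit, or one of "-._~". That character set is an assumption chosen for simplicity: the overview notes that the set of characters needing encoding differs per component, and HttpUrl's actual sets are likely less aggressive. PercentEncodingSketch and percentEncode are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodingSketch {
  /** True for ASCII letters, digits, and the characters - . _ ~ */
  private static boolean isUnreserved(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
  }

  /** Replaces every other byte of the UTF-8 form with %XX (uppercase hex). */
  static String percentEncode(String input) {
    StringBuilder out = new StringBuilder();
    for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
      int unsigned = b & 0xff;
      if (isUnreserved(unsigned)) {
        out.append((char) unsigned);
      } else {
        out.append(String.format("%%%02X", unsigned));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(percentEncode("_Who?_"));        // _Who%3F_
    System.out.println(percentEncode("\uD83C\uDF69"));  // %F0%9F%8D%A9 (the doughnut emoji)
  }
}
```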
-
encodedQuery
public String encodedQuery()
Returns the query of this URL, encoded for use in HTTP resource resolution. The returned string may be null (for URLs with no query), empty (for URLs with an empty query) or non-empty (all other URLs).
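The three states (null, empty, non-empty) can be observed with java.net.URI, which makes the same distinction in getRawQuery(). This sketch uses the JDK class only as an analogy for the behavior described above; QueryStateSketch and describeQuery are hypothetical names.

```java
import java.net.URI;

final class QueryStateSketch {
  /** Classifies the raw query of a URL as "null", "empty", or "non-empty". */
  static String describeQuery(String url) {
    String query = URI.create(url).getRawQuery();
    if (query == null) return "null";
    if (query.isEmpty()) return "empty";
    return "non-empty";
  }

  public static void main(String[] args) {
    System.out.println(describeQuery("http://host/"));     // null: no '?' at all
    System.out.println(describeQuery("http://host/?"));    // empty: '?' with nothing after it
    System.out.println(describeQuery("http://host/?a=b")); // non-empty
  }
}
```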
-
query
public String query()
-
querySize
public int querySize()
-
queryParameter
public String queryParameter(String name)
Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter.
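The lookup described here can be sketched by hand over an encoded query string. This is a simplified stand-in, not the library's implementation: it relies on java.net.URLDecoder, which also turns '+' into a space (an assumption that may not match HttpUrl), and it returns null both for a missing name and for a name without a value. QueryParameterSketch and firstQueryParameter are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryParameterSketch {
  /** Returns the decoded value of the first parameter called name, or null. */
  static String firstQueryParameter(String encodedQuery, String name) {
    if (encodedQuery == null) return null;
    for (String pair : encodedQuery.split("&")) {
      int equals = pair.indexOf('=');
      String rawName = equals == -1 ? pair : pair.substring(0, equals);
      if (decode(rawName).equals(name)) {
        // A name without '=' has no value; this sketch reports that as null.
        return equals == -1 ? null : decode(pair.substring(equals + 1));
      }
    }
    return null;
  }

  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e); // UTF-8 is always supported
    }
  }

  public static void main(String[] args) {
    String query = "q=cute%20%23puppies&f=images";
    System.out.println(firstQueryParameter(query, "q")); // cute #puppies
    System.out.println(firstQueryParameter(query, "x")); // null
  }
}
```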
-
queryParameterName
public String queryParameterName(int index)
-
queryParameterValue
public String queryParameterValue(int index)
-
encodedFragment
public String encodedFragment()
-
fragment
public String fragment()
-
resolve
public HttpUrl resolve(String link)
Returns the URL that would be retrieved by following link from this URL.
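Relative resolution of this kind also exists in the JDK as java.net.URI.resolve(String). The sketch below reproduces the YouTube example from the class overview with the JDK class; it is an analogy only, and the two implementations may differ on edge cases. ResolveSketch and its resolve helper are hypothetical names.

```java
import java.net.URI;

final class ResolveSketch {
  /** Resolves link against base using java.net.URI and returns the result as a string. */
  static String resolve(String base, String link) {
    return URI.create(base).resolve(link).toString();
  }

  public static void main(String[] args) {
    // Each "../" climbs one level from the directory that contains "videos".
    System.out.println(resolve(
        "https://www.youtube.com/user/WatchTheDaily/videos",
        "../../watch?v=cbP2N1BQdYc"));
  }
}
```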
-
newBuilder
public HttpUrl.Builder newBuilder()
-
parse
public static HttpUrl parse(String url)
Returns a new HttpUrl representing url if it is a well-formed HTTP or HTTPS URL, or null if it isn't.
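The null-returning style can be sketched around java.net.URI, whose constructor otherwise throws a checked URISyntaxException. Note the assumption: this wrapper inherits URI's strictness, so it rejects inputs (such as a path containing '|') that the overview says HttpUrl accepts. NullableParseSketch and parseHttpOrNull are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class NullableParseSketch {
  /** Returns a URI for an http or https URL with a host, or null for anything else. */
  static URI parseHttpOrNull(String url) {
    try {
      URI uri = new URI(url);
      String scheme = uri.getScheme();
      boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
      return http && uri.getHost() != null ? uri : null;
    } catch (URISyntaxException e) {
      return null; // the caller checks for null instead of catching
    }
  }

  public static void main(String[] args) {
    System.out.println(parseHttpOrNull("https://square.com/") != null);
    System.out.println(parseHttpOrNull("mailto:user@example.com") == null);
    System.out.println(parseHttpOrNull("http:/") == null); // scheme and path but no host
  }
}
```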
-
equals
public boolean equals(Object o)
Description copied from class: Object
Compares this instance with the specified object and indicates if they are equal. In order to be equal, o must represent the same object as this instance using a class-specific comparison. The general contract is that this comparison should be reflexive, symmetric, and transitive. Also, no object reference other than null is equal to null.
The default implementation returns true only if this == o. See Writing a correct equals method if you intend implementing your own equals method.
The general contract for the equals and Object.hashCode() methods is that if equals returns true for any two objects, then hashCode() must return the same value for these objects. This means that subclasses of Object usually override either both methods or neither of them.
Overrides: equals in class Object
Parameters: o - the object to compare this instance with.
Returns: true if the specified object is equal to this Object; false otherwise.
See Also: Object.hashCode()
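The class overview argues that semantically identical URLs should compare equal. The sketch below shows the java.net.URI behavior it criticizes, plus the kind of manual canonicalization an application would otherwise need. The canonical form here is a simplified assumption (it ignores query, fragment, and user info) and is not HttpUrl's algorithm; UriEqualitySketch, uriEquals, and canonicalKey are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;

final class UriEqualitySketch {
  /** Compares two URLs using java.net.URI's own equals(). */
  static boolean uriEquals(String a, String b) {
    return URI.create(a).equals(URI.create(b));
  }

  /** Builds a key with an explicit port and a non-empty path, for http and https only. */
  static String canonicalKey(String url) {
    URI uri = URI.create(url);
    String scheme = uri.getScheme().toLowerCase(Locale.US);
    int port = uri.getPort() != -1 ? uri.getPort() : ("https".equals(scheme) ? 443 : 80);
    String rawPath = uri.getRawPath();
    String path = rawPath == null || rawPath.isEmpty() ? "/" : rawPath;
    return scheme + "://" + uri.getHost().toLowerCase(Locale.US) + ":" + port + path;
  }

  public static void main(String[] args) {
    System.out.println(uriEquals("http://host:80/", "http://host")); // false
    System.out.println(canonicalKey("http://host:80/"));             // http://host:80/
    System.out.println(canonicalKey("http://host"));                 // http://host:80/
  }
}
```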
-
hashCode
public int hashCode()
Description copied from class: Object
Returns an integer hash code for this object. By contract, any two objects for which Object.equals(java.lang.Object) returns true must return the same hash code value. This means that subclasses of Object usually override both methods or neither method.
Note that hash values must not change over time unless information used in equals comparisons also changes.
See Writing a correct hashCode method if you intend implementing your own hashCode method.
Overrides: hashCode in class Object
Returns: this object's hash code.
See Also: Object.equals(java.lang.Object)
-
toString
public String toString()
Description copied from class: Object
Returns a string containing a concise, human-readable description of this object. Subclasses are encouraged to override this method and provide an implementation that takes into account the object's type and data. The default implementation is equivalent to the following expression:

    getClass().getName() + '@' + Integer.toHexString(hashCode())

See Writing a useful toString method if you intend implementing your own toString method.