URL Encoding Explained: Percent-Encoding, Spaces, and the Plus Sign Trap

You've seen it before. A user pastes a link into Slack, someone clicks it, and instead of landing on the right search results they get a 404 — or worse, a completely wrong page. You dig into the URL and spot the problem immediately: there's a raw space sitting in the middle of the query string, or a + where there should be %20, or vice versa. It looks like a small thing. It is absolutely not a small thing.

URL encoding is one of those topics that seems trivially simple until the moment it quietly breaks your application in production. This article is about understanding it deeply enough that it never catches you off guard again.

Why URLs Can't Just Have Spaces In Them

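The URL specification, originally RFC 1738 and later updated through RFC 3986, defines a very specific set of characters that are allowed to appear literally in a URL. Letters, digits, hyphens, underscores, dots, and tildes (the "unreserved" set) are always safe. Reserved characters such as /, ?, #, &, and = may appear literally only when they are doing their job as delimiters. Everything else, including a reserved character used as plain data, must be encoded.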

The reason isn't arbitrary. URLs are passed around as text across dozens of different systems: HTTP headers, HTML attributes, JavaScript strings, server logs, email clients, terminal windows. Each of these systems has its own opinions about what counts as a delimiter or a control character. A raw space in a URL breaks HTTP parsing because the HTTP request line uses spaces to separate the method, path, and protocol version. A raw ampersand in a query parameter value looks like the start of the next parameter. A raw hash character signals the browser to stop sending data to the server and treat the rest as a fragment identifier.

So the standard answer is percent-encoding: replace each problematic byte with a percent sign followed by that byte's two-digit hexadecimal value. A space becomes %20. An ampersand becomes %26. A hash becomes %23. Characters outside ASCII are first converted to UTF-8 and then encoded byte by byte, so é becomes %C3%A9. This is the universal, unambiguous solution defined in RFC 3986.
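To make that concrete, here is a minimal Python sketch built on the standard library's urllib.parse.quote(). The helper name percent_encode is made up for this illustration, and safe="" is passed so that only letters, digits, and - _ . ~ are left untouched. Non-ASCII input is encoded as its UTF-8 bytes, one %XX per byte.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters.

    Hypothetical helper for illustration; it simply wraps quote().
    """
    return quote(text, safe="")


print(percent_encode(" "))          # %20
print(percent_encode("&"))          # %26
print(percent_encode("#"))          # %23
print(percent_encode("é"))          # %C3%A9 (two UTF-8 bytes)
print(percent_encode("a-b_c.d~e"))  # unchanged: all unreserved
```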

The Plus Sign: Where Things Get Messy

Here's where developers get into trouble.

HTML forms that use method="GET" serialize their data using a format called application/x-www-form-urlencoded. This format predates RFC 3986 and has its own encoding rules. One of those rules says that spaces should be encoded as a plus sign (+), not as %20.

So when a user types "hello world" into a search box and submits a GET form, the browser constructs a URL like /search?q=hello+world. The server's form-parsing library sees that plus sign and decodes it back to a space. Everything works fine — as long as every layer in the chain understands the application/x-www-form-urlencoded convention.

The problem is that + is a completely valid literal character in a URL path or query string. It does not automatically mean "space" in all contexts. It only means space when decoded by a form-decoding function that specifically knows about the application/x-www-form-urlencoded convention.

Consider this scenario: you build an API endpoint that accepts a user's full name as a query parameter. A client sends /api/user?name=John+Smith. Your server-side code uses a generic URL decoder instead of a form decoder. It sees John+Smith and returns... John+Smith. The plus sign is left alone because it's a valid URL character. The actual name "John Smith" never arrives.
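In Python terms, the mismatch looks roughly like this. Here unquote() stands in for the "generic URL decoder" and unquote_plus() and parse_qs() for the form decoder; your own framework's functions will have different names, so treat this as a sketch of the behavior rather than of any particular server.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

raw_value = "John+Smith"  # the value as it appears in /api/user?name=John+Smith

# Generic percent-decoding: "+" is an ordinary character and is left alone.
generic = unquote(raw_value)

# Form decoding: "+" is turned into a space before percent-decoding.
form = unquote_plus(raw_value)

# parse_qs() applies form decoding to a whole query string.
parsed = parse_qs("name=John+Smith")

print(generic)  # John+Smith
print(form)     # John Smith
print(parsed)   # {'name': ['John Smith']}
```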

Or the opposite problem: someone passes a Base64-encoded value through a query string. Base64 uses + and / as part of its alphabet. If the receiving end decodes it with a form decoder, every + in your Base64 payload silently becomes a space, and your data is corrupted.
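A small Python sketch of that corruption, assuming a two-byte payload chosen so that its Base64 form contains both + and /, and a hypothetical parameter named token:

```python
import base64
from urllib.parse import parse_qs

payload = b"\xfb\xff"  # arbitrary bytes picked to produce "+" and "/"
encoded = base64.b64encode(payload).decode("ascii")

# Dropping the Base64 text into a query string without further encoding...
query = "token=" + encoded

# ...and reading it back with a form decoder turns "+" into a space.
received = parse_qs(query)["token"][0]

print(repr(encoded))   # '+/8='
print(repr(received))  # ' /8='
```

The received string is no longer the Base64 text that was sent, and nothing along the way raised an error.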

The Two Encoding Functions You're Probably Confusing

In JavaScript, there are two built-in encoding functions, and they are not interchangeable.

encodeURI() was designed to encode a complete URL. It leaves intact all characters that have structural meaning in a URL — colons, slashes, question marks, hash signs, ampersands. You'd use this if you have a full URL and want to make it safe to pass around as a string without destroying its structure.

encodeURIComponent() is the one you should reach for when encoding a value that will be placed inside a URL. It encodes nearly everything that isn't a letter, digit, or one of - _ . ! ~ * ' ( ). Crucially, it encodes the characters that matter inside query strings: &, =, +, #, and the slash. It produces proper percent-encoding (%20 for spaces, %2B for plus signs), not the form-encoding shorthand.

The trap most developers fall into is using encodeURI() on a parameter value. Since encodeURI() doesn't encode & or =, a value like cat&dog passes through intact and becomes an extra query parameter instead of a single value containing the ampersand.

On the server side in PHP, the equivalent split is between urlencode() (which uses the +-for-space convention, appropriate for form data) and rawurlencode() (which always uses %20 for spaces, appropriate for path segments and general query values per RFC 3986). Python's urllib.parse.quote() produces %20-style encoding but leaves / unencoded by default, so pass safe="" when encoding a single value; urllib.parse.quote_plus() and urllib.parse.urlencode() produce form-encoded output with plus signs.
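The Python side of that split can be seen in a few lines. The sample value is invented for the demonstration; it contains an ampersand, a space, and a literal plus so that each function's choices are visible.

```python
from urllib.parse import quote, quote_plus, urlencode

value = "cat&dog +1"

# RFC 3986 style: space -> %20, plus -> %2B, ampersand -> %26.
rfc_style = quote(value, safe="")

# Form style: space -> "+", literal plus still -> %2B.
form_style = quote_plus(value)

# urlencode() uses quote_plus() unless told otherwise.
form_query = urlencode({"q": value})
rfc_query = urlencode({"q": value}, quote_via=quote)

# quote() keeps "/" by default, which suits whole paths but not single values.
whole_path = quote("a/b c")

print(rfc_style)   # cat%26dog%20%2B1
print(form_style)  # cat%26dog+%2B1
print(form_query)  # q=cat%26dog+%2B1
print(rfc_query)   # q=cat%26dog%20%2B1
print(whole_path)  # a/b%20c
```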

Path Segments vs. Query Parameters: Different Rules

Another dimension people miss: the rules aren't identical for every part of a URL.

In a path segment — the part between the slashes — a plus sign has absolutely no special meaning and should be left alone. /files/hello+world.txt refers to a file literally named "hello+world.txt". If you want a space in a path, you must use %20. There is no shortcut.
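As a sketch of the path-segment rule in Python: the helper name encode_path_segment is hypothetical, and passing safe="+" is one possible choice that keeps the plus sign literal while still encoding spaces and any slash that is part of the file name rather than a separator.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Encode one path segment; "+" stays literal, "/" and spaces do not.

    Hypothetical helper for illustration only.
    """
    return quote(segment, safe="+")


plus_name = "/files/" + encode_path_segment("hello+world.txt")
space_name = "/files/" + encode_path_segment("hello world.txt")
slash_name = "/files/" + encode_path_segment("Q1/Q2 notes.txt")

print(plus_name)   # /files/hello+world.txt
print(space_name)  # /files/hello%20world.txt
print(slash_name)  # /files/Q1%2FQ2%20notes.txt

# A generic decoder recovers the original names, plus sign included.
print(unquote("hello+world.txt"))  # hello+world.txt
```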

In a query string, the situation depends entirely on what convention the server expects. REST APIs built around RFC 3986 will expect %20 for spaces and %2B for literal plus signs. Traditional web frameworks that process form submissions will expect + for spaces and %2B for literal plus signs. The encoded form of a literal plus sign (%2B) is the same in both conventions — only the space encoding differs.
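One way to see the difference is to encode the same value under both conventions and then run each result through both kinds of decoder. This is a small Python sketch with an invented sample value; unquote() plays the RFC 3986 decoder and unquote_plus() the form decoder.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "rock + roll"

rfc_encoded = quote(value, safe="")  # rock%20%2B%20roll
form_encoded = quote_plus(value)     # rock+%2B+roll

# The %20 / %2B form survives either decoder.
print(unquote(rfc_encoded))       # rock + roll
print(unquote_plus(rfc_encoded))  # rock + roll

# The "+" form only survives the form decoder.
print(unquote_plus(form_encoded))  # rock + roll
print(unquote(form_encoded))       # rock+++roll
```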

This is why you can't blindly copy a URL from a form submission and use it in an API call. They may look the same but decode differently.

The Base64 Landmine in Query Strings

Base64 encoding is frequently used to pass binary data or complex structured data through a URL. Developers encode a payload in Base64, drop it into a query parameter, and assume it'll arrive intact. It often doesn't.

Standard Base64 uses the characters A-Z, a-z, 0-9, +, and /, with = as padding. Three of those collide with URL syntax: + turns into a space under form decoding, / separates path segments, and = separates keys from values in a query string.

The solution is URL-safe Base64, which substitutes - for + and _ for /. Padding is often omitted. Most languages have a variant: Python's base64.urlsafe_b64encode(), Java's Base64.getUrlEncoder(), and JavaScript libraries that support the base64url variant. If you're passing Base64 through a URL and not using the URL-safe variant, you're relying on luck.
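Here is a minimal Python sketch of the URL-safe variant with padding stripped. The function names encode_token and decode_token are invented for this example; whether to drop the padding is a design choice, and the decoder below restores it before decoding because Python's decoder expects it.

```python
import base64


def encode_token(data: bytes) -> str:
    """URL-safe Base64 without "=" padding (hypothetical helper)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_token(token: str) -> bytes:
    """Restore the stripped padding, then decode URL-safe Base64."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


payload = b"\xfb\xff"  # arbitrary bytes that yield "+" and "/" in standard Base64

print(base64.b64encode(payload).decode("ascii"))  # +/8=
print(encode_token(payload))                      # -_8
print(decode_token(encode_token(payload)) == payload)  # True
```

The resulting token contains only letters, digits, - and _, so it passes through both form decoders and generic URL decoders unchanged.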

A Practical Checklist

When you're working with URLs and encoding, here's how to think through it:

  1. Are you encoding a full URL or a single value? Full URL: encodeURI() or equivalent. Single value going into a query parameter: encodeURIComponent() or equivalent.
  2. Is the value going into a path segment or a query string? Path segments must use %20 for spaces. Query strings depend on what the server expects, but %20 is always safe and unambiguous.
  3. Does the receiving server use form-decoding or generic URL-decoding? If you control both ends, prefer %20-style encoding to eliminate ambiguity.
  4. Are you putting Base64 data in a URL? Use URL-safe Base64 (- and _ instead of + and /).
  5. Are you constructing URLs by hand with string concatenation? Stop doing that. Use a URL builder or template that handles encoding for you. Manual concatenation is where encoding bugs are born.
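As an illustration of the last point, here is one possible shape for a small URL builder in Python. The function name build_url, its parameters, and the example.com address are all assumptions made for this sketch; it encodes each path segment and each query value at the single place where the URL is assembled, using %20-style encoding throughout.

```python
from typing import Mapping, Sequence
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base: str, segments: Sequence[str], params: Mapping[str, str]) -> str:
    """Append encoded path segments and query parameters to a base URL.

    Hypothetical helper: any query or fragment already on `base` is dropped.
    """
    scheme, netloc, base_path, _query, _fragment = urlsplit(base)
    # Encode each segment on its own so a "/" inside a value stays data.
    encoded_segments = "/".join(quote(s, safe="") for s in segments)
    path = base_path.rstrip("/") + "/" + encoded_segments
    # quote_via=quote gives %20 for spaces instead of "+".
    query = urlencode(params, quote_via=quote)
    return urlunsplit((scheme, netloc, path, query, ""))


url = build_url(
    "https://example.com/api",
    ["users", "John Smith"],
    {"q": "cat&dog", "tag": "a+b"},
)
print(url)  # https://example.com/api/users/John%20Smith?q=cat%26dog&tag=a%2Bb
```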

When Decoding Goes Wrong

Double-encoding is the other classic failure mode. You encode a value, then pass it to a function that encodes it again. Now %20 becomes %2520 (the % itself gets encoded to %25). The server decodes it once and gets the literal string %20 instead of a space. This happens most often when URL values are passed between layers of an application — one layer encodes, the next layer encodes again without checking.

The fix is to be explicit about where encoding happens. Encode at the boundary where you're constructing the URL. Don't encode at every layer and hope it cancels out.
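Double-encoding is easy to reproduce. In this Python sketch the two quote() calls stand in for two application layers that each "helpfully" encode the same value:

```python
from urllib.parse import quote, unquote

original = "hello world"

once = quote(original, safe="")  # encoded at the boundary: correct
twice = quote(once, safe="")     # encoded again by another layer: the bug

print(once)            # hello%20world
print(twice)           # hello%2520world
print(unquote(twice))  # hello%20world  (one decode leaves a literal %20)
print(unquote(once))   # hello world
```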

The Actual Rule to Remember

If you walk away with one thing: in a URL, + is not a reliable encoding for a space. It only works as a space when the thing reading it expects form-encoded data. In every other context, it's a literal plus sign. Use %20 when you want a space and need it to be unambiguous everywhere.

The encoding system for URLs was designed in an era when the web was simpler and the distinction between "form data" and "arbitrary query values" wasn't as sharp. We've inherited that ambiguity. The only way out is to understand which convention applies in your specific context and be deliberate about it — rather than letting a hidden mismatch corrupt your data silently, miles from where you'll ever think to look.