Base64 and Unicode: UTF-8 Encoding Pitfalls Developers Hit
Fix Base64 Unicode bugs: encode UTF-8 bytes, avoid btoa pitfalls, and verify emoji round-trips in APIs.
Quick answer
Base64 encodes bytes. Convert Unicode text to UTF-8 bytes before encoding. Base64 is encoding, not encryption.
Key takeaways
- ›btoa only accepts Latin-1 — use TextEncoder for Unicode.
- ›Emoji fails when UTF-8 steps are skipped.
- ›Document utf-8 in API schemas for Base64 fields.
- ›Test round-trips with the Base64 tool first.
Apply this guide with the Base64 Encoder & Decoder
Open Base64 Encoder & DecoderBase64 transforms bytes into ASCII text. Unicode problems appear when encoders and decoders disagree on which bytes represent your string — usually UTF-8 vs UTF-16 code units. The Base64 Encoder & Decoder encodes UTF-8 text correctly in the browser; this guide explains the pitfalls that break round-trips in APIs and JWT workflows.
Encoding is not encryption
Base64 (and Base64URL) are reversible encodings. Anyone can decode without a key. Do not use Base64 to hide API keys, passwords, or PII.
For JWT-specific rules (+/ vs -_), see Base64URL in JWTs and APIs.
The core rule: encode UTF-8 bytes
Unicode strings must be converted to UTF-8 bytes before Base64 encoding.
// Browser — correct UTF-8 path
const text = 'Hello 世界 🌍';
const bytes = new TextEncoder().encode(text);
const base64 = btoa(String.fromCharCode(...bytes));// Node.js — correct
Buffer.from('Hello 世界 🌍', 'utf8').toString('base64');What goes wrong with btoa/atob
btoa() expects Latin-1 (ISO-8859-1) code units, not arbitrary Unicode.
btoa('Hello'); // works
btoa('Hello 世界'); // throws InvalidCharacterError in browserFix: UTF-8 encode first, then Base64 the bytes (as above), or use the Base64 tool for interactive checks.
UTF-8 vs UTF-16 confusion
| Approach | Result |
|---|---|
| UTF-8 bytes → Base64 | Interoperable (JSON, JWT, most APIs) |
| UTF-16 code units → Base64 | Breaks cross-language decoders |
// JavaScript pitfall — UTF-16 code units as bytes (WRONG for APIs)
const wrong = btoa(unescape(encodeURIComponent(text))); // legacy pattern — know why it works
// Prefer TextEncoder for clarityC#, Java, and Python defaults vary — always document utf-8 in API specs.
Emoji and multi-byte characters
Emoji like 🌍 are multiple UTF-8 bytes. Encoding a single UTF-16 surrogate without the pair corrupts output.
🌍 UTF-8 bytes: F0 9F 8C 8D (4 bytes)
Base64 expands all binary — expect ~33% growthTest emoji in the Base64 Encoder before assuming your mobile client matches your server.
Round-trip checklist
- Encode: string → UTF-8 bytes → Base64
- Decode: Base64 → bytes → UTF-8 string
- Compare original and decoded strings character-for-character
- Watch for trailing
=padding differences (some APIs strip padding)
function decodeBase64Utf8(base64) {
const binary = atob(base64);
const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
return new TextDecoder().decode(bytes);
}API and JSON pitfalls
| Pitfall | Symptom |
|---|---|
| Server UTF-8, client Latin-1 | Mojibake (é instead of é) |
| Line breaks in Base64 fields | Decode failures — strip whitespace |
| URL-safe vs standard alphabet | +// vs -/_ mismatch |
| Double encoding | Decode once still looks like Base64 |
{
"fileName": "report.pdf",
"contentBase64": "JVBERi0xLjQK..."
}Document charset (utf-8) in your OpenAPI schema for string fields that are decoded after Base64.
Common mistakes
- Calling
btoadirectly on Unicode text - Assuming
atoboutput is a UTF-8 string (it is a binary string) - Mixing Base64URL decode rules with standard Base64 fields
- Treating encoded blobs as hashed or encrypted
- Logging decoded payloads containing secrets
How to use ByteToolBox Base64 tool safely
- Open Base64 Encoder & Decoder
- Paste Unicode text including emoji and accented characters
- Encode, then decode — confirm round-trip equality
- Compare with server output byte-for-byte
- Do not paste production secrets or live JWTs into any online tool unless you trust the environment — ByteToolBox runs locally, but treat credentials carefully
Related tools
- Base64 Encoder & Decoder — Unicode-safe encode/decode in browser
- JSON Formatter — inspect JSON that wraps Base64 fields
Try Base64 Encoder/Decoder
Verify your UTF-8 round-trip with the Base64 tool before debugging production encoding mismatches.
Related tools
Related guides
Base64 Encoding: When and Why to Use It
Understanding Base64 encoding, its use cases, and best practices for web development.
Base64Base64URL in JWTs and APIs: Encoding Rules Developers Miss
Learn how Base64URL differs from Base64, why JWTs use it, how padding works, and common API encoding mistakes.
JSONHow to Find and Fix Invalid JSON With Real Error Examples
Fix invalid JSON with real examples: trailing commas, missing quotes, bad escaping, comments, and mismatched brackets.