JSON

JSON (JavaScript Object Notation) is a lightweight text format for exchanging data. Python’s json module serializes Python objects to JSON strings and deserializes JSON back to Python. It is built into the standard library and requires no installation.

JSON shows up in APIs, config files, and data pipelines. Understanding the type mapping between Python and JSON, plus the common pitfalls (NaN, custom objects, encoding), saves time in interviews and real projects.

What is JSON?

JSON is a text format that uses a small set of structures: objects (key-value pairs), arrays (ordered lists), strings, numbers, booleans, and null. It is human-readable and widely supported across languages. The format does not support comments, trailing commas, or single quotes.

Valid JSON types:

| JSON Type | Example |
| --- | --- |
| Object | {"a": 2, "b": 4} |
| Array | [2, 4, 6, 8] |
| String | "hello" |
| Number | 42, 2.46 |
| Boolean | true, false |
| Null | null |

Python to JSON Type Mapping

The json module maps Python types to JSON types in a fixed way. Not every Python type has a direct JSON equivalent.

| Python Type | JSON Type |
| --- | --- |
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| bool | boolean |
| None | null |

No direct mapping: set, frozenset, bytes, complex, datetime, and custom classes are not JSON types. Serializing them requires custom handling.

    import json

    data = {"a": 2, "b": 4, "c": [6, 8], "d": None, "e": True}
    s = json.dumps(data)
    print(s)  # {"a": 2, "b": 4, "c": [6, 8], "d": null, "e": true}

Serialize: json.dumps()

json.dumps(obj) converts a Python object to a JSON string. The result is a str, not bytes.

    import json

    data = {"x": 2, "y": 4, "z": 6}
    s = json.dumps(data)
    print(s)        # {"x": 2, "y": 4, "z": 6}
    print(type(s))  # <class 'str'>

Common Parameters

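| Parameter | Default | Description |
| --- | --- | --- |
| indent | None | Pretty-print with this many spaces per level |
| sort_keys | False | Sort dictionary keys in output |
| ensure_ascii | True | Escape non-ASCII characters as \uXXXX |
| separators | None | (item_separator, key_separator) tuple; None means (', ', ': ') when indent is None and (',', ': ') otherwise |
| default | None | Callable for non-serializable types |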

Pretty printing: Use indent=2 or indent=4 for readable output.

    data = {"a": 2, "b": 4, "c": [6, 8]}
    print(json.dumps(data, indent=2))
    # {
    #   "a": 2,
    #   "b": 4,
    #   "c": [
    #     6,
    #     8
    #   ]
    # }

Sort keys: Use sort_keys=True for deterministic output (useful for diffs and tests).

    data = {"z": 2, "a": 4, "m": 6}
    print(json.dumps(data, sort_keys=True))  # {"a": 4, "m": 6, "z": 2}

Compact output: Use separators=(',', ':') to remove spaces for smaller strings.

    data = {"a": 2, "b": 4}
    print(json.dumps(data, separators=(',', ':')))  # {"a":2,"b":4}

Non-ASCII: With ensure_ascii=True (default), characters like é become \u00e9. Set ensure_ascii=False to keep them as-is.

    data = {"name": "José"}
    print(json.dumps(data))                      # {"name": "Jos\u00e9"}
    print(json.dumps(data, ensure_ascii=False))  # {"name": "José"}

Deserialize: json.loads()

json.loads(s) parses a JSON document and returns a Python object. The input must be a str, bytes, or bytearray; binary input must be encoded as UTF-8, UTF-16, or UTF-32.

    import json

    s = '{"a": 2, "b": 4, "c": [6, 8]}'
    data = json.loads(s)
    print(data)        # {'a': 2, 'b': 4, 'c': [6, 8]}
    print(type(data))  # <class 'dict'>

JSON arrays become lists: JSON has no tuple type. Arrays always deserialize to Python lists.

    s = '[2, 4, 6, 8]'
    data = json.loads(s)
    print(type(data))  # <class 'list'>

Invalid JSON: Malformed input raises json.JSONDecodeError (a subclass of ValueError).

    # json.loads('{invalid}')           # JSONDecodeError
    # json.loads("['single quotes']")   # JSONDecodeError - JSON requires double quotes
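Because JSONDecodeError carries position information, it can be caught and turned into a readable message. The sketch below is one possible way to do that; the helper name try_parse is hypothetical and not part of the json module.

```python
import json

def try_parse(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # exc.msg, exc.lineno, and exc.colno describe where parsing stopped
        return None, f"{exc.msg} (line {exc.lineno}, column {exc.colno})"

value, error = try_parse('{"a": 2}')
print(value, error)   # {'a': 2} None

value, error = try_parse('{"a": 2,}')   # trailing comma is not valid JSON
print(value)          # None
print(error)          # exact wording of the message varies by Python version
```

Catching ValueError instead would also work, since JSONDecodeError is a subclass of it, but the position attributes are only available on JSONDecodeError.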

File Operations: json.dump() and json.load()

For reading and writing files, use json.dump() and json.load() instead of dumps and loads. They take a file-like object as the second argument.

Write to File

json.dump(obj, fp) serializes obj and writes the JSON text to the file. The file must be opened for writing in text mode, because json.dump() writes str, not bytes.

    import json

    data = {"scores": [82, 84, 86, 88]}
    with open("scores.json", "w") as f:
        json.dump(data, f)

With options: Pass the same parameters as dumps for pretty printing or sorting.

    data = {"a": 2, "b": 4}
    with open("out.json", "w") as f:
        json.dump(data, f, indent=2, sort_keys=True)

Read from File

json.load(fp) reads the entire file and parses it as JSON. The file must be opened for reading.

    import json

    with open("scores.json") as f:
        data = json.load(f)
    print(data)  # {'scores': [82, 84, 86, 88]}

File position: json.load() reads the whole file. Do not mix with other reads or writes on the same file handle in the same block.

Handling Non-Serializable Types

By default, json.dumps() raises TypeError when it encounters a type it cannot serialize (e.g. set, datetime, custom classes). Use the default parameter to provide a custom serializer.

The default Callable

default receives the object that cannot be serialized and must return a JSON-serializable value (or raise TypeError).

    import json
    from datetime import datetime

    def serialize(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

    data = {"created": datetime(2024, 2, 4), "tags": {2, 4, 6}}
    s = json.dumps(data, default=serialize)
    print(s)  # {"created": "2024-02-04T00:00:00", "tags": [2, 4, 6]}

Common pattern: Convert to a string or a dict representation.

    def default(obj):
        if hasattr(obj, "__dict__"):
            return obj.__dict__
        raise TypeError(f"{type(obj).__name__} not serializable")

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    data = {"p": Point(2, 4)}
    print(json.dumps(data, default=default))  # {"p": {"x": 2, "y": 4}}

Custom JSONEncoder

For more control, subclass json.JSONEncoder and override default. Pass it with cls=MyEncoder.

    import json

    class SetEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, set):
                return {"__set__": True, "items": list(obj)}
            return super().default(obj)

    data = {"ids": {2, 4, 6}}
    s = json.dumps(data, cls=SetEncoder)
    print(s)  # {"ids": {"__set__": true, "items": [2, 4, 6]}}

To round-trip (decode back to a set), a custom decoder is needed (see object_hook below).

Handling Special Float Values

JSON does not support NaN, Infinity, or -Infinity, so Python’s float('nan') and float('inf') have no valid JSON representation. By default, json.dumps() does not raise on them; it emits non-standard tokens instead.

json.dumps() accepts allow_nan, which defaults to True. With allow_nan=True, it outputs NaN, Infinity, and -Infinity as literal tokens. These match JavaScript’s global names but are not valid in strict JSON (RFC 8259). Python’s own json.loads() accepts them, but strict parsers such as JavaScript’s JSON.parse() reject them.

    import json

    data = {"value": float('nan'), "max": float('inf')}
    s = json.dumps(data)
    print(s)  # {"value": NaN, "max": Infinity}

Strict JSON: Set allow_nan=False to raise ValueError on NaN or Infinity. Use this when the output must be strict RFC 8259 JSON.

    # json.dumps({"x": float('nan')}, allow_nan=False)  # ValueError

Deserializing: json.loads() accepts NaN, Infinity, and -Infinity by default and converts them to Python floats. Set parse_constant to customize or reject them.
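A minimal sketch of rejecting these tokens on input with parse_constant. The names reject_constant and strict_loads are hypothetical helpers, and raising ValueError is just one reasonable choice of exception.

```python
import json
import math

def reject_constant(token):
    # token is the string 'NaN', 'Infinity', or '-Infinity'
    raise ValueError(f"Non-standard JSON constant: {token}")

def strict_loads(text):
    """Parse JSON text, refusing NaN and Infinity literals."""
    return json.loads(text, parse_constant=reject_constant)

# Default behavior: the literal becomes a Python float
print(math.isnan(json.loads('{"x": NaN}')["x"]))   # True

# Ordinary numbers are unaffected by the hook
print(strict_loads('{"x": 1.5}'))                   # {'x': 1.5}

try:
    strict_loads('{"x": NaN}')
except ValueError as exc:
    print(exc)                                      # Non-standard JSON constant: NaN
```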

Custom Decoding with object_hook

object_hook is a callable that receives every decoded JSON object (dict) and can transform it before it is returned. Use it to restore custom types or apply post-processing.

    import json

    def decode_set(d):
        if d.get("__set__") is True:
            return set(d["items"])
        return d

    s = '{"ids": {"__set__": true, "items": [2, 4, 6]}}'
    data = json.loads(s, object_hook=decode_set)
    print(data)               # {'ids': {2, 4, 6}}
    print(type(data["ids"]))  # <class 'set'>

Round-trip with custom types: Encode with a custom default and decode with a matching object_hook to preserve types across serialization.
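Putting both halves together, a full round trip might look like the sketch below. The function names encode_extra and decode_extra are hypothetical, and sorting the items assumes the set elements are mutually comparable; it is done only to make the output deterministic.

```python
import json

def encode_extra(obj):
    # Called only for objects json cannot serialize on its own
    if isinstance(obj, set):
        return {"__set__": True, "items": sorted(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_extra(d):
    # Called for every decoded JSON object, innermost first
    if d.get("__set__") is True:
        return set(d["items"])
    return d

original = {"ids": {2, 4, 6}, "name": "demo"}

text = json.dumps(original, default=encode_extra)
print(text)  # {"ids": {"__set__": true, "items": [2, 4, 6]}, "name": "demo"}

restored = json.loads(text, object_hook=decode_extra)
print(restored == original)  # True
```

The "__set__" marker is a convention chosen by this example, so both sides of the exchange have to agree on it.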

object_pairs_hook for Key Order

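object_pairs_hook receives each decoded JSON object as a list of (key, value) pairs, in the order they appeared in the JSON text, instead of a dict. Its return value is used in place of the dict. Since Python 3.7 a plain dict already preserves insertion order, so the hook is mainly useful for building a different mapping type (such as OrderedDict) or for seeing duplicate keys before they are collapsed. If both object_hook and object_pairs_hook are given, object_pairs_hook takes priority.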

    import json
    from collections import OrderedDict

    s = '{"z": 2, "a": 4, "m": 6}'
    data = json.loads(s, object_pairs_hook=OrderedDict)
    print(list(data.keys()))  # ['z', 'a', 'm'] - preserves JSON key order

Reading and Writing Files Safely

Always use a context manager (with) so the file is closed even if an error occurs.

    import json

    # Write
    with open("config.json", "w") as f:
        json.dump({"timeout": 20, "retries": 4}, f, indent=2)

    # Read
    with open("config.json") as f:
        config = json.load(f)

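Encoding: json.dump() writes str, so the file must be opened in text mode and the output uses that file’s encoding. The default text encoding is platform-dependent and is not always UTF-8, so pass encoding="utf-8" explicitly. json.load() accepts text-mode files as well as binary-mode files containing UTF-8, UTF-16, or UTF-32 encoded bytes.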
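A short sketch of writing and reading with an explicit encoding. The temporary directory and the file name user.json are placeholders for illustration only.

```python
import json
import os
import tempfile

data = {"name": "José"}
path = os.path.join(tempfile.mkdtemp(), "user.json")  # placeholder location

# Explicit UTF-8 plus ensure_ascii=False keeps the character readable on disk
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Inspect the raw bytes: é is stored as the two UTF-8 bytes C3 A9
with open(path, "rb") as f:
    raw = f.read()
print(raw)             # b'{"name": "Jos\xc3\xa9"}'

# Read back with the same encoding
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == data)  # True
```

With the default ensure_ascii=True the file would contain only ASCII, so the file encoding would matter less, at the cost of \uXXXX escapes in the output.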

JSON vs Other Formats

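| Format | Use case | Python support |
| --- | --- | --- |
| JSON | APIs, config, human-readable data exchange | json module |
| pickle | Python-only, arbitrary objects, not human-readable | pickle module |
| YAML | Config files, comments, more readable | PyYAML (third-party) |
| TOML | Config files, simple syntax | tomllib (standard library, read-only, Python 3.11+); third-party packages for writing |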

When to use JSON: Cross-language data exchange, REST APIs, config that must be readable. Do not use JSON for sensitive data without encryption, or when you need to store arbitrary Python objects (use pickle for that, but only for trusted sources).

Common Use Cases

  • API responses: Parse response.json() (from requests) or json.loads(response.text) to get a Python dict.
  • Config files: Store settings as JSON; load at startup with json.load().
  • Logging or caching: Serialize state to JSON for persistence; deserialize when loading.
  • Data pipelines: Pass structured data between services as JSON strings.

Tricky Behaviors

Serialize - Tuples become arrays

JSON has no tuple type. json.dumps((2, 4, 6)) produces [2, 4, 6]. Round-trip: json.loads(json.dumps((2, 4, 6))) returns a list, not a tuple.

Serialize - Keys must be strings

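JSON object keys must be strings. json.dumps({2: "a", 4: "b"}) converts the keys to strings: {"2": "a", "4": "b"}. The same happens for float, bool, and None keys; other key types (e.g. tuples) raise TypeError unless skipkeys=True is passed. Deserializing gives string keys, not integers.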
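The key conversion is easy to see in a round trip. In this sketch, the final step that restores integer keys assumes every key in the decoded dict is an integer string.

```python
import json

original = {2: "a", 4: "b"}

text = json.dumps(original)
print(text)                  # {"2": "a", "4": "b"}

restored = json.loads(text)
print(restored)              # {'2': 'a', '4': 'b'}
print(restored == original)  # False - the keys are now strings

# Convert keys back by hand (assumes all keys are integer strings)
fixed = {int(k): v for k, v in restored.items()}
print(fixed == original)     # True

# Keys that are not str, int, float, bool, or None are rejected
try:
    json.dumps({(1, 2): "x"})
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```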

Serialize - Sets and custom types raise by default

json.dumps({"x": {2, 4}}) raises TypeError. Use default or a custom encoder to handle sets, datetimes, and custom classes.

Serialize - NaN and Infinity

With allow_nan=True (default), float('nan') and float('inf') serialize to NaN and Infinity. These are not valid strict JSON (RFC 8259), and strict parsers such as JavaScript’s JSON.parse() reject them. Use allow_nan=False for strict compliance.

Deserialize - Arrays always become lists

JSON arrays deserialize to Python lists. There is no way to get tuples from json.loads() without post-processing.

Deserialize - Duplicate keys

RFC 8259 says object keys should be unique but does not forbid duplicates, and Python’s parser accepts them. json.loads('{"a": 2, "a": 4}') returns {"a": 4}: the last value wins. Use object_pairs_hook if duplicate keys must be preserved or rejected.
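One way to surface duplicates is to inspect the pairs before they are collapsed into a dict. The hook name reject_duplicates below is hypothetical, and raising ValueError is a design choice of this sketch rather than something the json module prescribes.

```python
import json

def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, failing on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"Duplicate key: {key!r}")
        result[key] = value
    return result

# Default behavior: the last value silently wins
print(json.loads('{"a": 2, "a": 4}'))   # {'a': 4}

# Unique keys pass through unchanged
print(json.loads('{"a": 2, "b": 4}', object_pairs_hook=reject_duplicates))
# {'a': 2, 'b': 4}

try:
    json.loads('{"a": 2, "a": 4}', object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)                           # Duplicate key: 'a'
```

To keep every value instead of rejecting the input, the hook could return the list of pairs as-is or collect values per key into lists.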

File - dump vs dumps

json.dump(obj, f) writes to a file; json.dumps(obj) returns a string. Mixing them up (e.g. passing a file to dumps) causes TypeError.

File - File must be opened in correct mode

json.dump() needs a file opened for writing in text mode. json.load() needs a file opened for reading (text or binary). Calling load on a write-only handle or dump on a read-only handle raises io.UnsupportedOperation.
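The mode-related failures can be reproduced in a few lines. This sketch uses a temporary directory and made-up file names; the exact error messages depend on the Python version, so only the exception types are shown.

```python
import io
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "ok.json")
binary_path = os.path.join(workdir, "binary.json")

# Text mode works: json.dump() writes str chunks
with open(text_path, "w", encoding="utf-8") as f:
    json.dump({"a": 2}, f)

# Binary mode fails on write: a binary file object rejects str
try:
    with open(binary_path, "wb") as f:
        json.dump({"a": 2}, f)
    binary_error = None
except TypeError as exc:
    binary_error = exc
print(type(binary_error).__name__)   # TypeError

# Loading from a handle that is not readable fails as well.
# Append mode is used so the existing file is not truncated.
try:
    with open(text_path, "a", encoding="utf-8") as f:
        json.load(f)
    load_error = None
except io.UnsupportedOperation as exc:
    load_error = exc
print(type(load_error).__name__)     # UnsupportedOperation

# Binary mode is fine for reading: json.load() accepts the bytes from fp.read()
with open(text_path, "rb") as f:
    print(json.load(f))              # {'a': 2}
```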

Encoding - ensure_ascii and non-ASCII

With ensure_ascii=True (default), non-ASCII characters become \uXXXX escape sequences. Set ensure_ascii=False to keep Unicode characters in the output.

default - Must return serializable or raise

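The default callable must return a value that json can serialize (dict, list, str, int, float, bool, None), or raise TypeError. The returned value is itself passed back through the encoder, so default is called again for any unsupported object inside it; if default does not handle that object, TypeError is raised.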
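The value returned by default is encoded like any other value, which means default can be invoked again for unsupported objects nested inside it. The sketch below illustrates this with a hypothetical Event class and a hypothetical to_jsonable function.

```python
import json
from datetime import date

class Event:
    def __init__(self, name, day):
        self.name = name
        self.day = day

def to_jsonable(obj):
    if isinstance(obj, Event):
        # The returned dict still holds a date, which json cannot encode directly;
        # the encoder calls to_jsonable again for that date.
        return {"name": obj.name, "day": obj.day}
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

event = Event("launch", date(2024, 2, 4))
print(json.dumps(event, default=to_jsonable))
# {"name": "launch", "day": "2024-02-04"}

# Without a branch for the nested type, the final raise produces the TypeError
try:
    json.dumps({"value": 1 + 2j}, default=to_jsonable)
except TypeError as exc:
    print(exc)  # Object of type complex is not JSON serializable
```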

Round-trip - Type information is lost

JSON does not store type information. A Python tuple becomes a list after round-trip. A Python set must be encoded with a custom scheme (e.g. {"__set__": true, "items": [...]}) and decoded with object_hook to restore it.

Deserialize - parse_constant for NaN and Infinity

By default, json.loads() converts NaN, Infinity, and -Infinity to Python floats. To reject or transform them, pass a custom parse_constant that raises or returns a different value. Without it, these literals always become floats.

Interview Questions

What is JSON and what types does it support?

JSON is a text format for data exchange. It supports objects (key-value pairs), arrays, strings, numbers, booleans, and null. It does not support comments, trailing commas, or single-quoted strings.

How does Python map types to JSON?

dict maps to object, list and tuple to array, str to string, int and float to number, bool to boolean, and None to null. Types like set, bytes, datetime, and custom classes have no direct mapping and require custom handling.

What is the difference between json.dumps() and json.dump()?

dumps() serializes a Python object to a JSON string and returns it. dump() writes the JSON string to a file-like object. Same for loads() (parse string) vs load() (parse from file).

Why does json.dumps((2, 4, 6)) produce a list when deserialized?

JSON has no tuple type. Tuples are serialized as arrays. json.loads() always produces lists for arrays, so the round-trip loses the tuple type.

How do you pretty-print JSON?

Use json.dumps(obj, indent=2) or json.dumps(obj, indent=4). For file output, use json.dump(obj, f, indent=2).

How do you serialize a datetime or a custom class?

Use the default parameter: json.dumps(data, default=my_serializer). The callable receives non-serializable objects and must return a JSON-serializable value (e.g. ISO string for datetime, dict for custom objects) or raise TypeError.

What happens when you serialize float('nan') or float('inf')?

By default, json.dumps() outputs NaN and Infinity as literal tokens. These are valid in JavaScript but not in strict RFC 8259 JSON. Set allow_nan=False to raise ValueError instead.

How do you decode JSON back into a custom type (e.g. set)?

Use object_hook. The callable receives each decoded dict and can return a transformed value. For a set encoded as {"__set__": true, "items": [2, 4, 6]}, the hook checks for __set__ and returns set(d["items"]).

What is object_pairs_hook and when would you use it?

object_pairs_hook receives a list of (key, value) pairs instead of a dict. Use it to build a different mapping type (e.g. OrderedDict) or when duplicate keys need special handling. object_hook receives an already-built dict, which keeps key order on Python 3.7+ but has already dropped duplicate keys.

Why do JSON object keys become strings in Python?

JSON requires object keys to be strings. {"2": 4} in JSON deserializes to {"2": 4} in Python (string key). Numeric keys in a Python dict are converted to strings during serialization.

What does ensure_ascii do in json.dumps()?

When ensure_ascii=True (default), non-ASCII characters are escaped as \uXXXX. When False, Unicode characters appear as-is in the output. Set False when the output is consumed by a UTF-8 aware system and readability matters.

How do you handle duplicate keys in JSON?

Standard json.loads() builds a dict, so duplicate keys overwrite; the last value wins. To preserve order or handle duplicates, use object_pairs_hook with a custom structure (e.g. list of pairs or a dict that accumulates values per key).

When would you use json instead of pickle?

Use JSON for cross-language exchange, APIs, and human-readable config. Use pickle for Python-only persistence of arbitrary objects (functions and classes are pickled by reference to their importable names, not by value). Never unpickle data from untrusted sources; JSON is safer for external input.

What is the difference between sort_keys and object_pairs_hook for key order?

sort_keys=True sorts keys alphabetically in the output. object_pairs_hook controls how decoded objects are built and can preserve the order in which keys appeared in the JSON string.

What happens if the default callable returns a non-serializable value?

The default function should return a value that json can serialize (dict, list, str, int, float, bool, None). The returned value is encoded like any other value, so if it is, or contains, an unsupported object such as a datetime or a custom object, default is called again for that object. TypeError is raised only when default does not handle it, typically from the final raise TypeError in the function.

Can you pass a file object to json.loads()?

No. json.loads() expects a str, bytes, or bytearray. Passing a file object raises TypeError because the function expects the full JSON text, not a file handle. Use json.load(fp) to read from a file.

What does parse_constant do in json.loads()?

parse_constant is a callable that receives literal tokens like NaN, Infinity, and -Infinity during parsing. By default, these become Python floats. A custom parse_constant can raise an error to reject them or return a different value (e.g. None for NaN).
