Stephen Horne

← Back to blog

Published on 08/28/2024 14:00 by Stephen Horne

JSON and the Golden Fleece, Pt 5

Hey look at this commit. What do you see? Yes, it’s a bunch of very ugly text. Those are C++ source files. I’m so sorry. But wait! What this ugly text compiles into is a working JSON parser! I still need to do some more testing, and refactor it because it could be less repetitive and defensive (JSON payloads are a scary landscape when you can only see one byte at a time), and improve efficiency here and there with some more helpful, actionable error messages, but it works!

It works on all six of the JSON data types: Object, Array, String, Null, Number and Boolean. It works when nesting an Array in an Object, or an Object in an Array, and, because of my “clever” algorithm, it keeps context regardless of how deep it is in a payload. It also uses proper C++ ownership and move semantics, saving memory and keeping pointers from going rogue and leaking memory. It uses a polymorphic tree to store all the parsed data. It doesn’t waste parsing buffer space on, well, spaces. It appears to be pretty fast, though I’ll have to benchmark with some big JSON payloads.

Compile time is a little long but apparently GCC’s linker is just slow? I read about using foreward declarations instead of #include statements to improve compile time, but I couldn’t find any real opportunities in my project to do so. Once it’s compiled, though, it runs all the tests in the blink of an eye. Good enough for now.

How It Works

The algorithm uses a loop. That is, a while loop that will go on forever until the stream providing the character bytes reaches EOF. Within this loop, after an EOF check, is a very large switch. Every character of interest has its own case, with many of them falling through for grouped characters like digits and those that make up true, false and null. As these “primitive” characters are encountered, they are added to a buffer until an event happens which signals that it’s time to makes sense of those bytes.

static void collapse(StackType& stack, const Type type) {
    if (type != Type::Array && type != Type::Object)
        throw std::runtime_error("JSON::collapse called with non-container Type enum");

    StackType temp;

    while (!stack.empty() && stack.top().second->getType() != type) {
        temp.push(std::move(stack.top()));
        stack.pop();
    }

    if (stack.empty())
        throw std::runtime_error("did not find beginning of container when collapsing");

    if (type == Type::Object) {
        auto objectPtr = static_cast<ObjectNode*>(stack.top().second.get());

        while (!temp.empty()) {
            auto tempTop = std::move(temp.top());
            if (tempTop.first.empty())
                throw std::runtime_error("empty key encountered while collapsing Object");

            objectPtr->insert(std::move(tempTop.first), std::move(tempTop.second));

            temp.pop();
        }
    } else {
        auto arrayPtr = static_cast<ArrayNode*>(stack.top().second.get());

        while (!temp.empty()) {
            auto tempTop = std::move(temp.top());
            if (!tempTop.first.empty())
                throw std::runtime_error("non-empty key encountered while collapsing Array");

            arrayPtr->insert(std::move(tempTop.second));

            temp.pop();
        }
    }
}

Above is my collapse procedure that is triggered when a } or ] is encountered. More on that in a moment.

The “well, what do we have here?” events are all triggered by encountering specific characters, except for EOF, which is just a bit somewhere in the bowels of a stream object. The characters are:

case '"':
    if (!parsingBuffer.empty())
        throw std::runtime_error("unexpectd double-quote (\")");

    parsingBuffer.push_back('"');
    while (stream.get(byte)) {
        parsingBuffer.push_back(byte);
        if (byte == '"')
            break;
    }

    if (!stack.empty()) {
        if (readyForObjectKey(stack)) {
            parsingBuffer.erase(parsingBuffer.begin());
            parsingBuffer.erase(parsingBuffer.end() - 1);
            stack.push(
                std::make_pair(
                    parsingBuffer,
                    std::move(std::unique_ptr<ValueNodeBase>{}))
            );
            parsingBuffer.clear();
        }
    }

Okay, well, look at the code so far if you feel so inclined. I’ve been at the computer long enough today. À la prochaine, mes amis!

Written by Stephen Horne

← Back to blog