Stephen Horne


Published on 09/04/2024 13:43 by Stephen Horne

JSON and the Golden Fleece, Pt 6

Hi. There have been many improvements to the parsing logic itself, and to the algorithm’s ability to identify and complain about bad JSON syntax and value formatting. That’s been pretty fun, and I got all of the bad test payloads to trigger an exception whose message includes the serialized offending value and, when relevant, why that value was deemed inappropriate given the parser’s state at that moment.

I want to keep track of where in the payload the offenses occur and print the line and column (the number of characters from the beginning of the line) for each of them, but the last time I worked on the parsing logic there was an edge case with array collapsing that I still need to address.

I started to find writing tests very, very tedious, however, and the resulting test code hard to read, so I began overloading operators. Before:

        tests.add({ "correctly parses {\"numbos\": [1, 2, 3]}", [](){
            std::stringstream sstream(R"({"numbos": [1, 2, 3]})");
            auto parser = JSON::Parser();

            auto json = parser.parse(sstream);
            auto objectNode = static_cast<JSON::ObjectNode*>(json.get());
            auto received = static_cast<JSON::ObjectStorageType*>(objectNode->getValue());
            if (received->size() != 1)
                return false;

            if (received->at("numbos")->getType() != JSON::Type::Array)
                return false;

            auto receivedArrayPtr = static_cast<JSON::ArrayStorageType*>(received->at("numbos")->getValue());

            if (receivedArrayPtr->size() != 3)
                return false;
            
            auto num1Ptr = static_cast<double*>(receivedArrayPtr->at(0)->getValue());
            if (receivedArrayPtr->at(0)->getType() != JSON::Type::Number || *num1Ptr != 1) {
                return false;
            }

            auto num2Ptr = static_cast<double*>(receivedArrayPtr->at(1)->getValue());
            if (receivedArrayPtr->at(1)->getType() != JSON::Type::Number || *num2Ptr != 2) {
                return false;
            }

            auto num3Ptr = static_cast<double*>(receivedArrayPtr->at(2)->getValue());
            if (receivedArrayPtr->at(2)->getType() != JSON::Type::Number || *num3Ptr != 3) {
                return false;
            }

            return true;
        }});

and after:

        tests.add({ "correctly parses {\"numbos\": [1, 2, 3]}", [](){
            std::stringstream sstream(R"({"numbos": [1, 2, 3]})");
            auto json = JSON::Parser().parse(sstream);

            JSON::ArrayStorageType numbos;
            numbos.push_back(std::make_unique<JSON::NumberNode>(1));
            numbos.push_back(std::make_unique<JSON::NumberNode>(2));
            numbos.push_back(std::make_unique<JSON::NumberNode>(3));

            JSON::ObjectStorageType map;
            map.emplace("numbos", std::make_unique<JSON::ArrayNode>(std::move(numbos)));

            return json == Test::createJSON<JSON::ObjectNode>(map);
        }});

Mmm yum. C++ doesn’t have to be scary! This is just under 39% of the original line count, and it is easy to read and understand even if you don’t know C++. I refactored all of the parsing tests using these new, adorable operators and the nifty little function template I created to manufacture JSON objects to compare against parser output:

namespace Test {
    ...
    template<class NodeType, typename type>
    JSON::JSON createJSON(type& value) {
        return JSON::JSON(std::move(std::make_unique<NodeType>(std::move(value))));
    }

    template<class NodeType>
    JSON::JSON createJSON() {
        return JSON::JSON(std::move(std::make_unique<NodeType>()));
    }
    ...
}

There may be a nicer way to coalesce these two templates into one, but this is pretty clean, I think. Each time I come back to this project I’m learning new C++ tricks, so I’m in no hurry to gussy this up further. Besides, I suspect that making a single template for both of these wouldn’t be as easy to read.
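For the curious, one common way to fold the two templates into one is a variadic template with perfect forwarding: an empty parameter pack covers the no-argument case. This is a sketch, not the project's code; the node types and the JSON::JSON wrapper below are hypothetical, minimal stand-ins so that it compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the parser's types, only so this compiles on
// its own; the real JSON::ValueNodeBase, JSON::NumberNode, etc. differ.
namespace JSON {
    struct ValueNodeBase {
        virtual ~ValueNodeBase() = default;
    };

    struct NullNode : ValueNodeBase {};

    struct NumberNode : ValueNodeBase {
        double value;
        explicit NumberNode(double v) : value(v) {}
    };

    struct ArrayNode : ValueNodeBase {
        std::vector<std::unique_ptr<ValueNodeBase>> value;
        explicit ArrayNode(std::vector<std::unique_ptr<ValueNodeBase>>&& v)
            : value(std::move(v)) {}
    };

    class JSON {
    public:
        explicit JSON(std::unique_ptr<ValueNodeBase> head) : root(std::move(head)) {}
        ValueNodeBase* get() const { return root.get(); }

    private:
        std::unique_ptr<ValueNodeBase> root;
    };
}

namespace Test {
    // One template covers both the zero-argument and the one-argument cases:
    // the parameter pack may be empty, and std::forward preserves whether each
    // argument was passed as an lvalue or an rvalue.
    template<class NodeType, typename... Args>
    JSON::JSON createJSON(Args&&... args) {
        return JSON::JSON(std::make_unique<NodeType>(std::forward<Args>(args)...));
    }
}
```

One difference worth weighing: the original one-argument template takes `type&` and moves from it without the caller saying so, whereas with this version passing the storage vector as an lvalue would try to copy it, which does not compile for a vector of std::unique_ptr. Call sites would have to write `Test::createJSON<JSON::ArrayNode>(std::move(numbos))`, which makes the ownership transfer visible at the cost of a few extra characters per test.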

So, with these changes I can write really nice tests. What’s cool is that, of course, improving testability in this way also improves usability. Really, though? Just use simdjson. It’s amazing. More on SIMD programming from a PS3 programming perspective (nerd alert! but you should have turned back by now!) here. Maybe I’ll be that good at programming one day. You know, if Anthropic don’t replace me wholesale by then with Claude.

I suppose since you’re here, you can see what’s under the hood of these overloaded operators. They were not free. I’ll skip JSON::JSON::operator== because all that does is delegate the operation to the head node of its inner tree. So, if the JSON::JSON instance is the result of parsing

{
    "age": 99,
    "aliases": [
        "Big Steve",
        "Stevie Baby",
        "Mephistopheles"
    ],
    "address": {
        "address1": "123 Noneya Ave",
        "city": "Nowhereville",
        "state": "ME",
        "postalCode": "04101"
    }
}

then it is JSON::ObjectNode::operator== that is called when you pull this little stunt:

bool isWhatIThoughtItWouldBe(JSON::JSON& it, JSON::JSON& expected) {
    return it == expected; // cross your fingers
}
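Here is a rough sketch of how that delegation can work, assuming the nodes share a base class with a virtual operator==. ValueNodeBase, NumberNode, StringNode, and this JSON::JSON wrapper are simplified, hypothetical stand-ins rather than the project's real classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins for the parser's node types.
namespace JSON {
    enum class Type { Number, String };

    struct ValueNodeBase {
        virtual ~ValueNodeBase() = default;
        virtual Type getType() const = 0;
        // Each concrete node decides what equality means for itself.
        virtual bool operator==(const ValueNodeBase& other) const = 0;
    };

    struct NumberNode : ValueNodeBase {
        double value;
        explicit NumberNode(double v) : value(v) {}
        Type getType() const override { return Type::Number; }
        bool operator==(const ValueNodeBase& other) const override {
            return other.getType() == Type::Number &&
                   static_cast<const NumberNode&>(other).value == value;
        }
    };

    struct StringNode : ValueNodeBase {
        std::string value;
        explicit StringNode(std::string v) : value(std::move(v)) {}
        Type getType() const override { return Type::String; }
        bool operator==(const ValueNodeBase& other) const override {
            return other.getType() == Type::String &&
                   static_cast<const StringNode&>(other).value == value;
        }
    };

    class JSON {
    public:
        explicit JSON(std::unique_ptr<ValueNodeBase> head) : root(std::move(head)) {}

        // The wrapper only forwards to its head node; virtual dispatch picks
        // the concrete node's operator==.
        bool operator==(const JSON& other) const {
            if (!root || !other.root)
                return root == other.root; // equal only if both are empty
            return *root == *other.root;
        }

        bool operator!=(const JSON& other) const { return !(*this == other); }

    private:
        std::unique_ptr<ValueNodeBase> root;
    };
}
```

Because the base class's operator== is virtual, `it == expected` in the function above reaches whichever node type sits at the head of the tree; for the object payload shown earlier, that is ObjectNode::operator==.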

And that is made possible by this method (operator!= included for completeness):

namespace JSON {

    bool ObjectNode::operator==(const ValueNodeBase& other) const {
        if (other.getType() != Type::Object)
            return false;

        auto otherUmap = static_cast<ObjectStorageType*>(other.getValue());
        if (value->size() != otherUmap->size())
            return false;

        return std::all_of(value->begin(), value->end(), isEqual(otherUmap));
    }

    // still kinda redundant, but I like the explicitness
    bool ObjectNode::operator!=(const ValueNodeBase& other) const {
        if (other.getType() != Type::Object)
            return true;

        auto otherUmap = static_cast<ObjectStorageType*>(other.getValue());
        if (value->size() != otherUmap->size())
            return true;

        return !std::all_of(value->begin(), value->end(), isEqual(otherUmap));
    }
}

This is only as pretty as it is because I extracted the predicate being passed to std::all_of into its own function. I’m not pasting that whole thing here; I’ve assaulted your poor eyes enough for one day. Go ahead and click that link if you really want to see it. It’s pretty slick, I think, but if pointers scare you, don’t say I didn’t warn you. The ArrayNode operator overload code is pretty similar, made a little nicer by the very nifty std::equal function (just make sure your vectors are the same size first, unless segfaults are your thing!).
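Since the real predicate isn’t reproduced here, the following is only a guess at its general shape, assuming ObjectStorageType is an unordered_map of std::unique_ptr nodes and ArrayStorageType is a vector of them. isEqual, objectsEqual, arraysEqual, and the stub types are hypothetical. The array comparison uses the four-iterator overload of std::equal (available since C++14), which returns false when the ranges differ in length instead of reading past the end of the shorter one.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real ObjectStorageType/ArrayStorageType may differ.
namespace JSON {
    struct ValueNodeBase {
        virtual ~ValueNodeBase() = default;
        virtual bool operator==(const ValueNodeBase& other) const = 0;
    };

    using ObjectStorageType = std::unordered_map<std::string, std::unique_ptr<ValueNodeBase>>;
    using ArrayStorageType = std::vector<std::unique_ptr<ValueNodeBase>>;

    struct NumberNode : ValueNodeBase {
        double value;
        explicit NumberNode(double v) : value(v) {}
        bool operator==(const ValueNodeBase& other) const override {
            auto n = dynamic_cast<const NumberNode*>(&other);
            return n != nullptr && n->value == value;
        }
    };

    // Returns a predicate for std::all_of: true when `other` holds the same key
    // and both values compare equal (recursing through the nodes' operator==).
    // Assumes no stored pointer is null.
    inline auto isEqual(const ObjectStorageType* other) {
        return [other](const ObjectStorageType::value_type& entry) {
            auto it = other->find(entry.first);
            return it != other->end() && *entry.second == *it->second;
        };
    }

    // Mirrors ObjectNode::operator==: sizes first, then every key/value pair.
    inline bool objectsEqual(const ObjectStorageType& a, const ObjectStorageType& b) {
        return a.size() == b.size() && std::all_of(a.begin(), a.end(), isEqual(&b));
    }

    // Four-iterator std::equal checks the lengths itself.
    inline bool arraysEqual(const ArrayStorageType& a, const ArrayStorageType& b) {
        return std::equal(a.begin(), a.end(), b.begin(), b.end(),
            [](const std::unique_ptr<ValueNodeBase>& x,
               const std::unique_ptr<ValueNodeBase>& y) { return *x == *y; });
    }
}
```

With the three-iterator overload, the size check before std::equal is what keeps a shorter second vector from being read out of bounds; with the four-iterator overload it becomes optional, though it is still a cheap early exit.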

That’s it for now. Browse the repo in its current state, if you like, and any guidance from C++ magicians is more than welcome! I’m just figuring this out as I go. Thanks and au revoir!

Correction: too many uses of std::move

I realize I’ve been overusing std::move. For example, the code excerpt from above:

namespace Test {
    ...
    template<class NodeType, typename type>
    JSON::JSON createJSON(type& value) {
        return JSON::JSON(std::move(std::make_unique<NodeType>(std::move(value))));
    }

    template<class NodeType>
    JSON::JSON createJSON() {
        return JSON::JSON(std::move(std::make_unique<NodeType>()));
    }
    ...
}

std::move is not necessary around the call to the factory function std::make_unique, because that call expression is already an rvalue (specifically a prvalue: a temporary std::unique_ptr). I’ve made the following change in the actual code, and a similar change in the many other places where I made the same mistake:

namespace Test {
    ...
    template<class NodeType, typename type>
    JSON::JSON createJSON(type& value) {
        return JSON::JSON(std::make_unique<NodeType>(std::move(value)));
    }

    template<class NodeType>
    JSON::JSON createJSON() {
        return JSON::JSON(std::make_unique<NodeType>());
    }
    ...
}

std::move is still needed on value, though. value is an lvalue reference (type&), so naming it yields an lvalue; std::move casts it to an rvalue so that the constructor std::make_unique calls for whatever class NodeType turns out to be can take ownership of its contents by moving rather than copying. As an added bonus, it is less annoying to look at.
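To see both halves of the correction in action, here is a small, self-contained experiment. Payload, Node, withMove, withoutMove, and takesOwnership are hypothetical names for illustration only; Payload simply records whether it was copy- or move-constructed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type that records how it was constructed.
struct Payload {
    bool movedInto = false;
    bool copiedInto = false;
    Payload() = default;
    Payload(const Payload&) : copiedInto(true) {}
    Payload(Payload&&) noexcept : movedInto(true) {}
};

// Hypothetical node with separate copying and moving constructors.
struct Node {
    Payload payload;
    explicit Node(const Payload& p) : payload(p) {}         // copies
    explicit Node(Payload&& p) : payload(std::move(p)) {}   // moves
};

// Like the corrected createJSON: value is an lvalue inside the function,
// so std::move is what lets make_unique reach Node(Payload&&).
template<class NodeType, typename T>
std::unique_ptr<NodeType> withMove(T& value) {
    return std::make_unique<NodeType>(std::move(value));
}

// Without std::move, the lvalue selects Node(const Payload&) and copies.
template<class NodeType, typename T>
std::unique_ptr<NodeType> withoutMove(T& value) {
    return std::make_unique<NodeType>(value);
}

// Stand-in for a constructor like JSON::JSON(std::unique_ptr<...>): a
// make_unique call expression binds to it directly, no std::move needed.
inline bool takesOwnership(std::unique_ptr<Node> node) { return node != nullptr; }
```

withMove mirrors the corrected createJSON, and takesOwnership shows a std::make_unique result being handed straight to a function that takes a std::unique_ptr by value, with no std::move around the call.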

Written by Stephen Horne
