Packed Data Support in Haskell (arthi-chaud.github.io)

77 points by matt_d 143 days ago | 12 comments

nine_k 143 days ago [-]

> Introducing the ‘packed’ data format, a binary format that allows using data as it is, without the need for a deserialisation step. A notable perk of this format is that traversals on packed trees is proven to be faster than on ‘unpacked’ trees: as the fields of data structures are inlines, there are no pointer jumps, thus making the most of the L1 cache.

That is, a "memory dump -> zero-copy memory read" of a subgraph of Haskell objects, which lets such trees / subgraphs be passed directly over a network. Slightly reminiscent of Cap'n Proto.
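
To make the "no pointer jumps" idea concrete, here is a rough sketch in Python rather than Haskell. The byte layout (one tag byte, then either an int64 or the two subtrees back to back) and the names `pack` and `sum_packed` are made up for illustration; this is not the library's actual format or API.

```python
import struct

# Hypothetical layout: tag byte 0 = leaf followed by a little-endian int64,
# tag byte 1 = node followed by the left subtree and then the right subtree.
LEAF, NODE = 0, 1

def pack(tree):
    """Serialise a nested tuple tree, ("leaf", n) or ("node", l, r), in preorder."""
    if tree[0] == "leaf":
        return bytes([LEAF]) + struct.pack("<q", tree[1])
    return bytes([NODE]) + pack(tree[1]) + pack(tree[2])

def sum_packed(buf, pos=0):
    """Sum the leaves by scanning the buffer; returns (total, next position)."""
    if buf[pos] == LEAF:
        (value,) = struct.unpack_from("<q", buf, pos + 1)
        return value, pos + 9  # 1 tag byte + 8 payload bytes
    left, pos = sum_packed(buf, pos + 1)   # left subtree starts right after the tag
    right, pos = sum_packed(buf, pos)      # right subtree starts where the left ended
    return left + right, pos

tree = ("node", ("leaf", 1), ("node", ("leaf", 2), ("leaf", 3)))
buf = pack(tree)
total, end = sum_packed(buf)
```

The traversal only ever moves forward through one contiguous buffer and never rebuilds the tree, which is the property the article is after.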

Zolomon 143 days ago [-]

They mention this in the article.

spockz 143 days ago [-]

It reminds me more of flat buffers though. Does protobuf also have zero allocation (beyond initial ingestion) and no pointer jumps?

cstrahan 141 days ago [-]

No. One reason is that it uses variable-sized integers.

See https://protobuf.dev/programming-guides/encoding/
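
A minimal sketch of that varint scheme, for unsigned integers only (the function name `encode_varint` is mine, not part of any protobuf library): each byte carries 7 bits of payload, and the high bit says whether more bytes follow. Since the width depends on the value, a reader has to decode a field to know where the next one starts.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)

one = encode_varint(1)       # a single byte
big = encode_varint(150)     # two bytes, the example used in the encoding guide
```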

carterschonwald 143 days ago [-]

One thing that sometimes gets tricky in these things is handling subterm sharing. I wonder how they implemented it.

90s_dev 143 days ago [-]

We are always reinventing wheels. If we didn't, they'd all still be made of wood.

tlb 143 days ago [-]

> the serialised version of the data is usually bigger than its in-memory representation

I don’t think this is common. Perhaps for arrays of floats serialized as JSON or something. But I can’t think of a case where binary serialization is bigger. Data types like maps are necessarily larger in memory to support fast lookup and mutability.

IsTom 143 days ago [-]

If you use a lot of sharing in immutable data it can grow a lot when serializing. A simple pathological example would be a tree that has all left subtrees same as the right ones. It takes O(height) space in memory, but O(2^height) when serialized.
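
A quick sketch of that pathological case, in Python with nested tuples standing in for immutable heap objects (the helper names are hypothetical). `distinct_nodes` counts the objects actually allocated, while `flat_size` counts the nodes a naive serialiser without sharing detection would write out.

```python
def shared_tree(height):
    """Build a tree whose left and right subtrees are the same object at every level."""
    t = ("leaf",)
    for _ in range(height):
        t = ("node", t, t)  # both children point to the same subtree
    return t

def distinct_nodes(tree):
    """Count the distinct objects reachable from the root (in-memory size)."""
    seen = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node[0] == "node":
            stack.extend(node[1:])
    return len(seen)

def flat_size(tree):
    """Count the nodes written by a serialiser that duplicates shared subtrees."""
    if tree[0] == "leaf":
        return 1
    return 1 + flat_size(tree[1]) + flat_size(tree[2])

t = shared_tree(10)
in_memory = distinct_nodes(t)   # height + 1 objects
serialised = flat_size(t)       # 2^(height + 1) - 1 nodes
```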

nine_k 143 days ago [-]

I suppose all self-describing formats, like protobuf, or thrift or, well, JSON are bigger than the efficient machine representation, because they carry the schema in every message, one way or another.

lordleft 143 days ago [-]

This was very well written. Excellent article!

NetOpWibby 143 days ago [-]

Is this like MessagePack for Haskell?

gitroom 143 days ago [-]

honestly i wish more stuff worked this way - fewer hops in memory always makes me happy
